Working on scip-clang
February 17, 2026
- Install dependencies
- Building
- Running tests
- Formatting
- IDE integration
- Debugging
- Profiling
- Publishing releases
- Implementation notes
- Notes on Clang internals
Install dependencies
- Bazelisk: This handles Bazel versions transparently.
- (Linux only) On Ubuntu, install libc6-dev for system headers like features.h.
Bazel manages the C++ toolchain and other tool dependencies like formatters, so they don't need to be downloaded separately.
Building
(The dev config is for local development.)
bazel build //... --config=dev
The indexer binary will be placed at bazel-bin/indexer/scip-clang.
Running the indexer
Example invocation for a CMake project:
# This will generate a compilation database under build/
# See https://clang.llvm.org/docs/JSONCompilationDatabase.html
cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <args>
# Invoke scip-clang from the project root (not the build root)
path/to/scip-clang --compdb-path build/compile_commands.json
Consult --help for user-facing flags, and --help-all for both user-facing and internal flags.
Running tests
Run all tests:
bazel test //test --config=dev
Update snapshot tests:
bazel test //test:update --config=dev
NOTE: When adding a new test case, you need to manually create
an empty .snapshot.cc file for recording snapshot output
(it's not automatically generated).
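Creating the empty snapshot file can be scripted. The helper below is a minimal sketch, not part of the repository: the name make_empty_snapshot is hypothetical, and it assumes that the snapshot sits next to the test source and shares its base name (foo.cc becomes foo.snapshot.cc). Check an existing test case for the actual layout before relying on it.

```shell
# Hypothetical helper: create an empty snapshot file next to a test source.
# Assumes the naming scheme <name>.cc -> <name>.snapshot.cc.
make_empty_snapshot() {
    src="$1"
    case "$src" in
        *.snapshot.cc)
            echo "error: $src is already a snapshot file" >&2
            return 1
            ;;
        *.cc) ;;
        *)
            echo "error: expected a .cc file, got: $src" >&2
            return 1
            ;;
    esac
    snapshot="${src%.cc}.snapshot.cc"
    # Create the file only if it is missing, so recorded output is not clobbered.
    [ -e "$snapshot" ] || : > "$snapshot"
    # Print the snapshot path so callers can see what was created.
    echo "$snapshot"
}
```

After creating the file, run the update command shown above to record the snapshot output.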
Examples of running subsets of tests (follows directory structure):
bazel test //test:test_index --config=dev
bazel test //test:test_index_aliases --config=dev
bazel test //test:update_index --config=dev
bazel test //test:update_index_aliases --config=dev
Indexing large projects
At the moment, we don't have any integration testing jobs which index large projects in CI. Before making a release, we typically manually test the indexer against one or more projects (instructions).
Formatting
Run ./tools/reformat.sh to reformat code and config files.
IDE integration
Run bazel run //tools:refresh_compile_commands to generate a compilation database
at the root of the repository. It will be automatically
picked up by clangd-based editor extensions (you may
need to reload the editor).
Debugging
Stacktraces
The default modes of ASan and UBSan do not print stack traces on failures.
I recommend maintaining a parallel build of LLVM
at the same commit as in fetch_deps.bzl.
Both sanitizers need access to llvm-symbolizer to print stack traces,
which can be provided via the separate build.
# For ASan
ASAN_SYMBOLIZER_PATH="$PWD/../llvm-project/build/bin/llvm-symbolizer" ASAN_OPTIONS=symbolize=1 <scip-clang invocation>
# For UBSan
PATH="$PWD/../llvm-project/build/bin:$PATH" UBSAN_OPTIONS=print_stacktrace=1 <scip-clang invocation>
Anecdotally, on macOS, this can take 10s+ the first time around, so don't hit Ctrl+C if UBSan seems to be stuck.
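If you switch between sanitizers often, the two invocations above can be folded into one wrapper. This is a sketch under stated assumptions: with_symbolizer is a hypothetical name, and the first argument is assumed to be the bin/ directory of your separate LLVM build. Note that it overrides any ASAN_OPTIONS or UBSAN_OPTIONS already set in your environment.

```shell
# Hypothetical wrapper: run a command with both sanitizers configured to
# print symbolized stack traces.
# Usage: with_symbolizer <llvm-build-bin-dir> <command> [args...]
with_symbolizer() {
    llvm_bin="$1"
    shift
    # ASan reads the symbolizer path from ASAN_SYMBOLIZER_PATH, whereas
    # UBSan finds llvm-symbolizer through PATH.
    env \
        ASAN_SYMBOLIZER_PATH="$llvm_bin/llvm-symbolizer" \
        ASAN_OPTIONS=symbolize=1 \
        UBSAN_OPTIONS=print_stacktrace=1 \
        PATH="$llvm_bin:$PATH" \
        "$@"
}

# Example (paths are placeholders):
#   with_symbolizer "$PWD/../llvm-project/build/bin" <scip-clang invocation>
```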
Attaching a debugger
In the default mode of operation, the worker, which runs semantic analysis and emits the index, runs in a separate process and communicates with the driver over IPC. This makes using a debugger tedious.
If you want to attach a debugger, run the worker directly instead.
- First, run the original scip-clang invocation with --log-level=debug and a short timeout (say --receive-timeout-seconds=10). This will print job ids (<compdb-index>.<subtask-index>) around when a task is being processed.
- Subset out the original compilation database using jq or similar.
  jq '[.[<compdb-index>]]' compile_commands.json > bad.json
- Run scip-clang --worker-mode=compdb --compdb-path bad.json (the original scip-clang invocation will have printed more arguments which were passed to the worker, but most of them should be unnecessary).
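The subsetting step can be sketched as a pair of shell functions. Both names (job_id_to_jq_filter and subset_compdb) are hypothetical, and the sketch assumes the job id has been copied from the debug log exactly in the <compdb-index>.<subtask-index> form described above.

```shell
# Hypothetical helper: turn a job id of the form <compdb-index>.<subtask-index>
# into the jq filter that selects the matching compilation database entry.
job_id_to_jq_filter() {
    job_id="$1"
    # Keep everything before the first dot, i.e. the compdb index.
    compdb_index="${job_id%%.*}"
    case "$compdb_index" in
        ''|*[!0-9]*)
            echo "error: unexpected job id: $job_id" >&2
            return 1
            ;;
    esac
    printf '[.[%s]]\n' "$compdb_index"
}

# Write a single-entry compilation database for the given job id.
# Usage: subset_compdb <job-id> <input-compdb> <output-compdb>
subset_compdb() {
    filter="$(job_id_to_jq_filter "$1")" || return 1
    jq "$filter" "$2" > "$3"
}

# Example (the job id is a placeholder):
#   subset_compdb 12.0 compile_commands.json bad.json
```

Wrapping the selected entry in an array keeps bad.json a valid compilation database, which is why the filter is [.[N]] rather than .[N].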
If you have not used LLDB before, check out this LLDB cheat sheet.
Debugging on Linux
There is a VM setup script available to configure a GCP VM for building scip-clang. We recommend using Ubuntu 20.04+ with 16 cores or more.
Testing CUDA/NVCC support
There is a CUDA-specific VM setup script which installs the CUDA SDK. Use it in a GCP VM which has a GPU attached.
You may need to restart your shell for changes to take effect.
Inspecting Clang ASTs
Print the AST nodes:
clang -Xclang -ast-dump file.c
clang -Xclang -ast-dump=json file.c
Another option is to use clang-query (tutorial).
NOTE: If running the above on CUDA code
leads to a Clang error suggesting that CUDA could not be found,
it's likely that the code is ill-formed. Adding flags like
-nocudainc or -nocudalib (sometimes suggested by Clang) will
lead to CUDAKernelCallExpr values not being parsed properly.
Automated test case reduction
In case of a crash, it may be possible to automatically reduce it using C-Reduce.
Important:
On macOS, use brew install --HEAD creduce,
as the default version is very outdated.
There is a helper script tools/reduce.py
which can coordinate scip-clang and creduce,
since correctly handling different kinds of paths in a compilation database
is a bit finicky in the general case.
It can be invoked like so:
# Pre-conditions:
# 1. CWD is project root
# 2. bad.json points to a compilation database with a single entry
# known to cause the crash
/path/to/tools/reduce.py bad.json
After completion, a path to a reduced C++ file will be printed out which still reproduces the crash.
See the script's --help text for information about additional flags.
Debugging preprocessor issues
The LLVM monorepo contains a tool pp-trace which can be used to understand the preprocessor callbacks being invoked without having to resort to print debugging inside scip-clang itself.
First, build pp-trace from source in your LLVM checkout,
making sure to include clang-tools-extra in LLVM_ENABLE_PROJECTS.
After that, it can be invoked like so:
# -isysroot is needed for pp-trace to find standard library headers
/path/to/llvm-project/build/bin/pp-trace mytestfile.cpp --extra-arg="-isysroot" --extra-arg="$(xcrun --show-sdk-path)" > pp-trace.yaml
See the pp-trace docs
or the --help text for information about other supported flags.
One can check that the structure of the YAML file matches what we expect:
bazel build //tools:analyze_pp_trace
./bazel-bin/tools/analyze_pp_trace --yaml-path pp-trace.yaml
Debugging using a local Clang checkout
Sometimes, the best way to debug something is to put print statements
inside Clang itself. For that, you can stub out the usage of llvm-raw in fetch_deps.bzl:
# Comment out the corresponding http_archive call
native.new_local_repository(
name = "llvm-raw",
path = "/home/me/code/llvm-project",
build_file_content = "# empty",
)
After that, add print debugging statements inside Clang (e.g. using llvm::errs() <<),
and rebuild scip-clang like usual.
Profiling
Stack sampling
One can create flamegraphs by following Brendan Gregg's flamegraph docs.
Two caveats on macOS:
- Invoking dtrace requires sudo.
- Once the stacks are folded, running sed -e 's/scip-clang`//g' over the result should clean up the output a bit.
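The cleanup step fits into the folding pipeline as a small filter. This is a sketch: strip_module_prefix is a hypothetical name, the file names are placeholders, and stackcollapse.pl and flamegraph.pl are the scripts from Brendan Gregg's FlameGraph repository, assumed to be in the current directory.

```shell
# Strip the "scip-clang`" module prefix that dtrace puts in front of each frame.
# Reads folded stacks on stdin and writes the cleaned stacks to stdout.
strip_module_prefix() {
    sed -e 's/scip-clang`//g'
}

# Example, given stacks already captured with dtrace into out.stacks:
#   ./stackcollapse.pl out.stacks | strip_module_prefix | ./flamegraph.pl > out.svg
```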
On macOS, if Xcode is installed, one can use xctrace for profiling.
Here's an example invocation:
xctrace record --template 'Time Profiler' --time-limit 60s --attach 'pid' --output out.trace
The resulting out.trace can be opened using Instruments.app.
Tracing using Perfetto
First, build the Perfetto tools from source in a separate directory.
git clone https://android.googlesource.com/platform/external/perfetto -b v33.1 && cd perfetto
tools/install-build-deps
tools/gn gen --args='is_debug=false' out/x
tools/ninja -C out/x tracebox traced traced_probes perfetto
Make sure that scip-clang is built in release mode
(using --config=release). In two different TTYs (e.g. tmux panes or iTerm tabs),
start traced and perfetto respectively:
# Terminal 1
out/x/traced
# Terminal 2
out/x/perfetto \
--txt --config ~/Code/scip-clang/tools/long_trace.pbtx \
--out "trace_$(date '+%Y-%m-%d_%H:%M:%S').pb"
Run the scip-clang invocation as usual in a separate terminal.
Once the scip-clang invocation ends,
kill the running perfetto process
to flush any buffered data.
Open the saved trace file using the online Perfetto UI.
Publishing releases
- Manually double-check that indexing works on one or more large projects.
- Land a PR with the following:
- Once the PR is merged to main, run:
NEW_VERSION="vM.N.P" bash -c 'git checkout main && git tag "$NEW_VERSION" && git push origin "$NEW_VERSION"'
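Because the tag is pushed straight to the remote, it can be worth checking its shape first. The function below is a hypothetical pre-flight check, not part of the release workflow; it assumes only the vM.N.P format shown above, with M, N and P as non-negative integers.

```shell
# Hypothetical pre-flight check: succeed only for tags shaped like vM.N.P,
# where M, N and P are non-negative integers.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example (the version number is a placeholder):
#   NEW_VERSION="v1.2.3"
#   is_release_tag "$NEW_VERSION" || { echo "bad tag: $NEW_VERSION" >&2; exit 1; }
```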
The release workflow can also be triggered against any branch in a "dry run" mode using the GitHub Actions UI.
Implementation notes
Some useful logic that is not specific to the indexer is adapted from the Sorbet
codebase and is marked with a NOTE(ref: based-on-sorbet).
In particular, we reuse the infrastructure for ENFORCE macros,
which are essentially assertions which are instrumented so
that the cost can be measured easily.
We could technically have used assert,
but having a separate macro makes it easier to change
the behavior in scip-clang exclusively, whereas there is a
greater chance of mistakes if we want to separate out the
cost of assertions in Clang itself vs in our code.
Notes on Clang internals
See docs/SourceLocation.md for information about how source locations are handled in Clang.