KNighter: Transforming Static Analysis with LLM-Synthesized Checkers

August 29, 2025

About
Getting Started
- Docker Setup (Recommended)
- Manual Environment Setup (Alternative)
Running KNighter
Architecture Documentation

About

KNighter is a checker synthesis tool that uses Large Language Models (LLMs) to automatically generate static analysis checkers from historical patch commits.

Key Features

🤖 LLM-Powered Generation: Automatically synthesizes static analysis checkers using state-of-the-art language models
📊 Multi-step Pipeline: Employs a sophisticated generation → refinement → triage workflow for high-quality results
🔍 Historical Learning: Learns from real-world patch commits to understand common bug patterns
⚡ LLVM Integration: Built on top of LLVM for robust static analysis capabilities
🐧 Linux Kernel Focus: Specialized for finding bugs in large-scale C/C++ codebases like the Linux kernel

The detected bugs 🐛 can be found here.

Important

We are continuously improving the documentation and adding new features. Please stay tuned for updates.

Getting Started

Docker Setup (Recommended)

🐳 Docker Installation Options

Option 1: Docker Hub (Recommended)

docker pull knighterhub/knighter

Option 2: Build from Source

git clone https://github.com/ise-uiuc/KNighter.git KNighter
cd KNighter

docker build -t knighter .

🚀 Running the Container

# If you pulled the image from Docker Hub
docker run -it knighterhub/knighter

# If you built the image from source
docker run -it knighter

⚙️ Environment Initialization

When running the container for the first time, initialize the environment:

cd /app
# This takes a while: it downloads the dependencies and compiles LLVM
python3 scripts/init_docker.py

This downloads LLVM and Linux kernel source code into /data/llvm and /data/linux.
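
Before moving on, it can help to confirm that initialization produced both source trees. The sketch below is not part of KNighter: `missing_dirs` is a hypothetical helper that only checks whether the two paths named above exist as directories.

```python
from pathlib import Path

# Paths taken from the text above; adjust them if your container differs.
EXPECTED_DIRS = ["/data/llvm", "/data/linux"]


def missing_dirs(paths):
    """Return the subset of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(EXPECTED_DIRS)
    if missing:
        print("Not initialized yet, missing:", ", ".join(missing))
    else:
        print("LLVM and Linux sources are in place.")
```

An empty result only shows that the directories exist, not that the LLVM build itself succeeded.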

API Key Configuration:

echo 'openai_key: "YOUR_OPENAI_API_KEY"' > /app/llm_keys.yaml

Manual Environment Setup (Alternative)

Note: For detailed setup steps, refer to scripts/init_docker.py which contains the complete initialization process.

🔧 Manual Installation Steps

Step 1: Install Dependencies

Download and build LLVM-18.1.8:

wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-18.1.8.zip
unzip llvmorg-18.1.8.zip

Clone the Linux kernel source code:

git clone https://github.com/torvalds/linux.git

Install Python dependencies:

# Option 1: Using uv (recommended for faster installs)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.cargo/env
uv pip install -r requirements.txt

# Option 2: Using regular pip
pip3 install -r requirements.txt

git submodule update --init --recursive

Step 2: Configuration Files

Set up your config.yaml (see scripts/init_docker.py for reference):

result_dir: "result-checkers"
LLVM_dir: "/PATH/TO/LLVM_DIR"
checker_nums: 10
linux_dir: "/PATH/TO/LINUX_DIR"
key_file: "llm_keys.yaml"
model: "o3-mini"
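
For scripted setups, the same file can be generated rather than edited by hand. The following is a minimal sketch, not KNighter code: `render_config` is a hypothetical helper, it assumes the six flat keys shown above are all that is needed, and it emits plain text so that no YAML library is required.

```python
import json


def render_config(llvm_dir, linux_dir, result_dir="result-checkers",
                  checker_nums=10, key_file="llm_keys.yaml", model="o3-mini"):
    """Return config.yaml text with the same keys as the example above."""
    fields = [
        ("result_dir", result_dir),
        ("LLVM_dir", llvm_dir),
        ("checker_nums", checker_nums),
        ("linux_dir", linux_dir),
        ("key_file", key_file),
        ("model", model),
    ]
    # json.dumps double-quotes strings and leaves integers bare; both forms
    # are also valid YAML scalars.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields)


if __name__ == "__main__":
    # Example paths only; substitute your own LLVM and Linux directories.
    print(render_config("/data/llvm", "/data/linux"), end="")
```

Redirecting the output to `config.yaml` gives a file with the same shape as the hand-written one.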

Set up the llm_keys.yaml file (see llm_keys_example.yaml for reference):

openai_key: "sk-..."
claude_key: "sk-ant-..."
google_key: "AIza..."
deepseek_key: "sk-..."

# For local models (optional)
# In config.yaml, set model to "local:<model_name>",
# for example "local:openai/gpt-oss-120b"
base_url: "http://localhost:8000/v1"
api_key: "dummy"
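
The `local:` prefix convention described in the comments can be illustrated with a small parser. This is only a sketch of how such a string might be interpreted: `split_model` is a hypothetical name, and KNighter's actual handling may differ.

```python
LOCAL_PREFIX = "local:"


def split_model(model):
    """Split a config `model` value into (is_local, model_name).

    Assumption: a value starting with "local:" is served from `base_url`;
    anything else is a hosted model name used as-is.
    """
    if model.startswith(LOCAL_PREFIX):
        name = model[len(LOCAL_PREFIX):]
        if not name:
            raise ValueError("expected 'local:<model_name>'")
        return True, name
    return False, model


if __name__ == "__main__":
    for value in ("o3-mini", "local:openai/gpt-oss-120b"):
        print(value, "->", split_model(value))
```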

Step 3: LLVM Setup

python3 scripts/setup_llvm.py LLVM_PATH

Running KNighter

Quick Start (Docker)

For rapid evaluation, use the debug dataset:

cd /app/src

# Step 1: Generate checkers for debug commits
python3 main.py gen --config_file /app/config-generate.yaml --commit_file=/app/commits/commits-debug.txt

# Step 2: Refine generated checkers
python3 main.py refine --config_file /app/config-refine-debug.yaml /app/result-generate

# Step 3: Triage and analyze results
python3 main.py triage --config_file /app/config-triage-debug.yaml /app/result-refine-debug
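
The three steps can also be chained from a small driver script. The sketch below is an illustration rather than something shipped with KNighter: `build_commands` and `run_all` are hypothetical names, the arguments are copied from the commands above, and nothing is executed unless `--run` is passed.

```python
import subprocess
import sys

# Arguments copied from the Quick Start commands above.
STEPS = [
    ["gen", "--config_file", "/app/config-generate.yaml",
     "--commit_file=/app/commits/commits-debug.txt"],
    ["refine", "--config_file", "/app/config-refine-debug.yaml",
     "/app/result-generate"],
    ["triage", "--config_file", "/app/config-triage-debug.yaml",
     "/app/result-refine-debug"],
]


def build_commands(steps=STEPS, python="python3"):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, "main.py", *args] for args in steps]


def run_all(cwd="/app/src"):
    """Run the steps in order; check=True stops at the first failure."""
    for argv in build_commands():
        print("+", " ".join(argv))
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        run_all()
    else:
        # Dry run by default: only show what would be executed.
        for argv in build_commands():
            print(" ".join(argv))
```

Stopping at the first failing step matters here because each step reads the result directory written by the previous one.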

📋 Pipeline Modes & Usage

Available Operation Modes:

| Mode | Purpose | Description |
|------|---------|-------------|
| `gen` | Generation | Generate new checkers from commit patches |
| `refine` | Refinement | Improve and validate generated checkers |
| `scan` | Scanning | Scan the kernel with validated checkers |
| `triage` | Analysis | Analyze and categorize scan results |

Basic Usage (Manual Setup):

cd src
python3 main.py <mode> --commit_file=<commits.txt> --config_file=<config.yaml>

Example:

python3 main.py gen --commit_file=../commits/commits-selected.txt --config_file=config.yaml

⚙️ Configuration Files

| File | Purpose | Key Parameters |
|------|---------|----------------|
| `config-generate.yaml` | Checker generation | `model`, `checker_nums`, `result_dir` |
| `config-refine.yaml` | Refinement process | `jobs`, `scan_timeout`, `scan_commit` |
| `config-triage.yaml` | Result analysis | Analysis parameters |

Modify these files to experiment with parameters different from those used in the paper's evaluation.

Architecture Documentation

🏗️ System Architecture Overview

KNighter implements a multi-stage pipeline for automated checker synthesis:

1. Commit Analysis: Extract bug patterns from historical patches
2. Checker Generation: Use LLMs to synthesize static analysis checkers
3. Refinement: Validate and improve generated checkers through compilation and testing
4. Deployment: Apply refined checkers to target codebases
5. Triage: Analyze and categorize detected issues
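
The staged flow above can be pictured as data passing through an ordered list of functions. The sketch below is purely illustrative and does not reflect KNighter's internals: `run_pipeline`, `tag`, and the stage names are hypothetical, and each placeholder stage only labels its input instead of calling an LLM, compiling a checker, or running a scan.

```python
def run_pipeline(stages, artifact):
    """Feed `artifact` through each (name, function) stage in order."""
    trace = []
    for name, stage in stages:
        artifact = stage(artifact)
        trace.append(name)
    return artifact, trace


def tag(label):
    """Build a placeholder stage that wraps every item in `label(...)`."""
    return lambda items: [f"{label}({item})" for item in items]


# Stage names mirror the five steps listed above.
STAGES = [
    ("commit_analysis", tag("pattern")),
    ("checker_generation", tag("checker")),
    ("refinement", tag("refined")),
    ("deployment", tag("report")),
    ("triage", tag("triaged")),
]

if __name__ == "__main__":
    result, trace = run_pipeline(STAGES, ["commit-1"])
    print(trace)
    print(result)
```

Keeping the stages in a plain list makes the ordering explicit and lets a single stage be swapped or skipped when experimenting.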

For comprehensive architecture documentation, see ARCHITECTURE.md.

Citation

If you use KNighter in your research, please cite our paper:

@inproceedings{knighter,
    title = {KNighter: Transforming Static Analysis with LLM-Synthesized Checkers},
    author = {Yang, Chenyuan and Zhao, Zijie and Xie, Zichen and Li, Haoyu and Zhang, Lingming},
    year = {2025},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3731569.3764827},
    doi = {10.1145/3731569.3764827},
    booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
    location = {Seoul, Republic of Korea},
    series = {SOSP '25}
}