KNighter: Transforming Static Analysis with LLM-Synthesized Checkers

August 29, 2025

[Figure: KNighter framework overview]


About

KNighter is a checker synthesis tool that uses Large Language Models (LLMs) to automatically generate static analysis checkers from historical patch commits.

Key Features

  • 🤖 LLM-Powered Generation: Automatically synthesizes static analysis checkers using state-of-the-art language models
  • 📊 Multi-step Pipeline: Runs a generation → refinement → triage workflow that produces checkers, validates them, and filters their reports
  • 🔍 Historical Learning: Learns common bug patterns from real-world patch commits
  • ⚡ LLVM Integration: Built on LLVM/Clang's static analysis infrastructure
  • 🐧 Linux Kernel Focus: Targets large-scale C/C++ codebases such as the Linux kernel

The detected bugs 🐛 can be found here.

Important

We are continuously improving the documentation and adding new features. Please stay tuned for updates.

Getting Started

๐Ÿณ Docker Installation Options
docker pull knighterhub/knighter

Option 2: Build from Source

git clone https://github.com/ise-uiuc/KNighter.git KNighter
cd KNighter

docker build -t knighter .
🚀 Running the Container
# Pull from Docker Hub
docker run -it knighterhub/knighter

# Build from source
docker run -it knighter
โš™๏ธ Environment Initialization

When running the container for the first time, initialize the environment:

cd /app
# This may take a while: it downloads the dependencies and compiles LLVM
python3 scripts/init_docker.py

This downloads LLVM and Linux kernel source code into /data/llvm and /data/linux.

API Key Configuration:

echo 'openai_key: "YOUR_OPENAI_API_KEY"' > /app/llm_keys.yaml
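If you would rather not leave the key in your shell history, the same file can be written from Python. This is a minimal sketch, not part of KNighter: the `write_key_file` helper is hypothetical, restricting the file to owner-only permissions (0600) is our own suggestion, and the `__main__` usage assumes the key was exported beforehand in an `OPENAI_API_KEY` environment variable.

```python
import os
from pathlib import Path


def write_key_file(path: str, openai_key: str) -> Path:
    """Write a one-line llm_keys.yaml readable only by the current user.

    Hypothetical helper (not part of KNighter); mirrors the echo command above.
    """
    target = Path(path)
    # Create (or truncate) the file, requesting mode 0600 for new files.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        # Same content the echo command writes: openai_key: "<key>"
        f.write(f'openai_key: "{openai_key}"\n')
    # O_CREAT does not change the mode of an existing file, so enforce it here.
    os.chmod(target, 0o600)
    return target


if __name__ == "__main__":
    # Assumes: export OPENAI_API_KEY=sk-... was run earlier in this shell.
    write_key_file("/app/llm_keys.yaml", os.environ["OPENAI_API_KEY"])
```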

Manual Environment Setup (Alternative)

Note: For detailed setup steps, refer to scripts/init_docker.py which contains the complete initialization process.

🔧 Manual Installation Steps

Step 1: Install Dependencies

Download and build LLVM-18.1.8:

wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-18.1.8.zip
unzip llvmorg-18.1.8.zip

Clone the Linux kernel source code:

git clone https://github.com/torvalds/linux.git

Install Python dependencies:

# Option 1: Using uv (recommended for faster installs)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

# Option 2: Using regular pip
pip3 install -r requirements.txt

Initialize the Git submodules (run inside the KNighter checkout):

git submodule update --init --recursive

Step 2: Configuration Files

Set up your config.yaml (see scripts/init_docker.py for reference):

result_dir: "result-checkers"
LLVM_dir: "/PATH/TO/LLVM_DIR"
checker_nums: 10
linux_dir: "/PATH/TO/LINUX_DIR"
key_file: "llm_keys.yaml"
model: "o3-mini"
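Before launching a long run, it can help to sanity-check config.yaml. The sketch below is hypothetical (KNighter's own loader may behave differently): to stay dependency-free it parses only flat `key: value` lines like the example above rather than full YAML (inline `# comments` after a value are not handled), checks that each key from the example is present, that `checker_nums` is a positive integer, and that the LLVM and Linux directories exist.

```python
from pathlib import Path

# Keys taken from the example config.yaml above.
REQUIRED_KEYS = ("result_dir", "LLVM_dir", "checker_nums", "linux_dir", "key_file", "model")


def load_flat_yaml(text: str) -> dict:
    """Parse flat `key: value` lines (a tiny YAML subset); blank and comment lines are skipped."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values like "local:openai/..." survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        # Drop surrounding quotes, e.g. "o3-mini" -> o3-mini
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def validate_config(config: dict) -> list:
    """Return human-readable problems (an empty list if the config looks usable)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if "checker_nums" in config:
        value = config["checker_nums"]
        if not value.isdigit() or int(value) <= 0:
            problems.append(f"checker_nums must be a positive integer, got {value!r}")
    for key in ("LLVM_dir", "linux_dir"):
        if key in config and not Path(config[key]).is_dir():
            problems.append(f"{key} does not exist: {config[key]}")
    return problems


if __name__ == "__main__":
    for problem in validate_config(load_flat_yaml(Path("config.yaml").read_text())):
        print("config.yaml:", problem)
```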

Set up the llm_keys.yaml file (see llm_keys_example.yaml for reference):

openai_key: "sk-..."
claude_key: "sk-ant-..."
google_key: "AIza..."
deepseek_key: "sk-..."

# For local models (optional):
# in config.yaml, set model to "local:<model_name>",
# e.g. "local:openai/gpt-oss-120b"
base_url: "http://localhost:8000/v1"
api_key: "dummy"
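The `local:` prefix implies that the model string chooses between a hosted provider and the OpenAI-compatible server at `base_url`. The sketch below shows one way that routing could work. `resolve_model`, `ModelTarget`, and the name-prefix rules for picking a provider key are hypothetical illustrations, not KNighter's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_PREFIX = "local:"


@dataclass
class ModelTarget:
    name: str                # model name sent to the API
    base_url: Optional[str]  # None means the provider's default endpoint
    key_field: str           # which llm_keys.yaml field holds the credential


def resolve_model(model: str, keys: dict) -> ModelTarget:
    """Hypothetical routing for the `model` setting in config.yaml."""
    if model.startswith(LOCAL_PREFIX):
        # "local:openai/gpt-oss-120b" -> "openai/gpt-oss-120b" served at base_url
        return ModelTarget(model[len(LOCAL_PREFIX):], keys["base_url"], "api_key")
    # Assumed name-based mapping onto the hosted-provider keys in llm_keys.yaml.
    lowered = model.lower()
    if lowered.startswith("claude"):
        field = "claude_key"
    elif lowered.startswith("gemini"):
        field = "google_key"
    elif lowered.startswith("deepseek"):
        field = "deepseek_key"
    else:
        field = "openai_key"
    return ModelTarget(model, None, field)
```

Keeping the prefix in the model string (rather than adding a separate config key) means switching between a hosted and a self-served model only touches one line of config.yaml.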

Step 3: LLVM Setup

python3 scripts/setup_llvm.py LLVM_PATH

Running KNighter

Quick Start (Docker)

For rapid evaluation, use the debug dataset:

cd /app/src

# Step 1: Generate checkers for debug commits
python3 main.py gen --config_file /app/config-generate.yaml --commit_file=/app/commits/commits-debug.txt

# Step 2: Refine generated checkers
python3 main.py refine --config_file /app/config-refine-debug.yaml /app/result-generate

# Step 3: Triage and analyze results
python3 main.py triage --config_file /app/config-triage-debug.yaml /app/result-refine-debug
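To rerun the three debug steps unattended, they can be chained from Python. This is a convenience sketch, not a script shipped with KNighter: it reuses only the commands and paths shown above, runs them from /app/src, and stops at the first step that exits with a non-zero status. Passing `dry_run=True` prints the commands without executing them.

```python
import shlex
import subprocess

# Commands copied from the Quick Start above.
STEPS = [
    ["python3", "main.py", "gen", "--config_file", "/app/config-generate.yaml",
     "--commit_file=/app/commits/commits-debug.txt"],
    ["python3", "main.py", "refine", "--config_file", "/app/config-refine-debug.yaml",
     "/app/result-generate"],
    ["python3", "main.py", "triage", "--config_file", "/app/config-triage-debug.yaml",
     "/app/result-refine-debug"],
]


def run_pipeline(steps=STEPS, cwd="/app/src", dry_run=False):
    """Run each step in order and return the commands that were (or would be) run."""
    executed = []
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later steps never run on a failed step's output.
            subprocess.run(cmd, cwd=cwd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_pipeline()
```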
📋 Pipeline Modes & Usage

Available Operation Modes:

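| Mode   | Purpose    | Description                               |
|--------|------------|-------------------------------------------|
| gen    | Generation | Generate new checkers from commit patches |
| refine | Refinement | Improve and validate generated checkers   |
| scan   | Scanning   | Scan the kernel with validated checkers   |
| triage | Analysis   | Analyze and categorize scan results       |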

Basic Usage (Manual Setup):

cd src
python3 main.py <mode> --commit_file=<commits.txt> --config_file=<config.yaml>

Example:

python3 main.py gen --commit_file=../commits/commits-selected.txt --config_file=config.yaml
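All four modes share the single `main.py <mode>` entry point. The `argparse` sketch below illustrates one way such an interface can be wired with subcommands; it is a hypothetical reconstruction for illustration, not KNighter's actual `main.py`. In particular, giving the non-`gen` modes an optional positional input directory is an assumption based on the Quick Start commands.

```python
import argparse

# Mode names and descriptions from the table above.
MODES = {
    "gen": "Generate new checkers from commit patches",
    "refine": "Improve and validate generated checkers",
    "scan": "Scan the kernel with validated checkers",
    "triage": "Analyze and categorize scan results",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="mode", required=True)
    for mode, help_text in MODES.items():
        p = sub.add_parser(mode, help=help_text)
        p.add_argument("--config_file", required=True)
        if mode == "gen":
            # gen reads the list of patch commits to learn from
            p.add_argument("--commit_file", required=True)
        else:
            # Assumed: later stages take the previous stage's result directory
            p.add_argument("input_dir", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode}", vars(args))
```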
โš™๏ธ Configuration Files
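| File                 | Purpose            | Key Parameters                   |
|----------------------|--------------------|----------------------------------|
| config-generate.yaml | Checker generation | model, checker_nums, result_dir  |
| config-refine.yaml   | Refinement process | jobs, scan_timeout, scan_commit  |
| config-triage.yaml   | Result analysis    | Analysis parameters              |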

Modify these files to experiment with the parameters used in the paper's evaluation.
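To sweep a parameter such as `checker_nums` or `model` without hand-editing, one option is to write a copy of a config file with a few top-level keys overridden and pass the copy via `--config_file`. The `override_config` helper below is hypothetical: it does a plain line-based substitution of unindented `key: value` lines, so it keeps comments and ordering but does not understand nested YAML. Keys not found in the source file are appended.

```python
from pathlib import Path


def format_value(value) -> str:
    # Quote strings (as the example configs do); leave numbers bare.
    return str(value) if isinstance(value, (int, float)) else f'"{value}"'


def override_config(src: str, dst: str, overrides: dict) -> str:
    """Copy src to dst, replacing top-level `key: value` lines named in overrides."""
    remaining = dict(overrides)
    out_lines = []
    for line in Path(src).read_text().splitlines():
        key = line.split(":", 1)[0].strip()
        # Only touch unindented, non-comment `key: value` lines we were asked to change.
        is_top_level = line and not line[0].isspace() and not line.startswith("#") and ":" in line
        if is_top_level and key in remaining:
            out_lines.append(f"{key}: {format_value(remaining.pop(key))}")
        else:
            out_lines.append(line)
    # Append overrides that were not present in the source file.
    out_lines += [f"{k}: {format_value(v)}" for k, v in remaining.items()]
    text = "\n".join(out_lines) + "\n"
    Path(dst).write_text(text)
    return text
```

For example, `override_config("/app/config-generate.yaml", "/app/config-generate-5.yaml", {"checker_nums": 5})` writes a variant that can then be passed to `python3 main.py gen --config_file /app/config-generate-5.yaml ...`.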

Architecture Documentation

๐Ÿ—๏ธ System Architecture Overview

KNighter implements a multi-stage pipeline for automated checker synthesis:

  1. Commit Analysis: Extract bug patterns from historical patches
  2. Checker Generation: Use LLMs to synthesize static analysis checkers
  3. Refinement: Validate and improve generated checkers through compilation and testing
  4. Deployment: Apply refined checkers to target codebases
  5. Triage: Analyze and categorize detected issues
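The five stages above can be read as a chain of transformations. The skeleton below is an illustrative model of that data flow, not KNighter's code: every stage is a caller-supplied function, checkers are plain source strings, and the bounded refinement loop (repair until the checker validates, at most `max_attempts` times) is our assumption about how "validate and improve through compilation and testing" might be structured.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stages:
    analyze: Callable[[str], str]                 # commit patch -> bug pattern
    generate: Callable[[str], str]                # bug pattern -> draft checker source
    validate: Callable[[str], Optional[str]]      # None if it compiles/tests OK, else error text
    repair: Callable[[str, str], str]             # checker + error -> revised checker
    scan: Callable[[str], List[str]]              # checker -> raw reports on the target code
    triage: Callable[[str], bool]                 # report -> keep as a likely real bug?


def synthesize(patch: str, st: Stages, max_attempts: int = 3) -> List[str]:
    pattern = st.analyze(patch)                   # 1. commit analysis
    checker = st.generate(pattern)                # 2. checker generation
    error = st.validate(checker)                  # 3. refinement loop
    attempts = 0
    while error is not None and attempts < max_attempts:
        checker = st.repair(checker, error)
        attempts += 1
        error = st.validate(checker)
    if error is not None:
        return []                                 # gave up: checker never validated
    reports = st.scan(checker)                    # 4. deployment
    return [r for r in reports if st.triage(r)]   # 5. triage
```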

For comprehensive architecture documentation, see ARCHITECTURE.md.


Citation: If you use KNighter in your research, please cite our paper:

@inproceedings{knighter,
    title = {KNighter: Transforming Static Analysis with LLM-Synthesized Checkers},
    author = {Yang, Chenyuan and Zhao, Zijie and Xie, Zichen and Li, Haoyu and Zhang, Lingming},
    year = {2025},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3731569.3764827},
    doi = {10.1145/3731569.3764827},
    booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
    location = {Seoul, Republic of Korea},
    series = {SOSP '25}
}