AI Agent Guide: Writing Coccinelle CVE Detection Rules
November 18, 2025
This guide is specifically designed for AI agents to systematically create accurate CVE detection rules for the CVEhound project. Follow this structured approach for consistent, high-quality results.
Mission
Create a Coccinelle semantic patch (.cocci file) that detects a specific Linux kernel CVE by matching vulnerable code patterns or identifying missing security fixes.
Prerequisites Checklist
Before writing a rule, gather this information:
- CVE ID (format: CVE-YYYY-NNNNN)
- Fix commit hash from Linux kernel git repository
- Commit that introduced the bug (optional but helpful)
- Complete diff of the fix commit (git show <hash>)
- Affected file paths (relative to kernel root)
- Understanding of what makes the code vulnerable
Decision Tree: Detection Strategy
START: Analyze the fix commit diff
│
├─ Does the fix ADD new code?
│ │
│ ├─ YES → Use "Missing Fix Detection"
│ │ Pattern: Check for ABSENCE of the new code
│ │ Example: Missing validation check, missing initialization
│ │
│ └─ NO → Continue to next question
│
├─ Does the fix CHANGE a value (constant, flag, permission)?
│ │
│ ├─ YES → Use "Unfixed Code Detection"
│ │ Pattern: Match the OLD (vulnerable) value
│ │ Example: return 0444 → return 0400
│ │
│ └─ NO → Continue to next question
│
├─ Does the fix REMOVE code?
│ │
│ ├─ YES → Use "Unfixed Code Detection"
│ │ Pattern: Detect presence of removed code
│ │ Example: Removed vulnerable driver
│ │
│ └─ NO → Continue to next question
│
└─ Does the fix REFACTOR or CHANGE logic?
│
└─ YES → Use "Unfixed Code Detection" or "Hybrid Approach"
Pattern: Match distinctive vulnerable pattern
May require multiple rules with dependencies
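The decision tree can also be written as a small Python helper, which makes the top-to-bottom order of the questions explicit. This is a sketch only: the function and enum names are hypothetical (not part of CVEhound), and the four yes/no answers are assumed to come from your own reading of the fix diff.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    MISSING_FIX = "Missing Fix Detection"
    UNFIXED_CODE = "Unfixed Code Detection"
    UNFIXED_OR_HYBRID = "Unfixed Code Detection or Hybrid Approach"


def choose_strategy(adds_code: bool, changes_value: bool,
                    removes_code: bool, changes_logic: bool) -> Optional[Strategy]:
    """Hypothetical helper: ask the questions in tree order; the first YES decides."""
    if adds_code:
        return Strategy.MISSING_FIX
    if changes_value:
        return Strategy.UNFIXED_CODE
    if removes_code:
        return Strategy.UNFIXED_CODE
    if changes_logic:
        return Strategy.UNFIXED_OR_HYBRID
    return None  # not covered by the tree: study the diff manually


# The amd_energy fix (return 0444 -> return 0400) only changes a value.
print(choose_strategy(adds_code=False, changes_value=True,
                      removes_code=False, changes_logic=False).value)
# prints "Unfixed Code Detection"
```

Because the first YES wins, a fix that both adds a check and changes a constant is treated as a missing-fix case, matching the order of the tree.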
Standard Rule Template
Every rule MUST follow this structure:
/// Files: <space-separated list of affected files>
/// Fix: <40-char git commit hash>
/// Fixes: <commit-hash> OR Detect-To: <commit-hash>
virtual detect
@err@
position p;
@@
<pattern>
@script:python depends on detect@
p << err.p;
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
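When you create many rules, you can generate the fixed parts of the template from code so the header, virtual detect, and the report script are never forgotten. The Python sketch below uses a hypothetical render_rule() helper that CVEhound does not ship. It emits the same layout as the template above and inserts the pattern verbatim.

```python
def render_rule(cve_id, files, fix, pattern, fixes=None, detect_to=None):
    """Hypothetical helper: build a rule in the standard template layout."""
    lines = [f"/// Files: {' '.join(files)}", f"/// Fix: {fix}"]
    # Fixes: and Detect-To: are alternatives; emit at most one of them.
    if fixes:
        lines.append(f"/// Fixes: {fixes}")
    elif detect_to:
        lines.append(f"/// Detect-To: {detect_to}")
    lines += [
        "",
        "virtual detect",
        "",
        "@err@",
        "position p;",
        "@@",
        "",
        pattern.strip("\n"),  # caller supplies the SmPL body, including @p
        "",
        "@script:python depends on detect@",
        "p << err.p;",
        "@@",
        "",
        f"coccilib.report.print_report(p[0], 'ERROR: {cve_id}')",
    ]
    return "\n".join(lines) + "\n"


print(render_rule(
    "CVE-2020-12912",
    ["drivers/hwmon/amd_energy.c"],
    "60268b0e8258fdea9a3c9f4b51e161c123571db3",
    "amd_energy_is_visible(...)\n{\n* return 0444;@p\n}\n",
))
```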
Required Metadata Fields
Files: (REQUIRED)
/// Files: drivers/net/wireless/ath/ath9k/htc_drv_main.c
/// Files: net/bluetooth/a2mp.c net/bluetooth/mgmt.c
- Use relative paths from kernel root
- Space-separated for multiple files
- Must match actual file paths in affected kernel versions
Fix: (REQUIRED)
/// Fix: 1b5e2ed9cf6d86a4a0c563bf5c31f48e6d7e53fc
- 40-character full commit hash
- The commit that fixed the vulnerability
- Get from CVEhound's internal database (generated by update_metadata.py)
Fixes: or Detect-To: (OPTIONAL)
/// Fixes: 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77
/// Detect-To: 6b44d9b8d96b37f72ccd7335b32f386a67b7f1f4
- Fixes: Commit that introduced the vulnerability (if explicitly known)
- Detect-To: Used when the fix's commit message does not explicitly name the commit that introduced the vulnerability, or when that commit can only be guessed. The rule is expected to detect the vulnerability in every version from this commit up to the fix.
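Before running a rule, you can sanity-check its header. The following Python sketch is hypothetical (CVEhound's own metadata handling may differ). It reads the /// lines into a dict, requires Files: and Fix:, and insists that every commit reference is a full 40-character lowercase hex hash.

```python
import re

HASH_RE = re.compile(r"[0-9a-f]{40}")
META_RE = re.compile(r"///\s*([A-Za-z-]+):\s*(.*)$")


def parse_metadata(rule_text):
    """Hypothetical helper: return the /// header fields of a .cocci rule."""
    meta = {}
    for line in rule_text.splitlines():
        m = META_RE.match(line)
        if m:
            meta[m.group(1)] = m.group(2).strip()
    if "Files" not in meta or "Fix" not in meta:
        raise ValueError("Files: and Fix: are required")
    # Fix: is mandatory; Fixes: and Detect-To: are optional but must be full hashes.
    for key in ("Fix", "Fixes", "Detect-To"):
        if key in meta and not HASH_RE.fullmatch(meta[key]):
            raise ValueError(f"{key}: must be a full 40-character commit hash")
    meta["Files"] = meta["Files"].split()  # space-separated list of paths
    return meta
```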
Pattern Construction Rules
Rule 1: Always Use Position Markers
@err@
position p; // REQUIRED: Declare position variable
@@
* vulnerable_code@p(...); // REQUIRED: Mark with @p
Rule 2: Mark Vulnerable Lines with Asterisk (Optional)
* return 0444;@p // With asterisk for debugging
return 0444;@p // Without asterisk also works
Note: Asterisks (*) are optional. They only highlight matched lines while you debug a pattern and are not needed for reporting.
Rule 3: Use Appropriate Metavariables
identifier func; // For function/variable names (unknown)
symbol kfree; // For specific known names (matched literally)
expression E; // For any expression
statement S; // For any statement
type T; // For any type
position p; // For location tracking (required)
Rule 4: Context Scoping
Always provide enough context to avoid false positives:
// BAD: Too generic
@err@
position p;
@@
* return -1;@p
// GOOD: Specific context
@err@
position p;
@@
vulnerable_function(...)
{
...
* return -1;@p
}
Rule 5: Use Ellipsis Correctly
func(...) // Match any function arguments
{
... // Match 0 or more statements
code();
... // More statements
}
Rule 6: Apply Constraints with when
@err@
identifier var;
position p;
@@
func(...)
{
struct foo var;
... when != memset(&var, 0, sizeof(var)); // Must NOT have init
when != var = ...; // Must NOT be assigned
* use_var(&var)@p;
}
Common Vulnerability Pattern Templates
Template 1: Uninitialized Variable
Use when: Fix adds memset() or initialization
@err@
identifier var;
position p;
@@
FUNCTION_NAME(...)
{
struct STRUCT_TYPE var;
... when != memset(&var, 0, sizeof(var));
* USAGE_FUNCTION(..., &var, ...)@p;
}
Template 2: Missing NULL/Bounds Check
Use when: Fix adds validation check
@err@
identifier var;
position p;
@@
FUNCTION_NAME(...)
{
... when != if (var >= MAX_VALUE) ...
when != if (var < 0 || var >= MAX_VALUE) ...
* USAGE_CONTEXT[var]@p;
}
Template 3: Incorrect Return Value
Use when: Fix changes a returned constant
@err@
position p;
@@
FUNCTION_NAME(...)
{
...
* return OLD_VALUE;@p
}
Template 4: Missing Function Call
Use when: Fix adds a required function call
@has_call@
@@
FUNCTION_NAME(...)
{
...
REQUIRED_CALL(...);
...
}
@err depends on !has_call@
position p;
@@
* FUNCTION_NAME@p(...)
{
...
}
Template 5: Use-After-Free
Use when: Fix stops a pointer from being used after it is freed
@err exists@
identifier x, fld;
position p1, p2;
@@
* kfree@p1(x);
... when != x = ...
* x->fld@p2
Template 6: Information Leak
Use when: Fix adds memset before copy_to_user
@err@
identifier var;
position p;
@@
FUNCTION_NAME(...)
{
struct STRUCT_TYPE var;
... when != memset(&var, 0, sizeof(var));
* copy_to_user(..., &var, ...)@p;
}
Template 7: Incorrect Permission
Use when: Fix changes permission bits
@err@
position p;
@@
sysfs_function(...)
{
...
* return VULNERABLE_MODE;@p
}
Template 8: Race Condition (Missing Lock)
Use when: Fix adds locking around a critical section
@locked exists@
@@
FUNCTION_NAME(...)
{
...
spin_lock(...);
...
spin_unlock(...);
...
}
@err depends on !locked exists@
position p;
@@
FUNCTION_NAME(...)
{
...
* shared_variable@p = ...;
...
}
Systematic Development Process
Step 1: Analyze the Fix Commit
# Get the diff showing the vulnerability fix
git show <fix_commit_hash>
# Understand what changed
# - Lines with - are old (vulnerable) code
# - Lines with + are new (fixed) code
Step 2: Identify Key Pattern
Extract the minimal distinguishing pattern:
Example Analysis:
// From git diff
- return 0444;
+ return 0400;
Key Pattern: return 0444; in a specific function
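Pulling the - and + lines out of git show output is a mechanical first step toward the key pattern. The Python sketch below uses a hypothetical split_diff() helper. It separates removed lines from added lines and skips the file headers that git prints. It assumes the default unified diff format and makes no attempt to pair removed lines with added ones.

```python
# Unified-diff file headers, which must not be mistaken for code lines.
HEADER_PREFIXES = ("--- a/", "+++ b/", "--- /dev/null", "+++ /dev/null")


def split_diff(diff_text):
    """Hypothetical helper: return (removed, added) code lines, stripped of whitespace."""
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith(HEADER_PREFIXES):
            continue
        if line.startswith("-"):
            removed.append(line[1:].strip())  # old (vulnerable) code
        elif line.startswith("+"):
            added.append(line[1:].strip())  # new (fixed) code
    return removed, added
```

Removed lines with no counterpart among the added lines point toward unfixed code detection. Added lines with no counterpart point toward missing fix detection.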
Step 3: Write Initial Rule
Start with the basic template and fill in:
/// Files: drivers/hwmon/amd_energy.c
/// Fix: 60268b0e8258fdea9a3c9f4b51e161c123571db3
virtual detect
@err@
position p;
@@
amd_energy_is_visible(...)
{
* return 0444;@p
}
@script:python depends on detect@
p << err.p;
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-2020-12912')
Step 4: Validate the Rule
# Test on vulnerable version (should detect)
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
--cocci-file CVE-YYYY-NNNNN.cocci \
<affected_file>
# Test on fixed version (should NOT detect)
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
--cocci-file CVE-YYYY-NNNNN.cocci \
<affected_file>
Step 5: Refine for Accuracy
If false positives occur:
- Add more context (function name, surrounding code)
- Add when constraints
- Use rule dependencies
If false negatives occur:
- Simplify the pattern
- Use the exists qualifier (for example @err exists@)
- Add alternative patterns with \(alt1\|alt2\)
Advanced Pattern Techniques
Technique 1: Function Alternatives
When the same bug exists in multiple related functions:
@err@
position p;
@@
\(function1\|function2\|function3\)(...)
{
...
* vulnerable_pattern@p;
...
}
Technique 2: Rule Dependencies
When detection requires multiple conditions:
@prerequisite@
@@
feature_function(...)
{
...
}
@err depends on prerequisite@
position p;
@@
* vulnerable_code@p(...);
Technique 3: Multiple Detection Points
When vulnerability appears in multiple locations:
@err@
position p;
@@
(
* vulnerable_call1@p(...);
|
* vulnerable_call2@p(...);
|
* vulnerable_call3@p(...);
)
@script:python depends on detect@
p << err.p;
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
Technique 4: Repetitive Patterns
When the same pattern repeats across functions (see CVE-2020-12352):
@err_func1 exists@
identifier req;
position p;
@@
func1(...)
{
...
struct type req;
... when != memset(&req, 0, sizeof(req));
* send(..., &req)@p;
...
}
@err_func2 exists@
identifier rsp;
position p;
@@
func2(...)
{
...
struct type rsp;
... when != memset(&rsp, 0, sizeof(rsp));
* send(..., &rsp)@p;
...
}
// Repeat for each affected function
@script:python depends on detect@
p << err_func1.p;
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
@script:python depends on detect@
p << err_func2.p;
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
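Writing one match rule and one script rule per affected function by hand is error-prone. The hypothetical generator below emits rules in the same layout as above: all match rules first, then one script rule for each. The function names, variable names, struct type, and call name are placeholders you supply, just as in the template.

```python
def repeated_rules(cve_id, targets, struct_type="type", call="send"):
    """Hypothetical generator. targets: (function_name, variable_name) pairs."""
    match_rules, script_rules = [], []
    for i, (func, var) in enumerate(targets, start=1):
        name = f"err_func{i}"
        match_rules.append(
            f"@{name} exists@\n"
            f"identifier {var};\n"
            "position p;\n"
            "@@\n"
            f"{func}(...)\n"
            "{\n"
            "...\n"
            f"struct {struct_type} {var};\n"
            f"... when != memset(&{var}, 0, sizeof({var}));\n"
            f"* {call}(..., &{var})@p;\n"
            "...\n"
            "}\n"
        )
        # Each script rule must name the match rule whose position it reads.
        script_rules.append(
            "@script:python depends on detect@\n"
            f"p << {name}.p;\n"
            "@@\n"
            f"coccilib.report.print_report(p[0], 'ERROR: {cve_id}')\n"
        )
    return "\n".join(match_rules + script_rules)


print(repeated_rules("CVE-YYYY-NNNNN", [("func1", "req"), ("func2", "rsp")]))
```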
Testing Protocol
Test 1: Positive Detection
# Rule MUST detect on vulnerable code
cd /path/to/kernel
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
--very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
<affected_file>
# Expected: Should print "ERROR: CVE-YYYY-NNNNN"
Test 2: Negative Detection (Fixed Code)
# Rule MUST NOT detect on fixed code
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
--very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
<affected_file>
# Expected: No output
Test 3: False Positive Check
# Test on unrelated kernel files
spatch --no-includes --include-headers -D detect \
--cocci-file CVE-YYYY-NNNNN.cocci \
<unrelated_file>
# Expected: No output
Test 4: CVEhound Integration
cvehound --kernel /path/to/kernel --cve CVE-YYYY-NNNNN
# Expected: Should find CVE on vulnerable versions only
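Tests 1 to 3 reduce to a simple check on captured spatch output. The Python sketch below assumes you have already captured stdout from the spatch runs as strings; how you run spatch is up to you. evaluate() is a hypothetical name.

```python
import re


def evaluate(cve_id, vulnerable_out, fixed_out, unrelated_out=""):
    """Hypothetical helper: return the failed tests (empty list means Tests 1-3 passed)."""
    # \b stops CVE-2020-1291 from matching inside CVE-2020-12912.
    marker = re.compile(r"ERROR: " + re.escape(cve_id) + r"\b")
    failures = []
    if not marker.search(vulnerable_out):
        failures.append("Test 1: no report on vulnerable code")
    if marker.search(fixed_out):
        failures.append("Test 2: report on fixed code")
    if marker.search(unrelated_out):
        failures.append("Test 3: report on unrelated file")
    return failures
```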
Error Prevention Checklist
Before finalizing, verify:
- File naming: CVE-YYYY-NNNNN.cocci (exact format)
- All required metadata fields present (Files, Fix)
- Position variable declared: position p;
- Vulnerable lines marked with @p (the * marker is optional)
- Python script references the correct position: p << err.p;
- CVE ID correct in the error message
- Rule tested on vulnerable code (detects)
- Rule tested on fixed code (does not detect)
- No syntax errors: spatch --parse-cocci CVE-YYYY-NNNNN.cocci
- Proper context to avoid false positives
- Follows patterns from similar CVEs in the repository
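Several checklist items are mechanical, so you can check them before running spatch at all. The Python sketch below is a hypothetical lint_rule() helper, not part of CVEhound. It uses simple line-based regular expressions, so it only checks the shape of a rule: file name, header fields, virtual detect, a position declaration, script references to declared rules, and the CVE ID in the report message. It does not replace spatch --parse-cocci or the detection tests.

```python
import re

# A rule header such as @err@, @err exists@ or @err depends on !x@; captures the name.
RULE_HEADER = re.compile(r"^@(\w+)\b[^@\n]*@\s*$", re.M)


def lint_rule(filename, text):
    """Hypothetical helper: return checklist violations (empty list means none found)."""
    problems = []
    name = re.fullmatch(r"(CVE-\d{4}-\d{4,7})\.cocci", filename)
    if name is None:
        problems.append("file name must be CVE-YYYY-NNNNN.cocci")
    if not re.search(r"^/// Files: \S", text, re.M):
        problems.append("missing /// Files: line")
    if not re.search(r"^/// Fix: [0-9a-f]{40}\s*$", text, re.M):
        problems.append("missing /// Fix: line with a 40-character hash")
    if not re.search(r"^virtual detect\s*$", text, re.M):
        problems.append("missing 'virtual detect'")
    if not re.search(r"^\s*position\s+\w+", text, re.M):
        problems.append("no position metavariable declared")
    declared = set(RULE_HEADER.findall(text))
    for ref in re.findall(r"<<\s*(\w+)\.\w+", text):
        if ref not in declared:
            problems.append(f"script references unknown rule '{ref}'")
    if name is not None:
        for cve in re.findall(r"print_report\(.*'ERROR: ([^']+)'", text):
            if cve != name.group(1):
                problems.append(f"report message names {cve}, expected {name.group(1)}")
    return problems
```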
Common Mistakes to Avoid
Mistake 1: Missing Position Marker
// WRONG
@err@
@@
vulnerable_code();
// CORRECT
@err@
position p;
@@
vulnerable_code@p();
Note: The asterisk (*) is optional for detection.
Mistake 2: Too Generic Pattern
// WRONG: Will match everywhere
@err@
position p;
@@
* return -1;@p
// CORRECT: Specific function context
@err@
position p;
@@
specific_function(...)
{
...
* return -1;@p
}
Mistake 3: Redundant Position Markers
// AVOID: two positions on one statement; the report script reads only one
* vulnerable_code@p1()@p2;
// PREFER: one position per reported statement
* vulnerable_code@p();
Mistake 4: Incorrect Python Script
// WRONG: Wrong variable name
@script:python depends on detect@
p << wrong_rule.p;
@@
// CORRECT: Must match rule name
@err@
position p;
@@
* code@p;
@script:python depends on detect@
p << err.p; // Matches @err@ rule name
@@
coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
Mistake 5: Forgetting virtual detect
// WRONG: Missing virtual declaration
/// Files: foo.c
/// Fix: abc123
@err@
// CORRECT: Always include
/// Files: foo.c
/// Fix: abc123
virtual detect
@err@
File Naming Convention
Format: CVE-YYYY-NNNNN.cocci
- YYYY: 4-digit year
- NNNNN: 4-7 digit CVE number
- Extension: always .cocci
- No prefixes or suffixes
Examples:
- CVE-2020-12912.cocci ✓
- CVE-2016-5195.cocci ✓
- cve-2020-12912.cocci ✗ (wrong case)
- CVE-2020-12912.patch ✗ (wrong extension)
- rule_CVE-2020-12912.cocci ✗ (has prefix)
Quality Metrics
A high-quality rule should:
- Accuracy: 100% detection on vulnerable code, 0% on fixed code
- Specificity: Minimal false positives on unrelated code
- Simplicity: As simple as possible while remaining accurate
- Robustness: Works across different kernel versions
- Documentation: Clear metadata explaining the vulnerability
- Performance: Executes quickly (avoid complex nested patterns)
Repository Integration
File Placement
cvehound/cve/CVE-YYYY-NNNNN.cocci
For Disputed CVEs
cvehound/cve/disputed/CVE-YYYY-NNNNN.cocci
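Naming and placement can be combined in one small helper so a rule never ends up with a malformed name or in the wrong directory. The Python sketch below uses a hypothetical rule_path() name. It assumes paths relative to the repository root and the 4 to 7 digit CVE numbers described under File Naming Convention.

```python
import re
from pathlib import PurePosixPath

CVE_ID = re.compile(r"CVE-\d{4}-\d{4,7}")


def rule_path(cve_id, disputed=False):
    """Hypothetical helper: repository-relative path for a CVE's rule file."""
    if not CVE_ID.fullmatch(cve_id):
        raise ValueError(f"not a CVE ID in CVE-YYYY-NNNNN form: {cve_id!r}")
    base = PurePosixPath("cvehound", "cve")
    if disputed:
        base = base / "disputed"  # disputed CVEs live in their own subdirectory
    return str(base / f"{cve_id}.cocci")
```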
Testing Integration
Add a test case to tests/test_03_on_fix.py:
import pytest

@pytest.mark.parametrize("cve,kernel,commit", [
    ("CVE-YYYY-NNNNN", "torvalds", "fix_commit_short"),
])
def test_cve_on_fix(cve, kernel, commit):
    # Automatically tests that the CVE is NOT detected on the fixed commit
    pass
Execution Model
CVEhound executes rules using:
# --no-includes          Don't process #include directives
# --include-headers      But do process header files
# -D detect              Enable "detect" virtual mode
# --no-show-diff         Don't show diffs
# --very-quiet           Minimal output
# --cocci-file <rule>    Your rule file
# -I <kernel_includes>   Kernel include paths
# <target_file>          File to check
spatch \
  --no-includes \
  --include-headers \
  -D detect \
  --no-show-diff \
  --very-quiet \
  --cocci-file <rule> \
  -I <kernel_includes> \
  <target_file>
Your rule's output goes to stdout and is parsed by CVEhound.
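If you drive spatch from Python rather than from a shell script, passing the arguments as a list avoids quoting and line-continuation problems entirely. The sketch below mirrors the flags listed above. spatch_argv() and run_rule() are hypothetical helper names, and the include directories are whatever your kernel tree needs.

```python
import subprocess


def spatch_argv(rule, target, include_dirs=()):
    """Hypothetical helper: argument list equivalent to CVEhound's spatch call."""
    argv = ["spatch", "--no-includes", "--include-headers", "-D", "detect",
            "--no-show-diff", "--very-quiet", "--cocci-file", rule]
    for d in include_dirs:
        argv += ["-I", d]
    argv.append(target)
    return argv


def run_rule(rule, target, include_dirs=()):
    """Run spatch (must be on PATH) and return its stdout, where reports appear."""
    result = subprocess.run(spatch_argv(rule, target, include_dirs),
                            capture_output=True, text=True, check=False)
    return result.stdout
```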
Learning from Examples
When creating a new rule, find similar CVEs:
Find Similar Vulnerability Types
cd cvehound/cve
grep -l "memset" *.cocci # Initialization bugs
grep -l "copy_to_user" *.cocci # Information leaks
grep -l "when != if" *.cocci # Missing checks
grep -l "return 0" *.cocci # Return value bugs
Study Complex Rules
# Large comprehensive rules
wc -l *.cocci | sort -n | tail -5
# Rules with dependencies
grep -l "depends on" *.cocci
# Rules with alternatives
grep -l "\\\\(" *.cocci
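If you prefer to script the search, the Python sketch below reproduces grep -l with a fixed string (not a regular expression) over a rules directory. rules_containing() is a hypothetical helper name. Because the needle is matched literally, the alternatives search needs no extra escaping: it becomes rules_containing("cvehound/cve", "\\(") in Python string syntax.

```python
from pathlib import Path


def rules_containing(rules_dir, needle):
    """Hypothetical helper: sorted names of *.cocci files whose text contains needle."""
    return sorted(
        path.name
        for path in Path(rules_dir).glob("*.cocci")
        if needle in path.read_text(encoding="utf-8", errors="replace")
    )
```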
Decision Matrix: Choosing Metavariables
| Need to Match | Use This | Example |
|---|---|---|
| Unknown function name | identifier func | identifier func; |
| Specific function | symbol func_name | symbol kfree; |
| Any expression | expression E | expression E; |
| Any statement | statement S | statement S; |
| Any type | type T | type T; |
| Location for reporting | position p | position p; |
| Specific constant | literal | 0444, -EINVAL |
| Any integer | constant C | constant C; |
Pattern Complexity Guidelines
Simple Pattern (Preferred)
- Single rule
- Direct pattern matching
- Minimal context
- Fast execution
@err@
position p;
@@
function(...)
{
* return 0444;@p
}
Medium Pattern
- 2-3 rules with simple dependencies
- Some constraints
- Moderate context
@has_feature@
@@
init_function(...)
@err depends on has_feature@
position p;
@@
* usage@p(...);
Complex Pattern (Use When Necessary)
- Multiple rules with dependencies
- Complex constraints
- Alternative patterns
- Multiple detection points
Only use complex patterns when:
- The vulnerability requires checking multiple conditions
- Simple patterns produce too many false positives
- The CVE affects many similar functions (like CVE-2020-12352)
Output Format
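Your rule should produce report lines of this form:
file.c:123:5-11: ERROR: CVE-YYYY-NNNNN
The coccilib.report.print_report() function handles this automatically. It prints the file name, the line, and the column range of the matched position, followed by your message.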
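To collect results from many files, parse the report lines instead of searching for substrings. The Python sketch below is hypothetical and is not CVEhound's parser. It accepts either file:line: message or a :start-end column range after the line number, because the exact prefix depends on the coccilib version in use. Any line that is not a report is ignored.

```python
import re

REPORT = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+)(?::\d+-\d+)?: "
    r"ERROR: (?P<cve>CVE-\d{4}-\d{4,7})\s*$"
)


def parse_reports(stdout):
    """Hypothetical helper: list of (file, line, cve) tuples found in spatch output."""
    hits = []
    for line in stdout.splitlines():
        m = REPORT.match(line)
        if m:
            hits.append((m.group("file"), int(m.group("line")), m.group("cve")))
    return hits
```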
Version Compatibility
CVEhound supports Coccinelle >= 1.0.7. If your rule requires newer features:
/// Files: foo.c
/// Fix: abc123
/// Version: 1.0.8
virtual detect
...
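A runner can skip rules that need a newer Coccinelle than the one installed. The Python sketch below assumes the /// Version: field holds a purely numeric dotted version such as 1.0.8. Suffixed versions like 1.1.1-rc are not handled. rule_supported() is a hypothetical name, and rules without the field fall back to the documented minimum of 1.0.7.

```python
import re


def parse_version(text):
    """'1.0.8' -> (1, 0, 8); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def rule_supported(rule_text, installed, minimum="1.0.7"):
    """Hypothetical helper: does the installed Coccinelle satisfy /// Version:?"""
    m = re.search(r"^/// Version:\s*([\d.]+)\s*$", rule_text, re.M)
    required = m.group(1) if m else minimum
    # Tuple comparison orders versions component by component.
    return parse_version(installed) >= parse_version(required)
```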
Summary: Quick Start Workflow
- Gather info: CVE ID, fix commit hash, affected files
- Analyze diff: git show <fix_commit>
- Choose strategy: missing fix vs. unfixed code detection
- Pick template: Use the appropriate vulnerability pattern template
- Write rule: Fill in the template with specific patterns
- Test positive: Verify detection on vulnerable code
- Test negative: Verify no detection on fixed code
- Refine: Adjust for false positives/negatives
- Document: Ensure all metadata is complete
- Integrate: Place the rule in the cvehound/cve/ directory
Resources
- Human-oriented guide: docs/WRITING_RULES.md
- Quick reference: docs/COCCINELLE_CHEATSHEET.md
- Enhanced template: contrib/template.cocci
- Minimal template: contrib/blank.cocci
- Example rules: cvehound/cve/*.cocci
- Coccinelle docs: https://coccinelle.gitlabpages.inria.fr/website/docs/
Agent Optimization Note: This guide is designed for systematic execution. Follow the decision trees and templates exactly for consistent results. When in doubt, study similar CVEs in the repository and replicate their approach.