AI Agent Guide: Writing Coccinelle CVE Detection Rules

November 18, 2025

This guide gives AI agents a systematic procedure for writing accurate CVE detection rules for the CVEhound project. Follow this structured approach for consistent, high-quality results.

Mission

Create a Coccinelle semantic patch (.cocci file) that detects a specific Linux kernel CVE by matching vulnerable code patterns or identifying missing security fixes.

Prerequisites Checklist

Before writing a rule, gather this information:

CVE ID (format: CVE-YYYY-NNNNN)
Fix commit hash from Linux kernel git repository
Commit that introduced the bug (optional but helpful)
Complete diff of the fix commit (git show <hash>)
Affected file paths (relative to kernel root)
Understanding of what makes the code vulnerable
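
The checklist above can be captured as a small record that rejects obviously malformed inputs before any rule is written. This is only a sketch: `RulePrerequisites` and its field names are hypothetical and not part of CVEhound, and the 40-character hash requirement is borrowed from the Fix: metadata field described later in this guide.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

CVE_ID_RE = re.compile(r"CVE-\d{4}-\d{4,7}")
FULL_HASH_RE = re.compile(r"[0-9a-f]{40}")


@dataclass
class RulePrerequisites:
    """Hypothetical container for the checklist items (not a CVEhound type)."""
    cve_id: str
    fix_commit: str
    files: List[str]
    introduced_by: Optional[str] = None  # optional but helpful

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the inputs look sane."""
        found = []
        if not CVE_ID_RE.fullmatch(self.cve_id):
            found.append("CVE ID must look like CVE-YYYY-NNNNN")
        if not FULL_HASH_RE.fullmatch(self.fix_commit):
            found.append("fix commit must be a full 40-character hash")
        if self.introduced_by and not FULL_HASH_RE.fullmatch(self.introduced_by):
            found.append("introducing commit must be a full 40-character hash")
        if not self.files:
            found.append("at least one affected file is required")
        for path in self.files:
            if path.startswith("/"):
                found.append(f"{path}: path must be relative to the kernel root")
        return found
```

Calling `problems()` before starting work gives an early, explicit list of what is still missing; the diff itself and the understanding of the bug cannot be checked mechanically.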

Decision Tree: Detection Strategy

START: Analyze the fix commit diff
│
├─ Does the fix ADD new code?
│  │
│  ├─ YES → Use "Missing Fix Detection"
│  │        Pattern: Check for ABSENCE of the new code
│  │        Example: Missing validation check, missing initialization
│  │
│  └─ NO → Continue to next question
│
├─ Does the fix CHANGE a value (constant, flag, permission)?
│  │
│  ├─ YES → Use "Unfixed Code Detection"
│  │        Pattern: Match the OLD (vulnerable) value
│  │        Example: return 0444 → return 0400
│  │
│  └─ NO → Continue to next question
│
├─ Does the fix REMOVE code?
│  │
│  ├─ YES → Use "Unfixed Code Detection"
│  │        Pattern: Detect presence of removed code
│  │        Example: Removed vulnerable driver
│  │
│  └─ NO → Continue to next question
│
└─ Does the fix REFACTOR or CHANGE logic?
   │
   └─ YES → Use "Unfixed Code Detection" or "Hybrid Approach"
            Pattern: Match distinctive vulnerable pattern
            May require multiple rules with dependencies
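
As a rough first pass, the top of the decision tree can be approximated by counting added and removed lines in the fix diff. The function below is a heuristic sketch with a hypothetical name (`suggest_strategy`); it cannot tell a value change from a refactoring, so its answer is a starting point for the manual analysis, not a replacement for it.

```python
def suggest_strategy(diff_text: str) -> str:
    """Guess a detection strategy from a unified diff (heuristic only)."""
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++ ", "--- ")):
            continue  # file headers, not code changes
        if line.startswith("+") and line[1:].strip():
            added += 1
        elif line.startswith("-") and line[1:].strip():
            removed += 1
    if added and not removed:
        # The fix only adds code: look for the ABSENCE of that code.
        return "missing fix detection"
    if removed and not added:
        # The fix only removes code: look for the PRESENCE of that code.
        return "unfixed code detection"
    if added and removed:
        # Value change or refactoring: match the old pattern, possibly with
        # several dependent rules.
        return "unfixed code detection or hybrid approach"
    return "no code change found"
```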

Standard Rule Template

Every rule MUST follow this structure:

/// Files: <space-separated list of affected files>
/// Fix: <40-char git commit hash>
/// Fixes: <commit-hash> OR Detect-To: <commit-hash>

virtual detect

@err@
position p;
@@

<pattern>

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
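
To avoid typos in the fixed parts of the template, the skeleton can be rendered from a few inputs. The helper below is a sketch; `render_rule` and its parameters are hypothetical, and treating Fixes: and Detect-To: as mutually exclusive is an assumption based on the "OR" in the template above.

```python
RULE_TEMPLATE = """\
/// Files: {files}
/// Fix: {fix}
{origin}
virtual detect

@err@
position p;
@@

{pattern}

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: {cve}')
"""


def render_rule(cve, files, fix, pattern, fixes=None, detect_to=None):
    """Fill the standard template; `pattern` is the SmPL body of the @err@ rule."""
    if fixes and detect_to:
        raise ValueError("use either Fixes or Detect-To, not both")
    origin = ""
    if fixes:
        origin = f"/// Fixes: {fixes}\n"
    elif detect_to:
        origin = f"/// Detect-To: {detect_to}\n"
    return RULE_TEMPLATE.format(
        files=" ".join(files),
        fix=fix,
        origin=origin,
        pattern=pattern.strip("\n"),
        cve=cve,
    )
```

The pattern is passed in as a value rather than pasted into the template string, so braces in the SmPL code do not interfere with `str.format`.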

Required Metadata Fields

Files: (REQUIRED)

/// Files: drivers/net/wireless/ath/ath9k/htc_drv_main.c
/// Files: net/bluetooth/a2mp.c net/bluetooth/mgmt.c

Use relative paths from kernel root
Space-separated for multiple files
Must match actual file paths in affected kernel versions

Fix: (REQUIRED)

/// Fix: 1b5e2ed9cf6d86a4a0c563bf5c31f48e6d7e53fc

40-character full commit hash
The commit that fixed the vulnerability
Get from CVEhound's internal database (generated by update_metadata.py)

Fixes: or Detect-To: (OPTIONAL)

/// Fixes: 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77
/// Detect-To: 6b44d9b8d96b37f72ccd7335b32f386a67b7f1f4

Fixes: Commit that introduced the vulnerability (if explicitly known)
Detect-To: Use when the fix commit message does not name the commit that introduced the vulnerability, or when the introducing commit can only be guessed. The rule is assumed to detect the vulnerability up to this commit.
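
The metadata rules above are easy to check mechanically. The following sketch parses the `///` header lines and reports missing required fields; it is an illustration with hypothetical function names, not CVEhound's own parser.

```python
import re
from typing import Dict, List

META_RE = re.compile(r"^///\s*([A-Za-z-]+):\s*(.*)$")


def parse_metadata(rule_text: str) -> Dict[str, List[str]]:
    """Collect '/// Key: value' lines; Files values are split on whitespace."""
    meta: Dict[str, List[str]] = {}
    for line in rule_text.splitlines():
        m = META_RE.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        values = value.split() if key == "Files" else [value]
        meta.setdefault(key, []).extend(values)
    return meta


def metadata_problems(meta: Dict[str, List[str]]) -> List[str]:
    """Check the two REQUIRED fields described above."""
    problems = []
    if not meta.get("Files"):
        problems.append("missing Files")
    fix = meta.get("Fix", [])
    if len(fix) != 1 or not re.fullmatch(r"[0-9a-f]{40}", fix[0]):
        problems.append("Fix must be one 40-character hash")
    return problems
```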

Pattern Construction Rules

Rule 1: Always Use Position Markers

@err@
position p;        // REQUIRED: Declare position variable
@@

* vulnerable_code@p(...);  // REQUIRED: Mark with @p

Rule 2: Mark Vulnerable Lines with Asterisk (Optional)

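* return 0444;@p         // With asterisk: the matched line is highlighted in spatch's diff output
  return 0444;@p         // Without asterisk: still matches and binds p

Note: Asterisks (*) are optional. Reporting relies on the position variable, and CVEhound runs spatch with --no-show-diff, so the asterisk only helps when debugging a pattern by hand.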

Rule 3: Use Appropriate Metavariables

identifier func;     // For function/variable names (unknown)
symbol kfree;        // For specific known names (faster)
expression E;        // For any expression
statement S;         // For any statement
type T;              // For any type
position p;          // For location tracking (required)

Rule 4: Context Scoping

Always provide enough context to avoid false positives:

// BAD: Too generic
@err@
position p;
@@

* return -1;@p

// GOOD: Specific context
@err@
position p;
@@

vulnerable_function(...)
{
    ...
*   return -1;@p
}

Rule 5: Use Ellipsis Correctly

func(...)              // Match any function arguments
{
    ...                // Match 0 or more statements
    code();
    ...                // More statements
}

Rule 6: Apply Constraints with `when`

@err@
identifier var;
position p;
@@

func(...)
{
    struct foo var;
    ... when != memset(&var, 0, sizeof(var));    // Must NOT have init
        when != var = ...;                        // Must NOT be assigned
*   use_var(&var)@p;
}

Common Vulnerability Pattern Templates

Template 1: Uninitialized Variable

Use when: Fix adds memset() or initialization

@err@
identifier var;
position p;
@@

FUNCTION_NAME(...)
{
    struct STRUCT_TYPE var;
    ... when != memset(&var, 0, sizeof(var));
*   USAGE_FUNCTION(..., &var, ...)@p;
}

Template 2: Missing NULL/Bounds Check

Use when: Fix adds validation check

@err@
identifier var;
position p;
@@

FUNCTION_NAME(...)
{
    ... when != if (var >= MAX_VALUE) ...
        when != if (var < 0 || var >= MAX_VALUE) ...
*   USAGE_CONTEXT[var]@p;
}

Template 3: Incorrect Return Value

Use when: Fix changes a returned constant

@err@
position p;
@@

FUNCTION_NAME(...)
{
    ...                  // Allow statements before the return
*   return OLD_VALUE;@p
}

Template 4: Missing Function Call

Use when: Fix adds a required function call

@has_call@
@@

FUNCTION_NAME(...)
{
    ...
    REQUIRED_CALL(...);
    ...
}

@err depends on !has_call@
position p;
@@

* FUNCTION_NAME@p(...)
{
    ...
}

Template 5: Use-After-Free

Use when: Fix removes, reorders, or guards an access that follows a free

@err@
identifier x;
position p1, p2;
@@

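* kfree@p1(x);
  ... when != x = ...
* x->field@p2
// Two positions are bound here, so the script rule must import one of them explicitly, e.g. p << err.p2;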

Template 6: Information Leak

Use when: Fix adds memset before copy_to_user

@err@
identifier var;
position p;
@@

FUNCTION_NAME(...)
{
    struct STRUCT_TYPE var;
    ... when != memset(&var, 0, sizeof(var));
*   copy_to_user(..., &var, ...)@p;
}

Template 7: Incorrect Permission

Use when: Fix changes permission bits

@err@
position p;
@@

sysfs_function(...)
{
    ...                  // Allow statements before the return
*   return VULNERABLE_MODE;@p
}

Template 8: Race Condition (Missing Lock)

Use when: Fix adds locking around critical section

@locked@
@@

FUNCTION_NAME(...)
{
    ...
    spin_lock(...);
    ...
    spin_unlock(...);
    ...
}

@err depends on !locked@
position p;
@@

FUNCTION_NAME(...)
{
    ...
*   shared_variable@p = ...;
    ...
}

Systematic Development Process

Step 1: Analyze the Fix Commit

# Get the diff showing the vulnerability fix
git show <fix_commit_hash>

# Understand what changed
# - Lines with - are old (vulnerable) code
# - Lines with + are new (fixed) code

Step 2: Identify Key Pattern

Extract the minimal distinguishing pattern:

Example Analysis:

// From git diff
-    return 0444;
+    return 0400;

Key Pattern: return 0444; in specific function
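
Extracting candidate patterns from a diff can be partly automated. The sketch below lists removed lines that do not reappear among the added lines, which is often where the distinctive vulnerable code sits. The function names are hypothetical, and the result still needs a human (or agent) judgment about which line is truly distinctive.

```python
from typing import List, Tuple


def changed_lines(diff_text: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) code lines from a unified diff, whitespace-stripped."""
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            continue  # file headers
        if line.startswith("-"):
            removed.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    # Drop blank lines, which carry no pattern information.
    return [l for l in removed if l], [l for l in added if l]


def candidate_patterns(diff_text: str) -> List[str]:
    """Removed lines that the fix did not simply move elsewhere."""
    removed, added = changed_lines(diff_text)
    return [l for l in removed if l not in added]
```

For the example diff above this yields `return 0444;`, which then has to be anchored in its enclosing function to stay specific.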

Step 3: Write Initial Rule

Start with the basic template and fill in:

/// Files: drivers/hwmon/amd_energy.c
/// Fix: 60268b0e8258fdea9a3c9f4b51e161c123571db3

virtual detect

@err@
position p;
@@

amd_energy_is_visible(...)
{
*   return 0444;@p
}

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-2020-12912')

Step 4: Validate the Rule

# Test on vulnerable version (should detect)
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Test on fixed version (should NOT detect)
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

Step 5: Refine for Accuracy

If false positives occur:

Add more context (function name, surrounding code)
Add when constraints
Use rule dependencies

If false negatives occur:

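Simplify the pattern (drop context that varies between kernel versions)
Add the exists keyword to the rule header (for example @err exists@) so that one matching control-flow path is enough
Add alternative patterns with \(alt1\|alt2\)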

Advanced Pattern Techniques

Technique 1: Function Alternatives

When the same bug exists in multiple related functions:

@err@
position p;
@@

\(function1\|function2\|function3\)(...)
{
*   vulnerable_pattern@p;
}

Technique 2: Rule Dependencies

When detection requires multiple conditions:

@prerequisite@
@@

feature_function(...)
{
    ...
}

@err depends on prerequisite@
position p;
@@

* vulnerable_code@p(...);

Technique 3: Multiple Detection Points

When vulnerability appears in multiple locations:

@err@
position p;
@@

(
* vulnerable_call1@p(...);
|
* vulnerable_call2@p(...);
|
* vulnerable_call3@p(...);
)

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

Technique 4: Repetitive Patterns

When the same pattern repeats across functions (see CVE-2020-12352):

@err_func1 exists@
identifier req;
position p;
@@

func1(...)
{
    ...
    struct type req;
    ... when != memset(&req, 0, sizeof(req));
*   send(..., &req)@p;
    ...
}

@err_func2 exists@
identifier rsp;
position p;
@@

func2(...)
{
    ...
    struct type rsp;
    ... when != memset(&rsp, 0, sizeof(rsp));
*   send(..., &rsp)@p;
    ...
}

// Repeat for each affected function

@script:python depends on detect@
p << err_func1.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

@script:python depends on detect@
p << err_func2.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
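
Because every block in this technique has the same shape, the rule text can be generated from a list of functions instead of being copied by hand. The generator below is a sketch under stated assumptions: `generate_rules` is a hypothetical helper, and the function, struct, and sink names passed to it are placeholders exactly like `func1`, `type`, and `send` above.

```python
RULE_BLOCK = """\
@err_{name} exists@
identifier var;
position p;
@@

{name}(...)
{{
    ...
    struct {struct} var;
    ... when != memset(&var, 0, sizeof(var));
*   {sink}(..., &var)@p;
    ...
}}
"""

SCRIPT_BLOCK = """\
@script:python depends on detect@
p << err_{name}.p;
@@

coccilib.report.print_report(p[0], 'ERROR: {cve}')
"""


def generate_rules(cve, targets, sink):
    """Build one match rule and one script rule per (function, struct) pair."""
    rules = [RULE_BLOCK.format(name=f, struct=s, sink=sink) for f, s in targets]
    scripts = [SCRIPT_BLOCK.format(name=f, cve=cve) for f, _ in targets]
    # Match rules first, script rules after, separated by blank lines.
    return "\n".join(rules + scripts)
```

Generating the blocks keeps each script rule's binding (`err_<function>.p`) in sync with its match rule, which is the most common copy-and-paste error in long rules.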

Testing Protocol

Test 1: Positive Detection

# Rule MUST detect on vulnerable code
cd /path/to/kernel
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
    --very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Expected: Should print "ERROR: CVE-YYYY-NNNNN"

Test 2: Negative Detection (Fixed Code)

# Rule MUST NOT detect on fixed code
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
    --very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Expected: No output

Test 3: False Positive Check

# Test on unrelated kernel files
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <unrelated_file>

# Expected: No output

Test 4: CVEhound Integration

cvehound --kernel /path/to/kernel --cve CVE-YYYY-NNNNN

# Expected: Should find CVE on vulnerable versions only
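
Tests 1 and 2 reduce to two checks on captured spatch output. A minimal sketch, assuming the output of each run has already been collected as a string (the function name `evaluate_runs` is hypothetical):

```python
from typing import List


def evaluate_runs(cve: str, vulnerable_out: str, fixed_out: str) -> List[str]:
    """Return the failed checks; an empty list means both tests passed."""
    marker = f"ERROR: {cve}"
    failures = []
    if marker not in vulnerable_out:
        failures.append("Test 1 failed: no report on vulnerable code")
    if marker in fixed_out:
        failures.append("Test 2 failed: report on fixed code")
    return failures
```

The same check on output from an unrelated file covers Test 3: pass that output as `fixed_out` and expect no failure for the second check.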

Common Mistakes to Avoid

Mistake 1: Missing Position Marker

// WRONG
@err@
@@

vulnerable_code();

// CORRECT
@err@
position p;
@@

vulnerable_code@p();

Note: The asterisk (*) is optional for detection.

Mistake 2: Too Generic Pattern

// WRONG: Will match everywhere
@err@
position p;
@@

* return -1;@p

// CORRECT: Specific function context
@err@
position p;
@@

specific_function(...)
{
    ...
*   return -1;@p
}

Mistake 3: Wrong Position Syntax

// WRONG: Multiple positions on same line
* vulnerable_code@p1()@p2;

// CORRECT: One position per statement
* vulnerable_code@p();

Mistake 4: Incorrect Python Script

// WRONG: Wrong variable name
@script:python depends on detect@
p << wrong_rule.p;
@@

// CORRECT: Must match rule name
@err@
position p;
@@

* code@p;

@script:python depends on detect@
p << err.p;    // Matches @err@ rule name
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

Mistake 5: Forgetting `virtual detect`

// WRONG: Missing virtual declaration
/// Files: foo.c
/// Fix: abc123

@err@

// CORRECT: Always include
/// Files: foo.c
/// Fix: abc123

virtual detect

@err@
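
Several of the mistakes above can be caught with a text-level lint before spatch is ever run. The checker below is a rough sketch with a hypothetical name (`lint_rule`); it uses regular expressions rather than a real SmPL parser, so it can miss problems and should be treated as a pre-check only.

```python
import re
from typing import List

# Rule headers such as "@err@", "@err exists@", "@err depends on !has_call@".
RULE_HEADER_RE = re.compile(r"^@[ \t]*([A-Za-z_]\w*)\b[^@\n]*@[ \t]*$", re.MULTILINE)
# Script bindings such as "p << err.p;".
SCRIPT_BIND_RE = re.compile(r"^[ \t]*\w+[ \t]*<<[ \t]*([A-Za-z_]\w*)\.(\w+)[ \t]*;", re.MULTILINE)


def lint_rule(text: str) -> List[str]:
    """Flag a missing 'virtual detect', a missing position, and bad script bindings."""
    problems = []
    if not re.search(r"^virtual[ \t]+detect[ \t]*$", text, re.MULTILINE):
        problems.append("missing 'virtual detect'")
    if not re.search(r"^[ \t]*position[ \t]+\w+", text, re.MULTILINE):
        problems.append("no position metavariable declared")
    rule_names = set(RULE_HEADER_RE.findall(text))
    for rule, _var in SCRIPT_BIND_RE.findall(text):
        if rule not in rule_names:
            problems.append(f"script binds to unknown rule '{rule}'")
    return problems
```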

File Naming Convention

Format: CVE-YYYY-NNNNN.cocci

YYYY: 4-digit year
NNNNN: 4-7 digit CVE number
Extension: Always .cocci
No prefixes or suffixes

Examples:

CVE-2020-12912.cocci ✓
CVE-2016-5195.cocci ✓
cve-2020-12912.cocci ✗ (wrong case)
CVE-2020-12912.patch ✗ (wrong extension)
rule_CVE-2020-12912.cocci ✗ (has prefix)
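
The naming convention maps directly onto one regular expression. A small sketch (the helper name is hypothetical) that accepts and rejects the examples above:

```python
import re

# CVE- prefix in upper case, 4-digit year, 4-7 digit number, .cocci extension.
RULE_FILENAME_RE = re.compile(r"CVE-\d{4}-\d{4,7}\.cocci")


def is_valid_rule_filename(name: str) -> bool:
    """True if the whole file name follows the CVE-YYYY-NNNNN.cocci convention."""
    return RULE_FILENAME_RE.fullmatch(name) is not None
```

`fullmatch` is used so that prefixes and suffixes are rejected without extra anchors.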

Quality Metrics

A high-quality rule should:

Accuracy: 100% detection on vulnerable code, 0% on fixed code
Specificity: Minimal false positives on unrelated code
Simplicity: As simple as possible while remaining accurate
Robustness: Works across different kernel versions
Documentation: Clear metadata explaining the vulnerability
Performance: Executes quickly (avoid complex nested patterns)

Repository Integration

File Placement

cvehound/cve/CVE-YYYY-NNNNN.cocci

For Disputed CVEs

cvehound/cve/disputed/CVE-YYYY-NNNNN.cocci

Testing Integration

Add test case to tests/test_03_on_fix.py:

@pytest.mark.parametrize("cve,kernel,commit", [
    ("CVE-YYYY-NNNNN", "torvalds", "fix_commit_short"),
])
def test_cve_on_fix(cve, kernel, commit):
    # Automatically tests that CVE is NOT detected on fixed commit
    pass

Execution Model

CVEhound executes rules using:

spatch \
    --no-includes \
    --include-headers \
    -D detect \
    --no-show-diff \
    --very-quiet \
    --cocci-file <rule> \
    -I <kernel_includes> \
    <target_file>

--no-includes: do not process #include directives
--include-headers: but do apply the rule to header files
-D detect: enable the "detect" virtual mode
--no-show-diff: do not show diffs
--very-quiet: minimal output
--cocci-file <rule>: your rule file
-I <kernel_includes>: kernel include paths
<target_file>: file to check

Your rule's output goes to stdout and is parsed by CVEhound.
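
When reproducing this invocation from a script, building the argument list explicitly avoids shell quoting and line-continuation problems. The sketch below only mirrors the flags listed above; `build_spatch_argv` is a hypothetical helper and is not taken from CVEhound's source.

```python
from typing import List, Sequence


def build_spatch_argv(rule: str, target: str, include_dirs: Sequence[str] = ()) -> List[str]:
    """Assemble the spatch command line as a list suitable for subprocess.run()."""
    argv = [
        "spatch",
        "--no-includes",
        "--include-headers",
        "-D", "detect",
        "--no-show-diff",
        "--very-quiet",
        "--cocci-file", rule,
    ]
    for directory in include_dirs:
        argv += ["-I", directory]
    argv.append(target)
    return argv
```

The list can be passed to `subprocess.run(argv, capture_output=True, text=True)`, and the `stdout` attribute of the result then holds the report lines.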

Learning from Examples

When creating a new rule, find similar CVEs:

Find Similar Vulnerability Types

cd cvehound/cve
grep -l "memset" *.cocci          # Initialization bugs
grep -l "copy_to_user" *.cocci    # Information leaks
grep -l "when != if" *.cocci      # Missing checks
grep -l "return 0" *.cocci        # Return value bugs

Study Complex Rules

# Large comprehensive rules
wc -l *.cocci | sort -n | tail -5

# Rules with dependencies
grep -l "depends on" *.cocci

# Rules with alternatives
grep -l "\\\\(" *.cocci

Decision Matrix: Choosing Metavariables

Need to Match	Use This	Example
Unknown function name	`identifier func`	`identifier func;`
Specific function	`symbol func_name`	`symbol kfree;`
Any expression	`expression E`	`expression E;`
Any statement	`statement S`	`statement S;`
Any type	`type T`	`type T;`
Location for reporting	`position p`	`position p;`
Specific constant	literal	`0444`, `-EINVAL`
Any constant (e.g. an integer literal)	`constant C`	`constant C;`

Pattern Complexity Guidelines

Simple Pattern (Preferred)

Single rule
Direct pattern matching
Minimal context
Fast execution

@err@
position p;
@@

function(...)
{
*   return 0444;@p
}

Medium Pattern

2-3 rules with simple dependencies
Some constraints
Moderate context

@has_feature@
@@

init_function(...)

@err depends on has_feature@
position p;
@@

* usage@p(...);

Complex Pattern (Use When Necessary)

Multiple rules with dependencies
Complex constraints
Alternative patterns
Multiple detection points

Only use complex patterns when:

The vulnerability requires checking multiple conditions
Simple patterns produce too many false positives
The CVE affects many similar functions (like CVE-2020-12352)

Output Format

Your rule should produce one report line per match, in this format (file, line, column range, message):

file.c:123:4-16: ERROR: CVE-YYYY-NNNNN

The coccilib.report.print_report() function handles this automatically.
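
For scripts that consume this output, a tolerant parser is safer than string splitting. The sketch below accepts the report line with or without a column range after the line number; the `Finding` type and `parse_report_line` name are hypothetical, and this is not the parser CVEhound itself uses.

```python
import re
from typing import NamedTuple, Optional

REPORT_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+)(?::\d+-\d+)?: ERROR: (?P<cve>CVE-\d{4}-\d{4,7})$"
)


class Finding(NamedTuple):
    file: str
    line: int
    cve: str


def parse_report_line(text: str) -> Optional[Finding]:
    """Parse one report line; return None for anything that is not a CVE report."""
    m = REPORT_RE.match(text.strip())
    if m is None:
        return None
    return Finding(m.group("file"), int(m.group("line")), m.group("cve"))
```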

Version Compatibility

CVEhound supports Coccinelle >= 1.0.7. If your rule requires newer features:

/// Files: foo.c
/// Fix: abc123
/// Version: 1.0.8

virtual detect
...
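
A version requirement like this can be checked by comparing dotted version numbers as integer tuples. The sketch below is illustrative: the function names are hypothetical, the default of 1.0.7 comes from the minimum stated above, and how the installed version string is obtained is left to the caller.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from a string as a tuple of ints."""
    m = re.search(r"\d+(?:\.\d+)+", text)
    if m is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def required_version(rule_text: str, default: str = "1.0.7") -> Tuple[int, ...]:
    """Read '/// Version: X.Y.Z' from a rule, falling back to the stated minimum."""
    m = re.search(r"^///\s*Version:\s*(\S+)", rule_text, re.MULTILINE)
    return parse_version(m.group(1) if m else default)


def rule_is_supported(rule_text: str, installed: str) -> bool:
    """True if the installed Coccinelle version satisfies the rule's requirement."""
    return parse_version(installed) >= required_version(rule_text)
```

Comparing tuples rather than strings matters once a component reaches two digits: as strings, "1.0.10" sorts before "1.0.7".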

Summary: Quick Start Workflow

Gather info: CVE ID, fix commit hash, affected files
Analyze diff: git show <fix_commit>
Choose strategy: Missing fix vs unfixed code detection
Pick template: Use appropriate vulnerability pattern template
Write rule: Fill in template with specific patterns
Test positive: Verify detection on vulnerable code
Test negative: Verify no detection on fixed code
Refine: Adjust for false positives/negatives
Document: Ensure all metadata is complete
Integrate: Place in cvehound/cve/ directory

Resources

Human-oriented guide: docs/WRITING_RULES.md
Quick reference: docs/COCCINELLE_CHEATSHEET.md
Enhanced template: contrib/template.cocci
Minimal template: contrib/blank.cocci
Example rules: cvehound/cve/*.cocci
Coccinelle docs: https://coccinelle.gitlabpages.inria.fr/website/docs/

Agent Optimization Note: This guide is designed for systematic execution. Follow the decision trees and templates exactly for consistent results. When in doubt, study similar CVEs in the repository and replicate their approach.