AI Agent Guide: Writing Coccinelle CVE Detection Rules

November 18, 2025

This guide is specifically designed for AI agents to systematically create accurate CVE detection rules for the CVEhound project. Follow this structured approach for consistent, high-quality results.

Mission

Create a Coccinelle semantic patch (.cocci file) that detects a specific Linux kernel CVE by matching vulnerable code patterns or identifying missing security fixes.

Prerequisites Checklist

Before writing a rule, gather this information:

  • CVE ID (format: CVE-YYYY-NNNNN)
  • Fix commit hash from Linux kernel git repository
  • Commit that introduced the bug (optional but helpful)
  • Complete diff of the fix commit (git show <fix_hash>)
  • Affected file paths (relative to kernel root)
  • Understanding of what makes the code vulnerable
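Kernel fix commits usually name the bug-introducing commit in a "Fixes:" trailer, normally with an abbreviated (12+ character) hash. The sketch below pulls those hashes out of git show output. fixes_tags() is a hypothetical helper, not part of CVEhound. Each short hash still has to be expanded with git rev-parse <hash> before it goes into a 40-character metadata field.

```python
import re

# Matches kernel trailer lines such as:
#     Fixes: 0123456789ab ("subsys: hypothetical subject line")
# `git show` indents the commit message, so leading blanks are allowed.
FIXES_RE = re.compile(r"^[ \t]*Fixes:[ \t]*([0-9a-f]{7,40})\b", re.MULTILINE)


def fixes_tags(commit_text):
    """Return the (usually abbreviated) hashes named in Fixes: trailers."""
    return FIXES_RE.findall(commit_text)


if __name__ == "__main__":
    sample = (
        "commit 1111111111111111111111111111111111111111\n"
        "\n"
        "    subsys: fix out-of-bounds read\n"
        "\n"
        '    Fixes: 0123456789ab ("subsys: hypothetical subject line")\n'
        "    Signed-off-by: Jane Doe <jane@example.org>\n"
    )
    print(fixes_tags(sample))  # ['0123456789ab']
```

A commit with no trailer returns an empty list. In that case, the Detect-To: field described below is the fallback.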

Decision Tree: Detection Strategy

START: Analyze the fix commit diff

├─ Does the fix ADD new code?
│  │
│  ├─ YES → Use "Missing Fix Detection"
│  │        Pattern: Check for ABSENCE of the new code
│  │        Example: Missing validation check, missing initialization
│  │
│  └─ NO → Continue to next question

├─ Does the fix CHANGE a value (constant, flag, permission)?
│  │
│  ├─ YES → Use "Unfixed Code Detection"
│  │        Pattern: Match the OLD (vulnerable) value
│  │        Example: return 0444 → return 0400
│  │
│  └─ NO → Continue to next question

├─ Does the fix REMOVE code?
│  │
│  ├─ YES → Use "Unfixed Code Detection"
│  │        Pattern: Detect presence of removed code
│  │        Example: Removed vulnerable driver
│  │
│  └─ NO → Continue to next question

└─ Does the fix REFACTOR or CHANGE logic?

   └─ YES → Use "Unfixed Code Detection" or "Hybrid Approach"
            Pattern: Match distinctive vulnerable pattern
            May require multiple rules with dependencies
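A rough first pass through the tree can be automated by looking only at the +/- lines of the fix diff. This is a heuristic sketch. classify_fix() is hypothetical, and its labels are simply the strategy names above. It cannot tell a real refactor from a large value change, so treat its answer as a starting point.

```python
def _one_token_changed(old, new):
    """True when two lines differ in exactly one whitespace-separated token."""
    a, b = old.split(), new.split()
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def classify_fix(diff_text):
    """Rough first guess at a detection strategy from a fix commit's diff."""
    added, removed = [], []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):   # file headers, not code
            continue
        if line.startswith("+"):
            added.append(line[1:].strip())
        elif line.startswith("-"):
            removed.append(line[1:].strip())

    if not added and not removed:
        return "No code change found"
    if added and not removed:
        return "Missing Fix Detection"         # fix only ADDS code
    if removed and not added:
        return "Unfixed Code Detection"        # fix only REMOVES code
    if len(added) == len(removed) and all(
        _one_token_changed(r, a) for r, a in zip(removed, added)
    ):
        return "Unfixed Code Detection"        # fix CHANGES a value in place
    return "Unfixed Code Detection or Hybrid Approach"  # refactor / logic change


if __name__ == "__main__":
    diff = "@@ -1 +1 @@\n-\treturn 0444;\n+\treturn 0400;\n"
    print(classify_fix(diff))  # Unfixed Code Detection
```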

Standard Rule Template

Every rule MUST follow this structure:

/// Files: <space-separated list of affected files>
/// Fix: <40-char git commit hash>
/// Fixes: <commit-hash> OR Detect-To: <commit-hash>

virtual detect

@err@
position p;
@@

<pattern>

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')
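When generating many rules, filling this template from code keeps the boilerplate identical across files. The following is a minimal sketch built around a hypothetical render_rule() helper that is not part of CVEhound. The SmPL pattern is passed in verbatim, and only the metadata and the CVE ID are substituted.

```python
BODY = """virtual detect

@err@
position p;
@@

{pattern}

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: {cve}')
"""


def render_rule(cve, files, fix, pattern, fixes=None, detect_to=None):
    """Fill the standard template; Fixes:/Detect-To: are optional alternatives."""
    header = [f"/// Files: {' '.join(files)}", f"/// Fix: {fix}"]
    if fixes:
        header.append(f"/// Fixes: {fixes}")
    elif detect_to:
        header.append(f"/// Detect-To: {detect_to}")
    # str.format only parses BODY, so braces inside the SmPL pattern are safe.
    return "\n".join(header) + "\n\n" + BODY.format(pattern=pattern.strip(), cve=cve)


if __name__ == "__main__":
    print(render_rule(
        "CVE-2020-12912",
        ["drivers/hwmon/amd_energy.c"],
        "60268b0e8258fdea9a3c9f4b51e161c123571db3",
        "amd_energy_is_visible(...)\n{\n*   return 0444;@p\n}",
    ))
```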

Required Metadata Fields

Files: (REQUIRED)

/// Files: drivers/net/wireless/ath/ath9k/htc_drv_main.c
/// Files: net/bluetooth/a2mp.c net/bluetooth/mgmt.c
  • Use relative paths from kernel root
  • Space-separated for multiple files
  • Must match actual file paths in affected kernel versions

Fix: (REQUIRED)

/// Fix: 1b5e2ed9cf6d86a4a0c563bf5c31f48e6d7e53fc
  • 40-character full commit hash
  • The commit that fixed the vulnerability
  • Get from CVEhound's internal database (generated by update_metadata.py)

Fixes: or Detect-To: (OPTIONAL)

/// Fixes: 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77
/// Detect-To: 6b44d9b8d96b37f72ccd7335b32f386a67b7f1f4
  • Fixes: Commit that introduced the vulnerability (if explicitly known)
  • Detect-To: Used when the fix commit's message does not explicitly name the commit that introduced the bug, or when that commit can only be guessed. The rule is assumed to detect the vulnerability up to this commit.
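These field rules are easy to check mechanically. The sketch below uses hypothetical helper names. It also assumes that Fixes: and Detect-To: take full 40-character hashes, like Fix:, as the examples above do.

```python
import re

HEX40 = re.compile(r"[0-9a-f]{40}")
META_LINE = re.compile(r"///\s*([A-Za-z-]+):\s*(.*\S)")


def parse_metadata(rule_text):
    """Collect '/// Key: value' header lines into a dict."""
    meta = {}
    for line in rule_text.splitlines():
        m = META_LINE.match(line)
        if m:
            meta[m.group(1)] = m.group(2)
    return meta


def metadata_errors(meta):
    """List problems with the Files:, Fix:, Fixes: and Detect-To: fields."""
    errors = []
    if not meta.get("Files"):
        errors.append("Files: is required")
    if not HEX40.fullmatch(meta.get("Fix", "")):
        errors.append("Fix: must be a full 40-character commit hash")
    for key in ("Fixes", "Detect-To"):
        if key in meta and not HEX40.fullmatch(meta[key]):
            errors.append(f"{key}: must be a full 40-character commit hash")
    return errors


if __name__ == "__main__":
    header = ("/// Files: net/bluetooth/a2mp.c net/bluetooth/mgmt.c\n"
              "/// Fix: abc123\n")
    print(metadata_errors(parse_metadata(header)))
    # ['Fix: must be a full 40-character commit hash']
```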

Pattern Construction Rules

Rule 1: Always Use Position Markers

@err@
position p;        // REQUIRED: Declare position variable
@@

* vulnerable_code@p(...);  // REQUIRED: Mark with @p

Rule 2: Mark Vulnerable Lines with Asterisk (Optional)

* return 0444;@p         // With asterisk for debugging
  return 0444;@p         // Without asterisk also works

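Note: Asterisks (*) are optional. They only highlight the matched lines in Coccinelle's context-mode output, which is handy when debugging a pattern. The CVEhound report itself comes from the position variable and the python script.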

Rule 3: Use Appropriate Metavariables

identifier func;     // For function/variable names (unknown)
symbol kfree;        // For a specific known name, matched literally
expression E;        // For any expression
statement S;         // For any statement
type T;              // For any type
position p;          // For location tracking (required)

Rule 4: Context Scoping

Always provide enough context to avoid false positives:

// BAD: Too generic
@err@
position p;
@@

* return -1;@p

// GOOD: Specific context
@err@
position p;
@@

vulnerable_function(...)
{
    ...
*   return -1;@p
}

Rule 5: Use Ellipsis Correctly

func(...)              // Match any function arguments
{
    ...                // Match 0 or more statements
    code();
    ...                // More statements
}

Rule 6: Apply Constraints with when

@err exists@
identifier var;
expression E;
position p;
@@

func(...)
{
    ...
    struct foo var;
    ... when != memset(&var, 0, sizeof(var));    // Must NOT have init
        when != var = E;                          // Must NOT be assigned
*   use_var@p(&var);
    ...
}

Common Vulnerability Pattern Templates

Template 1: Uninitialized Variable

Use when: Fix adds memset() or initialization

@err exists@
identifier var;
position p;
@@

FUNCTION_NAME(...)
{
    ...
    struct STRUCT_TYPE var;
    ... when != memset(&var, 0, sizeof(var));
*   USAGE_FUNCTION@p(..., &var, ...);
    ...
}

Template 2: Missing NULL/Bounds Check

Use when: Fix adds validation check

@err exists@
identifier var;
statement S;
position p;
@@

FUNCTION_NAME(...)
{
    ... when != if (var >= MAX_VALUE) S
        when != if (var < 0 || var >= MAX_VALUE) S
*   USAGE_CONTEXT[var]@p;
    ...
}

Template 3: Incorrect Return Value

Use when: Fix changes a returned constant

@err exists@
position p;
@@

FUNCTION_NAME(...)
{
    ...
*   return OLD_VALUE;@p
}

Template 4: Missing Function Call

Use when: Fix adds a required function call

@has_call exists@
@@

FUNCTION_NAME(...)
{
    ...
    REQUIRED_CALL(...);
    ...
}

@err depends on !has_call@
position p;
@@

* FUNCTION_NAME@p(...)
{
    ...
}

Template 5: Use-After-Free

Use when: Fix moves a kfree() later or stops the freed pointer from being used afterwards

@err exists@
identifier x, field;
expression E;
position p1, p2;
@@

* kfree@p1(x);
  ... when != x = E
* x->field@p2

Template 6: Information Leak

Use when: Fix adds memset before copy_to_user

@err exists@
identifier var;
position p;
@@

FUNCTION_NAME(...)
{
    ...
    struct STRUCT_TYPE var;
    ... when != memset(&var, 0, sizeof(var));
*   copy_to_user@p(..., &var, ...);
    ...
}

Template 7: Incorrect Permission

Use when: Fix changes permission bits

@err@
position p;
@@

sysfs_function(...)
{
*   return VULNERABLE_MODE;@p
}

Template 8: Race Condition (Missing Lock)

Use when: Fix adds locking around critical section

@locked exists@
@@

FUNCTION_NAME(...)
{
    ...
    spin_lock(...);
    ...
    spin_unlock(...);
    ...
}

@err depends on !locked exists@
expression E;
position p;
@@

FUNCTION_NAME(...)
{
    ...
*   shared_variable@p = E;
    ...
}

Systematic Development Process

Step 1: Analyze the Vulnerable Commit

# Get the diff showing the vulnerability fix
git show <fix_commit_hash>

# Understand what changed
# - Lines with - are old (vulnerable) code
# - Lines with + are new (fixed) code

Step 2: Identify Key Pattern

Extract the minimal distinguishing pattern:

Example Analysis:

// From git diff
-    return 0444;
+    return 0400;

Key Pattern: return 0444; in specific function

Step 3: Write Initial Rule

Start with the basic template and fill in:

/// Files: drivers/hwmon/amd_energy.c
/// Fix: 60268b0e8258fdea9a3c9f4b51e161c123571db3

virtual detect

@err@
position p;
@@

amd_energy_is_visible(...)
{
*   return 0444;@p
}

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-2020-12912')

Step 4: Validate the Rule

# Test on vulnerable version (should detect)
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Test on fixed version (should NOT detect)
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

Step 5: Refine for Accuracy

If false positives occur:

  • Add more context (function name, surrounding code)
  • Add when constraints
  • Use rule dependencies

If false negatives occur:

  • Simplify pattern
  • Add exists to the rule header (@err exists@) so that a match along one control-flow path is enough
  • Add alternative patterns with \(alt1\|alt2\)

Advanced Pattern Techniques

Technique 1: Function Alternatives

When the same bug exists in multiple related functions:

@err@
position p;
@@

\(function1\|function2\|function3\)(...)
{
*   vulnerable_pattern@p;
}

Technique 2: Rule Dependencies

When detection requires multiple conditions:

@prerequisite@
@@

feature_function(...)
{
    ...
}

@err depends on prerequisite@
position p;
@@

* vulnerable_code@p(...);

Technique 3: Multiple Detection Points

When vulnerability appears in multiple locations:

@err@
position p;
@@

(
* vulnerable_call1@p(...);
|
* vulnerable_call2@p(...);
|
* vulnerable_call3@p(...);
)

@script:python depends on detect@
p << err.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

Technique 4: Repetitive Patterns

When the same pattern repeats across functions (see CVE-2020-12352):

@err_func1 exists@
identifier req;
position p;
@@

func1(...)
{
    ...
    struct type req;
    ... when != memset(&req, 0, sizeof(req));
*   send(..., &req)@p;
    ...
}

@err_func2 exists@
identifier rsp;
position p;
@@

func2(...)
{
    ...
    struct type rsp;
    ... when != memset(&rsp, 0, sizeof(rsp));
*   send(..., &rsp)@p;
    ...
}

// Repeat for each affected function

@script:python depends on detect@
p << err_func1.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

@script:python depends on detect@
p << err_func2.p;
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

Testing Protocol

Test 1: Positive Detection

# Rule MUST detect on vulnerable code
cd /path/to/kernel
git checkout <commit_before_fix>
spatch --no-includes --include-headers -D detect \
    --very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Expected: Should print "ERROR: CVE-YYYY-NNNNN"

Test 2: Negative Detection (Fixed Code)

# Rule MUST NOT detect on fixed code
git checkout <fix_commit>
spatch --no-includes --include-headers -D detect \
    --very-quiet --cocci-file CVE-YYYY-NNNNN.cocci \
    <affected_file>

# Expected: No output

Test 3: False Positive Check

# Test on unrelated kernel files
spatch --no-includes --include-headers -D detect \
    --cocci-file CVE-YYYY-NNNNN.cocci \
    <unrelated_file>

# Expected: No output

Test 4: CVEhound Integration

cvehound --kernel /path/to/kernel --cve CVE-YYYY-NNNNN

# Expected: Should find CVE on vulnerable versions only
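Once the stdout of Tests 1-3 has been captured (for example, redirected to files), the pass/fail logic is mechanical. The sketch below uses a hypothetical protocol_failures() helper. It only looks for the "ERROR: <CVE-ID>" message text, so it does not depend on the exact prefix of each report line. The negative lookahead keeps CVE-2020-1291 from matching a report for CVE-2020-12912.

```python
import re


def _reported(output, cve):
    """True if `output` contains 'ERROR: <cve>' not followed by more digits."""
    return re.search(r"ERROR: " + re.escape(cve) + r"(?!\d)", output) is not None


def protocol_failures(cve, vulnerable_out, fixed_out, unrelated_out=""):
    """Return the failed checks among Tests 1-3, given each run's stdout."""
    failures = []
    if not _reported(vulnerable_out, cve):
        failures.append("Test 1: not detected on vulnerable code")
    if _reported(fixed_out, cve):
        failures.append("Test 2: detected on fixed code")
    if _reported(unrelated_out, cve):
        failures.append("Test 3: detected on an unrelated file")
    return failures


if __name__ == "__main__":
    hit = "file.c:123: ERROR: CVE-2020-12912\n"
    print(protocol_failures("CVE-2020-12912", hit, ""))  # []
```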

Error Prevention Checklist

Before finalizing, verify:

  • File naming: CVE-YYYY-NNNNN.cocci (exact format)
  • All required metadata fields present (Files, Fix)
  • Position variable declared: position p;
  • Reported location marked with @p (the * marker is optional)
  • Python script references correct position: p << err.p;
  • CVE ID correct in error message
  • Rule tested on vulnerable code (detects)
  • Rule tested on fixed code (does not detect)
  • No syntax errors: spatch --parse-cocci CVE-YYYY-NNNNN.cocci
  • Proper context to avoid false positives
  • Follows patterns from similar CVEs in repository
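Several checklist items can be automated before spatch runs at all. The following is a rough, regex-based sketch. lint_rule() is hypothetical and does not parse SmPL. It checks four things:

  • virtual detect is present
  • a position variable is declared
  • every script input names an existing rule
  • the report message carries the expected CVE ID

The spatch --parse-cocci check and the two test runs still have to be done for real.

```python
import re

# A rule header such as @err@, @err exists@ or @err depends on !has_call@.
# Script headers (@script:python ...@) do not match because of the ':'.
RULE_HEADER = re.compile(r"^@([A-Za-z_]\w*)(?:\s[^@]*)?@[ \t]*$", re.MULTILINE)
# A script input such as: p << err.p;
SCRIPT_INPUT = re.compile(r"^\s*\w+\s*<<\s*(\w+)\.\w+\s*;", re.MULTILINE)
REPORTED_CVE = re.compile(r"""['"]ERROR: (CVE-\d{4}-\d+)['"]""")


def lint_rule(text, cve):
    """Return checklist problems found in a .cocci rule's text."""
    problems = []
    if not re.search(r"^virtual\s+detect\b", text, re.MULTILINE):
        problems.append("missing 'virtual detect'")
    if not re.search(r"^\s*position\s+\w+", text, re.MULTILINE):
        problems.append("no position variable declared")
    rules = set(RULE_HEADER.findall(text))
    for name in SCRIPT_INPUT.findall(text):
        if name not in rules:
            problems.append(f"script reads from unknown rule '{name}'")
    reported = set(REPORTED_CVE.findall(text))
    if reported != {cve}:
        problems.append(f"report names {sorted(reported)}, expected {cve}")
    return problems


if __name__ == "__main__":
    import os
    import sys

    for path in sys.argv[1:]:               # e.g. cvehound/cve/CVE-2020-12912.cocci
        cve = os.path.basename(path)[: -len(".cocci")]
        with open(path) as f:
            for problem in lint_rule(f.read(), cve):
                print(f"{path}: {problem}")
```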

Common Mistakes to Avoid

Mistake 1: Missing Position Marker

// WRONG
@err@
@@

vulnerable_code();

// CORRECT
@err@
position p;
@@

vulnerable_code@p();

Note: The asterisk (*) is optional for detection.

Mistake 2: Too Generic Pattern

// WRONG: Will match everywhere
@err@
position p;
@@

* return -1;@p

// CORRECT: Specific function context
@err@
position p;
@@

specific_function(...)
{
    ...
*   return -1;@p
}

Mistake 3: Wrong Position Syntax

// WRONG: Multiple positions on same line
* vulnerable_code@p1()@p2;

// CORRECT: One position per statement
* vulnerable_code@p();

Mistake 4: Incorrect Python Script

// WRONG: reads p from a rule name that does not exist
@script:python depends on detect@
p << wrong_rule.p;
@@

// CORRECT: the name before .p must match the rule that declares p
@err@
position p;
@@

* code@p;

@script:python depends on detect@
p << err.p;    // Matches @err@ rule name
@@

coccilib.report.print_report(p[0], 'ERROR: CVE-YYYY-NNNNN')

Mistake 5: Forgetting virtual detect

// WRONG: Missing virtual declaration
/// Files: foo.c
/// Fix: abc123

@err@

// CORRECT: Always include
/// Files: foo.c
/// Fix: abc123

virtual detect

@err@

File Naming Convention

Format: CVE-YYYY-NNNNN.cocci

  • YYYY: 4-digit year
  • NNNNN: 4-7 digit CVE number
  • Extension: Always .cocci
  • No prefixes or suffixes

Examples:

  • CVE-2020-12912.cocci ✓
  • CVE-2016-5195.cocci ✓
  • cve-2020-12912.cocci ✗ (wrong case)
  • CVE-2020-12912.patch ✗ (wrong extension)
  • rule_CVE-2020-12912.cocci ✗ (has prefix)
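The naming rule fits in a single regular expression. This is a sketch: is_valid_rule_filename() is a hypothetical helper, and the 4-7 digit range is taken from the list above.

```python
import re

# "CVE-", a 4-digit year, a 4-7 digit number, and ".cocci"; nothing else.
RULE_FILENAME = re.compile(r"CVE-[0-9]{4}-[0-9]{4,7}\.cocci")


def is_valid_rule_filename(name):
    """True only for names like CVE-2020-12912.cocci (case-sensitive)."""
    return RULE_FILENAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["CVE-2020-12912.cocci", "CVE-2016-5195.cocci",
                 "cve-2020-12912.cocci", "CVE-2020-12912.patch",
                 "rule_CVE-2020-12912.cocci"]:
        print(name, "ok" if is_valid_rule_filename(name) else "rejected")
```

fullmatch() anchors both ends, which is what rejects the prefixed and wrong-extension names.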

Quality Metrics

A high-quality rule should:

  1. Accuracy: 100% detection on vulnerable code, 0% on fixed code
  2. Specificity: Minimal false positives on unrelated code
  3. Simplicity: As simple as possible while remaining accurate
  4. Robustness: Works across different kernel versions
  5. Documentation: Clear metadata explaining the vulnerability
  6. Performance: Executes quickly (avoid complex nested patterns)

Repository Integration

File Placement

cvehound/cve/CVE-YYYY-NNNNN.cocci

For Disputed CVEs

cvehound/cve/disputed/CVE-YYYY-NNNNN.cocci

Testing Integration

Add test case to tests/test_03_on_fix.py:

@pytest.mark.parametrize("cve,kernel,commit", [
    ("CVE-YYYY-NNNNN", "torvalds", "fix_commit_short"),
])
def test_cve_on_fix(cve, kernel, commit):
    # Automatically tests that CVE is NOT detected on fixed commit
    pass

Execution Model

CVEhound executes rules using:

# --no-includes       don't process #include directives
# --include-headers   but do process header files
# -D detect           enable the "detect" virtual rule
# --no-show-diff      don't show diffs
# --very-quiet        minimal output
# --cocci-file        your rule file
# -I                  kernel include paths
# <target_file>       file to check
spatch --no-includes --include-headers -D detect \
    --no-show-diff --very-quiet \
    --cocci-file <rule> \
    -I <kernel_includes> \
    <target_file>

Your rule's output goes to stdout and is parsed by CVEhound.
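To drive the same command from a script, for example when looping over kernel checkouts during testing, it can be built as an argument list and run with subprocess. This is a sketch, not CVEhound's own code. spatch_argv() and run_rule() are hypothetical helpers, the flags are copied from the command above, and run_rule() needs spatch on PATH.

```python
import subprocess


def spatch_argv(rule, target, include_dirs=()):
    """Build the spatch command line shown above as an argument list."""
    argv = ["spatch", "--no-includes", "--include-headers", "-D", "detect",
            "--no-show-diff", "--very-quiet", "--cocci-file", rule]
    for d in include_dirs:
        argv += ["-I", d]
    argv.append(target)
    return argv


def run_rule(rule, target, include_dirs=()):
    """Run spatch and return its stdout (the report lines, if any)."""
    result = subprocess.run(spatch_argv(rule, target, include_dirs),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

Passing a list rather than a shell string avoids quoting problems with paths. check=False is used because the report is judged from stdout, not from the exit status.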

Learning from Examples

When creating a new rule, find similar CVEs:

Find Similar Vulnerability Types

cd cvehound/cve
grep -l "memset" *.cocci          # Initialization bugs
grep -l "copy_to_user" *.cocci    # Information leaks
grep -l "when != if" *.cocci      # Missing checks
grep -l "return 0" *.cocci        # Return value bugs

Study Complex Rules

# Large comprehensive rules
wc -l *.cocci | sort -n | tail -6    # the last line is wc's "total" row

# Rules with dependencies
grep -l "depends on" *.cocci

# Rules with alternatives
grep -l "\\\\(" *.cocci
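The same searches can be done from Python when shell access is awkward. The sketch below uses hypothetical helper names. rules_containing() behaves like grep -lF (a plain substring match, not a regex), and largest_rules() leaves out wc's "total" row.

```python
from pathlib import Path


def rules_containing(cve_dir, needle):
    """Like `grep -lF needle *.cocci`: sorted names of rules containing needle."""
    return sorted(
        p.name
        for p in Path(cve_dir).glob("*.cocci")
        if needle in p.read_text(errors="replace")
    )


def largest_rules(cve_dir, n=5):
    """Like `wc -l *.cocci | sort -n | tail`, minus the 'total' row."""
    sizes = sorted(
        (len(p.read_text(errors="replace").splitlines()), p.name)
        for p in Path(cve_dir).glob("*.cocci")
    )
    return [name for _, name in sizes[-n:]]


if __name__ == "__main__":
    print(rules_containing("cvehound/cve", "copy_to_user"))   # information leaks
    print(rules_containing("cvehound/cve", "depends on"))     # rules with dependencies
    print(largest_rules("cvehound/cve"))
```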

Decision Matrix: Choosing Metavariables

Need to match            Use this                  Example
----------------------   -----------------------   ----------------
Unknown function name    identifier func           identifier func;
Specific function        symbol func_name          symbol kfree;
Any expression           expression E              expression E;
Any statement            statement S               statement S;
Any type                 type T                    type T;
Location for reporting   position p                position p;
Specific constant        the literal itself        0444, -EINVAL
Any constant             constant C                constant C;

Pattern Complexity Guidelines

Simple Pattern (Preferred)

  • Single rule
  • Direct pattern matching
  • Minimal context
  • Fast execution
@err@
position p;
@@

function(...)
{
*   return 0444;@p
}

Medium Pattern

  • 2-3 rules with simple dependencies
  • Some constraints
  • Moderate context
@has_feature@
@@

init_function(...)

@err depends on has_feature@
position p;
@@

* usage@p(...);

Complex Pattern (Use When Necessary)

  • Multiple rules with dependencies
  • Complex constraints
  • Alternative patterns
  • Multiple detection points

Only use complex patterns when:

  • The vulnerability requires checking multiple conditions
  • Simple patterns produce too many false positives
  • The CVE affects many similar functions (like CVE-2020-12352)

Output Format

Your rule should produce report lines of this form, where 5-11 is the column range of the matched position:

file.c:123:5-11: ERROR: CVE-YYYY-NNNNN

The coccilib.report.print_report() function handles this automatically.
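When a script needs the individual reports rather than a yes/no answer, each line can be split into file, line number, and CVE ID. This is a sketch with a hypothetical parse_reports() helper. It accepts an optional column range after the line number and ignores any other output.

```python
import re

REPORT = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+)(?::(?P<col>\d+)-(?P<col_end>\d+))?:\s+"
    r"ERROR: (?P<cve>CVE-\d{4}-\d+)\s*$"
)


def parse_reports(stdout):
    """Return (file, line, cve) tuples for every report line in spatch output."""
    hits = []
    for raw in stdout.splitlines():
        m = REPORT.match(raw)
        if m:
            hits.append((m["file"], int(m["line"]), m["cve"]))
    return hits
```

Lines that do not look like reports, such as warnings, are skipped rather than treated as errors.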

Version Compatibility

CVEhound supports Coccinelle >= 1.0.7. If your rule needs features from a newer release, declare the minimum version in the header:

/// Files: foo.c
/// Fix: abc123
/// Version: 1.0.8

virtual detect
...
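A script that runs many rules can skip the ones the installed Coccinelle is too old for. This is a sketch with hypothetical helper names. It assumes 1.0.7 as the floor when a rule has no Version: line. Obtaining the installed version string (for example from spatch --version) is left out.

```python
import re

DEFAULT_MIN_VERSION = "1.0.7"   # minimum Coccinelle version stated above


def required_version(rule_text):
    """Version named in a '/// Version:' header line, else the default floor."""
    m = re.search(r"^///\s*Version:\s*([0-9]+(?:\.[0-9]+)*)", rule_text, re.MULTILINE)
    return m.group(1) if m else DEFAULT_MIN_VERSION


def version_tuple(version):
    """'1.0.8' -> (1, 0, 8), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def can_run(rule_text, installed_version):
    """True if the installed Coccinelle is at least the rule's required version."""
    return version_tuple(installed_version) >= version_tuple(required_version(rule_text))
```

Comparing tuples rather than strings matters once a component reaches two digits: "1.0.10" sorts before "1.0.9" as a string but after it as a tuple.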

Summary: Quick Start Workflow

  1. Gather info: CVE ID, fix commit hash, affected files
  2. Analyze diff: git show <fix_commit>
  3. Choose strategy: Missing fix vs unfixed code detection
  4. Pick template: Use appropriate vulnerability pattern template
  5. Write rule: Fill in template with specific patterns
  6. Test positive: Verify detection on vulnerable code
  7. Test negative: Verify no detection on fixed code
  8. Refine: Adjust for false positives/negatives
  9. Document: Ensure all metadata is complete
  10. Integrate: Place in cvehound/cve/ directory

Resources

  • Human-oriented guide: docs/WRITING_RULES.md
  • Quick reference: docs/COCCINELLE_CHEATSHEET.md
  • Enhanced template: contrib/template.cocci
  • Minimal template: contrib/blank.cocci
  • Example rules: cvehound/cve/*.cocci
  • Coccinelle docs: https://coccinelle.gitlabpages.inria.fr/website/docs/

Agent Optimization Note: This guide is designed for systematic execution. Follow the decision trees and templates exactly for consistent results. When in doubt, study similar CVEs in the repository and replicate their approach.