Agent-Driven Edge Case Testing

October 2, 2025 · View on GitHub

A production-ready Claude Code pipeline for agent-driven test generation with bounded auto-correction and defense-in-depth validation.

Why use this workflow?

LLM-generated test specifications often fail in production because:

  • Agent outputs don't match actual code behavior
  • Unbounded correction loops mask real bugs
  • No traceability when tests fail
  • Non-deterministic test generation

This pipeline provides a reusable pattern for agent-driven testing with safety guarantees.

Prerequisites

To make the most of the agent-hook workflow in this repo, you'll need Claude Code, Python, and a working bash environment with jq installed. See requirements.txt for pinned dependencies.

The code examples are written in Python and Bash, but the concepts remain the same across any programming language.

Repository Table of Contents

For Evaluators

Tool Use and Guides

Key Features

Highlights: Multi-layer validation, bounded auto-correction, and automatic code testing

  • Custom agent (EdgeCaseAgent) for structured test IR (intermediate representation, a structured JSON format that encodes test cases) generation
  • 3 validation layers catching syntax, schema, and semantic errors
  • Edge case tests generated for the example division.py function
  • Max 1 auto-repair attempt, then fail-fast classification
  • 3 observability outputs: pytest JSON, human summary, traceability manifest
%%{init: { "flowchart": { "defaultRenderer": "elk" } } }%%
flowchart TD
    classDef actor fill:#d4e6f1,stroke:#2980b9,stroke-width:2px;
    classDef process fill:#e8f8f5,stroke:#1abc9c,stroke-width:2px;
    classDef artifact fill:#fef9e7,stroke:#f39c12,stroke-width:2px;
    classDef hook fill:#f4ecf7,stroke:#8e44ad,stroke-width:2px;

    A(1. Developer Prompts Claude) --> B["2. Claude (via subagent EdgeCaseAgent)<br>Generates ir.json"];
    B --> C{"3. Stop Hook (post_code_gen.sh)<br>Validates, Tests & Auto-Corrects"};
    C --> D["4. Final Reports<br>(summary.md, run.meta.json)"];

    class A actor;
    class B process;
    class C hook;
    class D artifact;

    %% Force all text to black
    style A fill:#d4e6f1,stroke:#2980b9,stroke-width:2px,color:#000000;
    style B fill:#e8f8f5,stroke:#1abc9c,stroke-width:2px,color:#000000;
    style C fill:#f4ecf7,stroke:#8e44ad,stroke-width:2px,color:#000000;
    style D fill:#fef9e7,stroke:#f39c12,stroke-width:2px,color:#000000;

Quick Start

  1. Clone the Repository:
git clone https://github.com/abean23/claude-code-agent-hooks
cd claude-code-agent-hooks
  1. Set up the environment:
# Set up Python venv
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Ensure you have jq installed (system dependency, not Python)
jq --version

# If not installed:
# macOS: `brew install jq`
# WSL/Ubuntu/Debian: `sudo apt-get install -y jq`
  1. Run the example:
#Generate edge case tests for the example division function
claude "$(cat examples/sample_prompt.txt)"
#Close out of Claude Code (ctrl-C), then open the human-readable test summary
cat outputs/reports/summary.md
# Expected output:
# Status: PASS
# Total: N tests | Passed: N ✓ | Failed: 0 ✗
# All tests passed - code meets IR specifications

Why This Architecture?

Bounded Auto-Correction Prevents Infinite Loops
One-time correction with correction_attempted flag ensures the system never masks real bugs with infinite IR rewrites, while still fixing malformed IR JSON on first pass.

Defense-in-Depth Validation
Three validation layers (syntax → schema → semantic) catch different error types with actionable messages in output logs.

Full Traceability
SHA256 checksums and git state tracking in run.meta.json enable reproducible debugging and traceability.

Controlled Determinism
While agent outputs are inherently non-deterministic, we constrain all surrounding systems to be reproducible: test collection order is fixed, dependencies are pinned, and generated test code is formatted consistently with Black. This ensures that variation is isolated to the model outputs themselves, not the pipeline.

See full technical rationale →

Additional Resources

Some additional helpful links on Anthropic's Claude Code Agents and Hooks can be found here: