CTF Agent

March 24, 2026

Autonomous CTF (Capture The Flag) solver that races multiple AI models against challenges in parallel. It was built in a weekend, and we used it to solve all 52 challenges and win 1st place at BSidesSF 2026 CTF.

Built by Veria Labs, founded by members of .;,;. (smiley), the #1 US CTF team on CTFTime in 2024 and 2025. We build AI agents that find and exploit real security vulnerabilities for large enterprises.

Results

| Competition   | Challenges Solved | Result              |
|---------------|-------------------|---------------------|
| BSidesSF 2026 | 52/52 (100%)      | 1st place ($1,500)  |

The agent solves challenges across all categories — pwn, rev, crypto, forensics, web, and misc.

How It Works

A coordinator LLM manages the competition while solver swarms attack individual challenges. Each swarm runs multiple models simultaneously — the first to find the flag wins.

                        +-----------------+
                        |  CTFd Platform  |
                        +--------+--------+
                                 |
                        +--------v--------+
                        |  Poller (5s)    |
                        +--------+--------+
                                 |
                        +--------v--------+
                        | Coordinator LLM |
                        | (Claude/Codex)  |
                        +--------+--------+
                                 |
              +------------------+------------------+
              |                  |                  |
     +--------v--------+ +------v---------+ +------v---------+
     | Swarm:          | | Swarm:         | | Swarm:         |
     | challenge-1     | | challenge-2    | | challenge-N    |
     |                 | |                | |                |
     |  Opus (med)     | |  Opus (med)    | |                |
     |  Opus (max)     | |  Opus (max)    | |     ...        |
     |  GPT-5.4        | |  GPT-5.4       | |                |
     |  GPT-5.4-mini   | |  GPT-5.4-mini  | |                |
     |  GPT-5.3-codex  | |  GPT-5.3-codex | |                |
     +--------+--------+ +--------+-------+ +----------------+
              |                    |
     +--------v--------+  +-------v--------+
     | Docker Sandbox  |  | Docker Sandbox |
     | (isolated)      |  | (isolated)     |
     |                 |  |                |
     | pwntools, r2,   |  | pwntools, r2,  |
     | gdb, python...  |  | gdb, python... |
     +-----------------+  +----------------+

Each solver runs in an isolated Docker container with CTF tools pre-installed. Solvers never give up — they keep trying different approaches until the flag is found.
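
The "first to find the flag wins" race can be sketched with asyncio. This is a minimal illustration, not the project's actual code: `fake_solver`, `race`, and the model labels are hypothetical stand-ins, and a real solver would drive an LLM inside its sandbox instead of sleeping.

```python
import asyncio
import random

# Hypothetical model labels, loosely based on the diagram above.
MODELS = ["opus-medium", "opus-max", "gpt-5.4"]


async def fake_solver(model: str, challenge: str) -> str:
    """Stand-in for a real solver: waits a bit, then 'finds' a flag."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"flag{{{challenge}-solved-by-{model}}}"


async def race(challenge: str, models: list[str]) -> str:
    """Start one solver per model and return the first flag produced."""
    tasks = [asyncio.create_task(fake_solver(m, challenge)) for m in models]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the race is over, so stop the slower solvers
    # Let the cancellations finish before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


if __name__ == "__main__":
    print(asyncio.run(race("challenge-1", MODELS)))
```

Cancelling the losing tasks as soon as one finishes is a design assumption here; it frees model budget for unsolved challenges.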

Quick Start

# Install
uv sync

# Build sandbox image
docker build -f sandbox/Dockerfile.sandbox -t ctf-sandbox .

# Configure credentials
cp .env.example .env
# Edit .env with your API keys and CTFd token

# Run against a CTFd instance
uv run ctf-solve \
  --ctfd-url https://ctf.example.com \
  --ctfd-token ctfd_your_token \
  --challenges-dir challenges \
  --max-challenges 10 \
  -v

Coordinator Backends

# Claude SDK coordinator (default)
uv run ctf-solve --coordinator claude ...

# Codex coordinator (GPT-5.4 via JSON-RPC)
uv run ctf-solve --coordinator codex ...

Solver Models

Default model lineup (configurable in backend/models.py):

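| Model                    | Provider   | Notes                          |
|--------------------------|------------|--------------------------------|
| Claude Opus 4.6 (medium) | Claude SDK | Balanced speed/quality         |
| Claude Opus 4.6 (max)    | Claude SDK | Deep reasoning                 |
| GPT-5.4                  | Codex      | Best overall solver            |
| GPT-5.4-mini             | Codex      | Fast, good for easy challenges |
| GPT-5.3-codex            | Codex      | Reasoning model (xhigh effort) |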

Sandbox Tooling

Each solver gets an isolated Docker container pre-loaded with CTF tools:

| Category  | Tools                                                       |
|-----------|-------------------------------------------------------------|
| Binary    | radare2, GDB, objdump, binwalk, strings, readelf            |
| Pwn       | pwntools, ROPgadget, angr, unicorn, capstone                |
| Crypto    | SageMath, RsaCtfTool, z3, gmpy2, pycryptodome, cado-nfs     |
| Forensics | volatility3, Sleuthkit (mmls/fls/icat), foremost, exiftool  |
| Stego     | steghide, stegseek, zsteg, ImageMagick, tesseract OCR       |
| Web       | curl, nmap, Python requests, flask                          |
| Misc      | ffmpeg, sox, Pillow, numpy, scipy, PyTorch, podman          |
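
To make the per-solver container concrete, here is a sketch that builds a `docker run` command for the `ctf-sandbox` image from Quick Start. The `sandbox_command` helper, the `/challenge` mount point, the read-only mount, and the 4g memory cap are assumptions for illustration; the project may launch containers differently.

```python
import shlex


def sandbox_command(challenge_dir: str, image: str = "ctf-sandbox") -> list[str]:
    """Build a docker invocation for one solver (hypothetical helper).

    challenge_dir should be an absolute path so Docker treats it as a bind mount.
    """
    return [
        "docker", "run",
        "--rm",                                  # remove the container on exit
        "-i",                                    # keep stdin open for the solver
        "--memory", "4g",                        # assumed resource cap
        "-v", f"{challenge_dir}:/challenge:ro",  # challenge files, read-only
        image,
        "bash",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no Docker daemon is needed.
    print(shlex.join(sandbox_command("/tmp/challenges/challenge-1")))
```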

Features

  • Multi-model racing — multiple AI models attack each challenge simultaneously
  • Auto-spawn — new challenges detected and attacked automatically
  • Coordinator LLM — reads solver traces, crafts targeted technical guidance
  • Cross-solver insights — findings shared between models via message bus
  • Docker sandboxes — isolated containers with full CTF tooling
  • Operator messaging — send hints to running solvers mid-competition
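
The cross-solver sharing above could be implemented in many ways. The sketch below shows one minimal in-memory message bus in which each solver has an inbox and a published finding is delivered to every other solver. The `InsightBus` class and its method names are hypothetical, not the project's actual interface.

```python
class InsightBus:
    """Tiny in-memory bus: one inbox per solver (illustrative only)."""

    def __init__(self) -> None:
        self._inboxes: dict[str, list[str]] = {}

    def subscribe(self, solver: str) -> None:
        """Register a solver so it can receive findings."""
        self._inboxes.setdefault(solver, [])

    def publish(self, sender: str, finding: str) -> None:
        """Deliver a finding to every subscribed solver except the sender."""
        for solver, inbox in self._inboxes.items():
            if solver != sender:
                inbox.append(f"[{sender}] {finding}")

    def drain(self, solver: str) -> list[str]:
        """Return and clear the pending messages for one solver."""
        inbox = self._inboxes[solver]
        messages = list(inbox)
        inbox.clear()
        return messages


if __name__ == "__main__":
    bus = InsightBus()
    for name in ("opus-max", "gpt-5.4"):
        bus.subscribe(name)
    bus.publish("opus-max", "binary is statically linked, no PIE")
    print(bus.drain("gpt-5.4"))
```

Operator hints could reuse the same path by publishing under a reserved sender name, though that is a guess about the design.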

Configuration

Copy .env.example to .env and fill in your keys:

cp .env.example .env
CTFD_URL=https://ctf.example.com
CTFD_TOKEN=ctfd_your_token
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

All settings can also be passed as environment variables or CLI flags.

Requirements

  • Python 3.14+
  • Docker
  • API keys for at least one provider (Anthropic, OpenAI, Google)
  • codex CLI (for Codex solver/coordinator)
  • claude CLI (bundled with claude-agent-sdk)

Acknowledgements

  • es3n1n/Eruditus — CTFd interaction and HTML helpers in pull_challenges.py