Agentic Malware Analysis

March 22, 2026 · View on GitHub

Automated deep malware reverse engineering driven by AI agents. A Kali-based Docker environment pairs 50+ RE tools with MCP-connected disassembler backends (Binary Ninja or Ghidra) and a structured multi-phase orchestrator skill that turns a raw binary into a case directory of ranked evidence, validated hypotheses, component maps, and a prioritized deep-analysis plan -- with no human interaction required. Ready for Claude Code and Codex CLI.

See the companion blog post Building a Pipeline for Agentic Malware Analysis for background, a case study, and evaluation.

Why

Initial malware analysis involves a number of routine steps: collecting hashes and compiler artifacts, extracting strings, inspecting imports, running YARA and capa, correlating the results, and identifying code areas for closer inspection. These steps provide the basis for deeper analysis, but they are often repetitive and time-consuming.

This repository automates much of that workflow. The orchestrator skill collects and organizes analysis artifacts, highlights relevant signals, generates evidence-backed hypotheses, builds a basic component model, and prepares a prioritized deep-analysis plan. All intermediate results are stored in a per-sample case directory on disk, making the workflow easier to resume and review.

Via MCP, the agent can also use Binary Ninja or Ghidra to inspect functions, follow cross-references, and tie findings to concrete code locations. The result is a structured starting point for follow-up analysis rather than ad hoc triage alone.

Features

  • Kali Linux container with 50+ RE and malware analysis tools
  • Automatic MCP backend selection (Binary Ninja or Ghidra)
  • malware-analysis-orchestrator skill for Claude Code and Codex CLI
  • Helper scripts for strings, imports, YARA, capa, signal ranking, hypothesis generation
  • Bundled YARA rules (crypto, anti-debug/anti-VM, capabilities, packer -- from Yara-Rules/rules, GPL-2.0)
  • Wrapper scripts for Claude Code and Codex with aggressive defaults
  • Persistent state across container rebuilds (BN license, Claude auth, Codex auth)
  • PE, ELF, and Mach-O support
  • Content-hash-based image caching

Prerequisites

  • Docker with buildx
  • Anthropic API key or Claude account (for Claude Code) or OpenAI API key (for Codex)
  • Recommended: Binary Ninja Linux zip with a headless-capable license (see Setup below)

Quick Start

Important: run_docker.sh mounts your current working directory into the container at /agent. The agent wrappers run with full permissions (--dangerously-skip-permissions / --dangerously-bypass-approvals-and-sandbox) by design, so the agent can read, write, and execute anything in that directory. Clone the repository into a dedicated directory and place only the files you want the agent to access there.

git clone https://github.com/mrphrazer/agentic-malware-analysis.git
cd agentic-malware-analysis
cp /path/to/binaryninja_linux.zip ./binaryninja.zip   # recommended; without it Ghidra is used instead
./run_docker.sh

What happens:

  1. Prepares a Docker Buildx builder
  2. If binaryninja.zip is present, Binary Ninja and its MCP server are installed; otherwise Ghidra and its MCP server are installed as a fallback
  3. Clones the selected MCP server repo into mcp/
  4. Builds the image (or reuses a cached one based on content hash)
  5. Seeds BN license, Claude credentials, and Codex credentials from host directories
  6. Launches the container with the current directory mounted at /agent

Inside the container:

claude   # or: codex

Extract the included example sample and start an analysis with the orchestrator skill:

cd examples && unzip -P infected samples.zip && cd ..

Then prompt the agent:

Analyze the malware in examples/samples/mfc42ul.dll -- Give me a detailed overview
of the sample's functionality and features, together with the corresponding code
locations. Use the skill /agent/agent_helpers/claude/skills/malware-analysis-orchestrator/
for analysis.

See examples/README.md for sample details and background.

Setup

Binary Ninja provides substantially better analysis results than Ghidra through its headless MCP server and is the recommended disassembler backend. If no Binary Ninja zip is present, the environment falls back to Ghidra automatically.

To use Binary Ninja, place your Linux headless zip in the repository root before running run_docker.sh:

cp /path/to/binaryninja_linux.zip ./binaryninja.zip

The file must be named binaryninja.zip. Alternatively, point to it explicitly:

BINARY_NINJA_ZIP=/path/to/binaryninja.zip ./run_docker.sh

Binary Ninja requires a valid license.dat. On first run, run_docker.sh copies your existing license from ~/.binaryninja/license.dat into a Docker-specific directory (~/.binaryninja-docker/) that is mounted into the container at /home/agent/.binaryninja. The license is never baked into the image. To use a different host directory:

BINARY_NINJA_USER_DIR=/path/to/binaryninja-user-dir ./run_docker.sh

Claude Code Authentication

run_docker.sh maintains a persistent Claude state directory at ~/.claude-docker/ on the host, mounted into the container at /home/agent/.claude. On first run, it seeds credentials from ~/.claude/.credentials.json if available. You can also log in inside the container with claude auth; credentials persist across container rebuilds in the mounted directory. To use a different host directory:

CLAUDE_USER_DIR=/path/to/claude-dir ./run_docker.sh

Codex CLI Authentication

run_docker.sh maintains a persistent Codex state directory at ~/.codex-docker/ on the host, mounted into the container at /home/agent/.codex. On first run, it seeds credentials from ~/.codex/auth.json if available. You can also log in inside the container with codex login; credentials persist across container rebuilds in the mounted directory. To use a different host directory:

CODEX_USER_DIR=/path/to/codex-dir ./run_docker.sh

Docker Environment

Installed Tooling

Fingerprinting: file, sha256sum, md5sum, ssdeep, die/diec, yara, capa

String extraction: strings, floss (FLARE-FLOSS), rabin2

Disassembly and analysis: radare2, binwalk, gdb/gdb-multiarch, capstone (Python)

Binary utilities: objdump, readelf, nm, patchelf, elfutils

Hex editors: hexedit, bvi, xxd, ht, hexwalk

Dynamic analysis: strace, ltrace, qemu-user

Build tools: gcc, g++, clang, lldb, lld, llvm, cmake, nasm

Python: Python 3, pip, venv, ipython, ipdb, uv

Data processing: jq, yq

Extraction: upx, unblob, ropper

Node.js: Node.js 22

Agent CLIs: Claude Code, Codex CLI

Conditional: Binary Ninja headless + Python API (if zip provided) or Ghidra + pyghidra

Runtime Configuration

  • Working directory: /agent
  • User: agent (non-root, passwordless sudo)
  • Capabilities: SYS_PTRACE, seccomp=unconfined
  • Volume mounts:
    • Host ./agent
    • BN user dir → /home/agent/.binaryninja
    • Claude state dir → /home/agent/.claude
    • Codex state dir → /home/agent/.codex

MCP Integration

The environment automatically selects and configures one MCP backend. Binary Ninja is recommended; Ghidra serves as a fallback when no Binary Ninja zip is provided.

The selected repo is cloned at runtime by run_docker.sh into mcp/. On container start, configure-agent-mcp.sh writes the project-scoped .mcp.json (Claude Code) and Codex config with the correct MCP server entry.

Override the upstream repos:

BINJA_MCP_REPO_URL=https://github.com/mrphrazer/binary-ninja-headless-mcp.git ./run_docker.sh
GHIDRA_MCP_REPO_URL=https://github.com/mrphrazer/ghidra-headless-mcp.git ./run_docker.sh

Malware Analysis Orchestrator

The malware-analysis-orchestrator skill drives a structured, multi-phase malware analysis workflow. It is available for both Claude Code and Codex CLI.

Workflow Stages

  1. Intake and fingerprinting -- hashes, file type, packer/compiler detection, YARA scan, capa scan
  2. Raw strings collection -- strings, rabin2, floss with source tags
  3. Raw API/import collection -- rabin2, format-specific tools (PE/ELF/Mach-O)
  4. Signal filtering and ranking -- score and rank interesting strings and imports by capability
  5. Hypothesis generation -- cross-link evidence, generate behavior hypotheses with confidence and evidence
  6. Component inventory and interaction modeling -- infer components, map data/control flow
  7. Deep analysis planning and prioritization -- ordered tasks with target functions, expected findings, stop criteria
  8. Reporting -- executive summary, technical findings, IOCs, open questions

Case Directory

Each analysis creates a persistent per-sample directory at status/<NNN>-<filename>/ containing 13 required artifact files:

#FileContent
100_sample_profile.mdHashes, file type, packer, YARA, capa results
201_strings_raw.txtAll extracted strings with source tags
302_strings_interesting.mdRanked interesting strings with categories
403_imports_raw.txtFull import tables with source tags
504_imports_interesting.mdSuspicious API clusters by capability
605_behavior_hypotheses.mdHypotheses with confidence and evidence
706_component_inventory.mdInferred components with roles and evidence
807_interaction_model.mdData and control flow between components
908_deep_analysis_plan.mdOrdered deep-analysis tasks
1009_priority_queue.mdPriority queue with rationale and blockers
1110_reporting_draft.mdExecutive summary and technical findings
12INDEX.mdArtifact list, timestamps, missing items
13CURRENT_STATE.jsonMachine-readable phase and progress state

This externalized state is what makes the workflow resilient to context-window compaction: the agent reads case files back from disk instead of relying on conversation history.

Helper Scripts

ScriptPurpose
init_status_tree.shCreate or reuse a case directory for a sample
collect_strings.shRun strings, rabin2 -zz, floss and write raw output
collect_imports.shRun rabin2 -i and format-specific import tools
scan_yara.shScan sample with bundled YARA rules
scan_capa.shRun capa for ATT&CK/MBC capability identification
rank_signals.pyScore and rank interesting strings and imports
build_hypothesis.pyGenerate baseline behavior hypotheses (optional)
update_state.pyUpdate CURRENT_STATE.json with phase transitions
resolve_case.shResolve the latest case directory for a sample

Bundled YARA Rules

Four rule sets from Yara-Rules/rules (GPL-2.0):

  • crypto_signatures.yar -- cryptographic algorithm and constant detection
  • antidebug_antivm.yar -- anti-debug and anti-VM technique detection
  • capabilities.yar -- malicious capability detection
  • packer_compiler_signatures.yar -- packer and compiler signature detection

Roles

  • Orchestrator -- drives phases, enforces artifact completeness, schedules parallel collection
  • Planner -- consumes intermediate evidence, generates hypotheses, defines deep-analysis priorities
  • Reporter -- produces executive and technical summaries with traceable evidence

Agent Wrappers

Claude Code

  • Default model: opus
  • Default effort: high
  • Default permissions: --dangerously-skip-permissions
  • Management commands (auth, doctor, mcp, update, etc.) are passed through without defaults
  • See Claude Code Authentication for credential setup

Codex CLI

  • Default model: gpt-5.4
  • Default reasoning: xhigh
  • Default permissions: --dangerously-bypass-approvals-and-sandbox
  • Features: multi_agent, child_agents_md
  • Management commands (login, logout, mcp, features, etc.) are passed through without defaults
  • See Codex CLI Authentication for credential setup

Customization

Environment Variables

VariableDefaultPurpose
BINARY_NINJA_ZIP./binaryninja.zipPath to Binary Ninja Linux zip
BINARY_NINJA_USER_DIR~/.binaryninja-dockerBN license, settings, and plugins
CLAUDE_USER_DIR~/.claude-dockerClaude auth and state
CODEX_USER_DIR~/.codex-dockerCodex auth and state
BINJA_MCP_REPO_URLhttps://github.com/mrphrazer/binary-ninja-headless-mcp.gitBinary Ninja MCP repo
GHIDRA_MCP_REPO_URLhttps://github.com/mrphrazer/ghidra-headless-mcp.gitGhidra MCP repo

Running Specific Commands

./run_docker.sh claude
./run_docker.sh codex
./run_docker.sh python3 -c 'import sys; print(sys.version)'

Repository Layout

.
├── Dockerfile
├── compose.yaml
├── run_docker.sh
├── docker-bin/
│   ├── claude
│   ├── codex
│   └── configure-agent-mcp.sh
├── docker-config/
│   └── codex-config.toml
├── agent_helpers/
│   ├── claude/skills/malware-analysis-orchestrator/
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   ├── references/
│   │   └── assets/
│   │       ├── yara_rules/
│   │       └── status_templates/
│   └── codex/skills/malware-analysis-orchestrator/
│       ├── SKILL.md
│       ├── agents/
│       ├── scripts/
│       ├── references/
│       └── assets/
│           ├── yara_rules/
│           └── status_templates/
├── examples/
│   ├── README.md
│   └── samples.zip
├── blogs/
└── README.md

Note: mcp/ is cloned at runtime and is not part of the repository.

Limitations

  • Agent runs are non-deterministic -- repeated analyses of the same sample may produce different results
  • Context-window constraints limit single-pass depth; the orchestrator mitigates this with externalized state
  • Agents may produce overconfident or incorrect claims -- expert validation is required
  • The orchestrator focuses on static analysis; dynamic analysis tools are available but not orchestrated
  • The container runs with elevated permissions by design (see Security)

Security

  • SYS_PTRACE and seccomp=unconfined are required for debugging and dynamic analysis -- intentional
  • Agent wrappers default to full permissions inside the container sandbox -- by design for autonomous analysis
  • MCP communication is unauthenticated (stdio transport)
  • Do not expose the container to untrusted networks or users
  • The Binary Ninja license is stored on the host, not in the image

Contact

Tim Blazytko (@mr_phrazer)