AgentPack
July 3, 2026 ยท View on GitHub
Your agent starts cold. AgentPack hands it the map.
Ranked repo context for Codex, Claude Code, Cursor, Windsurf, Copilot, Cline, Kiro, OpenCode, MCP, CI, and markdown workflows.
local preflight
ranked files
skill routing
warm cache
tests + commands
receipts
no cloud index
You know the pattern. You ask an agent to fix one bug. It rgs half the repo, opens the wrong files, misses the test, then rediscovers the architecture you already had.
AgentPack does the repo-orientation pass first.
agentpack route --task "fix auth token expiry"
-> files that probably matter
-> why those files, and why common candidates were skipped
-> skills and rules that fit the task
-> tests that probably prove it
-> rules, commands, warnings
-> compact context before the agent edits
AgentPack is not another coding agent. It is the local context engine you put in front of the agents you already use.
The Pitch
Without AgentPack: agent explores first, edits later.
With AgentPack: agent starts near the right files.
No cloud index. No embeddings. No model calls for scan/rank/pack. Just local repo analysis, ranked context, and receipts for what got included or skipped.
It is not a repo dump. It is not a generic memory app. It is not a promise that your agent will be right.
It is a preflight map: likely files, likely tests, the right local skill or rule, commands, warnings, and a compact pack your agent can inspect before touching code.
The first run builds local summaries and repo signals. Later runs reuse that cache, so agents do less repeat discovery and spend more of their budget on the actual change.
What We Are Solving
AgentPack exists because developer-agent work has three recurring failure modes:
- Cold-start drift: every new chat repeats repo discovery, burns tokens, and may anchor on the wrong files.
- Session collision: two chats in the same repo can accidentally share stale task context and continue old work.
- Context inflation: agents ask for full repo context when a task, delta, or one related file would be enough.
The direction is a local developer control plane, not another autonomous agent.
quickstart, start, next, and doctor are the human-facing loop.
MCP readiness(), get_context(), get_delta_context(), and route/explain
tools are the agent-facing loop. Both read the same task/session/context/token
state, so AgentPack can answer "what now?" consistently across Codex, Claude,
Cursor, Windsurf, Antigravity, and generic agents.
The long-term vision is a practical second brain for development: local memory,
review evidence, AST/symbol structure, task history, and observer signals that
help the next agent orient faster. The shipped memory graph records task-start
maps, node refs, episodes, procedures, and memory edges under .agentpack/;
agentpack memory --timeline shows timestamps, hashes, confidence, stale-path
checks, and visible reasons. It remains advisory by design. Source files,
diffs, tests, runtime evidence, and PR review stay the source of truth.
Quick Start
pipx install agentpack-cli
agentpack --version
Inside your repo:
agentpack quickstart
agentpack start "fix auth token expiry"
agentpack next
agentpack doctor --agent auto
Then give .agentpack/context.md to your agent, or let MCP-capable agents call AgentPack tools directly.
Core onboarding is quickstart, start, next, and doctor. next is the
single "what now?" command: it checks setup, task/session state, context
freshness, thread overlap, and token guidance. Use route, pack, and
benchmark when you need deeper inspection or measurement. Everything else is
an advanced workflow or release/diagnostic helper.
For one-shot use without installing:
pipx run --spec agentpack-cli agentpack route --task "fix auth token expiry"
For JavaScript/TypeScript projects, npm wrapper is available:
npx @vishal2612200/agentpack --version
npx @vishal2612200/agentpack quickstart
npx @vishal2612200/agentpack start "fix auth token expiry"
npx @vishal2612200/agentpack next
Proof So Far
AgentPack's current public benchmark checks one narrow thing: whether selected context overlaps with files actually changed in historical commits. Treat it as evidence for a ranked starting map, not proof that any agent will finish every task faster or better.
| Signal | Result | Developer meaning |
|---|---|---|
| Public commit cases | 107 | real historical file-selection checks |
| Average recall | 65.7% | did AgentPack include files that mattered? |
| Token precision | 51.4% | how much of pack was useful instead of noise? |
| Pack p50 | 315 tokens | typical compact starting context |
| Pack p95 | 1,137 tokens | larger but still bounded starting context |
Source: benchmarks/results/2026-06-25-public.md. Benchmark guide: docs/benchmarking.md.
This is useful but not magic. It says AgentPack often gets meaningful files into a small pack. It does not replace source inspection, tests, runtime evidence, or review. Agent success A/B benchmarks should report task success, tool calls, token cost, validation quality, and time-to-first-correct-file.
E2E outcome proof is tracked separately in benchmarks/results/e2e-ab-status.md. Do not treat file-selection results as task-success or cost-savings proof.
Memory feedback has its own guardrail: compare ranking with memory off/on using
agentpack eval --memory-ab. Timestamped memory can explain or boost context,
but it is not task-success proof.
New Contributors
Start with good first issue or help wanted issues.
If this would be your first open-source contribution, use the smaller
first-timers-only queue.
Contribution setup and review expectations are in CONTRIBUTING.md.
Quick Demo
Start with the control-plane loop:
agentpack quickstart
agentpack start "fix billing webhook retry handling"
agentpack next
AgentPack writes local task/context state under .agentpack/, checks freshness,
and tells you the next safe action. MCP-capable agents use the same state
through readiness(), get_context(), and get_delta_context().
Use route and pack when you want deeper inspection:
agentpack route --task "fix billing webhook retry handling"
agentpack pack --task auto
route returns likely files, why-selected and why-not-selected notes, tests,
rules, commands, warnings, and matching skills without writing source files.
pack renders selected files, omitted-file receipts, freshness checks, token
stats, and citation provenance for packed claims.
AgentPack reuses cached file summaries and snapshot metadata so repeated packs do not start from zero.
Run agentpack doctor when an agent integration, MCP setup, hook, or installed CLI path looks stale.
Inspect advisory memory with agentpack memory --timeline; prune local history
with agentpack memory --prune.
Capability Map
| Area | What AgentPack provides |
|---|---|
| Orientation | ranked files, likely tests, commands, repo rules, skills, and why/why-not receipts |
| Control plane | next, status, guard, MCP readiness, thread state, freshness checks, and exact repair commands |
| Token control | budgeted packs, token contracts, delta-context guidance, cached summaries, and retrieval IDs |
| Review and proof | citation-backed review artifacts, review preflight, benchmark misses, and local validation guidance |
| Advisory memory | task-start maps, node refs, episodic/procedural links, timeline/staleness checks, and observer signals below source/test evidence |
Current Focus
- Make
quickstart,start,next, anddoctorthe default human loop. - Keep
next,quickstart,status,guard, and MCP readiness on one shared control-plane snapshot. - Use token contracts to recommend full context vs delta context.
- Keep repair output explicit: what failed, why it matters, the exact command, and whether work can safely continue.
- Keep review, TOON, route explainability, and MCP troubleshooting grounded in source, diff, test, and PR evidence.
- Keep advisory memory auditable with timestamps, provenance, confidence, hashes, stale checks, and visible reasons.
What We Want To Prove Next
AgentPack should eventually show:
- fewer agent file reads
- fewer tool calls
- faster first correct file
- lower total context cost
- equal or better task success
Works With
- Codex
- Claude Code
- Cursor
- Windsurf
- Antigravity
- MCP tools
- CI and PR review workflows
- generic markdown-based LLM workflows
See docs/integrations.md and docs/mcp-context-engine.md.
Agent And IDE Plugins
AgentPack can be used through thin plugin and IDE integration layers so agents start with ranked repo context. Codex has a packaged plugin skeleton; Cursor, Windsurf, Copilot, Cline, Kiro, OpenCode, Claude Code, Antigravity, and generic agents use the same local CLI/MCP engine through portable rules, hooks, and native integration stubs.
Inside Codex:
@agentpack-route fix auth token expiry
@agentpack-pack fix auth token expiry
@agentpack-review focus on backward compatibility
The Codex plugin calls the local AgentPack engine. Codex setup enables the
local agentpack@local bundle so commands like @agentpack-review match the
installed CLI version. Verify with agentpack doctor --agent codex after
upgrades.
The review flow prepares a local two-stage PR review bundle: preflight metadata,
a runbook, stage prompts, and branch-scoped understanding/findings JSON files.
It does not replace gh pr view, git diff, direct code reads, or tests.
AgentPack does not upload code and does not turn AgentPack into a coding agent.
See docs/agent-plugins.md and docs/codex-plugin.md.
When To Use It
Use AgentPack when:
- repo is large or split across multiple packages
- monorepo structure makes file discovery expensive
- agents repeat same discovery work across tasks
- CI or PR review needs reproducible context
- agents waste tool calls opening irrelevant files
- tasks often miss tests, config, generated rules, or repo conventions
- teams have useful skills/rules but agents do not reliably pick the right one
- repeated agent sessions keep rediscovering the same repo structure
Skip AgentPack or keep it as a light preflight when:
- repo is tiny
- question is one-shot and read-only
- you already know exact files to edit
- you need autonomous coding, not context preparation
- native IDE search is already enough for task
Boundaries
AgentPack is closest to a local preflight and control plane:
- unlike repo dumpers, it ranks and compresses by task
- unlike coding agents, it does not edit code
- unlike IDE search, it routes before the agent starts wandering
- unlike generic skills/rules, it recommends the ones that fit the task
- unlike generic memory, its observer signals stay advisory and local
Implementation deep dives: docs/architecture.md, docs/how-agentpack-works.md, and docs/commands.md.
Trust And Privacy
- local-first by default
- no cloud indexing
- no embeddings or API calls for scan, rank, pack, stats, or benchmark
- generated files live under
.agentpack/ - local task/memory artifacts can include task text, paths, hashes, reasons, timestamps, and confidence
- review packs before sharing them outside your machine
Details: docs/privacy.md, docs/threat-model.md, docs/data-flow.md, and SECURITY.md.
Install Notes
Requires Python 3.10+ and is tested on Python 3.10-3.14. PyPI package is agentpack-cli; command is agentpack.
Use pipx for normal installs because many macOS/Linux Python distributions block global pip install with PEP 668's externally-managed-environment error.
Install pipx first if needed:
# macOS
brew install pipx
# Ubuntu/Debian
sudo apt install pipx
# Fedora
sudo dnf install pipx
# Arch
sudo pacman -S python-pipx
pipx ensurepath
Docs
docs/index.md: docs homedocs/architecture.md: pipeline, data flow, package layout, and rendered-budget accountingdocs/commands.md: full CLI command referencedocs/configuration.md: config, scoring weights,.agentignore, and git integrationdocs/integrations.md: agent setup, MCP workflow, hooks, and native integration statusdocs/agent-plugins.md: plugin and IDE distribution layerdocs/codex-plugin.md: thin Codex plugin commands and local workflowdocs/mcp-context-engine.md: MCP tools and context workflowdocs/benchmarking.md: quality bar, release gate, and public artifactsdocs/limitations.md: project scope, known limits, and roadmap
Status
Alpha: 0.3.38.
Works, tested, and used in real sessions. Python and JavaScript/TypeScript have strongest support. APIs may change before 1.0.
Platform support targets macOS, Linux, and Windows PowerShell with Git for Windows. cmd.exe and bare Git setups are not supported yet.
Name note: PyPI package is agentpack-cli, npm package is @vishal2612200/agentpack, and command is agentpack. This project is unrelated to AgentPack dataset papers or other repos with the same name.
Contributing
See CONTRIBUTING.md for setup, validation, and PR expectations.
Community behavior is covered by CODE_OF_CONDUCT.md.
License
MIT