armory

June 17, 2026

License: MIT · packages: 133 · evals: 100%

Curated, production-grade skills, agents, hooks, rules, commands, utilities, and presets for AI coding agents. No magic, no demos — battle-tested workflows built for developers who use AI seriously.


Overview

armory is a collection of packages for Claude Code and Claude.ai. Each package is a self-contained prompt or automation unit that extends Claude with a repeatable, opinionated workflow for a specific task domain. Packages span seven types: skills, agents, hooks, rules, commands, utilities, and presets.

Philosophy: Packages in this collection are practical and context-free. They define the how, not just the what — covering inputs, outputs, edge cases, and failure modes. They are tested in real workloads, not constructed as examples.

Intended for developers who treat AI coding agents as a serious part of their workflow.


Package Catalog

Agents — Orchestrators

Orchestrator agents compose skills and other agents into multi-phase workflows. Each can run solo or be spawned by another agent via the Agent tool.

| Agent | Model | Description |
| --- | --- | --- |
| team-lead | opus | Meta-orchestrator — decomposes multi-domain requests, delegates to specialized agents, synthesizes results |
| codebase-auditor | sonnet | Unified quality assessment — spawns code-reviewer, security-reviewer, secret-scanner in parallel, merges report |
| project-architect | opus | Phased requirements discovery producing architecture documents with diagrams and tech stack justification |
| project-planner | sonnet | Task decomposition with dependency mapping, three-point estimates, milestone timelines, and risk logs |
| research-analyst | opus | Multi-source investigation with parallel agents across web, academic, video, and competitive sources |
| idea-scout | opus | Business idea validation — Lean Canvas, parallel market/competitive/feasibility research, weighted scorecard |
| full-stack-builder | opus | End-to-end implementation from spec — scaffolding, sprints, quality passes, documentation, pre-delivery review |
| release-captain | sonnet | Ship lifecycle with quality gates — pre-flight, secret scan, changelog, version bump, PR creation |
| proposal-writer | opus | Technical proposals with ROI calculations, three-tier pricing, and Problem-Agitate-Solve framing |
| content-strategist | sonnet | Multi-channel content creation with per-channel adaptation and automated quality passes |
| media-producer | sonnet | Visual and video format router — selects the right skill based on concept type and output needs |
| skill-librarian | sonnet | Reflective write-phase orchestrator — turns completed task transcripts into skill proposals or augmentations |
skill-librariansonnetReflective write-phase orchestrator — turns completed task transcripts into skill proposals or augmentations

Agents — Analyzers

| Agent | Model | Description |
| --- | --- | --- |
| code-reviewer | sonnet | Multi-phase code review with severity-ranked findings |
| security-reviewer | sonnet | OWASP Top 10 vulnerability scanning |
| secret-scanner | haiku | Pre-commit detection of hardcoded credentials |
| skill-router | haiku | Outcome-weighted package routing using historical eval results |
| test-engineer | sonnet | Co-evolutionary skill evolution with generate-verify-refine loops |

Model routing: Agents marked opus run on Claude Opus 4.7 with xhigh effort by default in Claude Code. Reserve max effort for genuinely hard, novel problems, where the extra thinking is worth the diminishing returns and the risk of overthinking. Drop to high when running concurrent sessions or for cost-sensitive work. Opus 4.7 uses adaptive thinking, so there is no fixed thinking budget to tune.

Skills — Development & Tooling

| Skill | Description |
| --- | --- |
| agent-builder | Build AI agents using the Claude Agent SDK and headless CLI mode — covers tool definitions, MCP servers, and programmatic orchestration |
| github | GitHub CLI operations via gh — issues, PRs, CI/Actions, releases, search, REST/GraphQL API, with error handling and automation workflows |
| filesystem | File and directory operations via Claude Code built-in tools — replaces the Filesystem MCP server with native Read, Write, Edit, Glob, Grep |
| mcp-to-skill | Convert MCP servers into on-demand skills to reduce active context window token usage |
| gpu-optimizer | GPU optimization for consumer GPUs (8-24GB VRAM) — PyTorch, XGBoost, CuPy/RAPIDS, memory management, and CUDA tuning |
| tavily | AI-optimized web search and content extraction via Tavily API with structured output parsing |
| test-harness | Comprehensive pytest suite generation — happy path, edge cases, error conditions, fixtures, mocks, async, parametrized tests |
| debug-investigator | Systematic debugging framework — hypothesis-driven investigation with bisection, log analysis, instrumentation, and minimal reproduction |
| project-context-setup | Scaffold repo-local agent context — issue tracker rules, triage labels, domain glossary layout, ADR lookup, agent brief conventions |
| stacked-prs | Manage dependent branch stacks and stacked pull requests — inspect, split, publish, sync, validate, merge, and clean up stack topology |
| to-markdown | Convert any file or URL to clean Markdown via MarkItDown — PDF, DOCX, XLSX, PPTX, HTML, images, audio, CSV, JSON, XML, YouTube, EPub |
| web-fetch | Web content fetching via curl and WebFetch — replaces the Fetch MCP server with native HTTP operations and jq parsing |
| lightpanda-browser | Lightweight headless browser automation via Lightpanda + agent-browser CDP — 9x lower memory, 11x faster, for scraping, DOM extraction, and form automation |
| skill-library | Agent-native catalog for browsing, installing, updating, syncing, and removing armory skills from within a Claude Code session |
| env-validator | Validate .env files against project requirements — missing vars, type mismatches, insecure defaults, .env.example drift |
| handoff | Maintain .docs/handoff.md as a 200-line session-continuity runbook for in-flight work, blockers, decisions, validation state, and resume steps |

Skills — Research & Analysis

| Skill | Description |
| --- | --- |
| literature-review | Systematic literature review — search, screen, extract, and synthesize academic research with gap analysis and structured citations |
| youtube-search | Search YouTube by keyword via yt-dlp — returns structured metadata (title, URL, channel, views, duration, date) for discovery and source curation |
| youtube-analysis | YouTube video transcript extraction and structured concept analysis — multi-level summaries, key concepts, takeaways, no API keys required |
| notebooklm | Google NotebookLM automation via notebooklm-py — create notebooks, add sources, chat, generate podcasts, videos, infographics, quizzes, flashcards, and more |
| research-critique | Critical analysis of research papers — methodology evaluation, claims-evidence alignment, contribution assessment with collegial analytical posture |
| immune | Hybrid adaptive memory with Cheatsheet (positive patterns) and Immune (negative patterns) — Hot/Cold tiered memory, multi-domain antibody scanning, auto-learning |

Skills — Review & Quality

| Skill | Description |
| --- | --- |
| architecture-reviewer | Architecture reviews across 7 scored dimensions — structural integrity, scalability, security, performance, enterprise readiness, operations, data |
| codebase-advisor | Senior repo advisor — audits codebase evidence, vets findings, and writes self-contained implementation plans for executor agents |
| code-refiner | Deep code simplification and refactoring — structural complexity analysis, anti-pattern detection, idiomatic rewrites across Python, Go, TS, Rust |
| citation-audit | Citation verification for manuscripts — checks that references are real, correctly attributed, and accurately described |
| figure-rhetoric | Figure and plot communication audit — evaluates whether visuals support the claims they are meant to carry |
| figure-table-quality | Figure and table rendering audit — checks readability, label collisions, accessibility, formatting, and consistency |
| pr-review | Diff-based PR review across 5 dimensions — code quality, test coverage, silent failures, type design, comment quality with severity-ranked output |
| pre-landing-review | Gate-oriented safety audit with two-pass severity triage — CRITICAL (SQL, races, trust) blocks landing, INFORMATIONAL is advisory |
| plan-review | Pre-implementation plan audit stress-testing scope, assumptions, risks, and failure modes with product and engineering lenses |
| manuscript-review | Pre-publication manuscript audit with 24 diagnostic dimensions, citation hygiene, and cross-element coherence |
| manuscript-provenance | Computational provenance audit verifying every number, table, and figure in a manuscript traces back to code |
| manuscript-typography | Academic typography audit — booktabs, captions, units, references, page layout, visual hierarchy, and LaTeX polish |
| opus-4-7-migration | Repository scanner for Opus 4.7 migration issues — fixed thinking budgets, retired model aliases, and stale prompt assumptions |
| repo-sentinel | Security audit and enforcement for public repos — 12 attack surfaces, pre-release readiness, history scrubbing, CI gates |
| package-evaluator | Evaluate package quality across 6 weighted dimensions with type-specific signals — frontmatter, triggers, structure, depth, consistency, compliance |
| devils-advocate | Challenges AI-generated plans, code, designs, and decisions — pre-mortem, inversion, Socratic questioning with steel-manning and clear verdicts |
| dependency-audit | Dependency risk assessment — license compliance, maintenance health scoring, CVE detection, bloat identification, supply chain analysis |
| qa-systematic | Systematic web QA testing with 8-category health scoring, issue taxonomy, and regression tracking — full, quick, and regression modes |
| usage-audit | Claude Code setup audit for token waste and context bloat across MCP servers, CLAUDE.md, skills, and settings |
| ux-expert | UX audit and redesign for B2B SaaS dashboards — 8-dimension analysis, wireframes, component recommendations, severity-ranked findings |

Skills — Visualization & Documents

| Skill | Description |
| --- | --- |
| architecture-diagram | Layered architecture diagrams as self-contained HTML with inline SVG icons and CSS Grid layout |
| concept-to-image | Turn concepts into polished HTML visuals, export as PNG or SVG |
| concept-to-video | Turn concepts into animated explainer videos using Manim — MP4/GIF output with audio overlay, templates, multi-scene |
| remotion-video | Production motion graphics using Remotion (React) — branded content, data-driven video, audio sync, TailwindCSS |
| html-presentation | Convert documents and outlines into self-contained HTML slide presentations |
| marp-slides | Author MARP Markdown slide decks exportable to PDF, PPTX, and HTML via marp-cli |
| static-web-artifacts-builder | Self-contained interactive HTML artifacts — infographics, dashboards, diagrams |
| md-to-pdf | Markdown to styled PDF with Mermaid diagrams, KaTeX math, and syntax highlighting |

Skills — Documentation & Release

| Skill | Description |
| --- | --- |
| changelog-composer | Structured changelogs from git history — conventional commit parsing, audience filtering, breaking change detection |
| ship-workflow | Automated release pipeline — merge main, run tests, pre-landing review, version bump, changelog, bisectable commits, PR |
| engineering-retro | Git-based engineering retrospective — commit analysis, velocity metrics, session patterns, health scoring over time windows |
| adr-writer | Architecture Decision Records — context capture, alternatives analysis, consequence projection, status lifecycle |
| api-docs-generator | API documentation audit and enhancement — FastAPI docstrings, Pydantic examples, OpenAPI spec enrichment, coverage reports |
| arxiv-preflight | arXiv submission readiness audit across TeX source, PDF, figures, metadata, bibliography, and file organization |
| arxiv-figures | Optimize and convert figures for arXiv processor constraints, file formats, and size limits |
| arxiv-package | Package TeX/LaTeX projects into clean tarballs or zip archives ready for arXiv upload |

Skills — Backend & Data

| Skill | Description |
| --- | --- |
| sql-optimizer | SQL performance analysis — EXPLAIN interpretation, anti-pattern detection, index recommendations, rewrites |
| migration-risk-analyzer | Database migration risk assessment — lock analysis, downtime estimation, rollback strategies, validation |
| benchmark-runner | Structured benchmark design — metric selection, test case matrix, environment capture, statistical rigor |

Skills — Business Validation

| Skill | Description |
| --- | --- |
| idea-validator | Full business idea validation orchestrator — Lean Canvas, JTBD, parallel market/competitive/feasibility agents, SWOT/PESTLE, weighted scoring |
| market-analyzer | Market sizing and trend analysis — TAM/SAM/SOM calculation, Rogers adoption curve, data triangulation, timing assessment |
| competitive-analyzer | Competitive landscape analysis — Porter's Five Forces, feature/pricing matrices, positioning maps, moat taxonomy |
| feasibility-assessor | Financial and technical feasibility — unit economics (CAC/LTV), revenue modeling, break-even, technical risk scoring, build-vs-buy |

Skills — AI/ML & Planning

| Skill | Description |
| --- | --- |
| prompt-lab | Systematic prompt engineering — variant generation, evaluation rubrics, failure mode analysis, test suites |
| rag-auditor | RAG pipeline evaluation — retrieval metrics, generation quality, failure taxonomy, diagnostic queries |
| task-decomposer | Feature decomposition — phased task breakdown, dependency mapping, edge case enumeration, sizing |
| estimate-calibrator | Calibrated three-point estimates — PERT ranges, unknown identification, confidence intervals, bias correction |

Skills — Writing

| Skill | Description |
| --- | --- |
| humanize | Detect and remove AI-generated writing patterns — 24 lexical patterns + 12 statistical signals, 6 domain profiles, 5-phase pipeline with semantic preservation |
| linkedin-post-style | Write LinkedIn posts in a specific technical voice with visual companion support — carousels via md-to-pdf, images via concept-to-image, video via concept-to-video |

Skills — Skill Evolution (EvoSkills)

| Skill | Description |
| --- | --- |
| paper-to-skill | Convert research papers into executable skill packages via methodology extraction and co-evolutionary refinement |
| skill-distiller | Distill Opus-quality skill packages into deterministic, Haiku-executable workflows via trace-driven distillation |
| surrogate-verifier | Information-isolated verification generating structured test assertions and failure diagnostics for skills |

Research lineage: the EvoSkills pipeline (arXiv 2604.01687) handles offline co-evolutionary refinement. The immune skill together with armory's auto-memory system implements the stateful-prompt concept from Memento-Skills (arXiv 2603.18743) — the read-write reflective loop for continual learning without parameter updates.

Skills — Deprecated

Skills below are superseded by base model capabilities. They remain installable but receive no further updates.

| Skill | Reason |
| --- | --- |
| doc-condenser | Base model handles summarization natively |
| regex-builder | Base model generates regex at equivalent quality |
| sequential-thinking | Base model handles chain-of-thought natively |

Rules

| Rule | Description |
| --- | --- |
| adaptive-thinking-control | Prompt-level control for Opus 4.7 adaptive thinking and effort-level trade-offs |
| commit-standards | Conventional commit format, branch naming |
| intent-discipline | Surface assumptions, minimum-viable code, surgical diffs, verifiable success criteria |
| test-standards | Coverage thresholds, test quality requirements |
| security-standards | Secret management, input validation, auth |
| token-efficiency | Token-efficient tool usage patterns |

Commands

| Command | Description |
| --- | --- |
| tdd | Test-driven development workflow |
| security-scan | Security vulnerability audit |
| refactor | Code simplification workflow |
| evolve | Co-evolutionary skill generation |
| handoff | Refresh or scaffold .docs/handoff.md |
| route | Package discovery and task-to-package routing |
| stack-pr | Stacked PR workflow command surface |

Hooks

| Hook | Description |
| --- | --- |
| git-protection | Block dangerous git operations |
| pre-edit-backup | Backup files before edits |
| cost-tracker | Log session cost/token usage |
| anatomy-index | Maintain project file index with token estimates |
| read-dedup | Warn on duplicate file reads within a session |
| prompt-context | Inject text file as additionalContext on every prompt |
| handoff-on-stop | Refresh .docs/handoff.md on Stop when present |
| simplify-ignore | Collapse protected code regions before agent reads |
| stack-guard | Add stacked-PR-specific git safety checks |

Utilities

| Utility | Description |
| --- | --- |
| arxiv-search | Search arXiv for papers, output structured JSON metadata |
| dependency-tree | Visualize project dependency graph |
| test-coverage-report | Coverage summary for changed files |

Presets

Presets install curated bundles of packages in one command. Each bundle can mix skills, agents, rules, hooks, and commands, as the package counts in the table show. For active workflow orchestration, use agents instead.

| Preset | Packages | Description |
| --- | --- | --- |
| core | 3 skills, 1 hook, 1 rule | Baseline review-commit lifecycle. Start here. |
| sec-strict | 5 skills, 3 agents, 2 rules, 2 hooks, 1 command | Audit-grade security stack with codebase-auditor. Superset of core. |
| python-strict | 4 skills, 2 agents, 3 rules, 2 hooks, 2 commands | Full Python enforcement — TDD, type checking, test coverage, security standards. |
| ai-builder | 6 skills | AI/ML development toolkit — agent building, prompt engineering, GPU optimization, RAG auditing. |
| skill-evolution | 6 skills, 1 agent, 1 command | EvoSkills pipeline — co-evolutionary skill factory with paper-to-skill, distillation, and verification. |
| terse-mode | 1 hook | Terse output enforcement via prompt-context hook with compaction-immune rule injection. |
| session-continuity | 1 skill, 1 command, 1 hook | Atomic handoff install: /handoff, greenfield scaffold, and opt-in Stop refresh gated by .docs/handoff.md. |
| stack-workflow | 3 skills, 1 command, 2 hooks | Stacked PR workflow stack with topology management, guard hooks, and review gates. |

Deprecated Presets

The presets below are superseded by orchestrator agents, which provide autonomous workflow orchestration instead of manual skill invocation.

| Preset | Replacement |
| --- | --- |
| biz-validation | idea-scout agent |
| media-craft | media-producer agent |
| content-ops | content-strategist agent |
| research | research-analyst agent |
| eng-ops | release-captain + full-stack-builder agents |

Installation

Option 1 — Skills CLI (recommended)

Install any package directly using npx skills:

# Install all packages
npx skills add Mathews-Tom/armory

# Install a specific skill or agent
npx skills add Mathews-Tom/armory -s architecture-reviewer
npx skills add Mathews-Tom/armory -s codebase-advisor
npx skills add Mathews-Tom/armory -s codebase-auditor

# List available packages without installing
npx skills add Mathews-Tom/armory -l

Option 2 — Profile installer

git clone https://github.com/Mathews-Tom/armory.git
cd armory

# Install by profile
just install-profile core
just install-profile python-strict

# Install by type
uv run scripts/install.py --type skills
uv run scripts/install.py --type agents

# Interactive TUI
uv run scripts/install.py

The interactive TUI displays a version-aware table of all packages, detects installed versions, and lets you select which to install or upgrade. Profiles install curated bundles of packages across all types.

Option 3 — Claude Code plugin marketplace (skills, agents, commands only)

claude plugin marketplace add Mathews-Tom/armory
/plugin install armory

This uses Claude Code's native plugin system and loads a subset of armory's catalog.

| Package type | Supported via plugin marketplace |
| --- | --- |
| skills | ✅ yes |
| agents | ✅ yes |
| commands | ✅ yes |
| hooks | ❌ no — requires npx skills or the profile installer |
| rules | ❌ no — armory-specific type, not a Claude Code plugin concept |
| utilities | ❌ no — armory-specific type, not a Claude Code plugin concept |
| presets | ❌ no — use just install-profile instead |

For the full catalog across all seven package types, use Option 1 (Skills CLI) or Option 2 (profile installer).

Option 4 — Manual

Clone the repo and symlink individual package folders:

git clone https://github.com/Mathews-Tom/armory.git

# Skills
ln -s "$(pwd)/armory/skills/architecture-reviewer" ~/.claude/skills/architecture-reviewer
ln -s "$(pwd)/armory/skills/codebase-advisor" ~/.claude/skills/codebase-advisor

# Agents
ln -s "$(pwd)/armory/agents/codebase-auditor" ~/.claude/agents/codebase-auditor

Or download .skill / .agent archives from the Releases page.


Usage

Packages activate when Claude detects a matching intent. Each package defines trigger phrases in its frontmatter description — check the definition file (SKILL.md, AGENT.md, etc.) in each folder.

Example triggers:

"Run a security audit before I push this to GitHub"
-> activates: repo-sentinel (skill)

"Review this code for quality issues"
-> activates: code-reviewer (agent)

"Evaluate the quality of this package"
-> activates: package-evaluator (skill)

Commands are invoked explicitly via slash syntax:

/tdd calculate_discount    -> TDD workflow for a function
/security-scan src/        -> security vulnerability audit
/refactor src/utils.py     -> code simplification

Hooks fire automatically on Claude Code lifecycle events (PreToolUse, PostToolUse, Stop, UserPromptSubmit). Rules load as context when relevant. Presets install bundles via just install-profile.
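To make the hook mechanism concrete, here is a minimal sketch of how a PreToolUse hook in the spirit of git-protection could decide whether to block a command. It is not armory's actual implementation: the deny-list patterns and the names `is_dangerous` and `decide` are assumptions made for illustration. It relies on the Claude Code convention that a hook receives the pending tool call as JSON on stdin and blocks it by exiting with code 2.

```python
import re

# Hypothetical deny-list; the real git-protection hook may use different rules.
DANGEROUS_PATTERNS = [
    r"\bgit\s+push\b.*(--force\b|\s-f\b)",  # force pushes
    r"\bgit\s+reset\s+--hard\b",            # discards local changes
    r"\bgit\s+clean\b.*\s-\w*f",            # deletes untracked files
]


def is_dangerous(command: str) -> bool:
    """Return True if the shell command matches any deny-list pattern."""
    return any(re.search(pattern, command) for pattern in DANGEROUS_PATTERNS)


def decide(event: dict) -> int:
    """Map a PreToolUse event to an exit code: 0 allows, 2 blocks."""
    if event.get("tool_name") != "Bash":
        return 0  # only shell commands are inspected in this sketch
    command = event.get("tool_input", {}).get("command", "")
    return 2 if is_dangerous(command) else 0


# Wiring as a hook script (sketch, not executed here):
#   import json, sys
#   sys.exit(decide(json.load(sys.stdin)))
```

Keeping the decision in a pure function makes the rule set easy to test without a running Claude Code session.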


Package Quality

Every package is evaluated against 6 shared dimensions using the package-evaluator, with type-specific signals for agents, hooks, rules, commands, utilities, and presets:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Frontmatter Quality | 20% | Description length, trigger phrases, "Use when" clause |
| Trigger Coverage | 18% | Synonym breadth, implied contexts, interrogative forms |
| Structural Completeness | 20% | Workflow, error handling, output format, type-specific metadata |
| Content Depth | 22% | Decision frameworks, multi-step workflows, type-specific signals |
| Consistency & Integrity | 12% | Name matching, file references, description-body alignment |
| CONTRIBUTING Compliance | 8% | Naming conventions, length limits, YAML validity |
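The six weights sum to 100%, so one plausible way to combine them is a plain weighted average. The sketch below assumes each dimension is scored on a 0-100 scale and combined linearly. The README does not state how package-evaluator aggregates scores, so the dictionary keys and the `overall_score` function are hypothetical.

```python
# Weights copied from the dimension table (percent); they sum to 100.
WEIGHTS = {
    "frontmatter_quality": 20,
    "trigger_coverage": 18,
    "structural_completeness": 20,
    "content_depth": 22,
    "consistency_integrity": 12,
    "contributing_compliance": 8,
}


def overall_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 0-100) into a weighted average."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) / total_weight
```

Under this assumption, a package that scores 100 on Frontmatter Quality and 0 everywhere else lands at 20 overall.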

Eval Coverage

Every package has eval cases in {type}/<name>/evals/cases.yaml — positive triggers (should activate) and negative triggers (should not). Deprecated packages enforce 0 positive + 2 negative cases.
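The case-count rule can be sketched as a small check. The layout of cases.yaml is not shown here, so the `positive` and `negative` keys are a hypothetical shape for the parsed file. The requirement that active packages carry at least one case of each kind is also an assumption; only the deprecated rule (0 positive, 2 negative) comes from the text above.

```python
def check_cases(cases: dict, deprecated: bool = False) -> list[str]:
    """Return a list of problems found in a parsed eval-case mapping."""
    positive = cases.get("positive", [])
    negative = cases.get("negative", [])
    problems = []
    if deprecated:
        # Rule stated in the README: 0 positive + 2 negative cases.
        if len(positive) != 0:
            problems.append("deprecated package must have 0 positive cases")
        if len(negative) != 2:
            problems.append("deprecated package must have 2 negative cases")
    else:
        # Assumed minimums for active packages.
        if not positive:
            problems.append("expected at least one positive case")
        if not negative:
            problems.append("expected at least one negative case")
    return problems
```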

Validation:

uv run scripts/validate_evals.py    # Schema validation for all eval files
uv run scripts/generate_manifest.py # Regenerate manifest.yaml

CI pipeline (.github/workflows/evals.yml):

  • PR gate: validates manifest sync + eval schema on every pull request across all 7 type directories
  • Weekly cron: Monday runs for model drift detection

Pre-commit hook: auto-regenerates manifest.yaml when any package definition file changes.


MCP Server

An MCP server exposes armory packages as discoverable tools for any agent session. Register in your Claude Code config:

{
  "mcpServers": {
    "armory": { "command": "uv", "args": ["run", "mcp/server.py"] }
  }
}
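If the config already registers other MCP servers, the armory entry should be merged in rather than pasted over them. A minimal sketch, operating on an already-parsed config dictionary; the `register_armory` helper is hypothetical, and loading or saving the config file is left out because its location is not specified here.

```python
import copy

# Server entry copied from the JSON snippet above.
ARMORY_SERVER = {"command": "uv", "args": ["run", "mcp/server.py"]}


def register_armory(config: dict) -> dict:
    """Return a copy of config with the armory server added under mcpServers."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})["armory"] = copy.deepcopy(ARMORY_SERVER)
    return merged
```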

Available tools:

| Tool | Description |
| --- | --- |
| search_packages | Keyword search with type, category, and tag filters |
| get_package | Full metadata for a single package by name |
| recommend_packages | Context-aware recommendations by language, framework, or task |
| list_categories | All categories with package counts |

Spec Compliance

Skills are validated against the agentskills.io open standard:

uv run scripts/validate_agentskills.py           # Warnings only (default)
uv run scripts/validate_agentskills.py --strict   # Extra fields are errors

All 76 skills pass with 0 errors. The validator checks the 6-field frontmatter spec (name, description, license, compatibility, metadata, allowed-tools) and flags Claude Code-specific fields as warnings.
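The difference between default and strict mode can be illustrated with a short sketch. It only models the extra-field behaviour described above, and `check_frontmatter` is a hypothetical name, not the validator's real interface.

```python
# The six frontmatter fields named in the spec check above.
SPEC_FIELDS = {
    "name",
    "description",
    "license",
    "compatibility",
    "metadata",
    "allowed-tools",
}


def check_frontmatter(frontmatter: dict, strict: bool = False) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for fields outside the six-field spec."""
    extras = sorted(set(frontmatter) - SPEC_FIELDS)
    messages = [f"unexpected field: {name}" for name in extras]
    # Default mode reports extras as warnings; strict mode promotes them to errors.
    return (messages, []) if strict else ([], messages)
```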


Packaging

Each package can be archived for distribution. Archive type is auto-detected from the directory:

uv run scripts/package.py skills/architecture-reviewer  # produces .skill
uv run scripts/package.py agents/code-reviewer           # produces .agent
uv run scripts/package.py hooks/git-protection            # produces .hook
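Auto-detection of this kind usually comes down to reading the parent directory name. The sketch below covers only the three mappings shown above. The suffixes for the other package types are not documented here, so they raise an error rather than guess, and the `<name><suffix>` file name is an assumption.

```python
from pathlib import PurePosixPath

# Only the mappings shown in the examples above; other types are unknown here.
ARCHIVE_SUFFIX = {"skills": ".skill", "agents": ".agent", "hooks": ".hook"}


def archive_name(package_dir: str) -> str:
    """Derive an archive file name from a '<type>/<name>' package directory."""
    path = PurePosixPath(package_dir)
    if len(path.parts) < 2:
        raise ValueError(f"expected '<type>/<name>', got {package_dir!r}")
    type_dir = path.parts[-2]
    if type_dir not in ARCHIVE_SUFFIX:
        raise ValueError(f"no known archive suffix for type {type_dir!r}")
    return path.name + ARCHIVE_SUFFIX[type_dir]
```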

Cross-Platform Adapters

Packages are authored as Claude Code-native definitions. The adapter generator transforms them into platform-specific formats for Cursor, OpenAI Codex, and Gemini CLI.

Generate

# All platforms
uv run scripts/generate_adapters.py

# Single platform
uv run scripts/generate_adapters.py --platform cursor
uv run scripts/generate_adapters.py --platform codex
uv run scripts/generate_adapters.py --platform gemini

# Filter by package type
uv run scripts/generate_adapters.py --platform cursor --type skills --type rules

# Preview without writing
uv run scripts/generate_adapters.py --dry-run

Output lands in adapters/{platform}/ (gitignored — generated, not source).

Platform Mapping

| Armory Type | Cursor | Codex | Gemini |
| --- | --- | --- | --- |
| Skills | .cursor/rules/{name}.mdc | skills/AGENTS.md | .gemini/skills/{name}/SKILL.md |
| Agents | .cursor/rules/{name}.mdc | agents/AGENTS.md | .gemini/agents/{name}.md |
| Rules | .cursor/rules/{name}.mdc (alwaysApply) | standards/AGENTS.md | Sections in GEMINI.md |
| Commands | .cursor/commands/{name}.md | workflows/AGENTS.md | .gemini/commands/workflow/{name}.toml |
| Hooks | | | |
| Utilities | | | Wrapped as .gemini/skills/ |
| Presets | | | |

Quick Install (no Python required)

Download pre-built adapter packages from the latest release:

# Cursor
npx @anthropic-armory/installer --target cursor

# Codex
npx @anthropic-armory/installer --target codex

# Gemini
npx @anthropic-armory/installer --target gemini --dir /path/to/project

Or download directly from GitHub Releases:

# Cursor — extract .cursor/ into your project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-cursor.tar.gz | tar -xz

# Codex — extract AGENTS.md + subdirectories into project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-codex.tar.gz | tar -xz

# Gemini — extract .gemini/ into your project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-gemini.tar.gz | tar -xz

Install via Python (with TUI)

The Python installer supports all targets with --target:

uv run scripts/install.py --target cursor --project-dir /path/to/project
uv run scripts/install.py --target codex --project-dir /path/to/project
uv run scripts/install.py --target gemini --project-dir /path/to/project

Generate Locally

Generate adapter output from source (requires Python 3.12):

uv run scripts/generate_adapters.py --platform cursor
uv run scripts/generate_adapters.py --platform codex
uv run scripts/generate_adapters.py --platform gemini

Output lands in adapters/{platform}/ (gitignored — generated, not source).

Platform Details

Cursor: Rules with alwaysApply: true (project standards) load on every prompt. Skills and agents load when Cursor matches the description or glob pattern.

Codex: The root AGENTS.md is a condensed index under the 32 KiB budget. Full content is in subdirectory AGENTS.md files, loaded via Codex's hierarchical discovery.

Gemini: Skills are a near 1:1 copy (references, scripts, and assets included). Rules become sections in GEMINI.md. Commands are converted to TOML format.
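For the Codex note above, the budget check is simple but has to count bytes rather than characters. A minimal sketch, assuming the limit applies to the UTF-8 size of the root AGENTS.md and that exactly 32 KiB still fits; `within_budget` is a hypothetical helper.

```python
CODEX_INDEX_BUDGET = 32 * 1024  # 32 KiB, as stated in the Codex note


def within_budget(text: str, budget: int = CODEX_INDEX_BUDGET) -> bool:
    """Return True if the UTF-8 encoding of text fits within the byte budget."""
    return len(text.encode("utf-8")) <= budget
```

Non-ASCII characters take more than one byte each, so a file can exceed the budget with fewer than 32,768 characters.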

What's Lost

Not all package types have equivalents on every platform:

  • Hooks have no equivalent on Cursor or Codex. Gemini has hooks but uses a different event model.
  • Presets require a dependency resolver that no target platform provides.
  • Utilities with executable scripts are skipped on Cursor and Codex (passive context only). Gemini wraps them as skills.

Contributing

See CONTRIBUTING.md for guidelines on submitting new packages or improving existing ones.

Looking for something to build? Check WANTED.md for missing skill domains, requested agents, and infrastructure improvements.


Contributors

See CONTRIBUTORS.md for the full list.


Attributions

See ATTRIBUTIONS.md for the full list of upstream libraries, tools, and projects that armory packages wrap, depend on, or were inspired by.


License

MIT. See LICENSE for details.


Migrated from praxis-skills. If you had skills installed from the previous repo, re-run the installer to update paths. Existing skills continue to work — the content is unchanged.