
April 27, 2026

AgentLint

The linter for your agent harness.

ESLint was for the code humans wrote.
AgentLint is for the context agents read.


🌐 Site · Blog · Install · Demo · Harness 101 · Checks · Evidence · FAQ · 中文


Agent = Model + Harness. The model isn't the bottleneck anymore — the harness is.

Your AGENTS.md, CLAUDE.md, CI config, hooks, and .gitignore are the harness. When they're wrong, Claude Code, Cursor, and Codex ship AI slop. When they're right, agents compound.

AgentLint scores your harness across 51 deterministic checks on 6 core dimensions, plus 7 opt-in extended checks (Deep + Session) that use AI sub-agents and local Claude Code session logs when available. Evidence-backed. Zero opinions.

📚 Full docs, 20+ long-form guides, and the complete check catalog live at agentlint.app. Highlights: Writing a Good CLAUDE.md · The 33-Check Catalog · AGENTS.md vs CLAUDE.md · 中文博客.

Install

npm install -g agentlint-ai           # CLI only — no Claude plugin yet
npx agentlint-ai install              # opt-in: register /al Claude Code plugin

The first command installs the agentlint CLI on $PATH and does not touch ~/.claude/. The second command (one-time, opt-in) detects Claude Code, copies the /al slash command into ~/.claude/commands/, and registers the marketplace plugin. Side-effect details and uninstall path in INSTALL.md.

Then in any git repo:

agentlint check

In Claude Code (after running npx agentlint-ai install): run /al for the interactive scan-fix-report flow.

Using an AI coding agent? Point it at INSTALL.md — it's written to be read once and acted on.

What you get

$ /al

AgentLint — Score: 72/100 (core)

Findability      ██████████████░░░░░░  7/10
Instructions     ████████████████░░░░  8/10
Workability      ████████████░░░░░░░░  6/10
Safety           ██████████░░░░░░░░░░  5/10
Continuity       ██████████████░░░░░░  7/10
Harness          ████████████████████  10/10
Deep             ░░░░░░░░░░░░░░░░░░░░  n/a   (opt-in)
Session          ░░░░░░░░░░░░░░░░░░░░  n/a   (opt-in)

Fix Plan (7 items):
  [guided]   Pin 8 GitHub Actions to SHA (supply chain risk)
  [guided]   Add .env to .gitignore (AI exposes secrets)
  [assisted] Generate HANDOFF.md
  [guided]   Reduce IMPORTANT keywords (7 found, Anthropic uses 4)

Select items → AgentLint fixes → re-scores → saves HTML report

The harness problem

In February 2026, Mitchell Hashimoto (HashiCorp) coined the term "agent harness." OpenAI's Ryan Lopopolo formalized it days later. LangChain's Vivek Trivedy gave it the cleanest definition:

Agent = Model + Harness. If you're not the model, you're the harness.

The harness is every piece of code, configuration, and instruction that wraps an LLM and turns it into an agent. For coding agents, your harness includes:

  • AGENTS.md / CLAUDE.md — the persistent rules injected at session start
  • .cursor/rules/, .github/copilot-instructions.md — tool-specific instruction layers
  • CI, pre-commit hooks, .gitignore — the deterministic constraints the agent can't override
  • SECURITY.md, changelogs, handoff notes — the context that survives across sessions

Harness engineering is the discipline of designing those pieces so the agent stays reliable across hundreds of tool calls, not just the first ten.

The research is blunt:

  • Anthropic's 2026 Agentic Coding Trends Report found that teams maintaining a good context file report 40% fewer "bad suggestion" sessions
  • DORA 2025 State of AI-Assisted Software Development reached the same conclusion: AI is an amplifier — it accelerates teams with good harnesses and amplifies dysfunction in teams without them
  • An ETH Zurich study found that auto-generated context files actually reduce agent success rates in 5 of 8 tested settings, and increase inference cost by 20–23%
  • A randomized controlled trial found developers using AI were 19% slower on complex tasks — while believing they were 20% faster
  • LangChain's February 2026 report: 70% of agent performance lives outside the model. Same weights, different harness, different results.

Translation: a bad harness is worse than no harness. And almost nobody knows what a good one looks like.

AgentLint is the first linter for the harness itself.

What makes AgentLint different

Every check is backed by data, not opinions. The data comes from places most developers never look — and it's what lets us measure harness health rigorously:

  • 265 versions of Anthropic's own Claude Code system prompt — we tracked every single word they added, deleted, and rewrote. When they cut IMPORTANT from 12 uses to 4, we knew. When they removed every "You are a helpful assistant..." identity section, we knew.
  • Claude Code source code — which is where the harness hard limits live. 40,000-character entry files get silently truncated. 256 KB files can't be read at all. Pre-commit hooks that take too long cause commits to hang forever because Claude Code never uses --no-verify.
  • Real production audits across open-source codebases — the security gaps that agents walk straight into.
  • 6 academic papers on instruction compliance, context-file effectiveness, and documentation decay.

If a check can't cite a source, it doesn't ship.

What it checks

58 checks total: 51 deterministic core checks across 6 dimensions (always run), plus 7 opt-in extended checks (Deep: 3 AI-powered analysis checks; Session: 4 Claude Code log-reading checks). Default agentlint check and the GitHub Action only run the 51 core checks — the extended ones need AI sub-agents or local Claude Code session logs, so they're opt-in via /al inside Claude Code.

The total score is averaged only over dimensions that actually ran. A default CI run shows Score: NN/100 (core) and marks Deep/Session as n/a, never as 0/10. When extended checks do run, the header shows (core+extended).
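
The renormalization is simple: drop the weights of dimensions that didn't run and rescale the rest. A sketch, assuming the percentage weights shown in the dimension headings of this README (the real implementation may differ):

```javascript
// Sketch of total-score renormalization over dimensions that actually ran.
// Weights mirror the per-dimension percentages in this README; hypothetical,
// not AgentLint's actual code.
const WEIGHTS = {
  findability: 0.20, instructions: 0.25, workability: 0.18,
  continuity: 0.12, safety: 0.15, harness: 0.10,
};

function totalScore(dimScores) { // dimScores: { dim: 0..10 } for dims that ran
  let weighted = 0, weightSum = 0;
  for (const [dim, w] of Object.entries(WEIGHTS)) {
    if (dimScores[dim] != null) {   // n/a dimensions drop out entirely
      weighted += dimScores[dim] * w;
      weightSum += w;
    }
  }
  return Math.round((weighted / weightSum) * 10); // scale 0..10 to 0..100
}
```

The key property: a dimension that didn't run never drags the score toward zero, because its weight is removed from the denominator too.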

🔍 Findability — can AI find what it needs? (20%)

| Check | What | Why |
|---|---|---|
| F1 | Entry file exists | No CLAUDE.md / AGENTS.md = AI starts blind |
| F2 | Project description in first 10 lines | AI needs context before rules |
| F3 | Conditional loading guidance | "If working on X, read Y" prevents context bloat |
| F4 | Large directories have INDEX | >10 files without index = AI reads everything |
| F5 | All references resolve | Broken links waste tokens on dead-end reads |
| F6 | Standard file naming | README.md, CLAUDE.md are auto-discovered |
| F7 | @include directives resolve | Missing targets are silently ignored — you think it's loaded, it isn't |
| F8 | Rule file frontmatter uses globs | .cursor/rules/ MDC files should match glob patterns, not exact paths |
| F9 | No unfilled template placeholders | {{variables}} left in context files waste tokens and confuse the model |
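
For F3, the shape AgentLint looks for is conditional pointers instead of inlining everything. A hypothetical CLAUDE.md fragment (the file names are invented for illustration):

```markdown
## Where to look

- If working on the API, read docs/api/INDEX.md first.
- If touching the build pipeline, read ci/README.md.
- Do not read vendor/ — it is generated code.
```

The agent loads detail only when the task matches, which keeps the always-injected entry file small.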

📝 Instructions — are your rules well-written? (25% — highest weight)

| Check | What | Why |
|---|---|---|
| I1 | Emphasis keyword count | Anthropic cut IMPORTANT from 12 to 4 across 265 versions |
| I2 | Keyword density | More emphasis = less compliance. Anthropic: 7.5 → 1.4 per 1K words |
| I3 | Rule specificity | "Don't X. Instead Y. Because Z." — Anthropic's golden formula |
| I4 | Action-oriented headings | Anthropic deleted all "You are a..." identity sections |
| I5 | No identity language | "Follow conventions" removed — model already does this |
| I6 | Entry file length | 60–120 lines is the sweet spot. Longer dilutes priority |
| I7 | Under 40,000 characters | Claude Code hard limit. Above this, your file is truncated — silently |
| I8 | Total injected content within budget | All auto-injected files stay within the 200K context budget |
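
Checks like I1/I2 are plain counting. A sketch of an emphasis-density measure, where the keyword list and the per-1K-words framing are illustrative (AgentLint's actual list may differ):

```javascript
// Sketch of an emphasis-density check (I1/I2 style).
// Keyword list is hypothetical; AgentLint's real list may differ.
const EMPHASIS = /\b(IMPORTANT|NEVER|ALWAYS|MUST|CRITICAL)\b/g;

function emphasisDensity(text) {
  const hits = (text.match(EMPHASIS) || []).length;
  const words = (text.match(/\S+/g) || []).length;
  return { hits, per1kWords: words ? (hits / words) * 1000 : 0 };
}
```

A density check rather than a raw count matters because a 2,000-word file can absorb more emphasis than a 100-word one before the keywords stop meaning anything.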

🔨 Workability — can AI build and test? (18%)

| Check | What | Why |
|---|---|---|
| W1 | Build/test commands documented | AI can't guess your test runner |
| W2 | CI exists | Rules without enforcement are suggestions |
| W3 | Tests exist (not empty shell) | A CI that runs pytest with 0 test files always "passes" |
| W4 | Linter configured | Mechanical formatting frees AI from guessing style |
| W5 | No files over 256 KB | Claude Code cannot read them — hard error |
| W6 | Pre-commit hooks are fast | Claude Code never uses --no-verify. Slow hooks = stuck commits |
| W7 | Local fast test command documented | Entry file documents a fast (<30s) test command for mid-session verification |
| W8 | npm test script exists | JS/Node repos need npm test so AI can run tests without guessing |
| W9 | Release workflow validates version consistency | Automated drift detection across package.json, CHANGELOG, and badges |
| W10 | Test cost tiers defined (pytest markers) | @pytest.mark.fast lets AI run the cheap subset, not the full 10-minute suite |
| W11 | feat/fix commits paired with test commits | Gate that catches features landing without corresponding tests |
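
W1 and W7 pass when the entry file spells out exact commands, including a cheap subset the agent can run mid-session. A hypothetical CLAUDE.md fragment (the commands are placeholders for your repo's real ones):

```markdown
## Build & test

- Build: npm run build
- Full suite: npm test (about 4 min, CI only)
- Fast check: npm test -- --grep smoke (under 30s, run after every change)
```

Without the fast tier, the agent either skips verification or burns minutes per edit on the full suite.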

🔄 Continuity — can the next session pick up? (12%)

| Check | What | Why |
|---|---|---|
| C1 | Document freshness | Stale instructions are worse than no instructions |
| C2 | Handoff file exists | Without it, every session starts from zero |
| C3 | Changelog has "why" | "Updated INDEX" says nothing. "Fixed broken path" says everything |
| C4 | Plans in repo | Plans in Jira don't exist for AI |
| C5 | CLAUDE.local.md not in git | Private per-user file — must be in .gitignore |
| C6 | HANDOFF.md has verify conditions | Notes with evidence (score ≥ X, tests pass) let the next session skip full re-audit |
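
What C6 wants is a handoff the next session can verify instead of trusting. A hypothetical HANDOFF.md fragment (the task, counts, and flag name are invented):

```markdown
# Handoff — 2026-04-27

Done: migrated auth middleware to the new session store.

Verify before building on this:
- npm test passes (last run: 212/212)
- agentlint check score ≥ 70

Next: remove the legacy store behind the SESSIONS_V2 flag.
```

The verify lines are the point: if they hold, the next session can skip a full re-audit; if they don't, it knows exactly where to start.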

🔒 Safety — is AI working securely? (15%)

| Check | What | Why |
|---|---|---|
| S1 | .env in .gitignore | AI's Glob tool ignores .gitignore by default — secrets visible |
| S2 | Actions SHA pinned | AI push triggers CI. Floating tags = supply chain attack vector |
| S3 | Secret scanning configured | AI won't self-check for accidentally written API keys |
| S4 | SECURITY.md exists | AI needs security context for sensitive code decisions |
| S5 | Workflow permissions minimized | AI-triggered workflows shouldn't have write access by default |
| S6 | No hardcoded secrets | Detects sk-, ghp_, AKIA, private key patterns in source |
| S7 | No personal paths in source | Absolute home-dir paths leak machine identity and break on other machines |
| S8 | No pull_request_target trigger | Runs in privileged context — supply chain attack vector for external PRs |
| S9 | No personal email in git history | Personal email in commits is a privacy and identity leak |
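
For S2 and S5 the fix is mechanical: pin actions to a full commit SHA and declare least-privilege permissions. A workflow sketch (the SHA below is a placeholder, not a real release commit):

```yaml
# Sketch: SHA-pinned action + least-privilege permissions.
permissions:
  contents: read   # S5: read-only by default; grant write per-job if needed

steps:
  # S2: pin to a full commit SHA, not a floating tag like @v4
  - uses: actions/checkout@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0  # v4 (placeholder SHA)
```

A floating tag can be repointed by whoever controls the action; a SHA cannot.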

⚙️ Harness — is your Claude Code setup correct? (10%)

| Check | What | Why |
|---|---|---|
| H1 | Hook event names valid | PoToolUse vs PostToolUse — typos silently prevent hooks from ever firing |
| H2 | PreToolUse hooks have matcher | Without a tool matcher, the hook runs before every tool call |
| H3 | Stop hook has circuit breaker | Stop hooks without an exit condition run forever |
| H4 | No dangerous auto-approve | * or .* grant unlimited tool execution with no human check |
| H5 | Env deny coverage complete | Missing deny patterns let secrets leak to untrusted tools |
| H6 | Hook scripts network access | Outbound calls from hooks can exfiltrate data triggered by the agent |
| H7 | Gate workflows are blocking | Warn-only CI gates are effectively disabled — agents merge despite failures |
| H8 | Hook errors use structured format | what/rule/fix lets the agent self-correct; unstructured errors leave it stuck |
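
A hedged sketch of what H1/H2 look for in a Claude Code settings file: a valid event name and an explicit tool matcher, so the hook fires only on the tools it targets (the script path is hypothetical):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/lint-changed.sh" }
        ]
      }
    ]
  }
}
```

Misspell the event name and nothing ever fires; omit the matcher on a PreToolUse hook and it runs before every single tool call.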

🧠 Deep — AI-powered instruction analysis (opt-in, extended)

Spawns AI subagents to find what pattern-matching can't:

CheckWhatWhy
D1Contradictory rulesTwo rules that conflict cause the model to pick one — usually the wrong one
D2Dead-weight rulesRules the model would follow anyway waste tokens and dilute priority
D3Vague rules without decision boundary"Use good judgment" gives the model nothing to evaluate against

📊 Session — learn from your Claude Code logs (opt-in, extended)

Reads your session history to surface patterns you'd never notice manually:

CheckWhatWhy
SS1Repeated instructionsInstructions you type every session belong in CLAUDE.md
SS2Ignored rulesRules AI keeps bypassing need rewriting, not repeating
SS3Friction hotspotsWhich projects and tasks generate the most re-work
SS4Missing rule suggestionsCommon corrections that aren't captured anywhere yet

How is this different from /init?

/init generates a template CLAUDE.md from scratch. Useful on day one. Useless on day fifty — when the file is stale, bloated with emphasis keywords the model ignores, missing .env in .gitignore, and silently over the 40K hard limit.

/init writes a file. AgentLint audits the whole system:

| | /init | AgentLint |
|---|---|---|
| Generates template CLAUDE.md | ✅ | ❌ |
| Checks entry-file quality | ❌ | ✅ |
| Finds broken @include references | ❌ | ✅ |
| Enforces the 40K character hard limit | ❌ | ✅ |
| Audits CI, hooks, .gitignore, Actions SHA pinning | ❌ | ✅ |
| Detects instruction rot over time | ❌ | ✅ |
| Audits Claude Code hook configuration | ❌ | ✅ |
| Auto-fixes what it can | ❌ | ✅ |
| Every check backed by a cited data source | ❌ | ✅ |

Who this is for

  • Solo developers using Claude Code, Cursor, or Codex who want the agent to stop ignoring your rules
  • Team leads who need every repo in the org to be AI-ready before agents ship to prod
  • OSS maintainers whose external contributors (and their agents) should write code in your style
  • Security-conscious engineers worried about agents exfiltrating .env or triggering vulnerable workflows

Compatibility

AgentLint ships as a Claude Code plugin and standalone CLI. When it runs, it audits any of the following if present in your repo:

  • CLAUDE.md (Anthropic's Claude Code)
  • AGENTS.md (the universal standard — used by OpenAI Codex, Cursor, Windsurf, Kilo, GitHub Copilot, Gemini CLI, and 60,000+ open-source repos)
  • .cursor/rules/
  • .github/copilot-instructions.md

Roadmap: native Cursor and Codex integrations. Star the repo to follow.

Update

npm install -g agentlint-ai

Or update the Claude Code plugin directly:

claude plugin update agent-lint@agent-lint

Evidence

Every check cites its source. No opinions, no best practices — data.

| Source | Type |
|---|---|
| Anthropic's 265 prompt versions | Primary dataset |
| Claude Code source code | Hard limits and internal behavior |
| IFScale (NeurIPS) | Instruction compliance at scale |
| ETH Zurich | Do context files help coding agents? |
| Codified Context | Stale content as #1 failure mode |
| Agent READMEs | Concrete vs abstract effectiveness |

Full citations in standards/evidence.json.

FAQ

What exactly is an "agent harness"?

The term got popular in early 2026 (Mitchell Hashimoto, OpenAI, LangChain). Shortest definition: Agent = Model + Harness. The harness is everything that wraps an LLM and turns it into an agent — tools, state management, feedback loops, and the persistent rules it reads at session start. For coding agents, that last part is your AGENTS.md, CLAUDE.md, .cursor/rules, CI, pre-commit hooks, and .gitignore. AgentLint is the first linter built specifically to audit that layer.

Why not just use /init and call it a day?

See the table above. /init writes a file; it doesn't audit your repo. AgentLint runs 51 deterministic checks across 6 core dimensions (plus 7 opt-in extended checks) — and fixes what it finds.

Does this work with Cursor, Codex, or GitHub Copilot?

Today AgentLint runs inside Claude Code, but the checks apply to repo assets every agent reads: AGENTS.md, .cursor/rules, .github/copilot-instructions.md. A well-linted repo makes every agent better, not just Claude. Native Cursor and Codex integrations are on the roadmap.

Is my code sent anywhere?

It depends on which mode you run. The default (agentlint check and the GitHub Action) is local-only and runs zero AI. The two opt-in extended modes do touch AI or local session logs — we spell it out so there's no surprise:

| Mode | Data accessed | Network / AI |
|---|---|---|
| agentlint check (default) | Files in the repo being scanned | Local only, no AI |
| GitHub Action | Files in the checked-out repo inside the runner | Local only, no AI |
| /al (core dimensions only) | Git repos under the configured PROJECTS_ROOT | Local only, no AI |
| /al with Deep (opt-in) | Selected entry files (e.g. CLAUDE.md) | Sends file contents to a Claude sub-agent |
| /al with Session (opt-in) | ~/.claude/projects/ logs on your machine | Local analyzer. Output is redacted by default; raw snippets require --include-raw-snippets |

Deep is the only mode that transmits file contents off your machine, and it only runs when you explicitly ask for it inside Claude Code. Everything the default scan produces — the Score: NN/100 (core) output, the JSONL, the SARIF, the GitHub Action annotations — comes from pattern checks on disk, no API calls.

Does npm install write outside node_modules?

No. npm install -g agentlint-ai only installs the agentlint CLI to npm's global prefix (just like any other CLI tool). The Claude Code plugin install is opt-in: run npx agentlint-ai install (one-time) to detect Claude Code and register the /al slash command in ~/.claude/commands/. The CLI works without that step; the /al slash command does not.

Failure-mode fallbacks live in INSTALL.md.

Isn't this just "best practices"?

No. Every check cites a specific source — Anthropic's 265 prompt versions, Claude Code source code, peer-reviewed papers, or real production audits. If a check can't be backed by data, it doesn't ship.

Why do you lint AGENTS.md if this is a Claude Code plugin?

Because good context engineering is cross-tool. If you're using any combination of Claude Code, Cursor, and Codex, the same AGENTS.md serves all of them. AgentLint checks it against the same evidence base regardless of which agent ends up reading it.

How long does a scan take?

Under 5 seconds for most repos. The Deep and Session dimensions take longer because they spawn subagents or read session logs.

Requirements

  • Node 20+
  • jq
  • Claude Code (for /al plugin and Deep/Session analysis)

Contributing

Issues and PRs welcome. See CONTRIBUTING.md.

License

MIT


If AgentLint saved you from one bad agent session, please ⭐ star the repo — it's how we find out it's useful.

Built by @0xmariowu · agentlint.app