Context Development Life Cycle

July 2, 2026

TL;DR: AgentOps is an SDLC control plane for agentic software development. Its internal mechanism is the Context Development Life Cycle (CDLC): every phase of software delivery has a context counterpart, and every high-value context token follows the Context Density Rule: carry intent, boundary, evidence, decision, constraint, or next action.

Software engineering took 50 years to build the discipline that lets nondeterministic teams produce shippable software. AgentOps keeps the public category language people already understand (SDLC, DevOps, CI/CD, tests, review, release gates) and applies that same shape to context.

Packets, briefings, skills, verdicts, and learnings are artifacts. The deeper product is the practice layer: BDD/Gherkin, DDD, hexagonal architecture, TDD, CI/CD, SRE, ADRs, wikis, Agile/XP, and pragmatic engineering encoded into the runtime structure agents work inside.

The translation is direct. Each piece of the software-engineering stack has a coding-agent counterpart:

Software Engineering	Coding-Agent World
Source code	Context (corpus, planning rules, learnings)
SDLC	Context Development Life Cycle
Libraries (Maven, npm, crates.io)	Context libraries (the `.agents/` corpus)
Compilers	Context compilers (`ao compile` → wiki)
Code review	Multi-model councils
CI/CD	Validation gates (`/vibe`, `/pre-mortem`)
Postmortems	Automated postmortems (`/post-mortem` → learnings)
Runbooks	Skills + planning rules
Software factories	The in-session loop (`ao rpi`, `/evolve`), run out of session on an orchestration substrate (reference: NTM + MCP + managed-agents)
Markdown / Git / Linux (open primitives)	LLM Wiki of Markdown
Open-source corpus	Your private corpus (`.agents/` in your repo)

We call the internal lifecycle the Context Development Life Cycle (CDLC). You do not need to know that name to understand the product: AgentOps is the SDLC control plane, and CDLC is how it compiles context for agent work.

Companion docs

Operating loop — the operational discipline that runs inside these phases: BDD intent → vertical slices → conflict-free wave → bead acceptance → evidence. The CDLC describes the seven phases of context engineering; the operating loop describes how an agent actually executes work through them.
Wiki for agents — what .agents/ actually is and why agents can read it natively
Trust factory — how the validation gates and councils make agent output trustworthy

The Parallel

In 2009, DevOps asked: what if ops looked more like dev? The answer was CI/CD, infrastructure as code, and the SDLC infinity loop — Plan, Code, Build, Test, Release, Deploy, Operate, Monitor.

Inside an agentic SDLC, the CDLC asks the same question about context: what if the instructions, knowledge, and constraints we feed to coding agents were engineered with the same rigor as the code they produce?

The answer is the same shape. Different substrate.

     SDLC (code)                    CDLC (context)
    ┌──────────┐                   ┌──────────┐
    │   Plan   │                   │ Generate │
    │   Code   │                   │ Compile  │
    │  Build   │                   │   Test   │
    │   Test   │                   │Distribute│
    │ Release  │                   │ Deliver  │
    │  Deploy  │                   │ Observe  │
    │ Operate  │                   │  Adapt   │
    │ Monitor  │                   │          │
    └──────────┘                   └──────────┘
         ↕                              ↕
    infinity loop                  infinity loop

The SDLC produces deployable artifacts. The CDLC produces injectable context for the agents doing that work. Both compound through feedback loops. Both degrade without discipline.

The Narrow Waist

The CDLC has a narrow waist because LLM agents do not have infinite context:

historical software-engineering practice
        ↓
agent-context-limited constraint
        ↓
small verifiable slices
        ↓
dense intent + executable evidence
        ↓
less rediscovery, less drift, less hallucinated done

Four practices carry the highest density:

Practice	CDLC role
BDD / Gherkin	States what behavior matters in observable terms
DDD	Gives humans and agents shared names, aggregates, and bounded contexts
Hexagonal architecture	Keeps tools, model runtimes, and vendor adapters outside the core loop
TDD	Gives the agent an executable local done condition

Everything else plugs into that waist. CI/CD runs the proof repeatedly. SRE/DORA measures health. ADRs and provenance explain why decisions happened. Wikis and ratchets keep knowledge durable. Agile/XP keeps work in small vertical increments. Pragmatic engineering keeps the slice evidence-bearing and reversible.

The density invariant has a domain name: Context Density Rule. The domain entry lives at skills/domain/references/context-density-rule.md.

That is why waterfall is the wrong shape here. It spends context on large speculative artifacts before proof exists. The CDLC prefers an atomic process: one behavior, one bounded context, one first failing test, one write scope, one acceptance proof, and one learning only when it changes future behavior.

The Seven Phases

1. Generate

Create the context that agents will consume. Prompts, skills, instructions, specifications.

SDLC parallel	Plan + Code
What it means	Author skills, write agent.md instructions, pull documentation, create specs
Why it matters	Context that isn't created doesn't exist. Agents start from zero without it.

AgentOps implementation:

/research — investigate before writing context
/plan — decompose goals into structured implementation specs
SKILL.md authoring — reusable context packages with triggers, steps, and output contracts
ao context assemble — request skill- or phase-scoped context explicitly
MCP integrations — pull context from GitLab, GitHub, Slack, tickets

The generation phase is where most teams stop. They write a Claude.md, maybe a few rules, and call it done. CDLC says generation is one-seventh of the work.

2. Compile

Assemble raw context into phase-appropriate, role-scoped, freshness-weighted packets.

SDLC parallel	Build
What it means	Select, rank, trim, and package context for the current task
Why it matters	Raw context is too large, too stale, or too broad. Compilation makes it precise.

AgentOps implementation:

ao context assemble — build phase-scoped context packets
ao lookup — retrieve decay-ranked learnings on demand
ao inject — deprecated compatibility adapter for legacy retrieval paths
ao compile — rebuild the derived knowledge wiki (Mine → Grow → Defrag → Lint)
ao maturity --expire/--evict — remove stale context before it pollutes the window
Finding compiler — distill raw findings into prevention rules

This is the phase that separates a context compiler from a prompt builder. A prompt builder concatenates. A compiler selects, ranks, trims, and delivers the minimum viable context for the current phase.
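The difference between concatenating and compiling can be sketched in a few lines. This is a minimal illustration, not the AgentOps implementation: the `ContextItem` record, the half-life decay in `freshness_weight`, and the sample corpus are all assumptions made for the example, and token counts are supplied rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """Hypothetical unit of context; not an AgentOps type."""
    name: str
    phase: str        # phase the item is scoped to, e.g. "plan"
    utility: float    # how useful the item has proven, 0.0 to 1.0
    age_days: float   # time since the item was last confirmed
    tokens: int       # pre-computed token cost


def freshness_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def compile_packet(items, phase, token_budget):
    """Select by phase, rank by decayed utility, trim to the budget."""
    selected = [i for i in items if i.phase == phase]
    ranked = sorted(
        selected,
        key=lambda i: i.utility * freshness_weight(i.age_days),
        reverse=True,
    )
    packet, used = [], 0
    for item in ranked:
        # Trim: skip anything that would overflow the token budget.
        if used + item.tokens <= token_budget:
            packet.append(item)
            used += item.tokens
    return packet


# Made-up corpus for illustration.
corpus = [
    ContextItem("auth-boundary-rule", "plan", 0.9, 10, 400),
    ContextItem("old-retry-learning", "plan", 0.9, 120, 300),
    ContextItem("slice-sizing-rule", "plan", 0.6, 5, 500),
    ContextItem("release-checklist", "release", 0.8, 1, 200),
]
packet = compile_packet(corpus, phase="plan", token_budget=1000)
packet_names = [item.name for item in packet]
```

A concatenating prompt builder would send all four items. Here the release checklist is dropped by phase scoping, and the stale retry learning ranks last and no longer fits the budget, so only the two fresh planning rules reach the window.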

3. Test

Validate that context produces the intended agent behavior.

SDLC parallel	Test
What it means	Run evals on context: does SKILL.md X produce behavior Y?
Why it matters	You change two lines in your Claude.md. Do you know the impact?

AgentOps implementation:

/pre-mortem — validate plans before implementation (LLM-as-judge)
/vibe — validate code after implementation (multi-model consensus)
/council — multi-judge adversarial review
ao eval run — deterministic eval suites with scoring dimensions
context_comprehension dimension — structural quality assessment of SKILL.md files
Baseline A/B — skill-on vs skill-off delta measurement

Testing context is fundamentally different from testing code. Even when the eval suite itself is deterministic, the agent behavior it measures is not, so a single run proves little. You run each eval five times and measure the pass rate. Error budgets replace binary pass/fail. This is the hardest phase to get right, and the one most teams skip entirely.
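As a rough sketch of what pass rates and error budgets look like in practice, the snippet below scores five repeated runs and computes a skill-on vs skill-off delta. The trial outcomes are made-up data, and `pass_rate`, `within_error_budget`, and the 25% budget are hypothetical names and values chosen for the example, not part of `ao eval run`.

```python
def pass_rate(results):
    """Fraction of trials that passed; `results` is a list of booleans."""
    return sum(results) / len(results)


def within_error_budget(results, budget):
    """True when the observed failure rate does not exceed the budget."""
    return (1.0 - pass_rate(results)) <= budget


# Illustrative outcomes from five runs of the same eval (made-up data).
skill_on = [True, True, False, True, True]
skill_off = [True, False, False, True, False]

on_rate = pass_rate(skill_on)      # 4 of 5 runs passed
off_rate = pass_rate(skill_off)    # 2 of 5 runs passed
delta = on_rate - off_rate         # lift attributable to loading the skill

# Gate on the failure rate instead of demanding five out of five.
gate_ok = within_error_budget(skill_on, budget=0.25)
```

One failure in five runs stays inside the assumed budget, so the gate passes; the same budget rejects the skill-off baseline. Five runs is a small sample, so a delta measured this way is a signal to investigate, not a precise estimate.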

4. Distribute

Package and share context across projects, teams, and runtimes.

SDLC parallel	Release
What it means	Version context, resolve dependencies, publish to registries
Why it matters	Context that lives in one person's head (or one repo's Claude.md) doesn't scale.

AgentOps implementation:

Skills registry — 170+ skills as distributable context packages
/converter — export skills to Cursor rules, Codex format, OpenCode config
ao compile — package the knowledge wiki for distribution
Cross-runtime compatibility — same skills target Claude Code, Codex CLI, Cursor, and OpenCode
install.sh — one-line installation of the full context package

Distribution is where context becomes an organizational asset. One team fixes a testing pattern, packages it as a skill, and every other team gets the fix on next install.

5. Deliver

Inject the right context into the right session at the right time.

SDLC parallel	Deploy
What it means	Load context into the agent's window at session start
Why it matters	A compiled context packet is worthless if it doesn't reach the agent.

AgentOps implementation:

Explicit context packets — deliver the assembled phase context to the agent
Optional SessionStart hooks — runtime adapter profile, not the default path
ao lookup — on-demand knowledge search during a session
SkillLoadEvent — track which skills were loaded (citation pipeline)
Phase-scoped delivery — /research gets different context than /implement

Delivery is the moment where compilation meets the session. Right context, right window, right time. Phase-specific. Role-scoped. Freshness-weighted.

6. Observe

Monitor whether delivered context produces good outcomes.

SDLC parallel	Operate + Monitor
What it means	Track agent behavior, capture correction signals, measure session outcomes
Why it matters	Without observation, context quality is a guess.

AgentOps implementation:

quality-signals.sh — detect user corrections and repeated prompts in real time
SkillLoadEvent + session-outcome — link "what was loaded" to "how it went"
Citation tracking — .agents/ao/citations.jsonl records every artifact retrieval
Context monitor — track context window usage and budget
ao session-outcome — compute session reward signal from transcript patterns

Observation is the phase that closes the gap between "we shipped context" and "the context worked." Every PR rejection is feedback on context. Every user correction is a signal. Every production failure in generated code traces back to missing context.
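Linking "what was loaded" to "how it went" is essentially a join. The sketch below assumes simplified record shapes: the field names and reward values are invented for illustration and do not reflect the actual schema of SkillLoadEvent, session-outcome, or `.agents/ao/citations.jsonl`.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions for illustration.
load_events = [
    {"session": "s1", "skill": "plan"},
    {"session": "s1", "skill": "vibe"},
    {"session": "s2", "skill": "plan"},
    {"session": "s3", "skill": "vibe"},
]
session_outcomes = {"s1": 1.0, "s2": 0.0, "s3": 1.0}  # assumed reward per session


def mean_outcome_by_skill(events, outcomes):
    """Average the outcome of every session in which each skill was loaded."""
    totals = defaultdict(list)
    for event in events:
        # Ignore load events from sessions that have no recorded outcome.
        if event["session"] in outcomes:
            totals[event["skill"]].append(outcomes[event["session"]])
    return {skill: sum(vals) / len(vals) for skill, vals in totals.items()}


skill_scores = mean_outcome_by_skill(load_events, session_outcomes)
```

A join like this only shows co-occurrence. The skill-on vs skill-off baseline from the Test phase is what isolates a skill's effect.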

7. Adapt

Feed observations back into context improvement. Close the loop.

SDLC parallel	Feedback → Plan (restart)
What it means	Use session outcomes to improve context for next session
Why it matters	Without adaptation, the same context produces the same mistakes forever.

AgentOps implementation:

MemRL feedback — cited artifacts receive session reward, updating utility scores
Quality-signal → flywheel wiring — user corrections reduce skill utility
ao forge transcript — extract learnings from completed sessions
ao flywheel close-loop — score, promote, and curate extracted knowledge
/evolve — autonomous reconciliation loop that fixes the worst fitness gap
/dream — overnight compounding that runs the full adapt cycle unattended

Adaptation is where the CDLC becomes a flywheel. Each session's outcomes improve the next session's context. Knowledge that works gets promoted. Knowledge that fails gets demoted. The system compounds.
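The promote/demote dynamic can be illustrated with a simple score update. This sketch assumes an exponential moving average toward the session reward; that rule, the learning rate, and the artifact names are assumptions for the example and are not taken from MemRL or from the AgentOps flywheel.

```python
def update_utility(current, reward, learning_rate=0.2):
    """Move a utility score a fraction of the way toward the session reward.

    Assumed rule (exponential moving average); not taken from MemRL or AgentOps.
    """
    return current + learning_rate * (reward - current)


def apply_session(utilities, cited, reward):
    """Return new scores in which only the cited artifacts are updated."""
    return {
        name: update_utility(score, reward) if name in cited else score
        for name, score in utilities.items()
    }


utilities = {"learning-a": 0.5, "learning-b": 0.5, "learning-c": 0.5}

# Session 1 cited a and b and went well; session 2 cited b and went badly.
utilities = apply_session(utilities, cited={"learning-a", "learning-b"}, reward=1.0)
utilities = apply_session(utilities, cited={"learning-b"}, reward=0.0)

# Highest utility first: this is the order a later compile step would prefer.
ranking = sorted(utilities, key=utilities.get, reverse=True)
```

After two sessions the artifact cited only in the good session ranks first, the uncited one keeps its prior, and the one implicated in the bad session falls below both.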

SDLC → CDLC Mapping Table

SDLC Phase	CDLC Phase	Key Question	AgentOps Surface
Plan + Code	Generate	What context should exist?	`/research`, `/plan`, SKILL.md
Build	Compile	How is context assembled for this task?	`ao context assemble`, `ao lookup`, `ao compile`
Test	Test	Does this context produce the right behavior?	`/pre-mortem`, `/vibe`, `ao eval run`
Release	Distribute	How do others get this context?	Skills registry, `/converter`, `install.sh`
Deploy	Deliver	Did the right context reach the agent?	Explicit phase packets, optional `SessionStart` hooks, SkillLoadEvent
Operate + Monitor	Observe	Is the context working in practice?	`quality-signals.sh`, citation tracking, session-outcome
Feedback → Plan	Adapt	What should change for next time?	MemRL feedback, `/curate --mode=forge`, `/evolve`, `/dream`

Operating loop within the phases

The seven phases describe what context engineering is. The operating loop describes how an agent executes work through them. They are not the same artifact.

A single turn of the operating loop touches every CDLC phase:

BDD-shaped intent issue            ← Generate (the intent is the spec; phase 1)
  → vertical slices                ← Compile (one slice per Given/When/Then; phase 2)
  → TDD per slice                  ← Test (first failing test before code; phase 3)
  → conflict-free parallel wave    ← Distribute + Deliver (workers receive scoped context; phases 4–5)
  → integrated bead completion     ← Observe (acceptance examples must pass; phase 6)
  → evidence + learning capture    ← Adapt (ratcheted promotion into the next loop turn; phase 7)

The loop is the unit of work that compounds. The phases are the layers it travels through. Every process skill in this repo (/discovery, /plan, /implement, /crank, /validation, /council, /pre-mortem, /vibe, /post-mortem, /curate --mode=forge, /retro) is one move in that loop, with the upstream artifact contracts and downstream evidence requirements pinned to the loop position — not to a free-floating phase number.

Canonical reference: Operating loop. Doctrine source: .agents/research/2026-05-15-cdlc-dojo-doctrine.md. Fitness gate: GOALS.md Directive #12.

The Leverage Hierarchy

Not all phases are equal. Donella Meadows ranked twelve places to intervene in a system, from weakest (#12: tweak a number) to strongest (#1: transcend paradigms). The CDLC phases climb that ladder.

Leverage	Meadows Point	CDLC Phase	What It Means
Low	#12–#10: Parameters, buffers, structure	Generate	Writing a better prompt helps, but it's the lowest-leverage thing you can do. Most teams stop here.
Medium	#9–#8: Delays, balancing feedback	Compile, Test	Assembling the right context and validating it before delivery. Feedback loops that catch errors.
Threshold	#6: Information flows	Distribute, Deliver	Making context available where it's needed. The point where individual effort becomes organizational capability.
High	#5: Rules	Observe	Measuring what actually happens. Rules that govern what gets promoted, demoted, or discarded.
Highest	#4–#3: Self-organization, goals	Adapt	The system improves itself. Learnings promote automatically. Goals reconcile. The flywheel compounds without human intervention.

The pattern: the phases most teams skip are the ones Meadows says matter most. Writing a prompt is #12. Building a system that improves its own context based on what it observes is #4. That's an 8-level leverage gap.

Full leverage-point mapping: docs/leverage-points.md. Convergence map tying each CDLC phase to all five theoretical pillars: docs/the-science.md.

How the 12 Factors Build the Flywheel

The 12-factor doctrine is a build order — four tiers that construct the compounding product loop in sequence. The flywheel emerges when bookkeeping, context compilation, validation gates, and learning loops are running together.

Tier	Factors	Product Layer	What It Builds	Theory
Foundation (I–IV)	Context Is Everything, Track in Git, One Agent One Job, Research First	Context Compiler	The substrate — context exists, is versioned, is scoped, is researched	Cognitive science (40% load, lost-in-middle). Meadows #12–#6.
Flow (V–VI)	Validate Externally, Lock Progress Forward	Validation Gates	The filter — bad context gets caught, good context can't regress	Brownian Ratchet (chaos + filter + one-way gate). Meadows #8–#7.
Knowledge (VII–IX)	Extract Learnings, Compound Knowledge, Measure What Matters	Knowledge Flywheel	The engine — learnings extract, score, promote, inject. The loop closes.	MemRL (Zhang 2025). Self-organization (Meadows #4). Escape velocity: σ×ρ > δ.
Scale (X–XII)	Isolate Workers, Supervise Hierarchically, Harvest Failures	Infrastructure	The multiplier — all three layers across parallel agents. Failure becomes fuel.	Control theory (K8s reconciliation). SRE (SLOs + error budgets).

The flywheel doesn't exist until the Knowledge tier kicks in — but it can't function without the layers beneath it. Factor VIII (Compound Knowledge) is the climax: the moment the loop closes and starts compounding. Everything before it is setup. Everything after it is scale.

The theoretical threads

Each tier draws from a different body of theory:

Cognitive science (Sweller 1988, Liu 2023) constrains the Foundation: the 40% load rule, lost-in-middle attention mechanics, buffer-sizing. Without these constraints, you could dump everything into the window. You can't.
The Brownian Ratchet operates in the Flow tier: agents produce noisy output. Validation gates are the filter. The ratchet (Factor VI) is the one-way gate. Chaos + filter + gate = net forward progress.
MemRL (Zhang 2025) drives the Knowledge tier: reinforcement learning on episodic memory. Citation events become training signals. Utility scores update. The flywheel has its own learning algorithm.
Control theory enables the Scale tier: declared state (GOALS.md) + reconcile loop (/evolve) + error budgets (fitness gates). The system continuously reconciles actual state to desired state.
System dynamics (Meadows 2008) provides the leverage hierarchy: Foundation is necessary infrastructure (#12–#10), Flow adds feedback (#8–#7), Knowledge reaches self-organization (#4–#3). The highest-leverage phases are the ones most teams never build.
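The "chaos + filter + gate" idea from the Flow tier can be sketched as a small simulation. Everything here is illustrative: a single numeric score stands in for agent output quality, the 0.3 validation threshold is arbitrary, and `ratchet` is a hypothetical helper, not an AgentOps function.

```python
import random


def ratchet(candidates, passes_gate, start=0.0):
    """Keep the best validated score seen so far; never move backward."""
    locked = start
    history = []
    for score in candidates:
        # Filter: discard output that fails validation.
        # One-way gate: accept only if it improves on locked progress.
        if passes_gate(score) and score > locked:
            locked = score
        history.append(locked)
    return history


rng = random.Random(7)  # fixed seed so the run is repeatable
noisy_scores = [rng.uniform(0.0, 1.0) for _ in range(20)]  # the "chaos"

history = ratchet(noisy_scores, passes_gate=lambda s: s >= 0.3)
never_regresses = all(b >= a for a, b in zip(history, history[1:]))
```

The input is random, yet the locked value only ever stays level or rises. That is the sense in which noisy output plus a filter and a one-way gate yields net forward progress.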

Full convergence map tying each CDLC phase to all five threads: The Science — Part 6.

Why This Matters

LLMs are engines. Context is fuel. You can't tune the engine — that's the model vendor's job. But you can engineer the fuel. AgentOps is the SDLC control plane; the CDLC is how it engineers the fuel.

DevOps proved that disciplined systems around nondeterministic workers (humans) produce reliable output. SRE proved it again with SLOs and error budgets. Kubernetes proved it for infrastructure with control loops.

CDLC is the same proof for coding agents. The model stays the same. The context compounds. The system gets better with each use.