Context Development Life Cycle

July 2, 2026

TL;DR: AgentOps is an SDLC control plane for agentic software development. Its internal mechanism is the Context Development Life Cycle (CDLC): every phase of software delivery has a context counterpart, and every high-value context token follows the Context Density Rule: carry intent, boundary, evidence, decision, constraint, or next action.

Software engineering took 50 years to build the discipline that lets non-deterministic teams ship reliable software. AgentOps keeps the public category language people already understand - SDLC, DevOps, CI/CD, tests, review, release gates - and applies that same shape to context.

Packets, briefings, skills, verdicts, and learnings are artifacts. The deeper product is the practice layer: BDD/Gherkin, DDD, hexagonal architecture, TDD, CI/CD, SRE, ADRs, wikis, Agile/XP, and pragmatic engineering encoded into the runtime structure agents work inside.

The translation is direct. Each piece of the software-engineering stack has a coding-agent counterpart:

| Software Engineering | Coding-Agent World |
|---|---|
| Source code | Context (corpus, planning rules, learnings) |
| SDLC | Context Development Life Cycle |
| Libraries (Maven, npm, crates.io) | Context libraries (the .agents/ corpus) |
| Compilers | Context compilers (ao compile → wiki) |
| Code review | Multi-model councils |
| CI/CD | Validation gates (/vibe, /pre-mortem) |
| Postmortems | Automated postmortems (/post-mortem → learnings) |
| Runbooks | Skills + planning rules |
| Software factories | The in-session loop (ao rpi, /evolve) run out of session on an orchestration substrate (reference: NTM + MCP + managed-agents) |
| Markdown / Git / Linux (open primitives) | LLM Wiki of Markdown |
| Open-source corpus | Your private corpus (.agents/ in your repo) |

We call the internal lifecycle the Context Development Life Cycle (CDLC). You do not need to know that name to understand the product: AgentOps is the SDLC control plane, and CDLC is how it compiles context for agent work.

Companion docs

  • Operating loop — the operational discipline that runs inside these phases: BDD intent → vertical slices → conflict-free wave → bead acceptance → evidence. The CDLC describes the seven phases of context engineering; the operating loop describes how an agent actually executes work through them.
  • Wiki for agents — what .agents/ actually is and why agents can read it natively
  • Trust factory — how the validation gates and councils make agent output trustworthy

The Parallel

In 2009, DevOps asked: what if ops looked more like dev? The answer was CI/CD, infrastructure as code, and the SDLC infinity loop — Plan, Code, Build, Test, Release, Deploy, Operate, Monitor.

Inside an agentic SDLC, the CDLC asks the same question about context: what if the instructions, knowledge, and constraints we feed to coding agents were engineered with the same rigor as the code they produce?

The answer is the same shape. Different substrate.

     SDLC (code)                    CDLC (context)
    ┌──────────┐                   ┌──────────┐
    │   Plan   │                   │ Generate │
    │   Code   │                   │ Compile  │
    │  Build   │                   │   Test   │
    │   Test   │                   │Distribute│
    │ Release  │                   │ Deliver  │
    │  Deploy  │                   │ Observe  │
    │ Operate  │                   │  Adapt   │
    │ Monitor  │                   │          │
    └──────────┘                   └──────────┘
         ↕                              ↕
    infinity loop                  infinity loop

The SDLC produces deployable artifacts. The CDLC produces injectable context for the agents doing that work. Both compound through feedback loops. Both degrade without discipline.


The Narrow Waist

The CDLC has a narrow waist because LLM agents do not have infinite context:

historical software-engineering practice
  → agent-context-limited constraint
  → small verifiable slices
  → dense intent + executable evidence
  → less rediscovery, less drift, less hallucinated done

Four practices carry the highest density:

| Practice | CDLC role |
|---|---|
| BDD / Gherkin | States what behavior matters in observable terms |
| DDD | Gives humans and agents shared names, aggregates, and bounded contexts |
| Hexagonal architecture | Keeps tools, model runtimes, and vendor adapters outside the core loop |
| TDD | Gives the agent an executable local done condition |

Everything else plugs into that waist. CI/CD runs the proof repeatedly. SRE/DORA measures health. ADRs and provenance explain why decisions happened. Wikis and ratchets keep knowledge durable. Agile/XP keeps work in small vertical increments. Pragmatic engineering keeps the slice evidence-bearing and reversible.

The density invariant has a domain name: Context Density Rule. The domain entry lives at skills/domain/references/context-density-rule.md.
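As a rough illustration of the rule, the sketch below treats each piece of context as an item labelled with the payload kinds it carries and keeps only items that carry at least one of the six kinds named in the TL;DR. The `ContextItem` type, the `kinds` labels, and both helper functions are hypothetical names introduced for this example; they are not part of AgentOps, and the real rule lives in the domain entry referenced above.

```python
from dataclasses import dataclass

# The six payload kinds named by the Context Density Rule.
DENSE_KINDS = {"intent", "boundary", "evidence", "decision", "constraint", "next_action"}


@dataclass(frozen=True)
class ContextItem:
    text: str
    kinds: frozenset  # hypothetical labels assigned by an author or a classifier


def is_dense(item: ContextItem) -> bool:
    """An item passes if it carries at least one recognised payload kind."""
    return bool(item.kinds & DENSE_KINDS)


def filter_dense(items):
    """Keep only items that satisfy the rule, preserving their order."""
    return [item for item in items if is_dense(item)]


items = [
    ContextItem("Checkout must reject expired cards.", frozenset({"intent"})),
    ContextItem("We had a great offsite last week.", frozenset()),
    ContextItem("Only touch files under billing/.", frozenset({"boundary", "constraint"})),
]
kept = filter_dense(items)
```

Here the offsite note is dropped because it carries none of the six kinds, while the intent and boundary statements survive.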

That is why waterfall is the wrong shape here. It spends context on large speculative artifacts before proof exists. CDLC prefers atomic process: one behavior, one bounded context, one first failing test, one write scope, one acceptance proof, and one learning only when it changes future behavior.


The Seven Phases

1. Generate

Create the context that agents will consume. Prompts, skills, instructions, specifications.

| | |
|---|---|
| SDLC parallel | Plan + Code |
| What it means | Author skills, write agent.md instructions, pull documentation, create specs |
| Why it matters | Context that isn't created doesn't exist. Agents start from zero without it. |

AgentOps implementation:

  • /research — investigate before writing context
  • /plan — decompose goals into structured implementation specs
  • SKILL.md authoring — reusable context packages with triggers, steps, and output contracts
  • ao context assemble — request skill- or phase-scoped context explicitly
  • MCP integrations — pull context from GitLab, GitHub, Slack, tickets

The generation phase is where most teams stop. They write a Claude.md, maybe a few rules, and call it done. CDLC says generation is one-seventh of the work.

2. Compile

Assemble raw context into phase-appropriate, role-scoped, freshness-weighted packets.

| | |
|---|---|
| SDLC parallel | Build |
| What it means | Select, rank, trim, and package context for the current task |
| Why it matters | Raw context is too large, too stale, or too broad. Compilation makes it precise. |

AgentOps implementation:

  • ao context assemble — build phase-scoped context packets
  • ao lookup — retrieve decay-ranked learnings on demand
  • ao inject — deprecated compatibility adapter for legacy retrieval paths
  • ao compile — rebuild the derived knowledge wiki (Mine → Grow → Defrag → Lint)
  • ao maturity --expire/--evict — remove stale context before it pollutes the window
  • Finding compiler — distill raw findings into prevention rules

This is the phase that separates a context compiler from a prompt builder. A prompt builder concatenates. A compiler selects, ranks, trims, and delivers the minimum viable context for the current phase.
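The select, rank, trim sequence can be sketched in a few lines. This is a minimal illustration, not how `ao context assemble` or `ao lookup` work internally: the `Learning` fields, the exponential half-life decay in `score`, and the greedy token-budget trim are all assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    text: str
    phases: frozenset  # phases this item is relevant to
    utility: float     # 0..1, how useful the item has been so far
    age_days: float    # days since the item was last confirmed
    tokens: int        # estimated token cost of including it


def score(item: Learning, half_life_days: float = 30.0) -> float:
    """Utility discounted by exponential freshness decay (an assumed formula)."""
    return item.utility * 0.5 ** (item.age_days / half_life_days)


def compile_packet(items, phase: str, budget_tokens: int):
    """Select by phase, rank by decayed score, trim to the token budget."""
    selected = [i for i in items if phase in i.phases]
    ranked = sorted(selected, key=score, reverse=True)
    packet, used = [], 0
    for item in ranked:
        if used + item.tokens <= budget_tokens:
            packet.append(item)
            used += item.tokens
    return packet


corpus = [
    Learning("Run migrations before tests.", frozenset({"implement"}), 0.9, 0, 40),
    Learning("Old flaky-test workaround.", frozenset({"implement"}), 0.9, 90, 40),
    Learning("Interview stakeholders first.", frozenset({"research"}), 0.8, 5, 30),
    Learning("Prefer table-driven tests.", frozenset({"implement"}), 0.6, 10, 50),
]
packet = compile_packet(corpus, "implement", budget_tokens=100)
```

With a 100-token budget the packet keeps the fresh migration note and the table-driven-tests note; the 90-day-old workaround ranks last and no longer fits, and the research-only item is never selected. A plain concatenation would have shipped all four.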

3. Test

Validate that context produces the intended agent behavior.

| | |
|---|---|
| SDLC parallel | Test |
| What it means | Run evals on context: does SKILL.md X produce behavior Y? |
| Why it matters | You change two lines in your Claude.md. Do you know the impact? |

AgentOps implementation:

  • /pre-mortem — validate plans before implementation (LLM-as-judge)
  • /vibe — validate code after implementation (multi-model consensus)
  • /council — multi-judge adversarial review
  • ao eval run — deterministic eval suites with scoring dimensions
  • context_comprehension dimension — structural quality assessment of SKILL.md files
  • Baseline A/B — skill-on vs skill-off delta measurement

Testing context is fundamentally different from testing code. Agent behavior is non-deterministic, so a single run proves little: you run each eval several times (five, say) and measure the pass rate. Error budgets replace binary pass/fail. This is the hardest phase to get right, and the one most teams skip entirely.
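The pass-rate and error-budget idea, together with the skill-on vs skill-off delta, can be sketched as follows. The function names, the 0.8 target, and the canned outcomes are assumptions for illustration only; they do not describe the scoring used by `ao eval run`.

```python
from typing import Callable


def pass_rate(run_eval: Callable[[], bool], runs: int = 5) -> float:
    """Run a non-deterministic eval several times; return the fraction that passed."""
    passes = sum(1 for _ in range(runs) if run_eval())
    return passes / runs


def within_error_budget(rate: float, target: float = 0.8) -> bool:
    """An eval is healthy while its pass rate stays at or above the target."""
    return rate >= target


def skill_delta(with_skill: float, without_skill: float) -> float:
    """Baseline A/B: how far the skill moves the pass rate."""
    return with_skill - without_skill


# Canned outcomes stand in for real agent runs (hypothetical data).
skill_on = iter([True, True, False, True, True])
skill_off = iter([True, False, False, True, False])

rate_on = pass_rate(lambda: next(skill_on), runs=5)
rate_off = pass_rate(lambda: next(skill_off), runs=5)
healthy = within_error_budget(rate_on, target=0.8)
delta = skill_delta(rate_on, rate_off)
```

In this example the skill passes four of five runs, which meets the 0.8 target even though one run failed, and it lifts the pass rate by 0.4 over the baseline.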

4. Distribute

Package and share context across projects, teams, and runtimes.

| | |
|---|---|
| SDLC parallel | Release |
| What it means | Version context, resolve dependencies, publish to registries |
| Why it matters | Context that lives in one person's head (or one repo's Claude.md) doesn't scale. |

AgentOps implementation:

  • Skills registry — 170+ skills as distributable context packages
  • /converter — export skills to Cursor rules, Codex format, OpenCode config
  • ao compile — package the knowledge wiki for distribution
  • Cross-runtime compatibility — same skills target Claude Code, Codex CLI, Cursor, and OpenCode
  • install.sh — one-line installation of the full context package

Distribution is where context becomes an organizational asset. One team fixes a testing pattern, packages it as a skill, and every other team gets the fix on next install.

5. Deliver

Inject the right context into the right session at the right time.

| | |
|---|---|
| SDLC parallel | Deploy |
| What it means | Load context into the agent's window at session start |
| Why it matters | A compiled context packet is worthless if it doesn't reach the agent. |

AgentOps implementation:

  • Explicit context packets — deliver the assembled phase context to the agent
  • Optional SessionStart hooks — runtime adapter profile, not the default path
  • ao lookup — on-demand knowledge search during a session
  • SkillLoadEvent — track which skills were loaded (citation pipeline)
  • Phase-scoped delivery — /research gets different context than /implement

Delivery is the moment where compilation meets the session. Right context, right window, right time. Phase-specific. Role-scoped. Freshness-weighted.

6. Observe

Monitor whether delivered context produces good outcomes.

| | |
|---|---|
| SDLC parallel | Operate + Monitor |
| What it means | Track agent behavior, capture correction signals, measure session outcomes |
| Why it matters | Without observation, context quality is a guess. |

AgentOps implementation:

  • quality-signals.sh — detect user corrections and repeated prompts in real time
  • SkillLoadEvent + session-outcome — link "what was loaded" to "how it went"
  • Citation tracking — .agents/ao/citations.jsonl records every artifact retrieval
  • Context monitor — track context window usage and budget
  • ao session-outcome — compute session reward signal from transcript patterns

Observation is the phase that closes the gap between "we shipped context" and "the context worked." Every PR rejection is feedback on context. Every user correction is a signal. Every production failure in generated code traces back to missing context.
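Linking "what was loaded" to "how it went" amounts to a join between load events and session outcomes. The sketch below assumes JSON-lines records with `session` and `skill` fields and a per-session reward between 0 and 1. That schema is hypothetical and is not the actual format of `.agents/ao/citations.jsonl` or the `ao session-outcome` output.

```python
import json
from collections import defaultdict

# Hypothetical records: the field names are assumptions, not the real schema.
load_events = [
    '{"session": "s1", "skill": "plan"}',
    '{"session": "s1", "skill": "vibe"}',
    '{"session": "s2", "skill": "plan"}',
    '{"session": "s3", "skill": "vibe"}',
]
session_rewards = {"s1": 1.0, "s2": 0.0}  # s3 has no recorded outcome yet


def mean_reward_by_skill(jsonl_lines, rewards):
    """Average the session reward over every session in which a skill was loaded."""
    observed = defaultdict(list)
    for line in jsonl_lines:
        event = json.loads(line)
        reward = rewards.get(event["session"])
        if reward is not None:  # skip sessions with no recorded outcome
            observed[event["skill"]].append(reward)
    return {skill: sum(vals) / len(vals) for skill, vals in observed.items()}


report = mean_reward_by_skill(load_events, session_rewards)
```

A report like this is only a correlation between loaded skills and outcomes, but it is enough to flag skills that keep showing up in sessions that go badly.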

7. Adapt

Feed observations back into context improvement. Close the loop.

| | |
|---|---|
| SDLC parallel | Feedback → Plan (restart) |
| What it means | Use session outcomes to improve context for next session |
| Why it matters | Without adaptation, the same context produces the same mistakes forever. |

AgentOps implementation:

  • MemRL feedback — cited artifacts receive session reward, updating utility scores
  • Quality-signal → flywheel wiring — user corrections reduce skill utility
  • ao forge transcript — extract learnings from completed sessions
  • ao flywheel close-loop — score, promote, and curate extracted knowledge
  • /evolve — autonomous reconciliation loop that fixes the worst fitness gap
  • /dream — overnight compounding that runs the full adapt cycle unattended

Adaptation is where the CDLC becomes a flywheel. Each session's outcomes improve the next session's context. Knowledge that works gets promoted. Knowledge that fails gets demoted. The system compounds.
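The promote-and-demote behavior can be illustrated with a simple utility update in which only the artifacts cited in a session move toward that session's reward. The exponential-moving-average rule, the 0.1 learning rate, and the 0.5 default utility are assumptions for this sketch; it stands in for the MemRL feedback step without claiming to reproduce it.

```python
def update_utility(utility: float, reward: float, learning_rate: float = 0.1) -> float:
    """Move utility a fraction of the way toward the observed reward."""
    return utility + learning_rate * (reward - utility)


def apply_session(utilities: dict, cited: list, reward: float,
                  learning_rate: float = 0.1) -> dict:
    """Return a new utility table in which only cited artifacts are updated."""
    updated = dict(utilities)
    for artifact in cited:
        current = updated.get(artifact, 0.5)  # assumed neutral prior for new artifacts
        updated[artifact] = update_utility(current, reward, learning_rate)
    return updated


utilities = {"learning-a": 0.5, "learning-b": 0.5}
after_good = apply_session(utilities, ["learning-a"], reward=1.0)
after_bad = apply_session(after_good, ["learning-b"], reward=0.0)
```

After one good session citing `learning-a` and one bad session citing `learning-b`, the first has risen to 0.55 and the second has fallen to 0.45, so the next compile step would rank them differently.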


SDLC → CDLC Mapping Table

| SDLC Phase | CDLC Phase | Key Question | AgentOps Surface |
|---|---|---|---|
| Plan + Code | Generate | What context should exist? | /research, /plan, SKILL.md |
| Build | Compile | How is context assembled for this task? | ao context assemble, ao lookup, ao compile |
| Test | Test | Does this context produce the right behavior? | /pre-mortem, /vibe, ao eval run |
| Release | Distribute | How do others get this context? | Skills registry, /converter, install.sh |
| Deploy | Deliver | Did the right context reach the agent? | Explicit phase packets, optional SessionStart hooks, SkillLoadEvent |
| Operate + Monitor | Observe | Is the context working in practice? | quality-signals.sh, citation tracking, session-outcome |
| Feedback → Plan | Adapt | What should change for next time? | MemRL feedback, /curate --mode=forge, /evolve, /dream |

Operating loop within the phases

The seven phases describe what context engineering is. The operating loop describes how an agent executes work through them. They are not the same artifact.

A single turn of the operating loop touches every CDLC phase:

BDD-shaped intent issue            ← Generate (the intent is the spec; phase 1)
  → vertical slices                ← Compile (one slice per Given/When/Then; phase 2)
  → TDD per slice                  ← Test (first failing test before code; phase 3)
  → conflict-free parallel wave    ← Distribute + Deliver (workers receive scoped context; phases 4–5)
  → integrated bead completion     ← Observe (acceptance examples must pass; phase 6)
  → evidence + learning capture    ← Adapt (ratcheted promotion into the next loop turn; phase 7)

The loop is the unit of work that compounds. The phases are the layers it travels through. Every process skill in this repo (/discovery, /plan, /implement, /crank, /validation, /council, /pre-mortem, /vibe, /post-mortem, /curate --mode=forge, /retro) is one move in that loop, with the upstream artifact contracts and downstream evidence requirements pinned to the loop position — not to a free-floating phase number.
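One way to picture a conflict-free wave is as a grouping of slices whose write scopes do not overlap, so the workers in a wave can run in parallel without touching the same files. The greedy first-fit packing below is an assumption made for illustration; `plan_waves` and the slice names are hypothetical, and the actual scheduling rules belong to the operating loop doc.

```python
def plan_waves(slices: dict) -> list:
    """Greedily pack slices into waves so no two slices in a wave share a path.

    `slices` maps a slice name to the set of paths it is allowed to write.
    """
    waves = []  # each entry is (slice names, paths already claimed in that wave)
    for name, paths in slices.items():
        for names, claimed in waves:
            if claimed.isdisjoint(paths):
                names.append(name)
                claimed.update(paths)
                break
        else:
            # No existing wave is free of conflicts, so open a new one.
            waves.append(([name], set(paths)))
    return [names for names, _ in waves]


slices = {
    "login-form": {"ui/login.py"},
    "login-api": {"api/auth.py"},
    "session-expiry": {"api/auth.py", "api/session.py"},
    "logout-button": {"ui/nav.py"},
}
waves = plan_waves(slices)
```

Here `session-expiry` shares `api/auth.py` with `login-api`, so it waits for the second wave while the other three slices run together in the first.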

Canonical reference: Operating loop. Doctrine source: .agents/research/2026-05-15-cdlc-dojo-doctrine.md. Fitness gate: GOALS.md Directive #12.

The Leverage Hierarchy

Not all phases are equal. Donella Meadows ranked twelve places to intervene in a system, from weakest (#12: tweak a number) to strongest (#1: transcend paradigms). The CDLC phases climb that ladder.

| Leverage | Meadows Point | CDLC Phase | What It Means |
|---|---|---|---|
| Low | #12–#10: Parameters, buffers, structure | Generate | Writing a better prompt helps, but it's the lowest-leverage thing you can do. Most teams stop here. |
| Medium | #9–#8: Delays, balancing feedback | Compile, Test | Assembling the right context and validating it before delivery. Feedback loops that catch errors. |
| Threshold | #6: Information flows | Distribute, Deliver | Making context available where it's needed. The point where individual effort becomes organizational capability. |
| High | #5: Rules | Observe | Measuring what actually happens. Rules that govern what gets promoted, demoted, or discarded. |
| Highest | #4–#3: Self-organization, goals | Adapt | The system improves itself. Learnings promote automatically. Goals reconcile. The flywheel compounds without human intervention. |

The pattern: the phases most teams skip are the ones Meadows says matter most. Writing a prompt is #12. Building a system that improves its own context based on what it observes is #4. That's an 8-level leverage gap.

Full leverage-point mapping: docs/leverage-points.md. Convergence map tying each CDLC phase to all five theoretical pillars: docs/the-science.md.


How the 12 Factors Build the Flywheel

The 12-factor doctrine is a build order — four tiers that construct the compounding product loop in sequence. The flywheel emerges when bookkeeping, context compilation, validation gates, and learning loops are running together.

| Tier | Factors | Product Layer | What It Builds | Theory |
|---|---|---|---|---|
| Foundation (I–IV) | Context Is Everything, Track in Git, One Agent One Job, Research First | Context Compiler | The substrate: context exists, is versioned, is scoped, is researched | Cognitive science (40% load, lost-in-middle). Meadows #12–#6. |
| Flow (V–VI) | Validate Externally, Lock Progress Forward | Validation Gates | The filter: bad context gets caught, good context can't regress | Brownian Ratchet (chaos + filter + one-way gate). Meadows #8–#7. |
| Knowledge (VII–IX) | Extract Learnings, Compound Knowledge, Measure What Matters | Knowledge Flywheel | The engine: learnings extract, score, promote, inject. The loop closes. | MemRL (Zhang 2025). Self-organization (Meadows #4). Escape velocity: σ×ρ > δ. |
| Scale (X–XII) | Isolate Workers, Supervise Hierarchically, Harvest Failures | Infrastructure | The multiplier: all three layers across parallel agents. Failure becomes fuel. | Control theory (K8s reconciliation). SRE (SLOs + error budgets). |

The flywheel doesn't exist until the Knowledge tier kicks in — but it can't function without the layers beneath it. Factor VIII (Compound Knowledge) is the climax: the moment the loop closes and starts compounding. Everything before it is setup. Everything after it is scale.

The theoretical threads

Each tier draws from a different body of theory:

  • Cognitive science (Sweller 1988, Liu 2023) constrains the Foundation: the 40% load rule, lost-in-middle attention mechanics, buffer-sizing. Without these constraints, you could dump everything into the window. You can't.
  • The Brownian Ratchet operates in the Flow tier: agents produce noisy output. Validation gates are the filter. The ratchet (Factor VI) is the one-way gate. Chaos + filter + gate = net forward progress.
  • MemRL (Zhang 2025) drives the Knowledge tier: reinforcement learning on episodic memory. Citation events become training signals. Utility scores update. The flywheel has its own learning algorithm.
  • Control theory enables the Scale tier: declared state (GOALS.md) + reconcile loop (/evolve) + error budgets (fitness gates). The system continuously reconciles actual state to desired state.
  • Systems dynamics (Meadows 2008) provides the leverage hierarchy: Foundation is necessary infrastructure (#12–#10), Flow adds feedback (#8–#7), Knowledge reaches self-organization (#4–#3). The highest-leverage phases are the ones most teams never build.
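The "chaos + filter + gate" pattern from the Brownian Ratchet thread can be sketched as a loop that accepts a noisy candidate only when it passes validation and improves on the best state so far. Reducing quality to a single numeric score, and the `propose` and `passes_gate` callables, are simplifying assumptions for this illustration rather than a description of how the AgentOps gates are implemented.

```python
def ratchet(initial: float, propose, passes_gate, steps: int) -> float:
    """Keep the best validated candidate; never move backward."""
    best = initial
    for _ in range(steps):
        candidate = propose(best)        # chaos: a noisy attempt
        if not passes_gate(candidate):   # filter: the validation gate rejects it
            continue
        if candidate > best:             # one-way gate: lock progress forward
            best = candidate
    return best


# Canned score changes stand in for noisy agent output (hypothetical data).
deltas = iter([+2, -5, +1, +50, -1])
final = ratchet(
    initial=10,
    propose=lambda score: score + next(deltas),
    passes_gate=lambda score: score <= 20,  # assumed gate: reject implausible results
    steps=5,
)
```

Starting from 10, the loop accepts 12 and then 13, ignores the two regressions, and rejects the implausible jump to 63 at the gate, so it ends at 13 despite mostly noisy input.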

Full convergence map tying each CDLC phase to all five threads: The Science — Part 6.


Why This Matters

LLMs are engines. Context is fuel. You can't tune the engine — that's the model vendor's job. But you can engineer the fuel. AgentOps is the SDLC control plane; the CDLC is how it engineers the fuel.

DevOps proved that disciplined systems around non-deterministic workers (humans) produce reliable output. SRE proved it again with SLOs and error budgets. Kubernetes proved it for infrastructure with control loops.

CDLC is the same proof for coding agents. The model stays the same. The context compounds. The system gets better with each use.


See Also