Postmortem and Regression Tests

March 6, 2026

Turn incidents into permanent fixes. This page gives a short postmortem template, the evidence you must capture, and a drop-in regression suite so the same class of failure does not ship again.


When to run a postmortem

  • ΔS drift p95 above 0.15 or coverage below 0.60 in any canary window.
  • λ flips above 0.20 or tool loops detected.
  • 5xx above 1 percent or sustained 429 storms.
  • Mixed answers across versions or index mismatch.
  • Duplicate side effects or data corruption risk.
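
The triggers above can be folded into one check per canary window. This is a minimal sketch, not the project's implementation: the `CanaryWindow` fields are hypothetical names, and reading "λ flips above 0.20" as a flip rate is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CanaryWindow:
    """Metrics for one canary window (field names are illustrative)."""
    ds_drift_p95: float
    coverage: float
    lambda_flip_rate: float      # assumed: fraction of runs where λ flipped
    tool_loops: bool
    rate_5xx: float              # fraction of responses, 0.01 == 1 percent
    sustained_429: bool
    mixed_answers: bool          # mixed answers across versions or index mismatch
    duplicate_side_effects: bool


def postmortem_triggers(w: CanaryWindow) -> list[str]:
    """Return the names of every trigger that fired; empty means no postmortem."""
    checks = [
        ("ds_drift_p95", w.ds_drift_p95 > 0.15),
        ("coverage", w.coverage < 0.60),
        ("lambda_flips", w.lambda_flip_rate > 0.20),
        ("tool_loops", w.tool_loops),
        ("5xx", w.rate_5xx > 0.01),
        ("429_storm", w.sustained_429),
        ("mixed_answers", w.mixed_answers),
        ("duplicate_side_effects", w.duplicate_side_effects),
    ]
    return [name for name, fired in checks if fired]
```

Any non-empty result means a postmortem is due for that window.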

Exit targets after recovery

  • Error rate below 0.5 percent within ten minutes.
  • ΔS and coverage match the last pinned baseline window.
  • p95 latency no more than 15 percent above baseline.
  • Duplicate side effects equal zero after reconciliation.
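
One way to make the exit targets checkable is a function that compares the current window to the pinned baseline. The dictionary keys and the `tol` used to decide whether ΔS and coverage "match" are assumptions for illustration. The ten-minute deadline is left to the caller.

```python
def recovery_complete(current: dict, baseline: dict, tol: float = 0.02) -> dict:
    """Evaluate each exit target; all values must be True to close recovery."""
    return {
        "error_rate": current["error_rate"] < 0.005,
        # "match" is interpreted as within an assumed absolute tolerance
        "ds_matches_baseline": abs(current["ds_p95"] - baseline["ds_p95"]) <= tol,
        "coverage_matches_baseline": abs(current["coverage"] - baseline["coverage"]) <= tol,
        "p95_latency": current["p95_latency_ms"] <= 1.15 * baseline["p95_latency_ms"],
        "no_duplicates": current["duplicate_side_effects"] == 0,
    }
```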

Evidence you must capture

  • Version pins: BUILD_ID, GIT_SHA, MODEL_VER, PROMPT_VER, EMBED_MODEL_VER, EMBED_DIM, NORM, metric, RERANK_CONF, TOK_VER, ANALYZER_CONF, CHUNK_SCHEMA_VER, INDEX_HASH.
  • Gold set window: ΔS(question,retrieved), ΔS(retrieved,anchor), coverage, λ states before and after.
  • Traffic weights or flags during the event.
  • Cache namespace keys that were active.
  • Side effect receipts and idempotency decisions.
  • Timeline of flips and breaker state.
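
A small guard can flag an incomplete pin snapshot before the evidence is filed. This sketch assumes the pins are collected into a plain dictionary. The pin names come from the list above, while `missing_pins` is a hypothetical helper.

```python
REQUIRED_PINS = (
    "BUILD_ID", "GIT_SHA", "MODEL_VER", "PROMPT_VER", "EMBED_MODEL_VER",
    "EMBED_DIM", "NORM", "metric", "RERANK_CONF", "TOK_VER",
    "ANALYZER_CONF", "CHUNK_SCHEMA_VER", "INDEX_HASH",
)


def missing_pins(snapshot: dict) -> list[str]:
    """Return required pins that are absent or blank in the snapshot."""
    return [k for k in REQUIRED_PINS if not str(snapshot.get(k, "")).strip()]
```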

Root cause classifier

| Class | Typical sign | Primary fix page |
|---|---|---|
| Pin drift | answers change after silent provider update | version_pinning_and_model_lock.md |
| Index swap error | mixed citations or stale blends | vector_index_build_and_swap.md |
| Cache namespace bug | cross-arm answers after cutover | cache_warmup_invalidation.md |
| Idempotency missing | duplicate writes or refunds | idempotency_dedupe.md |
| Load controls weak | 429 storms, tail spikes | rate_limit_backpressure.md, retry_backoff.md |
| Boot ordering | first call fails after deploy | bootstrap-ordering.md |
| Prompt regression | fluent but wrong citations | rollout_readiness_gate.md |

60-second postmortem checklist

  1. Lock recovery window and freeze writes for reconciliation.
  2. Capture all pins and the gold set metrics before and after the flip.
  3. Identify the first bad change and the last known good set.
  4. Classify the root cause using the table above.
  5. Attach the exact fix page link that prevents this class.
  6. Add a regression test that fails on the pre-fix build and passes on the fixed build.
  7. Assign owners and due dates. Publish within 48 hours.

Postmortem template (paste-ready)

```markdown
# Incident <slug> · severity <S1..S4>
## Summary
One paragraph in plain language.

## Timeline
- <time> first alerts
- <time> rollback lever pulled
- <time> green window confirmed

## Blast radius and impact
Users affected, error rate, CX notes, cost impact.

## Evidence
Pins: MODEL_VER, PROMPT_VER, INDEX_HASH, RERANK_CONF, TOK_VER, ANALYZER_CONF  
Metrics: ΔS, coverage, λ, latency, 429, 5xx  
Flags and weights: what was on and where

## Root cause
Classifier: <from table>  
Primary mechanism: one paragraph

## What went well
Short bullets.

## What went wrong
Short bullets.

## Permanent fixes
- Link to exact WFGY fix page
- Config or code diff
- Owner and due date

## Regression tests added
List each new test and where it runs.

## Appendix
Raw logs, dashboards, receipts, diffs.
```

Regression suite you should own

Retrieval quality

  • Gold set of 20 to 40 questions. Targets: ΔS(question,retrieved) ≤ 0.45 and coverage ≥ 0.70. λ convergent across two seeds.
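
A per-question gate over the gold set could look like the sketch below. The record layout and the `"convergent"` state label are assumptions. The thresholds are the targets stated above.

```python
def retrieval_gate(results, ds_max=0.45, coverage_min=0.70):
    """Return {question_id: [reasons]} for every gold question that misses a target.

    Each result is a dict with keys: id, ds, coverage, lambda_states,
    where lambda_states holds one λ state label per seed.
    """
    failures = {}
    for r in results:
        reasons = []
        if r["ds"] > ds_max:
            reasons.append("ds")
        if r["coverage"] < coverage_min:
            reasons.append("coverage")
        if any(state != "convergent" for state in r["lambda_states"]):
            reasons.append("lambda")
        if reasons:
            failures[r["id"]] = reasons
    return failures
```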

Prompt pack invariants

  • Header checksum test. Fails when header order or critical clauses change without bumping PROMPT_VER.
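
The header checksum test can be sketched with a SHA-256 digest over the ordered header lines. The `pinned` record with `PROMPT_VER` and `checksum` keys is a hypothetical storage format. Only `hashlib` from the standard library is used.

```python
import hashlib


def header_checksum(header_lines):
    """Digest of the header; order-sensitive, so reordering changes the result."""
    joined = "\n".join(line.strip() for line in header_lines)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def header_change_is_versioned(header_lines, prompt_ver, pinned):
    """True when the header is unchanged, or changed together with a PROMPT_VER bump."""
    changed = header_checksum(header_lines) != pinned["checksum"]
    bumped = prompt_ver != pinned["PROMPT_VER"]
    return (not changed) or bumped
```

A CI test would assert `header_change_is_versioned(...)` against the last pinned record and fail the build when it returns `False`.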

Embedding and metric guard

  • Dimension, norm, and metric checks. Reject mismatches before index build.
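
A pre-build guard for these three checks might look like this. It assumes `NORM` means unit L2 normalization and that the metric is a plain string such as `"cosine"`. Both are illustrative readings, and `check_embedding` is a hypothetical helper.

```python
import math


def check_embedding(vec, expected_dim, expect_unit_norm, metric, expected_metric, tol=1e-3):
    """Return a list of mismatch codes; an empty list means the vector may be indexed."""
    errors = []
    if len(vec) != expected_dim:
        errors.append("dim")
    norm = math.sqrt(sum(x * x for x in vec))
    if expect_unit_norm and abs(norm - 1.0) > tol:
        errors.append("norm")
    if metric != expected_metric:
        errors.append("metric")
    return errors
```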

Reranker stability

  • Deterministic top-k order for a frozen candidate set. Log p95 swap rate.
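
The page does not define the swap rate. The sketch below assumes it is the fraction of top-k positions that differ between two runs over the same frozen candidates, with p95 taken by nearest rank across queries.

```python
def swap_rate(run_a, run_b):
    """Fraction of top-k positions whose document differs between two runs."""
    if len(run_a) != len(run_b):
        raise ValueError("runs must have the same k")
    if not run_a:
        return 0.0
    return sum(a != b for a, b in zip(run_a, run_b)) / len(run_a)


def p95(values):
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]
```

A fully deterministic reranker yields a swap rate of 0.0 for every query, so the logged p95 is 0.0.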

Tokenizer and analyzer checks

  • Cover CJK text, diacritic forms, fullwidth and halfwidth characters, and RTL punctuation. Compare token counts across builds.
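
A token-count comparison across builds can be sketched as below. The probe strings and the two tokenizer callables are placeholders. In practice they would be the old and new builds' real tokenizers.

```python
import unicodedata

# Illustrative probes covering the cases listed above.
PROBES = {
    "cjk": "東京都の天気",
    "diacritics_nfc": unicodedata.normalize("NFC", "café"),
    "diacritics_nfd": unicodedata.normalize("NFD", "café"),
    "fullwidth": "ＡＢＣ１２３",
    "halfwidth": "ABC123",
    "rtl_punctuation": "مرحبا، كيف الحال؟",
}


def token_count_drift(tokenize_old, tokenize_new, probes=PROBES):
    """Return {probe_name: (old_count, new_count)} wherever the counts differ."""
    drift = {}
    for name, text in probes.items():
        old_n, new_n = len(tokenize_old(text)), len(tokenize_new(text))
        if old_n != new_n:
            drift[name] = (old_n, new_n)
    return drift
```

Any non-empty result suggests that `TOK_VER` or `ANALYZER_CONF` changed behavior and should be reviewed before rollout.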

Index alias swap rehearsal

  • Build docs_vB offline and run head-to-head against docs_vA. One operation flips alias and is reversible.
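
The single reversible flip can be rehearsed against an in-memory stand-in before touching a real vector store. `AliasRegistry` below is a hypothetical model of the alias mechanism, not an API of any particular index.

```python
class AliasRegistry:
    """Maps an alias to one live index and remembers the previous target."""

    def __init__(self):
        self._live = {}
        self._previous = {}

    def resolve(self, alias):
        return self._live[alias]

    def flip(self, alias, target):
        """Point the alias at a new index in one step, keeping the old target."""
        self._previous[alias] = self._live.get(alias)
        self._live[alias] = target

    def revert(self, alias):
        """Undo the last flip by swapping live and previous targets."""
        previous = self._previous.get(alias)
        if previous is None:
            raise RuntimeError("nothing to revert to")
        self._previous[alias], self._live[alias] = self._live[alias], previous
```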

Feature flags and rollout gates

  • Simulated 5 to 25 to 50 to 100 percent ramp. Abort rules fire on ΔS p95, coverage, λ, error rate, latency.
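
A simulated ramp with abort rules might be structured as follows. The limits reuse thresholds quoted elsewhere on this page as an assumption. The metric names and the `observe` callback are hypothetical.

```python
RAMP_STEPS = (5, 25, 50, 100)

# (direction, limit) per metric; values are assumed from this page's triggers.
LIMITS = {
    "ds_p95": ("max", 0.15),
    "coverage": ("min", 0.60),
    "lambda_flip_rate": ("max", 0.20),
    "error_rate": ("max", 0.01),
    "latency_uplift": ("max", 0.15),
}


def breaches(metrics):
    """Names of the metrics that violate their limit."""
    out = []
    for name, (kind, limit) in LIMITS.items():
        value = metrics[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            out.append(name)
    return out


def simulate_ramp(observe):
    """Walk the ramp; stop at the first step whose metrics breach a limit."""
    reached = 0
    for pct in RAMP_STEPS:
        broken = breaches(observe(pct))
        if broken:
            return {"reached": reached, "aborted_at": pct, "breaches": broken}
        reached = pct
    return {"reached": reached, "aborted_at": None, "breaches": []}
```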

Rate limit and retry harness

  • Token bucket and full jitter backoff. Ensure the p95 rate of 429 responses stays below 2 percent in burst windows.
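
Both mechanisms fit in a few lines. This is a minimal sketch with an injected clock and random source so a harness can test it deterministically. The parameter defaults are illustrative, not recommended values.

```python
import random


class TokenBucket:
    """Allows up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def full_jitter_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A harness would replay a burst through `allow`, count the rejected calls as simulated 429s, and assert the rate stays under the target.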

Idempotency fences

  • Webhook replay, API write retry, and consumer crash replay. Duplicate side effects equal zero.
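
A replay test needs a handler that records a receipt per idempotency key. The sketch below uses an in-memory store. The class and key format are hypothetical, and a production fence would need a durable store with atomic writes.

```python
class IdempotentHandler:
    """Applies a side effect at most once per idempotency key."""

    def __init__(self):
        self._receipts = {}
        self.effects = []  # side effects actually performed

    def handle(self, key, payload):
        # A replayed key returns the stored receipt without repeating the effect.
        if key in self._receipts:
            return self._receipts[key]
        self.effects.append(payload)
        receipt = f"receipt-{len(self.effects)}"
        self._receipts[key] = receipt
        return receipt
```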

Cache versioning and warmup

  • Keys must include INDEX_HASH and PROMPT_VER. Warm both L1 and L2. Negative cache TTL short with jitter.
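
A key builder that namespaces by both pins, plus a jittered negative TTL, could be sketched like this. The key layout, the 30-second base, and the 20 percent jitter are assumptions.

```python
import hashlib
import random


def cache_key(query, index_hash, prompt_ver):
    """Namespace the key by INDEX_HASH and PROMPT_VER so a cutover never reuses old entries."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
    return f"{index_hash}:{prompt_ver}:{digest}"


def negative_ttl(base_s=30.0, jitter=0.2, rng=random):
    """Short TTL for negative entries, spread by +/- jitter to avoid synchronized expiry."""
    return base_s * (1.0 + rng.uniform(-jitter, jitter))
```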

Cold start and boot order

  • First call after deploy. Secrets ready, index mounted, health probe includes a gold QA.
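
A health probe that includes a gold QA might check readiness in boot order and only then ask the question. The `answer` callable and the substring match are placeholders for whatever the service actually exposes.

```python
def health_probe(secrets_ready, index_mounted, answer, gold_question, expected_phrase):
    """Return (healthy, reason); checks follow boot order so the first failure is reported."""
    if not secrets_ready:
        return (False, "secrets_not_ready")
    if not index_mounted:
        return (False, "index_not_mounted")
    reply = answer(gold_question)
    if expected_phrase.lower() not in reply.lower():
        return (False, "gold_qa_failed")
    return (True, "ok")
```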

Rollback rehearsal

  • Blue to Green and index alias back to A. Cache namespace rotates. System returns to baseline targets.

CI manifest example

```yaml
# opsdeploy/regression_suite.yml
gates:
  retrieval:
    ds_max: 0.45
    coverage_min: 0.70
    lambda_convergent: true
  latency:
    p95_uplift_max: 0.15
  errors:
    rate_max: 0.005
invariants:
  pin_fields:
    - MODEL_VER
    - PROMPT_VER
    - EMBED_MODEL_VER
    - EMBED_DIM
    - NORM
    - metric
    - RERANK_CONF
    - TOK_VER
    - ANALYZER_CONF
    - CHUNK_SCHEMA_VER
    - INDEX_HASH
  forbid_mutation_during_rollout: true
scenarios:
  - name: index_alias_swap
    required: true
  - name: retry_replay_idempotency
    required: true
  - name: feature_flag_ramp
    required: true
decision:
  on_fail: block_rollout
  on_pass: ship
artifacts:
  - logs/regression_report.json
```
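
How a runner might turn this manifest into a decision is sketched below. The gates are mirrored as a Python dictionary to keep the example dependency-free, and the report layout is an assumption, not the schema of `logs/regression_report.json`.

```python
GATES = {
    "retrieval": {"ds_max": 0.45, "coverage_min": 0.70, "lambda_convergent": True},
    "latency": {"p95_uplift_max": 0.15},
    "errors": {"rate_max": 0.005},
}
REQUIRED_SCENARIOS = ("index_alias_swap", "retry_replay_idempotency", "feature_flag_ramp")


def decide(report):
    """Return (decision, failures) where decision is 'ship' or 'block_rollout'."""
    failures = []
    if report["ds"] > GATES["retrieval"]["ds_max"]:
        failures.append("retrieval.ds_max")
    if report["coverage"] < GATES["retrieval"]["coverage_min"]:
        failures.append("retrieval.coverage_min")
    if GATES["retrieval"]["lambda_convergent"] and not report["lambda_convergent"]:
        failures.append("retrieval.lambda_convergent")
    if report["p95_uplift"] > GATES["latency"]["p95_uplift_max"]:
        failures.append("latency.p95_uplift_max")
    if report["error_rate"] > GATES["errors"]["rate_max"]:
        failures.append("errors.rate_max")
    for name in REQUIRED_SCENARIOS:
        if not report["scenarios"].get(name, False):
            failures.append(f"scenario:{name}")
    return ("block_rollout" if failures else "ship", failures)
```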

Observability fields to pin for every run

  • Version pins, flags, weights, and region.
  • ΔS metrics, coverage, λ states.
  • p50 and p95 latency per stage.
  • 429 and 5xx counts, queue wait, breaker state.
  • Dedupe decisions and effect receipts.

Common pitfalls

  • Postmortem without a regression test that fails first.
  • No owner or due date on action items.
  • Fix shipped without updating pins or cache namespace.
  • Region divergences that never reconverge.
  • Canary windows too short to detect drift.
