Incident Review Layout

April 17, 2026

Layout: Postmortem card with timeline, root cause, impact, and remediation tracks

Best for: Postmortems, outage reviews, incident summaries, reliability follow-ups, operational retrospectives

Template

Postmortem · Incident Review

Queue Saturation Incident Snapshot

Use this layout when the audience needs a compact operational reading: what happened, why it happened, who was affected, and what will change next.

Timeline

  • 03:14 Consumer lag alert triggered after retry volume spiked on two partitions.
  • 03:19 Malformed payload source isolated and new intake throttled.
  • 03:26 Replay lane enabled for validated jobs with reduced concurrency.
  • 03:33 Queue depth normalized and customer latency returned to baseline.

Root Cause

Retry amplification from the malformed payloads outran the available consumer headroom. Earlier warning signals existed, but alert thresholds were tuned to absolute lag rather than to unstable retry patterns.
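To make the distinction between absolute lag and retry patterns concrete, here is a minimal Python sketch of an alert that also looks at how retries are distributed across partitions. The function names, the per-partition retry counts, and every threshold value are assumptions chosen for illustration; none of them come from the incident itself.

```python
from typing import Mapping


def retry_concentration(retries_by_partition: Mapping[str, int], top_k: int = 2) -> float:
    """Share of all retries (0.0 to 1.0) that land on the top_k busiest partitions."""
    total = sum(retries_by_partition.values())
    if total == 0:
        return 0.0
    busiest = sorted(retries_by_partition.values(), reverse=True)[:top_k]
    return sum(busiest) / total


def should_alert(
    retries_by_partition: Mapping[str, int],
    lag: int,
    *,
    lag_threshold: int = 10_000,            # hypothetical absolute-lag threshold
    concentration_threshold: float = 0.8,   # hypothetical share of retries on top partitions
    min_retries: int = 100,                 # ignore concentration when retry volume is tiny
) -> bool:
    """Fire on high absolute lag OR on retries piling up on a few partitions."""
    total = sum(retries_by_partition.values())
    concentrated = (
        total >= min_retries
        and retry_concentration(retries_by_partition) >= concentration_threshold
    )
    return lag >= lag_threshold or concentrated


# Lag is well under its threshold, but retries cluster on two partitions.
skewed = {"p0": 450, "p1": 430, "p2": 10, "p3": 10}
even = {"p0": 50, "p1": 50, "p2": 50, "p3": 50}
print(should_alert(skewed, lag=2_000))
print(should_alert(even, lag=2_000))
```

The `min_retries` floor is a design choice to avoid alerting on a handful of retries that happen to share a partition. Note that with `top_k` or fewer partitions the concentration is trivially 1.0, so `top_k` should stay well below the partition count.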

Remediation

  • Alert on retry concentration, not only queue depth.
  • Pre-scale workers during replay conditions.
  • Add schema guardrails at intake instead of downstream quarantine.
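The last remediation item, validating at intake instead of quarantining downstream, could look roughly like the Python sketch below. The job schema (`job_id`, `attempt`, `body`), the function names, and the in-memory list standing in for the queue are all hypothetical; a real intake path would use whatever schema and queue client the system already has.

```python
import json

# Hypothetical job schema: field name -> expected Python type.
REQUIRED_FIELDS = {"job_id": str, "attempt": int, "body": dict}


def validate_payload(raw: bytes) -> tuple[bool, str]:
    """Return (ok, reason) for a raw payload, without enqueueing anything."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return False, f"not valid JSON: {exc}"
    if not isinstance(payload, dict):
        return False, "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            return False, f"missing field: {name}"
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False, f"wrong type for field: {name}"
    return True, "ok"


def admit(raw: bytes, queue: list) -> bool:
    """Enqueue only payloads that pass validation; reject the rest at intake."""
    ok, _reason = validate_payload(raw)
    if ok:
        queue.append(raw)
    return ok


queue: list = []
admit(b'{"job_id": "a1", "attempt": 1, "body": {}}', queue)    # admitted
admit(b'{"job_id": "a2", "attempt": "1", "body": {}}', queue)  # rejected: attempt is a string
print(len(queue))
```

Rejecting before enqueue means a malformed payload never reaches a consumer, so it cannot enter the retry loop at all.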