Postmortem - [Incident name]
Date: YYYY-MM-DD
Severity: SEV1 | SEV2 | SEV3
Duration: HH:MM
Authors: [On-call engineers]
Status: Draft | Reviewed | Final
TL;DR
One-sentence summary of what broke, who was affected, and for how long.
Impact
- Customers affected: [N or %].
- Service degraded: [SIP signaling, CDR pipeline, billing, other].
- Revenue impact: [EUR X estimated].
- SLA breach: [Y/N].
Timeline
- HH:MM Detected.
- HH:MM Acknowledged.
- HH:MM Mitigation applied.
- HH:MM Resolved.
- HH:MM Monitoring complete.
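
The elapsed time between milestones (time to acknowledge, time to mitigate, total duration) can be derived from the timeline entries rather than computed by hand. The following is a minimal sketch, not part of the template itself: the function name `elapsed_since_detection`, the milestone keys, and the sample times are all hypothetical. It assumes every entry uses the same time zone, that the milestones are listed in chronological order, and that the Duration field means detection to resolution.

```python
from datetime import datetime, timedelta

# Hypothetical milestone keys, in the order they occur during an incident.
MILESTONES = ["detected", "acknowledged", "mitigated", "resolved", "monitoring_complete"]


def elapsed_since_detection(timeline):
    """Return the HH:MM elapsed from detection to each later milestone.

    `timeline` maps each milestone key to an "HH:MM" string. A time that is
    earlier than the previous milestone is treated as falling on the next day,
    so incidents that cross midnight are handled.
    """
    start = datetime.strptime(timeline["detected"], "%H:%M")
    previous = start
    day_offset = timedelta(0)
    result = {}
    for name in MILESTONES[1:]:
        moment = datetime.strptime(timeline[name], "%H:%M") + day_offset
        if moment < previous:  # clock wrapped past midnight
            moment += timedelta(days=1)
            day_offset += timedelta(days=1)
        previous = moment
        minutes = int((moment - start).total_seconds() // 60)
        result[name] = f"{minutes // 60:02d}:{minutes % 60:02d}"
    return result


if __name__ == "__main__":
    # Made-up sample incident that starts shortly before midnight.
    sample = {
        "detected": "23:50",
        "acknowledged": "23:55",
        "mitigated": "00:20",
        "resolved": "01:05",
        "monitoring_complete": "02:05",
    }
    for milestone, elapsed in elapsed_since_detection(sample).items():
        print(f"{milestone}: {elapsed}")
```

Under the stated assumption, the value reported for `resolved` is what would go in the Duration field. Incidents longer than 24 hours would need full dates in the timeline, which this sketch does not handle.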
Root cause
Describe the technical cause in detail. Use blameless language: focus on systems and processes, not on individuals.
What went well
- Detection time.
- Communication.
- Mitigation effectiveness.
What went poorly
- Detection lag.
- Tooling gaps.
- Knowledge gaps.
Action items
| Action | Owner | Due | Status |
|---|---|---|---|
| Add monitoring for X | @name | YYYY-MM-DD | Open |
| Document runbook for Y | @name | YYYY-MM-DD | Open |
| Improve tooling Z | @name | YYYY-MM-DD | Open |
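
Action items are easier to follow up on if the table can be checked mechanically. The sketch below is one possible way to list open items that are past their due date. The function name `overdue_actions` and the sample rows are hypothetical. It assumes the four-column layout above, ISO dates (YYYY-MM-DD) in the Due column, and `Open` as the status of unfinished items. Rows whose Due cell is not a valid date (the header, the separator, unfilled placeholders) are skipped.

```python
from datetime import date


def overdue_actions(table, today):
    """Return (action, owner, due) tuples for open items due before `today`.

    `table` is the markdown table as a single string; `today` is a
    datetime.date supplied by the caller so the check is reproducible.
    """
    overdue = []
    for line in table.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # not an action-item row
        action, owner, due, status = cells
        try:
            due_date = date.fromisoformat(due)
        except ValueError:
            continue  # header, separator, or unfilled placeholder
        if status == "Open" and due_date < today:
            overdue.append((action, owner, due))
    return overdue


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = """
| Action | Owner | Due | Status |
|---|---|---|---|
| Add monitoring for X | @alice | 2024-03-01 | Open |
| Document runbook for Y | @bob | 2024-03-01 | Done |
| Improve tooling Z | @carol | YYYY-MM-DD | Open |
"""
    for item in overdue_actions(sample, date(2024, 4, 1)):
        print(item)
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the check deterministic when it runs in a scheduled job or a test.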
Lessons learned
- Bullet points from the blameless retrospective.