Passa al contenuto principale

Postmortem - [Incident name]

Date: YYYY-MM-DD
Severity: SEV1/2/3
Duration: HH:MM
Authors: [On-call engineers]
Status: Draft | Reviewed | Final

TL;DR

One-sentence summary.

Impact

  • Customers affected: [N or %].
  • Service degraded: [SIP signaling, CDR pipeline, billing, other].
  • Revenue impact: [EUR X estimated].
  • SLA breach: [Y/N].

Timeline

  • HH:MM detection.
  • HH:MM acknowledged.
  • HH:MM mitigation applied.
  • HH:MM resolved.
  • HH:MM monitoring complete.

Root cause

Detailed technical cause. Use blameless language.

What went well

  • Detection time.
  • Communication.
  • Mitigation effectiveness.

What went poorly

  • Detection lag.
  • Tooling gaps.
  • Knowledge gaps.

Action items

ActionOwnerDueStatus
Add monitoring for X@nameYYYY-MM-DDOpen
Document runbook for Y@nameYYYY-MM-DDOpen
Improve tooling Z@nameYYYY-MM-DDOpen

Lessons learned

  • Bullet points from the blameless retrospective.