Akira - Operational Runbooks

This directory is the canonical index for Akira operational runbooks.

Quick Reference

Situation                                      Runbook
Deploy a new staging or production version     deploy.md
PostgreSQL primary down                        dr.md#postgresql-primary-failover
App stack recovery                             dr.md#app-stack-full-recovery
Full region disaster                           dr.md#full-region-disaster
Frontend or API unreachable                    incident-response.md#sev2-degraded
Incident kickoff                               incident-response.md#kickoff-procedure-sev1sev2
Set up an on-call shift                        oncall.md
HTTPS certificate expired or renewal failed    cert-renewal.md
Vault primary sealed                           vault-unseal.md
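The index points at files and heading anchors in sibling runbooks, so renaming a heading can silently break a link. The sketch below checks one `file.md#anchor` target. It assumes the runbooks are Markdown files in this directory with `#`-style headings, and that anchors are generated GitHub-style (lowercase, punctuation dropped, spaces turned into hyphens). That slug rule is an approximation, not the exact algorithm of any particular renderer, and `slugify`, `anchors_in`, and `check_link` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches ATX headings such as "## PostgreSQL primary failover" (optional closing #'s).
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def slugify(heading: str) -> str:
    """Approximate GitHub-style anchor: lowercase, drop punctuation, spaces -> '-'."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep letters, digits, '_', '-', spaces
    return slug.replace(" ", "-")


def anchors_in(markdown: str) -> set[str]:
    """Collect the anchor of every heading in a Markdown document."""
    return {slugify(m.group(1)) for m in HEADING.finditer(markdown)}


def check_link(runbook_dir: Path, target: str) -> str | None:
    """Return an error message for a broken 'file.md#anchor' target, else None."""
    filename, _, anchor = target.partition("#")
    path = runbook_dir / filename
    if not path.is_file():
        return f"missing file: {filename}"
    if anchor and anchor not in anchors_in(path.read_text(encoding="utf-8")):
        return f"missing anchor: {target}"
    return None
```

For example, with a heading `Kickoff procedure (SEV1/SEV2)` in `incident-response.md`, the slug comes out as `kickoff-procedure-sev1sev2`, which matches the anchor used in the table.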

Conventions

  • RTO target: recovery time objective, the maximum acceptable time to restore service after a failure.
  • RPO target: recovery point objective, the maximum acceptable window of data loss after a failure.
  • Escalation: who to notify, and after how much elapsed time.
  • Prereq: credentials and tools required before touching production or staging.
  • Caveat: known side effects and operations to avoid.
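The convention fields can be read together as a single record per runbook. Below is a minimal sketch, assuming a hypothetical `RunbookMeta` type in which the escalation ladder is stored as (elapsed minutes, contact) pairs. The thresholds and contacts in the usage note are illustrative and do not come from any runbook.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunbookMeta:
    """Hypothetical record of the convention fields a runbook declares."""
    rto_minutes: float
    rpo_seconds: float
    # (elapsed minutes, who to notify); the ladder can be listed in any order.
    escalation: list[tuple[float, str]] = field(default_factory=list)
    prereqs: list[str] = field(default_factory=list)
    caveats: list[str] = field(default_factory=list)

    def notify(self, elapsed_minutes: float) -> list[str]:
        """Everyone whose escalation threshold has been reached, earliest first."""
        return [who for after, who in sorted(self.escalation) if elapsed_minutes >= after]

    def rto_breached(self, elapsed_minutes: float) -> bool:
        """True once recovery has taken longer than the RTO target."""
        return elapsed_minutes > self.rto_minutes
```

For example, `RunbookMeta(rto_minutes=5, rpo_seconds=30, escalation=[(10, "secondary on-call"), (30, "engineering lead")]).notify(12)` returns `["secondary on-call"]`.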

Tools Required

  • Akira SSH key: ~/.ssh/akira_ed25519.
  • Tailscale up and connected to the Akira tailnet.
  • Ansible vault password file: ~/.akira-vault-pass.txt.
  • Hetzner Cloud Console access: https://console.hetzner.cloud/.
  • 1Password vault Akira Staging for break-glass secrets and Vault unseal material.
  • Telegram @AkiraOpsBot admin access for acknowledging alerts and quick status queries.
  • Local repository at /home/devcomm/akira on the VPS or ~/work/akira on an operator laptop.
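Some of the items above can be checked locally before a shift. Below is a minimal preflight sketch. It assumes that `ssh`, `tailscale`, and `ansible-playbook` should be on PATH, with Ansible inferred from the vault password file. The script only confirms that files and binaries exist. Whether Tailscale is actually connected still needs `tailscale status`, and Hetzner Console, 1Password, and Telegram access cannot be verified this way. `preflight` is a hypothetical helper.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Local files from the Tools Required list (paths copied from this document).
REQUIRED_FILES = ("~/.ssh/akira_ed25519", "~/.akira-vault-pass.txt")
# Assumed CLI tools; adjust to what the runbooks actually invoke.
REQUIRED_BINARIES = ("ssh", "tailscale", "ansible-playbook")


def preflight(files=REQUIRED_FILES, binaries=REQUIRED_BINARIES) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for f in files:
        if not Path(f).expanduser().is_file():
            problems.append(f"missing file: {f}")
    for b in binaries:
        if shutil.which(b) is None:
            problems.append(f"missing binary on PATH: {b}")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print(issue)
```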

Pilot Baselines

These runbooks reference the current pilot validation targets:

  • TASK-236: a single SIP smoke call validates the full path from SIPp through Kamailio, RTPengine, and FreeSWITCH to the CDR.
  • TASK-237: the pilot load target is 10 calls per second (cps) with a 75 s average call duration and a cap of 900 concurrent calls; quality targets are an ASR of at least 95% and a p95 PDD under 2 s.
  • TASK-238: PostgreSQL failover target is RTO under 5 minutes and RPO under 30 seconds.
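The TASK-237 load figures are consistent by Little's law (L = λW): 10 calls/s × 75 s gives an average of 750 concurrent calls, leaving 150 calls of headroom under the 900 cap. The sketch below encodes that arithmetic together with the TASK-237 and TASK-238 thresholds as pass/fail checks on measured results. It assumes ASR is computed as answered calls divided by call attempts and treats every "under" as a strict limit. The function names are hypothetical, and how the measurements are collected is outside its scope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PilotTargets:
    """Thresholds copied from TASK-237 and TASK-238."""
    cps: float = 10.0               # call attempts per second
    avg_call_duration_s: float = 75.0
    concurrent_cap: int = 900
    min_asr: float = 0.95           # "ASR at least 95%"
    pdd_p95_limit_s: float = 2.0    # "PDD p95 under 2s"
    rto_limit_s: float = 5 * 60     # "RTO under 5 minutes"
    rpo_limit_s: float = 30.0       # "RPO under 30 seconds"


def expected_concurrency(cps: float, avg_call_duration_s: float) -> float:
    """Little's law, L = lambda * W: mean concurrent calls at steady state."""
    return cps * avg_call_duration_s


def load_test_failures(t: PilotTargets, answered: int, attempts: int,
                       pdd_p95_s: float) -> list[str]:
    """Names of the TASK-237 checks that a measured load run misses."""
    failures = []
    if expected_concurrency(t.cps, t.avg_call_duration_s) > t.concurrent_cap:
        failures.append("concurrency")
    if attempts == 0 or answered / attempts < t.min_asr:
        failures.append("asr")
    if pdd_p95_s >= t.pdd_p95_limit_s:
        failures.append("pdd_p95")
    return failures


def failover_failures(t: PilotTargets, rto_s: float, rpo_s: float) -> list[str]:
    """Names of the TASK-238 checks that a measured failover drill misses."""
    failures = []
    if rto_s >= t.rto_limit_s:
        failures.append("rto")
    if rpo_s >= t.rpo_limit_s:
        failures.append("rpo")
    return failures
```

The 750 figure is a steady-state mean. Instantaneous concurrency fluctuates around it, which is why the gap between 750 and the 900 cap matters.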

Deploy And Release

Incident, Support, And On-Call

Security And Access

Disaster Recovery And Backups

Infrastructure Operations