Runbook - Vault Auto-Unseal Recovery
Normal Operation
The primary Akira Vault runs on akira-mgmt-01-staging and auto-unseals through the transit seal, backed by the transit Vault on akira-cache-01-staging.
RTO targets:
- Auto-unseal after restart: under 5 seconds.
- Manual transit unseal: 5 to 10 minutes, provided 1Password access and a quorum of unseal keys are available.
Prerequisites
- SSH access to akira-mgmt-01-staging and akira-cache-01-staging.
- Access to the Akira Staging 1Password vault for unseal keys and break-glass material.
- ~/.akira-vault-pass.txt available for redeploys.
Symptoms
- Backend or worker containers crash-loop with errors mentioning a sealed Vault.
- vault status reports Sealed: true.
- Secret reads fail during deploys or service startup.
Check the primary Vault status and recent logs:
ssh root@akira-mgmt-01-staging '
VAULT_ADDR=https://127.0.0.1:8200 vault status
systemctl status vault --no-pager
journalctl -u vault --since "30 min ago" --no-pager | tail -120
'
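The exit code of vault status already encodes the seal state (documented as 0 for unsealed, 1 for an error, 2 for sealed), so a script can branch on it instead of parsing text. A minimal sketch, assuming bash on the host; the function name vault_seal_state is hypothetical and not part of the Akira tooling:

```shell
# Hypothetical helper (not part of the Akira repo). Classifies the seal state
# from the exit code of `vault status`: 0 = unsealed, 2 = sealed, 1 = error.
vault_seal_state() {
  local addr="${1:-https://127.0.0.1:8200}"
  local rc=0
  # Discard the human-readable output; only the exit code matters here.
  VAULT_ADDR="$addr" vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

During this incident vault_seal_state on akira-mgmt-01-staging should print sealed; an error result means Vault could not be queried at all, so check systemctl status vault before looking at the seal.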
Cause A: Transit Vault Down Or Sealed
Diagnostics:
ssh root@akira-cache-01-staging '
systemctl status vault --no-pager
VAULT_ADDR=https://127.0.0.1:8200 vault status
'
Recovery:
ssh root@akira-cache-01-staging '
systemctl start vault
VAULT_ADDR=https://127.0.0.1:8200 vault status
'
If the transit Vault is sealed, unseal it with three Shamir key shares from 1Password:
ssh root@akira-cache-01-staging '
export VAULT_ADDR=https://127.0.0.1:8200
vault operator unseal <key_1>
vault operator unseal <key_2>
vault operator unseal <key_3>
vault status
'
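Passing shares as arguments puts them on the command line. When vault operator unseal gets no key argument it prompts for the share without echoing it, which keeps shares out of shell history in line with the caveats below. A sketch, assuming bash locally; unseal_interactively is a hypothetical wrapper, and ssh -t is used because the hidden prompt needs a TTY:

```shell
# Hypothetical wrapper (host and share count are parameters, not Akira tooling).
# With no key argument, `vault operator unseal` prompts for each share without
# echoing it; `ssh -t` allocates the TTY that prompt needs.
unseal_interactively() {
  local host="$1" shares="${2:-3}"
  # \$(seq ...) is escaped so it runs on the remote host; ${shares} expands locally.
  ssh -t "root@${host}" "
    export VAULT_ADDR=https://127.0.0.1:8200
    for i in \$(seq 1 ${shares}); do vault operator unseal || exit 1; done
    vault status
  "
}
```

For example, run unseal_interactively akira-cache-01-staging from a workstation and paste each share from 1Password into the hidden prompt.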
Then restart the primary Vault:
ssh root@akira-mgmt-01-staging '
systemctl restart vault
sleep 3
VAULT_ADDR=https://127.0.0.1:8200 vault status
'
Expected: Sealed is false.
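Instead of a fixed sleep 3, the post-restart check can poll until Vault reports unsealed, which also shows whether recovery met the under-5-second auto-unseal target. A minimal sketch, assuming bash on the host; wait_for_unseal and its default timeout are hypothetical:

```shell
# Hypothetical helper: poll `vault status` once per second until it exits 0
# (unsealed) or the timeout in seconds runs out; exit code 2 means still sealed.
wait_for_unseal() {
  local timeout="${1:-10}" addr="${2:-https://127.0.0.1:8200}"
  local waited=0
  while :; do
    if VAULT_ADDR="$addr" vault status >/dev/null 2>&1; then
      echo "unsealed after ${waited}s"
      return 0
    fi
    [ "$waited" -ge "$timeout" ] && break
    sleep 1
    waited=$((waited + 1))
  done
  echo "still sealed or unreachable after ${timeout}s" >&2
  return 1
}
```

Usage on the primary: systemctl restart vault && wait_for_unseal 10. A non-zero exit after the timeout means the Cause B checks are the next step.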
Cause B: Transit Token Revoked Or Expired
Symptoms:
- Transit Vault is unsealed and reachable.
- Primary Vault remains sealed after restart.
- Primary Vault logs show permission denied or invalid token errors from the transit seal.
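The log symptom can be checked by filtering the same journal output used in the first diagnostic. A sketch, assuming bash; transit_token_errors is hypothetical, and its patterns are taken from the symptom wording above rather than from an exhaustive list of Vault error strings:

```shell
# Hypothetical filter: print Vault log lines (read from stdin) that match the
# transit-token symptoms above. The patterns are assumptions drawn from this
# runbook's symptom list, not an exhaustive list of Vault error strings.
transit_token_errors() {
  # grep exits 0 if at least one line matched, 1 if none did.
  grep -iE 'permission denied|invalid token|token invalid'
}
```

Usage on the primary: journalctl -u vault --since "30 min ago" --no-pager | transit_token_errors. No output while transit is unsealed and reachable suggests the cause lies elsewhere.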
Recovery:
- Recreate the transit token with the existing setup workflow (scripts/setup-vault-transit.sh).
- Update vault_transit_token in the encrypted Ansible vault file (infra/group_vars/all/vault.yml).
- Redeploy the Vault configuration to the management host.
- Restart the primary Vault and validate.
cd /home/devcomm/akira
TRANSIT_HOST=akira-cache-01-staging.tail5f9c92.ts.net \
scripts/setup-vault-transit.sh
ansible-vault edit infra/group_vars/all/vault.yml \
--vault-password-file ~/.akira-vault-pass.txt
cd /home/devcomm/akira/infra
ansible-playbook -i inventory/staging.yml playbooks/deploy_management.yml \
--vault-password-file ~/.akira-vault-pass.txt \
--tags vault
ssh root@akira-mgmt-01-staging '
systemctl restart vault
sleep 3
VAULT_ADDR=https://127.0.0.1:8200 vault status
'
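Before or after the redeploy, the new token can be sanity-checked against the transit Vault. A sketch under stated assumptions: check_transit_token and the TRANSIT_TOKEN / TRANSIT_ADDR variables are hypothetical, and the default address (Tailscale name on port 8200) is a guess. A successful lookup only shows the token exists and is unexpired, not that its policy grants the transit paths the seal uses:

```shell
# Hypothetical check before restarting the primary: look up a transit token
# against the transit Vault. TRANSIT_ADDR's default (Tailscale name, port 8200)
# is an assumption. The token comes from TRANSIT_TOKEN so it stays out of argv.
check_transit_token() {
  local addr="${TRANSIT_ADDR:-https://akira-cache-01-staging.tail5f9c92.ts.net:8200}"
  if [ -z "${TRANSIT_TOKEN:-}" ]; then
    echo "TRANSIT_TOKEN is not set" >&2
    return 2
  fi
  # `vault token lookup` with no argument inspects the token in VAULT_TOKEN.
  if VAULT_ADDR="$addr" VAULT_TOKEN="$TRANSIT_TOKEN" vault token lookup >/dev/null 2>&1; then
    echo "transit token valid"
  else
    echo "transit token rejected or transit Vault unreachable at ${addr}" >&2
    return 1
  fi
}
```

For example, read -rs TRANSIT_TOKEN to enter the token without echo, then run check_transit_token.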
Cause C: Primary Manual Recovery
Use this path only if transit recovery is not possible and the primary's original Shamir unseal keys are available.
ssh root@akira-mgmt-01-staging '
export VAULT_ADDR=https://127.0.0.1:8200
vault operator unseal <key_1>
vault operator unseal <key_2>
vault operator unseal <key_3>
vault status
'
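Caveat: if the primary was initialized with the transit seal, it holds recovery keys rather than unseal keys, and recovery keys cannot unseal Vault while the transit seal is unavailable. A compatible transit Vault therefore remains part of the recovery model. Do not rotate or recreate transit keys during an incident unless you are following the documented setup path.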
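Whether this path can work at all is visible in vault status -format=json, which reports the seal type and a recovery_seal flag; recovery_seal is true for auto-unseal (for example transit) seals. A sketch, assuming bash and the CLI's pretty-printed JSON; manual_unseal_possible is a hypothetical name:

```shell
# Hypothetical pre-check for Cause C: `vault status -format=json` reports the
# seal "type" and a "recovery_seal" flag. If recovery_seal is true the primary
# uses an auto-unseal (e.g. transit) seal, holds recovery keys rather than
# unseal keys, and `vault operator unseal` cannot bring it up on its own.
manual_unseal_possible() {
  local json
  # vault status exits 2 while sealed but still prints the status JSON.
  json=$(VAULT_ADDR="${1:-https://127.0.0.1:8200}" vault status -format=json 2>/dev/null) || true
  if [ -z "$json" ]; then
    echo "unknown: vault status returned no output" >&2
    return 2
  fi
  if printf '%s\n' "$json" | grep -qE '"recovery_seal": *true'; then
    echo "no: recovery seal in use (auto-unseal)"
    return 1
  fi
  echo "yes: no recovery seal reported"
  return 0
}
```

A "no" result means the effort belongs in Cause A or B rather than in collecting Shamir keys.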
Validation
ssh root@akira-mgmt-01-staging '
VAULT_ADDR=https://127.0.0.1:8200 vault status | grep "Sealed.*false" || exit 1
docker compose -f /opt/akira/docker-compose.yml restart backend cdr-worker
sleep 5
docker ps --filter "name=akira-backend" --filter "name=akira-cdr-worker" \
--format "{{.Names}} {{.Status}}"
'
Expected:
- Primary Vault reports unsealed.
- Backend and CDR worker stay running.
- Deploys can decrypt and read required secrets.
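The second expectation can be checked mechanically from the docker ps output. A sketch, assuming bash on the host; containers_up is hypothetical, and it relies on Docker printing running containers with a status that starts with "Up" (a crash-looping container shows "Restarting" instead):

```shell
# Hypothetical helper: read `docker ps --format "{{.Names}} {{.Status}}"`
# output on stdin and succeed only if every expected name (optionally with a
# suffix such as -1) is listed with a status starting with "Up".
containers_up() {
  local out name missing=0
  out=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$out" | grep -qE "^${name}[^ ]* Up"; then
      echo "not running: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage on the primary: docker ps --format "{{.Names}} {{.Status}}" | containers_up akira-backend akira-cdr-worker. Run it a few seconds after the restart so a crash loop has time to show up.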
Escalation
| Elapsed Time | Action |
|---|---|
| T+5 min | Escalate if transit is sealed and keys are unavailable. |
| T+10 min | Escalate to Massimo if primary Vault remains sealed. |
| T+20 min | Escalate to Francesco and stop token/key changes until reviewed. |
Caveats
- Do not paste unseal keys into tickets, logs, or chat.
- Do not commit generated Vault tokens.
- Do not initialize a replacement transit Vault unless the current transit data is confirmed lost.
- Do not restart dependent services repeatedly while Vault remains sealed.