
Alertmanager setup

This runbook documents the secrets and manual checks required by the Akira Alertmanager on-call routing topology.

Vault variables

Store these values in the encrypted Ansible vault used by the staging inventory:

vault_telegram_alertmanager_token: "<dedicated-alertmanager-bot-token>"
vault_telegram_critical_chat_id: -1001234567890
vault_telegram_warning_chat_id: -1009876543210
vault_telegram_security_chat_id: -1001111111111
vault_slack_webhook_critical: "https://hooks.slack.com/services/..."
vault_alertmanager_smtp_host: smtp.asheep.it
vault_alertmanager_smtp_port: 587
vault_alertmanager_smtp_user: alertmanager@asheep.it
vault_alertmanager_smtp_password: "<smtp-password>"

vault_telegram_alertmanager_token is a dedicated Alertmanager bot token, separate from the AgentCore Telegram bot. Alertmanager-specific SMTP variables stay prefixed with vault_alertmanager_ to keep notification credentials separate from other SMTP integrations.

During deployment the Ansible role writes the three Telegram chat ids to these files and Alertmanager reads them through chat_id_file:

  • /etc/alertmanager/telegram_critical_chat_id
  • /etc/alertmanager/telegram_warning_chat_id
  • /etc/alertmanager/telegram_security_chat_id

Keep each value numeric, for example -1001234567890.
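The numeric requirement can be checked on the host after deployment. The following is a minimal sketch, not part of the Ansible role: is_numeric_chat_id and check_chat_id_files are hypothetical helper names, and the directory defaults to /etc/alertmanager as listed above.

```shell
# Hypothetical helper: succeed only for an optionally negative integer.
is_numeric_chat_id() {
  id="${1#-}"                   # drop one optional leading minus sign
  case "$id" in
    ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

# Hypothetical helper: print "ok <name>" or "bad <name>" for each chat id file
# and return non-zero if any value is not numeric.
check_chat_id_files() {
  dir="${1:-/etc/alertmanager}"
  status=0
  for name in critical warning security; do
    # Strip whitespace and the trailing newline before validating.
    value="$(tr -d '[:space:]' < "${dir}/telegram_${name}_chat_id")" || value=""
    if is_numeric_chat_id "$value"; then
      echo "ok ${name}"
    else
      echo "bad ${name}"
      status=1
    fi
  done
  return "$status"
}
```

Run check_chat_id_files with sudo-level read access if the files are not world-readable; a missing or empty file is reported as bad.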

Routing tree

  • severity="critical" goes to Telegram critical, Slack #akira-critical and email oncall@asheep.it.
  • severity="warning" goes to Telegram warning only.
  • type="security" goes to Telegram security and email security@asheep.it.
  • severity="info" goes to the local non-paging webhook sink.
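The routing tree above can be checked against the running configuration with amtool's route test subcommand, which prints the receivers a given label set would reach. This is a sketch under assumptions: AMTOOL, AM_URL and show_route are illustrative names, and the receiver names printed depend on the deployed configuration, so no expected output is shown.

```shell
# Illustrative variables, not defined by the role.
AMTOOL="${AMTOOL:-amtool}"
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# show_route LABEL=VALUE...
# Asks Alertmanager which receiver(s) an alert with these labels would reach.
show_route() {
  "$AMTOOL" --alertmanager.url="$AM_URL" config routes test "$@"
}
```

For example, show_route severity=critical should list the critical receivers, and show_route severity=warning type=security shows how the deployed tree resolves an alert that carries both labels.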

Telegram chat id

  1. Create these private Telegram groups: akira-critical, akira-warning and akira-security.
  2. Add the dedicated Alertmanager Telegram bot to each group.
  3. Send a test message in each group.
  4. Read the bot's updates through the Telegram Bot API using the bot token, or use a trusted chat id helper bot.
  5. Copy each negative group id, usually shaped as -100..., into the matching vault variable.

For a direct API check:

curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates" | jq
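The raw getUpdates output can be long. As a sketch, the jq filter below reduces it to one line per group with the chat id and title. list_group_chats is a hypothetical helper name; the filter assumes the group appears in a message or my_chat_member update, so an empty result usually means no recent update mentions the group yet.

```shell
# Hypothetical helper: read a getUpdates JSON response on stdin and print
# "<chat id><TAB><title>" once per group or supergroup found in it.
list_group_chats() {
  jq -r '.result[]
    | (.message // .my_chat_member // empty)
    | .chat
    | select(.type == "group" or .type == "supergroup")
    | "\(.id)\t\(.title)"' | sort -u
}
```

Usage, with the same TELEGRAM_BOT_TOKEN variable as the command above: pipe the curl output into list_group_chats instead of plain jq.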

Slack webhook

Create a Slack app with an incoming webhook bound to #akira-critical and store the webhook URL in vault_slack_webhook_critical. The webhook is used only for critical alerts.

SMTP password

  1. Prepare SMTP credentials for alertmanager@asheep.it.
  2. Confirm SMTP submission on port 587 with TLS.
  3. Store the SMTP password in vault_alertmanager_smtp_password.
  4. Keep vault_alertmanager_smtp_user equal to alertmanager@asheep.it.
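Step 2 can be confirmed from any host that can reach the SMTP server. The sketch below uses openssl s_client to negotiate STARTTLS on the submission port. It verifies only that the TLS handshake completes, not that the credentials are accepted. OPENSSL and smtp_starttls_check are illustrative names, and the -brief flag assumes OpenSSL 1.1.0 or newer.

```shell
# Illustrative variable so the binary can be overridden.
OPENSSL="${OPENSSL:-openssl}"

# smtp_starttls_check [HOST] [PORT]
# Opens a connection, upgrades it with STARTTLS and prints a short handshake summary.
smtp_starttls_check() {
  host="${1:-smtp.asheep.it}"
  port="${2:-587}"
  # stdin is /dev/null so s_client exits once the handshake is done.
  "$OPENSSL" s_client -starttls smtp -connect "${host}:${port}" \
    -servername "$host" -brief </dev/null
}
```

A non-zero exit status or a certificate verification error in the summary indicates that the TLS requirement in step 2 is not met.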

Deploy check

Render and validate the role through Ansible. The role template task uses amtool check-config as its validation command before installing the configuration.

After deployment, confirm the service state, the chat id files and the loaded configuration:

systemctl status alertmanager --no-pager
sudo test -s /etc/alertmanager/telegram_critical_chat_id
sudo test -s /etc/alertmanager/telegram_warning_chat_id
sudo test -s /etc/alertmanager/telegram_security_chat_id
amtool --alertmanager.url=http://127.0.0.1:9093 config show

Smoke alert

Run the smoke script from a host with amtool and access to the local Alertmanager API:

tests/test_alertmanager_routing.sh

By default the script injects a critical alert named AkiraSmokeTest. Wait at least group_wait plus network latency, then confirm delivery in the Telegram critical group, Slack #akira-critical and the on-call mailbox. Resolve or silence the test alert after verification if needed.

Optional checks:

TEST_SEVERITY=warning tests/test_alertmanager_routing.sh
TEST_SEVERITY=warning TEST_TYPE=security tests/test_alertmanager_routing.sh
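If the smoke script is not available on the host, a comparable alert can be injected by hand with amtool alert add. This is a sketch of a manual equivalent, not the contents of tests/test_alertmanager_routing.sh: fire_test_alert, AMTOOL, AM_URL and the summary annotation are assumptions made for illustration.

```shell
# Illustrative variables, not defined by the smoke script.
AMTOOL="${AMTOOL:-amtool}"
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# fire_test_alert [SEVERITY] [EXTRA_LABEL=VALUE...]
# Injects an alert named AkiraSmokeTest with the given severity (default: critical).
fire_test_alert() {
  severity="${1:-critical}"
  if [ "$#" -gt 0 ]; then shift; fi
  "$AMTOOL" --alertmanager.url="$AM_URL" alert add \
    --annotation=summary="Manual routing smoke test" \
    alertname=AkiraSmokeTest severity="$severity" "$@"
}
```

fire_test_alert with no arguments mirrors the default critical case, and fire_test_alert warning type=security mirrors the second optional check.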

Info alerts

ADR-0014 routes severity="info" to a local webhook receiver at http://localhost:8888/log. That endpoint is the intended sink that writes info alerts to /var/log/alertmanager-info.log. Info alerts are non-paging by design, so if the sink is not deployed yet, delivery to the local webhook may fail without paging anyone.