Deploy Akira Signaling Layer

Pre-flight checks

  1. Verify all Trigger #2 VMs are online; the count should match the number of VMs you expect:
    hcloud server list | grep -c staging
  2. Verify Tailscale/SSH reachability:
    ansible trigger2 -i infra/inventory/staging.yml -m ping
  3. Verify Hetzner firewall rules are attached in the Hetzner console.
  4. Verify MagicDNS names resolve on Trigger #2 hosts:
    ansible trigger2 -i infra/inventory/staging.yml -m ansible.builtin.command -a "hostname -f"
  5. Build the Kamailio htable sync wheelhouse used by step 4:
    scripts/build_kam_sync_wheelhouse.sh
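
The count from check 1 only helps if it is compared against a known number. A minimal sketch of that comparison follows; check_vm_count is a hypothetical helper, and the expected count is an operator-supplied assumption because this runbook does not state how many Trigger #2 VMs exist.

```shell
# Sketch only: check_vm_count is a hypothetical helper, not part of the repo.
# The expected count must come from the operator; the runbook does not state it.
check_vm_count() {
  actual="$1"
  expected="$2"
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual staging VMs listed"
  else
    echo "fail: expected $expected staging VMs, found $actual" >&2
    return 1
  fi
}

# Example invocation (commented out so that sourcing this file has no side effects):
# check_vm_count "$(hcloud server list | grep -c staging)" "$EXPECTED"
```

Note that grep -c counts every listed server whose line contains "staging", whatever its state, so a matching count is a necessary check rather than proof that each VM is running.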

Deploy

Run the full signaling orchestration from the repo root:

ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt

deploy_signaling.yml loads infra/group_vars/all/main.yml and infra/group_vars/all/vault.yml through vars_files; do not pass those files again with --extra-vars.

Expect the run to take about 15-25 minutes, most of it spent on package installation, service restarts, and health checks.

Step-by-step debug

Each orchestration phase has its own tag, so you can run a single step, several steps, or only the smoke test:

ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags step1
ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags step3,step4
ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags smoke

Use --limit for a single host while debugging:

ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags step4 --limit akira-sip-01
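
Because the debug invocations differ only in their tags and the optional host limit, a small wrapper can cut down on retyping. The following is a sketch under that assumption; signaling_cmd is a hypothetical name, and it prints the command for review rather than executing it.

```shell
# Sketch only: signaling_cmd is a hypothetical helper, not part of the repo.
# It prints the ansible-playbook command line instead of running it.
signaling_cmd() {
  tags="$1"        # comma-separated tags, e.g. "step3,step4"
  limit="${2:-}"   # optional single host, e.g. "akira-sip-01"
  cmd="ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml"
  cmd="$cmd --vault-password-file ~/.akira-vault-pass.txt --tags $tags"
  if [ -n "$limit" ]; then
    cmd="$cmd --limit $limit"
  fi
  printf '%s\n' "$cmd"
}
```

For example, signaling_cmd step4 akira-sip-01 prints the same command as the --limit example above, on a single line.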

Rollback

For one problematic node, stop and disable the affected service:

ansible <node> -i infra/inventory/staging.yml -b -m ansible.builtin.systemd -a "name=<service> state=stopped enabled=false"

Service names used by this playbook:

  • fail2ban
  • rtpengine
  • freeswitch
  • fs-esl-gateway
  • kamailio
  • kamailio-htable-sync.timer
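
When a node runs more than one of these services, the stop-and-disable command has to be repeated per service. A minimal sketch that generates one command per service is shown here; stop_services_cmds is a hypothetical helper, and since this runbook does not map services to nodes, the caller passes the service names explicitly.

```shell
# Sketch only: stop_services_cmds is a hypothetical helper, not part of the repo.
# Arguments: a node name followed by one or more service names.
# It prints one ad-hoc ansible command per service and executes nothing.
stop_services_cmds() {
  node="$1"
  shift
  for service in "$@"; do
    printf '%s\n' "ansible $node -i infra/inventory/staging.yml -b -m ansible.builtin.systemd -a \"name=$service state=stopped enabled=false\""
  done
}
```

Calling stop_services_cmds akira-sip-01 kamailio prints the rollback command for Kamailio on that node; review the output before running it.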

For a full Trigger #2 staging rollback, power off the Trigger #2 VMs in Hetzner. Firewall and tailnet configuration can remain in place for retry.

Post-deploy smoke test

Step 7 runs a single SIPp UAC call that is expected to succeed:

ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags step7

Step 6 normally verifies backend health without restarting it. ADR-0011 moved FreeSWITCH ESL handling to fs-esl-gateway, deployed on the FreeSWITCH nodes in step 3. If a deploy also changes backend environment variables, explicitly request a Docker Compose backend reload:

ansible-playbook -i infra/inventory/staging.yml infra/playbooks/deploy_signaling.yml \
--vault-password-file ~/.akira-vault-pass.txt --tags step6 \
-e deploy_signaling_requires_backend_reload=true

If the smoke test fails, check these first:

ansible akira-sip-01 -i infra/inventory/staging.yml -b -m ansible.builtin.command -a "journalctl -u kamailio -n 50 --no-pager"
ansible akira-sip-01 -i infra/inventory/staging.yml -b -m ansible.builtin.command -a "kamcmd htable.dump destinations"
ansible akira-fs-01 -i infra/inventory/staging.yml -b -m ansible.builtin.command -a "journalctl -u freeswitch -n 50 --no-pager"
ansible akira-rtp-01 -i infra/inventory/staging.yml -b -m ansible.builtin.command -a "journalctl -u rtpengine -n 50 --no-pager"
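
The three journalctl checks above share one shape and differ only in host and unit. A sketch that generates them from host:unit pairs is shown here; journal_cmds is a hypothetical helper name, and the pair format is an illustrative convention, not something the playbook defines.

```shell
# Sketch only: journal_cmds is a hypothetical helper, not part of the repo.
# Each argument is a "host:unit" pair; one journalctl command is printed per pair.
journal_cmds() {
  for pair in "$@"; do
    host="${pair%%:*}"   # text before the first colon
    unit="${pair#*:}"    # text after the first colon
    printf '%s\n' "ansible $host -i infra/inventory/staging.yml -b -m ansible.builtin.command -a \"journalctl -u $unit -n 50 --no-pager\""
  done
}
```

Calling journal_cmds akira-sip-01:kamailio akira-fs-01:freeswitch akira-rtp-01:rtpengine prints the same three log commands, which makes it easier to add another host or unit while debugging.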

For packet-level SIP debugging during a re-run:

ansible akira-sip-01 -i infra/inventory/staging.yml -b -m ansible.builtin.command -a "tcpdump -i any port 5060 -nn -c 50"

Post-deploy alerting

After the full deploy, verify that the signaling dashboard has been imported into Grafana and covers Kamailio CPS, FreeSWITCH sessions, and RTPengine media port usage.