
ADR-0020 - SIP dynamic registration with digest auth

  • Status: Accepted (2026-06-01)
  • Deciders: Massimo Bagnoli, Claude
  • Implementation tasks: TASK-400, TASK-401, TASK-402, TASK-403, TASK-404, TASK-405, TASK-406, TASK-407
  • Supersedes: none
  • Superseded by: none

Context

Akira must support SIP devices that register dynamically instead of being authorized only by static originator IP allowlists. Opening UDP/TCP 5060 to the public Internet without a working registration-auth and fail2ban chain would expose Kamailio directly to scanner and brute-force traffic.

The design needs a stable authentication realm, a shared registration state for Kamailio HA, and a security gate before the public firewall rule is enabled.

Decision

Use SIP digest authentication for dynamic device registration with a stable logical realm: akira.asheep.it. The realm must not depend on $td or node IPs.

Store the digest HA1 exported from Akira into Kamailio htable data, not Redis. Kamailio checks REGISTER credentials against that htable and stores successful registrations in usrloc with db_mode=3.

Failover must reload usrloc on master promotion with ul.reload.

D1 — htable population mechanism (amended 2026-06-02)

The original D1 said only "export the HA1 into htable". Commissioning on sip-02 revealed the real mechanism splits in two, for a security reason:

  • subscriber htable (digest HA1) → PUSH. kam-sync decrypts each dynamic device's sip_ha1_encrypted in-process and issues kamcmd htable.sets subscriber "<sip_username>@<sip_realm>" "<ha1_hex>". The decrypted HA1 must never be written to disk or loaded through a db-backed reload: the HA1 is encrypted at rest (D3), so a file- or db-backed reload would either re-expose it in cleartext (for example in a subscriber.txt) or load the encrypted blob, which is useless to pv_auth_check. The lookup key must match the cfg exactly: REGISTER looks up $au@<kamailio_register_auth_realm> and INVITE looks up $au@$ar. A device's sip_realm must therefore equal the configured kamailio_register_auth_realm, and its sip_username must equal the digest username the device sends.
  • Routing htables → file export + db-backed reload (DEFERRED follow-up). originators_by_ip, originators_by_aor, orig_meta, company_balance_status, destinations, device_limits are exported to files and applied with kamcmd htable.reload. That reload is db-backed and currently returns 500 - No htable db_url because the htable definitions carry no db_url/dbtable. Wiring a db-backed reload (or moving these to push) is a call-path follow-up that does not gate dynamic registration; kam-sync logs the deferred gap and continues.
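The push for the subscriber htable can be sketched as follows. This only builds the kamcmd argument list described in the first bullet and enforces the key constraint (sip_realm equal to the configured register realm). The function names and the validation rules are assumptions made for illustration, not the actual kam-sync code.

```python
import re

# A digest HA1 is an MD5 hex string: 32 lowercase hex characters.
HA1_RE = re.compile(r"[0-9a-f]{32}")


def subscriber_key(sip_username: str, sip_realm: str) -> str:
    """Build the htable key in the same shape the cfg looks up: user@realm."""
    return f"{sip_username}@{sip_realm}"


def htable_sets_argv(sip_username: str, sip_realm: str, ha1_hex: str,
                     register_auth_realm: str) -> list[str]:
    """Return the argv for: kamcmd htable.sets subscriber <key> <ha1_hex>.

    Hypothetical helper. The HA1 stays in process memory and is passed
    as an argument; nothing is written to disk here.
    """
    if sip_realm != register_auth_realm:
        # REGISTER looks up $au@<kamailio_register_auth_realm>, so a
        # different realm would produce a key that never matches.
        raise ValueError("sip_realm does not match the register auth realm")
    if not HA1_RE.fullmatch(ha1_hex):
        raise ValueError("HA1 must be 32 lowercase hex characters")
    return ["kamcmd", "htable.sets", "subscriber",
            subscriber_key(sip_username, sip_realm), ha1_hex]


argv = htable_sets_argv("+390000000099", "akira.asheep.it",
                        "0" * 32, "akira.asheep.it")
print(argv[:4])
```

Passing an argument list rather than a shell string avoids quoting problems with usernames that contain characters such as "+".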

Related fix: kamcmd exits 0 even when an RPC returns an error (it prints error: ... to stdout), which silently masked these failures. kam-sync now inspects kamcmd output, not just the exit code.
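A minimal sketch of that output check, assuming (as stated above) that kamcmd reports RPC failures as a line starting with "error:" on stdout while still exiting 0. The helper names are hypothetical and the real kam-sync check may differ.

```python
import subprocess


def kamcmd_failed(returncode: int, stdout: str) -> bool:
    """Treat a non-zero exit OR an 'error:' line on stdout as failure."""
    if returncode != 0:
        return True
    return any(line.lstrip().lower().startswith("error:")
               for line in stdout.splitlines())


def run_kamcmd(argv: list[str]) -> str:
    """Run a kamcmd command line and raise if the RPC reported an error.

    Only the RPC method name (argv[1]) goes into the exception message,
    so an HA1 passed as an argument is not leaked into logs.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    if kamcmd_failed(proc.returncode, proc.stdout):
        method = argv[1] if len(argv) > 1 else "<unknown>"
        raise RuntimeError(f"kamcmd {method} failed: {proc.stdout.strip()}")
    return proc.stdout


# The deferred htable.reload gap would be detected despite exit code 0.
print(kamcmd_failed(0, "error: 500 - No htable db_url\n"))
```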

D4 — Registration stability & iOS background (amended 2026-06-03)

Commissioning the call path surfaced two things:

  • iOS-background reachability via SIP push is DEFERRED (product decision). Capture analysis of the Zoiper iOS client shows that it requests Expires: 60 and sends neither RFC 8599 push parameters (pn-provider/pn-prid/pn-param) nor +sip.instance. When iOS suspends the app there is no re-REGISTER, so the binding lapses (usrloc empties), and the server cannot wake a suspended app. A proper fix is a separate milestone, not a blocker for commissioning: it needs SIP push (APNs, a push gateway and pn-* handling) and a client that actually sends pn-*, i.e. Zoiper Push. Note: usrloc db_mode=3 is DB_ONLY, so kamcmd ul.dump is always empty; the authoritative state is the location DB table.
  • Commissioning uses a stable HEADLESS registrant (option a): a dedicated test device (+390000000099, NOT Massimo's id=5) re-REGISTERs every 45s (Expires 120, fixed Contact port) via a systemd timer on a tailnet host (scripts/sip-test-registrant/). This provides a stably registered originator for exercising the INVITE/LCR path without depending on a phone app staying in the foreground.
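The 45s refresh against Expires 120 leaves some slack, which the small calculation below makes explicit. It is only a sketch under simplifying assumptions: the registrar grants the requested Expires unchanged, the timer fires exactly on schedule, and the binding lapses once its age reaches Expires. The function name is hypothetical.

```python
def missed_refreshes_tolerated(expires_s: int, interval_s: int) -> int:
    """Consecutive missed re-REGISTERs a binding survives.

    With k refreshes missed in a row, the next successful one arrives
    (k + 1) * interval_s after the last good one; the binding survives
    while that is strictly less than expires_s.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if expires_s <= interval_s:
        raise ValueError("binding would lapse between regular refreshes")
    return (expires_s - 1) // interval_s - 1


# Headless registrant from the bullet above: Expires 120, refresh every 45s.
print(missed_refreshes_tolerated(120, 45))
```

Under those assumptions the headless registrant tolerates one missed timer run: the following refresh lands at 90s, still inside the 120s binding, while a second consecutive miss would push it to 135s.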

Call-path cfg fix (routing.cfg LOOKUP_ORIGINATOR): the success path of LOOKUP_REGISTERED_DEVICE returned after setting orig_id, but the caller then fell through to an unconditional 403 "Unknown originator", so every successfully authenticated dynamic device was rejected. The fall-through is now guarded with if ($avp(s:orig_id) != $null) return;. With this fix, an INVITE from a registered dynamic device traverses originator-id -> INVITE digest auth -> admission -> LCR. It now stops at 503 only because no terminator/dispatcher is configured and destinations is not yet loaded (db-backed reload, terminator and RTP are the remaining call-path milestones).

The implementation chain is:

  • TASK-400: device SIP password/HA1 data path.
  • TASK-401: Kamailio location database prerequisite.
  • TASK-402: REGISTER digest auth and usrloc configuration.
  • TASK-403: kam-sync export of subscriber HA1 into htable.
  • TASK-404: fail2ban/Kamailio detection repair and ban gate.
  • TASK-405: public 5060 opening only after the fail2ban ban is demonstrated.
  • TASK-406: end-to-end REGISTER verification.
  • TASK-407: INVITE authorization for registered device AORs.

TASK-404 is a hard gate for TASK-405: public SIP must not be opened until fail2ban demonstrably bans a controlled scanner/auth-fail source.

Rationale

A stable realm keeps HA1 valid across nodes, DNS changes and failover. Using htable keeps the authentication data path local to Kamailio and consistent with the existing kam-sync pattern. usrloc with db_mode=3 preserves registration state for the HA pair while allowing an explicit reload on failover.

Fail2ban is part of the security design, not an optional monitoring layer. The Kamailio logs must emit stable tokens that the fail2ban filters match, and the jails must read the same log path where Kamailio writes.
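The "stable token" requirement can be illustrated with a small matcher. The log line format and the SIP_AUTH_FAIL token below are hypothetical, invented for this sketch; the actual Kamailio log format and fail2ban filter are not shown in this ADR. In a real fail2ban filter the address group would be written with fail2ban's `<HOST>` placeholder rather than a named group.

```python
import re
from typing import Optional

# Hypothetical stable token plus source-IP field, assumed to be emitted
# by the Kamailio cfg on every failed digest authentication.
AUTH_FAIL_RE = re.compile(
    r"\bSIP_AUTH_FAIL\b.*?\bsrc_ip=(?P<host>\d{1,3}(?:\.\d{1,3}){3})\b"
)


def offender_ip(log_line: str) -> Optional[str]:
    """Return the source IP if the line carries the auth-failure token."""
    match = AUTH_FAIL_RE.search(log_line)
    return match.group("host") if match else None


# Made-up sample lines (203.0.113.0/24 is a documentation range).
bad = "ERROR: <script>: SIP_AUTH_FAIL method=REGISTER src_ip=203.0.113.7 au=1001"
good = "INFO: <script>: REGISTER ok src_ip=203.0.113.7 au=1001"

print(offender_ip(bad), offender_ip(good))
```

If the cfg changes the token or the field name without the filter changing with it, the matcher silently returns nothing, which is the failure mode the ban gate is meant to catch before 5060 is opened.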

Consequences

Positive

  • Registered SIP devices are no longer tied only to static source IPs.
  • HA1 remains stable because the realm is logical and node-independent.
  • The ban-before-open gate reduces exposure to scanner traffic before 5060 is made public.
  • usrloc state can survive node transitions through the database-backed mode.

Negative

  • Device authentication now depends on the kam-sync subscriber export being current on SIP nodes.
  • Operational rollout must keep TASK-405 blocked until TASK-404 has a real ban result from a controlled source.
  • Failover procedures must include ul.reload; otherwise registrations can lag on the promoted node.

References

  • TASK-400: device SIP password/HA1 data path.
  • TASK-401: Kamailio location table migration.
  • TASK-402: Kamailio registrar digest auth configuration.
  • TASK-403: kam-sync subscriber HA1 export.
  • TASK-404: fail2ban Kamailio detection repair.
  • TASK-405: public 5060 firewall rule.
  • TASK-406: Zoiper REGISTER E2E verification.
  • TASK-407: INVITE authorization for registered device AOR.