ADR-0026 - Benchmark external loop simulator

Status: Accepted (2026-06-04)
Deciders: Massimo Bagnoli
Implementation tasks: TASK-458, TASK-459, TASK-474
Depends on: ADR-0025, AX loop agreement and whitelist
Supersedes: none
Superseded by: none

Context

The Benchmark module was originally treated as an internal latency self-test for the signaling/media pipeline. That is not the product capability required by doc chapter 42.2 and the commissioning direction from Massimo.

Benchmark is the traffic Loop Simulator: an operational tool that creates traffic jobs, drives ASR and ACD targets, ramps CPS over a duration or 24-hour curve, and records runtime metrics so reports, billing, routing, and invoicing can be validated under volume.

The new requirement is stronger than a local synthetic smoke test. Akira must be able to generate real calls toward AX, let AX route those calls back into Akira as if they came from a customer, and then process them through the normal inbound path. This closed loop is the only realistic way to stress reporting, rating, billing, and invoicing with carrier-like volumes while preserving the same behavior as live traffic.

This ADR defines the canonical architecture for that capability. The engine is implemented separately in TASK-474; UI alignment is handled by TASK-458 and TASK-459.

Decision

1. Benchmark is a traffic test tool, not only a self-test.

The module supports two modes:
- synthetic_local: a local generator mode for smoke and low-volume diagnostics. This mode can validate runner wiring and basic reporting ingestion without involving AX.
- external_loop_ax: a closed-loop traffic mode. Akira originates calls through a dedicated test originator or trunk, routes them through the normal routing and LCR path to the AX terminator, AX routes the traffic back into Akira as customer inbound traffic, and Akira rates, counts, and reports those calls through the real production-like path.

The old internal latency self-test may exist as a separate diagnostic tool, but it is not the canonical Benchmark product surface.

2. SIPp is the canonical traffic generator.

The Benchmark engine uses SIPp on the origin side, with controlled scenarios and explicit CPS ramping. SIPp is preferred over FreeSWITCH originate/ESL for Benchmark load generation because it gives deterministic call profiles, scenario files, call-rate control, CSV-driven destinations, and predictable failure statistics.

FreeSWITCH originate/ESL remains useful for targeted diagnostics and operational single-call actions, but it is not the default Benchmark engine.
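To illustrate what explicit CPS ramping means in practice, here is a minimal sketch of a linear ramp schedule. The function name `cps_at`, its parameters, and the linear shape are assumptions made for illustration; this ADR does not prescribe the ramp formula, and the way the resulting rate is applied to SIPp is deliberately left out.

```python
def cps_at(elapsed_s: float, start_cps: float, target_cps: float,
           ramp_s: float, max_cps: float) -> float:
    """Return the call rate to apply `elapsed_s` seconds into a job.

    Hypothetical helper: ramps linearly from start_cps to target_cps over
    ramp_s seconds, then holds target_cps. The result never exceeds max_cps.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        planned = target_cps
    else:
        progress = max(elapsed_s, 0.0) / ramp_s
        planned = start_cps + (target_cps - start_cps) * progress
    # Clamp to a hard ceiling regardless of what the plan asks for.
    return min(planned, max_cps)
```

Clamping inside the schedule function keeps the ceiling independent of whatever plan the operator submits.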

All generated traffic must use:
- a dedicated originator or trunk reserved for Benchmark traffic;
- recognizable caller IDs and numbering ranges;
- scenario metadata that allows correlation between SIPp stats, CDRs, and Benchmark job records.

3. Benchmark jobs are persistent operational jobs.

A job stores at least:
- mode: synthetic_local or external_loop_ax;
- target ASR and target ACD;
- CPS plan: fixed CPS, ramp, or 24-hour curve;
- duration and optional volume cap;
- destination set or destination profile;
- originator or trunk used for the test;
- status: scheduled, running, done, failed, aborted;
- runtime metrics: effective CPS, ASR, ACD, active calls, latency/PDD where available, completed calls, failed calls, and abort reason.
This model is the backend counterpart of TASK-458 and TASK-459: the main UI is a persistent job table, the creation flow is a wizard, and the job detail or row shows effective ASR/ACD/CPS/latency metrics.
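The fields listed above could be modelled roughly as follows. This is a sketch only: the class and attribute names, the units, and the defaults are assumptions, and the real schema belongs to TASK-474.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(str, Enum):
    SYNTHETIC_LOCAL = "synthetic_local"
    EXTERNAL_LOOP_AX = "external_loop_ax"


class Status(str, Enum):
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    ABORTED = "aborted"


@dataclass
class RuntimeMetrics:
    effective_cps: float = 0.0
    asr: float = 0.0                 # answer-seizure ratio, 0..1 (assumed unit)
    acd_s: float = 0.0               # average call duration in seconds (assumed unit)
    active_calls: int = 0
    pdd_ms: Optional[float] = None   # only where available
    completed_calls: int = 0
    failed_calls: int = 0
    abort_reason: Optional[str] = None


@dataclass
class BenchmarkJob:
    mode: Mode
    target_asr: float
    target_acd_s: float
    cps_plan: str                    # "fixed", "ramp", or "curve_24h" (hypothetical labels)
    duration_s: int
    destination_profile: str
    originator: str                  # dedicated test originator or trunk
    volume_cap: Optional[int] = None
    status: Status = Status.SCHEDULED
    metrics: RuntimeMetrics = field(default_factory=RuntimeMetrics)
```

Keeping runtime metrics in a nested structure separates what the operator requested from what the engine measured, which is the distinction the job table and job detail views need.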

4. Safety controls are mandatory before external loop execution.

The engine must refuse to start external-loop traffic unless all safety controls are configured and visible to operators:
- maximum CPS cap enforced server-side;
- maximum concurrent calls cap enforced server-side;
- maximum total call volume cap per job;
- immediate kill-switch that stops SIPp processes and marks the job aborted;
- timeout or lease on every running job so orphaned generators cannot run indefinitely;
- dedicated test originator/trunk and CDR tagging with is_test=true, or an equivalent canonical isolation mechanism;
- report and billing filters that can include or exclude test traffic explicitly;
- AX-approved test window, allowed volume, and loop routing configuration.
Test traffic must never be mixed indistinguishably with real customer traffic. If is_test=true is not yet available end to end, the dedicated originator/trunk and reserved numbering range are mandatory until the CDR tag is implemented.
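The refuse-to-start rule can be sketched as a pre-flight check that lists every missing control. The `SafetyConfig` fields and the function names below are hypothetical; they mirror the list above one to one and are not the engine's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SafetyConfig:
    max_cps: Optional[float] = None
    max_concurrent_calls: Optional[int] = None
    max_total_calls: Optional[int] = None
    kill_switch_ready: bool = False
    lease_seconds: Optional[int] = None
    test_isolation_ready: bool = False   # dedicated originator/trunk plus is_test tag or equivalent
    report_filters_ready: bool = False
    ax_window_approved: bool = False


def missing_controls(cfg: SafetyConfig) -> List[str]:
    """Return the names of the safety controls that are not configured."""
    checks = [
        ("max_cps", cfg.max_cps is not None and cfg.max_cps > 0),
        ("max_concurrent_calls", cfg.max_concurrent_calls is not None and cfg.max_concurrent_calls > 0),
        ("max_total_calls", cfg.max_total_calls is not None and cfg.max_total_calls > 0),
        ("kill_switch_ready", cfg.kill_switch_ready),
        ("lease_seconds", cfg.lease_seconds is not None and cfg.lease_seconds > 0),
        ("test_isolation_ready", cfg.test_isolation_ready),
        ("report_filters_ready", cfg.report_filters_ready),
        ("ax_window_approved", cfg.ax_window_approved),
    ]
    return [name for name, ok in checks if not ok]


def can_start_external_loop(cfg: SafetyConfig) -> bool:
    """External-loop traffic may start only when nothing is missing."""
    return not missing_controls(cfg)
```

Returning the list of missing controls, rather than a bare boolean, lets the UI show operators exactly why a job was refused.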

5. ADR-0025 rating gate is a prerequisite for meaningful volume validation.

External-loop Benchmark traffic is intended to validate that admission, routing, rating, CDR aggregation, reports, billing, and invoicing behave correctly under load. Therefore ADR-0025 must be active before Benchmark results are treated as commercially valid.

Without the rating gate, the simulator may still be used for infrastructure smoke tests, but not as proof that billing and LCR admission are correct under volume.

6. AX loop configuration is an operational dependency, not an Akira-only toggle.

The external_loop_ax mode requires carrier coordination. AX must whitelist the Akira media/signaling sources, configure the return path into Akira, and agree on the allowed test window, CPS, concurrent calls, and destination profiles.

Known commissioning context:
- AXCOM terminator id 2, provider WIND;
- Akira SIP nodes 178.105.149.58 and 178.105.159.145;
- media IP already whitelisted in current notes;
- fail2ban protects signaling, so Benchmark source IPs and expected volumes must be aligned before high-volume tests.

Consequences

Positive

Benchmark becomes a realistic traffic simulator instead of a cosmetic latency check.
Reports, rating, billing, and invoicing can be tested with closed-loop traffic that follows the same path as customer calls.
SIPp gives repeatable CPS, ramp, scenario, and failure statistics.
Dedicated test identifiers and is_test=true isolation allow operators to include or exclude Benchmark traffic deliberately.
TASK-458 and TASK-459 have a stable backend contract to align to.

Negative

External-loop tests can create real carrier cost on AX/WIND routes.
Misconfigured CPS or concurrency can overload Akira, AX, or intermediate signaling/media components.
fail2ban or upstream anti-abuse controls can block test sources if the window and volume are not coordinated.
Billing and reporting queries must handle test-traffic inclusion and exclusion explicitly.
The engine must manage OS processes and cleanup; orphaned SIPp processes are an operational risk.

Operational notes

Operators must treat external-loop jobs like controlled load tests, not normal UI actions.
Default external-loop caps must be conservative until staging measurements prove higher limits.
The kill-switch must be available even when the normal job worker is degraded.
Every external-loop job must leave enough audit data to reconstruct who started it, which scenario ran, which originator/trunk was used, and which AX window was approved.
Test CDRs should be excluded from operational reports by default and shown only when the report explicitly opts into test traffic.
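A minimal sketch of the opt-in rule for test traffic, assuming CDRs are available as dictionaries carrying a boolean `is_test` key. The function name and the record shape are assumptions; a real report would more likely apply the same predicate in its query.

```python
from typing import Any, Dict, Iterable, List


def select_cdrs(cdrs: Iterable[Dict[str, Any]], include_test: bool = False) -> List[Dict[str, Any]]:
    """Return the CDRs a report should see.

    Test traffic is excluded unless the caller opts in explicitly.
    A CDR without an is_test key is treated as real traffic.
    """
    return [c for c in cdrs if include_test or not c.get("is_test", False)]
```

Defaulting `include_test` to False means a report that forgets to state its intent never counts Benchmark calls by accident.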

Alternatives considered

Keep Benchmark as an internal latency self-test

Rejected. A self-test can prove that a path is alive, but it does not validate ASR/ACD behavior, CPS ramps, CDR aggregation, rating, billing, invoicing, or carrier loop behavior under volume.

FreeSWITCH originate/ESL as the primary generator

Rejected as the default Benchmark engine. FreeSWITCH originate is useful for single-call diagnostics and FreeSWITCH-centric checks, but SIPp is a better fit for controlled scenarios, CPS ramps, CSV destination sets, and repeatable load statistics.

Purely internal synthetic generator only

Rejected for commercial validation. Internal-only traffic avoids carrier cost and coordination, but it skips the AX return path and therefore cannot prove that inbound customer-like traffic, reports, billing, and invoices behave like live traffic.

External AX loop without test isolation

Rejected. Mixing generated calls with live traffic without is_test=true, dedicated originators, or reserved numbering would make reports and invoices ambiguous and create unacceptable accounting risk.

References

ADR-0011: FreeSWITCH ESL bridge pattern.
ADR-0017: Billing rating engine integration in cdr-worker.
ADR-0025: Rating-gated admission and LCR.
TASK-458: Benchmark paradigm jobs table.
TASK-459: Benchmark wizard and runtime metrics.
TASK-474: Benchmark loop engine skeleton.

Amendment 2026-06-08 — Ring simulation, answered_at billing, FAS guard

7. Human-like ring simulation + answered_at billing (IMPLEMENTED, commit 73d05747)

Why this matters (was a recurring point of confusion — documented here for good):

In the loop the answerer is a robot (the VAS FreeSWITCH answer-bot), not a person. By default it answers (200) or rejects (486/480) instantly → there is no ringing → pdd_ms = 0 on benchmark CDRs. A real call rings a real phone for seconds before a human answers, so PDD > 0.
Billing is from ANSWER to hangup; ringing is never billed. On the benchmark, because ring ≈ 0, total duration == connected duration, so the old INVITE-based fallback happened to bill correctly. On a real call ring > 0, so billing from INVITE (the fallback used when answered_at is missing) would wrongly charge the ring seconds.
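The difference between the two billing bases can be shown with a small calculation. The function names are hypothetical, and rounding up to the next whole second is an assumption made for illustration; the actual rounding rule is defined by the rating engine, not by this ADR.

```python
import math


def billsec_from_answer(answered_ts: float, hangup_ts: float) -> int:
    """Billable seconds from ANSWER to hangup; ring time is excluded."""
    return math.ceil(max(hangup_ts - answered_ts, 0.0))


def billsec_from_invite(invite_ts: float, hangup_ts: float) -> int:
    """Old fallback: counts from INVITE, so ring seconds are included."""
    return math.ceil(max(hangup_ts - invite_ts, 0.0))


# Robot answerer with no ring: both bases agree.
print(billsec_from_answer(0.0, 8.9), billsec_from_invite(0.0, 8.9))

# Call with a 5s ring and 8.9s connected: the INVITE base overcharges.
print(billsec_from_answer(5.0, 13.9), billsec_from_invite(0.0, 13.9))
```

With instant answers the two functions return the same value, which is why the fallback looked correct on benchmark CDRs until ringing was introduced.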

Decision (done):

Ring is simulated in the VAS to make the benchmark human-like: the ANSWERED branch rings 3–12s (peak ~5s) before the 200; NO_ANSWER rings 18–25s before the 480; BUSY is immediate. The ring duration comes from a backend endpoint, GET /api/v1/benchmark/ring (like /roll), never from FS ${rand}, which is unreliable. The VAS sleeps for ${curl(.../ring)} before answering or rejecting.
answered_at is captured in Kamailio (onreply_route[REPLY_HANDLER] on the 200 OK: $dlg_var(cdr_answered_ts)=$Ts, added to acc/acc_json cdr_extra; the kam-cdr-bridge + cdr-worker already consume answered_at). Billing = answered_at → hangup (ring excluded). Verified end-to-end: total ~14s (5s ring + 8.9s connected) → billsec 9 (ring NOT billed).
Side effect (positive): calls now occupy the channel during ring too → concurrency/curve are more realistic. answered_at capture also benefits real customers from day one (answer→hangup billing automatically once real ringing traffic arrives).
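A sketch of how a ring-duration sampler matching the ranges above might look. The triangular distribution for the ANSWERED branch and the uniform distribution for NO_ANSWER are assumptions; the text only fixes the ranges and the approximate peak, and the real logic lives behind the /ring endpoint.

```python
import random


def ring_seconds(outcome: str, rng: random.Random) -> float:
    """Sample a simulated ring duration in seconds for a call outcome."""
    if outcome == "ANSWERED":
        # 3-12s with the mode near 5s (triangular shape is an assumption).
        return rng.triangular(3.0, 12.0, 5.0)
    if outcome == "NO_ANSWER":
        # 18-25s before the 480 (uniform shape is an assumption).
        return rng.uniform(18.0, 25.0)
    if outcome == "BUSY":
        # Busy is rejected immediately, with no ring.
        return 0.0
    raise ValueError(f"unknown outcome: {outcome}")


rng = random.Random(42)  # seeded only to make the demo repeatable
for outcome in ("ANSWERED", "NO_ANSWER", "BUSY"):
    print(outcome, round(ring_seconds(outcome, rng), 2))
```

Passing the random generator in as an argument keeps the sampler deterministic under test.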

8. FAS / false-answer + loop-integrity guard (DESIGN — ADR-0029, TASK-516/517)

Loop invariant: every outbound benchmark call MUST return through Akira and be answered by the VAS. The VAS is the only legitimate answerer. Failure modes to kill fast:

FAS / false answer: AX (or an intermediate) answers the outbound leg (200) WITHOUT the call returning to the VAS → fake billing. Kill ≤1s.
Black-hole: AX routes the call somewhere that never returns → outbound leg rings forever or gets a fake 200. Kill (CANCEL 487 if ringing).
Should-have-answered-but-not-returned: ringing but the return leg never reached the VAS → kill.

The full mechanism (token in CLI that survives AX reflection; VAS Redis claim on answer; Kamailio outbound-200 onreply checks the claim → BYE if missing; ringing watchdog → CANCEL 487) is specified in ADR-0029 (TASK-516) and implemented in TASK-517.

Amendment 2026-06-08 — FAS guard IMPLEMENTED (in-path Kamailio)

The section-8 design is now implemented. Two iterations were needed; recording both so the dead-ends are not retried.

What failed: external dlg.list-polling watchdog with a `(CLI, B-number)` correlator

The first implementation was a stdlib daemon that polled kamcmd dlg.list, correlated each A-leg (orig 6) with its return B-leg (orig 1) by the (CLI, B-number) pair, and called dlg.end_dlg on A-legs that had no matching return leg (only-A legs). It worked at low load and collapsed at 120 channels: in a 10-minute, 120-channel run, 92 FAS calls survived for more than 1s, the longest for 181s. Root causes:

Non-unique correlator. Only |A|×|B| pairs exist (16×8=128). Under sustained load the return-leg dialog set — kept alive 3–25s by the simulated ring — saturates the pair space, so a FAS A-leg gets matched to another call's return leg and is never seen as only-A. Detection silently stops.
kill-once + pre-ACK teardown. dlg.end_dlg (like dlg_bye) cannot build a BYE for a dialog in CONFIRMED_NA (200 seen, ACK not yet) → fails; the kill-once guard then never retries.

Lesson: a poll+pair correlator is the wrong tool. Correlation must be a per-call UNIQUE token and the teardown must run in-path.
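The saturation effect can be estimated with a back-of-the-envelope calculation. The sketch below assumes that each live return leg occupies a (CLI, B-number) pair chosen uniformly and independently from the 16×8 pool, which is a simplification, and the function name is hypothetical.

```python
def pair_collision_probability(live_return_legs: int,
                               cli_pool: int = 16,
                               b_pool: int = 8) -> float:
    """Probability that a given pair is already held by at least one
    live return leg, under uniform independent pair selection."""
    pairs = cli_pool * b_pool
    return 1.0 - (1.0 - 1.0 / pairs) ** live_return_legs


for legs in (10, 60, 120):
    print(legs, round(pair_collision_probability(legs), 3))
```

Under these assumptions, with about 120 return legs alive at once a FAS A-leg has roughly a 0.61 chance of finding some other call's return leg on its own pair, so the poller does not see it as only-A.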

What works: per-call unique CLI + in-path Kamailio teardown + `-rsa` B2BUA-lite

Unique CLI per call (sipp_orchestrator _unique_cli): keep the base number's realistic leading digits, overwrite a 7-digit tail with job%100 + call%100000. The CLI round-trips identically through AX, so it is a collision-proof correlator regardless of concurrency. (The (CLI,Bnum) pair pool is now only for route/realism variety, not detection.)
loopseen htable (size=14;autoexpire=120): the AX return leg (orig 1 → VAS) marks loopseen[<unique CLI>]. Causal invariant: the return-leg INVITE precedes the A-leg's 200 (the 200 propagates from the VAS answer), so a genuine answer always finds its marker → zero false positives; a FAS never sets it → detected.
In-path teardown in routing.cfg (all gated on the benchmark originator, real traffic untouched):
- event_route[dialog:start] + the in-dialog ACK hook: BYE a no-loop A-leg the instant it is ACK-confirmed (~0s billable);
- t_set_fr on the A-leg's ringing reply → CANCEL 487 during ring when no loop returned;
- onreply drop of the false 200 + a 1s dialog-timeout backstop.
-rsa (B2BUA-lite). A carrier running a FAS may strip our Record-Route on its fake 200; SIPp would then ACK/BYE the carrier Contact directly, bypassing Kamailio, leaving the dialog CONFIRMED_NA forever (no BYE buildable — this was the last residual: 1 FAS at 36s). Pinning SIPp's send address to Kamailio (-rsa <target>) keeps the proxy on the in-dialog path; a has_totag() handler relays the route-less sequential request (it would otherwise hit the 405). The ACK then confirms the dialog so every teardown can build its BYE. No-op for Record-Route-honoured (legit) calls.
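The unique-CLI correlator and the loopseen check can be modelled in a few lines of Python. This is an illustrative model, not the sipp_orchestrator code or the Kamailio configuration: the function and class names are hypothetical, the tail is assumed to be the 2-digit job%100 concatenated with the 5-digit call%100000, and the dictionary stands in for the htable with its 120s autoexpire.

```python
from typing import Dict


def unique_cli(base_number: str, job_id: int, call_index: int) -> str:
    """Keep the leading digits of base_number and overwrite the last 7."""
    tail = f"{job_id % 100:02d}{call_index % 100000:05d}"
    if len(base_number) <= len(tail):
        raise ValueError("base number too short to keep leading digits")
    return base_number[:-len(tail)] + tail


class LoopSeen:
    """Toy stand-in for the loopseen htable (marker with expiry)."""

    def __init__(self, ttl_s: float = 120.0) -> None:
        self._ttl_s = ttl_s
        self._expiry: Dict[str, float] = {}

    def mark(self, cli: str, now: float) -> None:
        # Called when the AX return leg reaches the VAS.
        self._expiry[cli] = now + self._ttl_s

    def seen(self, cli: str, now: float) -> bool:
        expiry = self._expiry.get(cli)
        return expiry is not None and now < expiry


def is_false_answer(table: LoopSeen, cli: str, now: float) -> bool:
    """A 200 on the A-leg with no loop marker is treated as FAS."""
    return not table.seen(cli, now)


table = LoopSeen()
genuine = unique_cli("393331234567", job_id=142, call_index=7)
fake = unique_cli("393331234567", job_id=142, call_index=8)
table.mark(genuine, now=10.0)  # the return-leg INVITE arrives first
print(genuine, is_false_answer(table, genuine, now=15.0))
print(fake, is_false_answer(table, fake, now=15.0))
```

Because the marker is keyed by a per-call value, a marker set by one call can never vouch for another, which is the property the pair-based correlator lacked.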

The external watchdog stays as a secondary net (now reliable: unique CLI ends saturation, -rsa ends CONFIRMED_NA), mainly for the ring-forever-never-answer leak that has no 200 to react to.

Deploy: infra/playbooks/deploy_kamailio_config.yml (Kamailio-only, lint-gated) + the benchmark runner playbook (daemon). NB: group_vars/signaling.yml is now loaded explicitly by the signaling plays — it was never auto-loaded, so the kamailio_benchmark_* (and the dlg zombie-timeout) blocks had silently rendered empty until this fix.

Result: 120ch / 10-min / 10% AX FAS: escaped FAS >1s went from 92 (max 181s) to 0, with no false positives and no stuck SIPp slots.
