Capacity Planning - Akira pilot to GA

This baseline defines when Akira should scale capacity on the path from staging through pilot to GA. The values are planning targets; they still need to be validated against SIPp load tests and production telemetry.

Sizing matrix

Phase	Target	VM types	Cost/month	Capacity target
Staging (Trigger #2 closed)	dev/test	12 VM mix cx23/cx33/cx43	EUR 82	50 cps, 500 concurrent
Pilot Phase 1 (single client)	1-3 customers	same as staging	EUR 82	100 cps, 1000 concurrent
Pilot Phase 2 (5-10 customers)	growth + SLA	upgrade mgmt+db	EUR 150	250 cps, 2500 concurrent
GA Phase 1 (20-50 customers)	production	upgrade signaling+media	EUR 400	500 cps, 5000 concurrent
GA Phase 2 (100+ customers)	scale-out	multi-region nbg1+fsn1	EUR 1000+	1000+ cps, 10000+ concurrent
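The matrix can be read as a lookup from observed load to the smallest phase that covers it. The following is a minimal sketch of that reading, not part of the repository: the Phase class and smallest_phase_for function are hypothetical names, and the open-ended "1000+" row is treated as a catch-all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    max_cps: int          # calls per second from the "Capacity target" column
    max_concurrent: int   # concurrent figure from the same column
    cost_eur_month: int   # "Cost/month" column; "1000+" stored as its lower bound


# Values copied from the sizing matrix above.
PHASES = [
    Phase("Staging", 50, 500, 82),
    Phase("Pilot Phase 1", 100, 1000, 82),
    Phase("Pilot Phase 2", 250, 2500, 150),
    Phase("GA Phase 1", 500, 5000, 400),
    Phase("GA Phase 2", 1000, 10000, 1000),
]


def smallest_phase_for(cps: float, concurrent: float) -> Phase:
    """Return the first phase whose targets cover both load figures.

    GA Phase 2 is open-ended ("1000+") in the matrix, so it is used as the
    fallback for any load the earlier phases cannot cover (an assumption).
    """
    for phase in PHASES[:-1]:
        if cps <= phase.max_cps and concurrent <= phase.max_concurrent:
            return phase
    return PHASES[-1]
```

For example, smallest_phase_for(120, 400) returns Pilot Phase 2, because 120 cps already exceeds the Pilot Phase 1 target even though the concurrent figure does not.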

Component CPU/RAM ceiling

Plan scale actions when utilization stays above the threshold for the full alert window, not on one-off spikes.

Component	CPU 70% threshold action	RAM 80% threshold action
Kamailio sip-01/02	Scale up cx33 to cx43 (4 to 8 vCPU)	Investigate htable size and OOM tuning
RTPengine rtp-01/02	Scale up and add rtp-03 node	Investigate active session count
FreeSWITCH fs-01/vas-01	Scale up cx43 to cpx41	Check OOM events and transcoding load
Postgres db-01	Investigate slow queries, indexes, and replica routing	Check pg_buffercache and tune shared_buffers
Redis cache-01	Investigate unusual CPU or command mix	Tune maxmemory policy and key TTL profile
NATS	Tune stream and consumer count	Move JetStream storage tier from memory to file when needed
Backend mgmt-01	Add second management node for HA	Tune asyncpg pool and worker concurrency
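The "sustained, not a spike" rule can be illustrated with a small check over recent utilization samples. This is a sketch under assumptions: the function name, the sample format (one percentage per scrape), and the window length in samples are hypothetical; the real evaluation happens in the Prometheus rules.

```python
from typing import Sequence

CPU_THRESHOLD_PCT = 70.0  # CPU planning threshold from the table above
RAM_THRESHOLD_PCT = 80.0  # RAM planning threshold from the table above


def sustained_breach(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True only if the last `window` samples are all above `threshold`.

    A single spike inside an otherwise quiet window does not count, and
    neither does a history shorter than the window.
    """
    if window <= 0 or len(samples) < window:
        return False
    return all(value > threshold for value in samples[-window:])
```

With a window of three samples, sustained_breach([60, 95, 65, 62], CPU_THRESHOLD_PCT, 3) is False because only one sample exceeds 70%, while [72, 75, 71] would qualify.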

Capacity warning alerts

Capacity warnings are versioned in infra/roles/prometheus/files/rules/capacity.yml and installed with the Prometheus role.

Alert	Trigger	Initial action
NodeCpuHighSustained	Node CPU above 70% for 15 minutes	Check top processes and decide scale-up vs load redistribution
NodeMemoryHigh	Node memory above 80% for 10 minutes	Check service RSS, OOM risk, and cache pressure
NodeDiskHigh	Root disk above 80% for 5 minutes	Check logs, Prometheus/Loki/Timescale growth, backups
KamailioCpsHigh	INVITE CPS above 80 for 5 minutes	Compare to pilot target and SIPp results, plan signaling scale
RTPengineSessionsHigh	Active RTP sessions above 800 for 5 minutes	Prepare rtp-03 or node resize before 1000-session ceiling
PostgresConnectionsHigh	Connections above 80% of max for 5 minutes	Check pgbouncer pools and backend connection churn
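The KamailioCpsHigh and RTPengineSessionsHigh triggers (80 cps, 800 sessions) line up with 80% of the Pilot Phase 1 capacity targets (100 cps, 1000 concurrent). The sketch below shows that arithmetic; the 0.8 ratio is inferred from those numbers rather than stated in the rules file, and the function names are hypothetical.

```python
# Pilot Phase 1 capacity targets from the sizing matrix.
PILOT_PHASE_1 = {"cps": 100, "concurrent": 1000}

# Assumption: warnings fire at 80% of the phase target (inferred from 80/100 and 800/1000).
WARNING_RATIO = 0.8


def warning_threshold(capacity_target: int, ratio: float = WARNING_RATIO) -> int:
    """Warning level for a capacity target, rounded to a whole number."""
    return round(capacity_target * ratio)


def headroom_pct(current: float, capacity_target: float) -> float:
    """Remaining capacity as a percentage of the target."""
    return 100.0 * (capacity_target - current) / capacity_target
```

If the same ratio were kept when moving to a later phase, the alert thresholds would need to be raised alongside the capacity targets; for GA Phase 1 (500 cps) that would give a 400 cps warning level.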

Grafana dashboard

The capacity dashboard is provisioned as infra/roles/grafana/files/dashboards/akira-capacity-sizing.json.

Core panels:

CPU, RAM, and disk saturation by node with 70/80% threshold coloring.
Kamailio CPS 24h trend with pilot target 50 cps and breakpoint 80 cps.
RTPengine active sessions with 500 target and 1000 breakpoint.
Postgres connection saturation, slow-query proxy, and buffer hit ratio.
NATS message rate and consumer lag placeholders for post-deploy telemetry.
TimescaleDB/Postgres disk growth estimate in GB/day.
Static cost projection matrix for current capacity phase.
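The disk growth panel amounts to a linear estimate. As a rough illustration, the sketch below derives GB/day from two usage readings and projects how long until the 80% NodeDiskHigh level is reached. The function names and the example disk sizes are hypothetical, and the dashboard's actual query may differ.

```python
from typing import Optional


def growth_gb_per_day(first_gb: float, last_gb: float, days: float) -> float:
    """Average daily growth between two disk-usage readings."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (last_gb - first_gb) / days


def days_until_threshold(
    used_gb: float,
    capacity_gb: float,
    growth_per_day: float,
    threshold_pct: float = 80.0,  # NodeDiskHigh trigger level
) -> Optional[float]:
    """Days until usage crosses the threshold, or None if usage is not growing."""
    limit_gb = capacity_gb * threshold_pct / 100.0
    if used_gb >= limit_gb:
        return 0.0
    if growth_per_day <= 0:
        return None
    return (limit_gb - used_gb) / growth_per_day
```

For a hypothetical 80 GB disk that grew from 40 GB to 47 GB over a week, the growth is 1 GB/day and the 80% level (64 GB) is about 17 days away.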

Runbook: docs/runbooks/capacity-scaling.md.

Weekly baseline report

scripts/capacity-baseline.sh writes reports/capacity-YYYY-Www.md with:

pg_stat_statements top 10 by total execution time.
Timescale hypertable sizes.
Kamailio CPS p95 over the last 7 days from Prometheus.
Manual cost-trend notes and weekly recommendations.
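The report name follows a year-and-week pattern. Assuming YYYY-Www means the ISO 8601 week date (the script itself is a shell script and may compute it differently), the path can be sketched as follows; report_path is a hypothetical helper.

```python
from datetime import date


def report_path(day: date) -> str:
    """Build reports/capacity-YYYY-Www.md using the ISO year and week of `day`."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"reports/capacity-{iso_year}-W{iso_week:02d}.md"
```

Using the ISO year rather than the calendar year matters around New Year: 30 December 2024 falls in ISO week 1 of 2025, so the report for that day would be reports/capacity-2025-W01.md.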

The Prometheus role can deploy a weekly systemd timer that runs every Sunday at 06:00 UTC. The timer is opt-in through prometheus_capacity_baseline_enabled.

Validation notes

Baseline numbers are projections until TASK-199 load testing produces SIPp evidence.
CPU/RAM thresholds are planning thresholds, not emergency thresholds.
pg_stat_statements must be present on Postgres. The baseline script attempts CREATE EXTENSION IF NOT EXISTS pg_stat_statements, but the extension collects no data until shared_preload_libraries includes pg_stat_statements and Postgres has been restarted.
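A pre-flight check for the last note could inspect the value returned by SHOW shared_preload_libraries. The sketch below only parses that string; how the value is fetched is left out, the function name is hypothetical, and entries given with a path prefix such as $libdir/ are not handled.

```python
def preloads_pg_stat_statements(setting: str) -> bool:
    """Return True if the comma-separated setting lists pg_stat_statements."""
    libraries = [item.strip().strip("\"'") for item in setting.split(",")]
    return "pg_stat_statements" in libraries
```

A value such as "timescaledb, pg_stat_statements" passes the check, while an empty value or "timescaledb" alone does not.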