
ADR-0019 - NATS placement: cache-01 stateful tier

  • Status: Accepted (2026-05-21)
  • Deciders: Massimo Bagnoli, Claude
  • Implementation tasks: TASK-111, TASK-202
  • Supersedes: ADR-0007 staging deployment note that placed NATS on mgmt-01
  • Superseded by: none

Context

The pilot had two candidate hosts for the NATS broker:

  • akira-mgmt-01-staging, the app tier that runs backend FastAPI, Caddy, Vault and Prometheus.
  • akira-cache-01-staging, the stateful tier that already hosts Redis.

In this design, NATS JetStream is a stateful service rather than a stateless broker, because it persists durable streams to file storage.

Decision

Place NATS on akira-cache-01-staging.

The pilot uses a single-node NATS deployment with a connection string of the form nats://akira:pass@100.x.x.x:4222. TASK-202 prepares the GA migration to a 3-node NATS cluster on distinct VMs. cache-01 can later evolve into a dedicated nats-01 node if the workload justifies it.
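
As a minimal sketch of how client code might handle connection strings of that form, the Python below builds the server list for the single-node pilot and for a later 3-node cluster, and redacts the password before a URL is logged. The helper names, hostnames and credentials are hypothetical and do not come from the deployment. Credentials containing special characters may need extra handling depending on the client library.

```python
from urllib.parse import urlsplit


def server_urls(hosts, user, password, port=4222):
    """Build one nats:// URL per host (hypothetical helper)."""
    return [f"nats://{user}:{password}@{host}:{port}" for host in hosts]


def redact(url):
    """Return the URL with the password masked, suitable for logs."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{parts.username}:***@{parts.hostname}{port}"


# Pilot: one node. GA (TASK-202): three nodes on distinct VMs.
# All hostnames below are placeholders, not real Akira hosts.
pilot = server_urls(["cache-01.example.internal"], "akira", "s3cret")
cluster = server_urls(
    ["nats-a.example.internal", "nats-b.example.internal", "nats-c.example.internal"],
    "akira",
    "s3cret",
)
```

Passing a list of servers rather than a single string means the client configuration would not need to change shape when the cluster work lands; only the list would grow.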

Rationale

Akira follows a 3-tier operational model:

  • Management/app tier: backend FastAPI, Caddy, Vault and Prometheus.
  • Signaling/media tier: Kamailio, RTPengine and FreeSWITCH.
  • Stateful tier: Postgres, Redis and NATS JetStream.

NATS belongs to the stateful tier because JetStream persists messages on file storage. Keeping it off mgmt-01 also preserves the app tier's future horizontal scaling path and isolates app-tier failures from the event bus.
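
The tier model above can be expressed as a small placement check. This is an illustrative sketch only: the service keys, the host-to-tier mapping and the `misplaced` function are assumptions made for the example, not part of any Akira tooling.

```python
from enum import Enum


class Tier(Enum):
    APP = "management/app"
    MEDIA = "signaling/media"
    STATEFUL = "stateful"


# Service-to-tier mapping, following the three tiers listed above.
SERVICE_TIER = {
    "fastapi": Tier.APP, "caddy": Tier.APP,
    "vault": Tier.APP, "prometheus": Tier.APP,
    "kamailio": Tier.MEDIA, "rtpengine": Tier.MEDIA, "freeswitch": Tier.MEDIA,
    "postgres": Tier.STATEFUL, "redis": Tier.STATEFUL, "nats": Tier.STATEFUL,
}

# Assumed host-to-tier mapping for the two hosts named in this ADR.
HOST_TIER = {
    "akira-mgmt-01-staging": Tier.APP,
    "akira-cache-01-staging": Tier.STATEFUL,
}


def misplaced(placement):
    """Return (host, service) pairs whose service tier differs from the host tier."""
    return [
        (host, service)
        for host, services in placement.items()
        for service in services
        if SERVICE_TIER[service] is not HOST_TIER[host]
    ]


chosen = {"akira-cache-01-staging": ["redis", "nats"]}
rejected = {"akira-mgmt-01-staging": ["fastapi", "nats"]}
```

Under these assumptions, the chosen placement yields no violations, while the rejected alternative flags NATS on the app-tier host.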

cache-01 already has Docker installed for Qdrant, so the marginal pilot deployment cost is low.

Consequences

Positive

  • State-bearing services stay grouped on the stateful tier.
  • mgmt-01 remains closer to a stateless app and observability node.
  • Failure of mgmt-01 does not necessarily take NATS down.
  • Pilot cost stays lower than provisioning a dedicated NATS VM immediately.

Negative

  • cache-01 carries another stateful service and must be monitored for disk, CPU and memory contention.
  • Backup procedures for cache-01 must include JetStream state.
  • The pilot remains single-node until TASK-202 introduces HA.
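
The disk-contention concern could be covered by a simple headroom check, sketched below. The 20% threshold and the function name are assumptions, and the path to check would be the JetStream storage directory, which this ADR does not specify.

```python
import shutil


def disk_headroom_ok(path, min_free_fraction=0.20):
    """Return True if the filesystem holding `path` has enough free space.

    `min_free_fraction` is an assumed threshold, not a documented Akira value.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


if __name__ == "__main__":
    # "." stands in for the JetStream storage directory on cache-01.
    status = "ok" if disk_headroom_ok(".") else "low disk headroom"
    print(status)
```

A check like this only covers disk. CPU and memory contention between Redis and NATS would still need their own alerts in the existing monitoring stack.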

Alternatives considered

Place NATS on mgmt-01

Rejected. It couples stateful event-bus storage to the app tier and conflicts with the intended scaling model.

Create a dedicated nats-01 VM immediately

Rejected for the pilot. The additional monthly cost is not justified before HA cluster work and production load require it.

References

  • ADR-0007: original CDR pipeline NATS decision.
  • ADR-0016: concrete CDR pipeline implementation.
  • TASK-111: NATS role on cache-01.
  • TASK-202: NATS cluster HA preparation.
  • feedback_runner_executes_code_not_deploy.md: 3-tier deployment pattern.