
ADR-0027 - Quality policy recipients/actions canonical model

  • Status: Accepted (2026-06-04)
  • Deciders: Massimo Bagnoli
  • Implementation tasks: TASK-453, TASK-475, TASK-498 (amendment), TASK-502 (follow-up implementation)
  • Supersedes: stopgap TASK-453 metadata in quality_policies.description
  • Superseded by: none

Context

TASK-453 introduced the Quality/SLA policy CRUD and dry-run workflow with a temporary metadata encoding: notification recipients and breach actions were serialized into quality_policies.description under the marker __akira_policy_meta:{...}.

That approach overloaded a free-text field with structured product state. It repeated the same anti-pattern previously removed from offers metadata and made policy reads dependent on frontend-only parsing. Quality policy configuration needs a canonical database model because recipients and actions are operational state: they are queried, audited, rendered, and eventually executed by workers.

Decision

  1. Recipients and actions use dedicated tables.

    Akira stores notification recipients in quality_policy_notifications:

    • id
    • policy_id
    • channel: email | telegram | webhook | sms
    • target
    • is_active

    Akira stores breach actions in quality_policy_actions:

    • id
    • policy_id
    • kind: notify | webhook | escalate | open_ticket | mute_suggest
    • payload JSONB
    • sort_order

    This is option B from the implementation task: normalize the structured state alongside quality_policy_thresholds.
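
The two tables can be pictured as the following in-memory shapes. This is a minimal sketch, not the actual SQLAlchemy model: the class names, the Python types, and the defaults for is_active and sort_order are assumptions; only the column names and the enum values come from this ADR.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class Channel(str, Enum):
    """Allowed values of quality_policy_notifications.channel."""
    EMAIL = "email"
    TELEGRAM = "telegram"
    WEBHOOK = "webhook"
    SMS = "sms"


class ActionKind(str, Enum):
    """Allowed values of quality_policy_actions.kind."""
    NOTIFY = "notify"
    WEBHOOK = "webhook"
    ESCALATE = "escalate"
    OPEN_TICKET = "open_ticket"
    MUTE_SUGGEST = "mute_suggest"


@dataclass
class PolicyNotification:
    """One row of quality_policy_notifications."""
    id: int
    policy_id: int
    channel: Channel
    target: str
    is_active: bool = True  # default is an assumption


@dataclass
class PolicyAction:
    """One row of quality_policy_actions."""
    id: int
    policy_id: int
    kind: ActionKind
    payload: Dict[str, Any] = field(default_factory=dict)  # JSONB column
    sort_order: int = 0  # default is an assumption
```

Modelling channel and kind as enums makes a non-canonical value such as oncall fail at construction time instead of reaching the database.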

  2. quality_policies.description returns to free text.

    The description column is only human-authored policy text. It must not carry API-owned JSON metadata after the back-compat migration has run.

  3. Dry-run evaluation window is derived from thresholds.

    Quality dry-run evaluates the largest window_seconds present in the input thresholds. If no threshold window is available from older clients, the fallback remains 60 minutes. The dry-run remains non-persistent and keeps a bounded result set of 50 rows.
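
As an illustration of the rule as stated in this item (the 2026-06-08 amendment further down deprecates it), the window selection could look like the sketch below. The function names and the dict-shaped threshold input are hypothetical; the 60-minute fallback and the 50-row cap are taken from the text.

```python
from typing import Any, Dict, List

FALLBACK_WINDOW_SECONDS = 60 * 60  # 60 minutes, used when no window is supplied
MAX_DRY_RUN_ROWS = 50              # bounded result set


def dry_run_window_seconds(thresholds: List[Dict[str, Any]]) -> int:
    """Return the largest window_seconds in the input, or the fallback."""
    windows = [t["window_seconds"] for t in thresholds if t.get("window_seconds")]
    return max(windows) if windows else FALLBACK_WINDOW_SECONDS


def bound_rows(rows: List[Any]) -> List[Any]:
    """Keep at most MAX_DRY_RUN_ROWS rows; nothing is persisted."""
    return rows[:MAX_DRY_RUN_ROWS]
```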

  4. Quality alert mute supports an optional expiration.

    quality_alerts.muted_until is nullable:

    • NULL means indefinite mute.
    • a timestamp means the mute expires at or after that instant.

    reason remains mandatory and is stored in quality_alerts.notes and the audit trail. When a mute expires, the alert can be surfaced as open again.
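
The expiry semantics can be sketched as a small predicate. It assumes that the alert carries a separate muted flag (hypothetical, since NULL in muted_until alone cannot distinguish "muted indefinitely" from "never muted") and that the mute stops applying once the current time reaches muted_until.

```python
from datetime import datetime
from typing import Optional


def is_mute_active(muted: bool, muted_until: Optional[datetime], now: datetime) -> bool:
    """Return True while the alert should stay hidden as muted."""
    if not muted:
        return False
    if muted_until is None:
        return True  # NULL means indefinite mute
    return now < muted_until  # expired once now reaches muted_until
```

An alert for which this returns False despite the muted flag is the case the ADR describes as surfacing as open again.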

  5. Back-compat reader migrates TASK-453 stopgap metadata.

    Reads of quality policies must detect __akira_policy_meta:{...} in description, insert the mapped recipients/actions into the canonical tables, and clean description down to its free-text portion. This one-shot reader prevents existing TASK-453 data from being lost if the deployment order changes.
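
The detection step could be sketched as follows. The helper name is hypothetical, and leaving the description untouched when the JSON after the marker is malformed is an assumption made here so that nothing is lost; the ADR itself only fixes the marker and the cleaning rule.

```python
import json
from typing import Any, Dict, Optional, Tuple

MARKER = "__akira_policy_meta:"


def split_description(description: str) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Split a stored description into (free_text, stopgap_metadata)."""
    idx = description.find(MARKER)
    if idx == -1:
        return description, None  # already canonical, nothing to migrate
    raw = description[idx + len(MARKER):].lstrip()
    try:
        # raw_decode parses the JSON value at the start of the string.
        meta, _end = json.JSONDecoder().raw_decode(raw)
    except json.JSONDecodeError:
        return description, None  # assumption: keep malformed data as-is
    if not isinstance(meta, dict):
        return description, None
    # Only the free text that preceded the marker survives.
    return description[:idx].rstrip(), meta
```

A caller would insert the mapped rows and write back the cleaned free text in the same transaction, so that a repeated read finds no marker and does nothing.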

Consequences

Positive

  • The backend owns the Quality policy contract instead of the frontend parsing a free-text field.
  • Recipients/actions can be listed, patched, deleted, audited, and used by workers without reparsing descriptions.
  • Policy descriptions are safe for user-facing text again.
  • Temporary mutes can expire without requiring a manual unmute.

Negative

  • Existing stopgap data needs a compatibility path during rollout.
  • Policy creation now involves additional rows after the base policy row.
  • Legacy stopgap values that are not in the canonical enums need an explicit mapping or clarification from the lead.

Back-compat / migration

The migration of the TASK-453 stopgap must preserve historical data without relabeling operational intents as notification transports.

  • Recipients whose channel is one of email | telegram | webhook | sms are migrated to quality_policy_notifications.
  • The legacy oncall recipient is not a transport and must not become telegram: it is migrated to quality_policy_actions with kind=escalate and payload {"legacy_channel":"oncall","legacy_target":"..."}.
  • The legacy auto_mute action is migrated to kind=mute_suggest. This ADR does not define an auto-mute that is actually executed.
  • Unmappable legacy values are not discarded: they are quarantined as an action with kind=escalate and a JSONB payload carrying legacy_needs_review=true and the original value (legacy_channel, legacy_target, or legacy_action), and the runtime reader logs a warning.
  • After the migration, quality_policies.description keeps only the free text that preceded the __akira_policy_meta: marker.
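
The mapping rules above can be sketched as two pure functions. The function names, the return shapes, the logger name, and the empty payload for already-canonical actions are assumptions; the enum values, the oncall and auto_mute mappings, and the quarantine payload keys come from this section.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger("quality_policy_migration")  # hypothetical name

CANONICAL_CHANNELS = {"email", "telegram", "webhook", "sms"}
CANONICAL_KINDS = {"notify", "webhook", "escalate", "open_ticket", "mute_suggest"}


def map_recipient(channel: str, target: str) -> Tuple[str, Dict[str, Any]]:
    """Map a legacy recipient to ("notification" | "action", row values)."""
    if channel in CANONICAL_CHANNELS:
        return "notification", {"channel": channel, "target": target}
    if channel == "oncall":
        # oncall is an operational intent, not a transport.
        payload = {"legacy_channel": "oncall", "legacy_target": target}
        return "action", {"kind": "escalate", "payload": payload}
    # Unmappable: quarantine instead of discarding.
    logger.warning("unmappable legacy recipient channel: %r", channel)
    payload = {
        "legacy_needs_review": True,
        "legacy_channel": channel,
        "legacy_target": target,
    }
    return "action", {"kind": "escalate", "payload": payload}


def map_action(kind: str) -> Dict[str, Any]:
    """Map a legacy action name to quality_policy_actions row values."""
    if kind in CANONICAL_KINDS:
        return {"kind": kind, "payload": {}}
    if kind == "auto_mute":
        return {"kind": "mute_suggest", "payload": {}}
    logger.warning("unmappable legacy action: %r", kind)
    payload = {"legacy_needs_review": True, "legacy_action": kind}
    return {"kind": "escalate", "payload": payload}
```

Keeping the mapping free of database access makes each rule easy to unit-test before the reader is wired into policy reads.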

Amendment 2026-06-08 - min_samples and dry-run lookback

TASK-498 lead review corrected two points where this ADR had collapsed separate concepts:

  1. quality_policy_thresholds must include min_samples INT NOT NULL DEFAULT 20. The field is canonical anti-noise state from cap.23.3, not frontend cruft. Policy evaluation must consider a threshold eligible only when the number of samples in its window_seconds bucket is at least min_samples.

  2. Quality dry-run has a simulation lookback that is separate from threshold evaluation windows. The cap.27.12 dry-run default lookback is 7 days and must be configurable. Within that lookback, thresholds are evaluated in buckets of their own window_seconds values. The previous rule "largest threshold window, fallback 60 minutes" is deprecated and remains only as the historical TASK-453/TASK-475 implementation note above.
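
The two amended rules can be illustrated with a short sketch. The function names, the use of epoch seconds, and the alignment of buckets to the start of the lookback are assumptions; the default of 20 samples and the 7-day default lookback come from the amendment. The real implementation is deferred to TASK-502.

```python
from typing import List

DEFAULT_MIN_SAMPLES = 20                       # min_samples column default
DEFAULT_LOOKBACK_SECONDS = 7 * 24 * 60 * 60    # 7 days, meant to be configurable


def is_threshold_eligible(sample_count: int, min_samples: int = DEFAULT_MIN_SAMPLES) -> bool:
    """A threshold is evaluated only if its bucket holds enough samples."""
    return sample_count >= min_samples


def bucket_starts(
    lookback_end: int,
    window_seconds: int,
    lookback_seconds: int = DEFAULT_LOOKBACK_SECONDS,
) -> List[int]:
    """Start times (epoch seconds) of the window_seconds buckets in the lookback."""
    start = lookback_end - lookback_seconds
    # If window_seconds does not divide the lookback, the last bucket is partial.
    return list(range(start, lookback_end, window_seconds))
```

With the defaults, a threshold whose window_seconds is 3600 is evaluated over 168 hourly buckets, and any bucket with fewer than 20 samples is skipped rather than counted as a breach.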

Implementation is intentionally deferred to TASK-502: additive migration, SQLAlchemy model, Pydantic schemas, OpenAPI/API types, evaluation gating, dry-run lookback, and wizard field.

References

  • ADR-0023: Tariffs UI parity and stopgap-to-canonical cleanup pattern.
  • ADR-0024: Offers canonical metadata successor pattern.
  • TASK-453: Quality/SLA policy management dry-run audit.
  • TASK-475: Quality policy canonical model ADR-0027.
  • TASK-498: Quality/SLA parity verification and ADR amendment.