Postgres Performance Tuning Runbook
Scope
This runbook covers the baseline delivered by TASK-204: pg_stat_statements,
auto_explain, slow query logging, Prometheus alerts, Grafana dashboard and
weekly slow query reports.
Baseline
pg_stat_statementsis loaded throughshared_preload_librariesand enabled in theakiradatabase by Alembic and thetimescaledbAnsible role.auto_explainis loaded throughshared_preload_libraries; it is a preload module, not a SQL extension.- Query plans are logged for statements over 500ms with
ANALYZE,BUFFERSand JSON output. - Plain slow statement logging starts at 1000ms.
- DDL and writes are logged with
log_statement = 'mod'.
Changing shared_preload_libraries requires a PostgreSQL restart. Schedule a
maintenance window before deploying this role to an already running database.
Triage
- Open Grafana dashboard
Akira Postgres Performance. - Check
Top 10 slow queriesand sort by mean execution time. - Check cache hit ratio; sustained values below 95% usually mean table/index working set pressure or missing indexes.
- Check lock waits and long transactions before tuning indexes.
- Use Loki panel
Auto-explain log entries last hourto inspect JSON plans. - If the issue is CDR growth, compare hypertable size and chunk count before changing general Postgres memory settings.
Weekly Report
The timescaledb role installs:
/usr/local/bin/akira-postgres-slow-query-reportakira-postgres-slow-query-report.serviceakira-postgres-slow-query-report.timer
The timer runs every Sunday at 06:00 UTC and writes reports to
/opt/akira/reports.
Run manually on the primary DB node:
sudo -u postgres REPORT_DIR=/opt/akira/reports DB_NAME=akira \
/usr/local/bin/akira-postgres-slow-query-report
Run from a repository checkout against a remote DB node:
POSTGRES_SSH_TARGET=root@akira-db-01-staging \
REPORT_DIR=reports DB_NAME=akira \
scripts/postgres-slow-query-report.sh
Reset Baseline
Reset pg_stat_statements after a migration, an indexing change or a tuning
window:
sudo -u postgres psql -d akira -c "SELECT pg_stat_statements_reset();"
Record the reset timestamp in the weekly report or incident notes, otherwise pre/post tuning comparisons become misleading.
Alerts
PostgresQueryDurationHigh: high disk reads, likely slow query or cache miss.PostgresLongTransaction: active transaction over 60 seconds.PostgresDeadlockSpike: deadlocks are increasing.PostgresLowCacheHitRatio: buffer cache hit ratio below 95%.
Existing replica, connection and long-running-query alerts remain in
infra/roles/alertmanager/rules/postgres.yml.