Passa al contenuto principale

Postgres Performance Tuning Runbook

Scope

This runbook covers the baseline delivered by TASK-204: pg_stat_statements, auto_explain, slow query logging, Prometheus alerts, Grafana dashboard and weekly slow query reports.

Baseline

  • pg_stat_statements is loaded through shared_preload_libraries and enabled in the akira database by Alembic and the timescaledb Ansible role.
  • auto_explain is loaded through shared_preload_libraries; it is a preload module, not a SQL extension.
  • Query plans are logged for statements over 500ms with ANALYZE, BUFFERS and JSON output.
  • Plain slow statement logging starts at 1000ms.
  • DDL and writes are logged with log_statement = 'mod'.

Changing shared_preload_libraries requires a PostgreSQL restart. Schedule a maintenance window before deploying this role to an already running database.

Triage

  1. Open Grafana dashboard Akira Postgres Performance.
  2. Check Top 10 slow queries and sort by mean execution time.
  3. Check cache hit ratio; sustained values below 95% usually mean table/index working set pressure or missing indexes.
  4. Check lock waits and long transactions before tuning indexes.
  5. Use Loki panel Auto-explain log entries last hour to inspect JSON plans.
  6. If the issue is CDR growth, compare hypertable size and chunk count before changing general Postgres memory settings.

Weekly Report

The timescaledb role installs:

  • /usr/local/bin/akira-postgres-slow-query-report
  • akira-postgres-slow-query-report.service
  • akira-postgres-slow-query-report.timer

The timer runs every Sunday at 06:00 UTC and writes reports to /opt/akira/reports.

Run manually on the primary DB node:

sudo -u postgres REPORT_DIR=/opt/akira/reports DB_NAME=akira \
/usr/local/bin/akira-postgres-slow-query-report

Run from a repository checkout against a remote DB node:

POSTGRES_SSH_TARGET=root@akira-db-01-staging \
REPORT_DIR=reports DB_NAME=akira \
scripts/postgres-slow-query-report.sh

Reset Baseline

Reset pg_stat_statements after a migration, an indexing change or a tuning window:

sudo -u postgres psql -d akira -c "SELECT pg_stat_statements_reset();"

Record the reset timestamp in the weekly report or incident notes, otherwise pre/post tuning comparisons become misleading.

Alerts

  • PostgresQueryDurationHigh: high disk reads, likely slow query or cache miss.
  • PostgresLongTransaction: active transaction over 60 seconds.
  • PostgresDeadlockSpike: deadlocks are increasing.
  • PostgresLowCacheHitRatio: buffer cache hit ratio below 95%.

Existing replica, connection and long-running-query alerts remain in infra/roles/alertmanager/rules/postgres.yml.