PostgreSQL Administration: High Availability and Disaster Recovery Guide

PostgreSQL is the world’s most advanced open-source relational database, powering critical applications for organizations worldwide. This comprehensive guide covers enterprise-grade administration, high availability configurations, performance tuning, backup strategies, and disaster recovery planning for production PostgreSQL deployments.

Table of Contents

  1. Installation and Initial Setup
  2. Advanced Configuration
  3. Security and Authentication
  4. Performance Tuning
  5. Backup Strategies
  6. Replication and High Availability
  7. Monitoring and Maintenance
  8. Disaster Recovery
  9. Version Upgrades
  10. Best Practices

1. Installation and Initial Setup

Installation on Ubuntu/Debian

# Add PostgreSQL APT repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -  # apt-key is deprecated on newer releases; prefer a signed-by keyring there

# Update and install PostgreSQL 15
sudo apt update
sudo apt install -y postgresql-15 postgresql-client-15 postgresql-contrib-15

# Verify installation
psql --version

# Check service status
sudo systemctl status postgresql
sudo systemctl enable postgresql

Installation on RHEL/CentOS

# Install PostgreSQL repository
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable built-in PostgreSQL module
sudo dnf -qy module disable postgresql

# Install PostgreSQL 15
sudo dnf install -y postgresql15-server postgresql15-contrib

# Initialize database cluster
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb

# Start and enable service
sudo systemctl enable postgresql-15
sudo systemctl start postgresql-15

Initial Database Setup

# Switch to postgres user
sudo -i -u postgres

# Access PostgreSQL prompt
psql

-- Change postgres user password
ALTER USER postgres WITH ENCRYPTED PASSWORD 'StrongPassword123!';

-- Create application database
CREATE DATABASE webapp_prod;

-- Create application user
CREATE USER webapp_admin WITH ENCRYPTED PASSWORD 'AppPassword123!';

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE webapp_prod TO webapp_admin;

-- Grant schema privileges
\c webapp_prod
GRANT ALL ON SCHEMA public TO webapp_admin;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO webapp_admin;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO webapp_admin;

-- Set default privileges for future objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO webapp_admin;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO webapp_admin;

-- Create read-only user
CREATE USER webapp_readonly WITH ENCRYPTED PASSWORD 'ReadOnlyPass123!';
GRANT CONNECT ON DATABASE webapp_prod TO webapp_readonly;
GRANT USAGE ON SCHEMA public TO webapp_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO webapp_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO webapp_readonly;
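
A quick way to confirm the grants took effect is to inspect role attributes and table privileges from psql; a minimal verification sketch:

-- Verify roles and their attributes
\du webapp_admin
\du webapp_readonly

-- Show access privileges for objects in the public schema
\dp public.*

-- Smoke-test the read-only account from the shell
-- psql -h localhost -U webapp_readonly -d webapp_prod -c "SELECT current_user;"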

2. Advanced Configuration

PostgreSQL Configuration File

# /etc/postgresql/15/main/postgresql.conf (Ubuntu)
# /var/lib/pgsql/15/data/postgresql.conf (RHEL)

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# Listen on all interfaces (change for production)
listen_addresses = '*'
port = 5432

# Maximum number of connections
max_connections = 200

# Connections reserved for superusers (consider PgBouncer for connection pooling in production)
# superuser_reserved_connections = 3

#---------------------------------------------------------------------------
# RESOURCE USAGE (Memory)
#---------------------------------------------------------------------------

# Shared memory for caching (25% of total RAM)
shared_buffers = 8GB

# Memory per sort/hash operation (a single query can use several multiples of this)
work_mem = 32MB

# Memory for maintenance operations (vacuum, index creation)
maintenance_work_mem = 2GB

# Optimizer's assumption of effective cache size (50-75% of RAM)
effective_cache_size = 24GB

# Memory for each autovacuum worker
autovacuum_work_mem = 2GB

#---------------------------------------------------------------------------
# WRITE AHEAD LOG (WAL)
#---------------------------------------------------------------------------

# WAL level for replication
wal_level = replica

# Minimum WAL size to keep
min_wal_size = 1GB

# Maximum WAL size before checkpoint
max_wal_size = 4GB

# WAL segment size (fixed at initdb time with --wal-segsize; cannot be set here)
# wal_segment_size = 16MB

# Archive command for backup (customize for your backup solution)
archive_mode = on
archive_command = 'rsync -a %p /mnt/wal_archive/%f'

# Number of WAL sender processes for replication
max_wal_senders = 10

# Replication slots
max_replication_slots = 10

# Keep WAL for standby
wal_keep_size = 1GB

#---------------------------------------------------------------------------
# REPLICATION
#---------------------------------------------------------------------------

# Synchronous standby servers
synchronous_standby_names = 'standby1,standby2'

# Synchronous commit setting
synchronous_commit = on

# Hot standby configuration (on standby servers)
hot_standby = on
hot_standby_feedback = on

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# Planner cost constants
random_page_cost = 1.1  # For SSD storage
effective_io_concurrency = 200  # For SSD storage

# Query planning
default_statistics_target = 100

# Enable parallel query execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 8

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# Where to log
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600

# Log rotation
log_rotation_age = 1d
log_rotation_size = 100MB

# What to log
log_min_duration_statement = 1000  # Log queries > 1 second
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0

# Log line format
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '

#---------------------------------------------------------------------------
# AUTOVACUUM
#---------------------------------------------------------------------------

autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05

#---------------------------------------------------------------------------
# STATEMENT STATISTICS
#---------------------------------------------------------------------------

# Enable pg_stat_statements extension
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# Timezone
timezone = 'UTC'

# Locale
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

# Default text search configuration
default_text_search_config = 'pg_catalog.english'
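
Most of these parameters can also be set with ALTER SYSTEM, which writes postgresql.auto.conf and avoids hand-editing the main file. A minimal sketch; note that settings such as shared_buffers still require a restart:

-- Write settings to postgresql.auto.conf instead of editing postgresql.conf
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';

-- Reload for parameters that do not require a restart
SELECT pg_reload_conf();

-- Check whether a change is still waiting on a restart
SELECT name, setting, pending_restart FROM pg_settings WHERE name = 'shared_buffers';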

Connection Pooling with PgBouncer

# Install PgBouncer
sudo apt install pgbouncer -y

# /etc/pgbouncer/pgbouncer.ini
[databases]
webapp_prod = host=localhost port=5432 dbname=webapp_prod

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3

# Server connection limits
server_lifetime = 3600
server_idle_timeout = 600

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

# /etc/pgbouncer/userlist.txt
# Format: "username" "md5<md5 hash of password concatenated with username>"
"webapp_admin" "md5<hash>"

# Start PgBouncer
sudo systemctl enable pgbouncer
sudo systemctl start pgbouncer

# Test connection through PgBouncer
psql -h 127.0.0.1 -p 6432 -U webapp_admin webapp_prod
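
The "md5<hash>" value in userlist.txt above is a placeholder: PgBouncer expects the literal prefix md5 followed by the MD5 of the password concatenated with the username. A small sketch to generate a real entry, using the example credentials from earlier:

# Generate a PgBouncer userlist.txt entry: "user" "md5<md5(password + username)>"
PGB_USER="webapp_admin"
PGB_PASS="AppPassword123!"
HASH=$(echo -n "${PGB_PASS}${PGB_USER}" | md5sum | awk '{print $1}')
echo "\"${PGB_USER}\" \"md5${HASH}\"" | sudo tee -a /etc/pgbouncer/userlist.txt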

3. Security and Authentication

pg_hba.conf Configuration

# /etc/postgresql/15/main/pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# Local connections
local   all             postgres                                peer
local   all             all                                     md5

# IPv4 local connections
host    all             all             127.0.0.1/32            md5

# Application server connections
host    webapp_prod     webapp_admin    10.0.1.0/24             md5
host    webapp_prod     webapp_readonly 10.0.1.0/24             md5

# Replication connections
host    replication     replicator      10.0.2.0/24             md5

# Deny all other connections
host    all             all             0.0.0.0/0               reject
host    all             all             ::/0                    reject

SSL/TLS Configuration

# Generate SSL certificates
sudo openssl req -new -x509 -days 365 -nodes -text \
    -out /etc/postgresql/15/main/server.crt \
    -keyout /etc/postgresql/15/main/server.key \
    -subj "/CN=postgres.example.com"

# Set permissions
sudo chown postgres:postgres /etc/postgresql/15/main/server.*
sudo chmod 600 /etc/postgresql/15/main/server.key

# Enable SSL in postgresql.conf
ssl = on
ssl_cert_file = '/etc/postgresql/15/main/server.crt'
ssl_key_file = '/etc/postgresql/15/main/server.key'

# Require SSL for remote connections in pg_hba.conf
hostssl    webapp_prod     webapp_admin    10.0.1.0/24     md5

# Client connection with SSL
psql "sslmode=require host=postgres.example.com dbname=webapp_prod user=webapp_admin"

Role-Based Access Control

-- Create roles with specific privileges
CREATE ROLE read_only_role;
GRANT CONNECT ON DATABASE webapp_prod TO read_only_role;
GRANT USAGE ON SCHEMA public TO read_only_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_role;

CREATE ROLE read_write_role;
GRANT CONNECT ON DATABASE webapp_prod TO read_write_role;
GRANT USAGE ON SCHEMA public TO read_write_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO read_write_role;

CREATE ROLE admin_role;
GRANT ALL PRIVILEGES ON DATABASE webapp_prod TO admin_role;

-- Create users and assign roles
CREATE USER analyst_user WITH ENCRYPTED PASSWORD 'AnalystPass123!';
GRANT read_only_role TO analyst_user;

CREATE USER app_user WITH ENCRYPTED PASSWORD 'AppPass123!';
GRANT read_write_role TO app_user;

-- Row-level security (policies have no effect until RLS is enabled on the table)
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_policy ON users
    USING (user_id = current_user::text OR current_user = 'admin_role');
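
To see the policy in action, switch to one of the restricted roles; assuming SELECT on users has been granted, each role sees only the rows its USING clause permits:

-- Each role sees only rows where user_id matches its role name
SET ROLE analyst_user;
SELECT * FROM users;   -- filtered by user_policy
RESET ROLE;

-- The table owner bypasses RLS unless forced (superusers always bypass it)
ALTER TABLE users FORCE ROW LEVEL SECURITY;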

4. Performance Tuning

Index Optimization

-- Enable pg_stat_statements for query analysis
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find missing indexes
SELECT
    schemaname,
    tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Create indexes (CONCURRENTLY to avoid table locks)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at);
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
CREATE INDEX CONCURRENTLY idx_orders_status_created ON orders(status, created_at);

-- Composite index for common queries
CREATE INDEX CONCURRENTLY idx_orders_user_status
ON orders(user_id, status)
WHERE status IN ('pending', 'processing');

-- Partial index for active records
CREATE INDEX CONCURRENTLY idx_users_active
ON users(email)
WHERE is_active = true;

-- Full-text search index
CREATE INDEX CONCURRENTLY idx_products_search
ON products USING GIN(to_tsvector('english', name || ' ' || description));

-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
    AND indexrelid NOT IN (
        SELECT indexrelid FROM pg_index WHERE indisunique
    )
ORDER BY pg_relation_size(indexrelid) DESC;
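
Once an index is confirmed unused (check replicas too, since pg_stat_user_indexes counters are per-server), it can be dropped without blocking writes; REINDEX CONCURRENTLY (PostgreSQL 12+) rebuilds bloated indexes in place:

-- Drop a confirmed-unused index without taking a write lock
DROP INDEX CONCURRENTLY idx_users_created_at;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;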

Query Optimization

-- Analyze query plans
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

-- Find slow queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time,
    rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Reset statistics
SELECT pg_stat_statements_reset();

-- Vacuum and analyze
VACUUM (ANALYZE, VERBOSE) users;
VACUUM (ANALYZE, VERBOSE) orders;

-- Auto-vacuum tuning for specific table
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 10
);

-- Table bloat check
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
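
The size query above is only a rough proxy for bloat; the pgstattuple extension measures dead space directly. A sketch (the exact scan is expensive on large tables):

-- Accurate bloat measurement with pgstattuple
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT * FROM pgstattuple('orders');          -- exact, reads the whole table
SELECT * FROM pgstattuple_approx('orders');   -- cheaper estimate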

Connection and Memory Optimization

-- Check active connections
SELECT
    datname,
    usename,
    client_addr,
    state,
    COUNT(*) as connections
FROM pg_stat_activity
GROUP BY datname, usename, client_addr, state
ORDER BY connections DESC;

-- Find long-running queries
SELECT
    pid,
    now() - query_start AS duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
    AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;

-- Kill long-running query
SELECT pg_cancel_backend(pid);  -- Graceful
SELECT pg_terminate_backend(pid);  -- Forceful

-- Check table and index sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

5. Backup Strategies

Logical Backup with pg_dump

#!/bin/bash
# /usr/local/bin/postgres-backup.sh

set -euo pipefail

# Configuration
DB_NAME="webapp_prod"
DB_USER="webapp_admin"
DB_HOST="localhost"
BACKUP_DIR="/var/backups/postgresql"
RETENTION_DAYS=30
DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/var/log/postgresql/backup.log"

# Ensure backup directory exists
mkdir -p "$BACKUP_DIR"

# Logging function
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

log "Starting PostgreSQL backup"

# Custom format backup (recommended - supports parallel restore)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${DATE}.backup"
pg_dump -h "$DB_HOST" -U "$DB_USER" 
    --format=custom 
    --compress=9 
    --verbose 
    --file="$BACKUP_FILE" 
    "$DB_NAME" 2>&1 | tee -a "$LOG_FILE"

# Verify backup
if pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1; then
    log "Backup verification successful"
else
    log "ERROR: Backup verification failed"
    exit 1
fi

# SQL format backup (human-readable)
SQL_BACKUP="${BACKUP_DIR}/${DB_NAME}_${DATE}.sql.gz"
pg_dump -h "$DB_HOST" -U "$DB_USER" 
    --format=plain 
    --verbose 
    "$DB_NAME" | gzip > "$SQL_BACKUP" 2>&1

# Schema-only backup
SCHEMA_BACKUP="${BACKUP_DIR}/${DB_NAME}_schema_${DATE}.sql"
pg_dump -h "$DB_HOST" -U "$DB_USER" 
    --schema-only 
    --verbose 
    --file="$SCHEMA_BACKUP" 
    "$DB_NAME" 2>&1 | tee -a "$LOG_FILE"

# Backup global objects (roles, tablespaces)
GLOBALS_BACKUP="${BACKUP_DIR}/globals_${DATE}.sql"
pg_dumpall -h "$DB_HOST" -U postgres 
    --globals-only 
    --file="$GLOBALS_BACKUP" 2>&1 | tee -a "$LOG_FILE"

# Calculate backup size
BACKUP_SIZE=$(du -sh "$BACKUP_FILE" | cut -f1)
log "Backup size: $BACKUP_SIZE"

# Upload to S3 (optional)
if command -v aws &> /dev/null; then
    aws s3 cp "$BACKUP_FILE" "s3://my-postgres-backups/${DB_NAME}/"
    log "Backup uploaded to S3"
fi

# Cleanup old backups
log "Cleaning up backups older than ${RETENTION_DAYS} days"
find "$BACKUP_DIR" -name "*.backup" -mtime +${RETENTION_DAYS} -delete
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +${RETENTION_DAYS} -delete

log "Backup completed successfully"

# Send notification
echo "PostgreSQL backup completed: $BACKUP_FILE ($BACKUP_SIZE)" | 
    mail -s "PostgreSQL Backup Success" admin@example.com

Physical Backup with pg_basebackup

#!/bin/bash
# Full physical backup

BACKUP_DIR="/var/backups/postgresql/basebackup"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Create base backup
pg_basebackup -h localhost -U replicator \
    -D "${BACKUP_DIR}/base_${DATE}" \
    -Ft -z -P \
    --wal-method=stream \
    --checkpoint=fast

# Backup size
du -sh "${BACKUP_DIR}/base_${DATE}"

# Cleanup old base backups (keep last 7)
ls -t "$BACKUP_DIR" | tail -n +8 | xargs -I {} rm -rf "${BACKUP_DIR}/{}"

Point-in-Time Recovery (PITR) Setup

# Configure WAL archiving in postgresql.conf
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# Create WAL archive directory
sudo mkdir -p /mnt/wal_archive
sudo chown postgres:postgres /mnt/wal_archive
sudo chmod 700 /mnt/wal_archive

# Restart PostgreSQL
sudo systemctl restart postgresql

# Base backup for PITR
pg_basebackup -h localhost -U replicator \
    -D /var/backups/pitr_base \
    -Fp -Xs -P -R

# Recovery configuration (postgresql.auto.conf plus recovery.signal; recovery.conf was removed in PostgreSQL 12)
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2025-01-15 14:30:00'
recovery_target_action = 'promote'

Automated Backup Scheduling

# Cron job for automated backups
# /etc/cron.d/postgresql-backup

# Daily full backup at 2 AM
0 2 * * * postgres /usr/local/bin/postgres-backup.sh >> /var/log/postgresql/backup-cron.log 2>&1

# Hourly incremental WAL archiving check
0 * * * * postgres /usr/local/bin/wal-archive-check.sh >> /var/log/postgresql/wal-archive.log 2>&1

# Weekly vacuum and analyze
0 3 * * 0 postgres psql -d webapp_prod -c "VACUUM ANALYZE;" >> /var/log/postgresql/maintenance.log 2>&1
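
On systemd hosts, timers are a robust alternative to cron (runs are logged via journald, and Persistent= catches up after downtime). A minimal sketch; unit names are illustrative:

# /etc/systemd/system/postgres-backup.service
[Unit]
Description=Daily PostgreSQL backup

[Service]
Type=oneshot
User=postgres
ExecStart=/usr/local/bin/postgres-backup.sh

# /etc/systemd/system/postgres-backup.timer
[Unit]
Description=Run PostgreSQL backup daily at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

# Activate the timer
sudo systemctl daemon-reload
sudo systemctl enable --now postgres-backup.timer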

6. Replication and High Availability

Streaming Replication Setup

Primary Server Configuration

# 1. Create replication user on primary
sudo -u postgres psql

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplicatorPass123!';

# 2. Configure postgresql.conf on primary
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
synchronous_standby_names = 'standby1'  # For synchronous replication

# 3. Configure pg_hba.conf on primary
host    replication     replicator      10.0.2.0/24             md5

# 4. Restart primary
sudo systemctl restart postgresql

# 5. Create replication slot
sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('standby1_slot');"

Standby Server Setup

# 1. Stop PostgreSQL on standby
sudo systemctl stop postgresql

# 2. Clear data directory
sudo rm -rf /var/lib/postgresql/15/main/*

# 3. Create base backup from primary
sudo -u postgres pg_basebackup \
    -h primary-server.example.com \
    -U replicator \
    -D /var/lib/postgresql/15/main \
    -Fp -Xs -P -R \
    -S standby1_slot

# 4. Adjust standby-specific settings (pg_basebackup -R already wrote
#    primary_conninfo and standby.signal; append or verify as needed)
sudo -u postgres tee -a /var/lib/postgresql/15/main/postgresql.auto.conf << EOF
primary_conninfo = 'host=primary-server.example.com port=5432 user=replicator password=ReplicatorPass123!'
primary_slot_name = 'standby1_slot'
hot_standby = on
hot_standby_feedback = on
EOF

# 5. Create standby signal file
sudo -u postgres touch /var/lib/postgresql/15/main/standby.signal

# 6. Start standby
sudo systemctl start postgresql

# 7. Verify replication on primary
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"

# 8. Check replication lag
sudo -u postgres psql -c "
SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS byte_lag,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS mb_lag
FROM pg_stat_replication;"

Automatic Failover with Patroni

# Install Patroni and etcd
sudo apt install python3-pip python3-psycopg2 etcd -y
sudo pip3 install patroni[etcd]

# Patroni configuration
# /etc/patroni/patroni.yml
scope: postgres-cluster
namespace: /db/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.2.11:8008

etcd:
  hosts: 10.0.2.10:2379,10.0.2.11:2379,10.0.2.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 8GB
        effective_cache_size: 24GB
        wal_level: replica
        max_wal_senders: 10

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.2.11:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: ReplicatorPass123!
    superuser:
      username: postgres
      password: PostgresPass123!

  parameters:
    unix_socket_directories: '/var/run/postgresql'

  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: 'fast'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false

# Start Patroni
sudo systemctl enable patroni
sudo systemctl start patroni

# Check cluster status
patronictl -c /etc/patroni/patroni.yml list

# Manual failover
patronictl -c /etc/patroni/patroni.yml failover

Connection Load Balancing with HAProxy

# Install HAProxy
sudo apt install haproxy -y

# /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.2.11:5432 maxconn 100 check port 8008
    server node2 10.0.2.12:5432 maxconn 100 check port 8008
    server node3 10.0.2.13:5432 maxconn 100 check port 8008

listen postgres_read
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.2.11:5432 maxconn 100 check port 8008
    server node2 10.0.2.12:5432 maxconn 100 check port 8008
    server node3 10.0.2.13:5432 maxconn 100 check port 8008

# Restart HAProxy
sudo systemctl restart haproxy
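
A quick routing check (the haproxy.example.com host name is illustrative): connections on port 5000 should land on the primary, port 5001 on a replica:

# Port 5000 should reach the primary
psql -h haproxy.example.com -p 5000 -U webapp_admin -d webapp_prod \
    -c "SELECT pg_is_in_recovery();"  # expect: f

# Port 5001 should reach a replica
psql -h haproxy.example.com -p 5001 -U webapp_admin -d webapp_prod \
    -c "SELECT pg_is_in_recovery();"  # expect: t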

7. Monitoring and Maintenance

Database Health Monitoring Script

#!/bin/bash
# /usr/local/bin/postgres-monitor.sh

DB_NAME="webapp_prod"
DB_USER="postgres"
ALERT_EMAIL="admin@example.com"

# Check database size
db_size=$(psql -U $DB_USER -d $DB_NAME -t -c "
    SELECT pg_size_pretty(pg_database_size('$DB_NAME'));" | xargs)

echo "Database size: $db_size"

# Check connection count
connections=$(psql -U $DB_USER -d $DB_NAME -t -c "
    SELECT count(*) FROM pg_stat_activity;" | xargs)

max_connections=$(psql -U $DB_USER -d $DB_NAME -t -c "
    SHOW max_connections;" | xargs)

connection_percent=$((connections * 100 / max_connections))

echo "Connections: $connections / $max_connections ($connection_percent%)"

if [ $connection_percent -gt 80 ]; then
    echo "WARNING: Connection usage over 80%" | mail -s "PostgreSQL Alert" $ALERT_EMAIL
fi

# Check replication lag
if psql -U $DB_USER -t -c "SELECT pg_is_in_recovery();" | grep -q 't'; then
    lag_seconds=$(psql -U $DB_USER -t -c "
        SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()));" | xargs)

    echo "Replication lag: ${lag_seconds}s"

    if (( $(echo "$lag_seconds > 60" | bc -l) )); then
        echo "WARNING: Replication lag over 60 seconds" | mail -s "PostgreSQL Replication Alert" $ALERT_EMAIL
    fi
fi

# Check for long-running queries
long_queries=$(psql -U $DB_USER -d $DB_NAME -t -c "
    SELECT COUNT(*) FROM pg_stat_activity
    WHERE state != 'idle'
    AND query_start < now() - interval '10 minutes';" | xargs)

if [ "$long_queries" -gt 0 ]; then
    echo "WARNING: $long_queries long-running queries detected"
    psql -U $DB_USER -d $DB_NAME -c "
        SELECT pid, now() - query_start AS duration, state, query
        FROM pg_stat_activity
        WHERE state != 'idle'
        AND query_start < now() - interval '10 minutes';"
fi

# Check table bloat
psql -U $DB_USER -d $DB_NAME << EOF
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_dead_tup,
    n_live_tup,
    ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_tuple_percent
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
    AND n_dead_tup::numeric / NULLIF(n_live_tup, 0) > 0.1
ORDER BY dead_tuple_percent DESC;
EOF

Prometheus Monitoring with postgres_exporter

# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.12.0/postgres_exporter-0.12.0.linux-amd64.tar.gz
tar xvfz postgres_exporter-0.12.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.12.0.linux-amd64/postgres_exporter /usr/local/bin/

# Create systemd service
sudo tee /etc/systemd/system/postgres_exporter.service << EOF
[Unit]
Description=PostgreSQL Exporter
After=network.target

[Service]
Type=simple
User=postgres
Environment=DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable postgres_exporter
sudo systemctl start postgres_exporter

# Metrics available at http://localhost:9187/metrics
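
On the Prometheus side, a minimal scrape job for the exporter might look like this (the target host is illustrative):

# prometheus.yml (scrape configuration sketch)
scrape_configs:
  - job_name: 'postgresql'
    static_configs:
      - targets: ['db-server.example.com:9187']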

8. Disaster Recovery

Restore from Backup

#!/bin/bash
# Restore database from backup

BACKUP_FILE="/var/backups/postgresql/webapp_prod_20250115_020000.backup"
DB_NAME="webapp_prod"

# 1. Drop existing database (if exists)
sudo -u postgres dropdb --if-exists $DB_NAME

# 2. Create new database
sudo -u postgres createdb $DB_NAME

# 3. Restore from custom format backup
sudo -u postgres pg_restore \
    --dbname=$DB_NAME \
    --jobs=4 \
    --verbose \
    $BACKUP_FILE

# 4. Restore from SQL dump
# gunzip -c backup.sql.gz | sudo -u postgres psql -d $DB_NAME

# 5. Restore globals (roles and tablespaces)
sudo -u postgres psql -f /var/backups/postgresql/globals_20250115_020000.sql

# 6. Verify restoration
sudo -u postgres psql -d $DB_NAME -c "dt"
sudo -u postgres psql -d $DB_NAME -c "SELECT COUNT(*) FROM users;"

Point-in-Time Recovery

# 1. Stop PostgreSQL
sudo systemctl stop postgresql

# 2. Backup current data directory
sudo mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.old

# 3. Restore base backup
sudo cp -R /var/backups/pitr_base /var/lib/postgresql/15/main
sudo chown -R postgres:postgres /var/lib/postgresql/15/main

# 4. Create recovery signal file (contents are ignored; its presence triggers recovery)
sudo -u postgres touch /var/lib/postgresql/15/main/recovery.signal

sudo -u postgres tee -a /var/lib/postgresql/15/main/postgresql.auto.conf << EOF
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2025-01-15 14:30:00'
recovery_target_action = 'promote'
EOF

# 5. Start PostgreSQL (will enter recovery mode)
sudo systemctl start postgresql

# 6. Monitor recovery
sudo -u postgres tail -f /var/log/postgresql/postgresql-15-main.log

# 7. Verify recovery
sudo -u postgres psql -d webapp_prod -c "SELECT MAX(created_at) FROM orders;"

9. Version Upgrades

Major Version Upgrade (14 to 15)

# 1. Install new PostgreSQL version
sudo apt install postgresql-15 postgresql-client-15

# 2. Stop both clusters
sudo systemctl stop postgresql

# 3. Check compatibility
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
    --old-datadir=/var/lib/postgresql/14/main \
    --new-datadir=/var/lib/postgresql/15/main \
    --old-bindir=/usr/lib/postgresql/14/bin \
    --new-bindir=/usr/lib/postgresql/15/bin \
    --check

# 4. Perform upgrade
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
    --old-datadir=/var/lib/postgresql/14/main \
    --new-datadir=/var/lib/postgresql/15/main \
    --old-bindir=/usr/lib/postgresql/14/bin \
    --new-bindir=/usr/lib/postgresql/15/bin \
    --link

# 5. Start new cluster
sudo systemctl start postgresql@15-main

# 6. Update statistics
sudo -u postgres /usr/lib/postgresql/15/bin/vacuumdb --all --analyze-in-stages

# 7. Run generated script to delete old cluster
sudo ./delete_old_cluster.sh

# 8. Update configuration and extensions
sudo -u postgres psql -c "ALTER EXTENSION pg_stat_statements UPDATE;"

10. Best Practices

Performance Best Practices

  • Use Connection Pooling: Implement PgBouncer or application-level pooling
  • Optimize Queries: Use EXPLAIN ANALYZE and pg_stat_statements
  • Index Strategically: Create indexes on frequently queried columns
  • Regular Maintenance: Schedule VACUUM ANALYZE during low-traffic periods
  • Monitor Statistics: Keep table statistics current for optimal query planning
  • Partition Large Tables: Use table partitioning for multi-TB tables (see the sketch below)
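
A minimal declarative-partitioning sketch for an orders-style table (names and ranges are illustrative; note the partition key must be part of the primary key):

-- Range partitioning by month
CREATE TABLE orders_part (
    id          bigserial,
    user_id     bigint NOT NULL,
    status      text NOT NULL,
    created_at  timestamptz NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025_01 PARTITION OF orders_part
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders_part
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');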

Security Best Practices

  • Use Strong Passwords: Minimum 16 characters with complexity
  • Enable SSL/TLS: Encrypt all network connections
  • Restrict pg_hba.conf: Allow only necessary hosts and users
  • Regular Updates: Apply security patches promptly
  • Audit Logging: Enable detailed logging for security events
  • Row-Level Security: Implement for multi-tenant applications

Backup Best Practices

  • 3-2-1 Rule: 3 copies, 2 different media, 1 offsite
  • Test Restores: Regularly verify backup integrity
  • Automate Backups: Use cron or systemd timers
  • Monitor Backup Size: Track growth trends
  • Document Procedures: Maintain recovery runbooks
  • Encrypt Backups: Protect sensitive data at rest

High Availability Best Practices

  • Synchronous Replication: For zero data loss requirements
  • Monitor Replication Lag: Alert on lag > acceptable threshold
  • Automate Failover: Use Patroni or similar tools
  • Load Balance Reads: Distribute read traffic to replicas
  • Test Failover: Regular disaster recovery drills
  • Geographic Distribution: Multi-region for disaster recovery

Conclusion

PostgreSQL administration requires careful planning, continuous monitoring, and adherence to best practices. Key takeaways:

  • Configuration: Tune parameters based on workload and hardware
  • Security: Implement defense-in-depth with SSL, authentication, and auditing
  • Performance: Regular maintenance, proper indexing, and query optimization
  • Backup: Automated, tested, and offsite backup strategies
  • High Availability: Streaming replication with automatic failover
  • Monitoring: Proactive alerting and performance tracking
  • Disaster Recovery: Documented procedures and regular testing

Regular maintenance, monitoring, and following these practices ensures a robust, performant, and highly available PostgreSQL deployment.

