PostgreSQL Administration: High Availability and Disaster Recovery Guide
PostgreSQL is the world’s most advanced open-source relational database, powering critical applications for organizations worldwide. This comprehensive guide covers enterprise-grade administration, high availability configurations, performance tuning, backup strategies, and disaster recovery planning for production PostgreSQL deployments.
📑 Table of Contents
- 1. Installation and Initial Setup
- Installation on Ubuntu/Debian
- Installation on RHEL/CentOS
- Initial Database Setup
- 2. Advanced Configuration
- PostgreSQL Configuration File
- Connection Pooling with PgBouncer
- 3. Security and Authentication
- pg_hba.conf Configuration
- SSL/TLS Configuration
- Role-Based Access Control
- 4. Performance Tuning
- Index Optimization
- Query Optimization
- Connection and Memory Optimization
- 5. Backup Strategies
- Logical Backup with pg_dump
- Physical Backup with pg_basebackup
- Point-in-Time Recovery (PITR) Setup
- Automated Backup Scheduling
- 6. Replication and High Availability
- Streaming Replication Setup
- Automatic Failover with Patroni
- Connection Load Balancing with HAProxy
- 7. Monitoring and Maintenance
- Database Health Monitoring Script
- Prometheus Monitoring with postgres_exporter
- 8. Disaster Recovery
- Restore from Backup
- Point-in-Time Recovery
- 9. Version Upgrades
- Major Version Upgrade (14 to 15)
- 10. Best Practices
- Performance Best Practices
- Security Best Practices
- Backup Best Practices
- High Availability Best Practices
- Conclusion
- Additional Resources
1. Installation and Initial Setup
Installation on Ubuntu/Debian
# Add PostgreSQL APT repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update and install PostgreSQL 15
sudo apt update
sudo apt install -y postgresql-15 postgresql-client-15 postgresql-contrib-15
# Verify installation
psql --version
# Check service status
sudo systemctl status postgresql
sudo systemctl enable postgresql
Installation on RHEL/CentOS
# Install PostgreSQL repository
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable built-in PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 15
sudo dnf install -y postgresql15-server postgresql15-contrib
# Initialize database cluster
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb
# Start and enable service
sudo systemctl enable postgresql-15
sudo systemctl start postgresql-15
Initial Database Setup
# Switch to postgres user
sudo -i -u postgres
# Access PostgreSQL prompt
psql
-- Change postgres user password
ALTER USER postgres WITH ENCRYPTED PASSWORD 'StrongPassword123!';
-- Create application database
CREATE DATABASE webapp_prod;
-- Create application user
CREATE USER webapp_admin WITH ENCRYPTED PASSWORD 'AppPassword123!';
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE webapp_prod TO webapp_admin;
-- Grant schema privileges
\c webapp_prod
GRANT ALL ON SCHEMA public TO webapp_admin;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO webapp_admin;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO webapp_admin;
-- Set default privileges for future objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO webapp_admin;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO webapp_admin;
-- Create read-only user
CREATE USER webapp_readonly WITH ENCRYPTED PASSWORD 'ReadOnlyPass123!';
GRANT CONNECT ON DATABASE webapp_prod TO webapp_readonly;
GRANT USAGE ON SCHEMA public TO webapp_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO webapp_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO webapp_readonly;
2. Advanced Configuration
PostgreSQL Configuration File
# /etc/postgresql/15/main/postgresql.conf (Ubuntu)
# /var/lib/pgsql/15/data/postgresql.conf (RHEL)
#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------
# Listen on all interfaces (change for production)
listen_addresses = '*'
port = 5432
# Maximum number of connections
max_connections = 200
# Connections reserved for superusers (use PgBouncer for pooling)
# superuser_reserved_connections = 3
#---------------------------------------------------------------------------
# RESOURCE USAGE (Memory)
#---------------------------------------------------------------------------
# Shared memory for caching (25% of total RAM)
shared_buffers = 8GB
# Memory per sort/hash operation (several can run per query; keep
# work_mem * active connections well under total RAM)
work_mem = 32MB
# Memory for maintenance operations (vacuum, index creation)
maintenance_work_mem = 2GB
# Optimizer's assumption of effective cache size (50-75% of RAM)
effective_cache_size = 24GB
# Memory for each autovacuum worker
autovacuum_work_mem = 2GB
#---------------------------------------------------------------------------
# WRITE AHEAD LOG (WAL)
#---------------------------------------------------------------------------
# WAL level for replication
wal_level = replica
# Minimum WAL size to keep
min_wal_size = 1GB
# Maximum WAL size before checkpoint
max_wal_size = 4GB
# WAL segment size is fixed at initdb time (16MB default) and cannot
# be changed here
# wal_segment_size = 16MB
# Archive command for backup (customize for your backup solution)
archive_mode = on
archive_command = 'rsync -a %p /mnt/wal_archive/%f'
# Number of WAL sender processes for replication
max_wal_senders = 10
# Replication slots
max_replication_slots = 10
# Keep WAL for standby
wal_keep_size = 1GB
#---------------------------------------------------------------------------
# REPLICATION
#---------------------------------------------------------------------------
# Synchronous standby servers
synchronous_standby_names = 'standby1,standby2'
# Synchronous commit setting
synchronous_commit = on
# Hot standby configuration (on standby servers)
hot_standby = on
hot_standby_feedback = on
#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------
# Planner cost constants
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200 # For SSD storage
# Query planning
default_statistics_target = 100
# Enable parallel query execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 8
#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------
# Where to log
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
# Log rotation
log_rotation_age = 1d
log_rotation_size = 100MB
# What to log
log_min_duration_statement = 1000 # Log queries > 1 second
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
# Log line format
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
#---------------------------------------------------------------------------
# AUTOVACUUM
#---------------------------------------------------------------------------
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
#---------------------------------------------------------------------------
# STATEMENT STATISTICS
#---------------------------------------------------------------------------
# Enable pg_stat_statements extension
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------
# Timezone
timezone = 'UTC'
# Locale
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
# Default text search configuration
default_text_search_config = 'pg_catalog.english'
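The memory comments above (shared_buffers at roughly 25% of RAM, effective_cache_size at 50-75%) can be turned into a quick sizing sketch. The ratios are rough starting heuristics, not a substitute for workload testing, and the script below is purely illustrative:

```shell
#!/bin/bash
# Rough sizing helper based on the heuristics above. The ratios
# (25% / 75% / a conservative work_mem) are starting points only.
total_ram_gb=${1:-32}        # assumed host RAM in GB
max_connections=${2:-200}

shared_buffers_gb=$(( total_ram_gb / 4 ))       # ~25% of RAM
effective_cache_gb=$(( total_ram_gb * 3 / 4 ))  # ~75% of RAM
# Several sorts/hashes can run per connection, so divide the naive
# RAM/max_connections figure by an extra safety factor of 4.
work_mem_mb=$(( total_ram_gb * 1024 / max_connections / 4 ))

echo "shared_buffers = ${shared_buffers_gb}GB"
echo "effective_cache_size = ${effective_cache_gb}GB"
echo "work_mem = ${work_mem_mb}MB"
```

With 32 GB of RAM and 200 connections this reproduces the 8GB/24GB figures used in the sample configuration above.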
Connection Pooling with PgBouncer
# Install PgBouncer
sudo apt install pgbouncer -y
# /etc/pgbouncer/pgbouncer.ini
[databases]
webapp_prod = host=localhost port=5432 dbname=webapp_prod
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
# Server connection limits
server_lifetime = 3600
server_idle_timeout = 600
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
# /etc/pgbouncer/userlist.txt
# Format: "username" "md5<hex digest of md5(password || username)>"
"webapp_admin" "md5<hash>"
# Start PgBouncer
sudo systemctl enable pgbouncer
sudo systemctl start pgbouncer
# Test connection through PgBouncer
psql -h 127.0.0.1 -p 6432 -U webapp_admin webapp_prod
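One gotcha: with `auth_type = md5`, userlist.txt stores the string `md5` followed by the MD5 digest of the password concatenated with the username, not the password itself. A small sketch to generate an entry (assumes `md5sum` from coreutils; credentials are the sample ones from this guide):

```shell
#!/bin/bash
# Build a PgBouncer userlist.txt entry: "md5" + md5(password || username).
user="webapp_admin"
password="AppPassword123!"

hash="md5$(printf '%s%s' "$password" "$user" | md5sum | awk '{print $1}')"
entry="\"$user\" \"$hash\""
echo "$entry"    # append this line to /etc/pgbouncer/userlist.txt
```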
3. Security and Authentication
pg_hba.conf Configuration
# /etc/postgresql/15/main/pg_hba.conf
# TYPE    DATABASE       USER              ADDRESS         METHOD

# Local connections
local     all            postgres                          peer
local     all            all                               md5

# IPv4 local connections
host      all            all               127.0.0.1/32    md5

# Application server connections
host      webapp_prod    webapp_admin      10.0.1.0/24     md5
host      webapp_prod    webapp_readonly   10.0.1.0/24     md5

# Replication connections
host      replication    replicator        10.0.2.0/24     md5

# Deny all other connections
host      all            all               0.0.0.0/0       reject
host      all            all               ::/0            reject
SSL/TLS Configuration
# Generate SSL certificates
sudo openssl req -new -x509 -days 365 -nodes -text \
    -out /etc/postgresql/15/main/server.crt \
    -keyout /etc/postgresql/15/main/server.key \
    -subj "/CN=postgres.example.com"
# Set permissions
sudo chown postgres:postgres /etc/postgresql/15/main/server.*
sudo chmod 600 /etc/postgresql/15/main/server.key
# Enable SSL in postgresql.conf
ssl = on
ssl_cert_file = '/etc/postgresql/15/main/server.crt'
ssl_key_file = '/etc/postgresql/15/main/server.key'
# Require SSL for remote connections in pg_hba.conf
hostssl webapp_prod webapp_admin 10.0.1.0/24 md5
# Client connection with SSL
psql "sslmode=require host=postgres.example.com dbname=webapp_prod user=webapp_admin"
Role-Based Access Control
-- Create roles with specific privileges
CREATE ROLE read_only_role;
GRANT CONNECT ON DATABASE webapp_prod TO read_only_role;
GRANT USAGE ON SCHEMA public TO read_only_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_role;
CREATE ROLE read_write_role;
GRANT CONNECT ON DATABASE webapp_prod TO read_write_role;
GRANT USAGE ON SCHEMA public TO read_write_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO read_write_role;
CREATE ROLE admin_role;
GRANT ALL PRIVILEGES ON DATABASE webapp_prod TO admin_role;
-- Create users and assign roles
CREATE USER analyst_user WITH ENCRYPTED PASSWORD 'AnalystPass123!';
GRANT read_only_role TO analyst_user;
CREATE USER app_user WITH ENCRYPTED PASSWORD 'AppPass123!';
GRANT read_write_role TO app_user;
-- Row-level security (enable on the table, then define policies)
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_policy ON users
    USING (user_id = current_user::text
           OR pg_has_role(current_user, 'admin_role', 'member'));
4. Performance Tuning
Index Optimization
-- Enable pg_stat_statements for query analysis
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find missing indexes
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
-- Create indexes (CONCURRENTLY to avoid table locks)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at);
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
CREATE INDEX CONCURRENTLY idx_orders_status_created ON orders(status, created_at);
-- Composite index for common queries
CREATE INDEX CONCURRENTLY idx_orders_user_status
ON orders(user_id, status)
WHERE status IN ('pending', 'processing');
-- Partial index for active records
CREATE INDEX CONCURRENTLY idx_users_active
ON users(email)
WHERE is_active = true;
-- Full-text search index
CREATE INDEX CONCURRENTLY idx_products_search
ON products USING GIN(to_tsvector('english', name || ' ' || description));
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
SELECT indexrelid FROM pg_index WHERE indisunique
)
ORDER BY pg_relation_size(indexrelid) DESC;
Query Optimization
-- Analyze query plans
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
-- Find slow queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Reset statistics
SELECT pg_stat_statements_reset();
-- Vacuum and analyze
VACUUM (ANALYZE, VERBOSE) users;
VACUUM (ANALYZE, VERBOSE) orders;
-- Auto-vacuum tuning for specific table
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02,
autovacuum_vacuum_cost_delay = 10
);
-- Table bloat check
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
Connection and Memory Optimization
-- Check active connections
SELECT
datname,
usename,
client_addr,
state,
COUNT(*) as connections
FROM pg_stat_activity
GROUP BY datname, usename, client_addr, state
ORDER BY connections DESC;
-- Find long-running queries
SELECT
pid,
now() - query_start AS duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;
-- Cancel or terminate a problem query (substitute the pid found above)
SELECT pg_cancel_backend(12345);    -- graceful: cancels the running query
SELECT pg_terminate_backend(12345); -- forceful: closes the whole connection
-- Check table and index sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
5. Backup Strategies
Logical Backup with pg_dump
#!/bin/bash
# /usr/local/bin/postgres-backup.sh
set -euo pipefail
# Configuration
DB_NAME="webapp_prod"
DB_USER="webapp_admin"
DB_HOST="localhost"
BACKUP_DIR="/var/backups/postgresql"
RETENTION_DAYS=30
DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/var/log/postgresql/backup.log"
# Ensure backup directory exists
mkdir -p "$BACKUP_DIR"
# Logging function
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
log "Starting PostgreSQL backup"
# Custom format backup (recommended - supports parallel restore)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${DATE}.backup"
pg_dump -h "$DB_HOST" -U "$DB_USER" \
    --format=custom \
    --compress=9 \
    --verbose \
    --file="$BACKUP_FILE" \
    "$DB_NAME" 2>&1 | tee -a "$LOG_FILE"
# Verify backup
if pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1; then
log "Backup verification successful"
else
log "ERROR: Backup verification failed"
exit 1
fi
# SQL format backup (human-readable)
SQL_BACKUP="${BACKUP_DIR}/${DB_NAME}_${DATE}.sql.gz"
pg_dump -h "$DB_HOST" -U "$DB_USER" \
    --format=plain \
    --verbose \
    "$DB_NAME" 2>>"$LOG_FILE" | gzip > "$SQL_BACKUP"
# Schema-only backup
SCHEMA_BACKUP="${BACKUP_DIR}/${DB_NAME}_schema_${DATE}.sql"
pg_dump -h "$DB_HOST" -U "$DB_USER" \
    --schema-only \
    --verbose \
    --file="$SCHEMA_BACKUP" \
    "$DB_NAME" 2>&1 | tee -a "$LOG_FILE"
# Backup global objects (roles, tablespaces)
GLOBALS_BACKUP="${BACKUP_DIR}/globals_${DATE}.sql"
pg_dumpall -h "$DB_HOST" -U postgres \
    --globals-only \
    --file="$GLOBALS_BACKUP" 2>&1 | tee -a "$LOG_FILE"
# Calculate backup size
BACKUP_SIZE=$(du -sh "$BACKUP_FILE" | cut -f1)
log "Backup size: $BACKUP_SIZE"
# Upload to S3 (optional)
if command -v aws &> /dev/null; then
aws s3 cp "$BACKUP_FILE" "s3://my-postgres-backups/${DB_NAME}/"
log "Backup uploaded to S3"
fi
# Cleanup old backups
log "Cleaning up backups older than ${RETENTION_DAYS} days"
find "$BACKUP_DIR" -name "*.backup" -mtime +${RETENTION_DAYS} -delete
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +${RETENTION_DAYS} -delete
log "Backup completed successfully"
# Send notification
echo "PostgreSQL backup completed: $BACKUP_FILE ($BACKUP_SIZE)" |
mail -s "PostgreSQL Backup Success" admin@example.com
Physical Backup with pg_basebackup
#!/bin/bash
# Full physical backup
BACKUP_DIR="/var/backups/postgresql/basebackup"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Create base backup
pg_basebackup -h localhost -U replicator \
    -D "${BACKUP_DIR}/base_${DATE}" \
    -Ft -z -P \
    --wal-method=stream \
    --checkpoint=fast
# Backup size
du -sh "${BACKUP_DIR}/base_${DATE}"
# Cleanup old base backups (keep last 7)
ls -t "$BACKUP_DIR" | tail -n +8 | xargs -I {} rm -rf "${BACKUP_DIR}/{}"
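The `ls -t | tail -n +8` retention one-liner above is easy to get wrong, so it is worth rehearsing against throwaway directories before pointing it at real backups. A sketch (assumes GNU `touch -d`):

```shell
#!/bin/bash
# Rehearse the keep-newest-7 cleanup against dummy directories.
demo_dir=$(mktemp -d)
for i in $(seq 1 10); do
    mkdir "$demo_dir/base_$i"
    touch -d "$i days ago" "$demo_dir/base_$i"   # stagger mtimes (GNU touch)
done

# Same pattern as the cleanup line above: list newest-first,
# skip the first 7 entries, delete the rest.
ls -t "$demo_dir" | tail -n +8 | xargs -I {} rm -rf "$demo_dir/{}"

remaining=$(ls "$demo_dir" | wc -l)
echo "remaining backups: $remaining"
rm -rf "$demo_dir"
```

Exactly 7 directories survive, confirming the `tail -n +8` offset is right.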
Point-in-Time Recovery (PITR) Setup
# Configure WAL archiving in postgresql.conf
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
# Create WAL archive directory
sudo mkdir -p /mnt/wal_archive
sudo chown postgres:postgres /mnt/wal_archive
sudo chmod 700 /mnt/wal_archive
# Restart PostgreSQL
sudo systemctl restart postgresql
# Base backup for PITR
pg_basebackup -h localhost -U replicator \
    -D /var/backups/pitr_base \
    -Fp -Xs -P -R
# Recovery configuration (recovery.conf or postgresql.auto.conf)
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2025-01-15 14:30:00'
recovery_target_action = 'promote'
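PostgreSQL treats a nonzero exit from archive_command as a failed archive and retries later, which is why the `test ! -f … && cp …` pattern must fail when the segment already exists. A dry-run sketch of those semantics, using temporary stand-ins rather than real WAL files:

```shell
#!/bin/bash
# Simulate archive_command semantics: the command must exit nonzero
# when the target exists, so an archived segment is never overwritten.
archive=$(mktemp -d)             # stand-in for /mnt/wal_archive
wal=$(mktemp)                    # stand-in for the WAL segment (%p)
seg="000000010000000000000001"   # stand-in for the file name (%f)

archive_cmd() { test ! -f "$archive/$seg" && cp "$wal" "$archive/$seg"; }

archive_cmd && first=ok || first=fail     # first attempt: copies the segment
archive_cmd && second=ok || second=fail   # retry: refuses to overwrite
echo "first=$first second=$second"
rm -rf "$archive" "$wal"
```

The second attempt fails, which is exactly the signal PostgreSQL needs to notice a naming conflict instead of silently clobbering archived WAL.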
Automated Backup Scheduling
# Cron job for automated backups
# /etc/cron.d/postgresql-backup
# Daily full backup at 2 AM
0 2 * * * postgres /usr/local/bin/postgres-backup.sh >> /var/log/postgresql/backup-cron.log 2>&1
# Hourly incremental WAL archiving check
0 * * * * postgres /usr/local/bin/wal-archive-check.sh >> /var/log/postgresql/wal-archive.log 2>&1
# Weekly vacuum and analyze
0 3 * * 0 postgres psql -d webapp_prod -c "VACUUM ANALYZE;" >> /var/log/postgresql/maintenance.log 2>&1
6. Replication and High Availability
Streaming Replication Setup
Primary Server Configuration
# 1. Create replication user on primary
sudo -u postgres psql
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplicatorPass123!';
# 2. Configure postgresql.conf on primary
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
synchronous_standby_names = 'standby1' # For synchronous replication
# 3. Configure pg_hba.conf on primary
host replication replicator 10.0.2.0/24 md5
# 4. Restart primary
sudo systemctl restart postgresql
# 5. Create replication slot
sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('standby1_slot');"
Standby Server Setup
# 1. Stop PostgreSQL on standby
sudo systemctl stop postgresql
# 2. Clear data directory
sudo rm -rf /var/lib/postgresql/15/main/*
# 3. Create base backup from primary
sudo -u postgres pg_basebackup \
    -h primary-server.example.com \
    -U replicator \
    -D /var/lib/postgresql/15/main \
    -Fp -Xs -P -R \
    -S standby1_slot
# 4. Configure standby-specific settings
sudo -u postgres tee -a /var/lib/postgresql/15/main/postgresql.auto.conf << EOF
primary_conninfo = 'host=primary-server.example.com port=5432 user=replicator password=ReplicatorPass123!'
primary_slot_name = 'standby1_slot'
hot_standby = on
hot_standby_feedback = on
EOF
# 5. Create standby signal file
sudo -u postgres touch /var/lib/postgresql/15/main/standby.signal
# 6. Start standby
sudo systemctl start postgresql
# 7. Verify replication on primary
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"
# 8. Check replication lag
sudo -u postgres psql -c "
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS byte_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS mb_lag
FROM pg_stat_replication;"
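pg_wal_lsn_diff() above subtracts two write-ahead-log positions. An LSN prints as `HI/LO` in hex, and its absolute byte position is HI × 2^32 + LO, so the same lag arithmetic can be reproduced outside the database (handy when all you have is log output). A sketch:

```shell
#!/bin/bash
# Byte lag between two LSNs, mirroring pg_wal_lsn_diff().
# An LSN "HI/LO" (hex) maps to the absolute position HI * 2^32 + LO.
lsn_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

current="0/3000060"    # e.g. pg_current_wal_lsn() on the primary
replayed="0/3000000"   # e.g. replay_lsn reported for a standby
lag=$(( $(lsn_to_bytes "$current") - $(lsn_to_bytes "$replayed") ))
echo "byte lag: $lag"  # 0x60 = 96 bytes
```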
Automatic Failover with Patroni
# Install Patroni and etcd
sudo apt install python3-pip python3-psycopg2 etcd -y
sudo pip3 install 'patroni[etcd]'
# Patroni configuration
# /etc/patroni/patroni.yml
scope: postgres-cluster
namespace: /db/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.2.11:8008

etcd:
  hosts: 10.0.2.10:2379,10.0.2.11:2379,10.0.2.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 8GB
        effective_cache_size: 24GB
        wal_level: replica
        max_wal_senders: 10

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.2.11:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: ReplicatorPass123!
    superuser:
      username: postgres
      password: PostgresPass123!
  parameters:
    unix_socket_directories: '/var/run/postgresql'
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: 'fast'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
# Start Patroni
sudo systemctl enable patroni
sudo systemctl start patroni
# Check cluster status
patronictl -c /etc/patroni/patroni.yml list
# Manual failover
patronictl -c /etc/patroni/patroni.yml failover
Connection Load Balancing with HAProxy
# Install HAProxy
sudo apt install haproxy -y
# /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.2.11:5432 maxconn 100 check port 8008
    server node2 10.0.2.12:5432 maxconn 100 check port 8008
    server node3 10.0.2.13:5432 maxconn 100 check port 8008

listen postgres_read
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.2.11:5432 maxconn 100 check port 8008
    server node2 10.0.2.12:5432 maxconn 100 check port 8008
    server node3 10.0.2.13:5432 maxconn 100 check port 8008
# Restart HAProxy
sudo systemctl restart haproxy
7. Monitoring and Maintenance
Database Health Monitoring Script
#!/bin/bash
# /usr/local/bin/postgres-monitor.sh
DB_NAME="webapp_prod"
DB_USER="postgres"
ALERT_EMAIL="admin@example.com"
# Check database size
db_size=$(psql -U $DB_USER -d $DB_NAME -t -c "
SELECT pg_size_pretty(pg_database_size('$DB_NAME'));" | xargs)
echo "Database size: $db_size"
# Check connection count
connections=$(psql -U $DB_USER -d $DB_NAME -t -c "
SELECT count(*) FROM pg_stat_activity;" | xargs)
max_connections=$(psql -U $DB_USER -d $DB_NAME -t -c "
SHOW max_connections;" | xargs)
connection_percent=$((connections * 100 / max_connections))
echo "Connections: $connections / $max_connections ($connection_percent%)"
if [ $connection_percent -gt 80 ]; then
echo "WARNING: Connection usage over 80%" | mail -s "PostgreSQL Alert" $ALERT_EMAIL
fi
# Check replication lag
if psql -U $DB_USER -t -c "SELECT pg_is_in_recovery();" | grep -q 't'; then
lag_seconds=$(psql -U $DB_USER -t -c "
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()));" | xargs)
echo "Replication lag: ${lag_seconds}s"
if (( $(echo "$lag_seconds > 60" | bc -l) )); then
echo "WARNING: Replication lag over 60 seconds" | mail -s "PostgreSQL Replication Alert" $ALERT_EMAIL
fi
fi
# Check for long-running queries
long_queries=$(psql -U $DB_USER -d $DB_NAME -t -c "
SELECT COUNT(*) FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '10 minutes';" | xargs)
if [ "$long_queries" -gt 0 ]; then
echo "WARNING: $long_queries long-running queries detected"
psql -U $DB_USER -d $DB_NAME -c "
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '10 minutes';"
fi
# Check table bloat
psql -U $DB_USER -d $DB_NAME << EOF
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_tuple_percent
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
AND n_dead_tup::numeric / NULLIF(n_live_tup, 0) > 0.1
ORDER BY dead_tuple_percent DESC;
EOF
Prometheus Monitoring with postgres_exporter
# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.12.0/postgres_exporter-0.12.0.linux-amd64.tar.gz
tar xvfz postgres_exporter-0.12.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.12.0.linux-amd64/postgres_exporter /usr/local/bin/
# Create systemd service
sudo tee /etc/systemd/system/postgres_exporter.service << EOF
[Unit]
Description=PostgreSQL Exporter
After=network.target
[Service]
Type=simple
User=postgres
Environment=DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable postgres_exporter
sudo systemctl start postgres_exporter
# Metrics available at http://localhost:9187/metrics
8. Disaster Recovery
Restore from Backup
#!/bin/bash
# Restore database from backup
BACKUP_FILE="/var/backups/postgresql/webapp_prod_20250115_020000.backup"
DB_NAME="webapp_prod"
# 1. Drop existing database (if exists)
sudo -u postgres dropdb --if-exists $DB_NAME
# 2. Create new database
sudo -u postgres createdb $DB_NAME
# 3. Restore from custom format backup
sudo -u postgres pg_restore \
    --dbname=$DB_NAME \
    --jobs=4 \
    --verbose \
    "$BACKUP_FILE"
# 4. Restore from SQL dump
# gunzip -c backup.sql.gz | sudo -u postgres psql -d $DB_NAME
# 5. Restore globals (roles and tablespaces)
sudo -u postgres psql -f /var/backups/postgresql/globals_20250115_020000.sql
# 6. Verify restoration
sudo -u postgres psql -d $DB_NAME -c "\dt"
sudo -u postgres psql -d $DB_NAME -c "SELECT COUNT(*) FROM users;"
Point-in-Time Recovery
# 1. Stop PostgreSQL
sudo systemctl stop postgresql
# 2. Backup current data directory
sudo mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.old
# 3. Restore base backup
sudo cp -R /var/backups/pitr_base /var/lib/postgresql/15/main
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
# 4. Create recovery configuration
sudo -u postgres touch /var/lib/postgresql/15/main/recovery.signal
sudo -u postgres tee -a /var/lib/postgresql/15/main/postgresql.auto.conf << EOF
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2025-01-15 14:30:00'
recovery_target_action = 'promote'
EOF
# 5. Start PostgreSQL (will enter recovery mode)
sudo systemctl start postgresql
# 6. Monitor recovery
sudo -u postgres tail -f /var/log/postgresql/postgresql-15-main.log
# 7. Verify recovery
sudo -u postgres psql -d webapp_prod -c "SELECT MAX(created_at) FROM orders;"
9. Version Upgrades
Major Version Upgrade (14 to 15)
# 1. Install new PostgreSQL version
sudo apt install postgresql-15 postgresql-client-15
# 2. Stop both clusters
sudo systemctl stop postgresql
# 3. Check compatibility
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
    --old-datadir=/var/lib/postgresql/14/main \
    --new-datadir=/var/lib/postgresql/15/main \
    --old-bindir=/usr/lib/postgresql/14/bin \
    --new-bindir=/usr/lib/postgresql/15/bin \
    --check
# 4. Perform upgrade
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
    --old-datadir=/var/lib/postgresql/14/main \
    --new-datadir=/var/lib/postgresql/15/main \
    --old-bindir=/usr/lib/postgresql/14/bin \
    --new-bindir=/usr/lib/postgresql/15/bin \
    --link
# 5. Start new cluster
sudo systemctl start postgresql@15-main
# 6. Update statistics
sudo -u postgres /usr/lib/postgresql/15/bin/vacuumdb --all --analyze-in-stages
# 7. Run generated script to delete old cluster
sudo ./delete_old_cluster.sh
# 8. Update configuration and extensions
sudo -u postgres psql -c "ALTER EXTENSION pg_stat_statements UPDATE;"
10. Best Practices
Performance Best Practices
- Use Connection Pooling: Implement PgBouncer or application-level pooling
- Optimize Queries: Use EXPLAIN ANALYZE and pg_stat_statements
- Index Strategically: Create indexes on frequently queried columns
- Regular Maintenance: Schedule VACUUM ANALYZE during low-traffic periods
- Monitor Statistics: Keep table statistics current for optimal query planning
- Partition Large Tables: Use table partitioning for multi-TB tables
Security Best Practices
- Use Strong Passwords: Minimum 16 characters with complexity
- Enable SSL/TLS: Encrypt all network connections
- Restrict pg_hba.conf: Allow only necessary hosts and users
- Regular Updates: Apply security patches promptly
- Audit Logging: Enable detailed logging for security events
- Row-Level Security: Implement for multi-tenant applications
Backup Best Practices
- 3-2-1 Rule: 3 copies, 2 different media, 1 offsite
- Test Restores: Regularly verify backup integrity
- Automate Backups: Use cron or systemd timers
- Monitor Backup Size: Track growth trends
- Document Procedures: Maintain recovery runbooks
- Encrypt Backups: Protect sensitive data at rest
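One concrete way to act on the automation and testing points above is a freshness check that alerts when the newest backup file exceeds an age threshold. A sketch with illustrative paths and a throwaway demo directory (the 24h limit and file layout are assumptions, not from the scripts above; requires GNU `find`/`touch`):

```shell
#!/bin/bash
# Alert when the newest *.backup file is older than a threshold.
check_freshness() {
    local dir=$1 max_age_hours=$2 newest
    newest=$(find "$dir" -name '*.backup' -printf '%T@\n' | sort -n | tail -1)
    [ -n "$newest" ] || { echo "no backups found"; return 1; }
    local age_hours=$(( ( $(date +%s) - ${newest%.*} ) / 3600 ))
    echo "newest backup is ${age_hours}h old"
    [ "$age_hours" -le "$max_age_hours" ]
}

# Throwaway demo: one fresh file keeps the check green even though a
# stale one is also present (only the newest backup matters).
demo=$(mktemp -d)
touch "$demo/fresh.backup"
touch -d "3 days ago" "$demo/stale.backup"
check_freshness "$demo" 24 && status=ok || status=alert
rm -rf "$demo"
echo "status=$status"
```

Wire the `alert` branch into the same `mail` notification used by the monitoring script in section 7.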
High Availability Best Practices
- Synchronous Replication: For zero data loss requirements
- Monitor Replication Lag: Alert on lag > acceptable threshold
- Automate Failover: Use Patroni or similar tools
- Load Balance Reads: Distribute read traffic to replicas
- Test Failover: Regular disaster recovery drills
- Geographic Distribution: Multi-region for disaster recovery
Conclusion
PostgreSQL administration requires careful planning, continuous monitoring, and adherence to best practices. Key takeaways:
- Configuration: Tune parameters based on workload and hardware
- Security: Implement defense-in-depth with SSL, authentication, and auditing
- Performance: Regular maintenance, proper indexing, and query optimization
- Backup: Automated, tested, and offsite backup strategies
- High Availability: Streaming replication with automatic failover
- Monitoring: Proactive alerting and performance tracking
- Disaster Recovery: Documented procedures and regular testing
Regular maintenance, monitoring, and following these practices ensures a robust, performant, and highly available PostgreSQL deployment.
Additional Resources
- PostgreSQL Official Documentation
- PostgreSQL Wiki
- PGTune - Configuration calculator
- Patroni - HA solution
- PgBouncer - Connection pooler
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.