Advanced Bash Scripting: Professional Automation and System Integration
Advanced Bash scripting transforms simple shell commands into powerful automation tools for enterprise environments. This guide covers professional scripting techniques, error-handling strategies, system integration patterns, and best practices used by DevOps engineers and system administrators worldwide.
📑 Table of Contents
- 1. Advanced Scripting Fundamentals
- Robust Script Template
- 2. Professional Error Handling
- Comprehensive Error Handling
- Signal Handling and Cleanup
- 3. Advanced Parameter Processing
- Long-Option Parsing with while/case
- 4. Arrays and Data Structures
- Advanced Array Operations
- JSON Processing
- 5. Modular Functions and Libraries
- Function Library Pattern
- 6. Advanced File Operations
- File Processing Patterns
- Directory Synchronization
- 7. Network and API Integration
- REST API Interaction
- Network Monitoring
- 8. Parallel Processing and Job Control
- GNU Parallel
- 9. Security and Input Validation
- Input Sanitization
- Secure Credential Management
- 10. System Integration Scripts
- Complete System Monitoring Script
- 11. Testing and Debugging
- Unit Testing with BATS
- Debugging Techniques
- 12. Best Practices and Patterns
- Script Organization
- Code Style Guidelines
- Conclusion
- Additional Resources
1. Advanced Scripting Fundamentals
Robust Script Template
#!/usr/bin/env bash
#############################################################################
# Script Name: professional-template.sh
# Description: Professional Bash script template with best practices
# Author: Your Name
# Version: 1.0.0
# Date: YYYY-MM-DD
#############################################################################
# Strict mode - exit on errors, undefined variables, pipe failures
set -euo pipefail
IFS=$'\n\t'
# Constants
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
readonly LOG_DIR="/var/log/$(basename "$SCRIPT_NAME" .sh)"
readonly LOG_FILE="${LOG_DIR}/$(date +%Y%m%d).log"
readonly PID_FILE="/var/run/$(basename "$SCRIPT_NAME" .sh).pid"
# Color codes for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m' # No Color
# Create log directory
mkdir -p "$LOG_DIR"
# Logging functions
log() {
echo "[$(date +"%Y-%m-%d %H:%M:%S")] [INFO] $*" | tee -a "$LOG_FILE"
}
log_error() {
echo -e "${RED}[$(date +"%Y-%m-%d %H:%M:%S")] [ERROR] $*${NC}" | tee -a "$LOG_FILE" >&2
}
log_success() {
echo -e "${GREEN}[$(date +"%Y-%m-%d %H:%M:%S")] [SUCCESS] $*${NC}" | tee -a "$LOG_FILE"
}
log_warning() {
echo -e "${YELLOW}[$(date +"%Y-%m-%d %H:%M:%S")] [WARNING] $*${NC}" | tee -a "$LOG_FILE"
}
# Cleanup function
cleanup() {
local exit_code=$?
log "Cleaning up..."
rm -f "$PID_FILE"
trap - EXIT
exit "$exit_code"
}
# Error trap
error_exit() {
log_error "Error on line $1"
cleanup
}
# Setup traps
trap 'error_exit $LINENO' ERR
trap cleanup EXIT INT TERM
# Check if script is already running
if [[ -f "$PID_FILE" ]]; then
old_pid=$(cat "$PID_FILE")
if ps -p "$old_pid" > /dev/null 2>&1; then
log_error "Script already running with PID $old_pid"
exit 1
else
log_warning "Stale PID file found, removing..."
rm -f "$PID_FILE"
fi
fi
# Write current PID
echo $$ > "$PID_FILE"
# Main script logic goes here
main() {
log "Script started"
# Your code here
log_success "Script completed successfully"
}
main "$@"
2. Professional Error Handling
Comprehensive Error Handling
#!/bin/bash
# Advanced error handling with retry logic
retry() {
local max_attempts="$1"
local delay="$2"
local command="${@:3}"
local attempt=1
while [ $attempt -le $max_attempts ]; do
log "Attempt $attempt of $max_attempts: $command"
if eval "$command"; then
log_success "Command succeeded on attempt $attempt"
return 0
fi
if [ $attempt -lt $max_attempts ]; then
log_warning "Command failed, retrying in ${delay}s..."
sleep "$delay"
fi
((attempt++))
done
log_error "Command failed after $max_attempts attempts"
return 1
}
# Example usage
retry 3 5 "curl -sf https://api.example.com/health"
# Graceful degradation
backup_with_fallback() {
local primary_backup="/mnt/backup"
local fallback_backup="/tmp/backup"
if ! mount | grep -q "$primary_backup"; then
log_warning "Primary backup location not available, using fallback"
backup_location="$fallback_backup"
else
backup_location="$primary_backup"
fi
tar -czf "${backup_location}/backup-$(date +%Y%m%d).tar.gz" /opt/app/ || {
log_error "Backup failed"
return 1
}
log_success "Backup completed to $backup_location"
}
# Error context preservation
run_with_context() {
local context="$1"
shift
{
"$@"
} || {
log_error "Failed in context: $context"
log_error "Command: $*"
log_error "Working directory: $(pwd)"
log_error "User: $(whoami)"
log_error "Environment: $(env | grep -E '^(PATH|HOME|USER)=')"
return 1
}
}
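A common refinement of retry() is exponential backoff, so repeated failures back off from the remote service instead of hammering it. The sketch below also sidesteps eval by taking the command as separate arguments:
retry_backoff() {
    local max_attempts="$1"
    shift
    local delay=1
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        "$@" && return 0
        if (( attempt < max_attempts )); then
            log_warning "Attempt $attempt failed, retrying in ${delay}s..."
            sleep "$delay"
            delay=$(( delay * 2 ))  # double the wait after each failure
        fi
    done
    log_error "Command failed after $max_attempts attempts"
    return 1
}
# Usage: the command and its arguments are passed directly, no eval needed
retry_backoff 5 curl -sf https://api.example.com/health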
Signal Handling and Cleanup
#!/bin/bash
# Temporary file management
TEMP_FILES=()
create_temp_file() {
local temp_file=$(mktemp)
TEMP_FILES+=("$temp_file")
echo "$temp_file"
}
cleanup_temp_files() {
log "Cleaning up temporary files..."
for temp_file in "${TEMP_FILES[@]}"; do
if [[ -f "$temp_file" ]]; then
rm -f "$temp_file"
log "Removed: $temp_file"
fi
done
}
# Advanced signal handling
declare -A SIGNAL_NAMES=(
[1]="SIGHUP"
[2]="SIGINT"
[3]="SIGQUIT"
[15]="SIGTERM"
)
signal_handler() {
local signal=$1
log_warning "Received ${SIGNAL_NAMES[$signal]}, shutting down gracefully..."
# Perform cleanup
cleanup_temp_files
# Stop background jobs
jobs -p | xargs -r kill 2>/dev/null
# Save state if needed
save_state
exit $((128 + signal))
}
save_state() {
local state_file="/var/lib/myapp/state.json"
cat > "$state_file" << EOF
{
"timestamp": "$(date -Iseconds)",
"pid": $$,
"status": "interrupted"
}
EOF
}
# Set up signal traps
for signal in "${!SIGNAL_NAMES[@]}"; do
trap "signal_handler $signal" "$signal"
done
3. Advanced Parameter Processing
Long-Option Parsing with while/case
#!/bin/bash
# Advanced argument parsing with validation
usage() {
cat << EOF
Usage: $0 [OPTIONS] SOURCE DESTINATION
Advanced backup script with multiple options
OPTIONS:
-h, --help Show this help message
-v, --verbose Enable verbose output
-c, --compress LEVEL Compression level (1-9, default: 6)
-e, --exclude PATTERN Exclude pattern (can be used multiple times)
-r, --retention DAYS Retention period in days (default: 30)
-n, --dry-run Perform dry run without actual backup
-p, --parallel JOBS Number of parallel jobs (default: 1)
-m, --email EMAIL Send notification to email
EXAMPLES:
$0 -v -c 9 /opt/app /backup/app
$0 --compress 5 --exclude "*.log" --retention 7 /data /backup/data
$0 -v -n --parallel 4 /home /backup/home
EOF
exit 1
}
# Default values
VERBOSE=false
COMPRESS_LEVEL=6
EXCLUDE_PATTERNS=()
RETENTION_DAYS=30
DRY_RUN=false
PARALLEL_JOBS=1
EMAIL=""
# Parse arguments
parse_args() {
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help)
usage
;;
-v|--verbose)
VERBOSE=true
shift
;;
-c|--compress)
COMPRESS_LEVEL="$2"
if ! [[ "$COMPRESS_LEVEL" =~ ^[1-9]$ ]]; then
log_error "Invalid compression level: $COMPRESS_LEVEL"
exit 1
fi
shift 2
;;
-e|--exclude)
EXCLUDE_PATTERNS+=("$2")
shift 2
;;
-r|--retention)
RETENTION_DAYS="$2"
if ! [[ "$RETENTION_DAYS" =~ ^[0-9]+$ ]]; then
log_error "Invalid retention days: $RETENTION_DAYS"
exit 1
fi
shift 2
;;
-n|--dry-run)
DRY_RUN=true
shift
;;
-p|--parallel)
PARALLEL_JOBS="$2"
shift 2
;;
-m|--email)
EMAIL="$2"
if ! [[ "$EMAIL" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$ ]]; then
log_error "Invalid email address: $EMAIL"
exit 1
fi
shift 2
;;
-*)
log_error "Unknown option: $1"
usage
;;
*)
break
;;
esac
done
# Validate required positional arguments
if [[ $# -lt 2 ]]; then
log_error "Missing required arguments: SOURCE and DESTINATION"
usage
fi
SOURCE="$1"
DESTINATION="$2"
# Validate paths
if [[ ! -d "$SOURCE" ]]; then
log_error "Source directory does not exist: $SOURCE"
exit 1
fi
}
parse_args "$@"
# Use parsed arguments
$VERBOSE && log "Verbose mode enabled"
log "Compression level: $COMPRESS_LEVEL"
log "Retention days: $RETENTION_DAYS"
[[ ${#EXCLUDE_PATTERNS[@]} -gt 0 ]] && log "Exclude patterns: ${EXCLUDE_PATTERNS[*]}"
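Note that the parser above is hand-rolled because the getopts builtin cannot handle long options. When short options are enough, getopts is the simpler tool; a minimal sketch covering an illustrative subset of the flags above:
while getopts ":hvnc:r:" opt; do
    case "$opt" in
        h) usage ;;
        v) VERBOSE=true ;;
        n) DRY_RUN=true ;;
        c) COMPRESS_LEVEL="$OPTARG" ;;
        r) RETENTION_DAYS="$OPTARG" ;;
        :)  log_error "Option -$OPTARG requires an argument"; usage ;;
        \?) log_error "Unknown option: -$OPTARG"; usage ;;
    esac
done
shift $((OPTIND - 1))  # leave only the positional arguments (SOURCE DESTINATION)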
4. Arrays and Data Structures
Advanced Array Operations
#!/bin/bash
# Associative arrays for configuration
declare -A SERVER_CONFIG=(
[hostname]="web01.example.com"
[port]="8080"
[max_connections]="1000"
[timeout]="30"
)
# Indexed arrays for lists
PACKAGES=(
"nginx"
"postgresql"
"redis"
"nodejs"
)
# Array manipulation functions
array_contains() {
local needle="$1"
shift
local haystack=("$@")
for item in "${haystack[@]}"; do
[[ "$item" == "$needle" ]] && return 0
done
return 1
}
array_remove() {
local remove="$1"
shift
local -n arr=$1
local new_array=()
for item in "${arr[@]}"; do
[[ "$item" != "$remove" ]] && new_array+=("$item")
done
arr=("${new_array[@]}")
}
# Array sorting
array_sort() {
local -n arr=$1
IFS=$'\n' arr=($(sort <<<"${arr[*]}"))
unset IFS
}
# Example usage
if array_contains "nginx" "${PACKAGES[@]}"; then
log "Nginx is in the package list"
fi
array_remove "redis" PACKAGES
array_sort PACKAGES
# Iterate with index
for i in "${!PACKAGES[@]}"; do
log "Package $((i+1)): ${PACKAGES[$i]}"
done
# Multi-dimensional array simulation
declare -A SERVERS
SERVERS[web1_ip]="192.168.1.10"
SERVERS[web1_port]="80"
SERVERS[web2_ip]="192.168.1.11"
SERVERS[web2_port]="80"
get_server_config() {
local server_name="$1"
local property="$2"
local key="${server_name}_${property}"
echo "${SERVERS[$key]}"
}
# Usage
web1_ip=$(get_server_config "web1" "ip")
log "Web1 IP: $web1_ip"
JSON Processing
#!/bin/bash
# Parse JSON with jq
parse_json_config() {
local config_file="$1"
if ! command -v jq &>/dev/null; then
log_error "jq is not installed"
return 1
fi
# Read values from JSON
local database_host=$(jq -r '.database.host' "$config_file")
local database_port=$(jq -r '.database.port' "$config_file")
# Process array from JSON
local -a servers
mapfile -t servers < <(jq -r '.servers[]' "$config_file")
for server in "${servers[@]}"; do
log "Processing server: $server"
done
}
# Create JSON output
create_json_report() {
local output_file="$1"
cat > "$output_file" << EOF
{
"timestamp": "$(date -Iseconds)",
"hostname": "$(hostname)",
"status": "success",
"metrics": {
"cpu_usage": $(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1),
"memory_used": $(free -m | awk 'NR==2{print $3}'),
"disk_usage": $(df -h / | awk 'NR==2{print $5}' | sed 's/%//')
},
"services": [
$(systemctl is-active nginx postgresql redis | jq -Rs 'split("
")[:-1] | map({"name": ., "status": .})')
]
}
EOF
log "JSON report created: $output_file"
}
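One caveat with heredoc-built JSON such as the report above: it breaks as soon as a value contains a quote or backslash. Since jq is already a dependency here, jq -n can assemble the document with proper escaping; a small sketch:
# Build JSON safely: --arg injects shell values as correctly escaped strings
build_host_report() {
    jq -n \
        --arg ts "$(date -Iseconds)" \
        --arg host "$(hostname)" \
        --arg status "success" \
        '{timestamp: $ts, hostname: $host, status: $status}'
}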
5. Modular Functions and Libraries
Function Library Pattern
# lib/common.sh - Reusable function library
# Check if library already loaded
[[ -n "${__COMMON_LIB_LOADED__:-}" ]] && return 0
readonly __COMMON_LIB_LOADED__=1
# Validation functions
validate_ip() {
local ip="$1"
local pattern='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
[[ "$ip" =~ $pattern ]] || return 1
IFS='.' read -ra octets <<< "$ip"
for octet in "${octets[@]}"; do
(( 10#$octet > 255 )) && return 1 # 10# forces base-10 so leading zeros are not read as octal
done
return 0
}
validate_email() {
local email="$1"
[[ "$email" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$ ]]
}
validate_url() {
local url="$1"
[[ "$url" =~ ^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$ ]]
}
# File operations
safe_remove() {
local file="$1"
if [[ -f "$file" ]]; then
local backup="${file}.backup.$(date +%Y%m%d%H%M%S)"
cp "$file" "$backup"
rm "$file"
log "Removed $file (backup: $backup)"
fi
}
ensure_directory() {
local dir="$1"
local mode="${2:-0755}"
if [[ ! -d "$dir" ]]; then
mkdir -p "$dir"
chmod "$mode" "$dir"
log "Created directory: $dir"
fi
}
# String manipulation
to_uppercase() {
echo "$1" | tr '[:lower:]' '[:upper:]'
}
to_lowercase() {
echo "$1" | tr '[:upper:]' '[:lower:]'
}
trim() {
local var="$1"
var="${var#"${var%%[![:space:]]*}"}"
var="${var%"${var##*[![:space:]]}"}"
echo "$var"
}
# Usage in main script
#!/bin/bash
source "$(dirname "${BASH_SOURCE[0]}")/lib/common.sh"
if validate_ip "192.168.1.1"; then
log "Valid IP address"
fi
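A handy companion pattern lets the library run quick self-checks when executed directly while staying silent when sourced:
# At the bottom of lib/common.sh: BASH_SOURCE equals $0 only when the
# file is executed directly, not when it is sourced by another script
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    validate_ip "10.0.0.1" && echo "validate_ip: ok"
    [[ "$(trim '  hello  ')" == "hello" ]] && echo "trim: ok"
fi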
6. Advanced File Operations
File Processing Patterns
#!/bin/bash
# Process large files efficiently
process_large_file() {
local input_file="$1"
local batch_size=1000
local line_count=0
local batch=()
while IFS= read -r line; do
batch+=("$line")
((line_count++))
if ((line_count % batch_size == 0)); then
process_batch "${batch[@]}"
batch=()
fi
done < "$input_file"
# Process remaining lines
[[ ${#batch[@]} -gt 0 ]] && process_batch "${batch[@]}"
}
process_batch() {
local lines=("$@")
# Process batch of lines
for line in "${lines[@]}"; do
# Your processing logic
echo "Processing: $line"
done
}
# Atomic file operations
atomic_write() {
local target_file="$1"
local content="$2"
local temp_file="${target_file}.tmp.$$"
# Write to temporary file
echo "$content" > "$temp_file"
# Verify write was successful
if [[ ! -f "$temp_file" ]]; then
log_error "Failed to create temporary file"
return 1
fi
# Atomic rename
mv "$temp_file" "$target_file"
log "Atomically updated: $target_file"
}
# File locking
with_file_lock() {
local lock_file="$1"
local timeout="${2:-10}"
shift 2
local command="$@"
# Create lock file with timeout
local count=0
while ! mkdir "$lock_file" 2>/dev/null; do
((count++))
if ((count >= timeout)); then
log_error "Failed to acquire lock: $lock_file"
return 1
fi
sleep 1
done
# Execute command with lock
eval "$command"
local exit_code=$?
# Release lock
rmdir "$lock_file"
return $exit_code
}
# Usage
with_file_lock "/tmp/myapp.lock" 30 "critical_operation"
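The mkdir-based lock above polls once per second and is leaked if the process dies between mkdir and rmdir. Where flock(1) from util-linux is available, tying the lock to a file descriptor avoids both issues; a sketch using a hypothetical with_flock helper:
with_flock() {
    local lock_file="$1"
    local timeout="${2:-10}"
    shift 2
    (
        # Block for up to $timeout seconds waiting for an exclusive lock on fd 9
        flock -w "$timeout" 9 || { log_error "Failed to acquire lock: $lock_file"; exit 1; }
        "$@"  # run the command with its arguments, no eval required
    ) 9>"$lock_file"
}
# Usage
with_flock /tmp/myapp.lock 30 critical_operation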
Directory Synchronization
#!/bin/bash
# Intelligent directory sync
sync_directories() {
local source="$1"
local destination="$2"
local exclude_patterns="${3:-}"
local rsync_opts=(
-avz
--delete
--progress
--stats
--human-readable
)
# Add exclude patterns
if [[ -n "$exclude_patterns" ]]; then
while IFS= read -r pattern; do
rsync_opts+=(--exclude="$pattern")
done <<< "$exclude_patterns"
fi
# Perform dry run first
log "Performing dry run..."
if rsync --dry-run "${rsync_opts[@]}" "$source/" "$destination/"; then
log "Dry run successful, proceeding with actual sync"
if rsync "${rsync_opts[@]}" "$source/" "$destination/"; then
log_success "Synchronization completed"
return 0
else
log_error "Synchronization failed"
return 1
fi
else
log_error "Dry run failed, aborting"
return 1
fi
}
# Example usage
exclude_patterns="*.log
*.tmp
.git/
node_modules/"
sync_directories "/opt/app" "/backup/app" "$exclude_patterns"
7. Network and API Integration
REST API Interaction
#!/bin/bash
# API client functions
api_call() {
local method="$1"
local endpoint="$2"
local data="${3:-}"
local headers="${4:-}"
local base_url="https://api.example.com"
local auth_token="${API_TOKEN:-}"
local curl_opts=(
-s
-X "$method"
-H "Content-Type: application/json"
-H "Authorization: Bearer $auth_token"
)
# Add custom headers
if [[ -n "$headers" ]]; then
while IFS= read -r header; do
curl_opts+=(-H "$header")
done <<< "$headers"
fi
# Add data for POST/PUT
if [[ -n "$data" ]]; then
curl_opts+=(-d "$data")
fi
# Make request and capture response
local response
local http_code
response=$(curl "${curl_opts[@]}" -w '\n%{http_code}' "${base_url}${endpoint}")
http_code=$(echo "$response" | tail -n1)
response=$(echo "$response" | sed '$d')
# Check HTTP status
if [[ "$http_code" =~ ^2[0-9]{2}$ ]]; then
echo "$response"
return 0
else
log_error "API call failed with status $http_code"
log_error "Response: $response"
return 1
fi
}
# Example usage
get_user() {
local user_id="$1"
api_call "GET" "/users/$user_id"
}
create_user() {
local name="$1"
local email="$2"
local data
data=$(cat << EOF
{"name": "$name", "email": "$email"}
EOF
)
api_call "POST" "/users" "$data"
}
Network Monitoring
#!/bin/bash
# Check service availability
check_service() {
local host="$1"
local port="$2"
local timeout="${3:-5}"
if timeout "$timeout" bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
log_success "Service $host:$port is available"
return 0
else
log_error "Service $host:$port is not available"
return 1
fi
}
# Monitor multiple services
monitor_services() {
local -A services=(
[web]="example.com:80"
[database]="db.example.com:5432"
[cache]="cache.example.com:6379"
)
local failed_services=()
for service_name in "${!services[@]}"; do
local host_port="${services[$service_name]}"
local host="${host_port%:*}"
local port="${host_port#*:}"
if ! check_service "$host" "$port"; then
failed_services+=("$service_name")
fi
done
if [[ ${#failed_services[@]} -gt 0 ]]; then
log_error "Failed services: ${failed_services[*]}"
send_alert "Services down: ${failed_services[*]}"
return 1
fi
log_success "All services are healthy"
return 0
}
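An open TCP port only proves that something is listening. For web services it is worth probing the HTTP layer too; a sketch using a hypothetical check_http helper built on curl:
check_http() {
    local url="$1"
    local timeout="${2:-5}"
    local code
    # --max-time bounds the whole request; -o /dev/null discards the body
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url") || code="000"
    if [[ "$code" =~ ^2[0-9]{2}$ ]]; then
        log_success "HTTP check passed for $url (status $code)"
        return 0
    fi
    log_error "HTTP check failed for $url (status $code)"
    return 1
}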
8. Parallel Processing and Job Control
GNU Parallel
#!/bin/bash
# Process files in parallel
process_files_parallel() {
local file_list=("$@")
local max_jobs=4
export -f process_single_file
export -f log
printf '%s\n' "${file_list[@]}" |
parallel -j "$max_jobs" --bar process_single_file
}
process_single_file() {
local file="$1"
log "Processing $file in parallel (PID: $$)"
# Your processing logic
sleep 2
echo "Completed: $file"
}
# Background job management
run_background_jobs() {
local -a pids=()
for i in {1..5}; do
{
log "Background job $i started"
sleep $((RANDOM % 10))
log "Background job $i completed"
} &
pids+=($!)
done
# Wait for all jobs to complete
for pid in "${pids[@]}"; do
wait "$pid"
log "Job with PID $pid finished"
done
log "All background jobs completed"
}
# Process queue with worker pool
declare -a TASK_QUEUE=()
declare -i WORKER_COUNT=4
add_task() {
TASK_QUEUE+=("$1")
}
worker() {
local task="$1"
log "Worker $$ processing: $task"
# Process task
sleep 2
log "Worker $$ completed: $task"
}
process_queue() {
while [[ ${#TASK_QUEUE[@]} -gt 0 ]]; do
# Throttle by counting running jobs: a counter decremented inside a
# background subshell never propagates back to the parent shell,
# so poll the job table instead
while (( $(jobs -rp | wc -l) >= WORKER_COUNT )); do
sleep 0.1
done
local task="${TASK_QUEUE[0]}"
TASK_QUEUE=("${TASK_QUEUE[@]:1}")
worker "$task" &
done
wait
}
# Add tasks
for i in {1..20}; do
add_task "Task-$i"
done
# Process all tasks
process_queue
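When GNU parallel is not installed, xargs -P gives a portable fallback with the same fan-out model (minus the progress bar). A sketch, assuming file_list holds the files to process:
export -f process_single_file log
# NUL-delimited input survives spaces and newlines in file names
printf '%s\0' "${file_list[@]}" |
    xargs -0 -P 4 -I {} bash -c 'process_single_file "$1"' _ {}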
9. Security and Input Validation
Input Sanitization
#!/bin/bash
# Sanitize user input
sanitize_input() {
local input="$1"
local sanitized
# Remove dangerous characters
sanitized="${input//[^a-zA-Z0-9._-]/}"
echo "$sanitized"
}
# SQL injection prevention (for mysql/psql command line)
safe_sql_query() {
local table="$1"
local id="$2"
# Validate input - the id must be purely numeric
if ! [[ "$id" =~ ^[0-9]+$ ]]; then
log_error "Invalid ID: $id"
return 1
fi
# Validate the table name as well; identifiers cannot be parameterized
if ! [[ "$table" =~ ^[a-zA-Z0-9_]+$ ]]; then
log_error "Invalid table name: $table"
return 1
fi
# Both values are now validated, so interpolation is safe
mysql -e "SELECT * FROM $table WHERE id = $id"
}
# Command injection prevention
safe_execute() {
local command="$1"
shift
local args=("$@")
# Whitelist allowed commands
local -a allowed_commands=(
"ls"
"cat"
"grep"
"find"
)
if ! array_contains "$command" "${allowed_commands[@]}"; then
log_error "Command not allowed: $command"
return 1
fi
# Execute with explicit arguments (no shell expansion)
"$command" "${args[@]}"
}
# Path traversal prevention
safe_file_access() {
local user_path="$1"
local base_dir="/var/www"
# Resolve to absolute path
local real_path=$(readlink -f "$user_path")
# Check if within allowed directory
if [[ "$real_path" != "$base_dir"* ]]; then
log_error "Access denied: $user_path"
return 1
fi
# Safe to access
cat "$real_path"
}
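When a validated value still has to pass through another shell, for instance inside an ssh command line, printf %q re-quotes it so the remote shell parses it back to the original string. A sketch with a hypothetical remote_cat helper:
# Read a remote file whose name may contain spaces or shell metacharacters
remote_cat() {
    local host="$1"
    local file="$2"
    # %q re-quotes the value so the remote shell sees it as a single word
    ssh "$host" "cat $(printf '%q' "$file")"
}
# Usage
remote_cat web01.example.com '/var/log/app/access log.txt'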
Secure Credential Management
#!/bin/bash
# Read password securely
read_password() {
local prompt="$1"
local password
read -sp "$prompt: " password
echo >&2
echo "$password"
}
# Store credentials encrypted
store_credential() {
local service="$1"
local username="$2"
local password="$3"
local cred_file="$HOME/.credentials/$service"
ensure_directory "$(dirname "$cred_file")" 0700
# Encrypt and store
echo "$username:$password" | openssl enc -aes-256-cbc -salt -pbkdf2
-out "$cred_file"
chmod 600 "$cred_file"
log "Credentials stored for $service"
}
# Retrieve credentials
get_credential() {
local service="$1"
local cred_file="$HOME/.credentials/$service"
if [[ ! -f "$cred_file" ]]; then
log_error "No credentials found for $service"
return 1
fi
# Decrypt and return
openssl enc -aes-256-cbc -d -pbkdf2 -in "$cred_file"
}
# Use environment variables securely
load_env_file() {
local env_file="$1"
if [[ ! -f "$env_file" ]]; then
log_error "Environment file not found: $env_file"
return 1
fi
# Load without executing arbitrary code
while IFS='=' read -r key value; do
# Skip comments and empty lines
[[ "$key" =~ ^#.*$ ]] && continue
[[ -z "$key" ]] && continue
# Remove quotes from value
value="${value%"}"
value="${value#"}"
export "$key=$value"
done < "$env_file"
}
10. System Integration Scripts
Complete System Monitoring Script
#!/bin/bash
# Complete monitoring script
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/common.sh"
# Configuration
readonly ALERT_EMAIL="admin@example.com"
readonly THRESHOLDS_FILE="/etc/monitoring/thresholds.conf"
readonly STATE_FILE="/var/lib/monitoring/state.json"
# Load thresholds
load_thresholds() {
[[ -f "$THRESHOLDS_FILE" ]] && source "$THRESHOLDS_FILE"
CPU_THRESHOLD="${CPU_THRESHOLD:-80}"
MEMORY_THRESHOLD="${MEMORY_THRESHOLD:-90}"
DISK_THRESHOLD="${DISK_THRESHOLD:-85}"
LOAD_THRESHOLD="${LOAD_THRESHOLD:-5.0}"
}
# Check system resources
check_cpu() {
local cpu_usage=$(top -bn2 -d 0.5 | grep "Cpu(s)" | tail -1 | awk '{print $2}' | cut -d'%' -f1)
if (($(echo "$cpu_usage > $CPU_THRESHOLD" | bc -l))); then
log_warning "High CPU usage: ${cpu_usage}%"
send_alert "CPU Alert" "CPU usage is ${cpu_usage}% (threshold: ${CPU_THRESHOLD}%)"
return 1
fi
log "CPU usage: ${cpu_usage}%"
return 0
}
check_memory() {
local mem_total=$(free -m | awk 'NR==2{print $2}')
local mem_used=$(free -m | awk 'NR==2{print $3}')
local mem_percent=$((mem_used * 100 / mem_total))
if ((mem_percent > MEMORY_THRESHOLD)); then
log_warning "High memory usage: ${mem_percent}%"
send_alert "Memory Alert" "Memory usage is ${mem_percent}% (threshold: ${MEMORY_THRESHOLD}%)"
return 1
fi
log "Memory usage: ${mem_percent}%"
return 0
}
check_disk() {
local failed=false
while IFS= read -r line; do
local usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
local mount=$(echo "$line" | awk '{print $6}')
if ((usage > DISK_THRESHOLD)); then
log_warning "High disk usage on $mount: ${usage}%"
send_alert "Disk Alert" "Disk usage on $mount is ${usage}% (threshold: ${DISK_THRESHOLD}%)"
failed=true
fi
done < <(df -h | grep -vE '^Filesystem|tmpfs|cdrom')
[[ "$failed" == "true" ]] && return 1
return 0
}
check_load() {
local load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
if (($(echo "$load_avg > $LOAD_THRESHOLD" | bc -l))); then
log_warning "High load average: $load_avg"
send_alert "Load Alert" "Load average is $load_avg (threshold: $LOAD_THRESHOLD)"
return 1
fi
log "Load average: $load_avg"
return 0
}
check_services() {
local -a required_services=("nginx" "postgresql" "redis")
local failed_services=()
for service in "${required_services[@]}"; do
if ! systemctl is-active --quiet "$service"; then
log_error "Service $service is not running"
failed_services+=("$service")
# Attempt restart
if systemctl restart "$service"; then
log_success "Successfully restarted $service"
else
log_error "Failed to restart $service"
fi
fi
done
if [[ ${#failed_services[@]} -gt 0 ]]; then
send_alert "Service Alert" "Services down: ${failed_services[*]}"
return 1
fi
return 0
}
# Send alert
send_alert() {
local subject="$1"
local message="$2"
# Assumes a configured MTA exposing the mail(1) command
echo "$message" | mail -s "[$(hostname)] $subject" "$ALERT_EMAIL"
}
# Persist check results (1 = passed, 0 = failed) as JSON
save_state() {
cat > "$STATE_FILE" << EOF
{
"timestamp": "$(date -Iseconds)",
"checks": {
"cpu": $1,
"memory": $2,
"disk": $3,
"load": $4,
"services": $5
}
}
EOF
}
# Main monitoring logic
main() {
log "Starting system monitoring"
load_thresholds
local cpu_ok=0
local mem_ok=0
local disk_ok=0
local load_ok=0
local services_ok=0
check_cpu && cpu_ok=1
check_memory && mem_ok=1
check_disk && disk_ok=1
check_load && load_ok=1
check_services && services_ok=1
save_state $cpu_ok $mem_ok $disk_ok $load_ok $services_ok
log "System monitoring completed"
}
main "$@"
11. Testing and Debugging
Unit Testing with BATS
#!/usr/bin/env bats
# tests/test_common.bats
setup() {
source "$BATS_TEST_DIRNAME/../lib/common.sh"
}
@test "validate_ip: valid IP should return 0" {
run validate_ip "192.168.1.1"
[ "$status" -eq 0 ]
}
@test "validate_ip: invalid IP should return 1" {
run validate_ip "999.999.999.999"
[ "$status" -eq 1 ]
}
@test "validate_email: valid email should return 0" {
run validate_email "test@example.com"
[ "$status" -eq 0 ]
}
@test "validate_email: invalid email should return 1" {
run validate_email "invalid-email"
[ "$status" -eq 1 ]
}
@test "array_contains: existing item should return 0" {
local arr=("apple" "banana" "cherry")
run array_contains "banana" "${arr[@]}"
[ "$status" -eq 0 ]
}
# Run tests with: bats tests/
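Tests can assert on captured output as well as exit status, for example for trim() from the library:
@test "trim: strips surrounding whitespace" {
result="$(trim '   hello world   ')"
[ "$result" = "hello world" ]
}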
Debugging Techniques
#!/bin/bash
# Enable different debug levels
DEBUG_LEVEL="${DEBUG_LEVEL:-0}"
debug() {
local level="$1"
shift
if ((DEBUG_LEVEL >= level)); then
echo "[DEBUG $level] $*" >&2
fi
}
# Use throughout script
debug 1 "Entering function: ${FUNCNAME[1]}"
debug 2 "Variable value: var=$var"
debug 3 "Full environment: $(env)"
# Trace execution
enable_trace() {
PS4='+(${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
set -x
}
# Profile script execution
profile_start() {
PROFILE_START=$(date +%s%N)
}
profile_end() {
local PROFILE_END=$(date +%s%N)
local DURATION=$(( (PROFILE_END - PROFILE_START) / 1000000 ))
log "Execution time: ${DURATION}ms"
}
# Usage
profile_start
# Your code here
profile_end
12. Best Practices and Patterns
Script Organization
project/
├── bin/
│ └── main-script.sh # Main entry point
├── lib/
│ ├── common.sh # Common functions
│ ├── logging.sh # Logging utilities
│ └── validation.sh # Input validation
├── config/
│ ├── default.conf # Default configuration
│ └── production.conf # Production settings
├── tests/
│ ├── test_common.bats # Unit tests
│ └── integration_test.sh # Integration tests
├── docs/
│ └── README.md # Documentation
└── .editorconfig # Editor configuration
Code Style Guidelines
#!/bin/bash
# 1. Use meaningful variable names
readonly DATABASE_CONNECTION_STRING="postgresql://localhost/mydb" # Good
readonly DB="postgresql://localhost/mydb" # Bad
# 2. Use functions for code organization
process_user_data() { # Good: descriptive function name
local user_id="$1"
# Process user
}
proc() { # Bad: unclear function name
# Do something
}
# 3. Always quote variables
echo "$variable" # Good
echo $variable # Bad
# 4. Use [[ ]] for conditionals
if [[ "$var" == "value" ]]; then # Good
echo "Match"
fi
if [ "$var" == "value" ]; then # Less preferred
echo "Match"
fi
# 5. Check command success
if command_that_might_fail; then # Good
log "Success"
else
log_error "Failed"
exit 1
fi
command_that_might_fail # Bad: ignores errors
# 6. Use readonly for constants
readonly MAX_RETRIES=3 # Good
MAX_RETRIES=3 # Bad
# 7. Local variables in functions
my_function() {
local temp_var="value" # Good
temp_var="value" # Bad: pollutes global scope
}
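Most of these rules can be enforced mechanically with ShellCheck (see Additional Resources). Run it over the project layout shown earlier; a nonzero exit code on findings makes it easy to gate in CI:
# Lint every script in the project; exits nonzero on findings
shellcheck bin/*.sh lib/*.sh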
Conclusion
Advanced Bash scripting is a powerful skill that enables automation of complex system administration tasks. Key takeaways:
- Error Handling: Use strict mode (set -euo pipefail) and comprehensive error trapping
- Modularity: Organize code into reusable functions and libraries
- Security: Always validate and sanitize user input
- Logging: Implement comprehensive logging for debugging and auditing
- Testing: Write unit tests and perform thorough testing
- Documentation: Comment complex logic and maintain README files
- Performance: Use parallel processing for CPU-intensive tasks
- Maintainability: Follow consistent coding standards and best practices
Professional Bash scripting combines technical expertise with software engineering principles to create reliable, maintainable automation tools for enterprise environments.
Additional Resources
- ShellCheck - Shell script analysis tool
- BATS - Bash Automated Testing System
- Bash Reference Manual - Official documentation
- Google Shell Style Guide - Coding standards
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.