Linux Log Management: How to Find What You Need Before the Disk Fills Up
Key Takeaways
- Where Linux Logs Live and Why There Are Two Systems
- The Log File Map Every Admin Must Know
- Reading Log Files Efficiently Under Pressure
- journalctl Power Commands
- Log Levels: What They Mean and How to Act on Them
Table of Contents
- Where Linux Logs Live and Why There Are Two Systems
- The Log File Map Every Admin Must Know
- Reading Log Files Efficiently Under Pressure
- journalctl Power Commands
- Log Levels: What They Mean and How to Act on Them
- logrotate: How It Works and How to Configure It
- Setting Journal Retention Limits
- Real Scenario: Disk Filling Due to Logs
- Centralized Logging: When You Need to Go Beyond Single-Server Logs
Where Linux Logs Live and Why There Are Two Systems
Modern Linux has a split logging architecture. There is the traditional flat-file system under /var/log/, where applications write plain text log files managed by rsyslog or syslog-ng. And there is the systemd journal, a binary log store managed by journald and queried with journalctl. Understanding both is required because different parts of the system use each one, and some distributions bridge them so messages appear in both places.
The systemd journal captures all output from services managed by systemd (stdout and stderr), kernel messages, boot sequence events, and service state changes. It stores this data in a structured binary format that enables efficient filtering by service name, priority level, time range, and other fields, with no grep required. Traditional applications that write their own log files (nginx, Apache, MySQL, PostgreSQL) continue to write to /var/log/ directly. On many distributions (including Ubuntu), rsyslog is configured to forward journal messages to flat files, so you may find the same message in both the journal and /var/log/syslog.
The Log File Map Every Admin Must Know
Knowing where to look when something breaks saves minutes of confusion under pressure. Here is the essential map of /var/log/:
- /var/log/syslog (Ubuntu/Debian) or /var/log/messages (RHEL/CentOS/Rocky): The general system log. Contains messages from the kernel, system daemons, and applications that use the syslog API. This is the starting point when you do not know where else to look.
- /var/log/auth.log (Ubuntu/Debian) or /var/log/secure (RHEL/CentOS): Authentication events such as successful SSH logins, failed login attempts, sudo usage, and PAM authentication events. This is the first log you check after a suspected intrusion or unauthorized access incident.
- /var/log/kern.log: Kernel messages. Hardware errors, driver issues, OOM killer events, filesystem errors, and storage failures all appear here.
- /var/log/dmesg: The kernel ring buffer snapshot from boot. Contains hardware detection, driver initialization, and device-related messages from system startup.
- /var/log/dpkg.log (Ubuntu/Debian): Records every package installation, upgrade, and removal with timestamps. Invaluable for answering “what changed on this server last Tuesday.”
- /var/log/apt/history.log: Higher-level apt transaction history showing which commands were run and which packages were involved.
- /var/log/nginx/access.log and /var/log/nginx/error.log: HTTP request log and nginx-specific error log.
- /var/log/mysql/error.log or /var/log/postgresql/: Database engine logs.
- /var/log/cron (RHEL/CentOS): Records of when cron jobs ran and whether they completed successfully. On Ubuntu/Debian, cron messages go to /var/log/syslog instead.
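Since the exact filenames differ between the Debian and Red Hat families, a quick sweep over the paths above shows which ones a particular machine actually uses, and how large they are. A minimal sketch (extend the path list as needed):

```shell
#!/bin/sh
# Print the size of each well-known log file that exists on this host.
# Missing paths are skipped: the names differ by distribution family.
for f in /var/log/syslog /var/log/messages /var/log/auth.log /var/log/secure \
         /var/log/kern.log /var/log/dpkg.log /var/log/cron; do
    if [ -f "$f" ]; then
        du -h "$f"
    fi
done
```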
Reading Log Files Efficiently Under Pressure
Knowing where logs are is only half the skill. The other half is reading them fast when something is broken right now.
Real-Time Log Monitoring with tail -f
# Follow a log file in real time, showing new lines as they appear
tail -f /var/log/nginx/error.log
# Follow multiple log files simultaneously
tail -f /var/log/nginx/error.log /var/log/syslog
# Show the last 100 lines, then continue following
tail -n 100 -f /var/log/auth.log
Targeted Searching with grep
# Find all failed SSH authentication attempts
grep "Failed password" /var/log/auth.log
# Case-insensitive search for errors in the system log
grep -i "error" /var/log/syslog
# Show 5 lines of context before and after each match
grep -B 5 -A 5 "Out of memory" /var/log/kern.log
# Count how many 404 errors nginx served today
grep -c " 404 " /var/log/nginx/access.log
# Search for a term across all log files in /var/log recursively
grep -r "connection refused" /var/log/ 2>/dev/null
# Find errors from multiple patterns with OR
grep -iE "error|failed|critical|fatal" /var/log/syslog | tail -50
One of the most useful grep patterns for a broad diagnostic sweep:
grep -riE "error|failed|critical|fatal" /var/log/ --include="*.log" 2>/dev/null |
grep -v ".gz:" |
tail -50
This searches all .log files recursively for any of those error-related keywords, filters out hits reported from compressed rotated files, and shows the last 50 matches, giving a fast first look at what is going wrong anywhere on the system.
journalctl Power Commands
The systemd journal is more powerful than flat log files for filtering because it stores structured metadata alongside each log entry. You can filter by service, time range, priority level, PID, user, and more, all without parsing text with grep.
# View all journal entries, following live
journalctl -f
# View all journal entries, newest first (reverse chronological)
journalctl -r
# View logs for a specific service
journalctl -u nginx
# Follow logs for a specific service in real time
journalctl -u nginx -f
# View logs since a specific timestamp
journalctl --since "2026-03-06 09:00:00"
# View logs in a time range
journalctl --since "2026-03-06 09:00:00" --until "2026-03-06 10:00:00"
# Relative time references (very useful during incidents)
journalctl --since "1 hour ago"
journalctl --since "30 minutes ago"
# Filter by priority level (syslog numeric levels)
# 0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug
journalctl -p err # Show err and above (0-3)
journalctl -p warning # Show warning and above (0-4)
# Combine service and priority filter
journalctl -u myapp -p err
# View logs from the current boot session
journalctl -b
# View logs from the previous boot (for post-crash analysis)
journalctl -b -1
# List available boot sessions
journalctl --list-boots
# Output without the pager (for piping or scripting)
journalctl -u nginx --no-pager
# Show only the last N lines
journalctl -u nginx -n 100
# Output in JSON format (for programmatic processing)
journalctl -u nginx -o json | jq '.MESSAGE'
For incident investigation, combining time and priority filters is extremely powerful:
# All critical errors across the entire system in the past hour
journalctl -p crit --since "1 hour ago" --no-pager
# SSH authentication events in the last 30 minutes
journalctl -u sshd --since "30 minutes ago" --no-pager
Log Levels: What They Mean and How to Act on Them
Log severity levels follow the syslog standard. Understanding what each level signals tells you how urgently to respond:
- EMERG (0): Emergency. The system is unusable: a kernel panic, a filesystem gone completely corrupt. You will rarely see this, but when you do, it requires immediate attention; the system may not be functioning.
- ALERT (1): Immediate action required. A RAID array has degraded. A critical monitoring daemon has crashed. Act now, not in the next sprint.
- CRIT (2): Critical conditions. Hardware errors, system failures that have not yet taken the system down but will. Page on-call.
- ERR (3): Error conditions. A service failed to start, a request failed unexpectedly, a file could not be opened. Review and investigate. Set up alerting for this level and above.
- WARNING (4): Something unexpected happened but the system is continuing to function. Retry limits approached, deprecated configuration options used, slow query thresholds hit. Review in your normal work cycle.
- NOTICE (5): Normal but significant events. Service started, configuration reloaded, a user logged in. Informational, high value, moderate volume.
- INFO (6): Normal operational messages. Request processed, task completed. High volume; useful for detailed post-hoc analysis, not for real-time monitoring.
- DEBUG (7): Very detailed messages for troubleshooting. Extremely high volume. Enable temporarily to diagnose a specific problem, then disable. Never leave debug logging on in production long-term.
A practical alerting policy: alert immediately on EMERG through CRIT, create tickets for ERR, review WARNING in weekly ops meetings, and let NOTICE through DEBUG expire naturally according to retention policy.
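That policy can be encoded directly in a small shell helper for use in alerting scripts. A sketch; the helper name prio_name and the response strings are illustrative, not from any standard:

```shell
#!/bin/sh
# Map a syslog numeric priority (0-7) to a name and the response
# suggested by the policy above. prio_name is a hypothetical helper.
prio_name() {
    case "$1" in
        0|1|2) echo "emerg/alert/crit: alert immediately" ;;
        3)     echo "err: create a ticket" ;;
        4)     echo "warning: review in weekly ops meeting" ;;
        5|6|7) echo "notice/info/debug: retention only" ;;
        *)     echo "unknown priority" ;;
    esac
}
prio_name 3   # prints "err: create a ticket"
```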
logrotate: How It Works and How to Configure It
Without log rotation, flat log files in /var/log/ grow indefinitely until the disk is full. logrotate is the standard tool that periodically rotates, compresses, and deletes old log files according to a policy you define. It runs daily via a cron job or systemd timer.
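To confirm how logrotate is scheduled on a given host, check for both mechanisms. Which one is present depends on the distribution, so one of the two commands is normally empty:

```shell
#!/bin/sh
# logrotate is driven either by a systemd timer...
systemctl list-timers logrotate.timer --no-pager 2>/dev/null
# ...or by the daily cron directory:
ls -l /etc/cron.daily/logrotate 2>/dev/null
true  # a missing mechanism is expected here, not an error
```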
The global configuration is at /etc/logrotate.conf. Application-specific configurations live in /etc/logrotate.d/, which the global file includes; directives inside a .d block override the global defaults for the log paths that block covers.
Example: the nginx logrotate configuration on Ubuntu:
cat /etc/logrotate.d/nginx
Typical output:
/var/log/nginx/*.log {
daily
missingok
rotate 52
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
postrotate
if [ -f /var/run/nginx.pid ]; then
kill -USR1 `cat /var/run/nginx.pid`
fi
endscript
}
Configuration directives explained in plain terms:
- daily: Rotate logs every day. Alternatives: weekly, monthly, or size 100M (rotate when the file exceeds 100 MB regardless of time).
- rotate 52: Keep 52 rotated copies. With daily rotation, this means 52 days of log history. After 52 rotations, the oldest copy is deleted.
- compress: Compress rotated files with gzip. Text logs compress at 90%+ ratio, so 100 MB of logs becomes roughly 8-10 MB compressed.
- delaycompress: Do not compress the most recently rotated file. This gives applications a chance to finish writing to the previous log file before it is compressed. Without this, some applications can error when trying to write to a file that is being compressed.
- missingok: Do not produce an error if the log file does not exist. Prevents logrotate from sending failure emails about logs that have not been created yet.
- notifempty: Do not rotate the log file if it contains no data. Avoids creating a clutter of empty rotated files.
- postrotate…endscript: Shell commands to run after rotation. For nginx, this sends a USR1 signal telling nginx to reopen its log files with the new filenames. Without this step, nginx would continue writing to the old (now renamed) file.
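The compression figures quoted for compress are easy to sanity-check: log lines are highly repetitive, so gzip ratios well above 90% are normal. A throwaway demonstration on generated log-like text:

```shell
#!/bin/sh
# Generate 1000 log-like lines, gzip them, and compare the sizes.
f=$(mktemp)
i=1
while [ "$i" -le 1000 ]; do
    echo "2026-03-06T09:00:00 host myapp[123]: request $i completed in 12ms" >> "$f"
    i=$((i + 1))
done
orig=$(wc -c < "$f")
gzip -f "$f"              # replaces the file with $f.gz
comp=$(wc -c < "$f.gz")
echo "original: $orig bytes, compressed: $comp bytes"
rm -f "$f.gz"
```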
Writing a Custom logrotate Configuration for Your Application
cat > /etc/logrotate.d/myapp << 'EOF'
/opt/myapp/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 myapp myapp
dateext
dateformat -%Y%m%d
postrotate
# Signal the application to reopen log files
systemctl kill --kill-who=main --signal=USR1 myapp.service 2>/dev/null || true
endscript
}
EOF
Test the configuration without actually rotating anything (dry run):
logrotate -d /etc/logrotate.d/myapp
Force an immediate rotation for testing:
logrotate -f /etc/logrotate.d/myapp
Check the logrotate state file to see when logs were last rotated:
cat /var/lib/logrotate/status | grep myapp
Setting Journal Retention Limits
By default, the systemd journal limits itself to 10% of the filesystem where it stores data (/var/log/journal/ for persistent storage). On servers with small /var partitions or large numbers of high-volume services, the journal can still consume more space than you want.
Configure journal retention in /etc/systemd/journald.conf:
[Journal]
SystemMaxUse=500M
SystemKeepFree=200M
SystemMaxFileSize=50M
MaxRetentionSec=2week
Compress=yes
Settings explained:
- SystemMaxUse: Maximum total disk space the journal may consume on the /var filesystem.
- SystemKeepFree: The journal will not grow to the point where less than this much space is free on the filesystem.
- SystemMaxFileSize: Maximum size of a single journal file before it is rotated into a new file.
- MaxRetentionSec: Automatically delete journal entries older than this. Accepts values like 1week, 1month, 90day.
- Compress: Compress journal entries above a small size threshold before writing them to disk.
Apply the new configuration:
systemctl restart systemd-journald
Check current journal disk usage:
journalctl --disk-usage
Manually clean up old journal data:
# Remove journal entries older than 2 weeks
journalctl --vacuum-time=2weeks
# Remove journal data until total usage is under 500MB
journalctl --vacuum-size=500M
Real Scenario: Disk Filling Due to Logs
The scenario: an alert fires at 11 PM. The disk on the application server is at 98%. Logs are suspected. Here is the complete response procedure:
# Step 1: Confirm the disk is full and identify which partition
df -h
# Step 2: Find the largest directories under /var/log
du -sh /var/log/* | sort -rh | head -20
# Step 3: Find the largest individual files in /var/log
find /var/log -type f -exec du -sh {} + 2>/dev/null | sort -rh | head -20
# Step 4: Check if a process currently has the large file open
lsof /var/log/myapp/application.log
# Step 5a: If a process has the file open β truncate, do not delete
truncate -s 0 /var/log/myapp/application.log
# Step 5b: If no process has the file open β delete it
rm /var/log/myapp/old-application.log
# Step 6: Remove old compressed rotated logs older than 14 days
find /var/log -name "*.gz" -mtime +14 -delete
# Step 7: Clear package manager caches
apt clean # Ubuntu/Debian
# Step 8: Verify disk space has been recovered
df -h
# Step 9: Fix the root cause β add or fix the logrotate config
# (use the custom logrotate config example above)
# Step 10: Test the logrotate config
logrotate -d /etc/logrotate.d/myapp
Why truncate instead of delete? When a running process has a file open and you delete it with rm, only the directory entry is removed; the inode and its disk space are not freed until the process closes its file descriptor. This is why df -h can still show the space as used after you delete a large log. truncate -s 0 empties the file content while keeping the inode and the file descriptor valid: the process continues writing to the same file, and the disk space is freed immediately.
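A toy demonstration of the truncate behavior on a throwaway file (safe to run anywhere; it touches no real logs):

```shell
#!/bin/sh
# truncate -s 0 empties a file in place: same inode, zero bytes.
f=$(mktemp)
printf 'some log data\n' > "$f"
truncate -s 0 "$f"
wc -c < "$f"     # prints 0: the file still exists but holds no data
rm -f "$f"
```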
Centralized Logging: When You Need to Go Beyond Single-Server Logs
When you manage more than a handful of servers, checking logs individually via SSH becomes impractical during an incident. If something goes wrong across multiple servers simultaneously, you need to correlate events across all of them, which is impossible when logs are isolated on each machine.
The traditional approach is rsyslog forwarding. On each server, configure /etc/rsyslog.conf to send logs to a central log server:
# Add to /etc/rsyslog.conf on each client server
# UDP (lower overhead, some loss acceptable)
*.* @logserver.example.com:514
# TCP (reliable delivery, better for production)
*.* @@logserver.example.com:514
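The forwarding rules above need a listener on the receiving side. On the central log server, the matching rsyslog configuration enables the UDP and TCP input modules (standard rsyslog module names; enable whichever transport the clients use):

```
# On logserver.example.com, in /etc/rsyslog.conf or a /etc/rsyslog.d/ file:
module(load="imudp")             # UDP listener
input(type="imudp" port="514")
module(load="imtcp")             # TCP listener
input(type="imtcp" port="514")
```

After adding these and restarting rsyslog (systemctl restart rsyslog), messages from the clients are processed by the server's normal ruleset and land in its own /var/log/ files unless you route them elsewhere with templates.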
Modern environments use purpose-built log aggregation systems. The most common options:
- Grafana Loki + Promtail: Lightweight, integrates with Prometheus and Grafana, designed for cloud-native environments. Promtail runs on each server and ships logs to the central Loki store.
- Elasticsearch + Logstash + Kibana (ELK Stack): Full-featured but resource-intensive. Powerful search and visualization. Good for large log volumes requiring complex queries.
- Managed services: AWS CloudWatch Logs, Datadog, Papertrail. Simpler to operate, cost scales with volume. Right choice when infrastructure management capacity is limited.
The signal that you need centralized logging: during an incident, you find yourself SSH-ing between three different servers to piece together a sequence of events. Once that happens twice, the investment in centralized logging is clearly justified. Set it up before the third incident, not after.
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.