Linux Boot and Filesystem Issues: Fix GRUB, Initrd, FSCK Errors

Linux Filesystem and Boot Issues Troubleshooting

Boot failures and filesystem corruption are critical issues that can prevent your Linux system from starting properly. Understanding how to diagnose and fix initrd corruption, filesystem errors, and GRUB problems is essential for any Linux administrator. This guide provides step-by-step solutions for common boot and filesystem issues.

Understanding Linux Boot Process

The Linux boot sequence follows these stages:

  1. BIOS/UEFI: Hardware initialization
  2. Boot Loader (GRUB): Loads kernel and initrd/initramfs
  3. Kernel: Initializes hardware and unpacks the initrd/initramfs
  4. Initrd/Initramfs: Temporary root filesystem whose drivers let the kernel find and mount the real root filesystem
  5. Init/Systemd: System initialization
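
Which GRUB commands apply later in this guide depends on the first stage, so it helps to know whether the machine booted through UEFI or legacy BIOS. The following is a minimal sketch, assuming the kernel exposes /sys/firmware/efi only on UEFI boots. The function name firmware_type and its optional argument are illustrative, not part of any standard tool.

```shell
# Report whether the running system booted via UEFI or legacy BIOS.
# The optional argument is a hypothetical override of the sysfs root,
# included only so the function can be tried against a test directory.
firmware_type() (
    sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
)
```

Calling firmware_type with no argument inspects the running system. Running cat /proc/cmdline alongside it shows the parameters GRUB passed to the kernel in stage 2.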

GRUB Boot Loader Issues

Common GRUB Error Messages

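  • “grub>” prompt – GRUB loaded its modules but can’t find or read its configuration (grub.cfg)
  • “grub rescue>” prompt – GRUB can’t find its core modules (wrong prefix or missing GRUB directory)
  • “error: no such partition” – Partition table changed, or the partition GRUB was installed to has moved
  • “Error 15: File not found” – Kernel or initrd missing (this wording comes from GRUB Legacy)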

Boot from GRUB Rescue Mode

If you see the “grub rescue>” prompt:

# List available partitions
ls

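# Find your root partition (look for familiar directories such as boot/)
# Partitions are named msdosN on MBR disks and gptN on GPT disks
ls (hd0,msdos1)/
ls (hd0,msdos2)/

# Once found, set root
set root=(hd0,msdos1)

# Set prefix (use /boot/grub2 on RHEL/CentOS/Fedora;
# drop the /boot part if /boot is its own partition)
set prefix=(hd0,msdos1)/boot/grub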

# Load modules
insmod normal
insmod linux

# Boot normally
normal

Reinstall GRUB

Boot from rescue CD/USB and reinstall GRUB:

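# Mount root filesystem
mount /dev/sda1 /mnt

# If /boot (and /boot/efi on UEFI systems) are separate partitions,
# mount them under /mnt/boot and /mnt/boot/efi as well (check with lsblk)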

# Mount required filesystems
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys

# Chroot into system
chroot /mnt

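# Reinstall GRUB
# (on Ubuntu/Debian the commands are grub-install and update-grub)
# For BIOS systems:
grub2-install /dev/sda

# For UEFI systems:
# (on RHEL-family systems with Secure Boot, reinstalling the grub2-efi
#  and shim packages is the supported route instead of grub2-install)
grub2-install --target=x86_64-efi --efi-directory=/boot/efi

# Regenerate GRUB configuration
grub2-mkconfig -o /boot/grub2/grub.cfg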

# Exit and reboot
exit
reboot
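
The bind mounts before the chroot are easy to mistype under pressure, so they can be wrapped in a small helper. This is only a sketch: prepare_chroot and the DRY_RUN variable are hypothetical names, and the helper defaults to printing the commands rather than running them so it can be reviewed first.

```shell
# Bind-mount the pseudo-filesystems a chroot needs under the target root.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# set DRY_RUN=0 and run as root to actually perform the mounts.
prepare_chroot() (
    target="${1:?usage: prepare_chroot <mounted-root>}"
    for fs in dev proc sys; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mount --bind /$fs $target/$fs"
        else
            mount --bind "/$fs" "$target/$fs"
        fi
    done
)
```

For example, prepare_chroot /mnt prints the three mount commands, and DRY_RUN=0 prepare_chroot /mnt executes them before chroot /mnt.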

Fix Missing GRUB Menu

# Edit GRUB configuration
vi /etc/default/grub

# Ensure these settings:
GRUB_TIMEOUT=5
GRUB_TIMEOUT_STYLE=menu

# Rebuild GRUB config
grub2-mkconfig -o /boot/grub2/grub.cfg

Initrd/Initramfs Corruption Issues

Symptoms of Initrd Corruption

  • Kernel panic during boot
  • “Failed to execute /init” error
  • “No init found” error
  • System hangs after GRUB

Rebuild Initrd/Initramfs

Boot from rescue mode and rebuild initrd. Inside the chroot, uname -r reports the rescue media's kernel rather than the installed one, so set the kernel version explicitly:

# Mount root filesystem and the pseudo-filesystems the tools need
mount /dev/sda1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt

# Find the installed kernel version
ls /boot/vmlinuz-*

# Set it once (example version; use one listed above)
KVER=3.10.0-1160.el7.x86_64

# Backup existing initrd
cp "/boot/initramfs-$KVER.img" "/boot/initramfs-$KVER.img.backup"

# For RHEL/CentOS/Fedora:
dracut -f "/boot/initramfs-$KVER.img" "$KVER"

# For Ubuntu/Debian (the image is /boot/initrd.img-$KVER):
update-initramfs -u -k "$KVER"
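
Because uname -r is unreliable inside a rescue chroot, the kernel version can be derived from the files in /boot instead. The sketch below assumes kernels are installed as vmlinuz-<version> and that sort supports -V (GNU coreutils). The function names are illustrative.

```shell
# Print the kernel versions that have a vmlinuz image in a boot directory,
# oldest first. The directory argument defaults to /boot.
list_kernel_versions() (
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue      # glob matched nothing
        version="${image##*/vmlinuz-}"   # strip path and "vmlinuz-" prefix
        case "$version" in
            *rescue*) continue ;;        # skip rescue images
        esac
        echo "$version"
    done | sort -V
)

# Print the newest installed kernel version.
newest_kernel_version() (
    list_kernel_versions "${1:-/boot}" | tail -n 1
)
```

Inside the chroot, KVER=$(newest_kernel_version) then gives a value that can be passed to dracut or update-initramfs.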

Verify Initrd Integrity

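# List the image contents; an error here suggests the image is corrupt
lsinitrd /boot/initramfs-$(uname -r).img

# Extract and examine initrd
# (zcat assumes a gzip-compressed image with no prepended microcode archive;
#  otherwise try "lsinitrd --unpack" on newer dracut versions,
#  or unmkinitramfs on Ubuntu/Debian)
mkdir /tmp/initrd
cd /tmp/initrd
zcat /boot/initramfs-$(uname -r).img | cpio -idmv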

# Check for required modules
lsinitrd /boot/initramfs-$(uname -r).img | grep -i module_name
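
The zcat pipeline above only works on gzip-compressed images. As a rough check before extracting, the first bytes of the image can be compared against well-known magic numbers. This is a sketch under that assumption: initramfs_format is a hypothetical helper, and an image that starts with a cpio header may be an uncompressed microcode archive followed by the compressed main archive.

```shell
# Guess how an initramfs image starts by looking at its first six bytes.
# Prints one of: gzip, xz, zstd, cpio, unknown.
initramfs_format() (
    image="${1:?usage: initramfs_format <image>}"
    magic=$(head -c 6 "$image" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)        echo "gzip" ;;
        fd377a585a00) echo "xz" ;;
        28b52ffd*)    echo "zstd" ;;
        303730373031|303730373032)
                      echo "cpio" ;;     # ASCII "070701" or "070702"
        *)            echo "unknown" ;;
    esac
)
```

If initramfs_format /boot/initramfs-<version>.img prints anything other than gzip, use the distribution's own unpacking tool instead of zcat.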

Add Missing Modules to Initrd

# Add specific driver to initrd
# (dracut expects a space on each side of the value inside the quotes)
echo 'add_drivers+=" module_name "' >> /etc/dracut.conf.d/custom.conf
dracut -f

# For LVM systems
echo 'add_dracutmodules+=" lvm "' >> /etc/dracut.conf.d/lvm.conf
dracut -f

Filesystem Corruption and FSCK

Detecting Filesystem Corruption

Common signs of filesystem issues:

  • System drops to emergency mode
  • Read-only filesystem errors
  • “Input/output error” messages
  • Missing files or directories

Running FSCK Safely

CRITICAL: Never run fsck on mounted filesystems!

# Check if filesystem is mounted
mount | grep /dev/sda1

# Unmount before fsck
umount /dev/sda1

# For root filesystem, boot to single-user or rescue mode

# Run fsck
fsck /dev/sda1

# Auto-repair without prompts (use carefully)
fsck -y /dev/sda1

# Check specific filesystem type
fsck.ext4 /dev/sda1
# fsck.xfs does nothing and always succeeds; XFS is checked with xfs_repair
xfs_repair -n /dev/sda2
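
The mounted check can be turned into a guard that a script runs before fsck. The sketch below reads a mounts table in /proc/mounts format. is_mounted is an illustrative name, and it only matches the exact device path, so LVM or device-mapper volumes listed under /dev/mapper need to be passed by that name.

```shell
# Succeed if the given device appears as a mount source in a mounts table.
# The table defaults to /proc/mounts; the second argument exists only so
# the function can be tried against a sample file.
is_mounted() (
    device="${1:?usage: is_mounted <device> [mounts-file]}"
    mounts_file="${2:-/proc/mounts}"
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
)
```

A cautious wrapper could then be written as: if is_mounted /dev/sda1; then echo "still mounted, not running fsck"; else fsck /dev/sda1; fi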

FSCK for Different Filesystems

EXT4 Filesystem

# Check filesystem
e2fsck -f /dev/sda1

# Force check and auto-fix
e2fsck -fy /dev/sda1

# Check bad blocks too
e2fsck -fcy /dev/sda1
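
The exit status of fsck and e2fsck is a bitmask, so a single number can report several conditions at once. The helper below decodes it using the values listed in the e2fsck manual page. The name explain_fsck_status is illustrative.

```shell
# Translate an fsck/e2fsck exit status (a bitmask) into readable lines.
explain_fsck_status() (
    status="${1:?usage: explain_fsck_status <exit-code>}"
    [ "$status" -eq 0 ] && { echo "no errors"; exit 0; }
    for pair in "1:errors corrected" "2:reboot recommended" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:canceled by user" \
                "128:shared library error"; do
        bit="${pair%%:*}"     # number before the colon
        text="${pair#*:}"     # description after the colon
        [ $(( status & bit )) -ne 0 ] && echo "$text"
    done
    exit 0
)
```

A typical use is to run e2fsck -fy /dev/sda1 and then explain_fsck_status $? on the next line. A status of 4 or higher means the filesystem still needs attention.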

XFS Filesystem

# XFS uses xfs_repair (not fsck)
# Must be unmounted first
umount /dev/sda2

# Check only (no repairs)
xfs_repair -n /dev/sda2

# Repair filesystem
xfs_repair /dev/sda2

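# If xfs_repair refuses to run because the log is dirty, first try mounting
# and unmounting the filesystem to replay the log. Only if that fails:
xfs_repair -L /dev/sda2  # Zeroes the log; recent changes may be lost (last resort)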

Btrfs Filesystem

# Check btrfs filesystem
btrfs check /dev/sda3

# Repair btrfs (risky; it can make damage worse, so back up or image the device first)
btrfs check --repair /dev/sda3

Emergency Boot to Single User Mode

From GRUB menu:

# Press 'e' to edit boot entry
# Find line starting with 'linux', 'linux16', or 'linuxefi'
# Add to end of that line:
systemd.unit=rescue.target

# Or for older systems:
single

# Or:
init=/bin/bash

# Press Ctrl+X to boot

Fix Read-Only Filesystem

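# Check for errors first; a filesystem usually turns read-only
# because the kernel detected a problem
dmesg | grep -i error
journalctl -xb | grep -i error

# If no filesystem errors are reported, remount root as read-write
mount -o remount,rw /

# If errors are reported, do not remount; boot rescue media and
# run fsck on the unmounted device
fsck -y /dev/sda1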

Fstab Configuration Issues

Fixing Incorrect /etc/fstab

Wrong fstab entries can prevent booting:

# Boot to emergency mode
# System usually provides emergency shell

# Remount root as writable
mount -o remount,rw /

# Edit fstab
vi /etc/fstab

# Comment out problematic entries with #
# Example:
# /dev/sdb1  /data  ext4  defaults  0  0

# Test fstab entries
mount -a

# If no errors, reboot
reboot

Verify Fstab Before Reboot

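# Check syntax (findmnt --verify is available in recent util-linux releases)
findmnt --verify

# Dry run: show what mount -a would do without mounting anything
mount -fav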

# Verify UUIDs match
blkid
cat /etc/fstab

# Test mount all
mount -a
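
Comparing blkid output with /etc/fstab by eye is error-prone, so the comparison can be scripted. The sketch below assumes fstab entries use the UUID=<value> form and that the known UUIDs are supplied one per line, for example from blkid -s UUID -o value. The function name is illustrative.

```shell
# Print every UUID referenced in an fstab file that is absent from a list
# of known UUIDs (one per line). Commented-out entries are ignored.
fstab_missing_uuids() (
    fstab="${1:?usage: fstab_missing_uuids <fstab> <uuid-list>}"
    known="${2:?usage: fstab_missing_uuids <fstab> <uuid-list>}"
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        grep -qxF -- "$uuid" "$known" || echo "$uuid"
    done
)
```

For example, run blkid -s UUID -o value > /tmp/uuids and then fstab_missing_uuids /etc/fstab /tmp/uuids. Any output names an fstab entry that would fail to mount.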

Kernel Panic Troubleshooting

Common Kernel Panic Causes

  • Corrupted initrd
  • Missing root filesystem
  • Hardware failure
  • Driver issues

Boot Previous Kernel Version

# At GRUB menu, select "Advanced options"
# Choose older kernel version
# If system boots, investigate new kernel issues

# List installed kernels
rpm -qa | grep kernel    # RHEL/CentOS
dpkg -l | grep linux-image  # Ubuntu/Debian

# Set default kernel
grubby --set-default=/boot/vmlinuz-3.10.0-1160.el7.x86_64

Enable Kernel Panic Debugging

# Add to kernel boot parameters (edit GRUB):
debug ignore_loglevel

# View detailed messages during boot
# This helps identify exact failure point

LVM Boot Issues

LVM Not Activating

# From rescue mode
# Scan for volume groups
vgscan

# Activate all volume groups
vgchange -ay

# List logical volumes
lvs

# Mount root LV
mount /dev/vg_root/lv_root /mnt

# Continue with chroot and repairs

Rebuild Initrd with LVM Support

# Ensure LVM modules in initrd
# (dracut expects a space on each side of the value inside the quotes)
echo 'add_dracutmodules+=" lvm "' > /etc/dracut.conf.d/lvm.conf
dracut -f

# Verify LVM included
lsinitrd | grep lvm

UUID and Device Name Changes

Fix UUID Mismatches

# Get current UUIDs
blkid

# Update fstab with correct UUIDs
vi /etc/fstab

# Update GRUB if root UUID changed
vi /etc/default/grub
# Update GRUB_CMDLINE_LINUX line with new UUID

# Rebuild GRUB config
grub2-mkconfig -o /boot/grub2/grub.cfg
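
To confirm which root device the kernel was actually told to use, the root= parameter can be pulled out of the kernel command line and compared with blkid. This is a small sketch. root_from_cmdline is a hypothetical helper that prints every root= value it finds (normally exactly one).

```shell
# Print the value of each root= parameter in a kernel command line string.
root_from_cmdline() (
    cmdline="${1:?usage: root_from_cmdline <cmdline>}"
    set -f                      # no filename expansion while splitting words
    for word in $cmdline; do
        case "$word" in
            root=*) echo "${word#root=}" ;;
        esac
    done
)
```

On a running system, root_from_cmdline "$(cat /proc/cmdline)" shows the value in use. If it is a UUID that blkid no longer lists, the GRUB configuration needs updating.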

SELinux Preventing Boot

Disable SELinux Temporarily

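# Add to kernel parameters at GRUB to boot in permissive mode:
enforcing=0

# (selinux=0 disables SELinux entirely and forces a full relabel
#  when it is enabled again, so prefer enforcing=0)

# Or, from a running shell, switch to permissive mode:
setenforce 0

# Fix SELinux contexts
restorecon -Rv /

# Return to enforcing mode
setenforce 1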

Relabel Filesystem

# Force relabel on next boot
touch /.autorelabel
reboot

Boot Logs Analysis

Check Boot Messages

# View last boot messages
journalctl -b

# View previous boot
journalctl -b -1

# Boot messages with errors
journalctl -b -p err

# Kernel messages
dmesg | less

# Systemd failed units
systemctl list-units --failed

Creating Rescue Environment

Backup Critical Boot Files

#!/bin/bash
# Backup boot directory and key boot configuration files
set -euo pipefail

BACKUP_DIR="/root/boot_backup_$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# /boot/. also copies hidden files, which /boot/* would skip
cp -a /boot/. "$BACKUP_DIR/"
cp -a /etc/fstab /etc/default/grub "$BACKUP_DIR/"

echo "Backup saved to $BACKUP_DIR"
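
A backup is only useful if it is intact when needed, so it can help to store checksums next to the copied files. The sketch below assumes sha256sum from coreutils is available. write_manifest, verify_manifest, and the SHA256SUMS file name are illustrative choices, not part of the script above.

```shell
# Record SHA-256 checksums for every file in a backup directory.
write_manifest() (
    cd "${1:?usage: write_manifest <backup-dir>}" || exit 1
    find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS
)

# Re-check the files against the manifest; a non-zero exit status
# means at least one file is missing or has changed.
verify_manifest() (
    cd "${1:?usage: verify_manifest <backup-dir>}" || exit 1
    sha256sum -c SHA256SUMS
)
```

After a backup, write_manifest "$BACKUP_DIR" records the checksums, and verify_manifest "$BACKUP_DIR" can be run before restoring anything from it.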

Best Practices

  1. Always backup before changes – Especially /boot and /etc/fstab
  2. Test after kernel updates – Ensure system boots before production
  3. Keep rescue media ready – USB or CD with recovery tools
  4. Document UUIDs – Maintain record of filesystem UUIDs
  5. Regular fsck checks – Schedule filesystem checks during maintenance
  6. Monitor SMART data – Catch disk failures before corruption
  7. Keep old kernels – Don’t remove all previous kernel versions

Conclusion

Boot and filesystem issues require systematic troubleshooting and a solid understanding of the Linux boot process. By mastering GRUB recovery, initrd rebuilding, and fsck operations, you can quickly recover from most boot failures and filesystem corruption scenarios.

Always remember: prevention is better than cure. Regular backups of critical boot files and maintaining rescue media can save hours of troubleshooting during emergencies.

Frequently Asked Questions

1. When should I run fsck on a filesystem?

Run fsck when you see filesystem errors, input/output errors, or the system drops to emergency mode. Always unmount the filesystem first, or boot to rescue mode for the root filesystem. Schedule regular checks during maintenance windows.

2. Can I recover data from a corrupted XFS filesystem?

XFS is generally resilient, but severe corruption may require xfs_repair with the -L flag, which zeroes the log and may cause data loss. For critical data, consider professional data recovery before attempting repairs. Always backup first if possible.

3. How do I know which initrd file to rebuild?

The initrd file matches your kernel version. Check /boot/vmlinuz-* to see installed kernels. The current running kernel is shown by uname -r. Rebuild the initrd for the kernel version you’re trying to boot.

4. What’s the difference between rescue mode and emergency mode?

Rescue mode (systemd.unit=rescue.target) mounts the local filesystems and starts basic services. Emergency mode (systemd.unit=emergency.target) provides a minimal environment with only the root filesystem mounted, read-only, and no services started, which is useful when rescue mode fails.

5. How can I prevent GRUB issues after updates?

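Most distributions update the boot entries automatically when a kernel package is installed. After changing /etc/default/grub, regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg (update-grub on Ubuntu/Debian), then verify that the boot menu appears correctly. Keep at least one old working kernel as a fallback in case the new kernel has issues.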


About the Author: Ramesh Sundararamaiah

Red Hat Certified Architect

Ramesh is a Red Hat Certified Architect with extensive experience in enterprise Linux environments. He specializes in system administration, DevOps automation, and cloud infrastructure. Ramesh has helped organizations implement robust Linux solutions and optimize their IT operations for performance and reliability.

Expertise: Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, System Administration, DevOps
