A deep dive into troubleshooting persistent GRUB boot issues after SSD migration
Background: The Migration Process
My Rocky Linux 9.4 server was running on a traditional HDD, and I decided to upgrade to a faster NVMe SSD. Rather than doing a clean install, I wanted to migrate the existing system to preserve all configurations and data.
How I Migrated with Claude Code’s Help
I used Claude Code to guide me through the migration process. Here’s what we did:
Step 1: Partition the new SSD
Claude helped me create the partition scheme on the new NVMe drive:
# Create GPT partition table
sudo parted /dev/nvme0n1 mklabel gpt
# Create EFI partition (512MB)
sudo parted /dev/nvme0n1 mkpart primary fat32 1MiB 513MiB
sudo parted /dev/nvme0n1 set 1 esp on
# Create swap partition (16GB)
sudo parted /dev/nvme0n1 mkpart primary linux-swap 513MiB 16.5GiB
# Create root partition (remaining space)
sudo parted /dev/nvme0n1 mkpart primary ext4 16.5GiB 100%
Step 2: Format the partitions
# Format EFI partition
sudo mkfs.vfat -F32 /dev/nvme0n1p1
# Format swap
sudo mkswap /dev/nvme0n1p2
# Format root partition
sudo mkfs.ext4 /dev/nvme0n1p3
Step 3: Use rsync to copy the system
This was the critical part. Claude helped me craft the right rsync command to copy everything while preserving permissions, attributes, and excluding unnecessary directories:
# Mount the new SSD
sudo mkdir -p /mnt/newssd
sudo mount /dev/nvme0n1p3 /mnt/newssd
# Mount EFI partition (-p creates the missing /boot parent on the fresh filesystem)
sudo mkdir -p /mnt/newssd/boot/efi
sudo mount /dev/nvme0n1p1 /mnt/newssd/boot/efi
# Use rsync to copy the entire system
sudo rsync -aAXHv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /mnt/newssd/
The rsync flags used:
- -a: Archive mode (preserves permissions, timestamps, symbolic links, etc.)
- -A: Preserve ACLs (Access Control Lists)
- -X: Preserve extended attributes
- -H: Preserve hard links
- -v: Verbose output
Step 4: Update fstab and reinstall GRUB
# Chroot into the new system
sudo mount --bind /dev /mnt/newssd/dev
sudo mount --bind /proc /mnt/newssd/proc
sudo mount --bind /sys /mnt/newssd/sys
sudo chroot /mnt/newssd
# Get new UUIDs
blkid
# Update /etc/fstab with new UUIDs
# (Claude helped me edit this correctly)
# Reinstall GRUB to the new disk
grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=rocky /dev/nvme0n1
# Regenerate GRUB configuration
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg
# Exit chroot and reboot
exit
sudo reboot
What went wrong: Despite following all these steps correctly, the system still tried to boot with the old HDD’s UUID. That’s where the real troubleshooting began.
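One way to double-check the fstab edit from Step 4 is to compare the UUIDs it references against the UUIDs the block devices actually carry. The following is a minimal sketch, not part of the original migration; the function name `stale_fstab_uuids` and the idea of keeping the known UUIDs in a separate file are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list UUIDs that an fstab-style file references but that do not
# appear in a list of known UUIDs. The function name is hypothetical.

# $1: path to an fstab-style file
# $2: path to a file of known UUIDs, one per line
stale_fstab_uuids() {
    local fstab="$1" known="$2"
    # Keep only entries whose first field is UUID=<value>; comment lines
    # start with '#', so they never match.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
        grep -vxFf "$known" || true   # grep exits 1 when nothing is stale
}
```

Inside the chroot, the known list could be produced with `blkid -s UUID -o value > /tmp/known-uuids` and then checked with `stale_fstab_uuids /etc/fstab /tmp/known-uuids`. Any UUID printed belongs to a device that is no longer visible, which is what a leftover HDD entry would look like once the old disk is removed.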
The Problem
After the migration, I encountered what seemed like a straightforward GRUB configuration issue. The system would boot, but only after manually editing the boot parameters at the GRUB menu every single time. The error? GRUB was trying to use the old HDD’s UUID instead of the new SSD’s UUID.
Symptoms:
- Had to press ‘e’ at GRUB menu and manually remove the old UUID
- Boot would fail or hang in dracut emergency shell without manual intervention
- System showed root=UUID=bf6b9071-263a-49ec-bb6a-721da27a4d8c (old HDD) instead of the correct root=UUID=8d4ad011-ae38-4c14-9589-975b4bae5405 (new SSD)
Seems simple enough to fix, right? Just update the configuration files. Wrong.
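To see which root UUID a boot attempt was actually given, the value can be pulled out of the kernel command line and compared with the filesystem that is really mounted. This is a small sketch under the assumption that the root argument has the `root=UUID=...` form shown above; the function name `root_uuid_from_cmdline` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the root=UUID=... value from a kernel command line string.
# The function name is hypothetical.

# $1: a kernel command line, e.g. the contents of /proc/cmdline
root_uuid_from_cmdline() {
    local cmdline="$1" word
    local -a words
    # Split on whitespace without triggering filename expansion
    read -r -a words <<<"$cmdline"
    for word in "${words[@]}"; do
        case "$word" in
            root=UUID=*)
                printf '%s\n' "${word#root=UUID=}"
                return 0
                ;;
        esac
    done
    return 1   # no root=UUID= argument present
}
```

On a running system, `root_uuid_from_cmdline "$(cat /proc/cmdline)"` gives the UUID the kernel was booted with, and `findmnt -n -o UUID /` gives the UUID of the mounted root filesystem. If the two differ, or the first one is the old disk's UUID, the boot entry is still carrying a stale value.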
The Investigation: A Journey Through GRUB’s Labyrinth
Phase 1: The Obvious Fixes (That Didn’t Work)
I started with the standard GRUB troubleshooting steps:
# Remove resume parameter from grub defaults
sudo sed -i 's/ resume=UUID=[^ ]*//g' /etc/default/grub
sudo grubby --update-kernel=ALL --remove-args='resume'
# Regenerate GRUB configuration
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
# Regenerate initramfs
sudo dracut --force --regenerate-all
Result: No change. GRUB menu still showed the old UUID.
Phase 2: Deep Dive into BLS Entries
Rocky Linux uses Boot Loader Specification (BLS) entries stored in /boot/loader/entries/. I verified all entries:
grep '^options' /boot/loader/entries/*.conf
Everything looked correct! Each entry showed the proper UUID. So I tried hardcoding the values:
# Hardcode correct root UUID in all BLS entries
sudo sed -i 's|^options .*|options root=UUID=8d4ad011-ae38-4c14-9589-975b4bae5405 ro rhgb quiet|' \
/boot/loader/entries/*.conf
Result: Still no change after reboot.
Phase 3: The grubenv Investigation
GRUB uses grubenv files to store boot environment variables. I updated both locations:
# Update main grubenv
sudo grub2-editenv /boot/grub2/grubenv set \
'kernelopts=root=UUID=8d4ad011-ae38-4c14-9589-975b4bae5405 ro rhgb quiet'
# Copy to EFI partition
sudo cp /boot/grub2/grubenv /boot/efi/EFI/rocky/grubenv
Result: Old UUID persisted in GRUB menu.
Phase 4: The Mystery Deepens
At this point, I performed an exhaustive search:
# Search ALL files in /boot for the old UUID
sudo find /boot -type f -exec grep -l 'bf6b9071-263a-49ec-bb6a-721da27a4d8c' {} \;
Finding: The old UUID appeared in initramfs images (kdump configuration), but that shouldn’t affect GRUB’s boot menu.
I even found and deleted an old backup file:
sudo rm /boot/efi/EFI/rocky/grub.cfg.rpmsave.OLD
Result: Still no change.
The Breakthrough: Hidden Configuration Directories
After hours of troubleshooting, I discovered something unexpected. The OS migration had created multiple GRUB configuration directories scattered across the filesystem:
Discovery #1: /grub2/ in the Root Directory
$ ls /grub2/
grub.cfg grubenv
$ cat /grub2/grub.cfg | grep kernelopts
set kernelopts="root=UUID=bf6b9071-263a-49ec-bb6a-721da27a4d8c..."
There it was! A complete GRUB configuration directory sitting at /grub2/ (not /boot/grub2/). This was a leftover from the migration.
Discovery #2: The Real Culprit – /loader/entries/
Here’s the smoking gun:
$ ls -la / | grep loader
drwxr-xr-x. 3 root root 4096 Jan 5 14:23 loader
$ grep '^options' /loader/entries/*.conf
options root=UUID=bf6b9071-263a-49ec-bb6a-721da27a4d8c resume=UUID=b970092e-0ff0-4b19-a7e7-1022b2205448 ro
This was the primary source GRUB was using!
When GRUB’s blscfg command searches for BLS entries, it checks multiple locations. Depending on the $prefix variable in the GRUB core image, it may find /loader/entries/ before /boot/loader/entries/.
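Since more than one loader/entries directory can exist, it helps to see the root= argument of every BLS entry side by side, wherever the entry lives. The sketch below is one way to do that; the function name `report_bls_roots` is hypothetical, and it only reports what is on disk, not which directory GRUB will pick.

```shell
#!/usr/bin/env bash
# Sketch: print the root= argument of every BLS entry found in any
# loader/entries directory below a starting directory.
# The function name is hypothetical.

# $1: directory to search (for a live system this would be /)
report_bls_roots() {
    local top="${1%/}" conf root_arg
    # Skip proc and sys directly under the starting directory
    find "${top:-/}" \( -path "$top/proc" -o -path "$top/sys" \) -prune -o \
         -type f -path '*/loader/entries/*.conf' -print 2>/dev/null |
        sort |
        while IFS= read -r conf; do
            # Take the first root=... word from the entry's options line
            root_arg=$(awk '$1 == "options" {
                               for (i = 2; i <= NF; i++)
                                   if ($i ~ /^root=/) { print $i; exit }
                           }' "$conf")
            printf '%s\t%s\n' "$conf" "${root_arg:-<no root= argument>}"
        done
}
```

Running `sudo report_bls_roots /` (after sourcing the function in a root shell) prints one line per entry. In a situation like mine, the entries under /boot/loader/entries/ would show the new UUID while the ones under /loader/entries/ would still show the old one.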
The Fix
Once I found the real source, the fix was straightforward:
# Fix the REAL BLS entries location
sudo find /loader/entries/ -name '*.conf' \
-exec sed -i 's/bf6b9071-263a-49ec-bb6a-721da27a4d8c/8d4ad011-ae38-4c14-9589-975b4bae5405/g' {} \;
# Remove problematic resume parameter
sudo find /loader/entries/ -name '*.conf' \
-exec sed -i 's/ resume=UUID=[^ \"$]*//g' {} \;
# Also fix the other discovered location
sudo sed -i 's/bf6b9071-263a-49ec-bb6a-721da27a4d8c/8d4ad011-ae38-4c14-9589-975b4bae5405/g' /grub2/grub.cfg
sudo sed -i 's/ resume=UUID=[^ \"]*//g' /grub2/grub.cfg
Result: Success! System now boots without manual intervention.
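In-place sed edits on boot configuration are hard to undo, so a dry run is a reasonable precaution before repeating a fix like this. The sketch below prints the lines a UUID swap would change without writing to any file. The function name `preview_uuid_swap` is hypothetical, and it assumes the UUIDs contain only hex digits and dashes, so they are safe to place directly in a sed pattern.

```shell
#!/usr/bin/env bash
# Sketch: show what a UUID swap would change without writing anything.
# The function name is hypothetical.

# $1: old UUID, $2: new UUID, remaining arguments: files to inspect
preview_uuid_swap() {
    local old="$1" new="$2" file line
    shift 2
    for file in "$@"; do
        # -n with the p flag prints only the lines the substitution touched,
        # in their post-edit form; the file itself is left alone.
        sed -n "s/${old}/${new}/gp" "$file" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$file" "$line"
            done
    done
}
```

For the case above, the call would be `preview_uuid_swap bf6b9071-263a-49ec-bb6a-721da27a4d8c 8d4ad011-ae38-4c14-9589-975b4bae5405 /loader/entries/*.conf /grub2/grub.cfg`. If the printed lines look right, the same substitution can then be applied with `sed -i`.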
Bonus Issue: The GUI Problem
After fixing the boot issue, I encountered another problem: GDM (GNOME Display Manager) showed “Oh no! Something has gone wrong” and refused to start.
Error in logs:
gnome-shell: symbol lookup error: /lib64/libmutter-8.so.0: undefined symbol: drmModeCloseFB
Root cause: The SSD migration had left libdrm outdated while Mesa/Mutter were updated.
# Check version
rpm -q libdrm
libdrm-2.4.117-1.el9.x86_64 # Too old!
# Fix
sudo dnf update libdrm -y
# Upgraded to libdrm-2.4.123
sudo systemctl restart gdm
The GUI immediately started working.
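A symbol lookup error like the one above can be confirmed directly by checking whether the installed library exports the symbol. The sketch below filters the output of `nm -D --defined-only` (from binutils, which may need to be installed). The function name `exports_symbol` is hypothetical, and the library path in the usage note is an assumption about where libdrm lives on this system.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a symbol appears in `nm -D --defined-only` output
# read from stdin. The function name is hypothetical.

# $1: symbol name to look for
exports_symbol() {
    awk -v sym="$1" '
        {
            name = $NF              # symbol name is the last column
            sub(/@.*/, "", name)    # drop any @VERSION suffix
            if (name == sym) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Assuming the library is at /lib64/libdrm.so.2, the check would be `nm -D --defined-only /lib64/libdrm.so.2 | exports_symbol drmModeCloseFB && echo present || echo missing`. Running it before and after the libdrm update shows whether the update is what supplied the missing symbol.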
Lessons Learned
- OS migrations create configuration sprawl: When migrating systems, old configuration files can end up in unexpected locations outside the standard paths.
- GRUB’s search path is complex: BLS configurations can exist in multiple locations (/loader/entries/, /boot/loader/entries/, etc.) and GRUB may prioritize them differently than expected.
- Always check the root directory: After a migration, don’t just check /boot/; orphaned configuration directories like /grub2/ and /loader/ can interfere with the boot process.
- Library version mismatches matter: Graphics stack components (libdrm, mesa, mutter) need to stay in sync. Update all components together.
- Methodical searching pays off: Using find and grep to exhaustively search the filesystem eventually revealed the hidden configuration sources.
Key Takeaways for Troubleshooting GRUB
If you’re experiencing persistent GRUB issues after an OS migration:
- Check all possible configuration locations:
find / -name "grub.cfg" 2>/dev/null
find / -name "grubenv" 2>/dev/null
find / -type d -name "loader" 2>/dev/null
- Search for old UUIDs everywhere (skipping /proc and /sys up front, rather than filtering them out afterwards):
sudo grep -r --exclude-dir={proc,sys} "OLD_UUID" / 2>/dev/null
- Don’t assume standard paths – migrations create chaos
- Test each fix with a reboot – configuration caching is real
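The location checks above can be folded into one pass that only reports what sits outside /boot, since that is where the surprises were in my case. This is a rough sketch with a hypothetical function name, `find_stray_boot_config`. Directories named loader can exist for unrelated reasons, so the output is a list of candidates to inspect, not a list of things to delete.

```shell
#!/usr/bin/env bash
# Sketch: list GRUB-related files and loader directories that live outside
# the boot directory of a given tree. The function name is hypothetical.

# $1: top of the tree to scan (for a live system this would be /)
find_stray_boot_config() {
    local top="${1%/}"
    # Prune boot, proc and sys directly under the starting directory,
    # then report grub.cfg, grubenv, and any directory named loader.
    find "${top:-/}" \
        \( -path "$top/boot" -o -path "$top/proc" -o -path "$top/sys" \) -prune -o \
        \( -name grub.cfg -o -name grubenv -o \( -type d -name loader \) \) -print \
        2>/dev/null | sort
}
```

On a live system this would be run as `find_stray_boot_config /` from a root shell. On my machine after the migration, it would have listed /grub2/grub.cfg, /grub2/grubenv, and /loader.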
Conclusion
What started as a “simple” UUID update turned into a multi-hour investigation through GRUB’s configuration hierarchy. The root cause wasn’t incorrect configuration in the expected locations – it was hidden configuration in unexpected places created during the OS migration.
If you’re facing similar issues, I hope this post saves you some troubleshooting time. The key is persistence and methodical searching. The configuration causing your problem is somewhere on the filesystem – you just need to find it.
System Details:
- Boot: UEFI with GRUB2 using BLS
- Migration: HDD to NVMe SSD
- Issue Duration: ~4 hours across multiple troubleshooting sessions
- Final Status: Fully resolved
Questions or similar experiences? Feel free to reach out!