
Troubleshooting — when things go wrong

This page covers the most common problems encountered when installing or running kldload, with exact commands to diagnose and fix each one. If your issue isn't listed here, check the FAQ or open a thread on Discord.

DKMS build fails

Symptom: The installer stops with an error like DKMS: build failed for zfs/2.x.x or Error! Bad return status for module build on kernel.

Cause 1 — Missing kernel headers. The headers package for the running kernel isn't installed.

# Check what kernel is running
uname -r

# Check if headers are installed (CentOS/RHEL)
rpm -q kernel-devel-$(uname -r)

# Install if missing
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

# Debian/Ubuntu
apt list --installed 2>/dev/null | grep linux-headers
apt install -y linux-headers-$(uname -r)
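The two distro branches above can be folded into one helper that works out the right package name. This is a minimal sketch. It assumes the distro is identified by the ID field in /etc/os-release, and the ID list covers only common values. The function names headers_pkg and os_id are hypothetical, not kldload tools.

```shell
#!/bin/sh
# os_id [FILE]
# Print the ID field from an os-release file (default /etc/os-release),
# with any surrounding quotes removed. Parsed with sed rather than sourced.
os_id() {
    sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d "\"'"
}

# headers_pkg ID KERNEL_RELEASE
# Print the kernel headers package DKMS needs for this distro and kernel.
# Hypothetical mapping covering common os-release ID values only.
headers_pkg() {
    local id="$1" krel="$2"
    case "$id" in
        rhel|centos|rocky|almalinux|fedora) echo "kernel-devel-$krel" ;;
        debian|ubuntu)                      echo "linux-headers-$krel" ;;
        *) echo "unsupported distro ID: $id" >&2; return 1 ;;
    esac
}
```

For example, on an RPM-based system: dnf install -y "$(headers_pkg "$(os_id)" "$(uname -r)")".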

Cause 2 — Kernel version mismatch. The running kernel and the headers package don't match exactly. This happens when a kernel update was applied but the system hasn't rebooted yet.

# Check all installed kernels
rpm -q kernel         # CentOS/RHEL
dpkg -l 'linux-image*'  # Debian (quoted so the shell doesn't expand the glob)

# Reboot into the latest kernel, then retry
reboot

# After reboot, rebuild DKMS manually
dkms autoinstall -k $(uname -r)

# Verify ZFS loaded
modinfo zfs | grep -E "^filename|^version"
lsmod | grep zfs
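To confirm whether a reboot is still pending, you can compare the running kernel with the newest kernel that has a module tree installed. This sketch assumes each installed kernel has a directory under /lib/modules named after its release string. Version ordering relies on GNU sort -V. The function names are hypothetical.

```shell
#!/bin/sh
# newest_kernel [MODULES_DIR]
# Print the highest kernel release with a directory under MODULES_DIR.
newest_kernel() {
    ls -1 "${1:-/lib/modules}" | sort -V | tail -n 1
}

# reboot_pending [MODULES_DIR]
# Return 0 (and say why) if a newer kernel is installed than the one running.
reboot_pending() {
    local running newest
    running=$(uname -r)
    newest=$(newest_kernel "$1")
    if [ "$running" != "$newest" ]; then
        echo "running $running, newest installed $newest: reboot before rebuilding DKMS"
        return 0
    fi
    return 1
}
```

If reboot_pending prints a message, reboot first and then run the dkms autoinstall step.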

Cause 3 — Build log shows compiler errors. Read the full DKMS log:

tail -n 40 /var/lib/dkms/zfs/*/build/make.log

If you see gcc: command not found, install build tools:

dnf install -y gcc make         # CentOS/RHEL
apt install -y build-essential  # Debian
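If several ZFS versions have left build directories behind, the glob above can match more than one log. This sketch picks the most recently modified make.log and prints only compiler error lines, which gcc and clang mark with "error:". The base path is the one used above. The function name is hypothetical.

```shell
#!/bin/sh
# dkms_errors [DKMS_TREE]
# Show "error:" lines from the newest ZFS DKMS make.log under DKMS_TREE.
dkms_errors() {
    local tree log
    tree=${1:-/var/lib/dkms}
    # ls -t sorts newest first; errors are hidden if the glob matches nothing.
    log=$(ls -1t "$tree"/zfs/*/build/make.log 2>/dev/null | head -n 1)
    if [ -z "$log" ]; then
        echo "no ZFS DKMS make.log under $tree" >&2
        return 1
    fi
    echo "== $log"
    # -n prefixes each match with its line number in the log.
    grep -n 'error:' "$log" | tail -n 20
}
```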

System won't boot after install

Symptom: After install completes, the system reboots to a black screen, firmware menu, or a grub rescue prompt instead of ZFSBootMenu.

Check 1 — Boot order. The disk is installed, but the firmware is booting from a different device. Enter your UEFI/BIOS setup (usually F2 or Del at POST) and move the target disk to the top of the boot order. Look for an entry like ZFSBootMenu or the disk's model name.

Check 2 — ZFSBootMenu EFI entry missing. Boot from the kldload ISO and run:

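# Import the pool read-only, without mounting datasets, to inspect it
zpool import -N -o readonly=on rpool

# Check if the EFI partition has ZFSBootMenu
mkdir -p /mnt/efi
mount /dev/disk/by-partlabel/EFI /mnt/efi
ls -lh /mnt/efi/EFI/

# Release the ESP and the pool before handing over to the recovery tool
umount /mnt/efi
zpool export rpool

# Reinstall the bootloader from the recovery tool
# (replace /dev/sda with the disk kldload was installed to)
krecovery import rpool
krecovery reinstall-bootloader /dev/sda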

Check 3 — UEFI vs legacy BIOS mismatch. kldload uses UEFI. If the system booted the ISO in legacy/CSM mode, the EFI entries won't be created correctly. Disable CSM in your firmware and re-run the install.

# Confirm the ISO was booted in UEFI mode
ls /sys/firmware/efi    # directory exists = UEFI mode
# If this path is missing, you're in legacy mode — reboot the ISO with UEFI enabled
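The same check works as a yes/no test in scripts that should refuse to continue in legacy mode. This is a sketch. The function name is hypothetical, and the directory argument exists only so the check can be tested.

```shell
#!/bin/sh
# boot_mode [EFI_DIR]
# Print "UEFI" if the kernel exposes EFI firmware info, otherwise "legacy".
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo UEFI
    else
        echo legacy
    fi
}
```

For example: [ "$(boot_mode)" = UEFI ] || echo "reboot the ISO in UEFI mode".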

No network after install

Symptom: After first boot, ip addr shows the NIC but no IP address, or the interface isn't managed by NetworkManager.

# Check interface state
ip addr show
nmcli device status

# If the NIC shows "unmanaged", tell NetworkManager to manage it, then connect
nmcli device set eth0 managed yes   # replace eth0 with your interface name
nmcli device connect eth0

# Or create a new connection profile bound to the interface
nmcli con add type ethernet ifname eth0 con-name eth0
nmcli con up eth0

# Check if NetworkManager is running at all
systemctl status NetworkManager
systemctl enable --now NetworkManager
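On a machine with several NICs, it can help to list only the Ethernet devices that NetworkManager is ignoring. This sketch parses nmcli's terse output (-t, colon-separated fields) from standard input. The function name is hypothetical.

```shell
#!/bin/sh
# unmanaged_ethernet
# Read "DEVICE:TYPE:STATE" lines (from: nmcli -t -f DEVICE,TYPE,STATE device)
# on stdin and print the Ethernet devices whose state is "unmanaged".
unmanaged_ethernet() {
    awk -F: '$2 == "ethernet" && $3 == "unmanaged" { print $1 }'
}
```

For example: nmcli -t -f DEVICE,TYPE,STATE device | unmanaged_ethernet, then run nmcli device set <name> managed yes for each device it prints.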

Static IP left over from install. If you configured a static IP during install and the subnet changed:

# List connections
nmcli con show

# Modify the IP (replace eth0 with the connection NAME shown above)
nmcli con mod eth0 ipv4.addresses 192.168.1.100/24
nmcli con mod eth0 ipv4.gateway 192.168.1.1
nmcli con mod eth0 ipv4.dns 1.1.1.1
nmcli con up eth0
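The commands above can be wrapped so that a typo in the address is caught before the profile is changed. This sketch also sets ipv4.method to manual so the address takes effect even if the profile was using DHCP. The function name and the DRY_RUN variable are hypothetical. The validation only checks the dotted-quad/prefix shape; it does not check that each octet is at most 255.

```shell
#!/bin/sh
# set_static_ip CONNECTION ADDRESS/PREFIX GATEWAY DNS
# Apply a static IPv4 config to an existing NetworkManager profile.
# With DRY_RUN=1 the nmcli commands are printed instead of run.
set_static_ip() {
    local con="$1" addr="$2" gw="$3" dns="$4" run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # Shape check only: four dotted numbers followed by a /0-32 prefix.
    if ! printf '%s\n' "$addr" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'; then
        echo "bad address/prefix: $addr" >&2
        return 1
    fi
    # One "con mod" call can set several properties at once.
    $run nmcli con mod "$con" ipv4.method manual ipv4.addresses "$addr" \
        ipv4.gateway "$gw" ipv4.dns "$dns" &&
    $run nmcli con up "$con"
}
```

For example: set_static_ip eth0 192.168.1.100/24 192.168.1.1 1.1.1.1. Run it with DRY_RUN=1 first to review the commands.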

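Interface renamed after reboot. Predictable network interface names (enp3s0, ens192, etc.) are derived from the NIC's firmware or bus location. If the disk image was moved to different hardware, the name may differ, and any connection profile bound to the old name will no longer match. Check ip link to find the current name, then point the profile at it with nmcli con mod <profile> connection.interface-name <new-name>.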
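To find profiles that are still bound to an interface name that no longer exists, compare each profile's connection.interface-name against /sys/class/net. This sketch reads PROFILE:IFNAME pairs on standard input. The function name and the feeding pipeline in the comment are assumptions to adapt to your system.

```shell
#!/bin/sh
# stale_profiles [NET_DIR]
# Read "PROFILE:IFNAME" lines on stdin; print profiles whose IFNAME is set
# but has no entry under NET_DIR (default /sys/class/net).
stale_profiles() {
    local netdir="${1:-/sys/class/net}" name ifname
    while IFS=: read -r name ifname; do
        # Profiles with no interface-name are not bound to a device; skip them.
        [ -n "$ifname" ] || continue
        [ -e "$netdir/$ifname" ] || printf '%s (wants %s)\n' "$name" "$ifname"
    done
}

# Example feed (nmcli -g prints just the value of one property):
#   nmcli -t -f NAME con show | while read -r n; do
#       printf '%s:%s\n' "$n" "$(nmcli -g connection.interface-name con show "$n")"
#   done | stale_profiles
```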

Live ISO won't boot on hardware

Symptom: After writing the ISO to USB and booting, the system goes to a blank screen, a firmware error, or boots directly to the existing OS.

Check 1 — Write method. The ISO must be written to the USB stick as a raw image with dd (or a tool's raw/DD-image mode), not extracted as a filesystem. GUI tools that extract files (some versions of Rufus in ISO mode, for example) produce a non-bootable result.

# Replace /dev/sdX with the USB stick (confirm with lsblk); everything on it is erased
sudo dd if=kldload-1.0.iso of=/dev/sdX bs=4M status=progress oflag=sync
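Because dd overwrites whatever device it is pointed at, a small guard is worth having. This sketch refuses to write unless the target is a whole-disk block device with nothing mounted from it. It still trusts you to pick the USB stick rather than some other unmounted disk. The function name is hypothetical.

```shell
#!/bin/sh
# write_iso ISO DEVICE
# Write ISO to DEVICE with dd after basic safety checks.
write_iso() {
    local iso="$1" dev="$2"
    [ -f "$iso" ] || { echo "no such ISO: $iso" >&2; return 1; }
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    # -d limits lsblk to the device itself; TYPE should be "disk", not "part".
    if [ "$(lsblk -dno TYPE "$dev")" != disk ]; then
        echo "$dev is not a whole disk" >&2; return 1
    fi
    # Any non-empty MOUNTPOINT on the disk or its partitions means it is in use.
    if lsblk -no MOUNTPOINT "$dev" | grep -q .; then
        echo "$dev has mounted filesystems; refusing" >&2; return 1
    fi
    dd if="$iso" of="$dev" bs=4M status=progress oflag=sync
}
```

Source the function in a root shell, then run write_iso kldload-1.0.iso /dev/sdX.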

Check 2 — Secure Boot. kldload 1.0 does not yet ship a signed MOK (Machine Owner Key). Secure Boot must be disabled in UEFI firmware settings before booting the ISO. Look for "Secure Boot" under the Security or Boot tab in your firmware, and set it to Disabled.

Check 3 — USB port. Some systems have issues booting from USB 3 ports. Try a USB 2 port (or the rear panel ports on a server).

Check 4 — UEFI boot entry. Some firmware won't auto-detect the USB as a UEFI device. Enter the one-time boot menu (usually F12 at POST) and explicitly select the USB device in UEFI mode — look for an entry that starts with UEFI: followed by the USB drive name.

ZFS pool not importing

Symptom: After booting from the live ISO for recovery, zpool import shows the pool but import fails, or the pool isn't visible at all.

Cause 1 — hostid mismatch. ZFS records the hostid of the system that last imported a pool. If a different machine (or the same machine with a changed hostid) tries to import it, the import is blocked as a safety measure, because the pool might still be in use by the other system.

# Check the current hostid
hostid

# Force import with -f (only do this if you know the pool is not imported elsewhere)
zpool import -f rpool

# Or set a matching hostid before importing
zgenhostid -f <original-hostid>   # hex value; -f overwrites an existing /etc/hostid
zpool import rpool
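If you don't know the original hostid, the pool's on-disk label records the hostid of the last system that imported it, in decimal. This sketch assumes that zdb -l, run against a device in the pool, prints a line of the form "hostid: <decimal>". Because zgenhostid expects hexadecimal, the function converts the value. The function name is hypothetical.

```shell
#!/bin/sh
# label_hostid_hex
# Read `zdb -l <device>` output on stdin and print the first hostid found
# as 8-digit hex, the form zgenhostid accepts.
label_hostid_hex() {
    local dec
    dec=$(awk '$1 == "hostid:" { print $2; exit }')
    [ -n "$dec" ] || { echo "no hostid in label" >&2; return 1; }
    printf '%08x\n' "$dec"
}

# Example (/dev/sdX2 is a placeholder for a partition that belongs to the pool):
#   hid=$(zdb -l /dev/sdX2 | label_hostid_hex)
#   zgenhostid -f "$hid" && zpool import rpool
```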

Cause 2 — Stale cachefile. The pool's cachefile (/etc/zfs/zpool.cache) points to device paths that no longer exist (common after disk replacement or hardware changes).

# Import by scanning for pools across all devices
zpool import -d /dev rpool

# Or scan specific devices
zpool import -d /dev/disk/by-id rpool

# After successful import, regenerate the cachefile
zpool set cachefile=/etc/zfs/zpool.cache rpool

Cause 3 — Pool not visible at all. The pool may be on a disk that isn't online yet, or the device path changed.

# List all block devices
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

# Scan for importable ZFS pools on all devices
zpool import -d /dev/disk/by-id

# Check ZFS event daemon for errors
journalctl -u zfs-zed --since "1 hour ago"
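To see which partitions actually carry ZFS labels, filter lsblk for the zfs_member filesystem type that blkid reports for ZFS vdevs. This is useful when the pool doesn't show up and you suspect the wrong disk. The sketch assumes the LABEL column holds the pool name, as blkid normally reports it. The function name is hypothetical.

```shell
#!/bin/sh
# zfs_members
# Read `lsblk -rno NAME,FSTYPE,LABEL` output on stdin and print devices
# whose FSTYPE is zfs_member, with the pool name (LABEL) if present.
zfs_members() {
    awk '$2 == "zfs_member" { print "/dev/" $1, ($3 == "" ? "(no label)" : "pool=" $3) }'
}
```

For example: lsblk -rno NAME,FSTYPE,LABEL | zfs_members. If nothing is listed, the disks holding the pool are not visible to the kernel.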

RHEL install gets CentOS packages

Symptom: You chose RHEL as the target distro, but installed packages show CentOS branding, or /etc/os-release says CentOS after install.

Cause — Darksite contamination (fixed in 1.0). In pre-release builds, the CentOS darksite could bleed into RHEL installs when the RHEL CDN was unavailable and the installer fell back to local packages. This was fixed in kldload 1.0.

# Verify your ISO version
cat /etc/kldload-release   # on the live ISO
# or check the boot menu — it shows the version

# After install, verify os-release on the target
grep -E "^(NAME|VERSION_ID|ID)=" /etc/os-release

# If contaminated, recover by re-running the install with a 1.0 ISO
# RHEL installs require CDN access — ensure internet is available before starting
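A scripted version of the os-release check can catch contamination before you go further. This sketch reads the ID field with sed rather than sourcing the file, so it can safely be pointed at a target root mounted from the live ISO. The function name and the mount path in the usage example are hypothetical.

```shell
#!/bin/sh
# check_distro EXPECTED_ID [OS_RELEASE]
# Return 0 if the ID in OS_RELEASE matches EXPECTED_ID; report either way.
check_distro() {
    local want="$1" file="${2:-/etc/os-release}" got
    # ^ID= matches only the ID line (not ID_LIKE); quotes are stripped.
    got=$(sed -n 's/^ID=//p' "$file" | tr -d "\"'")
    if [ "$got" = "$want" ]; then
        echo "ok: $file reports ID=$got"
    else
        echo "mismatch: expected ID=$want, $file reports ID=$got" >&2
        return 1
    fi
}
```

For example: check_distro rhel on the installed system, or check_distro rhel /mnt/target/etc/os-release against a mounted target.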

Ubuntu first boot GDM hang

Symptom: On first boot after a Desktop profile install targeting Ubuntu/Debian, the screen is black for 1–3 minutes before the login screen appears. Subsequent boots are normal.

This is expected behavior, not a bug. On first boot, several one-time tasks run: DKMS rebuilds ZFS for the installed kernel, the ZFS event daemon initializes, and the machine ID and SSH host keys are generated. GDM waits for these to complete before showing the login screen.

# If you want to watch what's happening during the wait
# Switch to a virtual console with Ctrl+Alt+F2 and log in, then:
journalctl -f

# Check if DKMS is still building
dkms status

# After first boot completes normally, second boot will be fast

If the hang persists beyond 5 minutes on a second or third boot, check for a failed service:

systemctl --failed
journalctl -p err --since "10 minutes ago"
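To tell a slow first boot apart from a stuck one without watching the journal, you can poll systemd's overall state. systemctl is-system-running prints "starting" while boot jobs are still queued. It then prints "running", or "degraded" if at least one unit failed. The function name and the 300-second default in this sketch are assumptions.

```shell
#!/bin/sh
# wait_for_boot [TIMEOUT_SECONDS]
# Poll until systemd leaves the initializing/starting states, then report.
wait_for_boot() {
    local limit="${1:-300}" waited=0 state
    while :; do
        # is-system-running exits non-zero for anything but "running",
        # so only its printed state is used.
        state=$(systemctl is-system-running 2>/dev/null)
        case "$state" in
            initializing|starting) ;;
            *) break ;;
        esac
        if [ "$waited" -ge "$limit" ]; then
            echo "still $state after ${limit}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "system state: $state"
    # In the degraded state, list the units that failed.
    [ "$state" = degraded ] && systemctl --failed --no-legend
    return 0
}
```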

kexport fails

Symptom: kexport qcow2 exits with an error. Common causes are a busy or full disk, a missing qemu-img, or an export path that isn't writable.

Cause 1 — Disk busy. The export target may be full, or processes may be writing heavily to the pool while the export runs.

# Check available disk space on the export target
df -h /tmp /var /home

# Check if any process is holding open files on the pool
# (lsof +D walks the whole tree, so on / this can take several minutes)
lsof +D /

# Check the exact error from kexport
kexport qcow2 2>&1 | tail -20
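Before rerunning the export, you can compare free space on the target against the pool's size. This sketch reads the "Available" column of POSIX-format df output, which is in 1024-byte blocks. Using the pool's used bytes as the estimate is an assumption, since a qcow2 image of a sparse or compressed pool can be smaller or larger. The function name is hypothetical.

```shell
#!/bin/sh
# has_space DIR NEEDED_KB
# Return 0 if the filesystem holding DIR has at least NEEDED_KB KiB available.
has_space() {
    local avail
    # df -P guarantees one line per filesystem; field 4 is available 1K blocks.
    avail=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] || return 1
    if [ "$avail" -lt "$2" ]; then
        echo "$1: ${avail} KiB free, need $2 KiB" >&2
        return 1
    fi
    return 0
}

# Example estimate from the pool's used bytes (zfs list -Hp prints raw numbers):
#   need=$(( $(zfs list -Hp -o used rpool) / 1024 ))
#   has_space /var "$need" && kexport qcow2
```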

Cause 2 — qemu-img not found. kexport depends on qemu-img for format conversion.

# Check if qemu-img is available
which qemu-img
qemu-img --version

# Install if missing
dnf install -y qemu-img      # CentOS/RHEL
apt install -y qemu-utils    # Debian
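A sketch that checks for qemu-img and, if it is missing, prints the install command for the current distro using the package names above. The os-release ID mapping and the function name are assumptions.

```shell
#!/bin/sh
# qemu_img_hint [OS_RELEASE]
# Print nothing if qemu-img is on PATH; otherwise print an install command.
qemu_img_hint() {
    command -v qemu-img >/dev/null 2>&1 && return 0
    local id
    id=$(sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d "\"'")
    case "$id" in
        rhel|centos|rocky|almalinux|fedora) echo "dnf install -y qemu-img" ;;
        debian|ubuntu)                      echo "apt install -y qemu-utils" ;;
        *) echo "install qemu-img with your package manager (ID=$id)" ;;
    esac
}
```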

Cause 3 — Export path not writable. The default export path may not have enough space or the correct permissions.

# Export to a specific path
KEXPORT_PATH=/mnt/export kexport qcow2

# Or with a custom name
KEXPORT_NAME=myserver KEXPORT_PATH=/mnt/nas kexport qcow2
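Before pointing KEXPORT_PATH somewhere new, confirm that the directory exists and that the user running kexport can write to it. This sketch does not check free space. The function name is hypothetical.

```shell
#!/bin/sh
# check_export_dir DIR
# Return 0 if DIR exists and a file can be created in it.
check_export_dir() {
    local dir="$1" probe
    [ -d "$dir" ] || { echo "$dir does not exist" >&2; return 1; }
    # Creating (and removing) a real file is the most direct writability test.
    probe=$(mktemp "$dir/.kexport-probe.XXXXXX" 2>/dev/null) || {
        echo "$dir is not writable" >&2; return 1; }
    rm -f "$probe"
}
```

For example: check_export_dir /mnt/export && KEXPORT_PATH=/mnt/export kexport qcow2.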

General diagnostics

When in doubt, these commands give a fast overview of system health:

# kldload system status dashboard
kst

# ZFS pool health
zpool status -v

# Recent errors in the journal
journalctl -p err --since "1 hour ago"

# Services that have failed
systemctl --failed

# Disk space across all ZFS datasets
kdf

# Kernel messages (hardware errors, driver issues)
dmesg | grep -iE "(error|fail|warn)" | tail -30
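When you ask for help on Discord, one file is easier to share than a series of screenshots. This sketch runs the commands above, skips any that aren't installed, and collects the output in a single report. kst is left out because it is described as a dashboard and may be interactive. The function name and the default output path are assumptions. Some commands (dmesg, journalctl) may need root to show everything.

```shell
#!/bin/sh
# diag_report [OUTFILE]
# Run the general diagnostics and collect their output in OUTFILE.
diag_report() {
    local out="${1:-/tmp/kldload-diag.txt}" cmd
    : > "$out" || return 1
    # Each entry runs via sh -c so the pipes and quoting inside it work.
    for cmd in \
        'zpool status -v' \
        'journalctl -p err --since "1 hour ago" --no-pager' \
        'systemctl --failed --no-pager' \
        'kdf' \
        'dmesg | grep -iE "(error|fail|warn)" | tail -30'
    do
        {
            echo "===== $cmd"
            # Skip commands whose first word isn't installed on this system.
            if command -v "${cmd%% *}" >/dev/null 2>&1; then
                sh -c "$cmd" 2>&1
            else
                echo "(not installed)"
            fi
            echo
        } >> "$out"
    done
    echo "report written to $out"
}
```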
