Upgrades & Boot Environments Masterclass
This guide covers one of the hardest problems in Linux system administration:
how to upgrade an operating system without risking downtime, data loss, or an
unbootable machine. It explains why traditional package manager upgrades are
fundamentally unsafe, how ZFS dataset architecture separates OS state from user data,
how kldload produces and manages kernel modules across six different distro families,
and how boot environments give you instant rollback, the same capability that
Solaris had in 2005, that FreeBSD has with bectl, and that most mainstream Linux
distributions still do not provide by default.
What this page covers: the structural problems with traditional Linux upgrades, ZFS dataset architecture as an upgrade safety net, kernel module delivery (kmod, DKMS, and pre-built packages) across all supported distros, ZFSBootMenu and boot environments, the kupgrade workflow step by step, the kbe boot environment management tool, side-by-side comparison of traditional vs kldload upgrades, and real-world rollback scenarios.
Prerequisites: the ZFS Zero to Hero tutorial and basic familiarity with your distro's package manager. If you have ever run apt upgrade or dnf update and held your breath, this page is for you.
1. The Problem With Traditional Linux Upgrades
Every Linux distribution ships with a package manager — apt on
Debian/Ubuntu, dnf on Fedora/RHEL/CentOS/Rocky, pacman on Arch. When you
run an upgrade, the package manager downloads new versions of installed packages and
overwrites files on the live root filesystem. Libraries are replaced in-place.
Kernel images are swapped. Configuration files are merged (or not). The entire
operation mutates the running system, and there is no built-in mechanism to undo it.
Think about what this means. You have a production server running a database, a
web application, and a monitoring stack. You run dnf upgrade -y. The package
manager begins replacing hundreds of files across /usr, /lib,
/etc, and /boot. If the power goes out halfway through, you have a
partially upgraded system with mismatched library versions. If the new kernel does
not support your storage driver, you cannot boot. If a package conflict leaves
the package database in a broken state, you are manually resolving dependencies in
single-user mode.
The "Reboot and Pray" Anti-Pattern
In traditional Linux, upgrading a kernel requires a reboot. You have no way to verify that the new kernel will boot successfully until you are already committed to it. If the new kernel lacks a critical module (the ZFS driver, an NVIDIA GPU driver, or a network card driver), you discover this after the old kernel is no longer running. Recovery requires booting from external media, chrooting into the broken system, and manually fixing packages. On a remote server with no physical access, this means an emergency support ticket and hours of downtime.
Why Package Managers Cannot Solve This
Package managers are designed to install, remove, and update individual packages. They are not designed to be transactional systems. Consider the failure modes:
No Atomic Rollback
apt upgrade installs packages sequentially. If package 47 of 200 fails, packages
1–46 are already installed. There is no reliable single command to undo the partial
upgrade. apt logs each transaction in /var/log/apt/history.log but has no command to
replay it in reverse, and dnf history undo only works while the old package versions
are still available in the repositories.
Live Filesystem Mutation
Packages overwrite files on the running root filesystem. A library update to
libssl takes effect immediately for newly started processes, but running
processes still hold the old version in memory. You can end up with two versions of
the same library active simultaneously — and if the ABI changed, things break
in subtle ways.
Held Packages and Dependency Hell
When packages conflict, the package manager keeps them back, and administrators pin others with explicit holds (apt-mark hold, dnf versionlock). Over time, held packages accumulate. A system that has been running for two years may have dozens of held packages, each blocking something else. Often the only reliable fix is a fresh install, which means rebuilding the entire machine.
No OS/Data Separation
On a traditional ext4 or xfs system, / is one filesystem. The kernel, the
application data, the database, and the user's home directory are all on the same
partition. Upgrading the OS means touching the same storage that holds your data.
A failed upgrade cannot be undone without also reverting your data, and restoring
from backup means restoring everything.
The fundamental problem: traditional Linux treats the operating system as a mutable pile of files. Every upgrade is a destructive in-place modification with no undo. Every reboot after an upgrade is a gamble. Every kernel update is a prayer. This is not how production infrastructure should work.
# The traditional upgrade workflow — hope as a strategy
sudo apt update
sudo apt upgrade -y # modifies live root filesystem in-place
sudo reboot # pray the new kernel boots
# ... 30 seconds of silence ...
# Did it come back? Check SSH. No response? Drive to the datacenter.
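The "Live Filesystem Mutation" failure mode can be observed directly on a running system after a traditional upgrade. The sketch below is an illustration and not part of kldload: the function names are hypothetical, and it assumes the Linux /proc/<pid>/maps format, in which a mapping whose backing file was replaced on disk is suffixed with "(deleted)".

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find processes that still map a shared library
# whose on-disk file was replaced by an upgrade.

# Emit every readable /proc/<pid>/maps line, prefixed with the PID.
scan_proc_maps() {
  local f pid
  for f in /proc/[0-9]*/maps; do
    pid=${f#/proc/}
    pid=${pid%/maps}
    sed "s/^/${pid} /" "$f" 2>/dev/null
  done
}

# Reduce "PID maps-line" input to unique "PID LIBRARY" pairs where the
# mapped file is a shared object that the kernel marks as "(deleted)".
stale_libs() {
  awk '$NF == "(deleted)" && $(NF-1) ~ /\.so/ { print $1, $(NF-1) }' | sort -u
}

# Usage (run as root to see every process):
#   scan_proc_maps | stale_libs
```

Any process listed is still executing the old library code and will keep doing so until it is restarted. Paths containing spaces are not handled by this simple field split.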
2. Storage Separation — Why Dataset Architecture Matters
The single most important design decision kldload makes is not putting everything on one filesystem. When the installer creates a ZFS pool, it builds a hierarchy of datasets — each with its own mountpoint, its own snapshot timeline, its own tunable properties, and its own independent lifecycle. The operating system lives in one dataset. Your data lives in others. They share the same pool (and therefore the same disk space), but they are logically independent.
Here is the dataset layout that kldload creates during installation:
# kldload ZFS dataset layout — created by storage-zfs.sh during install
rpool # pool root (canmount=off)
├── ROOT # boot environment container (canmount=off)
│ └── default # / (the active boot environment)
├── root # /root (admin home directory)
├── home # /home (canmount=on)
│ └── <username> # /home/<username> (per-user dataset)
├── srv # /srv (application data)
├── opt # /opt (optional packages)
├── usr # /usr container (canmount=off)
│ └── local # /usr/local
└── var # /var container (canmount=off)
├── cache # /var/cache
├── lib # /var/lib
├── log # /var/log
├── spool # /var/spool
└── tmp # /var/tmp
Why Each Dataset Is Separate
Every ZFS dataset has its own snapshot namespace. When you snapshot
rpool/ROOT/default, you capture the OS state without touching
rpool/home, rpool/srv, or rpool/var/lib. Rolling
back the OS does not roll back your databases, your container images, your home
directories, or your application data. This is the key insight: the OS is
disposable, the data is not.
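One way to check the separation on a given machine is to ask which dataset owns a particular path: anything that resolves to rpool/ROOT/default is rolled back with the OS, and anything else is not. The helper below is a hypothetical sketch, not a kldload command. It assumes tab-separated input in the form produced by zfs list -H -o name,mountpoint,canmount and picks the longest matching mountpoint, skipping canmount=off containers because they never hold files themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which dataset owns a path, given lines of
# "name<TAB>mountpoint<TAB>canmount" on stdin, for example from:
#   zfs list -H -o name,mountpoint,canmount -r rpool
dataset_for_path() {
  awk -F'\t' -v p="$1" '
    # Skip containers (canmount=off) and datasets without a real mountpoint.
    $3 == "off" || $2 !~ /^\// { next }
    # A dataset matches if it is mounted at /, at the path itself,
    # or at a parent directory of the path.
    $2 == "/" || p == $2 || index(p, $2 "/") == 1 {
      if (length($2) > best) { best = length($2); name = $1 }
    }
    END { if (name != "") print name }'
}

# Usage:
#   zfs list -H -o name,mountpoint,canmount -r rpool | dataset_for_path /var/lib/postgresql
```

With the layout shown earlier, /usr/bin resolves to the boot environment (rpool/usr is only a container), while /usr/local and /var/lib resolve to their own datasets and therefore survive an OS rollback.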
Dataset Properties Per Workload
Because each dataset is independent, you can tune storage properties per workload. A database needs small record sizes for random I/O. A media server needs large record sizes for sequential streaming. A container runtime needs different compression settings than a log directory. ZFS lets you set these per dataset, and they take effect without reformatting anything:
# Workload-specific datasets created by kldload profiles
# (KVM and Server profiles add these on top of the base layout)
# Kubernetes etcd — 8K recordsize for small key-value writes
zfs create -o recordsize=8K -o compression=lz4 \
-o primarycache=metadata rpool/var/lib/etcd
# Container storage — 64K recordsize, lz4 compression
zfs create -o recordsize=64K -o compression=lz4 \
rpool/var/lib/containers/storage/zfs
# KVM virtual machine disks — 64K block size, metadata caching only
zfs create -o compression=lz4 -o recordsize=64K \
-o primarycache=metadata rpool/vms
# Application data — 1M recordsize for large file streaming
zfs create -o compression=zstd -o recordsize=1M rpool/srv
# AI model storage — 1M recordsize, zstd compression for large blobs
zfs create -o compression=zstd -o recordsize=1M rpool/srv/ollama
8K recordsize (etcd, PostgreSQL)
Databases perform small random reads and writes. An 8K recordsize matches PostgreSQL's default 8K page size and stays close to the small pages etcd writes, which keeps read amplification low. ZFS reads one small record per database page instead of pulling in 128K of surrounding data that will never be used.
64K recordsize (VMs, containers)
Virtual machine disk images and container layers have mixed I/O patterns. 64K is a compromise that works well for both random and sequential access. It also matches the default 64K cluster size of qcow2 disk images.
128K recordsize (general purpose)
The ZFS default. Good for home directories, configuration files, source code, and mixed workloads. Most files are read sequentially and written in full, so larger records mean fewer I/O operations and better compression ratios.
1M recordsize (media, AI models)
Large sequential files — video, audio, machine learning models, ISO images — benefit from maximum record size. A 4GB file stored with 1M records is 4,096 blocks instead of 32,768 blocks at 128K. Fewer blocks means less metadata overhead and faster sequential throughput.
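The block counts quoted above are plain ceiling division, which can be checked with shell arithmetic. This is a back-of-the-envelope sketch with a hypothetical function name. It ignores compression and the fact that ZFS stores a file smaller than one record in a single smaller block.

```shell
#!/usr/bin/env bash
# records_needed FILE_BYTES RECORD_BYTES
# Ceiling division: how many records a file of FILE_BYTES occupies.
records_needed() {
  local size=$1 rec=$2
  echo $(( (size + rec - 1) / rec ))
}

GiB=$((1024 * 1024 * 1024))
records_needed $((4 * GiB)) $((1024 * 1024))   # 1M recordsize   -> 4096
records_needed $((4 * GiB)) $((128 * 1024))    # 128K recordsize -> 32768
records_needed $((4 * GiB)) $((8 * 1024))      # 8K recordsize   -> 524288
```

The same 4 GiB file needs 128 times more block pointers at 8K than at 1M, which is why a small recordsize is reserved for workloads that really do random small I/O.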
This is what Solaris got right in 2005. Sun Microsystems designed ZFS with the understanding that an operating system and its data have different lifecycles. The OS should be snapshottable, rollbackable, and disposable. The data should persist independently. Twenty years later, mainstream Linux still puts everything on one ext4 partition. kldload brings the Solaris dataset model to every supported distro — CentOS, Debian, Ubuntu, Fedora, RHEL, Rocky, Arch — because this is the correct architecture for any system that you intend to upgrade more than once.
3. How kldload Produces Kernel Modules — And Why It Matters
ZFS is not in the Linux kernel, and under its current license it will not be. The
reason is licensing: ZFS is released under the CDDL (Common Development and Distribution
License), which is widely considered incompatible with the GPL (GNU General Public
License) that covers the Linux kernel, so CDDL code is not merged into the kernel tree.
This means ZFS must be built and loaded as an out-of-tree kernel module,
zfs.ko, and that module must be compiled against headers that match the
kernel it will be loaded into.
This creates a hard dependency chain: every kernel upgrade requires a matching ZFS module. If the kernel updates and ZFS does not, the system cannot mount its root filesystem. On a ZFS-on-root system, this means the machine does not boot. This is the single most common cause of ZFS breakage on Linux, and every distribution handles it differently.
The Four Approaches to ZFS Module Delivery
Pre-built kmod packages (CentOS/RHEL/Rocky)
The OpenZFS project (formerly ZFS on Linux) ships kmod-zfs RPMs that are compiled
against specific kernel ABIs. Red Hat enterprise kernels maintain a stable kABI
(kernel ABI): the internal function signatures that modules depend on do not
change within a minor release. This means one kmod-zfs binary works across
dozens of kernel micro-updates. When the kABI changes (typically at a new minor
release), a new kmod-zfs RPM is released. This is the most reliable method but
depends on the OpenZFS project tracking Red Hat's release schedule.
DKMS (Debian/Ubuntu/Fedora)
DKMS (Dynamic Kernel Module Support) is a framework that automatically recompiles
kernel modules whenever a new kernel is installed. The zfs-dkms package
ships ZFS source code and a dkms.conf that tells DKMS how to build it.
When apt install linux-image-6.x runs, a DKMS hook triggers
dkms autoinstall, which compiles zfs.ko for the new kernel.
This works well when it works, but it requires a working compiler toolchain,
matching kernel headers, and enough disk space for the build. If any of those are
missing, the build fails, the error is easy to miss in the package manager output,
and the system will not boot the new kernel.
Pre-built binary (Arch Linux)
The archzfs project maintains zfs-linux packages in its own unofficial repository,
pre-compiled against specific Arch kernel versions (the AUR, Arch User Repository,
carries only the build recipes). Because Arch is a rolling release, the kernel
updates constantly. The zfs-linux maintainers must release a new package every time
the kernel updates. If the kernel updates before zfs-linux catches up, users must
hold the kernel back. kupgrade handles this automatically by detecting the conflict
and upgrading everything except the kernel.
Manual compilation
Download the OpenZFS source, run ./configure && make && make install,
hope you got the right version for your kernel. This is what people did before DKMS
existed. It requires expert knowledge, breaks on every kernel update, and provides
no automatic rebuild mechanism. Nobody should do this on a production system.
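The first three approaches can be told apart from the package database. The sketch below shows one way to do it. It is not kupgrade's actual code: the function name and the returned labels are hypothetical, and only the package names (kmod-zfs, zfs-linux) and the dkms status command come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how the ZFS module is delivered on this system.
# Prints one of: kmod, prebuilt, dkms, unknown.
zfs_delivery_method() {
  if command -v rpm >/dev/null 2>&1 && rpm -q kmod-zfs >/dev/null 2>&1; then
    echo kmod        # CentOS/RHEL/Rocky: kABI-tracked binary module
  elif command -v pacman >/dev/null 2>&1 && pacman -Q zfs-linux >/dev/null 2>&1; then
    echo prebuilt    # Arch: binary module built for one exact kernel
  elif command -v dkms >/dev/null 2>&1 && dkms status 2>/dev/null | grep -qi '^zfs'; then
    echo dkms        # Debian/Ubuntu/Fedora: compiled locally per kernel
  else
    echo unknown     # manual build, or ZFS not installed
  fi
}

# Usage:
#   case "$(zfs_delivery_method)" in dkms) echo "verify every kernel" ;; esac
```

Checking command -v first avoids error noise from calling rpm on a Debian system or pacman on a Red Hat system.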
How kldload Handles This
kldload takes a two-stage approach to ZFS module production that ensures the module is always correctly compiled for the target system:
Stage 1: ISO build time. When build-iso.sh assembles the live ISO, it
installs zfs-dkms into the live environment's rootfs and compiles ZFS against
the live kernel. This is done inside the builder container with full compiler
toolchain access:
# From build-iso.sh — ZFS DKMS build inside the live rootfs
# The builder container has: gcc, make, autoconf, automake, libtool, kernel-devel
# Get the ZFS DKMS version
ZFS_VER=$(chroot "$ROOTFS" rpm -q --qf '%{VERSION}' zfs-dkms)
# Clean slate
chroot "$ROOTFS" dkms remove -m zfs -v "$ZFS_VER" --all 2>/dev/null || true
chroot "$ROOTFS" dkms add -m zfs -v "$ZFS_VER"
# Build against the exact kernel installed in the rootfs
chroot "$ROOTFS" env ARCH=x86_64 dkms build -m zfs -v "$ZFS_VER" -k "$KVER"
# Install the compiled module
chroot "$ROOTFS" dkms install -m zfs -v "$ZFS_VER" -k "$KVER" --force
Stage 2: Target install time. When the user installs a distro (say, Debian 13),
the installer bootstraps that distro with debootstrap and installs
zfs-dkms and the target kernel's headers. DKMS then compiles ZFS against
the target distro's kernel — not the live ISO's kernel. This is critical:
the target system's kernel may be completely different from the live environment's
kernel. CentOS Stream 9 runs kernel 5.14. Debian 13 runs kernel 6.12. Ubuntu 24.04 runs
kernel 6.8. Fedora 41 runs kernel 6.11. Each needs its own ZFS module.
# During distro installation — DKMS rebuilds for the TARGET kernel
# (this happens automatically via package manager hooks)
# Debian/Ubuntu: apt triggers dkms autoinstall
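# TARGET_KVER: kernel version installed in the target (assumed variable name);
# uname -r would report the live ISO's kernel here, not the target's
apt-get install -y zfs-dkms "linux-headers-${TARGET_KVER}"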
# DKMS hook fires → compiles zfs.ko for the installed kernel
# Fedora: dnf triggers dkms autoinstall
dnf install -y zfs-dkms kernel-devel
# DKMS hook fires → compiles zfs.ko for the installed kernel
# CentOS/RHEL/Rocky: uses pre-built kmod instead of DKMS
dnf install -y kmod-zfs
# No compilation needed — kABI-tracked binary module
# Arch: uses pre-built zfs-linux from archzfs repo
pacman -S zfs-linux zfs-utils
# No compilation needed — pre-built for the exact kernel version
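Because the installer runs from the live ISO, uname -r inside the target chroot still reports the live kernel. A minimal sketch of how the target's own kernel version could be discovered instead is to read the target's /lib/modules directory. The function name and the /mnt/target mount path are assumptions for illustration, not taken from the kldload installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list kernel versions installed under a target root
# by reading its /lib/modules directory, oldest version first.
target_kernels() {
  local root=${1:-} d
  for d in "$root"/lib/modules/*/; do
    [ -d "$d" ] || continue     # glob did not match: no kernels installed
    basename "$d"
  done | sort -V
}

# Usage during an install with the new system mounted at /mnt/target
# (an assumed path): pick the newest kernel in the target.
#   TARGET_KVER=$(target_kernels /mnt/target | tail -n 1)
```

sort -V compares version components numerically, so 6.8.0-100 sorts after 6.8.0-41 rather than before it.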
Module Signing and Secure Boot
On systems with Secure Boot enabled, kernel modules must be cryptographically signed with a key enrolled in the machine's MOK (Machine Owner Key) database. DKMS-compiled modules are unsigned by default — the kernel will refuse to load them, and the system will not boot.
kldload handles this with a MOK signing chain: during install, a signing key pair
is generated, the ZFS modules are signed with the private key, and the public key
is queued for enrollment in the MOK database via mokutil. On the next reboot, the
user confirms the enrollment at the MokManager prompt that shim displays before the
OS loads. From that point forward, DKMS modules signed with that key are trusted by
the kernel.
# MOK signing flow (kupgrade re-signs after DKMS rebuild)
# 1. Check if MOK signing is configured
if [[ -x /etc/dkms/sign_helper.sh && -f /var/lib/dkms/mok.key ]]; then
  # 2. Find all ZFS kernel objects for the new kernel
  find "/lib/modules/${KVER}" \( -name '*.ko' -o -name '*.ko.gz' -o -name '*.ko.zst' \) \
    | grep -i zfs \
    | while IFS= read -r MODULE_PATH; do
        # 3. Sign each module with the MOK key
        /etc/dkms/sign_helper.sh "${KVER}" "${MODULE_PATH}"
      done
fi
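After signing, it is worth confirming that every module actually carries a signature before rebooting. The sketch below is an assumption about how such a check could look, not kupgrade's own logic. It relies on modinfo -F signer, which prints the signer name for a signed module and nothing for an unsigned one. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read module paths on stdin and print the ones that
# carry no signature (modinfo reports an empty "signer" field).
unsigned_modules() {
  local ko
  while IFS= read -r ko; do
    [ -n "$(modinfo -F signer "$ko" 2>/dev/null)" ] || printf '%s\n' "$ko"
  done
}

# Usage: check Secure Boot state, then list unsigned ZFS modules for a kernel.
#   mokutil --sb-state
#   find "/lib/modules/${KVER}" -name '*.ko*' | grep -i zfs | unsigned_modules
```

An empty result means every ZFS module found for that kernel is signed. It does not prove the signing key is enrolled, which mokutil --list-enrolled can show.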
Why this matters: on a traditional Linux system, a kernel upgrade can silently
break ZFS, leaving you with an unbootable machine. kldload's kupgrade tool
detects which module delivery method your distro uses (kmod-zfs, zfs-dkms,
or zfs-linux), verifies that modules exist for every installed kernel, rebuilds
via DKMS if needed, re-signs for Secure Boot if configured, and warns you before
you reboot if anything failed. And even if everything goes wrong, the old kernel
is still bootable via ZFSBootMenu — you are never stuck.
4. Boot Environments — Time Travel for Your OS
A boot environment is a snapshot of your root dataset that you can boot into.
It captures the entire state of your operating system — every binary in
/usr, every config file in /etc, every kernel in /boot
— at a single point in time. You can have multiple boot environments
simultaneously, switch between them at boot time, and delete ones you no longer need.
This is the same concept that Solaris had with beadm, that FreeBSD has with
bectl, and that macOS Time Machine wishes it could provide at the OS level.
ZFSBootMenu
kldload uses ZFSBootMenu instead of GRUB. ZFSBootMenu is a small Linux kernel and initramfs packaged as a UEFI executable, so it runs before your installed OS starts. It imports the ZFS pools, scans them for bootable datasets, presents them in a menu, and boots the selected one by loading its kernel and initramfs directly from ZFS via kexec. This means:
No GRUB configuration
GRUB requires /boot/grub/grub.cfg to be regenerated every time a kernel
is installed. If that file is wrong, the system does not boot. ZFSBootMenu discovers
kernels automatically by reading the ZFS datasets — no configuration file to
break.
Multiple boot environments in one menu
ZFSBootMenu shows every bootable dataset under rpool/ROOT and lets you browse each one's snapshots.
You can boot into yesterday's snapshot, last week's snapshot, or a snapshot you
created five minutes before an upgrade. Switching is instant — select from the
menu and press Enter.
Kernel selection per environment
Each boot environment contains its own kernels. If the current environment has a broken kernel, you can boot into a previous environment that has the working kernel. You do not need a rescue disk or external media.
Snapshot rollback from the boot menu
ZFSBootMenu can roll back to a snapshot directly from its menu, before the OS even starts. If a bad upgrade left the system in a broken state, you can fix it from the boot menu without any additional tools.
The Boot Environment Workflow
Here is how boot environments work in practice. The key insight is that this is a proactive safety model — you create the safety net before you need it, not after things go wrong.
# Step 1: Create a boot environment before upgrading
kbe create pre-upgrade
# → Creates snapshot: rpool/ROOT/default@pre-upgrade
# Step 2: Run the upgrade
kupgrade
# → Automatic snapshot + package manager upgrade + ZFS module verification
# Step 3: Reboot and test
sudo reboot
# → System boots with the upgraded kernel and packages
# Step 4a: Everything works — clean up the old snapshot (optional)
kbe delete rpool/ROOT/default@pre-upgrade
# Step 4b: Something is broken — roll back immediately
kbe rollback rpool/ROOT/default@pre-upgrade
sudo reboot
# → System is back to exactly where it was before the upgrade
# → Your data in /home, /srv, /var/lib is untouched
Why This Is Better Than Filesystem Snapshots Alone
You could take a ZFS snapshot before upgrading without boot environments. But
a snapshot alone does not let you boot into the old state. If the upgrade
broke the kernel, you cannot even log in to run zfs rollback. Boot
environments solve this by making the old state directly bootable from the
ZFSBootMenu — you select it from a menu, no login required, no rescue disk
needed. The old kernel, the old libraries, the old init system — all intact,
all bootable.
5. The kupgrade Workflow — Step by Step
kupgrade is the kldload upgrade tool. It wraps your distro's package manager
with boot environment snapshots, ZFS module verification, and rollback instructions.
It is a single bash script that detects your distro, does the right thing, and logs
everything. Here is exactly what happens when you run it:
Step 1: Detect the Active Boot Environment
# kupgrade reads the active boot environment from three sources (in order):
# 1. /etc/kldload/boot-environment (written during install)
# 2. zfs list -Ho name / (ask ZFS what is mounted at /)
# 3. Fall back to rpool/ROOT/default
BE_DATASET=""
if [[ -f /etc/kldload/boot-environment ]]; then
  BE_DATASET="$(head -n 1 /etc/kldload/boot-environment)"
fi
if [[ -z "${BE_DATASET}" ]]; then
  # Ignore errors so a failed lookup falls through to the default
  BE_DATASET="$(zfs list -Ho name / 2>/dev/null || true)"
fi
if [[ -z "${BE_DATASET}" ]]; then
  BE_DATASET="rpool/ROOT/default"
fi
Step 2: Create Pre-Upgrade Snapshot
# Automatic timestamped snapshot of the root dataset
SNAP="${BE_DATASET}@pre-upgrade-$(date +%Y%m%d-%H%M%S)"
zfs snapshot "${SNAP}"
# Example: rpool/ROOT/default@pre-upgrade-20260408-143022
# This snapshot captures the ENTIRE OS state:
# - All installed packages and their versions
# - All kernel images in /boot
# - All configuration files in /etc
# - All binaries in /usr
# - All ZFS kernel modules in /lib/modules
The snapshot is instantaneous. ZFS snapshots are copy-on-write pointers — they consume zero additional disk space at creation time. Space is only consumed when blocks in the active dataset are modified (which happens during the upgrade). Even then, only the changed blocks are tracked, not full file copies.
Step 3: Run the Package Manager Upgrade
kupgrade detects which package manager is available and runs the appropriate
upgrade command. Each distro family gets the correct invocation:
# RPM distros (CentOS, RHEL, Rocky, Fedora)
dnf upgrade -y
# Debian/Ubuntu
DEBIAN_FRONTEND=noninteractive apt-get update
DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y
DEBIAN_FRONTEND=noninteractive apt-get autoremove -y
# Arch Linux (special handling for kernel/ZFS conflicts)
pacman -Syu --noconfirm
# If that fails due to kernel/ZFS version mismatch:
pacman -Syu --noconfirm --ignore linux,linux-headers,linux-firmware
# The kernel stays pinned until archzfs releases a matching zfs-linux
Arch Linux: The Kernel Hold-Back Strategy
Arch is a rolling release. The kernel updates constantly, but the
zfs-linux package from archzfs must be compiled against a specific kernel
version. If the Arch repos ship kernel 6.12.3 but archzfs only supports 6.12.1,
a full pacman -Syu will fail at dependency resolution, because zfs-linux
requires the exact kernel version it was built for. kupgrade detects this failure,
holds back the kernel, and upgrades everything else. The kernel upgrades on a later
kupgrade run, once archzfs catches up.
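The hold-back strategy amounts to a two-attempt fallback. The sketch below uses the same pacman invocations shown in Step 3. The function name and the status words it prints are hypothetical, and the real kupgrade may inspect the failure more carefully before retrying.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Arch fallback: try a full upgrade first, then
# retry with the kernel packages ignored. pacman's own output is sent to
# stderr so that stdout carries only the outcome.
arch_upgrade() {
  if pacman -Syu --noconfirm >&2; then
    echo "full"
  elif pacman -Syu --noconfirm --ignore linux,linux-headers,linux-firmware >&2; then
    echo "kernel-held"
  else
    echo "failed"
    return 1
  fi
}

# Usage:
#   outcome=$(arch_upgrade) || echo "upgrade failed, see output above"
```

A first-attempt failure can also have unrelated causes such as a network error. In that case the retry fails too and the function reports "failed" rather than silently pinning the kernel.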
Step 4: Verify ZFS Modules for All Installed Kernels
This is the critical step that prevents unbootable systems. After the package
manager finishes, kupgrade iterates over every installed kernel and verifies
that ZFS modules are present and correctly built:
# kupgrade's verification logic (simplified from the actual source):
# 1. CentOS/RHEL/Rocky with kmod-zfs: skip verification
#    kmod-zfs is kABI-tracked, so it works across kernel micro-updates
if command -v rpm >/dev/null 2>&1 && rpm -q kmod-zfs >/dev/null 2>&1; then
  echo "kmod-zfs detected: kABI-tracked, skipping DKMS verification"
  exit 0
fi
# 2. Arch with zfs-linux: skip verification
#    zfs-linux is a pre-built binary module, already matched to the kernel
if command -v pacman >/dev/null 2>&1 && pacman -Q zfs-linux >/dev/null 2>&1; then
  echo "zfs-linux detected: pre-built, skipping DKMS verification"
  exit 0
fi
# 3. DKMS distros (Debian, Ubuntu, Fedora): verify every kernel
for moddir in /lib/modules/*/; do
  kver=$(basename "$moddir")
  # Check if ZFS DKMS is installed for this kernel
  if dkms status -k "$kver" | grep -qi 'zfs.*installed'; then
    echo "ZFS OK: $kver"
    continue
  fi
  # Missing: try to rebuild, and report a failed build instead of ignoring it
  echo "ZFS missing for $kver, attempting rebuild..."
  if ! dkms autoinstall -k "$kver"; then
    echo "ZFS DKMS build failed for $kver" >&2
    continue
  fi
  # Re-sign with MOK key if Secure Boot is configured
  if [[ -x /etc/dkms/sign_helper.sh ]]; then
    # Sign all ZFS modules for this kernel
    find "/lib/modules/$kver" -name '*.ko*' | grep -i zfs | while IFS= read -r ko; do
      /etc/dkms/sign_helper.sh "$kver" "$ko"
    done
  fi
done
Step 5: Handle Failures
If DKMS fails to build ZFS for a new kernel, kupgrade does not panic. It
prints a clear warning with the exact rollback command:
┌─────────────────────────────────────────────────────────────┐
│ kldload WARNING: ZFS DKMS build failed for one or more │
│ kernels. The previous kernel is still bootable via │
│ ZFSBootMenu. Check: /var/log/kldload/upgrade.log │
│ │
│ To fix: dkms autoinstall -k <kver> │
│ To roll back: kbe rollback <snapshot> │
└─────────────────────────────────────────────────────────────┘
The key point: the old kernel still works. ZFSBootMenu lists every installed kernel. Even if the new kernel lacks ZFS modules, you can select the old kernel in ZFSBootMenu and boot it. You are never locked out of your system because of a failed module build.
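Before rebooting, the kernels at risk can be listed by comparing the installed kernels against dkms status output. This is a minimal sketch, assuming the usual dkms status line shape in which the kernel version sits between ", " separators and a healthy module ends in ": installed". The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: kernels_missing_zfs "DKMS_STATUS_TEXT"
# Reads kernel versions on stdin and prints those for which the given
# dkms status text has no zfs line in the "installed" state.
kernels_missing_zfs() {
  local status=$1 kver
  while IFS= read -r kver; do
    printf '%s\n' "$status" \
      | grep -i '^zfs' \
      | grep -F -- ", ${kver}, " \
      | grep -q ': installed' \
      || printf '%s\n' "$kver"
  done
}

# Usage:
#   ls /lib/modules | kernels_missing_zfs "$(dkms status)"
```

Matching on ", ${kver}, " rather than the bare version avoids treating 6.8.0-4 as a match for 6.8.0-45. Any kernel printed here should not be booted until dkms autoinstall -k succeeds for it.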
Step 6: Log Everything
# All output goes to /var/log/kldload/upgrade.log
# Example log output:
2026-04-08T14:30:22 [kupgrade] ===================================================================
2026-04-08T14:30:22 [kupgrade] kldload upgrade started
2026-04-08T14:30:22 [kupgrade] ===================================================================
2026-04-08T14:30:22 [kupgrade] Detected package manager: dnf
2026-04-08T14:30:22 [kupgrade] Boot environment dataset: rpool/ROOT/default
2026-04-08T14:30:22 [kupgrade] Creating pre-upgrade boot environment: rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:30:22 [kupgrade] Roll back with: kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:30:22 [kupgrade] Running: dnf upgrade
... (full dnf output) ...
2026-04-08T14:32:15 [kupgrade] kmod-zfs detected — ZFS modules are kABI-tracked, skipping DKMS verification
2026-04-08T14:32:15 [kupgrade] ===================================================================
2026-04-08T14:32:15 [kupgrade] kldload upgrade complete
2026-04-08T14:32:15 [kupgrade] Pre-upgrade snapshot: rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:32:15 [kupgrade] Log: /var/log/kldload/upgrade.log
2026-04-08T14:32:15 [kupgrade] ===================================================================
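The log format above is simple enough to reproduce in a few lines. The helper below is a sketch of how such a logger could be written. The KUPGRADE_LOG variable name is an assumption, and the real kupgrade script may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical logging helper that reproduces the format shown above.
# KUPGRADE_LOG is an assumed variable name; override it for testing.
KUPGRADE_LOG=${KUPGRADE_LOG:-/var/log/kldload/upgrade.log}

log() {
  local line
  line="$(date +%Y-%m-%dT%H:%M:%S) [kupgrade] $*"
  # Show the line on the terminal...
  printf '%s\n' "$line"
  # ...and append it to the log file, ignoring an unwritable location.
  { printf '%s\n' "$line" >>"$KUPGRADE_LOG"; } 2>/dev/null || true
}

# Usage:
#   log "Detected package manager: dnf"
```

Writing to both the terminal and the file means the rollback command printed at the start of a run is still available after the terminal scrollback is gone.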
6. The kbe Tool — Boot Environment Management
kbe (kldload Boot Environment) is a standalone tool for managing boot
environment snapshots. It is the manual counterpart to kupgrade's automatic
snapshots — use it when you want to create, inspect, activate, roll back,
or delete boot environments outside of the upgrade workflow.
kbe list
Show all boot environments, their creation dates, and which one is active:
$ sudo kbe list
Boot environments (rpool/ROOT):
---
NAME CREATION USED
rpool/ROOT Tue Apr 1 10:00 96K
rpool/ROOT/default Tue Apr 1 10:00 4.2G
Active bootfs:
rpool/ROOT/default
# With snapshots visible:
$ zfs list -t snapshot -r rpool/ROOT
NAME USED REFER
rpool/ROOT/default@install 128M 3.8G
rpool/ROOT/default@pre-upgrade-20260401-140000 256M 3.9G
rpool/ROOT/default@pre-upgrade-20260408-143022 12M 4.2G
kbe create
Create a named boot environment snapshot. Use this before any risky operation — not just upgrades, but also configuration changes, service deployments, or experimental package installations:
# Before a major configuration change
$ sudo kbe create before-nginx-rewrite
[kbe] Snapshot created: rpool/ROOT/default@before-nginx-rewrite
# Before testing a new package
$ sudo kbe create before-experimental-driver
[kbe] Snapshot created: rpool/ROOT/default@before-experimental-driver
# Before a distro major version upgrade
$ sudo kbe create debian12-baseline
[kbe] Snapshot created: rpool/ROOT/default@debian12-baseline
kbe activate
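Set which boot environment will be used on the next boot. This changes the
bootfs property on the pool, which ZFSBootMenu reads to determine the
default boot target. bootfs always names a dataset, never a snapshot, so the
value reported is the dataset that owns the snapshot: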
# Switch to a previous boot environment
$ sudo kbe activate rpool/ROOT/default@before-nginx-rewrite
[kbe] Next boot will use: rpool/ROOT/default
# Reboot to apply
$ sudo reboot
kbe rollback
Destructively roll back to a snapshot. This discards all changes made after the
snapshot was taken. kbe gives you a 5-second countdown to abort:
$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
[kbe] WARNING: Rolling back will DISCARD all changes made after
rpool/ROOT/default@pre-upgrade-20260408-143022.
[kbe] Press Ctrl+C within 5 seconds to abort...
[kbe] Rollback complete. Reboot to apply.
$ sudo reboot
# System boots with the exact OS state from before the upgrade
# /home, /srv, /var/lib — all untouched
Rollback Is Instant
A ZFS rollback does not copy data. It moves a pointer. The rollback itself takes less than a second regardless of how many gigabytes changed since the snapshot. The only time cost is the reboot. Compare this to restoring a backup, which requires copying gigabytes of data from a backup medium, and which may take hours on a large system.
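One caveat applies when the target is not the newest snapshot. Plain zfs rollback only accepts the most recent snapshot of a dataset. Rolling back further requires zfs rollback -r, which destroys every snapshot taken after the target. Whether kbe passes -r is not stated here, so the sketch below only lists what would be lost. The function name is hypothetical, and it expects snapshot names sorted oldest first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshots_after TARGET
# Reads snapshot names on stdin (oldest first) and prints the ones that
# come after TARGET, i.e. those a recursive rollback would destroy.
snapshots_after() {
  awk -v t="$1" 'found { print } $0 == t { found = 1 }'
}

# Usage:
#   zfs list -H -t snapshot -o name -s creation rpool/ROOT/default \
#     | snapshots_after rpool/ROOT/default@pre-upgrade-20260401-140000
```

Empty output means either that the target is already the newest snapshot or that the name was not found, so check the name against the zfs list output before acting on the result.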
kbe delete
Remove a boot environment snapshot you no longer need. This frees the blocks that only that snapshot still references, which are the old versions of data modified since it was taken:
# Clean up old snapshots
$ sudo kbe delete rpool/ROOT/default@pre-upgrade-20260401-140000
[kbe] Deleted: rpool/ROOT/default@pre-upgrade-20260401-140000
# Check how much space was reclaimed
$ zfs list -o name,used,avail rpool
NAME USED AVAIL
rpool 4.8G 95.2G
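Because kupgrade creates a new timestamped snapshot on every run, old ones accumulate. The sketch below shows one possible retention policy: keep the newest N automatic snapshots and list the rest for deletion. This policy is an illustration and not a built-in kbe feature. The function name is hypothetical, and manually named snapshots are deliberately left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: old_pre_upgrade_snapshots KEEP
# Reads snapshot names on stdin (oldest first) and prints every automatic
# pre-upgrade snapshot except the newest KEEP of them.
old_pre_upgrade_snapshots() {
  local keep=$1
  grep '@pre-upgrade-' | awk -v keep="$keep" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Usage: review the list first, then delete each entry with kbe.
#   zfs list -H -t snapshot -o name -s creation -r rpool/ROOT \
#     | old_pre_upgrade_snapshots 3 \
#     | while IFS= read -r snap; do sudo kbe delete "$snap"; done
```

Filtering on the @pre-upgrade- prefix keeps named snapshots such as @install or @debian12-baseline out of the deletion list.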
7. Comparison: Traditional vs kldload Upgrades
Here is the difference laid out side by side. This is not a theoretical comparison — it is what actually happens on real systems when things go wrong.
Traditional Linux Upgrade
Before upgrade: No preparation. No snapshot. No safety net.
During upgrade: Package manager modifies the live root filesystem in-place. Files are overwritten as they are downloaded. Interruption = partial state.
After upgrade: Reboot and hope. If the kernel boots, SSH in and check that services are running. If it does not boot, find a rescue disk.
On failure: Boot from USB/CD. Mount the root filesystem manually. Chroot in. Try to downgrade packages (may not be possible if old versions are no longer in the repo). Reinstall if desperate. 1–4 hours of downtime. Data loss possible if the root filesystem was corrupted.
kldload Upgrade (kupgrade)
Before upgrade: Automatic ZFS snapshot of the root dataset. Zero additional disk space. Instant.
During upgrade: Package manager runs normally. All changes go to new blocks (copy-on-write). The snapshot preserves the old state regardless of what happens.
After upgrade: ZFS modules verified for every kernel. Secure Boot signatures refreshed. Clear log of everything that happened.
On failure: kbe rollback <snapshot> && reboot. Or select the old
kernel from ZFSBootMenu. Downtime is the length of one reboot. Zero data loss. Your
databases, containers, and home directories were never touched.
Kernel Panic After Upgrade
Traditional: Machine does not boot. No remote access. Drive to the datacenter or open a support ticket. Boot from USB. Manually roll back kernel packages. Regenerate GRUB config. Pray again.
kldload: ZFSBootMenu shows the old kernel. Select it. Press Enter. System
boots. Run kbe rollback if you want to undo the entire upgrade, or keep the
old kernel and debug the new one at your leisure.
ZFS Module Missing After Kernel Update
Traditional: Kernel boots but cannot mount root filesystem (because root is
ZFS and zfs.ko is missing). Drops to initramfs emergency shell. Hope you
know how to manually load modules or switch kernels from there.
kldload: kupgrade detected the missing module before you rebooted and
warned you. If you rebooted anyway, ZFSBootMenu lets you select the old kernel that
has working ZFS modules.
The difference is architectural, not cosmetic. Traditional Linux upgrades are unsafe because the filesystem model is unsafe — one partition, live mutation, no rollback. kldload upgrades are safe because the storage model is safe — copy-on-write, dataset separation, instant snapshots, and a boot loader that can switch between any boot environment without touching the disk. Safety is not a feature bolted on top. It is the foundation.
8. Real-World Scenarios
Theory is useful. Practice is what matters at 2am when your pager goes off. Here are real scenarios and exactly how to handle them with kldload's upgrade tooling.
Scenario 1: Kernel Upgrade Breaks NVIDIA Driver
You run kupgrade on a workstation with an NVIDIA GPU. The kernel updates
from 6.8.0-41 to 6.8.0-45. The NVIDIA DKMS module fails to compile against the
new kernel because NVIDIA has not released a driver compatible with the new kernel's
internal API changes.
# kupgrade created the pre-upgrade snapshot before any package changed.
# The NVIDIA DKMS build then failed for kernel 6.8.0-45.
# Option A: Roll back the entire upgrade
$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
$ sudo reboot
# You are back on 6.8.0-41 with working NVIDIA and ZFS. Total time: 30 seconds.
# Option B: Keep the upgrade, boot the old kernel temporarily
# Select 6.8.0-41 from ZFSBootMenu at boot time.
# Wait for NVIDIA to release a compatible driver.
# Then: dkms autoinstall -k 6.8.0-45
# Switch to the new kernel when it works.
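Before choosing between the two options, it can help to confirm which DKMS modules actually built for the new kernel. The sketch below filters `dkms status` output for one module and one kernel. The function name `dkms_installed_for` is hypothetical, and the parsing assumes the two line formats DKMS commonly prints (`module, version, kernel, arch: installed` and `module/version, kernel, arch: installed`); check it against the output on your distro.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the `dkms status` text on stdin reports
# MODULE as installed for KERNEL. KERNEL must match the release string exactly.
dkms_installed_for() {
    local module="$1" kernel="$2"
    awk -v m="$module" -v k="$kernel" '
        # Accept both "nvidia, <ver>, ..." and "nvidia/<ver>, ..." prefixes.
        index($0, m ",") == 1 || index($0, m "/") == 1 {
            if (index($0, ", " k ",") > 0 && $0 ~ /: installed/) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   dkms status | dkms_installed_for nvidia 6.8.0-45 || echo "stay on the old kernel"
```

If the check fails for the new kernel, keep booting the old one from ZFSBootMenu until the module builds.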
Scenario 2: Major Distro Version Upgrade (Debian 12 to 13)
Major version upgrades are among the highest-risk operations in Linux system administration. Thousands of packages change at once. Library ABIs break. Init system configurations and service names change. This is where boot environments are most valuable.
# Step 1: Create a named boot environment
$ sudo kbe create debian12-final
[kbe] Snapshot created: rpool/ROOT/default@debian12-final
# Step 2: Point APT at trixie (also check any files under /etc/apt/sources.list.d/)
$ sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
# Step 3: Run the upgrade
$ sudo apt update
$ sudo apt full-upgrade -y
# ... (this takes a while — hundreds of packages)
# Step 4: Reboot and test
$ sudo reboot
# Step 5a: Everything works — you're on Debian 13. Celebrate.
$ cat /etc/debian_version
13.0
# Step 5b: Something critical is broken (mail server, database, custom app)
$ sudo kbe rollback rpool/ROOT/default@debian12-final
$ sudo reboot
# You are back on Debian 12 exactly as it was.
# Fix the issue, research the migration path, try again later.
# Your data never moved. Your databases never changed.
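The `sed` command in Step 2 only edits `/etc/apt/sources.list`. As a minimal sketch, the helper below lists every APT source file that still mentions the old codename, so leftover entries can be reviewed by hand before running the upgrade. The function name `apt_sources_mentioning` is hypothetical. It deliberately does not rewrite anything, because third-party repositories may not publish a trixie suite yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each APT source file under ROOT (default /etc/apt)
# that still mentions CODENAME.
apt_sources_mentioning() {
    local codename="$1" root="${2:-/etc/apt}"
    local f
    for f in "$root"/sources.list "$root"/sources.list.d/*; do
        [ -f "$f" ] || continue   # skips missing files and unmatched globs
        # -w matches whole words, so "bookworm" also hits "bookworm-updates".
        if grep -qw -- "$codename" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage:
#   apt_sources_mentioning bookworm
```

An empty result after Step 2 suggests every source now points at the new release.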
Scenario 3: Failed Package Dependency Chain
You install a package that pulls in 30 dependencies, one of which conflicts with something already installed. The package manager resolves the conflict by removing a critical package. Your monitoring stack stops working.
# If you used kbe create before installing:
$ sudo kbe rollback rpool/ROOT/default@before-package-install
$ sudo reboot
# Every package is back to its previous state. The conflicting install never happened.
# If you forgot to snapshot (kupgrade does this automatically, but manual installs don't):
# You can still use ZFS snapshots directly:
$ zfs list -t snapshot -r rpool/ROOT/default
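# Find the most recent snapshot before the bad install.
# Plain zfs rollback only accepts the newest snapshot; an older target needs -r,
# which destroys every snapshot taken after it.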
$ sudo zfs rollback rpool/ROOT/default@pre-upgrade-20260408-143022
$ sudo reboot
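Picking the right snapshot by eye is error-prone when there are many of them. The sketch below reads `zfs list` output in script mode (`-H` for tab-separated output without headers, `-p` for creation times as epoch seconds) and prints the newest snapshot created before a cutoff time. The function name `latest_snapshot_before` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>creation-epoch" lines on stdin and print
# the newest snapshot created strictly before CUTOFF (epoch seconds).
# Exits non-zero if no snapshot is old enough.
latest_snapshot_before() {
    local cutoff="$1"
    awk -F '\t' -v cutoff="$cutoff" '
        $2 < cutoff && $2 > best { best = $2; name = $1 }
        END { if (name == "") exit 1; print name }
    '
}

# Usage (GNU date assumed for -d):
#   zfs list -H -p -t snapshot -o name,creation -r rpool/ROOT/default \
#       | latest_snapshot_before "$(date -d '2026-04-08 15:00' +%s)"
```

The printed name can be passed to `kbe rollback` or `zfs rollback` as shown above.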
Scenario 4: Testing a Bleeding-Edge Kernel
You want to test kernel 6.13-rc1 on your development machine to evaluate new eBPF features. This is an unsigned release candidate with no ZFS support yet.
# Step 1: Snapshot current state
$ sudo kbe create stable-6.8-baseline
# Step 2: Install the RC kernel
$ sudo dpkg -i linux-image-6.13.0-rc1_amd64.deb
# Step 3: Boot into it (select from ZFSBootMenu)
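# Note: the RC kernel needs its own ZFS module to mount the root dataset.
# ZFSBootMenu imports the pool with its own kernel, then kexecs the kernel you
# select, whose initramfs must load zfs.ko. Try: sudo dkms autoinstall -k <version>.
# If no module builds, the boot stops in the initramfs shell: reboot and pick 6.8.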
# Step 4: Test your eBPF programs, benchmark performance, check hardware support
# Step 5: Done testing — remove the RC kernel
$ sudo dpkg -r linux-image-6.13.0-rc1
# Or just roll back:
$ sudo kbe rollback rpool/ROOT/default@stable-6.8-baseline
$ sudo reboot
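A quick way to see whether a given kernel has a ZFS module installed, before rebooting into it, is to look for `zfs.ko` under that kernel's module tree. This is a minimal sketch: the function name `kernel_has_zfs_module` is hypothetical, it uses GNU `find`, and it only shows that the file exists, not that it loads or carries a valid signature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a zfs.ko (optionally compressed) exists for
# kernel release REL under MODROOT (default /lib/modules).
kernel_has_zfs_module() {
    local rel="$1" modroot="${2:-/lib/modules}"
    [ -d "$modroot/$rel" ] || return 1
    # DKMS and kmod packages install under different subdirectories, so search
    # the whole tree; the pattern also matches zfs.ko.xz and zfs.ko.zst.
    [ -n "$(find "$modroot/$rel" -type f -name 'zfs.ko*' -print -quit)" ]
}

# Usage:
#   kernel_has_zfs_module "$(uname -r)" && echo "running kernel has a ZFS module"
```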
Scenario 5: Production Server Fleet Upgrade
You manage 50 servers running CentOS Stream 9 with kldload. A critical OpenSSL CVE requires an immediate upgrade across the fleet.
# On each server (or via Ansible/SSH loop):
$ sudo kupgrade
# kupgrade:
# 1. Snapshots rpool/ROOT/default@pre-upgrade-20260408-...
# 2. Runs dnf upgrade -y (installs patched openssl)
# 3. Verifies kmod-zfs is intact (kABI-tracked, so it normally passes without a rebuild)
# 4. Logs everything
# Verify:
$ rpm -q openssl
openssl-3.0.7-27.el9.x86_64 # patched version
# If any server has issues after reboot:
$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-...
$ sudo reboot
# Server is back to pre-patch state in seconds.
# Investigate, fix, re-upgrade when ready.
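The "SSH loop" mentioned above could look like the following sketch. It rests on assumptions the guide does not confirm: that `kupgrade` exits non-zero on failure, that key-based SSH and passwordless `sudo` are configured on every host, and that upgrading hosts one at a time is acceptable. The function name `fleet_upgrade` and the host names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run kupgrade on each host in turn and report the result.
fleet_upgrade() {
    local host failed=0
    for host in "$@"; do
        # -n keeps ssh from reading this script's stdin; BatchMode makes ssh
        # fail instead of prompting for a password.
        if ssh -n -o BatchMode=yes "$host" 'sudo kupgrade'; then
            echo "OK     $host"
        else
            echo "FAILED $host"
            failed=$((failed + 1))
        fi
    done
    # Non-zero exit status if any host failed, so a caller can halt the rollout.
    [ "$failed" -eq 0 ]
}

# Usage:
#   fleet_upgrade web01 web02 db01
```

Running the loop against a small canary group first, and rebooting hosts separately, keeps a bad update from reaching all 50 servers at once.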
The pattern is always the same: snapshot, upgrade, verify, keep or roll back. It does not matter whether you are patching OpenSSL on a production server, upgrading from Debian 12 to 13, testing an experimental kernel, or installing a risky package. The safety net is always there, it costs almost nothing to create (a snapshot only consumes space as the live data diverges from it), and it takes seconds to use. This is what infrastructure should feel like.
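The same pattern can be wrapped around any risky command. The sketch below is one possible shape, not part of kldload: the function name `guarded_change` is hypothetical, and the rollback hint assumes the root dataset is `rpool/ROOT/default`, as in the examples above. It never rolls back on its own; it only prints the command to run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: take a named snapshot with kbe, run a command, and
# print the rollback command if that command fails.
guarded_change() {
    local name="$1"
    shift
    # Refuse to run the change if the snapshot could not be created.
    sudo kbe create "$name" || return 1
    if "$@"; then
        echo "change succeeded; snapshot $name kept as a fallback"
    else
        echo "change failed; to undo: sudo kbe rollback rpool/ROOT/default@$name && sudo reboot" >&2
        return 1
    fi
}

# Usage:
#   guarded_change before-package-install sudo apt install some-package
```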
Further Reading
- kupgrade Tutorial — quick-start guide with common upgrade recipes
- Boot Environments Tutorial — hands-on guide to creating and managing boot environments
- ZFS Without GRUB — how ZFSBootMenu replaces GRUB and why it matters
- ZFS Masterclass — deep dive on pool design, snapshots, replication, and tuning
- ZFS Zero to Hero — foundational ZFS tutorial for beginners
- Snapshots Guide — comprehensive guide to ZFS snapshots and rollback
- Build ZFS from Scratch — how the ISO build pipeline compiles ZFS DKMS
- Secure Boot & the Boot Chain — MOK enrollment, module signing, and the UEFI trust chain
- Editions & Profiles — how Desktop, Server, and Core profiles affect the upgrade story
- ZFS Wiki: Boot Chain — technical reference for the ZFS boot process