
Upgrades & Boot Environments Masterclass

This guide covers one of the hardest problems in Linux system administration: how to upgrade an operating system without risking downtime, data loss, or an unbootable machine. It explains why traditional package manager upgrades are structurally unsafe, how ZFS dataset architecture separates OS state from user data, how kldload produces and manages ZFS kernel modules across every supported distro, and how boot environments give you instant rollback: the same capability that Solaris had in 2005, that FreeBSD has with bectl, and that most mainstream Linux distributions still do not provide out of the box.

What this page covers: the structural problems with traditional Linux upgrades, ZFS dataset architecture as an upgrade safety net, kernel module production (pre-built kmod packages, DKMS, and pre-built binaries) across all supported distros, ZFSBootMenu and boot environments, the kupgrade workflow step by step, the kbe boot environment management tool, side-by-side comparison of traditional vs kldload upgrades, and real-world rollback scenarios.

Prerequisites: the ZFS Zero to Hero tutorial and basic familiarity with your distro's package manager. If you have ever run apt upgrade or dnf update and held your breath, this page is for you.

1. The Problem With Traditional Linux Upgrades

Every Linux distribution ships with a package manager: apt on Debian/Ubuntu, dnf on Fedora/RHEL/CentOS/Rocky, pacman on Arch. When you run an upgrade, the package manager downloads new versions of installed packages and overwrites files on the live root filesystem. Libraries are replaced in-place. Kernel images are swapped. Configuration files are merged (or not). The entire operation mutates the running system, and there is no built-in mechanism that reliably undoes it: dnf history undo and manual downgrades are best-effort and depend on the old package versions still being available.

Think about what this means. You have a production server running a database, a web application, and a monitoring stack. You run dnf upgrade -y. The package manager begins replacing hundreds of files across /usr, /lib, /etc, and /boot. If the power goes out halfway through, you have a partially upgraded system with mismatched library versions. If the new kernel does not support your storage driver, you cannot boot. If a package conflict leaves the rpm or dpkg database in a broken state, you are manually resolving dependencies in single-user mode.

The "Reboot and Pray" Anti-Pattern

In traditional Linux, upgrading a kernel requires a reboot. You have no way to verify that the new kernel will boot successfully until you are already committed to it. If the new kernel lacks something critical (the ZFS module, an NVIDIA GPU driver, firmware for a network card), you discover this only after rebooting into it. Recovery requires booting from external media, chrooting into the broken system, and manually fixing packages. On a remote server with no physical access, this means an emergency support ticket and hours of downtime.

Traditional upgrade: pull the engine out of a moving car, install the new one, hope the car keeps driving.

Why Package Managers Cannot Solve This

Package managers are designed to install, remove, and update individual packages. They are not designed to be transactional systems. Consider the failure modes:

No Atomic Rollback

apt upgrade installs packages sequentially. If package 47 of 200 fails, packages 1–46 are already installed. There is no single command to undo the partial upgrade. apt logs what it changed in /var/log/apt/history.log, but it keeps no restorable copy of the previous state.

Live Filesystem Mutation

Packages overwrite files on the running root filesystem. A library update to libssl takes effect immediately for newly started processes, but running processes still hold the old version in memory. You can end up with two versions of the same library active simultaneously — and if the ABI changed, things break in subtle ways.

Held Packages and Dependency Hell

When dependencies cannot be satisfied, apt reports packages as "kept back", and administrators pin others with apt-mark hold to work around breakage. Over time, these accumulate. A system that has been running for two years may have dozens of held packages, each blocking something else. Untangling them is often harder than a fresh install, which means rebuilding the entire machine.

No OS/Data Separation

On a typical single-partition ext4 or xfs install, / is one filesystem. The kernel, the application data, the database, and the user's home directory are all on the same partition. Upgrading the OS means touching the same storage that holds your data, and recovering from a failed upgrade by restoring or reinstalling the root filesystem puts that data at risk too.

The fundamental problem: traditional Linux treats the operating system as a mutable pile of files. Every upgrade is a destructive in-place modification with no undo. Every reboot after an upgrade is a gamble. Every kernel update is a prayer. This is not how production infrastructure should work.

# The traditional upgrade workflow — hope as a strategy
sudo apt update
sudo apt upgrade -y          # modifies live root filesystem in-place
sudo reboot                  # pray the new kernel boots
# ... 30 seconds of silence ...
# Did it come back? Check SSH. No response? Drive to the datacenter.
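The live-mutation problem described above is observable on any Linux host: once a library file is replaced, processes that mapped the old copy list it as "(deleted)" in /proc/<pid>/maps. The following is a minimal sketch; stale_libs is an illustrative name, not part of kldload or any distro tool, and it assumes library paths contain no spaces.

```shell
# Print shared libraries that are still mapped by running processes but
# whose file on disk has been replaced or removed.
# Input: lines in /proc/<pid>/maps format on stdin.
# stale_libs is a hypothetical helper name used only for this example.
stale_libs() {
    awk '/\.so[^ ]* \(deleted\)$/ { print $6 }' | sort -u
}

# Usage on a live system (root is needed to read other users' processes):
#   cat /proc/[0-9]*/maps 2>/dev/null | stale_libs
```

Any path this prints belongs to a library that at least one running process still holds in its pre-upgrade version.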

2. Storage Separation — Why Dataset Architecture Matters

The single most important design decision kldload makes is not putting everything on one filesystem. When the installer creates a ZFS pool, it builds a hierarchy of datasets — each with its own mountpoint, its own snapshot timeline, its own tunable properties, and its own independent lifecycle. The operating system lives in one dataset. Your data lives in others. They share the same pool (and therefore the same disk space), but they are logically independent.

Here is the dataset layout that kldload creates during installation:

# kldload ZFS dataset layout — created by storage-zfs.sh during install
rpool                           # pool root (canmount=off)
├── ROOT                        # boot environment container (canmount=off)
│   └── default                 # / (the active boot environment)
├── root                        # /root (admin home directory)
├── home                        # /home (canmount=on)
│   └── <username>             # /home/<username> (per-user dataset)
├── srv                         # /srv (application data)
├── opt                         # /opt (optional packages)
├── usr                         # /usr container (canmount=off)
│   └── local                   # /usr/local
└── var                         # /var container (canmount=off)
    ├── cache                   # /var/cache
    ├── lib                     # /var/lib
    ├── log                     # /var/log
    ├── spool                   # /var/spool
    └── tmp                     # /var/tmp

Why Each Dataset Is Separate

Every ZFS dataset has its own snapshot namespace. When you snapshot rpool/ROOT/default, you capture the OS state without touching rpool/home, rpool/srv, or rpool/var/lib. Rolling back the OS does not roll back your databases, your container images, your home directories, or your application data. This is the key insight: the OS is disposable, the data is not.

Think of it as version control for your OS. You can git checkout a previous commit (boot environment) without losing your working directory (data datasets).
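To make the OS/data split concrete, the sketch below maps a path to the dataset that would hold it under the layout shown above, using the mountpoints from that listing. dataset_for_path is a hypothetical helper written for this page; it ignores the per-user datasets under rpool/home and any profile-specific datasets.

```shell
# Map a path to the dataset that holds it under the base layout.
# Containers with canmount=off (rpool/usr, rpool/var) hold no files, so
# anything not claimed by a child dataset falls through to the boot
# environment. dataset_for_path is an illustrative helper only.
dataset_for_path() {
    case "$1" in
        /root|/root/*)             echo "rpool/root" ;;
        /home|/home/*)             echo "rpool/home" ;;
        /srv|/srv/*)               echo "rpool/srv" ;;
        /opt|/opt/*)               echo "rpool/opt" ;;
        /usr/local|/usr/local/*)   echo "rpool/usr/local" ;;
        /var/cache|/var/cache/*)   echo "rpool/var/cache" ;;
        /var/lib|/var/lib/*)       echo "rpool/var/lib" ;;
        /var/log|/var/log/*)       echo "rpool/var/log" ;;
        /var/spool|/var/spool/*)   echo "rpool/var/spool" ;;
        /var/tmp|/var/tmp/*)       echo "rpool/var/tmp" ;;
        *)                         echo "rpool/ROOT/default" ;;
    esac
}

dataset_for_path /usr/bin/ls                # rolled back with the OS
dataset_for_path /var/lib/postgresql/data   # survives an OS rollback
```

On an installed system, findmnt -n -o SOURCE --target <path> answers the same question from the live mount table.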

Dataset Properties Per Workload

Because each dataset is independent, you can tune storage properties per workload. A database needs small record sizes for random I/O. A media server needs large record sizes for sequential streaming. A container runtime needs different compression settings than a log directory. ZFS lets you set these per dataset, and they take effect without reformatting anything:

# Workload-specific datasets created by kldload profiles
# (KVM and Server profiles add these on top of the base layout)

# Kubernetes etcd: 8K recordsize for small key-value writes
zfs create -o recordsize=8K -o compression=lz4 \
  -o primarycache=metadata rpool/var/lib/etcd

# Container storage: 64K recordsize, lz4 compression
# (-p creates the missing parent datasets under rpool/var/lib)
zfs create -p -o recordsize=64K -o compression=lz4 \
  rpool/var/lib/containers/storage/zfs

# KVM virtual machine disks: 64K recordsize, metadata caching only
zfs create -o compression=lz4 -o recordsize=64K \
  -o primarycache=metadata rpool/vms

# Application data: 1M recordsize for large file streaming
# (rpool/srv already exists in the base layout, so set properties on it)
zfs set compression=zstd recordsize=1M rpool/srv

# AI model storage: 1M recordsize, zstd compression for large blobs
zfs create -o compression=zstd -o recordsize=1M rpool/srv/ollama

8K recordsize (etcd, PostgreSQL)

Databases perform small random reads and writes. An 8K recordsize matches PostgreSQL's 8K page size, which avoids read amplification: ZFS reads one database page per I/O instead of pulling in a 128K record whose remaining data will never be used. etcd's small key-value writes benefit for the same reason.

64K recordsize (VMs, containers)

Virtual machine disk images and container layers have mixed I/O patterns. 64K is a compromise that works well for both random and sequential access. It also matches the default 64K cluster size of qcow2 disk images.

128K recordsize (general purpose)

The ZFS default. Good for home directories, configuration files, source code, and mixed workloads. Most files are read sequentially and written in full, so larger records mean fewer I/O operations and better compression ratios.

1M recordsize (media, AI models)

Large sequential files — video, audio, machine learning models, ISO images — benefit from maximum record size. A 4GB file stored with 1M records is 4,096 blocks instead of 32,768 blocks at 128K. Fewer blocks means less metadata overhead and faster sequential throughput.
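The block counts quoted above follow from a ceiling division of file size by recordsize. A small sketch of that arithmetic; records_for is an illustrative helper, and the result ignores compression and metadata blocks.

```shell
# Number of records needed to store a file: ceil(size / recordsize).
# Both arguments are in bytes. records_for is a hypothetical helper.
records_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

GIB=$((1024 * 1024 * 1024))
records_for $((4 * GIB)) $((1024 * 1024))   # 1M records: prints 4096
records_for $((4 * GIB)) $((128 * 1024))    # 128K records: prints 32768
```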

This is what Solaris got right in 2005. Sun Microsystems designed ZFS with the understanding that an operating system and its data have different lifecycles. The OS should be snapshottable, rollbackable, and disposable. The data should persist independently. Twenty years later, a default install of most mainstream Linux distributions still puts everything on one ext4 or xfs partition. kldload brings the Solaris dataset model to every supported distro (CentOS, Debian, Ubuntu, Fedora, RHEL, Rocky, Arch) because this is the correct architecture for any system that you intend to upgrade more than once.

3. How kldload Produces Kernel Modules — And Why It Matters

ZFS is not in the Linux kernel, and it is unlikely ever to be. The reason is licensing: ZFS is released under the CDDL (Common Development and Distribution License), which is incompatible with the GPL (GNU General Public License) that covers the Linux kernel. You cannot statically link CDDL code into a GPL binary. This means ZFS must be built and loaded as an out-of-tree kernel module, zfs.ko, and that module must be built for the kernel it will be loaded into.

This creates a hard dependency chain: every kernel upgrade requires a matching ZFS module. If the kernel updates and ZFS does not, the system cannot mount its root filesystem. On a ZFS-on-root system, this means the machine does not boot. This is the single most common cause of ZFS breakage on Linux, and every distribution handles it differently.

The Four Approaches to ZFS Module Delivery

Pre-built kmod packages (CentOS/RHEL/Rocky)

The OpenZFS project (formerly ZFS on Linux) ships kmod-zfs RPMs that are compiled against specific kernel ABIs. Red Hat enterprise kernels maintain a stable kABI (kernel ABI): the internal function signatures that modules depend on do not change within a minor release. This means one kmod-zfs binary works across dozens of kernel micro-updates. When the kABI changes (typically with a new minor release), a new kmod-zfs RPM is released. This is the most reliable method but depends on the ZFS project tracking Red Hat's release schedule.

DKMS (Debian/Ubuntu/Fedora)

DKMS (Dynamic Kernel Module Support) is a framework that automatically recompiles kernel modules whenever a new kernel is installed. The zfs-dkms package ships ZFS source code and a dkms.conf that tells DKMS how to build it. When apt install linux-image-6.x runs, a DKMS hook triggers dkms autoinstall, which compiles zfs.ko for the new kernel. This works well when it works, but it requires a working compiler toolchain, matching kernel headers, and enough disk space for the build. If any of those are missing, the build fails, often without stopping the package transaction, and the system will not boot the new kernel.

Pre-built binary (Arch Linux)

The archzfs project maintains zfs-linux packages in the archzfs repository, pre-compiled against specific Arch kernel versions. (The AUR, or Arch User Repository, carries only build recipes, not binaries.) Because Arch is a rolling release, the kernel updates constantly. The zfs-linux maintainers must release a new package every time the kernel updates. If the kernel updates before zfs-linux catches up, users must hold the kernel back. kupgrade handles this automatically by detecting the conflict and upgrading everything except the kernel.

Manual compilation

Download the OpenZFS source, run ./configure && make && make install, hope you got the right version for your kernel. This is what people did before DKMS existed. It requires expert knowledge, breaks on every kernel update, and provides no automatic rebuild mechanism. Nobody should do this on a production system.
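The DKMS prerequisites listed above (toolchain, matching headers, disk space) can be checked before a kernel upgrade rather than discovered after it. The following is a sketch under stated assumptions: dkms_preflight is a hypothetical helper, the 512 MiB threshold is an arbitrary illustration, and free space is measured on the root passed in rather than on the DKMS build directory itself.

```shell
# Report what a DKMS build for kernel $2 would be missing under root $1.
# Prints one line per problem and returns non-zero if any were found.
# dkms_preflight and the 512 MiB threshold are illustrative assumptions.
dkms_preflight() {
    root=$1 kver=$2 problems=0
    for tool in gcc make; do
        command -v "$tool" >/dev/null 2>&1 \
            || { echo "missing tool: $tool"; problems=1; }
    done
    # Kernel header packages provide the tree this symlink points at.
    [ -d "$root/lib/modules/$kver/build" ] \
        || { echo "missing headers for $kver"; problems=1; }
    # df -Pk reports available space in KiB in column 4 of line 2.
    avail_kb=$(df -Pk "$root/" | awk 'NR == 2 { print $4 }')
    [ "${avail_kb:-0}" -ge 524288 ] \
        || { echo "low disk space: ${avail_kb:-0} KiB"; problems=1; }
    return "$problems"
}

# Usage on an installed system:
#   dkms_preflight "" "$(uname -r)" || echo "fix the above before upgrading"
```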

How kldload Handles This

kldload takes a two-stage approach to ZFS module production that ensures the module is always correctly compiled for the target system:

Stage 1: ISO build time. When build-iso.sh assembles the live ISO, it installs zfs-dkms into the live environment's rootfs and compiles ZFS against the live kernel. This is done inside the builder container with full compiler toolchain access:

# From build-iso.sh — ZFS DKMS build inside the live rootfs
# The builder container has: gcc, make, autoconf, automake, libtool, kernel-devel

# Get the ZFS DKMS version
ZFS_VER=$(chroot "$ROOTFS" rpm -q --qf '%{VERSION}' zfs-dkms)

# Clean slate
chroot "$ROOTFS" dkms remove -m zfs -v "$ZFS_VER" --all 2>/dev/null || true
chroot "$ROOTFS" dkms add -m zfs -v "$ZFS_VER"

# Build against the exact kernel installed in the rootfs
chroot "$ROOTFS" env ARCH=x86_64 dkms build -m zfs -v "$ZFS_VER" -k "$KVER"

# Install the compiled module
chroot "$ROOTFS" dkms install -m zfs -v "$ZFS_VER" -k "$KVER" --force

Stage 2: Target install time. When the user installs a distro (say, Debian 13), the installer bootstraps that distro with debootstrap and installs zfs-dkms and the target kernel's headers. DKMS then compiles ZFS against the target distro's kernel, not the live ISO's kernel. This is critical: the target system's kernel may be completely different from the live environment's kernel. CentOS Stream 9 runs kernel 5.14. Debian 13 runs kernel 6.12. Ubuntu 24.04 runs kernel 6.8. Fedora 41 shipped with kernel 6.11. Each needs its own ZFS module.

# During distro installation — DKMS rebuilds for the TARGET kernel
# (this happens automatically via package manager hooks)

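# Debian/Ubuntu: apt triggers dkms autoinstall
# TARGET_KVER (placeholder name) is the kernel version installed in the target;
# inside the installer chroot, $(uname -r) would report the live ISO's kernel.
apt-get install -y zfs-dkms "linux-headers-${TARGET_KVER}"
# DKMS hook fires → compiles zfs.ko for the installed kernel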

# Fedora: dnf triggers dkms autoinstall
dnf install -y zfs-dkms kernel-devel
# DKMS hook fires → compiles zfs.ko for the installed kernel

# CentOS/RHEL/Rocky: uses pre-built kmod instead of DKMS
dnf install -y kmod-zfs
# No compilation needed — kABI-tracked binary module

# Arch: uses pre-built zfs-linux from archzfs repo
pacman -S zfs-linux zfs-utils
# No compilation needed — pre-built for the exact kernel version
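The per-distro choice above reduces to a lookup on the ID field of /etc/os-release. A minimal sketch of that lookup follows; zfs_delivery_for is an illustrative helper, and the mapping is copied from the list above rather than from kupgrade's source.

```shell
# Map an /etc/os-release ID to the ZFS module package this guide
# describes for that distro. zfs_delivery_for is a hypothetical helper.
zfs_delivery_for() {
    case "$1" in
        centos|rhel|rocky)      echo "kmod-zfs" ;;
        debian|ubuntu|fedora)   echo "zfs-dkms" ;;
        arch)                   echo "zfs-linux" ;;
        *)                      echo "unknown"; return 1 ;;
    esac
}

# Usage on a live system: source os-release in a subshell so the
# variables it defines do not leak into the caller.
#   zfs_delivery_for "$(. /etc/os-release && echo "$ID")"
```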

Module Signing and Secure Boot

On systems with Secure Boot enabled, kernel modules must be cryptographically signed with a key enrolled in the machine's MOK (Machine Owner Key) database. A freshly built DKMS module is not signed with a key the machine already trusts, so the kernel refuses to load it. On a ZFS-on-root system, that means the system will not boot.

kldload handles this with a MOK signing chain: during install, a signing key pair is generated, the ZFS modules are signed with the private key, and the public key is queued for enrollment in the MOK database via mokutil. On the next reboot, the user confirms the enrollment at the MOK Manager screen that shim presents before the OS starts. From that point forward, DKMS modules signed with that key are trusted by the kernel.

# MOK signing flow (kupgrade re-signs after DKMS rebuild)
# 1. Check if MOK signing is configured
if [[ -x /etc/dkms/sign_helper.sh && -f /var/lib/dkms/mok.key ]]; then
    # 2. Find all ZFS kernel objects for the new kernel
    # 3. Sign each module with the MOK key
    find "/lib/modules/${KVER}" \
        \( -name '*.ko' -o -name '*.ko.gz' -o -name '*.ko.zst' \) \
        | grep -i zfs \
        | while IFS= read -r MODULE_PATH; do
            /etc/dkms/sign_helper.sh "${KVER}" "${MODULE_PATH}"
        done
fi
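Whether a module actually carries a signature can be checked without loading it: the kernel's signing tool appends a fixed marker string to the end of every signed module file. The sketch below tests for that marker. module_is_signed is an illustrative helper; it works only on uncompressed .ko files and says nothing about whether the signing key is enrolled.

```shell
# Return 0 if an uncompressed .ko file ends with the kernel's module
# signature marker. Compressed modules (.ko.gz, .ko.zst, .ko.xz) must be
# decompressed first. module_is_signed is a hypothetical helper.
module_is_signed() {
    tail -c 28 "$1" 2>/dev/null | grep -qF '~Module signature appended~'
}

# Usage (the path is a placeholder for a real module file):
#   module_is_signed /path/to/zfs.ko && echo signed || echo unsigned
```

For a module that is already installed, modinfo -F signer zfs reports who signed it.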

Why this matters: on a traditional Linux system, a kernel upgrade can silently break ZFS, leaving you with an unbootable machine. kldload's kupgrade tool detects which module delivery method your distro uses (kmod-zfs, zfs-dkms, or zfs-linux), verifies that modules exist for every installed kernel, rebuilds via DKMS if needed, re-signs for Secure Boot if configured, and warns you before you reboot if anything failed. And even if everything goes wrong, the old kernel is still bootable via ZFSBootMenu — you are never stuck.

4. Boot Environments — Time Travel for Your OS

A boot environment is a snapshot of your root dataset that you can boot into. It captures the entire state of your operating system — every binary in /usr, every config file in /etc, every kernel in /boot — at a single point in time. You can have multiple boot environments simultaneously, switch between them at boot time, and delete ones you no longer need. This is the same concept that Solaris had with beadm, that FreeBSD has with bectl, and that macOS Time Machine wishes it could provide at the OS level.

ZFSBootMenu

kldload uses ZFSBootMenu instead of GRUB. ZFSBootMenu is a small Linux kernel and initramfs packaged as a UEFI executable, so it runs before your OS kernel loads. It imports ZFS pools, scans them for bootable datasets, presents them in a menu, and boots the selected one by loading its kernel and initramfs directly from ZFS (via kexec). This means:

No GRUB configuration

GRUB requires /boot/grub/grub.cfg to be regenerated every time a kernel is installed. If that file is wrong, the system does not boot. ZFSBootMenu discovers kernels automatically by reading the ZFS datasets — no configuration file to break.

Multiple boot environments in one menu

ZFSBootMenu shows every bootable dataset and every snapshot of rpool/ROOT. You can boot into yesterday's snapshot, last week's snapshot, or a snapshot you created five minutes before an upgrade. Switching is instant — select from the menu and press Enter.

Kernel selection per environment

Each boot environment contains its own kernels. If the current environment has a broken kernel, you can boot into a previous environment that has the working kernel. You do not need a rescue disk or external media.

Snapshot rollback from the boot menu

ZFSBootMenu can roll back to a snapshot directly from its menu, before the OS even starts. If a bad upgrade left the system in a broken state, you can fix it from the boot menu without any additional tools.

The Boot Environment Workflow

Here is how boot environments work in practice. The key insight is that this is a proactive safety model — you create the safety net before you need it, not after things go wrong.

# Step 1: Create a boot environment before upgrading
kbe create pre-upgrade
# → Creates snapshot: rpool/ROOT/default@pre-upgrade

# Step 2: Run the upgrade
kupgrade
# → Automatic snapshot + package manager upgrade + ZFS module verification

# Step 3: Reboot and test
sudo reboot
# → System boots with the upgraded kernel and packages

# Step 4a: Everything works — clean up the old snapshot (optional)
kbe delete rpool/ROOT/default@pre-upgrade

# Step 4b: Something is broken — roll back immediately
kbe rollback rpool/ROOT/default@pre-upgrade
sudo reboot
# → System is back to exactly where it was before the upgrade
# → Your data in /home, /srv, /var/lib is untouched
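The same snapshot-first pattern applies to any risky command, not only upgrades. Below is a hedged sketch of a generic wrapper: with_snapshot is a hypothetical function, not a kldload tool, and ZFS_CMD is an injectable variable introduced here so the flow can be dry-run without touching a pool.

```shell
# Take a snapshot, run a command, and print a rollback hint if it fails.
# ZFS_CMD defaults to the real zfs binary; set ZFS_CMD="echo zfs" for a
# dry run. with_snapshot is an illustrative wrapper only.
ZFS_CMD=${ZFS_CMD:-zfs}

with_snapshot() {
    snap=$1; shift
    $ZFS_CMD snapshot "$snap" || return 1
    if "$@"; then
        echo "OK: keep or delete $snap"
    else
        status=$?
        echo "FAILED ($status): roll back with: kbe rollback $snap"
        return "$status"
    fi
}

# Usage (dry run):
#   ZFS_CMD="echo zfs" with_snapshot rpool/ROOT/default@before-change true
```

The snapshot is taken before the command runs, so the rollback hint is valid no matter how far the command got.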

Why This Is Better Than Filesystem Snapshots Alone

You could take a ZFS snapshot before upgrading without boot environments. But a snapshot alone does not let you boot into the old state. If the upgrade broke the kernel, you cannot even log in to run zfs rollback. Boot environments solve this by making the old state directly bootable from the ZFSBootMenu — you select it from a menu, no login required, no rescue disk needed. The old kernel, the old libraries, the old init system — all intact, all bootable.

Snapshots are save files. Boot environments are save files that you can load from the title screen without starting the game first.

5. The kupgrade Workflow — Step by Step

kupgrade is the kldload upgrade tool. It wraps your distro's package manager with boot environment snapshots, ZFS module verification, and rollback instructions. It is a single bash script that detects your distro, does the right thing, and logs everything. Here is exactly what happens when you run it:

Step 1: Detect the Active Boot Environment

# kupgrade reads the active boot environment from three sources (in order):
# 1. /etc/kldload/boot-environment (written during install)
# 2. zfs list -Ho name / (ask ZFS what is mounted at /)
# 3. Fall back to rpool/ROOT/default

BE_DATASET=""
if [[ -f /etc/kldload/boot-environment ]]; then
    BE_DATASET="$(head -n 1 /etc/kldload/boot-environment)"
fi
if [[ -z "${BE_DATASET}" ]]; then
    BE_DATASET="$(zfs list -Ho name / 2>/dev/null || true)"
fi
if [[ -z "${BE_DATASET}" ]]; then
    BE_DATASET="rpool/ROOT/default"
fi

Step 2: Create Pre-Upgrade Snapshot

# Automatic timestamped snapshot of the root dataset
SNAP="${BE_DATASET}@pre-upgrade-$(date +%Y%m%d-%H%M%S)"
zfs snapshot "${SNAP}"
# Example: rpool/ROOT/default@pre-upgrade-20260408-143022

# This snapshot captures the ENTIRE OS state:
# - All installed packages and their versions
# - All kernel images in /boot
# - All configuration files in /etc
# - All binaries in /usr
# - All ZFS kernel modules in /lib/modules

The snapshot is instantaneous. ZFS snapshots are copy-on-write references: they consume almost no additional disk space at creation time. Space is consumed only as blocks in the active dataset are modified (which happens during the upgrade), because the snapshot keeps the old versions. Even then, only the changed blocks are retained, not full file copies.

Step 3: Run the Package Manager Upgrade

kupgrade detects which package manager is available and runs the appropriate upgrade command. Each distro family gets the correct invocation:

# RPM distros (CentOS, RHEL, Rocky, Fedora)
dnf upgrade -y

# Debian/Ubuntu
DEBIAN_FRONTEND=noninteractive apt-get update
DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y
DEBIAN_FRONTEND=noninteractive apt-get autoremove -y

# Arch Linux (special handling for kernel/ZFS conflicts)
pacman -Syu --noconfirm
# If that fails due to kernel/ZFS version mismatch:
pacman -Syu --noconfirm --ignore linux,linux-headers,linux-firmware
# The kernel stays pinned until archzfs releases a matching zfs-linux

Arch Linux: The Kernel Hold-Back Strategy

Arch is a rolling release. The kernel updates constantly, but the zfs-linux package from archzfs is built against one specific kernel version and declares that version as a dependency. If the Arch repos ship kernel 6.12.3 but archzfs only supports 6.12.1, a full pacman -Syu fails at dependency resolution rather than installing a kernel that the ZFS module cannot load on. kupgrade detects this failure, holds back the kernel, and upgrades everything else. The kernel upgrades on a later kupgrade run, once archzfs has caught up.

kupgrade on Arch: upgrade the house but keep the foundation stable until the new foundation is ready.
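The hold-back strategy is a try-then-fallback control flow. The following sketch shows one way to express it, assuming the pacman invocations quoted above; arch_upgrade is a hypothetical function, not kupgrade's actual code, and PACMAN is an injectable variable introduced here so the logic can be exercised without pacman. A real tool would also confirm that the first failure was the kernel/ZFS conflict before retrying.

```shell
# Try a full upgrade; if it fails, retry while holding the kernel back.
# PACMAN can be overridden to exercise the control flow without pacman.
# arch_upgrade is an illustrative sketch of the fallback described above.
PACMAN=${PACMAN:-pacman}

arch_upgrade() {
    if $PACMAN -Syu --noconfirm; then
        echo "full upgrade applied"
        return 0
    fi
    echo "full upgrade failed; retrying with the kernel held back"
    $PACMAN -Syu --noconfirm --ignore linux,linux-headers,linux-firmware
}
```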

Step 4: Verify ZFS Modules for All Installed Kernels

This is the critical step that prevents unbootable systems. After the package manager finishes, kupgrade iterates over every installed kernel and verifies that ZFS modules are present and correctly built:

# kupgrade's verification logic (simplified from the actual source):

# 1. CentOS/RHEL/Rocky with kmod-zfs: skip verification
#    kmod-zfs is kABI-tracked — it works across kernel micro-updates
if command -v rpm >/dev/null 2>&1 && rpm -q kmod-zfs >/dev/null 2>&1; then
    echo "kmod-zfs detected — kABI-tracked, skipping DKMS verification"
    exit 0
fi

# 2. Arch with zfs-linux: skip verification
#    zfs-linux is a pre-built binary module, already matched to the kernel
if command -v pacman >/dev/null 2>&1 && pacman -Q zfs-linux >/dev/null 2>&1; then
    echo "zfs-linux detected — pre-built, skipping DKMS verification"
    exit 0
fi

# 3. DKMS distros (Debian, Ubuntu, Fedora): verify every kernel
for kver in /lib/modules/*/; do
    kver=$(basename "$kver")

    # Check if ZFS DKMS is installed for this kernel
    if dkms status -k "$kver" | grep -qi 'zfs.*installed'; then
        echo "ZFS OK: $kver"
        continue
    fi

    # Missing — try to rebuild
    echo "ZFS missing for $kver — attempting rebuild..."
    if ! dkms autoinstall -k "$kver"; then
        echo "ZFS DKMS build FAILED for $kver" >&2
        continue
    fi

    # Re-sign with MOK key if Secure Boot is configured
    if [[ -x /etc/dkms/sign_helper.sh ]]; then
        # Sign all ZFS modules for this kernel
        find "/lib/modules/$kver" -name '*.ko*' | grep -i zfs | while IFS= read -r ko; do
            /etc/dkms/sign_helper.sh "$kver" "$ko"
        done
    fi
done
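The check in the loop above can also be run by hand against captured dkms status output, which is useful when reviewing a log after the fact. A sketch follows; kernels_missing_zfs is a hypothetical helper, and the two line layouts it accepts are the ones we are aware of from different DKMS releases.

```shell
# Given `dkms status` output on stdin and kernel versions as arguments,
# print each kernel that has no installed zfs module.
# Accepts both the "zfs/2.2.4, <kver>, ..." layout and the older
# "zfs, 2.1.5, <kver>, ..." layout. kernels_missing_zfs is illustrative.
kernels_missing_zfs() {
    status=$(cat)
    for kver in "$@"; do
        printf '%s\n' "$status" \
            | grep -i '^zfs[/,]' \
            | grep -F ", $kver," \
            | grep -q 'installed' \
            || echo "$kver"
    done
}

# Usage on a live system:
#   dkms status | kernels_missing_zfs $(ls /lib/modules)
```

Empty output means every listed kernel has an installed ZFS module.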

Step 5: Handle Failures

If DKMS fails to build ZFS for a new kernel, kupgrade does not panic. It prints a clear warning with the exact rollback command:

  ┌─────────────────────────────────────────────────────────────┐
  │  kldload WARNING: ZFS DKMS build failed for one or more    │
  │  kernels. The previous kernel is still bootable via         │
  │  ZFSBootMenu. Check: /var/log/kldload/upgrade.log          │
  │                                                             │
  │  To fix:  dkms autoinstall -k <kver>                       │
  │  To roll back: kbe rollback <snapshot>                     │
  └─────────────────────────────────────────────────────────────┘

The key point: the old kernel still works. ZFSBootMenu lists every kernel installed in the boot environment, so even if the new kernel lacks ZFS modules, you can select the old one at boot. You are never locked out of your system because of a failed module build.

Step 6: Log Everything

# All output goes to /var/log/kldload/upgrade.log
# Example log output:
2026-04-08T14:30:22 [kupgrade] ===================================================================
2026-04-08T14:30:22 [kupgrade] kldload upgrade started
2026-04-08T14:30:22 [kupgrade] ===================================================================
2026-04-08T14:30:22 [kupgrade] Detected package manager: dnf
2026-04-08T14:30:22 [kupgrade] Boot environment dataset: rpool/ROOT/default
2026-04-08T14:30:22 [kupgrade] Creating pre-upgrade boot environment: rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:30:22 [kupgrade] Roll back with:  kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:30:22 [kupgrade] Running: dnf upgrade
... (full dnf output) ...
2026-04-08T14:32:15 [kupgrade] kmod-zfs detected — ZFS modules are kABI-tracked, skipping DKMS verification
2026-04-08T14:32:15 [kupgrade] ===================================================================
2026-04-08T14:32:15 [kupgrade] kldload upgrade complete
2026-04-08T14:32:15 [kupgrade] Pre-upgrade snapshot: rpool/ROOT/default@pre-upgrade-20260408-143022
2026-04-08T14:32:15 [kupgrade] Log: /var/log/kldload/upgrade.log
2026-04-08T14:32:15 [kupgrade] ===================================================================
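A log line in the layout shown above takes only a few lines of shell to produce. This is a sketch rather than kupgrade's actual implementation: log_msg is a hypothetical name, and LOG_FILE is made overridable so the function can be tried without write access to /var/log.

```shell
# Write a timestamped line to stdout and append it to the log file,
# matching the "<ISO time> [kupgrade] <message>" layout of the sample.
# log_msg is an illustrative helper; LOG_FILE can be overridden.
LOG_FILE=${LOG_FILE:-/var/log/kldload/upgrade.log}

log_msg() {
    printf '%s [kupgrade] %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$*" \
        | tee -a "$LOG_FILE"
}

# Usage:
#   LOG_FILE=/tmp/upgrade-test.log
#   log_msg "kldload upgrade started"
```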

6. The kbe Tool — Boot Environment Management

kbe (kldload Boot Environment) is a standalone tool for managing boot environment snapshots. It is the manual counterpart to kupgrade's automatic snapshots — use it when you want to create, inspect, activate, roll back, or delete boot environments outside of the upgrade workflow.

kbe list

Show all boot environments, their creation dates, and which one is active:

$ sudo kbe list
Boot environments (rpool/ROOT):
---
NAME                                      CREATION              USED
rpool/ROOT                                Tue Apr  1 10:00      96K
rpool/ROOT/default                        Tue Apr  1 10:00      4.2G

Active bootfs:
rpool/ROOT/default

# With snapshots visible:
$ zfs list -t snapshot -r rpool/ROOT
NAME                                                USED  REFER
rpool/ROOT/default@install                         128M  3.8G
rpool/ROOT/default@pre-upgrade-20260401-140000     256M  3.9G
rpool/ROOT/default@pre-upgrade-20260408-143022      12M  4.2G

kbe create

Create a named boot environment snapshot. Use this before any risky operation — not just upgrades, but also configuration changes, service deployments, or experimental package installations:

# Before a major configuration change
$ sudo kbe create before-nginx-rewrite
[kbe] Snapshot created: rpool/ROOT/default@before-nginx-rewrite

# Before testing a new package
$ sudo kbe create before-experimental-driver
[kbe] Snapshot created: rpool/ROOT/default@before-experimental-driver

# Before a distro major version upgrade
$ sudo kbe create debian12-baseline
[kbe] Snapshot created: rpool/ROOT/default@debian12-baseline

kbe activate

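Set which boot environment will be used on the next boot. This changes the bootfs property on the pool, which ZFSBootMenu reads to determine the default boot target. bootfs must name a dataset, not a snapshot, so when given a snapshot kbe reports its parent dataset, as in the output below; to return to the snapshot's contents, use kbe rollback or boot the snapshot from the ZFSBootMenu menu: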

# Switch to a previous boot environment
$ sudo kbe activate rpool/ROOT/default@before-nginx-rewrite
[kbe] Next boot will use: rpool/ROOT/default

# Reboot to apply
$ sudo reboot
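The example output above reports the parent dataset rather than the snapshot that was passed in. The bootfs pool property names a dataset, so any tool that accepts a snapshot argument has to strip the @snapshot suffix first. A one-line sketch of that step; be_dataset_of is a hypothetical name, not a kbe function.

```shell
# Strip an optional @snapshot suffix to get the owning dataset.
# be_dataset_of is an illustrative helper only.
be_dataset_of() {
    echo "${1%%@*}"
}

be_dataset_of rpool/ROOT/default@before-nginx-rewrite   # prints rpool/ROOT/default
```

To see what the pool will boot by default, run zpool get -H -o value bootfs rpool.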

kbe rollback

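Destructively roll back to a snapshot. This discards all changes made after the snapshot was taken. Note that ZFS can only roll back past newer snapshots by destroying them (zfs rollback -r), so pick the most recent snapshot that predates the problem. kbe gives you a 5-second countdown to abort: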

$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
[kbe] WARNING: Rolling back will DISCARD all changes made after
      rpool/ROOT/default@pre-upgrade-20260408-143022.
[kbe] Press Ctrl+C within 5 seconds to abort...
[kbe] Rollback complete. Reboot to apply.

$ sudo reboot
# System boots with the exact OS state from before the upgrade
# /home, /srv, /var/lib — all untouched
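Before rolling back, it helps to know which snapshots sit after the target, since in ZFS a rollback past newer snapshots requires destroying them. The sketch below lists them from a creation-ordered snapshot listing; snaps_newer_than is an illustrative helper, not part of kbe.

```shell
# Read snapshot names on stdin, oldest first, and print every snapshot
# that comes after the target. Those are the ones a rollback to the
# target would have to destroy. snaps_newer_than is a hypothetical helper.
snaps_newer_than() {
    awk -v target="$1" 'found { print } $0 == target { found = 1 }'
}

# Usage on a live system:
#   zfs list -H -t snapshot -o name -s creation -d 1 rpool/ROOT/default \
#       | snaps_newer_than rpool/ROOT/default@install
```

Empty output means the target is already the most recent snapshot.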

Rollback Is Instant

A ZFS rollback does not copy data. It resets the dataset to the blocks the snapshot already references. The rollback itself typically completes in about a second, largely independent of how many gigabytes changed since the snapshot. The only time cost is the reboot. Compare this to restoring a backup, which requires copying gigabytes of data from a backup medium, and which may take hours on a large system.

Rollback is moving a bookmark in a book. Restore from backup is retyping the entire chapter from a photocopy.

kbe delete

Remove a boot environment snapshot you no longer need. This frees the blocks that only that snapshot still references, that is, the old versions of data changed since it was taken:

# Clean up old snapshots
$ sudo kbe delete rpool/ROOT/default@pre-upgrade-20260401-140000
[kbe] Deleted: rpool/ROOT/default@pre-upgrade-20260401-140000

# Check how much space was reclaimed
$ zfs list -o name,used,avail rpool
NAME    USED   AVAIL
rpool   4.8G   95.2G
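Because kupgrade snapshots carry their date in the name, candidates for cleanup can be selected with plain text processing. The sketch below only prints names for review and deletes nothing; preupgrade_snaps_before is a hypothetical helper, and it assumes the @pre-upgrade-YYYYMMDD-HHMMSS naming shown earlier.

```shell
# Read snapshot names on stdin and print the kupgrade-style snapshots
# (@pre-upgrade-YYYYMMDD-HHMMSS) dated before the cutoff (YYYYMMDD).
# Hand-named snapshots such as @install are never printed.
# preupgrade_snaps_before is an illustrative helper.
preupgrade_snaps_before() {
    awk -v cutoff="$1" '
        match($0, /@pre-upgrade-[0-9]+-[0-9]+$/) {
            # "@pre-upgrade-" is 13 characters; the date follows it.
            day = substr($0, RSTART + 13, 8)
            if (day + 0 < cutoff + 0) print
        }'
}

# Usage on a live system:
#   zfs list -H -t snapshot -o name -r rpool/ROOT \
#       | preupgrade_snaps_before 20260405
```

Each name it prints can then be passed to kbe delete once you have confirmed it is no longer needed.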

7. Comparison: Traditional vs kldload Upgrades

Here is the difference laid out side by side. This is not a theoretical comparison — it is what actually happens on real systems when things go wrong.

Traditional Linux Upgrade

Before upgrade: No preparation. No snapshot. No safety net.

During upgrade: Package manager modifies the live root filesystem in-place. Files are overwritten as each package is unpacked. Interruption = partial state.

After upgrade: Reboot and hope. If the kernel boots, SSH in and check that services are running. If it does not boot, find a rescue disk.

On failure: Boot from USB/CD. Mount the root filesystem manually. Chroot in. Try to downgrade packages (may not be possible if old versions are no longer in the repo). Reinstall if desperate. 1–4 hours of downtime. Data loss possible if the root filesystem was corrupted.

kldload Upgrade (kupgrade)

Before upgrade: Automatic ZFS snapshot of the root dataset. Near-zero additional disk space. Instant.

During upgrade: Package manager runs normally. All changes go to new blocks (copy-on-write). The snapshot preserves the old state regardless of what happens.

After upgrade: ZFS modules verified for every kernel. Secure Boot signatures refreshed. Clear log of everything that happened.

On failure: kbe rollback <snapshot> && reboot. Or select the old kernel from ZFSBootMenu. Downtime is one reboot. Zero data loss. Your databases, containers, and home directories were never touched.

Kernel Panic After Upgrade

Traditional: Machine does not boot. No remote access. Drive to the datacenter or open a support ticket. Boot from USB. Manually roll back kernel packages. Regenerate GRUB config. Pray again.

kldload: ZFSBootMenu shows the old kernel. Select it. Press Enter. System boots. Run kbe rollback if you want to undo the entire upgrade, or keep the old kernel and debug the new one at your leisure.

ZFS Module Missing After Kernel Update

Traditional: Kernel boots but cannot mount root filesystem (because root is ZFS and zfs.ko is missing). Drops to initramfs emergency shell. Hope you know how to manually load modules or switch kernels from there.

kldload: kupgrade detected the missing module before you rebooted and warned you. If you rebooted anyway, ZFSBootMenu lets you select the old kernel that has working ZFS modules.

The difference is architectural, not cosmetic. Traditional Linux upgrades are unsafe because the filesystem model is unsafe — one partition, live mutation, no rollback. kldload upgrades are safe because the storage model is safe — copy-on-write, dataset separation, instant snapshots, and a boot loader that can switch between any boot environment without touching the disk. Safety is not a feature bolted on top. It is the foundation.

8. Real-World Scenarios

Theory is useful. Practice is what matters at 2am when your pager goes off. Here are real scenarios and exactly how to handle them with kldload's upgrade tooling.

Scenario 1: Kernel Upgrade Breaks NVIDIA Driver

You run kupgrade on a workstation with an NVIDIA GPU. The kernel updates from 6.8.0-41 to 6.8.0-45. The NVIDIA DKMS module fails to compile against the new kernel because NVIDIA has not released a driver compatible with the new kernel's internal API changes.

# kupgrade already created the snapshot before the upgrade ran.
# The DKMS build log reports the NVIDIA failure for kernel 6.8.0-45.

# Option A: Roll back the entire upgrade
$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-143022
$ sudo reboot
# You are back on 6.8.0-41 with working NVIDIA and ZFS. Total time: 30 seconds.

# Option B: Keep the upgrade, boot the old kernel temporarily
# Select 6.8.0-41 from ZFSBootMenu at boot time.
# Wait for NVIDIA to release a compatible driver.
# Then: sudo dkms autoinstall -k 6.8.0-45
# Switch to the new kernel when it works.
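
Before rebooting into a new kernel, it helps to confirm which DKMS modules actually built for it. The sketch below parses the output of dkms status. It assumes lines of the form "<module>/<version>, <kernel>, <arch>: installed" (older dkms releases print "<module>, <version>, ..." instead), and the function name dkms_has_module is hypothetical. Kernel strings must be the full release as printed by uname -r; the 6.8.0-45-generic form used here is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `dkms status` output on stdin contains an
# "installed" entry for module $1 on kernel release $2.
# A module that failed to build for a kernel usually has no entry for that
# kernel at all, so absence is treated as "not installed".
dkms_has_module() {
    module=$1
    kernel=$2
    while IFS= read -r line; do
        case $line in
            "$module"[/,]*", $kernel, "*": installed"*) return 0 ;;
        esac
    done
    return 1
}

# Example (assumes dkms is installed):
#   dkms status | dkms_has_module nvidia 6.8.0-45-generic ||
#       echo "nvidia is not built for 6.8.0-45-generic; stay on the old kernel"
```

This only reads DKMS's own bookkeeping. It does not prove the module loads, so treat a positive result as a necessary check rather than a sufficient one.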

Scenario 2: Major Distro Version Upgrade (Debian 12 to 13)

Major version upgrades are the highest-risk operation in Linux system administration. Thousands of packages change simultaneously. Library ABIs break. Init system configurations change. Service names change. This is where boot environments truly shine.

# Step 1: Create a named boot environment
$ sudo kbe create debian12-final
[kbe] Snapshot created: rpool/ROOT/default@debian12-final

# Step 2: Update sources.list to trixie
$ sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list

# Step 3: Run the upgrade
$ sudo apt update
$ sudo apt full-upgrade -y
# ... (this takes a while — hundreds of packages)

# Step 4: Reboot and test
$ sudo reboot

# Step 5a: Everything works — you're on Debian 13. Celebrate.
$ cat /etc/debian_version
13.0

# Step 5b: Something critical is broken (mail server, database, custom app)
$ sudo kbe rollback rpool/ROOT/default@debian12-final
$ sudo reboot
# You are back on Debian 12 exactly as it was.
# Fix the issue, research the migration path, try again later.
# Your data never moved. Your databases never changed.
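
The release switch in Step 2 is a plain text substitution, so it can be previewed before it touches anything. The sketch below shows the lines that would change in one APT source file without modifying the file. The function name preview_release_switch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show what the bookworm -> trixie substitution would
# change in the APT source file $1, without modifying it.
preview_release_switch() {
    file=$1
    # diff exits 1 when the inputs differ; that is the expected case here,
    # so the exit status is deliberately ignored.
    sed 's/bookworm/trixie/g' "$file" | diff "$file" - || true
}

# Example:
#   preview_release_switch /etc/apt/sources.list
```

The same preview is worth running on any files under /etc/apt/sources.list.d/, since the sed command in Step 2 only edits /etc/apt/sources.list and third-party repositories defined elsewhere would stay on the old release.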

Scenario 3: Failed Package Dependency Chain

You install a package that pulls in 30 dependencies, one of which conflicts with something already installed. The package manager resolves the conflict by removing a critical package. Your monitoring stack stops working.

# If you used kbe create before installing:
$ sudo kbe rollback rpool/ROOT/default@before-package-install
$ sudo reboot
# Every package is back to its previous state. The conflicting install never happened.

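# If you forgot to snapshot (kupgrade does this automatically, but manual installs don't),
# you can fall back to an earlier snapshot, such as the one from your last kupgrade run:
$ zfs list -t snapshot -r rpool/ROOT/default
# Pick the most recent snapshot taken before the bad install.
# This also undoes every other change to the root dataset made since that snapshot.
# Plain zfs rollback only accepts the latest snapshot; going further back needs -r,
# which destroys the snapshots taken after the target.
$ sudo zfs rollback rpool/ROOT/default@pre-upgrade-20260408-143022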
$ sudo reboot
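
Rolling back past newer snapshots requires zfs rollback -r, and that flag destroys every snapshot taken after the target. A short sketch for listing what would be lost first is given below. The function name snapshots_after is hypothetical, and it expects snapshot names in creation order, oldest first.

```shell
#!/bin/sh
# Hypothetical helper: given snapshot names in creation order on stdin
# (oldest first), print the ones that come after target $1. These are the
# snapshots that `zfs rollback -r` to the target would destroy.
snapshots_after() {
    target=$1
    awk -v target="$target" '
        seen { print }              # every line after the target
        $0 == target { seen = 1 }   # start printing from the next line on
    '
}

# Example (assumes zfs is available; -s creation sorts oldest first):
#   zfs list -H -t snapshot -o name -s creation -r rpool/ROOT/default |
#       snapshots_after rpool/ROOT/default@pre-upgrade-20260408-143022
```

Empty output means the target is already the latest snapshot and a plain zfs rollback is enough.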

Scenario 4: Testing a Bleeding-Edge Kernel

You want to test kernel 6.13-rc1 on your development machine to evaluate new eBPF features. This is an unsigned release candidate, and OpenZFS may not build against it yet.

# Step 1: Snapshot current state
$ sudo kbe create stable-6.8-baseline

# Step 2: Install the RC kernel
$ sudo dpkg -i linux-image-6.13.0-rc1_amd64.deb

# Step 3: Boot into it (select from ZFSBootMenu)
# Note: ZFSBootMenu hands off to the kernel you select, and that kernel
# needs its own zfs module in its initramfs to mount the ZFS root.
# If no module could be built for the RC kernel, it stops in the initramfs
# emergency shell. Reboot and select the 6.8 kernel from ZFSBootMenu.

# Step 4: If it boots, test your eBPF programs, benchmark performance, check hardware support

# Step 5: Done testing — remove the RC kernel
$ sudo dpkg -r linux-image-6.13.0-rc1
# Or just roll back:
$ sudo kbe rollback rpool/ROOT/default@stable-6.8-baseline
$ sudo reboot
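
Before selecting a newly installed kernel in ZFSBootMenu, it can help to know whether a zfs module file was installed for it at all. The following sketch looks for one under the kernel's module directory. The function name has_zfs_module is hypothetical, and the second argument exists only so the modules root can be overridden; it defaults to /lib/modules.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a zfs kernel module file exists for
# kernel release $1. $2 is the modules root (defaults to /lib/modules).
has_zfs_module() {
    kver=$1
    root=${2:-/lib/modules}
    # Matches zfs.ko and compressed variants such as zfs.ko.xz or zfs.ko.zst.
    # A missing directory produces no output, which counts as "not found".
    [ -n "$(find "$root/$kver" -name 'zfs.ko*' 2>/dev/null | head -n 1)" ]
}

# Example:
#   has_zfs_module "$(uname -r)" && echo "zfs module present for running kernel"
```

A module file on disk does not guarantee that it was also packed into that kernel's initramfs, so a positive result is a first check rather than proof that the kernel will mount a ZFS root.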

Scenario 5: Production Server Fleet Upgrade

You manage 50 servers running CentOS Stream 9 with kldload. A critical OpenSSL CVE requires an immediate upgrade across the fleet.

# On each server (or via Ansible/SSH loop):
$ sudo kupgrade
# kupgrade:
#   1. Snapshots rpool/ROOT/default@pre-upgrade-20260408-...
#   2. Runs dnf upgrade -y (installs patched openssl)
#   3. Verifies kmod-zfs is intact (kABI-tracked, so it normally survives kernel updates without a rebuild)
#   4. Logs everything

# Verify:
$ rpm -q openssl
openssl-3.0.7-27.el9.x86_64  # patched version

# If any server has issues after reboot:
$ sudo kbe rollback rpool/ROOT/default@pre-upgrade-20260408-...
$ sudo reboot
# Server is back to pre-patch state in seconds.
# Investigate, fix, re-upgrade when ready.
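
The SSH loop mentioned above could look like the sketch below, which upgrades hosts one at a time and stops at the first failure so a bad update does not reach the whole fleet. Everything here other than kupgrade itself is an assumption of the sketch: the function name rolling_kupgrade, the RUN_ON_HOST override, and the one-host-per-line input. It also assumes kupgrade exits non-zero on failure and that sudo on the remote side needs no password prompt.

```shell
#!/bin/sh
# Hypothetical rollout loop: run kupgrade on each host listed on stdin,
# one at a time, and stop at the first failure.
# RUN_ON_HOST names the command used to reach a host; it defaults to ssh.
rolling_kupgrade() {
    runner=${RUN_ON_HOST:-ssh}
    while IFS= read -r host; do
        [ -n "$host" ] || continue   # skip blank lines
        # </dev/null keeps ssh from consuming the rest of the host list.
        if "$runner" "$host" sudo kupgrade </dev/null; then
            printf 'ok %s\n' "$host"
        else
            printf 'FAILED %s (stopping rollout)\n' "$host"
            return 1
        fi
    done
}

# Example:
#   rolling_kupgrade < hosts.txt
```

Going host by host is slower than a parallel run, but it means a failed upgrade affects one server while the rest stay on the known-good state. Each upgraded host still has its own pre-upgrade snapshot to roll back to.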

The pattern is always the same: snapshot, upgrade, verify, keep or roll back. It does not matter whether you are patching OpenSSL on a production server, upgrading from Debian 12 to 13, testing an experimental kernel, or installing a risky package. The safety net is always there, it costs nothing to create, and it takes seconds to use. This is what infrastructure should feel like.
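
For manual changes that kupgrade does not cover, the same pattern can be wrapped in a few lines of shell. This is a minimal sketch, not how kupgrade is implemented. The function name snapshot_then_run and the KBE override are hypothetical, and the rollback hint assumes the root dataset is rpool/ROOT/default as in the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper: take a named snapshot first, then run the risky
# command given as the remaining arguments.
# KBE lets the kbe invocation be overridden; it defaults to "sudo kbe".
snapshot_then_run() {
    name=$1
    shift
    # Intentionally unquoted so that "sudo kbe" splits into two words.
    ${KBE:-sudo kbe} create "$name" || return 1
    if "$@"; then
        printf 'command succeeded; snapshot %s kept as a fallback\n' "$name"
    else
        printf 'command failed; to undo: kbe rollback rpool/ROOT/default@%s\n' "$name"
        return 1
    fi
}

# Example:
#   snapshot_then_run before-package-install sudo apt install some-package
```

The wrapper never rolls back on its own. A non-zero exit from a package manager does not always mean the system is broken, so the decision to keep or roll back stays with the operator.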