kldload · pick your distro, get ZFS on root

Build Your Own · Complete Walkthrough

Zero to Hero — from nothing to a running custom appliance, every single step.

This is the complete, end-to-end walkthrough. Every command. Every output. Every file. You start with nothing — no ISO, no machine, no infrastructure. You end with a custom ZFS-on-root Linux appliance deployed to production, a golden image for cloning, an unattended pipeline for repeatable installs, and a Packer template for CI/CD integration. Follow it top to bottom. No steps skipped. No "exercise left to the reader."

Impatient? Start here.

git clone https://github.com/kldload/kldload.git
cd kldload
cp kldload.env.example kldload.env
source kldload.env
bash builder/container-build.sh

Five lines. Come back in 10 minutes. You'll have a bootable ISO with ZFS on root.
The rest of this page is for people who want to understand what just happened.

This page is the complete build-to-production pipeline, and every phase is a real operation you'll run. Follow it top to bottom and you'll have a custom ISO, a deployed VM or bare-metal machine, a golden image for cloning, an unattended pipeline, a Packer template, and verification scripts. You'll also know why each step exists, because every one is explained.

More importantly: you'll understand how the image factory works. Phase 1 is prerequisites. Phase 2 is booting the live environment. Phase 3 is walking through the interactive installer. Phase 4 is first boot. Everything after that layers on top — customization, golden images, cloning, unattended installs, Packer, verification. One source. One build. Every platform. That's the image factory in practice.

1. Prerequisites

Before you touch anything, make sure your hardware and software are ready. kldload supports 8 distros across three profiles, but the live ISO itself is always CentOS Stream 9 running from RAM. The target machine — where the OS installs — has specific requirements.

Hardware requirements

Minimum (server profile)

CPU: x86_64 with UEFI firmware (no legacy BIOS)
RAM: 4 GB (2 GB for live ISO + 2 GB for install)
Disk: 20 GB (single disk, GPT partitioned)
Network: Optional — the darksite has everything for offline install
USB: 8 GB+ for the ISO (2.2 GB image, needs overhead)

Server profile installs headless SSH + ZFS + tools. No GUI. Runs on anything with UEFI.

Recommended (desktop profile)

CPU: 4+ cores, x86_64 with UEFI
RAM: 8 GB (GNOME needs headroom)
Disk: 40 GB+ (GNOME + Firefox + tools add up)
GPU: Any (NVIDIA needs drivers — toggle in the installer)
Network: Optional for install, needed for Arch (rolling release, no darksite)

Desktop profile adds GNOME, Firefox, and the full tool suite. 8 GB RAM gives GNOME breathing room.
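Before burning anything, you can check a machine against these numbers. The sketch below is a hypothetical helper. The name meets_minimum is made up for illustration. It compares RAM and disk against the server and desktop minimums listed above and treats both figures as GiB. /proc/meminfo reports slightly less than the installed RAM, so the sketch allows a 10% margin on memory. That margin is an assumption, not a kldload rule.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check against the minimums listed above.
# Usage: meets_minimum PROFILE RAM_MIB DISK_GIB
meets_minimum() {
  local profile="$1" ram_mib="$2" disk_gib="$3" min_ram_gib min_disk_gib
  case "$profile" in
    server)  min_ram_gib=4; min_disk_gib=20 ;;
    desktop) min_ram_gib=8; min_disk_gib=40 ;;
    *) echo "unknown profile: $profile" >&2; return 2 ;;
  esac
  # MemTotal runs a little under installed RAM, so accept >= 90% of the
  # nominal minimum (an assumed allowance, not a kldload rule).
  if (( ram_mib * 10 < min_ram_gib * 1024 * 9 )); then
    echo "RAM: ${ram_mib} MiB is below the ${min_ram_gib} GB minimum" >&2
    return 1
  fi
  if (( disk_gib < min_disk_gib )); then
    echo "Disk: ${disk_gib} GiB is below the ${min_disk_gib} GB minimum" >&2
    return 1
  fi
  echo "OK for the $profile profile"
}

# Example on a running system:
#   ram_mib=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
#   disk_gib=$(( $(lsblk -bdno SIZE /dev/vda) / 1024 / 1024 / 1024 ))
#   meets_minimum server "$ram_mib" "$disk_gib"
```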

Supported distros and what they require

Distro              Offline?   Darksite     Notes
─────────────────── ────────── ──────────── ───────────────────────────────────
CentOS Stream 9     Yes        RPM          Default. Most tested.
Debian 13 (trixie)  Yes        APT :3142    Darksite served via HTTP on port 3142
Ubuntu 24.04        Yes        APT :3143    Darksite served via HTTP on port 3143
Fedora 41           Yes        RPM          Uses dnf, same darksite as CentOS
RHEL 9              Yes        RPM          Needs activation key + org ID
Rocky Linux 9       Yes        RPM          Binary-compatible RHEL rebuild
Arch Linux          No         None         Rolling release — requires internet
Alpine Linux        Yes        APK          Core profile only

Download the ISO

Option A: Download a pre-built ISO

# Download the latest release (-f makes curl fail on HTTP errors instead of saving the error page)
curl -fL -o kldload-free-latest.iso https://dl.kldload.com/kldload-free-latest.iso
curl -fL -o kldload-free-latest.iso.sha256 https://dl.kldload.com/kldload-free-latest.iso.sha256

# Verify the checksum
sha256sum -c kldload-free-latest.iso.sha256
# kldload-free-latest.iso: OK

# Check the size — should be ~2.2 GB for free edition
ls -lh kldload-free-latest.iso
# -rw-r--r-- 1 user user 2.2G kldload-free-latest.iso

Option B: Build from source

# Clone the repository
git clone https://github.com/kldload/kldload.git
cd kldload

# Full build from scratch (first time)
./deploy.sh clean
./deploy.sh builder-image
./deploy.sh build-debian-darksite    # ~15 min, cached after first run
./deploy.sh build-ubuntu-darksite    # ~15 min, cached after first run
PROFILE=desktop ./deploy.sh build    # ~10 min

# Incremental rebuild (skips darksites if cache exists)
PROFILE=server ./deploy.sh build     # ~5 min

# Output lands in live-build/output/
ls -lh live-build/output/*.iso

deploy.sh auto-detects podman or docker. No manual container commands needed.

Burn to USB or boot in a VM

Burn to USB (bare metal)

# IMPORTANT: verify the target device before writing
lsblk
# NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
# sda       8:0    1  14.9G  0 disk              ← this is the USB (RM=1: removable)
# nvme0n1 259:0    0 476.9G  0 disk              ← this is NOT the USB

# Write the ISO — replace /dev/sda with YOUR USB device
sudo dd if=kldload-free-latest.iso of=/dev/sda bs=4M status=progress oflag=direct conv=fsync
sudo sync
sudo eject /dev/sda

# Plug the USB into the target machine
# Enter UEFI boot menu (usually F12, F2, or Del)
# Select the USB drive
# GRUB menu appears → "KLDload Live" → Enter
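dd cannot undo a write to the wrong device. The sketch below is a minimal guard, built on stated assumptions. It checks two things. First, the kernel must flag the target as removable in /sys/block/DEV/removable. Second, none of the target's partitions may be mounted. The function name usb_target_ok is hypothetical, and so are the SYSFS and MOUNTS overrides, which exist only so the check can be tested against a fake tree. Some USB SSD enclosures report removable=0, so the guard can refuse a legitimate stick. If it refuses, re-check lsblk rather than reading the refusal as proof of anything.

```shell
#!/usr/bin/env bash
# Hypothetical pre-dd guard. SYSFS and MOUNTS default to the real kernel
# interfaces and can be pointed at fake files for testing.
usb_target_ok() {
  local dev="${1#/dev/}"
  local sysfs="${SYSFS:-/sys}" mounts="${MOUNTS:-/proc/mounts}"
  # The kernel reports 1 here for removable media such as most USB sticks.
  if [[ "$(cat "$sysfs/block/$dev/removable" 2>/dev/null)" != "1" ]]; then
    echo "refusing: /dev/$dev is not reported as removable" >&2
    return 1
  fi
  # Match the disk itself or any partition of it (sda1, nvme0n1p1, ...).
  if grep -Eq "^/dev/${dev}p?[0-9]*[[:space:]]" "$mounts"; then
    echo "refusing: /dev/$dev or one of its partitions is mounted" >&2
    return 1
  fi
}

# Example:
#   usb_target_ok /dev/sda && sudo dd if=kldload-free-latest.iso of=/dev/sda \
#     bs=4M status=progress oflag=direct conv=fsync
```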

Boot in KVM (local testing)

# Create a 40 GB virtual disk
qemu-img create -f qcow2 /var/lib/libvirt/images/kldload-test.qcow2 40G

# Launch the VM with virt-install
virt-install \
  --name kldload-test \
  --ram 8192 \
  --vcpus 4 \
  --cpu host-passthrough \
  --os-variant centos-stream9 \
  --machine q35 \
  --boot uefi,loader.secure=no,cdrom,hd \
  --disk /var/lib/libvirt/images/kldload-test.qcow2,bus=virtio \
  --cdrom /path/to/kldload-free-latest.iso \
  --network default \
  --graphics vnc,listen=0.0.0.0 \
  --noautoconsole

# Connect to the VM console
virt-viewer kldload-test
# Or use VNC: vncviewer localhost:5900

Boot in Proxmox

# Upload ISO to Proxmox
scp kldload-free-latest.iso root@proxmox-host:/var/lib/vz/template/iso/

# Create the VM
pvesh create /nodes/pve/qemu \
  --vmid 200 \
  --name kldload-test \
  --memory 8192 \
  --cores 4 \
  --cpu host \
  --machine q35 \
  --bios ovmf \
  --efidisk0 local-lvm:1 \
  --tpmstate0 local-lvm:1,version=v2.0 \
  --scsi0 local-lvm:40 \
  --scsihw virtio-scsi-single \
  --ide2 local:iso/kldload-free-latest.iso,media=cdrom \
  --net0 virtio,bridge=vmbr0 \
  --serial0 socket \
  --boot 'order=ide2;scsi0' \
  --agent enabled=1

# Start the VM
pvesh create /nodes/pve/qemu/200/status/start

# Open console in Proxmox web UI (noVNC)

Three paths to the same place. USB for bare metal, virt-install for local KVM testing, pvesh for Proxmox. The ISO does not care which one you choose. UEFI is UEFI. The live environment boots identically on all three. Pick whichever one matches your hardware.

The --boot uefi,loader.secure=no option in the KVM path matters. Secure Boot requires MOK enrollment for the ZFS kernel module, and that enrollment is not yet automated, so disable Secure Boot for now. On bare metal, you have two choices: turn Secure Boot off in the firmware setup, or be ready to enroll the MOK key during first boot.
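The live environment can read the actual firmware state from sysfs. The sketch below is a hypothetical helper; boot_firmware_state and the FW_ROOT override are illustrative names. It reports whether the system booted through UEFI and whether Secure Boot is on. It reads the standard SecureBoot EFI variable, whose first four bytes are attribute flags and whose fifth byte holds the value. Where mokutil is installed, mokutil --sb-state reports the same information.

```shell
#!/usr/bin/env bash
# Hypothetical firmware check. FW_ROOT defaults to /sys/firmware and can be
# pointed at a fake tree for testing.
boot_firmware_state() {
  local root="${FW_ROOT:-/sys/firmware}"
  local var="$root/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
  if [[ ! -d "$root/efi" ]]; then
    echo "bios"          # legacy boot: kldload needs UEFI
    return 1
  fi
  local sb
  # Skip the 4 attribute bytes and read the 1-byte value.
  sb=$(od -An -t u1 -j 4 -N 1 "$var" 2>/dev/null | tr -d ' ')
  if [[ "$sb" == "1" ]]; then
    echo "uefi secureboot=on"
  else
    echo "uefi secureboot=off"
  fi
}

# Example: boot_firmware_state   # expect "uefi secureboot=off" before installing
```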

2. Boot the live environment

When the ISO boots, GRUB loads the CentOS Stream 9 live environment from a squashfs image into RAM. This takes 15-30 seconds depending on media speed. What happens next depends on whether you chose the desktop or server profile at build time.

What happens at boot (step by step)

1. UEFI firmware loads GRUB from the ISO's EFI partition
2. GRUB menu: "KLDload Live" (default, 5 second timeout)
3. Linux kernel + initramfs load into RAM
4. initramfs mounts the squashfs image as the root filesystem
5. systemd starts — this IS a full CentOS Stream 9 system
6. ZFS kernel module loads (zfs.ko, compiled during ISO build)
7. NetworkManager starts — DHCP on all interfaces
8. kldload-webui.service starts — Python WebSocket server on port 8080
9. Desktop profile: GDM starts → autologin as 'live' → GNOME → Firefox opens http://localhost:8080
   Server profile: getty on tty1 → autologin as 'live' → banner with web UI URL

Live credentials

User: live     Password: live      # autologin, sudo NOPASSWD
User: root     Password: kldload   # SSH password auth enabled, root login disabled

# SSH into the live environment from another machine:
ssh live@192.168.122.XXX   # password: live

# The live user has full passwordless sudo:
sudo -i                    # instant root shell, no password prompt

The live environment is a full CentOS system running from RAM. You can install packages, run services, mount disks. It's a real OS, not a stripped-down rescue environment.

Accessing the web UI

# Desktop profile: Firefox opens automatically to:
http://localhost:8080

# Server profile: check the IP and connect from another machine:
ip addr show | grep 'inet '
# inet 192.168.122.45/24 brd 192.168.122.255 scope global dynamic noprefixroute ens3

# From your workstation:
firefox http://192.168.122.45:8080

# Or use curl to verify the web UI is running:
curl -s http://192.168.122.45:8080 | head -5
# <!DOCTYPE html>
# <html lang="en">
# ...

The live environment is not a thin installer. It is a complete CentOS Stream 9 system with ZFS loaded, WireGuard available, all kldload tools installed, and the full darksite embedded. You can poke around, inspect disks, test network connectivity, even mount ZFS pools from existing systems. The installer is just a web UI that drives the kldload-install-target script. Everything it does, you could do by hand from the command line.
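When you script against the live environment, for example with the WebSocket API in section 8, wait until the web UI is actually listening instead of sleeping for a fixed time. The sketch below is minimal and makes two assumptions about the machine running it: bash is built with /dev/tcp support (the default on mainstream distros), and coreutils timeout is available. wait_for_port is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until HOST:PORT accepts a TCP connection.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
  local host="$1" port="$2" deadline=$(( SECONDS + ${3:-120} ))
  while (( SECONDS < deadline )); do
    # bash opens a TCP connection for /dev/tcp/HOST/PORT redirections;
    # timeout keeps one attempt from hanging on a filtered address.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "$host:$port did not accept connections in time" >&2
  return 1
}

# Example: wait for the live ISO's web UI, then fetch the page
#   wait_for_port 192.168.122.45 8080 300 && curl -s http://192.168.122.45:8080 | head -5
```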

3. Interactive install walkthrough

The web UI walks you through every decision. Here is exactly what each screen asks, what to select, and why. The install takes 3-8 minutes depending on profile and disk speed.

Screen 1: Choose your distro

Eight options: CentOS Stream 9, Debian 13, Ubuntu 24.04, Fedora 41, RHEL 9, Rocky Linux 9, Arch Linux, Alpine Linux.

Pick based on your needs:
  CentOS Stream 9  — enterprise stable, longest tested, best offline support
  Debian 13        — rock solid, APT ecosystem, huge package library
  Ubuntu 24.04     — Debian-based, familiar to most, PPAs available
  Fedora 41        — bleeding edge, newest kernel and packages
  RHEL 9           — Red Hat supported (requires activation key)
  Rocky Linux 9    — RHEL binary-compatible, free, enterprise stable
  Arch Linux       — rolling release, newest everything (requires internet)
  Alpine Linux     — minimal, musl-based, core profile only (containers, edge)

If you don't know which to pick: CentOS for servers, Debian for appliances, Ubuntu for developer workstations.

Screen 2: Choose your profile

Three profiles determine what gets installed on top of the base distro:

desktop  — GNOME desktop + Firefox + all kldload tools + ZFS
           Best for: workstations, dev machines, lab desktops
           Packages: gnome-shell, gdm, firefox, tmux, htop, btop, fzf, bat, eza...
           Size: ~6 GB installed

server   — headless SSH + all kldload tools + ZFS
           Best for: production servers, VMs, cloud instances
           Packages: openssh, chrony, tmux, htop, btop, fzf, bat, eza, podman...
           Size: ~3 GB installed

core     — ZFS on root only, stock distro, nothing extra
           Best for: minimal base images, containers, custom builds
           Packages: openssh, sudo, curl, vim, nftables, wireguard-tools
           Size: ~1.5 GB installed

Screen 3: Select target disk

The installer shows all block devices. Pick the disk where the OS will be installed. This disk will be wiped entirely.

# The web UI lists available disks. Example:
/dev/vda    40G   QEMU HARDDISK         ← typical KVM/Proxmox disk
/dev/sda    477G  Samsung SSD 870 EVO    ← typical bare metal SSD
/dev/nvme0n1 1TB  WD Black SN850X       ← NVMe drive

# For mirror topology, select a second disk:
/dev/sdb    477G  Samsung SSD 870 EVO    ← mirror pair

Screen 4: ZFS topology

Choose how ZFS organizes your disks:

single         — one disk, no redundancy (default for VMs)
mirror         — two disks, identical copies (recommended for production)
raidz1         — three+ disks, one parity (capacity-optimized redundancy)
mirror-stripe  — four+ disks in striped mirrors (performance + redundancy)

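For a single VM or test: use single. For production bare metal: use mirror. Choose carefully at install time. You can later turn a single disk into a mirror with zpool attach, after partitioning the new disk to match. Switching between mirror and raidz layouts, however, means rebuilding the pool, which in practice means a reinstall.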
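The four topologies map directly onto the vdev arguments of zpool create. The sketch below is a hypothetical illustration of that mapping. zpool_vdev_args is a made-up name and is not part of kldload's storage-zfs.sh. The sketch also enforces the disk counts listed above, and it requires an even count for mirror-stripe because each mirror is a pair.

```shell
#!/usr/bin/env bash
# Hypothetical topology -> vdev mapping for `zpool create`.
# Usage: zpool_vdev_args TOPOLOGY DISK...
zpool_vdev_args() {
  local topology="$1"; shift
  local n=$#
  case "$topology" in
    single)
      (( n == 1 )) || { echo "single needs exactly 1 disk" >&2; return 1; }
      echo "$1" ;;
    mirror)
      (( n >= 2 )) || { echo "mirror needs 2+ disks" >&2; return 1; }
      echo "mirror $*" ;;
    raidz1)
      (( n >= 3 )) || { echo "raidz1 needs 3+ disks" >&2; return 1; }
      echo "raidz1 $*" ;;
    mirror-stripe)
      (( n >= 4 && n % 2 == 0 )) || { echo "mirror-stripe needs an even number of disks, 4+" >&2; return 1; }
      local out=""
      # Pair the disks up: each pair becomes one mirror vdev, and ZFS stripes across them.
      while (( $# )); do
        out+="mirror $1 $2 "
        shift 2
      done
      echo "${out% }" ;;
    *) echo "unknown topology: $topology" >&2; return 1 ;;
  esac
}

# Example (the unquoted $(...) is deliberate: each word becomes one argument;
# ashift=12 suits 4K-sector disks):
#   sudo zpool create -o ashift=12 rpool $(zpool_vdev_args mirror /dev/sda2 /dev/sdb2)
```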

Screen 5: Hostname, user, password, SSH key

Hostname:    web-prod-01              # sets /etc/hostname
Username:    admin                    # non-root user with passwordless sudo
Password:    ••••••••                 # password for the admin user
SSH Key:     ssh-ed25519 AAAA... user # optional — paste your public key
Timezone:    America/Vancouver        # TZ database name

Screen 6: Networking

DHCP (default):  No configuration needed. The installed system gets an IP from DHCP.

Static:
  Interface:  eth0 / ens3 / enp1s0    # shown dynamically based on detected NICs
  IP Address: 10.0.1.50
  Prefix:     24
  Gateway:    10.0.1.1
  DNS:        1.1.1.1, 8.8.8.8
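A typo in the static fields usually shows up as a machine that boots but cannot reach its gateway. The sketch below is a minimal pre-check and assumes IPv4 only. ip_to_int and gateway_in_subnet are hypothetical helpers: the first validates an address, and the second confirms the gateway falls inside IP/prefix.

```shell
#!/usr/bin/env bash
# Hypothetical IPv4 checks for the static networking screen.

# Prints a dotted-quad address as a 32-bit integer; fails on malformed input.
ip_to_int() {
  local IFS=. a b c d o
  read -r a b c d <<< "$1"
  for o in "$a" "$b" "$c" "$d"; do
    # 10# forces base 10 so octets like "08" are not read as octal.
    [[ "$o" =~ ^[0-9]{1,3}$ ]] && (( 10#$o <= 255 )) || return 1
  done
  echo "$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))"
}

# Succeeds when GATEWAY lies inside IP/PREFIX.
gateway_in_subnet() {
  local ip="$1" prefix="$2" gw="$3" ip_n gw_n mask
  [[ "$prefix" =~ ^[0-9]{1,2}$ ]] && (( 10#$prefix >= 1 && 10#$prefix <= 32 )) || return 1
  prefix=$(( 10#$prefix ))
  ip_n=$(ip_to_int "$ip") || return 1
  gw_n=$(ip_to_int "$gw") || return 1
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip_n & mask) == (gw_n & mask) ))
}

# Example with the values above:
#   gateway_in_subnet 10.0.1.50 24 10.0.1.1 && echo "gateway is on-link"
```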

Screen 7: Optional features

ZFS Encryption:     Off (default) — enables native ZFS encryption, prompts for passphrase on boot
KVM / libvirt:      Off (default) — installs QEMU/KVM + libvirt for running VMs on the host
eBPF tools:         Off (default) — installs bcc, bpftrace, libbpf for kernel tracing
NVIDIA drivers:     Off (default) — installs NVIDIA proprietary drivers from RPMFusion/PPA
WireGuard:          Off (default) — pre-configures a WireGuard interface (need keys later)
Export format:      None (default) — qcow2, vmdk, vhd, ova, raw for golden image export

Turn on only what you need. Each feature adds packages and increases install time. You can always add features later with the package manager.

Screen 8: Review and install

The UI shows a summary of all selections. Click Install. The installer runs kldload-install-target with your choices as environment variables. Progress streams back to the browser via WebSocket.

# What happens during install (watch the progress bar):
[partition]   Wiping /dev/vda, creating GPT: ESP (512M) + ZFS partition
[zfs]         Creating pool: rpool, datasets: ROOT, home, srv, var, var/log
[bootstrap]   Installing base packages via dnf/debootstrap/pacstrap (from darksite)
[configure]   Setting hostname, timezone, locale, keyboard, fstab, networking
[users]       Creating admin user, setting password, adding SSH key, sudoers
[bootloader]  Installing ZFSBootMenu, building initramfs with ZFS module
[features]    Installing optional packages (KVM, eBPF, NVIDIA, etc.)
[postinstall] Running postinstall.sh (if present)
[cleanup]     Unmounting filesystems, exporting ZFS pool
[done]        "Remove the USB / ISO and reboot when ready."

The install is deterministic. Same inputs produce the same system every time. The web UI is a thin wrapper around kldload-install-target, which sources nine bash libraries from /usr/lib/kldload-installer/lib/. Each library handles one concern: common.sh (logging), storage-zfs.sh (partitioning + pool creation), bootstrap.sh (package install), bootloader.sh (ZFSBootMenu + initramfs), profiles.sh (profile-specific packages), networking.sh (static/DHCP config), and so on. If you want to understand exactly what the installer does, cat each library. They are short, commented, and readable.
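Determinism comes from running the same phases, in the same order, with the same inputs. The sketch below shows that shape hypothetically; it is not kldload's actual code. Each phase is a function named after a tag in the progress log above, and run_phases calls them in order. The first failure stops the install, so later phases never run on a half-built system.

```shell
#!/usr/bin/env bash
# Hypothetical phase runner; phase_* bodies would live in the sourced libraries.
PHASES=(partition zfs bootstrap configure users bootloader features postinstall cleanup)

run_phases() {
  local phase
  for phase in "${PHASES[@]}"; do
    echo "[$phase] starting"
    if ! "phase_$phase"; then
      echo "[$phase] failed; stopping before the next phase" >&2
      return 1
    fi
  done
  echo "[done]"
}

# Example with no-op placeholder phases:
#   for p in "${PHASES[@]}"; do eval "phase_$p() { :; }"; done
#   run_phases
```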

4. First boot

Remove the USB or ISO. Reboot. The machine boots from disk for the first time. Here is exactly what happens and what to verify.

Boot sequence on first boot

1. UEFI firmware reads the EFI boot entry "KLDload" from the ESP
2. ZFSBootMenu loads — scans for ZFS pools with bootable datasets
3. ZFSBootMenu finds rpool/ROOT/<hostname> — boots it automatically (3 sec timeout)
4. Kernel loads with initramfs → ZFS module loads → root pool imports → switch_root
5. systemd starts — full boot to multi-user.target (server) or graphical.target (desktop)
6. sshd starts — you can SSH in immediately
7. zfs-zed starts — ZFS Event Daemon monitors pool health
8. sanoid.timer starts — automatic snapshot schedule begins
9. chronyd starts — NTP time sync
10. Desktop: GDM starts → login screen with your username

Verify the system is healthy

# Log in as your admin user (SSH or console)
ssh admin@192.168.122.45

# Run kst — the kldload status tool
sudo kst
# kldload (build 20260404)
# Pool    • rpool ONLINE (No known data errors)
# Root    1.54G used / 34.6G available (compression: 1.79x)
# Snapshots 0 total
# Boot envs 1 available
# Services  sshd  zfs-zed  sanoid.timer  chronyd

# Verify ZFS pool is healthy
sudo zpool status
#   pool: rpool
#  state: ONLINE
#   scan: none requested
# config:
#     NAME        STATE     READ WRITE CKSUM
#     rpool       ONLINE       0     0     0
#       vda2      ONLINE       0     0     0

# Verify ZFS datasets
zfs list
# NAME                        USED  AVAIL  REFER  MOUNTPOINT
# rpool                       1.54G 34.6G    96K  none
# rpool/ROOT                  1.52G 34.6G    96K  none
# rpool/ROOT/web-prod-01      1.52G 34.6G  1.52G  /
# rpool/home                  640K  34.6G   640K  /home
# rpool/srv                    96K  34.6G    96K  /srv
# rpool/var                   8.5M  34.6G    96K  /var
# rpool/var/log               8.4M  34.6G  8.4M   /var/log

# Verify the kernel module
lsmod | grep zfs
# zfs                  4358144  6
# zunicode              335872  1 zfs
# zzstd                 561152  1 zfs
# spl                   135168  1 zfs

WireGuard enrollment (if enabled)

# If you enabled WireGuard during install, the interface config is pre-staged
# but you still need to add keys and peers:

# Generate keys (umask 077 keeps the private key readable by root only)
sudo sh -c 'umask 077; wg genkey | tee /etc/wireguard/wg0.key | wg pubkey | tee /etc/wireguard/wg0.pub'

# Edit the config
sudo vim /etc/wireguard/wg0.conf
# [Interface]
# Address = 10.78.0.5/24
# PrivateKey = <contents of wg0.key>
# ListenPort = 51820
#
# [Peer]
# PublicKey = <hub's public key>
# Endpoint = hub.example.com:51820
# AllowedIPs = 10.78.0.0/24
# PersistentKeepalive = 25

# Bring up the tunnel
sudo systemctl enable --now wg-quick@wg0

# Verify
sudo wg show
# interface: wg0
#   public key: xxxxx
#   private key: (hidden)
#   listening port: 51820
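Once you enroll more than a handful of nodes, hand-editing wg0.conf gets error-prone. The sketch below is minimal and assumes the same single-hub layout as the example above. render_wg_conf is a hypothetical helper that prints the config from its arguments. The usage lines write the file under umask 077, so the file holding the private key is readable by root only.

```shell
#!/usr/bin/env bash
# Hypothetical renderer for the hub-and-spoke wg0.conf shown above.
# Usage: render_wg_conf ADDRESS/PREFIX PRIVATE_KEY HUB_PUBKEY HUB_HOST:PORT ALLOWED_IPS
render_wg_conf() {
  local address="$1" privkey="$2" hub_pub="$3" endpoint="$4" allowed="$5"
  cat <<EOF
[Interface]
Address = ${address}
PrivateKey = ${privkey}
ListenPort = 51820

[Peer]
PublicKey = ${hub_pub}
Endpoint = ${endpoint}
AllowedIPs = ${allowed}
PersistentKeepalive = 25
EOF
}

# Example (as root; umask 077 keeps wg0.conf private):
#   umask 077
#   render_wg_conf 10.78.0.5/24 "$(cat /etc/wireguard/wg0.key)" "<hub's public key>" \
#     hub.example.com:51820 10.78.0.0/24 > /etc/wireguard/wg0.conf
```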

Monitoring setup (first boot)

# Sanoid is already configured and running — verify the timer:
systemctl status sanoid.timer
# sanoid.timer - Sanoid snapshot timer
#    Loaded: loaded
#    Active: active (waiting)

# Check the Sanoid config (auto-generated at install time):
cat /etc/sanoid/sanoid.conf
# [rpool/ROOT]
#     use_template = production
#     recursive = yes
# [template_production]
#     frequently = 0
#     hourly = 24
#     daily = 30
#     monthly = 3
#     yearly = 0
#     autosnap = yes
#     autoprune = yes

# ZFS Event Daemon monitors pool health:
systemctl status zfs-zed
# zfs-zed.service - ZFS Event Daemon
#    Active: active (running)

# If you enabled eBPF tools, verify they work:
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); } interval:s:2 { exit(); }'
# sshd /etc/ssh/sshd_config
# chronyd /etc/chrony.conf

First boot is where you verify the installer did its job. Pool online. Datasets mounted. Services running. ZFS module loaded. SSH reachable. If any of these are wrong, something went wrong during install. The kldload way is: verify before you move on. Take 60 seconds now. It saves hours later.
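These checks are easy to script so that they run the same way on every machine. The sketch below is minimal and assumes the unit names that kst reports above. Some distros name the SSH and NTP units ssh and chrony instead, often with sshd and chronyd as aliases. all_pools_online parses the tab-separated output of zpool list -H -o name,health. first_boot_check combines that with the module and service checks. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical first-boot check: ZFS module, pool health, core services.

# Reads `zpool list -H -o name,health` output (tab-separated) on stdin and
# fails if any pool is not ONLINE.
all_pools_online() {
  local name health bad=0
  while IFS=$'\t' read -r name health; do
    [[ -n "$name" ]] || continue
    if [[ "$health" != "ONLINE" ]]; then
      echo "pool $name is $health" >&2
      bad=1
    fi
  done
  return "$bad"
}

first_boot_check() {
  local unit rc=0
  [[ -d /sys/module/zfs ]] || { echo "zfs module not loaded" >&2; rc=1; }
  zpool list -H -o name,health | all_pools_online || rc=1
  for unit in sshd zfs-zed sanoid.timer chronyd; do
    systemctl is-active --quiet "$unit" || { echo "$unit is not active" >&2; rc=1; }
  done
  return "$rc"
}

# Example: first_boot_check && echo "first boot looks healthy"
```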

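Sanoid runs on a systemd timer, not a cron job. With the template above it keeps 24 hourly, 30 daily, and 3 monthly snapshots, and autoprune removes older ones automatically. A snapshot costs space only for blocks that have changed since it was taken. Snapshot space therefore grows with how much data you rewrite or delete within the retention window. A system with heavy churn can still fill its pool, and zfs list -o space shows where the space went. This is the free local safety net that comes with every kldload install. For copies that live off the machine, add replication (section 5).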
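To confirm pruning is keeping up, count the snapshots per period. With the template above, each dataset should settle at 24 hourly + 30 daily + 3 monthly = 57 Sanoid snapshots once the three-month window has filled. The sketch below is a hypothetical helper that reads snapshot names on stdin. It assumes Sanoid's usual autosnap_DATE_TIME_PERIOD naming and ignores snapshots made any other way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Sanoid autosnaps per period.
# Input: one snapshot name per line, e.g. from `zfs list -H -t snapshot -o name`.
count_autosnaps() {
  awk -F@ '$2 ~ /^autosnap_/ {
    # autosnap_2026-04-04_15:00:01_hourly -> last "_" field is the period
    n = split($2, parts, "_")
    count[parts[n]]++
  }
  END { for (p in count) print p, count[p] }' | sort
}

# Example:
#   zfs list -H -t snapshot -o name -d 1 rpool/ROOT/web-prod-01 | count_autosnaps
```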

5. Post-install customization

The system is running. Now make it yours. Add packages, configure services, set up Sanoid replication, enable eBPF monitors. Every change you make here can be rolled back via ZFS snapshots.

Adding packages

# Take a snapshot before making changes (always)
sudo ksnap

# RPM distros (CentOS, Fedora, RHEL, Rocky):
sudo dnf install -y nginx postgresql-server redis
sudo postgresql-setup --initdb     # RPM PostgreSQL needs its data directory initialized once

# Debian/Ubuntu:
sudo apt update && sudo apt install -y nginx postgresql redis-server

# Arch:
sudo pacman -S nginx postgresql redis

# Alpine (core profile):
sudo apk add nginx postgresql redis

# Verify the snapshot exists — you can roll back if anything breaks
sudo kbe list

Configuring services

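# Enable and start services (systemd distros; Alpine uses OpenRC: rc-update add <service> default, then rc-service <service> start)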
sudo systemctl enable --now nginx
sudo systemctl enable --now postgresql
sudo systemctl enable --now redis

# Verify they're running
systemctl status nginx postgresql redis
# nginx.service - The nginx HTTP and reverse proxy server
#    Active: active (running)
# postgresql.service - PostgreSQL database server
#    Active: active (running)

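# Check that ports are listening (-p needs root to show process names)
sudo ss -tlnp '( sport = :80 or sport = :5432 or sport = :6379 )'
# LISTEN  0  511        0.0.0.0:80    0.0.0.0:*  users:(("nginx",pid=1234,fd=6))
# LISTEN  0  244      127.0.0.1:5432  0.0.0.0:*  users:(("postgres",pid=1235,fd=5))
# LISTEN  0  511      127.0.0.1:6379  0.0.0.0:*  users:(("redis-server",pid=1236,fd=6))
# PostgreSQL and Redis listen on localhost only until you change their configs.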

Setting up Sanoid replication

# Sanoid is already snapshotting locally. Now add replication.
# Destination: a backup server with ZFS (another kldload node, a NAS, anything)

# Set up SSH key for automated replication
sudo ssh-keygen -t ed25519 -f /root/.ssh/syncoid_key -N ""
# Copy the public key to the backup server
sudo ssh-copy-id -i /root/.ssh/syncoid_key.pub root@backup-server

# Test initial replication (full send)
sudo syncoid --recursive --sshkey /root/.ssh/syncoid_key \
  rpool root@backup-server:tank/backups/$(hostname)

# Add a systemd timer for hourly incremental replication
sudo tee /etc/systemd/system/syncoid-backup.service <<'EOF'
[Unit]
Description=ZFS replication to backup server
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid --recursive --no-sync-snap \
  --sshkey /root/.ssh/syncoid_key \
  rpool root@backup-server:tank/backups/%H
EOF

sudo tee /etc/systemd/system/syncoid-backup.timer <<'EOF'
[Unit]
Description=Hourly ZFS replication

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo systemctl enable --now syncoid-backup.timer

# Verify the timer is active
systemctl list-timers syncoid-backup.timer
# NEXT                        LEFT     LAST  PASSED  UNIT
# Sat 2026-04-04 15:00:00 UTC  42min    -     -       syncoid-backup.timer

Enabling eBPF monitors

# If you didn't enable eBPF during install, add it now:

# CentOS/Fedora/RHEL/Rocky:
sudo dnf install -y bcc-tools bpftrace

# Debian/Ubuntu:
sudo apt install -y bpfcc-tools bpftrace

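# Useful one-liners (Debian/Ubuntu names; on RPM distros the bcc tools live in /usr/share/bcc/tools/ without the -bpfcc suffix, e.g. /usr/share/bcc/tools/opensnoop):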

# Watch all file opens in real time
sudo opensnoop-bpfcc

# TCP connection tracing
sudo tcpconnect-bpfcc

# Disk I/O latency histogram
sudo biolatency-bpfcc -D 5

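# Slow syscalls (>1ms), counted per process (blocking calls such as epoll_wait and futex will dominate)
sudo bpftrace -e '
  tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
  tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
    if (nsecs - @start[tid] > 1000000) { @slow[comm] = count(); }
    delete(@start[tid]);
  }
  END { clear(@start); }'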

# ZFS ARC hit rate, cumulative since boot (read from the OpenZFS kstat counters)
awk '$1 == "hits" {h = $3} $1 == "misses" {m = $3} END {if (h + m) printf "ARC hit rate: %.1f%%\n", h * 100 / (h + m)}' /proc/spl/kstat/zfs/arcstats

eBPF is kernel-level observability without kernel modules. It runs sandboxed programs inside the kernel. No reboots. No kernel recompiles. Attach, observe, detach.

Every customization follows the same pattern: snapshot first, make changes, verify, move on. If something breaks, roll back with kbe rollback and you're back to the exact state before the change. This is why ZFS on root matters for operations. You never fear making changes because undo is instant.
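The snapshot, change, verify loop can be wrapped so it is never skipped. The sketch below is minimal and assumes it runs as root, for example from sudo -i. change_with_snapshot is a hypothetical helper. SNAP_CMD defaults to the ksnap tool used earlier in this section. You pass the change and the verification as command strings. The wrapper only reports failures and points at kbe rollback; it does not roll back on its own.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: snapshot first, apply a change, verify it.
# Usage: change_with_snapshot 'CHANGE COMMAND' 'VERIFY COMMAND'
change_with_snapshot() {
  local change="$1" verify="$2"
  if ! "${SNAP_CMD:-ksnap}"; then
    echo "snapshot failed; nothing was changed" >&2
    return 1
  fi
  if ! bash -c "$change"; then
    echo "change failed; inspect, then use kbe rollback if needed" >&2
    return 1
  fi
  if ! bash -c "$verify"; then
    echo "verification failed; kbe rollback returns to the snapshot" >&2
    return 1
  fi
  echo "change applied and verified"
}

# Example:
#   change_with_snapshot 'dnf install -y nginx && systemctl enable --now nginx' \
#     'systemctl is-active --quiet nginx'
```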

6. Building a golden image

A golden image is a fully configured system that you clone to create new machines. kldload has a built-in golden image workflow: install, configure, seal, export. The result is a cloud-init-ready disk image that can be used as a Packer base, a Proxmox template, or a direct hypervisor import.

Step 6.1: Install with export format

During the interactive install (or via answers file), select an export format. The installer will: install the OS, configure it, seal it for cloning, and export the disk image.

# In the web UI: set "Export Format" to qcow2 (or vmdk, vhd, ova, raw)
# Or via answers file:
KLDLOAD_EXPORT_FORMAT=qcow2

# The installer runs these phases automatically:
# 1. Normal install (partition, ZFS, bootstrap, configure, bootloader)
# 2. Seal: clear machine-id, remove SSH host keys, enable cloud-init
# 3. Export: qemu-img convert from ZFS pool to chosen format
# 4. Optional: SCP the image to a remote host

Step 6.2: What "seal" does

The k_seal_image_for_clone() function prepares the system for cloning by removing machine-specific state:

# What gets cleared (inside the installed rootfs):
/etc/machine-id              → truncated to empty (systemd regenerates on boot)
/var/lib/dbus/machine-id     → removed (regenerated from /etc/machine-id)
/etc/ssh/ssh_host_*          → removed (regenerated on first boot by cloud-init, or by sshd-keygen on RHEL-family distros)
/var/log/*                   → cleared (fresh logs on each clone)
/tmp/*                       → cleared
/root/.bash_history          → cleared
cloud-init                   → installed and enabled with multi-datasource config

# The result: a generic system image that re-personalizes on first boot.
# Cloud-init reads from: NoCloud, GCE, AWS EC2, Azure, ConfigDrive, and OpenStack.
# Whichever datasource is present wins.

Step 6.3: Export with SCP upload

# Configure SCP upload in the web UI or answers file:
KLDLOAD_EXPORT_FORMAT=qcow2
KLDLOAD_EXPORT_SCP_HOST=images.example.com
KLDLOAD_EXPORT_SCP_USER=root
KLDLOAD_EXPORT_SCP_PATH=/var/lib/images/
KLDLOAD_EXPORT_SCP_KEY=/root/.ssh/id_ed25519

# The installer will:
# 1. Install + seal + export to qcow2
# 2. SCP the image to images.example.com:/var/lib/images/
# 3. Print the remote path and image size

# Result on remote server:
ssh root@images.example.com ls -lh /var/lib/images/
# -rw-r--r-- 1 root root 1.8G kldload-web-prod-01.qcow2

Step 6.4: Manual golden image (from a running system)

Already have a running kldload system you want to turn into a golden image? Use kexport:

# Take a clean snapshot first
sudo ksnap

# Export the current system as a qcow2
sudo kexport --format qcow2 --output /tmp/golden.qcow2

# What kexport does:
# 1. Seals the image (machine-id, SSH keys, cloud-init)
# 2. Exports the ZFS pool to a raw block device
# 3. Converts to the requested format via qemu-img
# 4. Reports the output path and SHA256

# Verify the image
qemu-img info /tmp/golden.qcow2
# image: /tmp/golden.qcow2
# file format: qcow2
# virtual size: 40 GiB
# disk size: 1.8 GiB

The golden image workflow is the core of the image factory model. You build once, export once, and deploy the result everywhere. The seal step is what makes this work — it strips machine-specific identity from the image so that each clone can re-personalize itself via cloud-init. Without sealing, every clone would have the same SSH host keys, the same machine-id, and the same hostname. That causes real problems in production: duplicate DHCP leases, SSH key conflicts, systemd journal collisions.

7. Deploying clones

You have a golden image. Now stamp it onto 10, 50, or 500 machines. Different methods for different platforms, same result everywhere.

ZFS clone (KVM hosts with ZFS)

# If your KVM host runs ZFS, cloning is instant and free:

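# Create a zvol big enough for the image's virtual size (40 GiB here),
# then write the golden image into it as raw blocks
sudo zfs create -p -V 40G tank/vms/golden
sudo qemu-img convert -f qcow2 -O raw golden.qcow2 /dev/zvol/tank/vms/golden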

# Snapshot it
sudo zfs snapshot tank/vms/golden@base

# Clone 10 VMs — instant, zero extra disk space
for i in $(seq 1 10); do
  name="node-$(printf '%02d' $i)"
  sudo zfs clone tank/vms/golden@base "tank/vms/${name}"

  # Create a VM using the clone as its disk
  virt-install \
    --name "$name" \
    --ram 4096 --vcpus 2 \
    --cpu host-passthrough \
    --os-variant centos-stream9 \
    --machine q35 \
    --boot uefi,loader.secure=no \
    --disk "/dev/zvol/tank/vms/${name}",bus=virtio \
    --network default \
    --graphics none \
    --noautoconsole \
    --import
done

# Each clone uses copy-on-write — only changed blocks cost space.
# 10 VMs from a 1.8 GB image: still ~1.8 GB total on disk.

Proxmox template clone

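# Swap the installer ISO for a cloud-init drive (so clones get per-VM
# identity), boot from disk, then convert the golden VM to a template
pvesh set /nodes/pve/qemu/200/config --ide2 local-lvm:cloudinit --boot order=scsi0
pvesh create /nodes/pve/qemu/200/template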

# Clone 10 VMs from the template
for i in $(seq 1 10); do
  vmid=$((200 + i))
  name="node-$(printf '%02d' $i)"
  pvesh create /nodes/pve/qemu/200/clone \
    --newid "$vmid" \
    --name "$name" \
    --full true
  # Start each clone
  pvesh create /nodes/pve/qemu/${vmid}/status/start
done

# Cloud-init re-personalizes each clone on first boot:
# — new machine-id
# — new SSH host keys
# — hostname from Proxmox VM name

Cloud deployment (qcow2 to AMI/VHD/GCP)

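# AWS: upload the raw disk, import it as an EBS snapshot, then register a
# UEFI AMI from that snapshot (needs the vmimport service role; import-image's
# OS detection is unlikely to recognize a ZFS root, and import-snapshot skips it)
qemu-img convert -f qcow2 -O raw golden.qcow2 golden.raw
aws s3 cp golden.raw s3://my-images/golden.raw
aws ec2 import-snapshot \
  --description "kldload golden image" \
  --disk-container "Format=RAW,UserBucket={S3Bucket=my-images,S3Key=golden.raw}"
# Poll until the task completes and note the SnapshotId it reports
aws ec2 describe-import-snapshot-tasks --import-task-ids import-snap-XXXXX
aws ec2 register-image --name kldload-golden --architecture x86_64 \
  --virtualization-type hvm --boot-mode uefi --ena-support \
  --root-device-name /dev/xvda \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-XXXXX}"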

# Azure: convert to VHD
qemu-img convert -f qcow2 -O vpc -o subformat=fixed,force_size golden.qcow2 golden.vhd
az storage blob upload --account-name myaccount --container-name images \
  --type page --file golden.vhd --name golden.vhd
# UEFI-only images need a Generation 2 (Hyper-V V2) image
az image create --name kldload-golden --resource-group mygroup \
  --source "https://myaccount.blob.core.windows.net/images/golden.vhd" --os-type Linux \
  --hyper-v-generation V2

# GCP: convert to raw, tar, upload
qemu-img convert -f qcow2 -O raw golden.qcow2 disk.raw
tar --format=oldgnu -Sczf golden.tar.gz disk.raw
gsutil cp golden.tar.gz gs://my-images/
gcloud compute images create kldload-golden \
  --source-uri gs://my-images/golden.tar.gz --guest-os-features UEFI_COMPATIBLE

# Then launch instances from the new image, e.g. 10 on AWS:
aws ec2 run-instances --image-id ami-XXXXX --count 10 --instance-type m5.large

ZFS clones are the magic trick here. On a KVM host with ZFS, cloning a VM is a metadata operation. It takes less than a second regardless of image size. The clone shares all blocks with the original, so only writes made after the clone point cost additional space. You can run 50 VMs from a single golden image and pay for storage once, plus deltas. ext4 needs a full copy. XFS reflinks and btrfs can share blocks too, but on a ZFS host the clones also sit inside the same snapshot, compression, and send/receive workflow as the rest of the system.

8. Unattended path

Same build, fully automated. Write an answers file, put it on a FAT32 USB labeled KLDLOAD-SEED, boot alongside the ISO. Zero interaction. The machine installs itself and powers off (or reboots, if auto-install mode is set).

Complete answers.env example

# answers.env — drop this on a FAT32 USB labeled KLDLOAD-SEED
# ═══════════════════════════════════════════════════════════════

# ── Core ───────────────────────────────────────────────────────
KLDLOAD_DISTRO=debian
KLDLOAD_PROFILE=server
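# Confirm device names with lsblk first: with the ISO stick and the seed USB
# plugged in, /dev/sdX letters can shift
KLDLOAD_DISK=/dev/sda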
KLDLOAD_HOSTNAME=web-prod-01
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD='correct-horse-battery-staple'
KLDLOAD_TIMEZONE=America/Vancouver
KLDLOAD_LOCALE=en_US.UTF-8
KLDLOAD_KEYBOARD_LAYOUT=us

# ── SSH ────────────────────────────────────────────────────────
KLDLOAD_SSH_PUBKEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIExampleKeyHere ops@infra"
KLDLOAD_ADMIN_SSH_PUBKEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAISecondKeyHere admin@workstation"

# ── ZFS ────────────────────────────────────────────────────────
KLDLOAD_ZFS_TOPOLOGY=mirror
KLDLOAD_ZFS_DATA_DISKS=/dev/sdb
KLDLOAD_ZFS_ENCRYPT=0
KLDLOAD_FORCE_WIPE=1

# ── Networking ─────────────────────────────────────────────────
KLDLOAD_NET_METHOD=static
KLDLOAD_NET_IFACE=ens3
KLDLOAD_NET_IP=10.0.1.50
KLDLOAD_NET_PREFIX=24
KLDLOAD_NET_GW=10.0.1.1
KLDLOAD_NET_DNS=1.1.1.1,8.8.8.8

# ── Features ───────────────────────────────────────────────────
KLDLOAD_ENABLE_KVM=0
KLDLOAD_ENABLE_EBPF=1
KLDLOAD_ENABLE_AI=0
KLDLOAD_NVIDIA_DRIVERS=0
KLDLOAD_WIREGUARD=1

# ── Packages ───────────────────────────────────────────────────
KLDLOAD_EXTRA_PACKAGES="nginx postgresql redis-server"
KLDLOAD_KEEP_DARKSITE=0

# ── Export (golden image) ──────────────────────────────────────
KLDLOAD_EXPORT_FORMAT=qcow2
KLDLOAD_EXPORT_SCP_HOST=images.example.com
KLDLOAD_EXPORT_SCP_USER=root
KLDLOAD_EXPORT_SCP_PATH=/var/lib/images/
KLDLOAD_EXPORT_SCP_KEY=/root/.ssh/id_ed25519
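Before the file goes on the stick, it is worth catching typos and missing keys on your workstation. A minimal sketch, assuming the file stays in the simple KEY=VALUE form shown above. The REQUIRED set and the mirror and static-network rules are assumptions read off this example, not the installer's own validation logic.

```python
# Sketch: parse answers.env and flag obvious problems before writing the seed USB.
# REQUIRED and the consistency rules below are illustrative assumptions.
import shlex

REQUIRED = {"KLDLOAD_DISTRO", "KLDLOAD_PROFILE", "KLDLOAD_DISK",
            "KLDLOAD_HOSTNAME", "KLDLOAD_USERNAME", "KLDLOAD_PASSWORD"}


def parse_answers(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, honouring shell-style quotes and # comments."""
    answers = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank or comment-only line
        if len(tokens) != 1 or "=" not in tokens[0]:
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {line!r}")
        key, value = tokens[0].split("=", 1)
        answers[key] = value
    return answers


def validate(answers: dict[str, str]) -> list[str]:
    problems = [f"missing {key}" for key in sorted(REQUIRED - set(answers))]
    if answers.get("KLDLOAD_ZFS_TOPOLOGY") == "mirror" and not answers.get("KLDLOAD_ZFS_DATA_DISKS"):
        problems.append("mirror topology needs KLDLOAD_ZFS_DATA_DISKS")
    if answers.get("KLDLOAD_NET_METHOD") == "static":
        for key in ("KLDLOAD_NET_IP", "KLDLOAD_NET_PREFIX", "KLDLOAD_NET_GW"):
            if not answers.get(key):
                problems.append(f"static networking needs {key}")
    return problems
```

Call it as `validate(parse_answers(open("answers.env").read()))` and fix anything it returns before copying the file. Using `shlex` means quoted values such as the SSH keys come back as a single string, the same way a shell would read them.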

Create the seed USB

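# Format a USB partition as FAT32 with the magic label.
# Double-check the device with lsblk first: this erases /dev/sdc1.
# Note: FAT volume labels hold at most 11 characters and "KLDLOAD-SEED"
# has 12, so check which label your kldload version's autoinstall
# service looks for.
sudo mkfs.vfat -F 32 -n KLDLOAD-SEED /dev/sdc1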

# Mount and write the answers file
sudo mount /dev/sdc1 /mnt
sudo cp answers.env /mnt/answers.env
sudo umount /mnt

# Now you have two USB drives:
# 1. The kldload ISO (boot drive)
# 2. The seed USB (answers file)

# Plug both into the target machine. Boot from the ISO.
# The autoinstall service detects the seed disk, reads answers.env,
# and runs the installer with zero interaction.
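For a fleet, you will want one answers file per machine that differs only in hostname and address. A sketch, assuming the web-prod-NN naming and 10.0.1.0/24 addressing used above; `render_answers` and `fleet` are hypothetical helpers, not kldload tools. `shlex.quote()` wraps values that contain spaces, such as SSH keys and package lists, so they read back as one value. Secrets like KLDLOAD_PASSWORD are left out on purpose; add them from wherever you keep credentials.

```python
# Sketch: render one answers.env per machine from a shared base.
import shlex

BASE = {
    "KLDLOAD_DISTRO": "debian",
    "KLDLOAD_PROFILE": "server",
    "KLDLOAD_DISK": "/dev/sda",
    "KLDLOAD_USERNAME": "admin",
    "KLDLOAD_NET_METHOD": "static",
    "KLDLOAD_NET_PREFIX": "24",
    "KLDLOAD_NET_GW": "10.0.1.1",
    "KLDLOAD_EXTRA_PACKAGES": "nginx postgresql redis-server",
}


def render_answers(values: dict[str, str]) -> str:
    # shlex.quote leaves simple values bare and single-quotes anything with spaces
    lines = [f"{key}={shlex.quote(str(value))}" for key, value in values.items()]
    return "\n".join(lines) + "\n"


def fleet(base: dict[str, str], first_host: int, count: int) -> dict[str, str]:
    """Map hostname -> answers.env text for 10.0.1.<first_host> onwards."""
    files = {}
    for n in range(first_host, first_host + count):
        host = f"web-prod-{n:02d}"
        files[host] = render_answers({**base,
                                      "KLDLOAD_HOSTNAME": host,
                                      "KLDLOAD_NET_IP": f"10.0.1.{n}"})
    return files
```

Write each value of `fleet(BASE, 50, 10)` to its own stick as answers.env, and label the sticks by hostname so they go into the right machines.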

WebSocket API (network-based unattended install)

#!/usr/bin/env python3
# Skip the USB. Send install commands over the network.
# Boot machines from the ISO (IPMI virtual media, PXE, whatever).
# Then trigger the install via WebSocket (needs: pip install websockets):
import asyncio
import json

import websockets


async def install(host, config):
    async with websockets.connect(f"ws://{host}:8080/ws") as ws:
        await ws.send(json.dumps({"action": "install", **config}))
        # Stream progress messages until the installer reports completion
        async for msg in ws:
            data = json.loads(msg)
            print(f"[{host}] [{data.get('phase', '')}] {data.get('message', '')}")
            if data.get('status') == 'complete':
                break

config = {
    "distro": "debian",
    "disk": "/dev/sda",
    "hostname": "web-prod-01",
    "username": "admin",
    "password": "changeme",
    "profile": "server",
    "timezone": "America/Vancouver",
    "ssh_pubkey": "ssh-ed25519 AAAA... user@host",
    "zfs_topology": "mirror",
    "zfs_data_disks": "/dev/sdb",
    "net_method": "static",
    "net_ip": "10.0.1.50",
    "net_prefix": "24",
    "net_gw": "10.0.1.1",
    "net_dns": "1.1.1.1,8.8.8.8"
}


async def main():
    # Install 10 machines in parallel. Assumes each machine is reachable at
    # these addresses while it is still running the live ISO.
    hosts = [f"10.0.1.{i}" for i in range(50, 60)]
    configs = [{**config, "hostname": f"web-prod-{i:02d}", "net_ip": f"10.0.1.{i}"}
               for i in range(50, 60)]
    # asyncio.run() needs a coroutine, so run gather() inside one
    await asyncio.gather(*(install(h, c) for h, c in zip(hosts, configs)))

asyncio.run(main())
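If ten simultaneous installs would strain your package mirror or uplink, cap the concurrency. A minimal sketch: `run_bounded` wraps any per-host coroutine in an `asyncio.Semaphore`, and `return_exceptions=True` keeps one failed host from hiding the others' results. `fake_install` is a hypothetical stand-in for the `install()` coroutine so the example runs without a live ISO.

```python
# Sketch: run per-host installs with at most `limit` in flight at once.
import asyncio


async def run_bounded(hosts, worker, limit: int = 4):
    sem = asyncio.Semaphore(limit)

    async def guarded(host):
        async with sem:  # at most `limit` workers inside this block
            return await worker(host)

    # A failed host comes back as its exception object instead of aborting the batch
    return await asyncio.gather(*(guarded(h) for h in hosts), return_exceptions=True)


active = 0
peak = 0


async def fake_install(host):
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # pretend to stream install progress
    active -= 1
    if host.endswith(".53"):
        raise RuntimeError(f"{host}: install failed")
    return f"{host}: complete"


hosts = [f"10.0.1.{i}" for i in range(50, 60)]
results = asyncio.run(run_bounded(hosts, fake_install, limit=3))
for result in results:
    print(result)
```

To use it for real, pass a small wrapper such as `lambda h: install(h, configs_by_host[h])`, where `configs_by_host` is a hypothetical dict built like `configs` above.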

PXE boot with HTTP answers

# For large deployments with existing PXE infrastructure:

# 1. Extract the kernel and initramfs from the ISO
mount -o loop kldload-free-latest.iso /mnt
cp /mnt/images/pxeboot/vmlinuz /tftpboot/kldload/
cp /mnt/images/pxeboot/initrd.img /tftpboot/kldload/
umount /mnt

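# 2. Add a PXE menu entry
#    root=live:CDLABEL=KLDLOAD finds the live image on the ISO media. For a
#    pure network boot with no ISO attached, dracut's livenet module can
#    fetch it instead via root=live:http://<server>/<path>/squashfs.img
#    (livenet must be in the initramfs, typically alongside ip=dhcp).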
cat >> /tftpboot/pxelinux.cfg/default <<'EOF'
LABEL kldload
  MENU LABEL KLDload Install
  KERNEL kldload/vmlinuz
  INITRD kldload/initrd.img
  APPEND root=live:CDLABEL=KLDLOAD rd.live.image rd.live.overlay.overlayfs=1 kldload.answers=http://10.0.1.1/answers/answers.env
EOF

# 3. Serve the answers file via HTTP
# The installer downloads it on boot instead of reading from USB
cp answers.env /var/www/html/answers/answers.env
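If every machine needs its own answers file, pxelinux can help: before falling back to `pxelinux.cfg/default`, it looks for a file named after the client's MAC address (`01-` for the Ethernet ARP type, then the MAC in lowercase hex, dash-separated). A sketch that generates those names and a matching entry. The per-host URL layout `/answers/<hostname>.env` is an assumption, and the other kernel arguments are copied from the menu entry above.

```python
# Sketch: per-host pxelinux config files pointing at per-host answers files.
import re

# Same kernel arguments as the menu entry above, with a per-host answers URL
KERNEL_ARGS = ("root=live:CDLABEL=KLDLOAD rd.live.image rd.live.overlay.overlayfs=1 "
               "kldload.answers=http://10.0.1.1/answers/{host}.env")


def pxelinux_name(mac: str) -> str:
    """52:54:00:AB:CD:EF -> 01-52-54-00-ab-cd-ef (01 = Ethernet ARP type)."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(digits[i:i + 2] for i in range(0, 12, 2))


def pxe_entry(host: str) -> str:
    return "\n".join([
        "DEFAULT kldload",
        "LABEL kldload",
        "  KERNEL kldload/vmlinuz",
        "  INITRD kldload/initrd.img",
        "  APPEND " + KERNEL_ARGS.format(host=host),
        "",
    ])
```

Write `pxe_entry("web-prod-01")` to `/tftpboot/pxelinux.cfg/` plus `pxelinux_name(mac)`, and put that host's answers file at `/var/www/html/answers/web-prod-01.env`. Machines without a per-MAC file still get the shared default menu.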

Three delivery mechanisms for unattended install: USB seed disk, WebSocket API, PXE+HTTP. Same ISO. Same installer. Same answers format. The only difference is how the answers reach the machine. USB for air-gapped environments. WebSocket for machines already booted into the live ISO. PXE for datacenter-scale deployments with existing network boot infrastructure. Pick the one that matches your environment.

The USB approach is underrated. No network infrastructure required. No PXE server. No TFTP. No DHCP options. Write a file to a USB stick. Plug it in. Boot. Walk away. It is the simplest possible automation and it works in environments where nothing else does — air-gapped facilities, remote sites, field deployments.

9. Packer path

Packer automates the process of building golden images. Instead of manually booting the ISO and clicking through the web UI, Packer launches a VM, sends the install commands, waits for completion, and exports the result. This is how you integrate kldload into a CI/CD pipeline.

Complete Packer HCL template

# kldload.pkr.hcl — build a kldload golden image with Packer

packer {
  required_plugins {
    qemu = {
      version = ">= 1.0.0"
      source  = "github.com/hashicorp/qemu"
    }
  }
}

variable "iso_url" {
  type    = string
  default = "live-build/output/kldload-free-centos-amd64-20260404.iso"
}

variable "iso_checksum" {
  type    = string
  default = "sha256:1080f7917d61aabe3c6fd6aeac4..."
}

source "qemu" "kldload" {
  iso_url          = var.iso_url
  iso_checksum     = var.iso_checksum
  output_directory = "output-kldload"
  vm_name          = "kldload-golden.qcow2"

  # VM hardware
  cpus             = 4
  memory           = 8192
  disk_size        = "40G"
  format           = "qcow2"
  disk_discard     = "unmap"    # pass guest TRIM through so the qcow2 can shrink
  accelerator      = "kvm"
  machine_type     = "q35"

  # UEFI boot: the qemu plugin wires up the OVMF pflash drives and keeps its
  # own writable copy of the VARS file. Paths are for Fedora/RHEL edk2-ovmf;
  # Debian/Ubuntu ship the files under /usr/share/OVMF/.
  efi_boot          = true
  efi_firmware_code = "/usr/share/edk2/ovmf/OVMF_CODE.fd"
  efi_firmware_vars = "/usr/share/edk2/ovmf/OVMF_VARS.fd"

  # Network — Packer connects via SSH after install
  ssh_username     = "admin"
  ssh_password     = "changeme"
  ssh_timeout      = "20m"
  ssh_port         = 22
  headless         = true

  # Boot command: wait for the boot menu, press Enter, give the live ISO time to come up
  boot_wait        = "30s"
  boot_command     = ["<enter><wait120s>"]

  # Shutdown after provisioning
  shutdown_command  = "sudo shutdown -P now"
}

build {
  sources = ["source.qemu.kldload"]

  # Wait for the live ISO to boot, then trigger the unattended install.
  # Caveat: Packer runs provisioners only after its SSH communicator connects,
  # so this step works only if the live ISO itself accepts SSH as admin/changeme
  # and guest port 8080 is forwarded to host port 8080 (by default the qemu
  # builder forwards only SSH). Otherwise use the seed ISO approach below.
  provisioner "shell-local" {
    inline = [
      "sleep 60",
      "python3 -c \"",
      "import asyncio, websockets, json",
      "async def install():",
      "    async with websockets.connect('ws://127.0.0.1:8080/ws') as ws:",
      "        await ws.send(json.dumps({",
      "            'action': 'install',",
      "            'distro': 'centos',",
      "            'disk': '/dev/vda',",
      "            'hostname': 'golden',",
      "            'username': 'admin',",
      "            'password': 'changeme',",
      "            'profile': 'server'",
      "        }))",
      "        async for msg in ws:",
      "            data = json.loads(msg)",
      "            if data.get('status') == 'complete': break",
      "asyncio.run(install())",
      "\""
    ]
  }

  # Post-install provisioning (runs via SSH after reboot)
  provisioner "shell" {
    inline = [
      "sudo dnf install -y nginx",
      "sudo systemctl enable nginx",
      "sudo ksnap"
    ]
  }

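  # Seal the image for cloning. The sudo calls here (and in shutdown_command)
  # assume the admin user has passwordless sudo. fstrim does not reach ZFS
  # datasets, so zpool trim releases the freed blocks instead.
  provisioner "shell" {
    inline = [
      "sudo truncate -s 0 /etc/machine-id",
      "sudo rm -f /etc/ssh/ssh_host_*",
      "sudo rm -f /var/lib/dbus/machine-id",
      "if command -v cloud-init >/dev/null; then sudo cloud-init clean --logs; fi",
      "sudo zpool trim -w rpool || true"
    ]
  }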
}
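The `iso_checksum` default above is truncated, so compute the real value for your own ISO. Packer accepts the `sha256:<hex>` form; this sketch hashes in 1 MiB chunks so a multi-gigabyte ISO never has to fit in memory. `packer_checksum` is a hypothetical helper name.

```python
# Sketch: produce the "sha256:<hex>" string for Packer's iso_checksum.
import hashlib


def packer_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as iso:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: iso.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()
```

Pass the result on the command line with `packer build -var "iso_checksum=sha256:..." kldload.pkr.hcl`, or paste it into the variable default, so every rebuild of the ISO gets a matching checksum.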

Run the Packer build

# Initialize Packer plugins
packer init kldload.pkr.hcl

# Validate the template
packer validate kldload.pkr.hcl

# Build the golden image
packer build kldload.pkr.hcl

# Output:
# ==> qemu.kldload: Creating hard drive output-kldload/kldload-golden.qcow2
# ==> qemu.kldload: Booting from ISO...
# ==> qemu.kldload: Waiting for SSH to become available...
# ==> qemu.kldload: Connected to SSH!
# ==> qemu.kldload: Provisioning with shell script...
# ==> qemu.kldload: Gracefully halting virtual machine...
# Build 'qemu.kldload' finished after 12 minutes 34 seconds.
#
# ==> Wait completed after 12 minutes 34 seconds
# ==> Builds finished. The artifacts of successful builds are:
# --> qemu.kldload: output-kldload/kldload-golden.qcow2

ls -lh output-kldload/kldload-golden.qcow2
# -rw-r--r-- 1 user user 1.8G kldload-golden.qcow2

Simpler alternative: answers file + boot_command

Instead of the WebSocket approach, put the answers file on a small second CD labeled KLDLOAD-SEED (or serve it over HTTP, as in the PXE method):

# Create a seed ISO with the answers file
mkdir -p /tmp/seed
cat > /tmp/seed/answers.env <<'EOF'
KLDLOAD_DISTRO=centos
KLDLOAD_PROFILE=server
KLDLOAD_DISK=/dev/vda
KLDLOAD_HOSTNAME=golden
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=changeme
KLDLOAD_FORCE_WIPE=1
EOF

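# Option A: build the seed ISO yourself (Joliet + Rock Ridge keep the
# lowercase filename intact)
genisoimage -J -r -V KLDLOAD-SEED -o seed.iso /tmp/seed/

# Option B: let Packer build and attach it. cd_files lists the files to put
# on a generated CD (the host needs xorriso, mkisofs, hdiutil or oscdimg),
# so point it at answers.env, not at seed.iso:
# cd_files = ["/tmp/seed/answers.env"]
# cd_label = "KLDLOAD-SEED"
# The ISO boots, finds the seed disk, installs unattended.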

Packer is the bridge between "I built an image by hand" and "my CI/CD pipeline builds images automatically." The Packer template encodes every decision you would make in the web UI. Run it on every commit, every release, every Tuesday at 3am. The output is a tested, sealed golden image ready for deployment. Same image, every time, no human involved.

The seed ISO approach is cleaner than the WebSocket approach for Packer. The answers file is a static artifact, easy to version control, easy to audit. The ISO boots, finds the answers, installs, and reboots into the new system. (For Packer the install has to end in a reboot rather than a power-off, because Packer needs the VM running to connect.) Packer waits for SSH to come up after the reboot, runs any additional provisioners, seals, and exports. Clean pipeline.

10. Verification

Never trust. Always verify. Here is the complete verification checklist for a kldload system. Run these after every install, every clone, every upgrade.

Pool status

# The pool must be ONLINE with zero errors
sudo zpool status
# Expected: state: ONLINE, errors: No known data errors
# All vdevs should show 0 READ 0 WRITE 0 CKSUM

# Quick health summary: prints only pools that have problems
sudo zpool status -x
# all pools are healthy

# Verify datasets are mounted correctly
zfs list -o name,mountpoint,mounted
# NAME                     MOUNTPOINT  MOUNTED
# rpool/ROOT/web-prod-01   /           yes
# rpool/home               /home       yes
# rpool/srv                /srv        yes
# rpool/var                /var        yes
# rpool/var/log            /var/log    yes

# Verify compression is working
zfs get compressratio rpool
# NAME   PROPERTY       VALUE  SOURCE
# rpool  compressratio  1.79x  -
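`zpool status -x` only tells you whether something is wrong. To see which device is misbehaving, especially across a fleet, parse the vdev table. A minimal sketch, assuming the standard NAME STATE READ WRITE CKSUM layout; it flags any row that is not ONLINE or has a non-zero counter. That also flags hot spares (state AVAIL), so treat it as a starting point rather than a finished monitor.

```python
# Sketch: list vdevs that are not ONLINE or have non-zero READ/WRITE/CKSUM
# counters, parsed from plain `zpool status` text.
def vdev_errors(status_text: str) -> list[tuple[str, str, str, str, str]]:
    problems = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row of a pool's vdev table
            continue
        if not in_table:
            continue
        if not fields or fields[0] == "errors:":
            in_table = False  # a blank line or the errors: footer ends the table
            continue
        if len(fields) >= 5:  # skips section labels such as "logs" or "cache"
            name, state, read, write, cksum = fields[:5]
            if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
                problems.append((name, state, read, write, cksum))
    return problems
```

Feed it `subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout` locally, or the same command's output collected over SSH from each node.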

Service status

# Critical services that must be running
for svc in sshd zfs-zed sanoid.timer chronyd; do
  systemctl is-active --quiet "$svc" && echo "OK: $svc" || echo "FAIL: $svc"
done
# OK: sshd
# OK: zfs-zed
# OK: sanoid.timer
# OK: chronyd

# Desktop profile: also check
systemctl is-active gdm
# active

# Check for failed services
systemctl --failed
# Expected: 0 loaded units listed.

Network connectivity

# Verify IP address
ip -4 addr show scope global
# inet 10.0.1.50/24 brd 10.0.1.255 scope global ens3

# Verify default route
ip route show default
# default via 10.0.1.1 dev ens3 proto static metric 100

# Verify DNS resolution
dig +short google.com
# 142.250.80.46

# Verify SSH is listening
sudo ss -tlnp | grep :22    # root is needed to show the owning process
# LISTEN 0 128 *:22 *:* users:(("sshd",pid=1234,fd=3))

# Verify firewall rules (nftables)
sudo nft list ruleset | head -20

# If WireGuard is configured:
sudo wg show
# interface: wg0
#   latest handshake: 3 seconds ago
#   transfer: 1.23 MiB received, 456.78 KiB sent

ZFS module verification

# Verify ZFS kernel module is loaded
lsmod | grep ^zfs
# zfs  4358144  6

# Verify module version matches userspace
cat /sys/module/zfs/version
# 2.2.7-1
zfs version
# zfs-2.2.7-1
# zfs-kmod-2.2.7-1

# Verify DKMS is tracking the module
dkms status
# zfs/2.2.7, 5.14.0-687.el9.x86_64, x86_64: installed

# Verify the initramfs has ZFS
lsinitrd /boot/initramfs-$(uname -r).img | grep zfs.ko
# -rw-r--r-- 1 root root 2490976 zfs.ko.xz
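A userspace/kernel mismatch usually means ZFS packages were upgraded without a reboot, or DKMS built a different version than the tools. A sketch that compares the outputs above, assuming `zfs version` prints `zfs-<version>` and `zfs-kmod-<version>` lines as shown; `zfs_versions` and `versions_match` are hypothetical helper names.

```python
# Sketch: check that /sys/module/zfs/version, the kmod line, and the
# userspace line all report the same OpenZFS release.
def zfs_versions(zfs_version_output: str, sysfs_version: str) -> dict[str, str]:
    versions = {"sysfs": sysfs_version.strip()}
    for line in zfs_version_output.splitlines():
        line = line.strip()
        if line.startswith("zfs-kmod-"):  # test this prefix first: it also starts with "zfs-"
            versions["kmod"] = line[len("zfs-kmod-"):]
        elif line.startswith("zfs-"):
            versions["userspace"] = line[len("zfs-"):]
    return versions


def versions_match(versions: dict[str, str]) -> bool:
    # A missing entry shows up as None and makes the set larger than one
    return len({versions.get(k) for k in ("sysfs", "kmod", "userspace")}) == 1
```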

Monitoring verification

# Verify Sanoid is taking snapshots
sudo sanoid --cron --verbose --readonly
# INFO: Snapshot rpool/ROOT/web-prod-01@autosnap_2026-04-04_14:00:00_hourly would be taken

# Check existing snapshots
zfs list -t snapshot -o name,creation,used | head -10

# Verify ZFS event daemon is watching for errors
journalctl -u zfs-zed --since "1 hour ago" --no-pager

# Quick disk health check
sudo smartctl -H /dev/sda   # SMART health (bare metal only)
# SMART overall-health self-assessment test result: PASSED

One-liner: full system verification

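# Run all checks in one shot (the chain stops at the first failing check).
# Units are checked one at a time because "systemctl is-active a b c"
# succeeds as soon as any one of them is active.
sudo kst && \
  zpool status -x | grep -q "all pools are healthy" && echo "POOL: OK" && \
  ( for s in sshd zfs-zed sanoid.timer chronyd; do systemctl is-active --quiet "$s" || exit 1; done ) && echo "SERVICES: OK" && \
  ip route show default | grep -q via && echo "NETWORK: OK" && \
  lsmod | grep -q ^zfs && echo "ZFS MODULE: OK" && \
  echo "=== ALL CHECKS PASSED ==="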

# Or use this script on multiple nodes:
for node in 10.0.1.{50..59}; do
  echo "--- $node ---"
  ssh -o ConnectTimeout=5 admin@$node "sudo kst && zpool status -x" 2>/dev/null || echo "FAILED or UNREACHABLE"
done
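The && chain stops at the first failure and says nothing about the checks after it. If you want a full report, a minimal sketch: run each check as its own shell command and record pass or fail. `CHECKS` mirrors the one-liner above and `run_checks` is a hypothetical helper, not a kldload tool; for remote nodes, prefix each command with your `ssh admin@$node` invocation.

```python
# Sketch: run every check and report each result instead of stopping early.
import subprocess

CHECKS = [
    ("POOL", "zpool status -x | grep -q 'all pools are healthy'"),
    ("SERVICES", "for s in sshd zfs-zed sanoid.timer chronyd; do "
                 "systemctl is-active --quiet \"$s\" || exit 1; done"),
    ("NETWORK", "ip route show default | grep -q via"),
    ("ZFS MODULE", "lsmod | grep -q '^zfs'"),
]


def run_checks(checks) -> dict[str, bool]:
    results = {}
    for name, command in checks:
        # shell=True because the checks use pipes and loops, like the one-liner
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        results[name] = proc.returncode == 0
        print(f"{name}: {'OK' if results[name] else 'FAIL'}")
    return results
```

`all(run_checks(CHECKS).values())` gives you a single pass/fail for scripts, while the printed lines show exactly which check broke.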

Verification is not optional. Every kldload system should pass every check on this list. If any check fails, something is wrong and you should fix it before moving on. A pool with errors is at risk of losing data. A ZFS module missing from the initramfs leaves the machine unbootable after a kernel upgrade. A failed sanoid timer means no automatic snapshots, and no safety net. These checks take 30 seconds. Running them saves hours of debugging later.

11. Common customizations

The ten most common post-install tasks with exact commands. Take a snapshot (sudo ksnap) before each one so you can roll back.

1. Add a user

# wheel is the admin group on RHEL-family distros; use sudo on Debian/Ubuntu
sudo useradd -m -s /bin/bash -G wheel deploy
echo 'deploy:secure-password-here' | sudo chpasswd
# Add SSH key
sudo mkdir -p /home/deploy/.ssh
echo "ssh-ed25519 AAAA... deploy@ci" | sudo tee /home/deploy/.ssh/authorized_keys
sudo chown -R deploy:deploy /home/deploy/.ssh
sudo chmod 700 /home/deploy/.ssh
sudo chmod 600 /home/deploy/.ssh/authorized_keys

2. Configure NFS exports

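# Install the NFS server first (both methods below need it)
sudo dnf install -y nfs-utils    # or apt install nfs-kernel-server
sudo systemctl enable --now nfs-server

# Option A: let ZFS manage the export. rpool/srv is mounted at /srv,
# so this dataset lands at /srv/shared
sudo zfs create -o compression=zstd -o sharenfs="rw=@10.0.1.0/24" rpool/srv/shared

# Option B: plain dataset plus a traditional /etc/exports entry
sudo zfs create -o compression=zstd rpool/srv/shared
echo '/srv/shared 10.0.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# Verify
showmount -e localhost
# /srv/shared 10.0.1.0/24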

3. Set up Docker / Podman on ZFS

# Create a ZFS dataset for container storage. rpool/var/lib is not in the
# dataset list under Verification, and zfs create will not create missing
# parents, so use a flat dataset with an explicit mountpoint
sudo zfs create -o compression=zstd -o mountpoint=/var/lib/containers rpool/containers
# or for Docker:
sudo zfs create -o compression=zstd -o mountpoint=/var/lib/docker rpool/docker

# Podman is already installed on server/desktop profiles
podman run --rm docker.io/library/hello-world

# For Docker (if you prefer it):
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io

# Configure the ZFS storage driver before Docker's first start;
# zfs.fsname is the dataset Docker creates its layer datasets under
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "zfs",
  "storage-opts": ["zfs.fsname=rpool/docker"]
}
EOF
sudo systemctl enable --now docker

# Verify ZFS storage driver
sudo docker info | grep "Storage Driver"
# Storage Driver: zfs

4. Install NVIDIA GPU drivers

# CentOS/RHEL/Rocky — install from RPMFusion
sudo dnf install -y epel-release
sudo dnf install -y https://mirrors.rpmfusion.org/free/el/rpmfusion-free-release-9.noarch.rpm \
  https://mirrors.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-9.noarch.rpm
sudo dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda

# akmods (not DKMS) builds the kernel module in the background (a few minutes);
# force the build and wait for it, then confirm the module exists
sudo akmods --force
modinfo -F version nvidia
# 550.xxx

# Rebuild initramfs to include nvidia
sudo dracut -f

# Reboot and verify
sudo reboot
nvidia-smi
# +-----------------------------------------------+
# | NVIDIA-SMI 550.xxx   Driver: 550.xxx           |
# | GPU: NVIDIA GeForce RTX 4090                   |

5. Configure backup with Syncoid

# One-time setup: SSH key for automated replication
sudo ssh-keygen -t ed25519 -f /root/.ssh/syncoid -N ""
sudo ssh-copy-id -i /root/.ssh/syncoid.pub root@backup-server

# Test a manual sync
sudo syncoid --recursive --sshkey /root/.ssh/syncoid \
  rpool root@backup-server:tank/backups/$(hostname)

# Automate with systemd timer (see Section 5 above for full config)
sudo systemctl enable --now syncoid-backup.timer

6. Set up a web server with TLS

# Create a dataset for web content
sudo zfs create -o compression=zstd rpool/srv/www

# Install nginx and certbot
sudo dnf install -y nginx certbot python3-certbot-nginx
sudo systemctl enable --now nginx

# Get a TLS certificate
sudo certbot --nginx -d example.com -d www.example.com \
  --non-interactive --agree-tos --email admin@example.com

# Auto-renewal is configured automatically by certbot
sudo systemctl status certbot-renew.timer

7. Configure a firewall

# nftables is already installed. Create a basic ruleset:
sudo tee /etc/nftables/kldload.nft <<'EOF'
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif lo accept
    tcp dport 22 accept       comment "SSH"
    tcp dport 80 accept       comment "HTTP"
    tcp dport 443 accept      comment "HTTPS"
    udp dport 51820 accept    comment "WireGuard"
    icmp type echo-request accept
    meta l4proto ipv6-icmp accept comment "ICMPv6: neighbor discovery, ping"
    counter drop
  }
  chain forward {
    type filter hook forward priority 0; policy drop;
    # policy drop also blocks forwarded Docker/libvirt traffic; accept it here if you run either
  }
  chain output {
    type filter hook output priority 0; policy accept;
  }
}
EOF

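sudo nft -f /etc/nftables/kldload.nft

# Load it at boot: on RHEL-family systems nftables.service reads
# /etc/sysconfig/nftables.conf (on Debian/Ubuntu, /etc/nftables.conf)
echo 'include "/etc/nftables/kldload.nft"' | sudo tee -a /etc/sysconfig/nftables.conf
sudo systemctl enable nftables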

8. Set up KVM for running VMs

# Install KVM/libvirt (if not enabled during install)
sudo dnf install -y qemu-kvm libvirt virt-install
sudo systemctl enable --now libvirtd

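# Create a ZFS dataset for VM images (flat name with an explicit mountpoint,
# since zfs create will not create missing parents like rpool/var/lib)
sudo zfs create -o compression=zstd -o mountpoint=/var/lib/libvirt/images rpool/vm-images
sudo restorecon -R /var/lib/libvirt/images   # RHEL-family with SELinux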

# Verify KVM is working
sudo virt-host-validate
# QEMU: Checking for hardware virtualization : PASS
# QEMU: Checking if device /dev/kvm exists   : PASS

9. Configure ZFS encryption on a dataset

# Create an encrypted dataset (separate from root)
sudo zfs create -o encryption=aes-256-gcm -o keylocation=prompt \
  -o keyformat=passphrase rpool/secrets
# Enter passphrase: ••••••••

# The dataset is encrypted at rest. After each reboot, unlock and mount it:
sudo zfs load-key rpool/secrets
sudo zfs mount rpool/secrets

# Verify encryption
zfs get encryption,keystatus rpool/secrets
# NAME           PROPERTY    VALUE          SOURCE
# rpool/secrets  encryption  aes-256-gcm    -
# rpool/secrets  keystatus   available      -

10. Set up Prometheus + Grafana monitoring

# Create datasets for monitoring data
sudo zfs create -o compression=zstd rpool/srv/prometheus
sudo zfs create -o compression=zstd rpool/srv/grafana

# Install node_exporter (metrics source)
sudo dnf install -y golang-github-prometheus-node-exporter
sudo systemctl enable --now node_exporter

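# Run Prometheus + Grafana via podman (simplest path).
# The images run as non-root users (Prometheus as 65534 "nobody",
# Grafana as 472), so hand them their data directories first
sudo chown 65534:65534 /srv/prometheus
sudo chown 472:0 /srv/grafana
sudo podman run -d --name prometheus -p 9090:9090 \
  -v /srv/prometheus:/prometheus:Z \
  docker.io/prom/prometheus
# The image's default config only scrapes Prometheus itself; mount your own
# /etc/prometheus/prometheus.yml that adds node_exporter on port 9100

sudo podman run -d --name grafana -p 3000:3000 \
  -v /srv/grafana:/var/lib/grafana:Z \
  docker.io/grafana/grafana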

# Verify
curl -s http://localhost:9090/-/healthy  # Prometheus
curl -s http://localhost:3000/api/health # Grafana
# Open http://your-ip:3000 — default login admin/admin

12. Troubleshooting

Common issues during build, install, and first boot, with exact solutions.

Build: "No space left on device" during ISO build

# The build container needs ~15 GB of working space
# Docker/Podman defaults may not have enough overlay space

# Check available space in the container storage
df -h /var/lib/docker     # or /var/lib/containers for podman

# Fix: clean old images and containers
docker system prune -af   # or podman system prune -af

# Fix: move Docker storage to a larger disk. Setting "data-root" in
# /etc/docker/daemon.json is the supported way; a symlink also works:
sudo systemctl stop docker
sudo mv /var/lib/docker /big-disk/docker
sudo ln -s /big-disk/docker /var/lib/docker
sudo systemctl start docker

Build: ZFS DKMS fails to compile

# The build log shows: "DKMS make.log: Error 2"
# Usually a kernel header mismatch

# Check the build log inside the container
docker logs kldload-builder 2>&1 | grep -A 20 "DKMS"

# Cause: kernel-devel must match the kernel installed in the rootfs
# The build script installs kernel + kernel-devel together
# If the mirror has a newer kernel than kernel-devel, the build fails

# Solution: clean and rebuild
./deploy.sh clean
./deploy.sh builder-image
./deploy.sh build

Install: "No disks found" in the web UI

# The installer scans /sys/block for non-loop, non-ROM block devices
# VirtIO disks appear as /dev/vda, SCSI as /dev/sda, NVMe as /dev/nvme0n1

# Check from the live environment terminal:
lsblk
# If empty, the disk controller driver is missing from the live kernel

# Proxmox: put the disk on the VirtIO SCSI controller (scsihw=virtio-scsi-single) or attach it as VirtIO Block
# KVM: use bus=virtio (not ide or sata)
# Bare metal: check BIOS settings — AHCI mode, not RAID mode

Install: "pool creation failed" or "vdev too small"

# The installer needs at least ~2 GB for the pool after the ESP
# The installer creates a 512 MB ESP and uses the rest for ZFS

# Check that the disk is large enough:
lsblk -b /dev/vda | awk 'NR==2{print $4/1024/1024/1024 " GB"}'

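# If the disk has existing partitions or ZFS labels:
# The installer should wipe them, but if it fails, clear the ZFS label
# first: it lives on the old ZFS partition (e.g. /dev/vda2), which is
# gone once the partition table is zapped
sudo zpool labelclear -f /dev/vda2
sudo wipefs -af /dev/vda
sudo sgdisk --zap-all /dev/vda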

# Then retry the install

First boot: system drops to dracut emergency shell

# The initramfs couldn't import the ZFS pool
# Common causes:
# 1. ZFS module not in initramfs
# 2. Wrong pool name or dataset
# 3. Disk order changed (BIOS reordered devices)

# From the dracut shell, try:
modprobe zfs
zpool import -f rpool
exit   # dracut will retry booting

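# If zfs module is missing from initramfs:
# Boot back to the ISO, mount the installed system, rebuild initramfs.
# -N imports without mounting anything; root datasets are usually
# canmount=noauto, so mount yours explicitly (zfs list -r rpool/ROOT).
# If the pool is encrypted, run "sudo zfs load-key -a" after the import.
sudo zpool import -f -N -R /target rpool
sudo zfs mount rpool/ROOT/web-prod-01
sudo zfs mount -a
sudo mount --bind /dev /target/dev
sudo mount --bind /proc /target/proc
sudo mount --bind /sys /target/sys
sudo chroot /target dracut -f --regenerate-all
# Release the bind mounts, or the export fails with "pool is busy"
sudo umount /target/dev /target/proc /target/sys
sudo zpool export rpool
sudo reboot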

First boot: ZFSBootMenu doesn't appear

# UEFI firmware can't find the bootloader
# Check EFI boot entries from the live ISO:
efibootmgr -v
# Look for an entry named "KLDload" pointing to \EFI\kldload\vmlinuz.efi

# If missing, recreate the EFI entry:
sudo efibootmgr --create --disk /dev/vda --part 1 \
  --label "KLDload" --loader '\EFI\kldload\vmlinuz.efi'

# If the ESP is empty or corrupted:
# Mount it and check
sudo mount /dev/vda1 /mnt
ls -la /mnt/EFI/
# Should contain: kldload/ with vmlinuz.efi and initramfs.img

Runtime: ZFS module won't load after kernel upgrade

# After a kernel upgrade, DKMS must rebuild the ZFS module
# If it didn't, the module fails to load and ZFS is broken

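# Check DKMS status
dkms status
# zfs/2.2.7, 5.14.0-687.el9.x86_64: installed
# (no zfs line for 5.14.0-700.el9.x86_64)   ← problem

# Rebuild for the new kernel. Name it explicitly: until you reboot,
# $(uname -r) still reports the old kernel.
sudo dkms install zfs/2.2.7 -k 5.14.0-700.el9.x86_64

# Rebuild that kernel's initramfs with the new module
sudo dracut -f --kver 5.14.0-700.el9.x86_64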

# If you're stuck on the broken kernel, rollback:
# Reboot → ZFSBootMenu → select the pre-upgrade boot environment → boot
# The old kernel + working ZFS module loads. Fix the DKMS issue from there.
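To spot the gap before you reboot into it, compare `dkms status` against the kernels installed under /lib/modules. A sketch, assuming the `zfs/<version>, <kernel>[, <arch>]: <state>` formats shown on this page; older DKMS releases print `zfs, <version>, <kernel>, ...`, which this does not parse. `kernels_missing_zfs` is a hypothetical helper name.

```python
# Sketch: which installed kernels have no "installed" zfs entry in `dkms status`?
def kernels_missing_zfs(dkms_output: str, kernels: list[str]) -> list[str]:
    ready = set()
    for line in dkms_output.splitlines():
        head, _, state = line.partition(":")
        parts = [p.strip() for p in head.split(",")]
        # parts[0] is "zfs/<version>", parts[1] the kernel it was built for
        if len(parts) >= 2 and parts[0].startswith("zfs/") and state.strip().startswith("installed"):
            ready.add(parts[1])
    return [k for k in kernels if k not in ready]
```

Pass `os.listdir("/lib/modules")` as `kernels`; anything it returns needs `dkms install` and a fresh initramfs before you boot that kernel.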

Runtime: pool shows DEGRADED or FAULTED

# Check what's wrong
sudo zpool status -v
# Look for: state: DEGRADED or state: FAULTED
# Look for: vdevs with non-zero error counts

# DEGRADED mirror — one disk failed, data still accessible
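# Replace the failed disk (on a partitioned boot pool, partition the new disk
# like the old one and replace the ZFS partition, not the whole disk):
sudo zpool replace rpool /dev/sdb /dev/sdc   # sdc is the new disk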
# Wait for resilver to complete:
sudo zpool status
# scan: resilver in progress, 45% done, 00:03:22 to go

# Scrub to verify data integrity
sudo zpool scrub rpool
# Wait for completion:
sudo zpool status
# scan: scrub repaired 0B, 100.00% done, no errors

Runtime: can't roll back — no snapshots

# If sanoid.timer isn't running, no automatic snapshots are being taken

# Check the timer
systemctl status sanoid.timer
# If inactive or failed:
sudo systemctl enable --now sanoid.timer

# Take an immediate manual snapshot
sudo ksnap

# Check that the Sanoid config exists
cat /etc/sanoid/sanoid.conf
# If missing, recreate it:
sudo tee /etc/sanoid/sanoid.conf <<'EOF'
[rpool/ROOT]
    use_template = production
    recursive = yes
[template_production]
    frequently = 0
    hourly = 24
    daily = 30
    monthly = 3
    yearly = 0
    autosnap = yes
    autoprune = yes
EOF

sudo systemctl restart sanoid.timer
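To sanity-check a retention policy, add up what it keeps. A sketch with a deliberately tiny parser for the flat `[section]` / `key = value` layout above, not a full sanoid.conf implementation. Only periods set in the template are counted; anything left out falls back to sanoid's built-in defaults, which this does not model. For the production template that is 24 + 30 + 3 = 57 snapshots per dataset, and with `recursive = yes` each child of rpool/ROOT keeps its own set.

```python
# Sketch: count how many snapshots a sanoid template retains per dataset.
PERIODS = ("frequently", "hourly", "daily", "weekly", "monthly", "yearly")


def parse_sanoid(text: str) -> dict[str, dict[str, str]]:
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = (s.strip() for s in line.split("=", 1))
            current[key] = value
    return sections


def retained(sections: dict[str, dict[str, str]], template: str) -> int:
    tpl = sections[f"template_{template}"]
    # Periods not set here fall back to sanoid defaults, which are not counted
    return sum(int(tpl[p]) for p in PERIODS if p in tpl)
```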

Web UI: won't load on port 8080

# Check if the web UI service is running
systemctl status kldload-webui.service

# If failed, check the log
journalctl -u kldload-webui.service --no-pager -n 50

# Common cause: websockets Python module missing or wrong version
python3 -c "import websockets; print(websockets.__version__)"
# Needs v11+ with websockets.http11 API

# Fix: reinstall websockets
sudo pip3 install --force-reinstall 'websockets>=11'

# Restart the service
sudo systemctl restart kldload-webui.service

# Verify
curl -s http://localhost:8080 | head -1
# <!DOCTYPE html>

Most problems fall into three categories: build issues (disk space, package mirrors, DKMS compilation), install issues (disk detection, partition remnants, firmware configuration), and runtime issues (kernel upgrades breaking DKMS, pool degradation, missing snapshots). The fix for almost every runtime issue is the same: boot the last known-good snapshot via ZFSBootMenu and fix the problem from a working system. This is why boot environments and snapshots are not optional features — they are your escape hatch when things go wrong.

The complete picture

You started with nothing. You now have:

A custom Linux appliance with ZFS on root, boot environments, automatic snapshots, 30+ CLI tools, encrypted WireGuard networking, and your own packages. It is deployed to bare metal, KVM, Proxmox, or any cloud. You have a golden image for cloning, an unattended pipeline for repeatable installs, a Packer template for CI/CD integration, and verification scripts that prove every system is healthy. Every node is identical. Every node can roll back a bad upgrade in 15 seconds. Every node replicates to a backup server hourly. The whole thing is auditable bash scripts you can read and modify.

You built the image once. You deployed it everywhere. Same artifact. Every platform. Every time.

Total time from zero to production: ~45 minutes. 10 minutes to build the ISO. 5 minutes to boot and install. 5 minutes to verify. 10 minutes for golden image export. 10 minutes to deploy clones to your fleet. 5 minutes to set up the unattended pipeline for next time. The rest of your career to appreciate not having to do it again.

You just did something that most organizations pay consultants to do over weeks. You built a custom Linux appliance from scratch with enterprise storage, encrypted networking, boot environments, automatic snapshots, and a full monitoring stack. You deployed it to every platform that matters. You built a golden image pipeline that produces identical clones. You wrote an unattended install pipeline and a Packer template for CI/CD. You have verification scripts that prove every system works. And you can read every line of code that made it happen.

The next time you need a machine — any machine, any distro, any platform — you don't start from scratch. You rebuild the ISO with your changes and deploy the image. Or you fire the Packer template. Or you plug in a seed USB and walk away. One source. One build. Every target. That's the image factory. That's kldload.
