kldload builds the base. Automation layers on top. Never the other way around.
kldload is not an automation tool. It is an image factory. It produces fully-configured, ZFS-on-root base images that every automation tool in existence can consume — Packer, Terraform, Ansible, Salt, Puppet, cloud-init, shell scripts, or nothing at all. The ISO does attended and unattended installs. The answer file decides. The golden image workflow produces cloud-ready templates. Everything downstream — provisioning, configuration management, CI/CD — starts from a known-good base that kldload already built.
The thesis: Most automation starts with a blank OS and pushes configuration to it after the fact. kldload starts with a fully-configured image and pushes it everywhere. Image-based deployment is faster, more reproducible, and more reliable than configuration management alone. The image IS the configuration. The machine boots, it is already done. Your automation tools — if you use them — connect to a finished machine, not a blank one.
Packer does not replace kldload. Terraform does not replace kldload. Ansible does not replace kldload. kldload feeds INTO all of them. It is the first stage of the pipeline. Everything else is second stage.
I spent years doing it backwards. Install a minimal OS. Run Ansible against it. Wait 20 minutes for 400 tasks to converge. Hope nothing fails. Hope the package mirror is up. Hope the GPG keys have not rotated. Hope the Jinja template renders correctly on this specific version of this specific distro. Then do it again on the next machine. And the next. And the next.
The day I switched to image-based deployment, I deleted 3,000 lines of Ansible. The image boots in 15 seconds and is already configured. Ansible still runs — but it handles application-layer changes on top of a known-good base. 40 tasks instead of 400. 2 minutes instead of 20. Zero failures from package mirrors or GPG keys or template rendering. The base image is immutable. It was tested once, when it was built. It works the same every time.
Automation philosophy
There are two models for infrastructure automation. Configuration management starts with a bare OS and pushes state to it: install packages, write config files, restart services, repeat. Image-based deployment starts with a pre-built image that already contains everything and stamps it onto machines. kldload is firmly in the second camp. It produces images. Automation tools consume them.
Configuration management (push model)
Install a minimal OS. Point Ansible/Salt/Puppet at it. Wait for convergence. Every machine rebuilds itself from scratch every time. Package downloads, template rendering, service restarts — all happen on every run. Slow. Fragile. Depends on network, package mirrors, and the config management tool itself being reachable.
Image-based deployment (stamp model)
Build the image once. Test it. Stamp it onto every machine. Boot time is deploy time. No package downloads, no template rendering, no convergence. The image is the artifact. It was tested when it was built. It is identical on every machine. ZFS makes cloning free. kldload makes building free.
The ideal architecture uses both models together. kldload builds the base image (OS, ZFS, WireGuard, boot environments, package holds, snapshot timers). Configuration management handles the application layer on top (deploy your app, write its config, manage its secrets). The base never changes. The application layer changes constantly. Two layers, two tools, clean separation.
Where kldload fits in the pipeline
kldload is Stage 0. It produces the raw material — a bootable, ZFS-on-root, fully-configured base system. Everything else is downstream:
Stage 0: kldload ISO  --> Install to disk / export golden image
Stage 1: Packer       --> Layer application packages onto the golden image
Stage 2: Terraform    --> Deploy the Packer artifact to infrastructure
Stage 3: Ansible      --> Push runtime config, secrets, app deploys
Stage 4: Monitoring   --> Observe everything with eBPF + Prometheus + Grafana
You can skip any stage. Use kldload alone with USB sticks for air-gapped deployments. Use kldload + Terraform for cloud. Use kldload + Ansible for existing infrastructure. The stages are independent. Mix and match.
The fundamental insight: Packer and Terraform cannot build what kldload builds. They do not know how to set up ZFS on root. They do not know how to configure boot environments. They do not know how to build a darksite. They do not know how to partition a disk with an ESP, set up ZFS pools, compile DKMS modules, and configure a bootloader that understands ZFS datasets. kldload does all of that. Packer and Terraform consume the result.
This is not a limitation. It is the correct architecture. The OS image factory is a specialized tool. The infrastructure orchestrator is a specialized tool. They do different jobs. Trying to make Terraform build an OS image is like trying to make a compiler also be a text editor. Use the right tool for each job.
Unattended installation
There are two ways to install: sit at the web UI and click through it, or write an answer file and never touch a keyboard. Same ISO. Same boot. Same result. The difference is whether a seed disk is present when the machine boots.
Most automation tools work backwards. You install the OS by hand, then push configuration to it with Ansible/Puppet/Salt/Chef after the fact. The OS is a blank canvas that gets painted by a remote orchestrator. That means: the orchestrator has to reach the machine (networking), authenticate to it (credentials), and run successfully (dependencies). Three things that can fail before your automation even starts.
kldload works forwards. The answer file is on a USB stick next to the ISO. The machine boots, finds the answers, installs itself. No network required. No orchestrator required. No credentials required. The machine configures itself from local data. When it comes up, it is already done. Your orchestrator — if you have one — connects to a finished machine, not a blank one.
This is the difference between push and pull. Push automation requires infrastructure to exist before it can create infrastructure. Pull automation requires a USB stick.
The answers file — complete reference
Seed disk: write a file, plug it in, walk away
Format a USB drive as FAT32, label it KLDLOAD-SEED, drop an answers.env file on it.
Insert it alongside the ISO. Boot. The system finds the seed disk, reads the answers,
installs to the target disk, and powers off. Zero interaction.
The answer file is environment variables. Every variable has a sane default. You only need to set what you want to change. Here is the complete reference:
# ═══════════════════════════════════════════════════════
# answers.env — kldload unattended install configuration
# ═══════════════════════════════════════════════════════
# ── Core ──────────────────────────────────────────────
KLDLOAD_DISTRO=debian # centos | debian | ubuntu | fedora | rhel | rocky | arch | alpine
KLDLOAD_PROFILE=server # desktop | server | core
KLDLOAD_DISK=/dev/sda # Target block device (wiped entirely)
KLDLOAD_HOSTNAME=web-prod-01 # System hostname
KLDLOAD_USERNAME=admin # Non-root admin user (gets passwordless sudo)
KLDLOAD_PASSWORD=changeme # Password for admin user
KLDLOAD_TIMEZONE=America/Vancouver # TZ database name
KLDLOAD_LOCALE=en_US.UTF-8 # System locale
KLDLOAD_KEYBOARD_LAYOUT=us # XKB keyboard layout
KLDLOAD_KEYBOARD_VARIANT= # XKB keyboard variant (optional)
# ── SSH ───────────────────────────────────────────────
KLDLOAD_SSH_PUBKEY="ssh-ed25519 AAAA... user@host"
KLDLOAD_ADMIN_SSH_PUBKEY="ssh-ed25519 AAAA... ops@infra"
KLDLOAD_GENERATE_SSH_KEY=1 # 1 = generate host keys on install
# ── ZFS ───────────────────────────────────────────────
KLDLOAD_STORAGE_MODE=zfs # zfs (only supported mode)
KLDLOAD_ZFS_TOPOLOGY=single # single | mirror | raidz1 | mirror-stripe
KLDLOAD_ZFS_ENCRYPT=0 # 1 = native ZFS encryption (passphrase prompt on boot)
KLDLOAD_ZFS_DATA_DISKS= # Additional disks for mirror/raidz (space-separated)
KLDLOAD_ZFS_SPECIAL_DISKS= # Special vdev (metadata) disks (space-separated)
KLDLOAD_ENABLE_ZFS=1 # Always 1 — ZFS is non-optional
KLDLOAD_FORCE_WIPE=1 # 1 = wipe disk without confirmation
# ── Networking ────────────────────────────────────────
KLDLOAD_NET_METHOD=dhcp # dhcp | static
KLDLOAD_NET_IFACE=eth0 # Interface for static config
KLDLOAD_NET_IP=10.0.1.50 # Static IP (required if static)
KLDLOAD_NET_PREFIX=24 # CIDR prefix
KLDLOAD_NET_GW=10.0.1.1 # Default gateway (required if static)
KLDLOAD_NET_DNS=1.1.1.1,8.8.8.8 # DNS servers (comma-separated)
# ── Features ──────────────────────────────────────────
KLDLOAD_ENABLE_KVM=1 # 1 = install libvirt + QEMU/KVM
KLDLOAD_ENABLE_EBPF=1 # 1 = install eBPF tools (bcc, bpftrace)
KLDLOAD_ENABLE_AI=0 # 1 = install AI/ML stack
KLDLOAD_NVIDIA_DRIVERS=0 # 1 = install NVIDIA drivers
KLDLOAD_WIREGUARD=0 # 1 = configure WireGuard interface
# ── Packages ──────────────────────────────────────────
KLDLOAD_EXTRA_PACKAGES= # Additional packages (comma or space separated)
KLDLOAD_KEEP_DARKSITE=0 # 1 = copy darksite to installed system
KLDLOAD_CUSTOM_MIRROR_URL= # Override default package mirror
# ── Cluster ───────────────────────────────────────────
KLDLOAD_INFRA_MODE=standalone # standalone | cluster
KLDLOAD_CLUSTER_CIDR= # WireGuard cluster CIDR (e.g., 10.100.0.0/24)
KLDLOAD_CLUSTER_DOMAIN=infra.local # Cluster DNS domain
KLDLOAD_CLUSTER_SIZE=16 # Max nodes in cluster
KLDLOAD_HUB_LAN= # Hub LAN interface for cluster
# ── Export (golden image) ─────────────────────────────
KLDLOAD_EXPORT_FORMAT=none # none | qcow2 | vmdk | vhd | ova | raw
KLDLOAD_EXPORT_SCP_HOST= # Remote host for SCP upload
KLDLOAD_EXPORT_SCP_USER=root # SCP user
KLDLOAD_EXPORT_SCP_PATH=/root/ # SCP destination path
KLDLOAD_EXPORT_SCP_KEY= # Path to SSH private key for SCP
KLDLOAD_EXPORT_SCP_PASS= # SSH password for SCP (if no key)
# ── Distro-specific ───────────────────────────────────
KLDLOAD_RELEASE=9 # CentOS/RHEL/Rocky release version
KLDLOAD_DEBIAN_SUITE=trixie # Debian suite (trixie, bookworm)
KLDLOAD_DEBIAN_MIRROR=https://mirror.it.ubc.ca/debian
# ── Bootloader ────────────────────────────────────────
KLDLOAD_BOOTLOADER_ID=KLDload # EFI boot entry name
That is the entire API for unattended deployment. Environment variables in a flat file on a FAT32 USB stick. No YAML. No Jinja templates. No 400-line kickstart file. No preseed with undocumented d-i directives. No curtin with YAML that changes syntax between Ubuntu releases.
Every variable has a sane default. A minimal answers file is three lines: distro, disk, hostname. Everything else falls to defaults — DHCP networking, server profile, UTC timezone, admin user with password "admin". For production you will set more, but for testing you can deploy a full ZFS-on-root system with three lines of configuration.
How seed disk detection works
The branch point: seed disk or human
kldload-autoinstall.service runs on every live boot. It scans all removable
media for a FAT32 partition labeled KLDLOAD-SEED containing answers.env.
If found: source the file, export every KLDLOAD_* variable, run the installer
with zero interaction. If not found: start the web UI at :8080 and wait for a human.
Same ISO. Same boot sequence. Same installer binary. The presence or absence of a seed
disk is the only branch point.
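# Create a seed USB on any Linux machine (-F 32 forces FAT32):
# Note: FAT volume labels are capped at 11 characters and KLDLOAD-SEED is 12,
# so mkfs.vfat may reject it; check which label your kldload release scans for.
mkfs.vfat -F 32 -n KLDLOAD-SEED /dev/sdb1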
mount /dev/sdb1 /mnt
cat > /mnt/answers.env << 'EOF'
KLDLOAD_DISTRO=rocky
KLDLOAD_DISK=/dev/nvme0n1
KLDLOAD_HOSTNAME=db-prod-01
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=sysadmin
KLDLOAD_PASSWORD='correct-horse-battery-staple'
KLDLOAD_TIMEZONE=America/Toronto
KLDLOAD_SSH_PUBKEY="ssh-ed25519 AAAA... ops@infra"
KLDLOAD_ZFS_TOPOLOGY=mirror
KLDLOAD_ZFS_DATA_DISKS=/dev/nvme1n1
KLDLOAD_NET_METHOD=static
KLDLOAD_NET_IP=10.0.1.50
KLDLOAD_NET_PREFIX=24
KLDLOAD_NET_GW=10.0.1.1
KLDLOAD_NET_DNS=10.0.1.1
EOF
umount /mnt
# Boot the machine with the kldload ISO + this USB.
# The machine installs itself. Zero interaction.
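A typo in answers.env only shows up after the machine has booted, so it can be worth checking the file before it goes on the stick. The following is a minimal pre-flight sketch, not part of kldload: `parse_answers` and `validate` are hypothetical helpers, and treating distro, disk, and hostname as required is an assumption based on the "three lines" minimum described above. The static-networking rule comes from the "required if static" notes in the reference.

```python
import shlex

# Assumption: the three keys a minimal answers file sets are treated as required.
REQUIRED_KEYS = ("KLDLOAD_DISTRO", "KLDLOAD_DISK", "KLDLOAD_HOSTNAME")
# Marked "required if static" in the answers reference.
STATIC_KEYS = ("KLDLOAD_NET_IP", "KLDLOAD_NET_GW")


def parse_answers(text):
    """Parse KEY=value lines into a dict, skipping blank and comment lines."""
    answers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips quotes and drops a trailing "# comment".
        tokens = shlex.split(value, comments=True)
        # Like the shell, only the first word belongs to the assignment.
        answers[key.strip()] = tokens[0] if tokens else ""
    return answers


def validate(answers):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = [f"missing {k}" for k in REQUIRED_KEYS if not answers.get(k)]
    if answers.get("KLDLOAD_NET_METHOD", "dhcp") == "static":
        problems += [f"static networking needs {k}"
                     for k in STATIC_KEYS if not answers.get(k)]
    return problems


if __name__ == "__main__":
    sample = "KLDLOAD_DISTRO=rocky\nKLDLOAD_DISK=/dev/nvme0n1\nKLDLOAD_NET_METHOD=static\n"
    for problem in validate(parse_answers(sample)):
        print(problem)
```

Values that contain spaces or `#` should be quoted in the file, exactly as they would need to be for the shell that sources it.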
WebSocket API for scriptable installs
Skip the USB sticks entirely. The web UI exposes a WebSocket API on port 8080. Any script that speaks WebSocket can send install commands with JSON payloads. This is how you automate installation over the network when the machines are already booted into the live ISO.
#!/usr/bin/env python3
# install-remote.py: trigger unattended installs via WebSocket
import asyncio
import json

import websockets  # third-party package: pip install websockets


async def install(host, hostname):
    async with websockets.connect(f"ws://{host}:8080/ws") as ws:
        await ws.send(json.dumps({
            "action": "install",
            "distro": "debian",
            "disk": "/dev/sda",
            "hostname": hostname,
            "username": "admin",
            "password": "changeme",
            "profile": "server",
            "timezone": "America/Vancouver",
            "ssh_pubkey": "ssh-ed25519 AAAA... user@host"
        }))
        # Stream install progress until the installer reports completion
        async for msg in ws:
            data = json.loads(msg)
            print(f"[{host}] [{data.get('phase', '')}] {data.get('message', '')}")
            if data.get('status') == 'complete':
                break


async def main():
    # Install 10 machines in parallel, each with its own hostname
    targets = {f"10.0.1.{50 + i}": f"web-prod-{i + 1:02d}" for i in range(10)}
    await asyncio.gather(*(install(h, name) for h, name in targets.items()))


# asyncio.run() expects a coroutine, so gather() is awaited inside main()
asyncio.run(main())
Want 50 machines? Write 50 answer files (one per hostname). Burn 50 USB sticks. Plug them in. Boot. Walk away. Come back to 50 installed machines with ZFS on root, WireGuard ready, snapshots running, boot environments configured. No PXE server. No TFTP. No DHCP options. No network boot infrastructure at all.
Or use the WebSocket API and install all 50 from a single laptop in your server room. Boot them from the ISO over IPMI virtual media, run the Python script, walk away. Same result, different delivery mechanism.
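Writing 50 answer files by hand is the tedious part. A short script can stamp them out from one shared set of values plus a per-host table. This is a sketch under assumptions: `render_answers` and `write_fleet` are hypothetical names, and the only per-host values varied here are the hostname and static IP. The variable names themselves come from the answers reference.

```python
import shlex
from pathlib import Path


def render_answers(hostname, ip, base):
    """Merge shared settings with per-host values and render KEY=value lines."""
    merged = dict(base, KLDLOAD_HOSTNAME=hostname, KLDLOAD_NET_IP=ip)
    # shlex.quote keeps values safe for the shell that sources the file.
    return "".join(f"{key}={shlex.quote(value)}\n" for key, value in merged.items())


def write_fleet(out_dir, hosts, base):
    """Write one <hostname>.env per host and return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for hostname, ip in hosts.items():
        path = out / f"{hostname}.env"
        path.write_text(render_answers(hostname, ip, base))
        paths.append(path)
    return paths


if __name__ == "__main__":
    shared = {
        "KLDLOAD_DISTRO": "rocky",
        "KLDLOAD_DISK": "/dev/sda",
        "KLDLOAD_NET_METHOD": "static",
        "KLDLOAD_NET_PREFIX": "24",
        "KLDLOAD_NET_GW": "10.0.1.1",
    }
    fleet = {f"web-prod-{i:02d}": f"10.0.1.{49 + i}" for i in range(1, 51)}
    for written in write_fleet("answers", fleet, shared):
        print(written)
```

For USB seeds, copy each generated file onto its stick as answers.env, since that is the name the seed scan looks for.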
PXE boot workflow
For large deployments with existing PXE infrastructure, you can netboot the kldload live environment and supply answers via HTTP instead of USB:
# DHCP server config (ISC DHCP)
subnet 10.0.1.0 netmask 255.255.255.0 {
    range 10.0.1.100 10.0.1.200;
    option routers 10.0.1.1;
    next-server 10.0.1.5;      # TFTP server
    filename "pxelinux.0";     # or shimx64.efi for UEFI
}
# TFTP: extract vmlinuz and initrd from the kldload ISO
mkdir -p /mnt/iso
mount -o loop kldload-free-1.0.2.iso /mnt/iso
cp /mnt/iso/isolinux/vmlinuz /tftpboot/
cp /mnt/iso/isolinux/initrd.img /tftpboot/
# PXE menu entry (pxelinux.cfg/default)
# pxelinux has no line continuation, so APPEND stays on one line
LABEL kldload
    KERNEL vmlinuz
    APPEND initrd=initrd.img root=live:http://10.0.1.5/kldload.squashfs rd.live.image rd.live.overlay.overlayfs kldload.answers=http://10.0.1.5/answers/answers.env
# HTTP server serves the squashfs and per-host answer files
# Use hostname-based answers: answers/${hostname}.env
# The installer checks KLDLOAD_ANSWERS_URL kernel parameter
Firstboot and systemd services
The machine finishes configuring itself on first power-on
kldload-firstboot.service runs once on the first boot of an installed system.
It reads the install manifest at /etc/kldload/install-manifest.env —
the record of every choice made during installation — and finishes what the installer started.
Package holds locked. Snapshot timers enabled. SSH keys generated. WireGuard interface ready.
Then the service disables itself. It never runs again.
Firstboot is where the answer file becomes permanent configuration. The installer writes the manifest. Firstboot reads it and acts. This is a clean separation: the installer puts files on disk, firstboot activates them. If firstboot fails, the manifest is still there — you can re-run it, inspect it, or fix whatever broke. Nothing is lost. Nothing is ephemeral. The state is on disk, in a file you can read.
What runs automatically
Systemd services and timers that handle the day-to-day without you.
Snapshots of /srv every 15 minutes. Service data is never more than 15 minutes stale.
The package holds deserve explanation. The three most dangerous packages on a ZFS-on-root system are the kernel, the ZFS module, and the bootloader. If any of them update out of sync (a new kernel without a matching ZFS module, a new bootloader that does not know about ZFS), the machine will not boot. kldload holds all three. You upgrade them deliberately, with kupgrade, which snapshots first. You never wake up to a machine that auto-updated itself into a brick.
Golden image workflow
The golden image workflow is: install, configure, seal, export, clone.
kldload handles the first four stages automatically when you set KLDLOAD_EXPORT_FORMAT
in your answers file or select an export format in the web UI. The result is a cloud-init-ready
image in qcow2, vmdk, vhd, ova, or raw format.
The five stages of golden image production
Every golden image goes through the same lifecycle. kldload automates stages 1-4. Stage 5 is your deployment tool (Terraform, Packer, manual cloning).
Stage 1: Install    kldload installs the OS to disk (ZFS on root, all packages)
Stage 2: Configure  Users, SSH keys, networking, WireGuard, eBPF, services
Stage 3: Seal       k_seal_image_for_clone(): clear machine-id, SSH host keys,
                    DHCP leases, cloud-init state. Enable cloud-init datasources.
Stage 4: Export     kexport: export ZFS pool, qemu-img convert to target format
Stage 5: Deploy     Clone/import the image on target infrastructure
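The file-level part of Stage 3 can be sketched in a few lines. This is an illustration of the idea, not the code inside k_seal_image_for_clone(): `seal_tree` is a hypothetical helper that works on a root filesystem mounted at an arbitrary path, and it only covers the machine-id, SSH host keys, cloud-init instance state, and install manifest. DHCP lease locations vary by distro and network manager, so they are left out.

```python
import shutil
from pathlib import Path


def seal_tree(root):
    """Strip machine-specific identity under `root`; return the paths touched."""
    root = Path(root)
    touched = []

    machine_id = root / "etc/machine-id"
    if machine_id.exists():
        machine_id.write_text("")  # an empty file gets a fresh ID on next boot
        touched.append(machine_id)

    # Host keys only; sshd_config and other files in /etc/ssh are left alone.
    for key in sorted((root / "etc/ssh").glob("ssh_host_*")):
        key.unlink()
        touched.append(key)

    instances = root / "var/lib/cloud/instances"
    if instances.exists():
        shutil.rmtree(instances)  # lets cloud-init treat the clone as a new instance
        touched.append(instances)

    manifest = root / "etc/kldload/install-manifest.env"
    if manifest.exists():
        manifest.unlink()  # holds build-time passwords
        touched.append(manifest)

    return touched


if __name__ == "__main__":
    # Hypothetical mount point of the installed system; adjust to your layout.
    for path in seal_tree("/mnt/target"):
        print("sealed", path)
```

Running it against a mounted tree rather than the live system keeps the sketch safe to try on a scratch directory first.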
What sealing does
The k_seal_image_for_clone() function prepares an installed system for
cloning by removing all machine-specific identity. Every clone gets unique identity
on first boot via cloud-init and systemd:
SSH host keys (/etc/ssh/ssh_host_*). sshd-keygen regenerates unique keys on first boot. Without this, every clone has the same host key, which is a security disaster.
cloud-init state (/var/lib/cloud/instances). cloud-init re-runs on first boot, applying new hostname, SSH keys, and networking from whatever datasource is available.
Install manifest (/etc/kldload/install-manifest.env). Contains build-time passwords. Must not ship in a template image.
Exporting with kexport
# Export from the command line (after install completes):
sudo kexport /dev/sda qcow2 /tmp/export/
# Exports the disk as: /tmp/export/kldload-debian-server.qcow2
# Export with custom name:
KEXPORT_NAME="debian13-base-v2" kexport /dev/sda qcow2 /tmp/export/
# Exports as: /tmp/export/debian13-base-v2.qcow2
# Supported formats:
# qcow2 — KVM/libvirt, Proxmox, OpenStack
# vmdk — VMware ESXi/Workstation
# vhd — Hyper-V, Azure
# ova — VMware/VirtualBox (OVF + vmdk in a tar)
# raw — Direct dd, bare metal, ZFS zvol import
Automated export via answers file
# answers.env — build and export a golden image automatically
KLDLOAD_DISTRO=rocky
KLDLOAD_DISK=/dev/sda
KLDLOAD_HOSTNAME=template
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=admin
# Export as qcow2 and SCP to the image server
KLDLOAD_EXPORT_FORMAT=qcow2
KLDLOAD_EXPORT_SCP_HOST=images.infra.local
KLDLOAD_EXPORT_SCP_USER=root
KLDLOAD_EXPORT_SCP_PATH=/var/lib/libvirt/images/
KLDLOAD_EXPORT_SCP_KEY=/root/.ssh/id_ed25519
# The installer will:
# 1. Install Rocky Linux with ZFS on root
# 2. Configure everything (users, SSH, networking, services)
# 3. Seal the image (clear machine-id, SSH keys, enable cloud-init)
# 4. Export to qcow2
# 5. SCP the qcow2 to images.infra.local:/var/lib/libvirt/images/
# Zero interaction. One USB stick. One boot cycle.
ZFS makes cloning free. A ZFS snapshot is O(1) — milliseconds regardless of image size. A ZFS clone is also O(1) — it shares all blocks with the parent until either writes new data. So you can produce one golden image with kldload, snapshot it, and clone it 1,000 times. Each clone takes less than a second to create and uses zero additional disk space until it diverges from the parent.
This is why kldload uses ZFS for everything, including the host hypervisor. You are not just getting a filesystem. You are getting an image distribution mechanism that is faster and more space-efficient than any purpose-built tool. zfs clone is faster than cp, faster than qemu-img create -b, faster than any linked clone mechanism in any hypervisor product. And the clone is a first-class dataset you can promote, snapshot, replicate, and encrypt independently.
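The snapshot-once, clone-many pattern is easy to script. Here is a small sketch that only builds the command lines, so it can be inspected before anything touches a pool. `clone_plan` is a hypothetical helper, and the dataset names are examples. `zfs snapshot` and `zfs clone` are the standard OpenZFS subcommands.

```python
import subprocess


def clone_plan(golden, names, snapshot="template"):
    """Return the zfs commands that snapshot `golden` once and clone it per name."""
    parent = golden.rsplit("/", 1)[0]  # clones are created as siblings of the golden dataset
    snap = f"{golden}@{snapshot}"
    plan = [["zfs", "snapshot", snap]]
    plan += [["zfs", "clone", snap, f"{parent}/{name}"] for name in names]
    return plan


def apply_plan(plan):
    """Run each command in order and stop at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in clone_plan("rpool/vms/golden-rocky9", ["web-01", "web-02", "web-03"]):
        print(" ".join(cmd))
```

Keeping plan and execution separate makes the dry run the default, which is a sensible habit for anything that creates datasets in bulk.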
Cloud-init integration
When kldload seals an image for export, it configures cloud-init with multi-datasource support. The image accepts configuration from any cloud platform (AWS, GCE, Azure, OpenStack) or from local datasources (NoCloud for KVM/Proxmox, ConfigDrive for bare metal). First-boot customization — hostname, users, SSH keys, networking, scripts — is handled by cloud-init, not by kldload. kldload builds the base. cloud-init personalizes the clone.
Datasource configuration
# /etc/cloud/cloud.cfg.d/99-kldload-datasource.cfg
# Written by k_seal_image_for_clone() during export
datasource_list: [ NoCloud, ConfigDrive, OpenStack, Azure, GCE, Ec2, None ]
This means a single kldload golden image works everywhere without modification. Deploy it on KVM with a NoCloud seed ISO, on Proxmox with cloud-init drive, on AWS with EC2 metadata, on Azure with Azure datasource. Same image. Different datasource. cloud-init handles it.
NoCloud seed ISO for KVM/Proxmox
# Create a NoCloud seed ISO for a specific VM:
mkdir -p /tmp/seed
cat > /tmp/seed/meta-data << 'EOF'
instance-id: web-prod-01
local-hostname: web-prod-01
EOF
cat > /tmp/seed/user-data << 'EOF'
#cloud-config
hostname: web-prod-01
fqdn: web-prod-01.infra.local
manage_etc_hosts: true
users:
  - name: deploy
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... deploy@ci
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
packages:
  - nginx
  - certbot
runcmd:
  - systemctl enable --now nginx
  - certbot --nginx -d web-prod-01.infra.local --agree-tos -m ops@infra.local
EOF
# Build the seed ISO
genisoimage -output /tmp/seed.iso -volid cidata -joliet -rock \
/tmp/seed/meta-data /tmp/seed/user-data
# Attach to VM as a CDROM
virsh attach-disk web-prod-01 /tmp/seed.iso sda \
--type cdrom --mode readonly --config
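With more than a handful of VMs, the two seed files are worth generating rather than typing. The sketch below renders meta-data and user-data text for one host, mirroring the fields used in the example above. `render_seed` is a hypothetical helper, and the `deploy` user and its sudo rule are carried over from that example rather than required by cloud-init.

```python
def render_seed(hostname, domain, ssh_key):
    """Return (meta_data, user_data) text for a NoCloud seed for one VM."""
    meta_data = f"instance-id: {hostname}\nlocal-hostname: {hostname}\n"
    user_data = (
        "#cloud-config\n"
        f"hostname: {hostname}\n"
        f"fqdn: {hostname}.{domain}\n"
        "manage_etc_hosts: true\n"
        "users:\n"
        "  - name: deploy\n"
        "    ssh_authorized_keys:\n"
        f"      - {ssh_key}\n"
        "    sudo: ALL=(ALL) NOPASSWD:ALL\n"
        "    shell: /bin/bash\n"
    )
    return meta_data, user_data


if __name__ == "__main__":
    meta, user = render_seed("web-prod-02", "infra.local", "ssh-ed25519 AAAA... deploy@ci")
    print(meta)
    print(user)
```

Write the two strings to files named meta-data and user-data, then build the ISO with the same genisoimage command shown above.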
Proxmox cloud-init integration
# Import a kldload golden image into Proxmox as a template:
qm create 9000 --name kldload-rocky9-template --memory 4096 --cores 4 \
--net0 virtio,bridge=vmbr0
# Import the qcow2 as the VM's disk
qm importdisk 9000 /var/lib/images/kldload-rocky9.qcow2 local-zfs
# Attach the imported disk
qm set 9000 --scsihw virtio-scsi-single --scsi0 local-zfs:vm-9000-disk-0
# Add cloud-init drive
qm set 9000 --ide2 local-zfs:cloudinit
# Set boot order and convert to template
qm set 9000 --boot order=scsi0 --serial0 socket --vga serial0
qm template 9000
# Clone the template with cloud-init customization
qm clone 9000 101 --name web-prod-01 --full
qm set 101 --ciuser deploy --sshkeys /root/.ssh/authorized_keys \
--ipconfig0 ip=10.0.1.50/24,gw=10.0.1.1 --nameserver 10.0.1.1
qm start 101
The combination of kldload golden images + cloud-init gives you the same workflow as AWS AMIs, but on your own hardware. Build the image once (kldload). Store it as a template. Clone it. Customize the clone with cloud-init (hostname, SSH keys, networking, first-boot scripts). Boot. The VM is ready in 15 seconds. No Ansible run. No package downloads. No convergence. Just a clone of a known-good image, personalized by cloud-init.
This is the same model the public clouds expose: prebuilt images (AMIs on AWS, images on GCE and Azure), instances launched from them, and cloud-init for first-boot personalization. kldload gives you the same workflow on bare metal, KVM, or Proxmox. The only difference is you own the image pipeline end to end.
Packer integration
Packer does not replace kldload. kldload is the Packer builder source. kldload builds the base image (OS + ZFS + boot environments + kldload tools). Packer takes that base and layers application-specific packages, configuration, and hardening on top. The result is a Packer artifact that is a kldload golden image with your application baked in.
Packer HCL: layer Nginx on a kldload base
# kldload-nginx.pkr.hcl
# Start from a kldload golden image, add Nginx + hardening
packer {
  required_plugins {
    qemu = {
      version = ">= 1.1.0"
      source  = "github.com/hashicorp/qemu"
    }
  }
}

variable "base_image" {
  type    = string
  default = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
}

variable "output_dir" {
  type    = string
  default = "/var/lib/libvirt/images/packer-output"
}

source "qemu" "kldload-nginx" {
  # Use the kldload golden image as the base
  disk_image       = true
  iso_url          = var.base_image
  iso_checksum     = "none"
  output_directory = var.output_dir
  vm_name          = "kldload-rocky9-nginx.qcow2"
  format           = "qcow2"

  # VM configuration
  memory       = 4096
  cpus         = 4
  accelerator  = "kvm"
  machine_type = "q35"
  disk_size    = "50G"

  # Networking: cloud-init sets up SSH access
  ssh_username     = "admin"
  ssh_password     = "admin"
  ssh_timeout      = "5m"
  shutdown_command = "sudo shutdown -h now"

  # NoCloud seed ISO for cloud-init
  cd_files = ["cloud-init/meta-data", "cloud-init/user-data"]
  cd_label = "cidata"

  # QEMU flags for ZFS
  qemuargs = [
    ["-cpu", "host"],
    ["-serial", "mon:stdio"],
  ]
}

build {
  sources = ["source.qemu.kldload-nginx"]

  # Wait for cloud-init to finish
  provisioner "shell" {
    inline = ["cloud-init status --wait"]
  }

  # Install and configure Nginx
  provisioner "shell" {
    inline = [
      "sudo dnf install -y nginx certbot python3-certbot-nginx",
      "sudo systemctl enable nginx",
      "sudo firewall-cmd --permanent --add-service=http --add-service=https",
      "sudo firewall-cmd --reload",
    ]
  }

  # Copy Nginx configuration
  provisioner "file" {
    source      = "configs/nginx.conf"
    destination = "/tmp/nginx.conf"
  }

  provisioner "shell" {
    inline = [
      "sudo cp /tmp/nginx.conf /etc/nginx/nginx.conf",
      "sudo nginx -t",
    ]
  }

  # Security hardening
  provisioner "shell" {
    script = "scripts/harden.sh"
  }

  # Re-seal the image for cloning (clean cloud-init state)
  provisioner "shell" {
    inline = [
      "sudo cloud-init clean --logs",
      "sudo truncate -s 0 /etc/machine-id",
      "sudo rm -f /etc/ssh/ssh_host_*",
      "sudo rm -f /var/lib/NetworkManager/*.lease",
      "sudo rm -f /root/.bash_history",
      "sudo sync",
    ]
  }
}
# Build the Packer image:
packer init kldload-nginx.pkr.hcl
packer build kldload-nginx.pkr.hcl
# Result: /var/lib/libvirt/images/packer-output/kldload-rocky9-nginx.qcow2
# This is a kldload golden image with Nginx baked in.
# ZFS on root. Boot environments. Package holds. Snapshot timers.
# Plus Nginx, certbot, firewall rules, and security hardening.
# Ready to clone and deploy.
Packer for multiple distros
# Build the same application image on multiple kldload distro bases:
variable "distros" {
  type = map(string)
  default = {
    rocky9   = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
    debian13 = "/var/lib/libvirt/images/kldload-debian13-server.qcow2"
    ubuntu24 = "/var/lib/libvirt/images/kldload-ubuntu24-server.qcow2"
  }
}
# Use dynamic source blocks to build all three in parallel:
# packer build -parallel-builds=3 kldload-multi.pkr.hcl
The key insight: kldload handles the hard part. ZFS on root with boot environments, DKMS kernel modules, EFI boot entries, package holds, snapshot timers, WireGuard configuration — all of that is in the base image before Packer ever touches it. Packer just adds your application on top. The Packer build takes 2 minutes instead of 30 because the base is already built. And if Packer fails, the base image is still intact — you just fix your Packer config and rebuild the application layer.
Compare to using Packer alone: you would need a kickstart/preseed file to automate the OS install, wait for the full install to complete (15-30 minutes), then run your provisioners. With kldload as the base, the OS install is already done. Packer boots a pre-installed image and layers your changes. Faster, simpler, more reliable.
Terraform integration
Terraform deploys kldload golden images to infrastructure. The libvirt provider creates KVM VMs from kldload qcow2 images. The ZFS integration creates VMs from ZFS clones for instant provisioning. Terraform does not build the image — kldload or Packer does that. Terraform stamps it onto target hosts.
Terraform libvirt provider: deploy from qcow2
# main.tf — deploy kldload golden images with Terraform
terraform {
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "~> 0.8"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://root@hypervisor.infra.local/system"
}

# Upload the kldload golden image as a base volume
resource "libvirt_volume" "kldload_base" {
  name   = "kldload-rocky9-base.qcow2"
  pool   = "default"
  source = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
  format = "qcow2"
}

# Create a cloud-init disk for the VM
resource "libvirt_cloudinit_disk" "web_init" {
  name = "web-prod-01-init.iso"
  pool = "default"

  user_data = <<-EOF
    #cloud-config
    hostname: web-prod-01
    fqdn: web-prod-01.infra.local
    manage_etc_hosts: true
    users:
      - name: deploy
        ssh_authorized_keys:
          - ${file("~/.ssh/id_ed25519.pub")}
        sudo: ALL=(ALL) NOPASSWD:ALL
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF

  network_config = <<-EOF
    version: 2
    ethernets:
      ens3:
        addresses: [10.0.1.50/24]
        gateway4: 10.0.1.1
        nameservers:
          addresses: [10.0.1.1]
  EOF
}

# Per-VM disk: a copy-on-write qcow2 overlay backed by the base volume
resource "libvirt_volume" "web_disk" {
  name           = "web-prod-01.qcow2"
  pool           = "default"
  base_volume_id = libvirt_volume.kldload_base.id
  size           = 107374182400 # 100 GiB
}

# Create the VM
resource "libvirt_domain" "web_prod_01" {
  name   = "web-prod-01"
  memory = 4096
  vcpu   = 4

  cpu {
    mode = "host-passthrough"
  }

  machine   = "q35"
  cloudinit = libvirt_cloudinit_disk.web_init.id

  disk {
    volume_id = libvirt_volume.web_disk.id
  }

  network_interface {
    bridge = "br0"
  }

  console {
    type        = "pty"
    target_type = "serial"
    target_port = "0"
  }

  graphics {
    type        = "vnc"
    listen_type = "address"
  }
}
Terraform with ZFS clones: instant provisioning
# deploy-zfs.tf — use ZFS clones instead of qcow2 copies
# This requires a custom null_resource + local-exec approach
# because the libvirt provider does not natively speak ZFS.
variable "vm_count" {
default = 5
}
variable "vm_names" {
type = list(string)
default = ["web-01", "web-02", "web-03", "web-04", "web-05"]
}
# Create ZFS clones from the golden image snapshot
resource "null_resource" "zfs_clone" {
  count = var.vm_count

  # Destroy-time provisioners may only reference self, count.index, or each.key,
  # so the VM name is kept in triggers and read back through self.
  triggers = {
    vm_name = var.vm_names[count.index]
  }

  provisioner "local-exec" {
    command = <<-SCRIPT
      # Snapshot the golden image (idempotent: an existing snapshot is fine)
      zfs snapshot rpool/vms/golden-rocky9@template 2>/dev/null || true
      # Clone the snapshot: instant, zero-copy
      zfs clone rpool/vms/golden-rocky9@template \
        rpool/vms/${self.triggers.vm_name}
      # Set refreservation=none for thin provisioning
      zfs set refreservation=none rpool/vms/${self.triggers.vm_name}
    SCRIPT
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-SCRIPT
      virsh destroy ${self.triggers.vm_name} 2>/dev/null || true
      virsh undefine ${self.triggers.vm_name} --nvram 2>/dev/null || true
      zfs destroy rpool/vms/${self.triggers.vm_name}
    SCRIPT
  }
}

# Define the VMs using virsh
resource "null_resource" "vm_define" {
  count      = var.vm_count
  depends_on = [null_resource.zfs_clone]

  provisioner "local-exec" {
    command = <<-SCRIPT
      virt-install \
        --name ${var.vm_names[count.index]} \
        --ram 4096 --vcpus 4 --cpu host \
        --machine q35 --os-variant rocky9 \
        --disk path=/dev/zvol/rpool/vms/${var.vm_names[count.index]},bus=virtio,cache=none \
        --network bridge=br0,model=virtio \
        --boot uefi --tpm backend.type=emulator,backend.version=2.0,model=tpm-crb \
        --serial pty --console pty \
        --graphics vnc --noautoconsole --import
    SCRIPT
  }
}
# 5 VMs from ZFS clones. Total creation time: ~5 seconds.
# Each VM shares all blocks with the golden image until it diverges.
# Disk usage for 5 x 100GB VMs: ~100GB (not 500GB).
The ZFS clone approach is faster and more space-efficient than giving each VM a full copy of the image. Copying a 20GB qcow2 five times is 5 x 20GB = 100GB of disk I/O, and minutes of waiting, before the first VM boots. The libvirt provider avoids the copy when a volume sets base_volume_id, because it creates a qcow2 overlay on top of the base, but that overlay stays tied to its backing file. With ZFS clones, all 5 VMs are created in under 5 seconds and take almost no additional disk space. The clones share every block with the parent until they write new data, and each one is a first-class dataset you can snapshot and replicate on its own. This is not an optimization. It is a fundamental architectural advantage of using ZFS for VM storage.
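The space arithmetic is simple enough to write down. A rough model, ignoring metadata overhead and compression: full copies pay for the image once per VM, while clones pay for it once in total, and both pay for whatever each VM writes afterwards. The function name and the 2GB of per-VM writes below are illustrative assumptions, not measurements.

```python
def pool_usage_gb(image_gb, vm_count, written_gb_per_vm, full_copy):
    """Approximate space used: base image data plus each VM's own writes."""
    base = image_gb * vm_count if full_copy else image_gb
    return base + vm_count * written_gb_per_vm


if __name__ == "__main__":
    copies = pool_usage_gb(20, 5, 2, full_copy=True)
    clones = pool_usage_gb(20, 5, 2, full_copy=False)
    print(f"full copies: {copies} GB, ZFS clones: {clones} GB")
    # prints "full copies: 110 GB, ZFS clones: 30 GB"
```

The gap widens with VM count: the clone figure grows only with what the VMs actually write, not with the size of the base image.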
Configuration management: Ansible, Salt, Puppet
Configuration management tools work on top of kldload base images. kldload handles the OS layer (ZFS, boot environments, kernel modules, base services). Config management handles the application layer (deploy your app, manage its config, rotate its secrets). This separation is deliberate and important.
Why image + config management beats config management alone
Config management alone means every machine rebuilds itself from scratch. Package mirrors must be reachable. GPG keys must be valid. Templates must render correctly on this specific OS version. With kldload as the base, the OS is already built and tested. Config management only handles 40 tasks (app layer) instead of 400 (full stack). Faster. More reliable. Fewer failure modes.
What kldload handles (do NOT manage these with Ansible)
ZFS pools and datasets. Boot environment configuration. Kernel and ZFS DKMS modules. Package holds. Snapshot timers. EFI boot entries. Bootloader configuration. WireGuard base interface. eBPF tool installation. These are in the image. They are tested. They work. Do not let Ansible touch them.
Ansible: application layer on kldload base
# inventory.yml
all:
  hosts:
    web-prod-01:
      ansible_host: 10.0.1.50
    web-prod-02:
      ansible_host: 10.0.1.51
    web-prod-03:
      ansible_host: 10.0.1.52
  vars:
    ansible_user: deploy
    ansible_become: true
# kldload base images have SSH, sudo, and Python pre-installed
# No bootstrap required. Ansible works immediately.
# playbooks/web-server.yml
---
- name: Configure web servers on kldload base
  hosts: all
  become: true
  tasks:
    # Application packages only, NOT base OS packages (kldload handles those)
    - name: Install application packages
      ansible.builtin.dnf:
        name:
          - nginx
          - certbot
          - python3-certbot-nginx
          - redis
          - postgresql-server
        state: present
    # Application configuration
    - name: Deploy Nginx configuration
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
      notify: restart nginx
    # On RHEL-family distros the PostgreSQL data directory must be
    # initialized before the service can start
    - name: Initialize PostgreSQL data directory
      ansible.builtin.command: postgresql-setup --initdb
      args:
        creates: /var/lib/pgsql/data/PG_VERSION
    # Application service
    - name: Enable and start services
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - nginx
        - redis
        - postgresql
    # ZFS dataset for application data (uses kldload's ZFS pool)
    - name: Create application data dataset
      community.general.zfs:
        name: rpool/srv/webapp
        state: present
        extra_zfs_properties:
          compression: lz4
          recordsize: 128K
          mountpoint: /srv/webapp
    # Application-level snapshot policy (supplements kldload's base snapshots)
    # The timer activates webapp-snapshot.service, which must also exist (not shown here)
    - name: Create application snapshot timer
      ansible.builtin.copy:
        dest: /etc/systemd/system/webapp-snapshot.timer
        content: |
          [Unit]
          Description=Snapshot webapp data every 5 minutes
          [Timer]
          OnCalendar=*:0/5
          [Install]
          WantedBy=timers.target
      notify: reload systemd
  handlers:
    - name: restart nginx
      ansible.builtin.systemd:
        name: nginx
        state: restarted
    - name: reload systemd
      ansible.builtin.systemd:
        daemon_reload: true
Salt: state files on kldload base
# /srv/salt/web/init.sls
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - watch:
      - file: /etc/nginx/nginx.conf
/etc/nginx/nginx.conf:
  file.managed:
    - source: salt://web/files/nginx.conf
    - mode: 644
# Use kldload's ZFS pool for application data
rpool/srv/webapp:
  zfs.filesystem_present:
    - properties:
        compression: lz4
        recordsize: 128K
        mountpoint: /srv/webapp
Notice what these playbooks and state files do NOT contain: kernel configuration, ZFS pool creation, bootloader setup, package holds, snapshot timer configuration, boot environment management, WireGuard base setup, eBPF tool installation. All of that is in the kldload golden image. The config management handles application deployment and nothing else. This is the correct boundary.
The result: your Ansible runs take 2 minutes instead of 20. Your Salt highstate applies in 30 seconds instead of 10 minutes. Because the OS layer is already done. It was done when the image was built. It works the same on every machine. The only thing that changes machine to machine is the application layer, and that is all your config management needs to handle.
CI/CD pipelines
Build kldload images in CI. Test them with KVM. Promote to production. The pipeline is: build ISO, install to VM, export golden image, test, promote. kldload's containerized build pipeline runs in any CI system with Docker or Podman.
GitHub Actions: build and test kldload images
# .github/workflows/build-golden-image.yml
name: Build Golden Image
on:
  push:
    branches: [main]
    paths:
      - 'build/**'
      - 'live-build/**'
      - 'profiles/**'
jobs:
  build-iso:
    runs-on: [self-hosted, kvm] # Needs KVM-capable runner
    steps:
      - uses: actions/checkout@v4
      - name: Build kldload ISO
        run: |
          ./deploy.sh clean
          ./deploy.sh builder-image
          PROFILE=server ./deploy.sh build
      - name: Install to test VM
        run: |
          # Create a test disk
          qemu-img create -f qcow2 /tmp/test-disk.qcow2 50G
          # Boot the ISO with an answers file
          cat > /tmp/answers.env << 'EOF'
          KLDLOAD_DISTRO=rocky
          KLDLOAD_DISK=/dev/vda
          KLDLOAD_HOSTNAME=ci-test
          KLDLOAD_PROFILE=server
          KLDLOAD_USERNAME=ci
          KLDLOAD_PASSWORD=ci
          KLDLOAD_EXPORT_FORMAT=qcow2
          EOF
          # Create seed ISO
          mkdir -p /tmp/seed
          cp /tmp/answers.env /tmp/seed/answers.env
          genisoimage -o /tmp/seed.iso -V KLDLOAD-SEED -J -R /tmp/seed/
          # Run the install in QEMU (headless)
          timeout 1800 qemu-system-x86_64 \
            -machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
            -drive file=/tmp/test-disk.qcow2,format=qcow2,if=virtio \
            -cdrom live-build/output/kldload-free-*.iso \
            -drive file=/tmp/seed.iso,format=raw,if=virtio \
            -nographic -serial mon:stdio \
            -boot d
      - name: Validate golden image
        run: |
          # Boot the installed image and verify
          timeout 120 qemu-system-x86_64 \
            -machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
            -drive file=/tmp/test-disk.qcow2,format=qcow2,if=virtio \
            -nographic -serial mon:stdio \
            -net nic -net user,hostfwd=tcp::2222-:22 &
          sleep 30
          # SSH into the VM and validate
          ssh -o StrictHostKeyChecking=no -p 2222 ci@localhost \
            'zpool status && systemctl is-active kldload-snapshot.timer'
      - name: Upload golden image
        if: github.ref == 'refs/heads/main'
        run: |
          scp /tmp/export/*.qcow2 images@images.infra.local:/var/lib/golden/
GitLab CI: multi-distro image pipeline
# .gitlab-ci.yml
stages:
  - build
  - test
  - promote
variables:
  PROFILE: server
build-iso:
  stage: build
  tags: [kvm, privileged]
  script:
    - ./deploy.sh builder-image
    - PROFILE=${PROFILE} ./deploy.sh build
  artifacts:
    paths:
      - live-build/output/*.iso
    expire_in: 7 days
.test-distro:
  stage: test
  tags: [kvm, privileged]
  script:
    - |
      # Create answers file for this distro
      cat > /tmp/answers.env << EOF
      KLDLOAD_DISTRO=${DISTRO}
      KLDLOAD_DISK=/dev/vda
      KLDLOAD_HOSTNAME=ci-${DISTRO}
      KLDLOAD_PROFILE=${PROFILE}
      KLDLOAD_USERNAME=ci
      KLDLOAD_PASSWORD=ci
      KLDLOAD_EXPORT_FORMAT=qcow2
      EOF
      # Install and export (see build script above)
      ./ci/install-and-export.sh
      # Validate
      ./ci/validate-image.sh /tmp/export/*.qcow2
      # GitLab artifact paths must live inside the project directory,
      # so copy the exported images there before the job ends
      mkdir -p export
      cp /tmp/export/*.qcow2 export/
  artifacts:
    paths:
      - export/*.qcow2
test-rocky:
  extends: .test-distro
  variables:
    DISTRO: rocky
test-debian:
  extends: .test-distro
  variables:
    DISTRO: debian
test-ubuntu:
  extends: .test-distro
  variables:
    DISTRO: ubuntu
promote:
  stage: promote
  only:
    - main
  script:
    - |
      # Artifacts from the test jobs are restored under export/
      for img in export/*.qcow2; do
        TIMESTAMP=$(date +%Y%m%d-%H%M%S)
        BASENAME=$(basename "$img" .qcow2)
        scp "$img" images@images.infra.local:/var/lib/golden/${BASENAME}-${TIMESTAMP}.qcow2
        # Update the "latest" symlink
        ssh images@images.infra.local \
          "ln -sf /var/lib/golden/${BASENAME}-${TIMESTAMP}.qcow2 \
          /var/lib/golden/${BASENAME}-latest.qcow2"
      done
The CI pipeline builds the ISO, installs it to a VM, exports a golden image, validates it, and promotes it to the image server. Every merge to main produces tested, validated golden images for every supported distro. The images are timestamped and symlinked. Terraform always pulls "latest". Rollback is changing a symlink.
This is the same pattern Netflix uses for AMI baking and Google uses for GCE images, and the pattern most serious infrastructure teams follow: build once, test once, stamp everywhere. kldload makes the "build once" part trivial because it handles ZFS on root, boot environments, and all the hard OS-level configuration that Packer alone cannot do.
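"Rollback is changing a symlink" can be sketched in a few lines of shell. This is a minimal illustration that assumes images are named `<basename>-<timestamp>.qcow2` as in the promote job; `point_latest` and `rollback_latest` are hypothetical helper names, not kldload commands.

```bash
#!/bin/bash
# Sketch: promote and roll back golden images by repointing a symlink.
# Assumes image names of the form <basename>-<timestamp>.qcow2.
# point_latest and rollback_latest are hypothetical helpers.

# Point <basename>-latest.qcow2 at a specific image file in the same directory.
point_latest() {
  local dir=$1 base=$2 target=$3
  ln -sf "${dir}/${target}" "${dir}/${base}-latest.qcow2"
}

# Repoint "latest" at the second-newest timestamped image and print its name.
rollback_latest() {
  local dir=$1 base=$2 prev
  # Timestamps sort lexically, so reverse sort puts the newest image first.
  prev=$(cd "$dir" && ls -1 "${base}"-[0-9]*.qcow2 2>/dev/null | sort -r | sed -n '2p')
  if [ -z "$prev" ]; then
    echo "no previous image for ${base}" >&2
    return 1
  fi
  point_latest "$dir" "$base" "$prev"
  echo "$prev"
}
```

On the image server this would be called as `rollback_latest /var/lib/golden <basename>`. Because consumers only ever resolve the symlink, no image is copied or deleted during a rollback.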
Fleet management
Managing 10 machines is SSH and shell scripts. Managing 100 machines is Ansible and cron. Managing 1,000 machines is ZFS replication, WireGuard backplane, and automated snapshot policies. kldload gives you the primitives for all three scales.
ZFS replication for fleet updates
The fastest way to update a fleet of kldload hosts is ZFS send/receive. Build the updated golden image on one machine. Snapshot it. Send the incremental delta to every other machine. The delta contains only the changed blocks — typically a few hundred megabytes even for a major OS update. Each host receives the snapshot, creates a new boot environment from it, and switches to it on next reboot.
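# Build machine: snapshot the updated boot environment.
# rocky-v2 is a clone of rpool/ROOT/rocky@v1.0 that has been upgraded
# (kupgrade or new packages). Sending from the clone's origin lets each host
# create rocky-v2 from its own copy of @v1.0; a plain incremental stream
# cannot be received into a dataset that does not exist yet.
# Every host must already hold rocky@v1.0 replicated from this build machine.
zfs snapshot rpool/ROOT/rocky-v2@v2.0-2026-04-04
# Send the incremental delta to every fleet host
for host in web-{01..50}.infra.local; do
zfs send -i rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky-v2@v2.0-2026-04-04 \
| ssh root@${host} "zfs receive rpool/ROOT/rocky-v2"
done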
# On each host: set the new boot environment as default
for host in web-{01..50}.infra.local; do
ssh root@${host} "kbe set-default rocky-v2"
done
# Rolling reboot (one at a time, verify before proceeding)
for host in web-{01..50}.infra.local; do
echo "Rebooting ${host}..."
ssh root@${host} "reboot"
sleep 30
until ssh root@${host} "kbe current" 2>/dev/null | grep -q rocky-v2; do
sleep 5
done
echo "${host} is on rocky-v2"
done
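The rolling reboot loop waits forever if a host never comes back. A bounded wait is one way to stop the rollout at the first bad host. This is a sketch: `wait_until` and `host_on_be` are hypothetical helpers, and the 300-second budget in the usage note is an arbitrary assumption.

```bash
#!/bin/bash
# Sketch: bound the wait so one dead host cannot stall a rolling reboot.
# wait_until and host_on_be are hypothetical helper names.

# Run a command repeatedly until it succeeds or the timeout (seconds) expires.
wait_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# True when the host reports the expected boot environment.
# -n stops ssh from reading stdin, which matters inside read loops.
host_on_be() {
  ssh -n -o ConnectTimeout=5 "root@$1" "kbe current" 2>/dev/null | grep -q "$2"
}
```

Inside the reboot loop this could replace the open-ended `until`: `wait_until 300 5 host_on_be "${host}" rocky-v2 || { echo "${host} did not come back"; exit 1; }`. Stopping on the first failure keeps a bad image from reaching the rest of the fleet.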
WireGuard backplane for management
# Every kldload host gets a WireGuard interface for management traffic.
# This is separate from production networking. Management traffic
# is encrypted, authenticated, and travels over a dedicated mesh.
# On the management hub (your jump box / bastion):
cat > /etc/wireguard/wg-mgmt.conf << 'EOF'
[Interface]
PrivateKey =
Address = 10.100.0.1/24
ListenPort = 51820
# web-01
[Peer]
PublicKey =
AllowedIPs = 10.100.0.10/32
# web-02
[Peer]
PublicKey =
AllowedIPs = 10.100.0.11/32
# ... repeat for every fleet host
EOF
# On each fleet host (via answers file or Ansible):
cat > /etc/wireguard/wg-mgmt.conf << 'EOF'
[Interface]
PrivateKey =
Address = 10.100.0.10/24
[Peer]
PublicKey =
Endpoint = hub.infra.local:51820
AllowedIPs = 10.100.0.0/24
PersistentKeepalive = 25
EOF
systemctl enable --now wg-quick@wg-mgmt
# Now you can SSH to any fleet host over WireGuard:
ssh root@10.100.0.10 # web-01 via encrypted backplane
ssh root@10.100.0.11 # web-02 via encrypted backplane
# ZFS replication also runs over WireGuard
# (rocky-v2 is a clone of rocky@v1.0 on the sender, so the receive can create it):
zfs send -i rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky-v2@v2.0 \
| ssh root@10.100.0.10 "zfs receive rpool/ROOT/rocky-v2"
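Writing the "repeat for every fleet host" stanzas by hand does not scale past a few peers. One possible approach is to generate them from a list. The sketch below assumes a hypothetical input format of `name public-key address` per line, and `render_peers` is an illustrative name.

```bash
#!/bin/bash
# Sketch: generate hub [Peer] stanzas from a host list read on stdin.
# Input format (an assumption): "<name> <public-key> <management-address>"
# Blank lines and lines starting with # are skipped.
render_peers() {
  while read -r name pubkey addr; do
    case "$name" in
      ''|'#'*) continue ;;
    esac
    printf '# %s\n[Peer]\nPublicKey = %s\nAllowedIPs = %s/32\n\n' \
      "$name" "$pubkey" "$addr"
  done
}
```

The output can be appended below the `[Interface]` section of the hub's `wg-mgmt.conf`. Keeping the peer list in one file also gives you a single place to allocate management addresses.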
Sanoid/Syncoid for automated backup
# /etc/sanoid/sanoid.conf — snapshot retention policy
# kldload installs sanoid by default on desktop and server profiles
[rpool/ROOT]
use_template = production
recursive = yes
[rpool/srv]
use_template = production
recursive = yes
[rpool/home]
use_template = production
recursive = yes
[template_production]
# frequently: 4 x 15-minute snapshots (1 hour); hourly: 1 day; daily: 1 month
# monthly: 1 year; yearly: 2 yearly snapshots
# (comments sit on their own lines so they cannot be read as part of a value)
frequently = 4
hourly = 24
daily = 30
monthly = 12
yearly = 2
autosnap = yes
autoprune = yes
# Syncoid: replicate to backup server automatically
# /etc/cron.d/kldload-syncoid
# cron does not support backslash line continuation: each entry stays on one line
# Replicate boot environments every hour
0 * * * * root syncoid --recursive --no-privilege-elevation rpool/ROOT root@backup.infra.local:backup/web-01/ROOT
# Replicate service data every 15 minutes
*/15 * * * * root syncoid --recursive --no-privilege-elevation rpool/srv root@backup.infra.local:backup/web-01/srv
# Replicate home directories every hour
0 * * * * root syncoid --recursive --no-privilege-elevation rpool/home root@backup.infra.local:backup/web-01/home
# Syncoid uses incremental sends — only changed blocks transfer.
# First run: full send (minutes to hours depending on data).
# Subsequent runs: seconds to minutes (just the delta).
The fleet management story with kldload is: build one golden image. Stamp it onto every machine (USB, PXE, or ZFS clone). Manage the backplane with WireGuard. Push updates with ZFS send/receive. Back up with Sanoid/Syncoid. Every machine has boot environments, so a bad update is one kbe rollback away from being fixed. Every machine has ZFS snapshots, so a bad config change is one zfs rollback away from being fixed. Every machine has WireGuard, so management traffic is encrypted even on untrusted networks.
This is the same architecture that large enterprises use, except they pay Broadcom or Red Hat six figures a year for the privilege. kldload gives you the same thing with standard Linux tools. The technology is free. The knowledge is on this page.
Scripting patterns
Practical bash patterns for automating kldload operations. These are the building blocks for your own automation scripts. Copy them. Modify them. Chain them together.
Bulk VM creation from a golden image
#!/bin/bash
# create-fleet.sh — create N VMs from a kldload golden image
set -euo pipefail
GOLDEN="rpool/vms/golden-rocky9"
POOL="rpool/vms"
BRIDGE="br0"
RAM=4096
VCPUS=4
# Snapshot the golden image (if not already done)
zfs snapshot "${GOLDEN}@template" 2>/dev/null || true
for i in $(seq 1 "$1"); do
NAME="web-$(printf '%02d' "$i")"
echo "Creating ${NAME}..."
# ZFS clone (instant, zero-copy)
zfs clone "${GOLDEN}@template" "${POOL}/${NAME}"
zfs set refreservation=none "${POOL}/${NAME}"
# Create and start the VM
virt-install \
--name "${NAME}" \
--ram "${RAM}" --vcpus "${VCPUS}" --cpu host \
--machine q35 --os-variant rocky9 \
--disk "path=/dev/zvol/${POOL}/${NAME},bus=virtio,cache=none" \
--network "bridge=${BRIDGE},model=virtio" \
--boot uefi \
--tpm backend.type=emulator,backend.version=2.0,model=tpm-crb \
--serial pty --console pty \
--graphics vnc --noautoconsole --import
echo "${NAME} created and started"
done
echo "Done. Created $1 VMs in $(printf '%d' "$SECONDS") seconds."
# Typical output: "Done. Created 20 VMs in 12 seconds."
Snapshot all VMs before maintenance
#!/bin/bash
# snap-all.sh — snapshot every running VM before maintenance
set -euo pipefail
TIMESTAMP=$(date +%Y%m%dT%H%M%S)
TAG="${1:-pre-maintenance}"
for vm in $(virsh list --name); do
# Find the zvol backing this VM
ZVOL=$(virsh domblklist "${vm}" | awk '/zvol/ {print $2}' | sed 's|/dev/zvol/||')
if [[ -n "${ZVOL}" ]]; then
SNAPNAME="${ZVOL}@${TAG}-${TIMESTAMP}"
echo "Snapshotting ${SNAPNAME}..."
zfs snapshot "${SNAPNAME}"
fi
done
echo "All VMs snapshotted with tag: ${TAG}-${TIMESTAMP}"
echo "To roll back any VM: zfs rollback <zvol>@${TAG}-${TIMESTAMP}"
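The script above assumes one zvol per VM. A VM with several zvol-backed disks would produce a multi-line value and a broken snapshot name. A possible refinement is to parse `virsh domblklist` output into one dataset per line. `zvols_from_domblklist` and `snapshot_vm` are hypothetical helper names.

```bash
#!/bin/bash
# Sketch: handle VMs with more than one zvol-backed disk.
# zvols_from_domblklist and snapshot_vm are hypothetical helpers.

# Read `virsh domblklist` output on stdin and print one ZFS dataset per line.
zvols_from_domblklist() {
  awk '$2 ~ "^/dev/zvol/" { sub("^/dev/zvol/", "", $2); print $2 }'
}

# Snapshot every zvol that backs the given VM with the given snapshot tag.
snapshot_vm() {
  local vm=$1 tag=$2
  virsh domblklist "$vm" | zvols_from_domblklist | while IFS= read -r zvol; do
    echo "Snapshotting ${zvol}@${tag}..."
    zfs snapshot "${zvol}@${tag}"
  done
}
```

Snapshots taken one disk at a time are not atomic across disks. If the zvols share a parent dataset, a single recursive `zfs snapshot -r` on the parent takes them at the same instant.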
Fleet health check
#!/bin/bash
# fleet-health.sh — check health of all kldload hosts
set -euo pipefail
HOSTS_FILE="${1:-/etc/kldload/fleet-hosts.txt}"
# File format: one hostname or IP per line
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'
printf "%-25s %-10s %-15s %-10s %-10s %-15s\n" \
"HOST" "SSH" "ZFS" "BOOT-ENV" "SNAPS" "LAST-SCRUB"
while IFS= read -r host; do
[[ -z "${host}" || "${host}" =~ ^# ]] && continue
# SSH check (-n stops ssh from consuming the host list on stdin,
# which would otherwise end the loop after the first host)
if ssh -n -o ConnectTimeout=5 -o BatchMode=yes "root@${host}" true 2>/dev/null; then
SSH="${GREEN}OK${NC}"
else
SSH="${RED}FAIL${NC}"
printf "%-25s %-10b %-15s %-10s %-10s %-15s\n" "${host}" "${SSH}" "-" "-" "-" "-"
continue
fi
# ZFS pool health
POOL_HEALTH=$(ssh -n "root@${host}" "zpool status -x rpool 2>/dev/null" || echo "DEGRADED")
if echo "${POOL_HEALTH}" | grep -q "healthy"; then
ZFS="${GREEN}ONLINE${NC}"
else
ZFS="${RED}DEGRADED${NC}"
fi
# Current boot environment
BOOT_ENV=$(ssh -n "root@${host}" "kbe current 2>/dev/null" || echo "unknown")
# Snapshot count (-r counts snapshots of every dataset in the pool)
SNAP_COUNT=$(ssh -n "root@${host}" "zfs list -t snapshot -H -o name -r rpool 2>/dev/null | wc -l" || echo "?")
# Last scrub
LAST_SCRUB=$(ssh -n "root@${host}" "zpool status rpool 2>/dev/null | grep 'scan:' | awk '{print \$NF}'" || echo "never")
printf "%-25s %-10b %-15b %-10s %-10s %-15s\n" \
"${host}" "${SSH}" "${ZFS}" "${BOOT_ENV}" "${SNAP_COUNT}" "${LAST_SCRUB}"
done < "${HOSTS_FILE}"
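The loop above visits hosts one at a time, so a large fleet with a few unreachable hosts can take a while. One way to fan the checks out is `xargs -P`. This is a sketch: `fleet_hosts` and `run_on_fleet` are hypothetical helper names, and the job count is something you would tune.

```bash
#!/bin/bash
# Sketch: run a per-host command in parallel with xargs -P.
# fleet_hosts and run_on_fleet are hypothetical helpers.

# Print the hosts file without blank lines and comments.
fleet_hosts() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage: run_on_fleet <hosts-file> <parallel-jobs> <command with {} for the host>
run_on_fleet() {
  local hosts_file=$1 jobs=$2
  shift 2
  fleet_hosts "$hosts_file" | xargs -P "$jobs" -I{} "$@"
}
```

For example, `run_on_fleet /etc/kldload/fleet-hosts.txt 8 ssh -n -o ConnectTimeout=5 -o BatchMode=yes root@{} 'zpool status -x rpool'` checks eight hosts at a time. Results arrive in completion order rather than file order, so sort the output if you need a stable table.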
Cron jobs for maintenance
# /etc/cron.d/kldload-maintenance
# Standard cron jobs for a kldload production host
# ZFS scrub — weekly on Sunday at 2 AM
0 2 * * 0 root zpool scrub rpool
# Prune old snapshots (sanoid handles this, but belt-and-suspenders)
0 3 * * * root sanoid --cron
# ZFS ARC stats to Prometheus textfile collector
*/5 * * * * root /usr/local/bin/kldload-arc-stats > /var/lib/node_exporter/textfile/zfs-arc.prom
# Boot environment cleanup — remove boot envs older than 30 days
0 4 * * 0 root kbe prune --older-than 30d
# Check for ZFS pool errors and alert
# (cron does not support backslash continuation, so each entry is one line)
*/10 * * * * root zpool status -x rpool | grep -v "healthy" && curl -s -X POST "https://hooks.slack.com/services/T.../B.../..." -d '{"text":"ZFS pool rpool is degraded on '$(hostname)'"}'
# Verify ZFS checksums match (paranoid mode)
0 5 1 * * root zpool status -v rpool | grep -E "CKSUM|errors" >> /var/log/kldload/zfs-integrity.log
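Long pipelines are awkward inside a crontab, where every entry has to fit on one line. A common alternative is to move the logic into a small script and have cron call that. The sketch below assumes a `WEBHOOK_URL` environment variable; the function names are hypothetical and this is not a shipped kldload tool.

```bash
#!/bin/bash
# Sketch: pool health alert as a script instead of an inline cron pipeline.
# pool_needs_alert, alert_payload, and check_pool are hypothetical names.
# WEBHOOK_URL is an assumed environment variable holding your webhook address.

# Read `zpool status -x <pool>` output on stdin; succeed when it is NOT healthy.
pool_needs_alert() {
  ! grep -q 'is healthy'
}

# Build the JSON body for the webhook.
alert_payload() {
  printf '{"text":"ZFS pool %s needs attention on %s"}' "$1" "$2"
}

# Check one pool and post an alert if needed.
check_pool() {
  local pool=$1
  if zpool status -x "$pool" | pool_needs_alert; then
    curl -s -X POST "$WEBHOOK_URL" -d "$(alert_payload "$pool" "$(hostname)")"
  fi
}
```

With the functions saved in a script that ends by calling `check_pool rpool`, the cron entry shrinks to a schedule plus the script path, and the alert logic can be tested without touching a real pool.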
Automated golden image refresh
#!/bin/bash
# refresh-golden.sh — rebuild golden images monthly
# Run this on your build server via cron or CI trigger
set -euo pipefail
BUILD_DIR="/opt/kldload-free"
OUTPUT_DIR="/var/lib/golden"
TIMESTAMP=$(date +%Y%m%d)
cd "${BUILD_DIR}"
git pull origin main
# Rebuild the ISO with latest packages
./deploy.sh builder-image
PROFILE=server ./deploy.sh build
ISO=$(ls -t live-build/output/*.iso | head -1)
# For each distro, install + export a golden image
for DISTRO in rocky debian ubuntu centos; do
echo "Building golden image for ${DISTRO}..."
DISK="/tmp/golden-${DISTRO}.qcow2"
qemu-img create -f qcow2 "${DISK}" 50G
# Create answers file
cat > /tmp/answers-${DISTRO}.env << EOF
KLDLOAD_DISTRO=${DISTRO}
KLDLOAD_DISK=/dev/vda
KLDLOAD_HOSTNAME=golden-${DISTRO}
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=admin
KLDLOAD_EXPORT_FORMAT=qcow2
EOF
# Create seed ISO
mkdir -p /tmp/seed-${DISTRO}
cp /tmp/answers-${DISTRO}.env /tmp/seed-${DISTRO}/answers.env
genisoimage -o /tmp/seed-${DISTRO}.iso -V KLDLOAD-SEED -J -R /tmp/seed-${DISTRO}/
# Install (headless QEMU)
timeout 1800 qemu-system-x86_64 \
-machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
-drive file="${DISK}",format=qcow2,if=virtio \
-cdrom "${ISO}" \
-drive file=/tmp/seed-${DISTRO}.iso,format=raw,if=virtio \
-nographic -serial mon:stdio -boot d
# Copy to output
OUTNAME="kldload-${DISTRO}-server-${TIMESTAMP}.qcow2"
cp "${DISK}" "${OUTPUT_DIR}/${OUTNAME}"
ln -sf "${OUTPUT_DIR}/${OUTNAME}" "${OUTPUT_DIR}/kldload-${DISTRO}-server-latest.qcow2"
echo "${DISTRO}: ${OUTPUT_DIR}/${OUTNAME}"
rm -f "${DISK}" /tmp/seed-${DISTRO}.iso /tmp/answers-${DISTRO}.env
rm -rf /tmp/seed-${DISTRO}
done
echo "All golden images refreshed: ${TIMESTAMP}"
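A monthly refresh accumulates one image per distro per run. A retention step is one way to keep the output directory bounded. This sketch assumes the `kldload-<distro>-server-<YYYYMMDD>.qcow2` naming used above; `prune_old_images` is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: keep only the newest N timestamped images for one prefix.
# prune_old_images is a hypothetical helper. The [0-9]* glob matches
# timestamped files only, so the -latest symlink is never touched.
prune_old_images() {
  local dir=$1 prefix=$2 keep=$3
  (
    cd "$dir" || exit 1
    # Timestamps sort lexically; reverse sort lists the newest first,
    # and tail skips the first $keep entries.
    ls -1 "${prefix}"-[0-9]*.qcow2 2>/dev/null | sort -r | tail -n "+$(( keep + 1 ))" |
      while IFS= read -r old; do
        rm -f -- "$old"
        echo "removed ${old}"
      done
  )
}
```

Calling `prune_old_images /var/lib/golden kldload-rocky-server 6` after each refresh would keep roughly six months of rollback targets. The count of six is an arbitrary example.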
virsh + ZFS: common operations
# ── Common virsh + ZFS operations ──────────────────────
# List all VMs with their ZFS storage usage
for vm in $(virsh list --all --name); do
ZVOL=$(virsh domblklist "${vm}" 2>/dev/null | awk '/zvol/ {print $2}' | sed 's|/dev/zvol/||')
if [[ -n "${ZVOL}" ]]; then
USED=$(zfs get -H -o value used "${ZVOL}" 2>/dev/null || echo "?")
REFER=$(zfs get -H -o value referenced "${ZVOL}" 2>/dev/null || echo "?")
RATIO=$(zfs get -H -o value compressratio "${ZVOL}" 2>/dev/null || echo "?")
STATE=$(virsh domstate "${vm}" 2>/dev/null || echo "?")
printf "%-20s %-10s %-10s %-10s %-8s\n" "${vm}" "${STATE}" "${USED}" "${REFER}" "${RATIO}"
fi
done
# Clone a running VM (snapshot + clone + define)
VM="web-prod-01"
CLONE="web-staging-01"
ZVOL="rpool/vms/${VM}"
TAG=$(date +%Y%m%dT%H%M%S)
zfs snapshot "${ZVOL}@clone-${TAG}"
zfs clone "${ZVOL}@clone-${TAG}" "rpool/vms/${CLONE}"
zfs set refreservation=none "rpool/vms/${CLONE}"
# Dump and modify the VM XML
virsh dumpxml "${VM}" > /tmp/${CLONE}.xml
sed -i "s/${VM}/${CLONE}/g" /tmp/${CLONE}.xml
# Generate new UUID and MAC
NEW_UUID=$(uuidgen)
NEW_MAC=$(printf '52:54:00:%02x:%02x:%02x' $((RANDOM%256)) $((RANDOM%256)) $((RANDOM%256)))
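sed -i "s|<uuid>.*</uuid>|<uuid>${NEW_UUID}</uuid>|" /tmp/${CLONE}.xml
sed -i "s|<mac address='[^']*'/>|<mac address='${NEW_MAC}'/>|" /tmp/${CLONE}.xml
# Define the clone from the modified XML
virsh define /tmp/${CLONE}.xml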
These are not theoretical examples. They are the actual commands I run in production. The bulk VM creation script creates 20 VMs in 12 seconds. The snapshot script snapshots every running VM before maintenance in under 2 seconds. The fleet health check walks the host list over SSH and shows you the entire fleet in one table. The golden image refresh rebuilds every distro image monthly and timestamps them for rollback.
None of this requires a product. No Proxmox. No VMware. No Ansible Tower. No Jenkins. Standard bash scripts, standard Linux commands, standard ZFS operations. The only thing kldload adds is the base image and the tools that make the common operations one-liners instead of ten-liners. Everything on this page works on any kldload-installed machine with the server or desktop profile.
Disk labeling
Every disk has a passport
Structured disk labels encode physical location, ZFS pool membership, warranty, and RMA information. When a disk fails at 3 AM, the replacement procedure is on the label. No spreadsheet. No CMDB lookup. No guessing which disk in which slot.
PHYSICAL LOCATION
Region: CA-WEST-1 Datacenter: YVR01 Rack: R12-08 Slot: SLOT07
ZFS INFORMATION
Pool: prd-caw1-db-gold-nvme VDEV: slot07 Layout: draid2:10d:2c:128s
LIFECYCLE
Asset ID: UB-DSK-CAW1-88322 Warranty: 2028-02-12
RMA: https://cdw.ca/rma/S6ZUNX0R123456A
This is not cosmetic. On a 60-disk JBOD, pulling the wrong disk from an already degraded vdev can take the whole vdev, and the pool with it, offline. The label tells you: this is slot 7, it is part of this pool, it is in this vdev, and here is the CDW RMA link for the exact model. The person replacing it at 3 AM does not need to know ZFS. They need to read the label, pull the disk, click the link, and plug in the replacement. With autoreplace enabled on the pool, ZFS resilvers onto the new disk automatically. The label makes the human part foolproof.
Putting it all together
Here is the complete automation pipeline for a production kldload deployment, from nothing to a running fleet:
PROFILE=server ./deploy.sh build produces the kldload ISO. Containerized. Reproducible. Takes 10-20 minutes.
kexport produces a qcow2. The image is sealed for cloning: machine-id cleared, SSH keys removed, cloud-init enabled.
zfs send the delta to every host. Create a new boot environment. Rolling reboot. Rollback is kbe rollback.
The entire pipeline is standard Linux commands. podman builds the ISO. qemu installs it. zfs clone deploys it. cloud-init personalizes it. ansible configures the app layer. zfs send backs it up. zfs send updates it. kbe rollback fixes it if an update goes wrong. Every tool in the chain is free, open source, and runs on any Linux machine.
There is no vendor. There is no license. There is no subscription. There is no phone-home. There is no nag screen. There is no "enterprise edition" with the features you actually need behind a paywall. The entire stack is yours. The knowledge is on this page. Build your infrastructure. Own it completely.