kldload: your Linux re-packer. Infrastructure, your way, for free. Pick your distro, get ZFS on root.

Automation

kldload builds the base. Automation layers on top. Never the other way around.

kldload is not an automation tool. It is an image factory. It produces fully-configured, ZFS-on-root base images that every automation tool in existence can consume — Packer, Terraform, Ansible, Salt, Puppet, cloud-init, shell scripts, or nothing at all. The ISO does attended and unattended installs. The answer file decides. The golden image workflow produces cloud-ready templates. Everything downstream — provisioning, configuration management, CI/CD — starts from a known-good base that kldload already built.

The thesis: Most automation starts with a blank OS and pushes configuration to it after the fact. kldload starts with a fully-configured image and pushes it everywhere. Image-based deployment is faster, more reproducible, and more reliable than configuration management alone. The image IS the configuration. The machine boots, it is already done. Your automation tools — if you use them — connect to a finished machine, not a blank one.

Packer does not replace kldload. Terraform does not replace kldload. Ansible does not replace kldload. kldload feeds INTO all of them. It is the first stage of the pipeline. Everything else is second stage.

I spent years doing it backwards. Install a minimal OS. Run Ansible against it. Wait 20 minutes for 400 tasks to converge. Hope nothing fails. Hope the package mirror is up. Hope the GPG keys have not rotated. Hope the Jinja template renders correctly on this specific version of this specific distro. Then do it again on the next machine. And the next. And the next.

The day I switched to image-based deployment, I deleted 3,000 lines of Ansible. The image boots in 15 seconds and is already configured. Ansible still runs — but it handles application-layer changes on top of a known-good base. 40 tasks instead of 400. 2 minutes instead of 20. Zero failures from package mirrors or GPG keys or template rendering. The base image is immutable. It was tested once, when it was built. It works the same every time.

Automation philosophy

There are two models for infrastructure automation. Configuration management starts with a bare OS and pushes state to it: install packages, write config files, restart services, repeat. Image-based deployment starts with a pre-built image that already contains everything and stamps it onto machines. kldload is firmly in the second camp. It produces images. Automation tools consume them.

Configuration management (push model)

Install a minimal OS. Point Ansible/Salt/Puppet at it. Wait for convergence. Every machine rebuilds itself from scratch every time. Package downloads, template rendering, service restarts — all happen on every run. Slow. Fragile. Depends on network, package mirrors, and the config management tool itself being reachable.

Every deploy is a fresh compilation. You hope the compiler works the same every time.

Image-based deployment (stamp model)

Build the image once. Test it. Stamp it onto every machine. Boot time is deploy time. No package downloads, no template rendering, no convergence. The image is the artifact. It was tested when it was built. It is identical on every machine. ZFS makes cloning free. kldload makes building free.

Every deploy is copying a binary. The compilation happened once, at build time.

The ideal architecture uses both models together. kldload builds the base image (OS, ZFS, WireGuard, boot environments, package holds, snapshot timers). Configuration management handles the application layer on top (deploy your app, write its config, manage its secrets). The base never changes. The application layer changes constantly. Two layers, two tools, clean separation.

Where kldload fits in the pipeline

kldload is Stage 0. It produces the raw material — a bootable, ZFS-on-root, fully-configured base system. Everything else is downstream:

Stage 0: kldload ISO  -->  Install to disk / export golden image
Stage 1: Packer       -->  Layer application packages onto the golden image
Stage 2: Terraform    -->  Deploy the Packer artifact to infrastructure
Stage 3: Ansible      -->  Push runtime config, secrets, app deploys
Stage 4: Monitoring   -->  Observe everything with eBPF + Prometheus + Grafana

You can skip any stage. Use kldload alone with USB sticks for air-gapped deployments. Use kldload + Terraform for cloud. Use kldload + Ansible for existing infrastructure. The stages are independent. Mix and match.

The fundamental insight: Packer and Terraform cannot build what kldload builds. They do not know how to set up ZFS on root. They do not know how to configure boot environments. They do not know how to build a darksite. They do not know how to partition a disk with an ESP, set up ZFS pools, compile DKMS modules, and configure a bootloader that understands ZFS datasets. kldload does all of that. Packer and Terraform consume the result.

This is not a limitation. It is the correct architecture. The OS image factory is a specialized tool. The infrastructure orchestrator is a specialized tool. They do different jobs. Trying to make Terraform build an OS image is like trying to make a compiler also be a text editor. Use the right tool for each job.

Unattended installation

There are two ways to install: sit at the web UI and click through it, or write an answer file and never touch a keyboard. Same ISO. Same boot. Same result. The difference is whether a seed disk is present when the machine boots.

Most automation tools work backwards. You install the OS by hand, then push configuration to it with Ansible/Puppet/Salt/Chef after the fact. The OS is a blank canvas that gets painted by a remote orchestrator. That means: the orchestrator has to reach the machine (networking), authenticate to it (credentials), and run successfully (dependencies). Three things that can fail before your automation even starts.

kldload works forwards. The answer file is on a USB stick next to the ISO. The machine boots, finds the answers, installs itself. No network required. No orchestrator required. No credentials required. The machine configures itself from local data. When it comes up, it is already done. Your orchestrator — if you have one — connects to a finished machine, not a blank one.

This is the difference between push and pull. Push automation requires infrastructure to exist before it can create infrastructure. Pull automation requires a USB stick.

The answers file — complete reference

Seed disk: write a file, plug it in, walk away

Format a USB drive as FAT32, label it KLDLOAD-SEED, drop an answers.env file on it. Insert it alongside the ISO. Boot. The system finds the seed disk, reads the answers, installs to the target disk, and powers off. Zero interaction.

The answer file is environment variables. Every variable has a sane default. You only need to set what you want to change. Here is the complete reference:

# ═══════════════════════════════════════════════════════
# answers.env — kldload unattended install configuration
# ═══════════════════════════════════════════════════════

# ── Core ──────────────────────────────────────────────
KLDLOAD_DISTRO=debian              # centos | debian | ubuntu | fedora | rhel | rocky | arch | alpine
KLDLOAD_PROFILE=server             # desktop | server | core
KLDLOAD_DISK=/dev/sda              # Target block device (wiped entirely)
KLDLOAD_HOSTNAME=web-prod-01       # System hostname
KLDLOAD_USERNAME=admin             # Non-root admin user (gets passwordless sudo)
KLDLOAD_PASSWORD=changeme          # Password for admin user
KLDLOAD_TIMEZONE=America/Vancouver # TZ database name
KLDLOAD_LOCALE=en_US.UTF-8         # System locale
KLDLOAD_KEYBOARD_LAYOUT=us         # XKB keyboard layout
KLDLOAD_KEYBOARD_VARIANT=          # XKB keyboard variant (optional)

# ── SSH ───────────────────────────────────────────────
KLDLOAD_SSH_PUBKEY="ssh-ed25519 AAAA... user@host"
KLDLOAD_ADMIN_SSH_PUBKEY="ssh-ed25519 AAAA... ops@infra"
KLDLOAD_GENERATE_SSH_KEY=1         # 1 = generate host keys on install

# ── ZFS ───────────────────────────────────────────────
KLDLOAD_STORAGE_MODE=zfs           # zfs (only supported mode)
KLDLOAD_ZFS_TOPOLOGY=single        # single | mirror | raidz1 | mirror-stripe
KLDLOAD_ZFS_ENCRYPT=0              # 1 = native ZFS encryption (passphrase prompt on boot)
KLDLOAD_ZFS_DATA_DISKS=            # Additional disks for mirror/raidz (space-separated)
KLDLOAD_ZFS_SPECIAL_DISKS=         # Special vdev (metadata) disks (space-separated)
KLDLOAD_ENABLE_ZFS=1               # Always 1 — ZFS is non-optional
KLDLOAD_FORCE_WIPE=1               # 1 = wipe disk without confirmation

# ── Networking ────────────────────────────────────────
KLDLOAD_NET_METHOD=dhcp            # dhcp | static
KLDLOAD_NET_IFACE=eth0             # Interface for static config
KLDLOAD_NET_IP=10.0.1.50           # Static IP (required if static)
KLDLOAD_NET_PREFIX=24              # CIDR prefix
KLDLOAD_NET_GW=10.0.1.1            # Default gateway (required if static)
KLDLOAD_NET_DNS=1.1.1.1,8.8.8.8   # DNS servers (comma-separated)

# ── Features ──────────────────────────────────────────
KLDLOAD_ENABLE_KVM=1               # 1 = install libvirt + QEMU/KVM
KLDLOAD_ENABLE_EBPF=1              # 1 = install eBPF tools (bcc, bpftrace)
KLDLOAD_ENABLE_AI=0                # 1 = install AI/ML stack
KLDLOAD_NVIDIA_DRIVERS=0           # 1 = install NVIDIA drivers
KLDLOAD_WIREGUARD=0                # 1 = configure WireGuard interface

# ── Packages ──────────────────────────────────────────
KLDLOAD_EXTRA_PACKAGES=            # Additional packages (comma or space separated)
KLDLOAD_KEEP_DARKSITE=0            # 1 = copy darksite to installed system
KLDLOAD_CUSTOM_MIRROR_URL=         # Override default package mirror

# ── Cluster ───────────────────────────────────────────
KLDLOAD_INFRA_MODE=standalone      # standalone | cluster
KLDLOAD_CLUSTER_CIDR=              # WireGuard cluster CIDR (e.g., 10.100.0.0/24)
KLDLOAD_CLUSTER_DOMAIN=infra.local # Cluster DNS domain
KLDLOAD_CLUSTER_SIZE=16            # Max nodes in cluster
KLDLOAD_HUB_LAN=                   # Hub LAN interface for cluster

# ── Export (golden image) ─────────────────────────────
KLDLOAD_EXPORT_FORMAT=none         # none | qcow2 | vmdk | vhd | ova | raw
KLDLOAD_EXPORT_SCP_HOST=           # Remote host for SCP upload
KLDLOAD_EXPORT_SCP_USER=root       # SCP user
KLDLOAD_EXPORT_SCP_PATH=/root/     # SCP destination path
KLDLOAD_EXPORT_SCP_KEY=            # Path to SSH private key for SCP
KLDLOAD_EXPORT_SCP_PASS=           # SSH password for SCP (if no key)

# ── Distro-specific ───────────────────────────────────
KLDLOAD_RELEASE=9                  # CentOS/RHEL/Rocky release version
KLDLOAD_DEBIAN_SUITE=trixie        # Debian suite (trixie, bookworm)
KLDLOAD_DEBIAN_MIRROR=https://mirror.it.ubc.ca/debian

# ── Bootloader ────────────────────────────────────────
KLDLOAD_BOOTLOADER_ID=KLDload     # EFI boot entry name

That is the entire API for unattended deployment. Environment variables in a flat file on a FAT32 USB stick. No YAML. No Jinja templates. No 400-line kickstart file. No preseed with undocumented d-i directives. No curtin with YAML that changes syntax between Ubuntu releases.

Every variable has a sane default. A minimal answers file is three lines: distro, disk, hostname. Everything else falls to defaults — DHCP networking, server profile, UTC timezone, admin user with password "admin". For production you will set more, but for testing you can deploy a full ZFS-on-root system with three lines of configuration.
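To make the "three lines plus defaults" idea concrete, here is a minimal Python sketch of how a flat KEY=VALUE answers file could be layered over defaults. It is not the installer's actual parser. The `DEFAULTS` table contains only the defaults named in the paragraph above, and `parse_answers` is a hypothetical helper name.

```python
import shlex

# Only the defaults named in the text; the real installer likely defines more.
DEFAULTS = {
    "KLDLOAD_PROFILE": "server",
    "KLDLOAD_NET_METHOD": "dhcp",
    "KLDLOAD_TIMEZONE": "UTC",
    "KLDLOAD_USERNAME": "admin",
    "KLDLOAD_PASSWORD": "admin",
}

def parse_answers(text):
    """Parse shell-style KEY=VALUE lines and layer them over DEFAULTS."""
    answers = dict(DEFAULTS)
    for line in text.splitlines():
        # shlex handles quoting and strips trailing "# comments"
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue
        key, sep, value = tokens[0].partition("=")
        if sep and key.startswith("KLDLOAD_"):
            answers[key] = value
    return answers

minimal = """
KLDLOAD_DISTRO=debian
KLDLOAD_DISK=/dev/sda      # wiped entirely
KLDLOAD_HOSTNAME=test-01
"""
config = parse_answers(minimal)
print(config["KLDLOAD_HOSTNAME"], config["KLDLOAD_NET_METHOD"])
```

One caveat of this sketch: an unquoted `#` inside a value is treated as the start of a comment, so passwords containing `#` would need quoting.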

How seed disk detection works

The branch point: seed disk or human

kldload-autoinstall.service runs on every live boot. It scans all removable media for a FAT32 partition labeled KLDLOAD-SEED containing answers.env. If found: source the file, export every KLDLOAD_* variable, run the installer with zero interaction. If not found: start the web UI at :8080 and wait for a human. Same ISO. Same boot sequence. Same installer binary. The presence or absence of a seed disk is the only branch point.
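The branch point can be sketched in a few lines of Python. This is an illustration, not the service's real code: it assumes the device list comes from `lsblk --json -o NAME,FSTYPE,LABEL`, the function names are hypothetical, and it skips the step of mounting the partition and checking that answers.env exists.

```python
import json

SEED_LABEL = "KLDLOAD-SEED"

def find_seed_partition(lsblk_json):
    """Return the name of the first vfat partition labeled SEED_LABEL, else None."""
    def walk(nodes):
        # lsblk nests partitions under their parent disk in "children"
        for node in nodes:
            yield node
            yield from walk(node.get("children") or [])

    tree = json.loads(lsblk_json)
    for dev in walk(tree.get("blockdevices", [])):
        if dev.get("fstype") == "vfat" and dev.get("label") == SEED_LABEL:
            return dev["name"]
    return None

def choose_mode(lsblk_json):
    """Seed disk present: unattended install. Otherwise: wait for a human."""
    return "unattended" if find_seed_partition(lsblk_json) else "web-ui"

# Sample shaped like `lsblk --json -o NAME,FSTYPE,LABEL` output
SAMPLE = json.dumps({"blockdevices": [
    {"name": "sda", "fstype": None, "label": None},
    {"name": "sdb", "fstype": None, "label": None, "children": [
        {"name": "sdb1", "fstype": "vfat", "label": "KLDLOAD-SEED"},
    ]},
]})
print(choose_mode(SAMPLE))
```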

# Create a seed USB on any Linux machine:
mkfs.vfat -F 32 -n KLDLOAD-SEED /dev/sdb1
mount /dev/sdb1 /mnt
cat > /mnt/answers.env << 'EOF'
KLDLOAD_DISTRO=rocky
KLDLOAD_DISK=/dev/nvme0n1
KLDLOAD_HOSTNAME=db-prod-01
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=sysadmin
KLDLOAD_PASSWORD='correct-horse-battery-staple'
KLDLOAD_TIMEZONE=America/Toronto
KLDLOAD_SSH_PUBKEY="ssh-ed25519 AAAA... ops@infra"
KLDLOAD_ZFS_TOPOLOGY=mirror
KLDLOAD_ZFS_DATA_DISKS=/dev/nvme1n1
KLDLOAD_NET_METHOD=static
KLDLOAD_NET_IP=10.0.1.50
KLDLOAD_NET_PREFIX=24
KLDLOAD_NET_GW=10.0.1.1
KLDLOAD_NET_DNS=10.0.1.1
EOF
umount /mnt

# Boot the machine with the kldload ISO + this USB.
# The machine installs itself. Zero interaction.

WebSocket API for scriptable installs

Skip the USB sticks entirely. The web UI exposes a WebSocket API on port 8080. Any script that speaks WebSocket can send install commands with JSON payloads. This is how you automate installation over the network when the machines are already booted into the live ISO.

#!/usr/bin/env python3
# install-remote.py — trigger unattended install via WebSocket
import asyncio, websockets, json

async def install(host):
    async with websockets.connect(f"ws://{host}:8080/ws") as ws:
        await ws.send(json.dumps({
            "action": "install",
            "distro": "debian",
            "disk": "/dev/sda",
            "hostname": "web-prod-01",
            "username": "admin",
            "password": "changeme",
            "profile": "server",
            "timezone": "America/Vancouver",
            "ssh_pubkey": "ssh-ed25519 AAAA... user@host"
        }))
        # Stream install progress
        async for msg in ws:
            data = json.loads(msg)
            print(f"[{data.get('phase','')}] {data.get('message','')}")
            if data.get('status') == 'complete':
                break

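# Install 10 machines in parallel.
# asyncio.run() needs a coroutine, so gather() is awaited inside main().
async def main():
    hosts = [f"10.0.1.{i}" for i in range(50, 60)]
    await asyncio.gather(*(install(h) for h in hosts))

asyncio.run(main())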

Want 50 machines? Write 50 answer files (one per hostname). Burn 50 USB sticks. Plug them in. Boot. Walk away. Come back to 50 installed machines with ZFS on root, WireGuard ready, snapshots running, boot environments configured. No PXE server. No TFTP. No DHCP options. No network boot infrastructure at all.
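Writing 50 answer files by hand is tedious, so a small generator helps. The sketch below is one possible approach, assuming sequential hostnames and static IPs. The `node-NN` naming scheme, the address range, and the `write_answer_files` helper are illustrative assumptions; only the `KLDLOAD_*` variable names come from the reference above.

```python
import tempfile
from pathlib import Path

TEMPLATE = """\
KLDLOAD_DISTRO=rocky
KLDLOAD_DISK=/dev/nvme0n1
KLDLOAD_HOSTNAME={hostname}
KLDLOAD_NET_METHOD=static
KLDLOAD_NET_IP={ip}
KLDLOAD_NET_PREFIX=24
KLDLOAD_NET_GW=10.0.1.1
KLDLOAD_NET_DNS=10.0.1.1
"""

def write_answer_files(out_dir, count, first_host_octet=50):
    """Write <out_dir>/<hostname>/answers.env for each host; return the paths."""
    out_dir = Path(out_dir)
    paths = []
    for i in range(count):
        hostname = f"node-{i + 1:02d}"            # hypothetical naming scheme
        ip = f"10.0.1.{first_host_octet + i}"     # hypothetical address plan
        host_dir = out_dir / hostname
        host_dir.mkdir(parents=True, exist_ok=True)
        path = host_dir / "answers.env"
        path.write_text(TEMPLATE.format(hostname=hostname, ip=ip))
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as tmp:
    files = write_answer_files(tmp, 50)
    print(len(files), "answer files written")
```

Each per-host directory can then be copied to the root of its own seed stick.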

Or use the WebSocket API and install all 50 from a single laptop in your server room. Boot them from the ISO over IPMI virtual media, run the Python script, walk away. Same result, different delivery mechanism.

PXE boot workflow

For large deployments with existing PXE infrastructure, you can netboot the kldload live environment and supply answers via HTTP instead of USB:

# DHCP server config (ISC DHCP)
subnet 10.0.1.0 netmask 255.255.255.0 {
  range 10.0.1.100 10.0.1.200;
  option routers 10.0.1.1;
  next-server 10.0.1.5;           # TFTP server
  filename "pxelinux.0";          # or shimx64.efi for UEFI
}

# TFTP: extract vmlinuz and initrd from the kldload ISO
mkdir -p /mnt/iso
mount -o loop kldload-free-1.0.2.iso /mnt/iso
cp /mnt/iso/isolinux/vmlinuz /tftpboot/
cp /mnt/iso/isolinux/initrd.img /tftpboot/

# PXE menu entry (pxelinux.cfg/default)
LABEL kldload
  KERNEL vmlinuz
  APPEND initrd=initrd.img root=live:http://10.0.1.5/kldload.squashfs \
         rd.live.image rd.live.overlay.overlayfs \
         kldload.answers=http://10.0.1.5/answers/answers.env

# HTTP server serves the squashfs and per-host answer files
# Use hostname-based answers: answers/${hostname}.env
# The installer checks KLDLOAD_ANSWERS_URL kernel parameter

Firstboot and systemd services

The machine finishes configuring itself on first power-on

kldload-firstboot.service runs once on the first boot of an installed system. It reads the install manifest at /etc/kldload/install-manifest.env — the record of every choice made during installation — and finishes what the installer started. Package holds locked. Snapshot timers enabled. SSH keys generated. WireGuard interface ready. Then the service disables itself. It never runs again.

Firstboot is where the answer file becomes permanent configuration. The installer writes the manifest. Firstboot reads it and acts. This is a clean separation: the installer puts files on disk, firstboot activates them. If firstboot fails, the manifest is still there — you can re-run it, inspect it, or fix whatever broke. Nothing is lost. Nothing is ephemeral. The state is on disk, in a file you can read.

What runs automatically

Systemd services and timers that handle the day-to-day without you.

kldload-snapshot.timer

Hourly boot environment snapshot. Always have a known-good OS state.

kldload-srv-snapshot.timer

Snapshot /srv every 15 minutes. Service data is never more than 15 minutes stale.

kldload-package-holds.service

Marks kernel, ZFS, and bootloader packages as held. No surprise DKMS breakage from unattended upgrades.

kldload-webui.service

Web UI on :8080. Auto-starts. Auto-restarts on failure. Watchdog monitored.

kldload-autoinstall.service

Seed disk scanner on live boot. Triggers unattended install or starts web UI.

kldload-firstboot.service

One-time post-install configuration. Runs once, disables itself.

The package holds deserve explanation. The three most dangerous packages on a ZFS-on-root system are the kernel, the ZFS module, and the bootloader. If any of them update out of sync — new kernel without a matching ZFS module, new bootloader that does not know about ZFS — the machine will not boot. kldload holds all three. You upgrade them deliberately, with kupgrade, which snapshots first. You never wake up to a machine that auto-updated itself into a brick.

Golden image workflow

The golden image workflow is: install, configure, seal, export, clone. kldload handles the first four stages automatically when you set KLDLOAD_EXPORT_FORMAT in your answers file or select an export format in the web UI. The result is a cloud-init-ready image in qcow2, vmdk, vhd, ova, or raw format.

The five stages of golden image production

Every golden image goes through the same lifecycle. kldload automates stages 1-4. Stage 5 is your deployment tool (Terraform, Packer, manual cloning).

Stage 1: Install      kldload installs the OS to disk (ZFS on root, all packages)
Stage 2: Configure    Users, SSH keys, networking, WireGuard, eBPF, services
Stage 3: Seal         k_seal_image_for_clone() — clear machine-id, SSH host keys,
                      DHCP leases, cloud-init state. Enable cloud-init datasources.
Stage 4: Export       kexport — export ZFS pool, qemu-img convert to target format
Stage 5: Deploy       Clone/import the image on target infrastructure

What sealing does

The k_seal_image_for_clone() function prepares an installed system for cloning by removing all machine-specific identity. Every clone gets unique identity on first boot via cloud-init and systemd:

machine-id

Truncated to empty. systemd regenerates a unique machine-id on first boot. This is the primary machine identity — DHCP clients use it, journald uses it, dbus uses it.

SSH host keys

Deleted (/etc/ssh/ssh_host_*). sshd-keygen regenerates unique keys on first boot. Without this, every clone has the same host key — a security disaster.

DHCP leases

Removed from NetworkManager, dhclient, and dhcp directories. Stale leases cause IP conflicts when clones boot on the same network.

cloud-init state

Instance directory wiped (/var/lib/cloud/instances). cloud-init re-runs on first boot, applying new hostname, SSH keys, networking from whatever datasource is available.

cloud-init datasources

Configured to accept: NoCloud, ConfigDrive, OpenStack, Azure, GCE, Ec2, None. The image works on any platform without modification.

Install manifest

Removed (/etc/kldload/install-manifest.env). Contains build-time passwords. Must not ship in a template image.

Logs and history

Bash history, wtmp, btmp, lastlog cleared. Clean slate for the first real user.
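The sealing steps above can be sketched as a function that operates on a mounted root tree. This is an illustration of the idea, not the source of k_seal_image_for_clone(). The `seal_root` name is hypothetical, and the sketch covers only the machine-id, SSH host key, cloud-init instance, and install manifest steps, using the paths listed above.

```python
import shutil
import tempfile
from pathlib import Path

def seal_root(root):
    """Strip machine-specific identity from an installed tree rooted at `root`."""
    root = Path(root)
    actions = []

    machine_id = root / "etc/machine-id"
    if machine_id.exists():
        machine_id.write_text("")   # empty file: systemd regenerates it on boot
        actions.append("machine-id truncated")

    # Materialize the glob before deleting so iteration is not disturbed
    for key in list((root / "etc/ssh").glob("ssh_host_*")):
        key.unlink()
        actions.append(f"removed {key.name}")

    instances = root / "var/lib/cloud/instances"
    if instances.exists():
        shutil.rmtree(instances)    # cloud-init re-runs on the clone's first boot
        actions.append("cloud-init instances wiped")

    manifest = root / "etc/kldload/install-manifest.env"
    if manifest.exists():
        manifest.unlink()           # contains build-time passwords
        actions.append("install manifest removed")

    return actions

# Demo against a throwaway directory tree, never a live system
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "etc/ssh").mkdir(parents=True)
    (demo / "etc/machine-id").write_text("0123456789abcdef\n")
    (demo / "etc/ssh/ssh_host_ed25519_key").write_text("dummy")
    print(seal_root(demo))
```

Taking the root as a parameter keeps the function safe to test: point it at a scratch directory or a mounted image, not at `/`.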

Exporting with kexport

# Export from the command line (after install completes):
sudo kexport /dev/sda qcow2 /tmp/export/
# Exports the disk as: /tmp/export/kldload-debian-server.qcow2

# Export with custom name:
KEXPORT_NAME="debian13-base-v2" kexport /dev/sda qcow2 /tmp/export/
# Exports as: /tmp/export/debian13-base-v2.qcow2

# Supported formats:
#   qcow2  — KVM/libvirt, Proxmox, OpenStack
#   vmdk   — VMware ESXi/Workstation
#   vhd    — Hyper-V, Azure
#   ova    — VMware/VirtualBox (OVF + vmdk in a tar)
#   raw    — Direct dd, bare metal, ZFS zvol import
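For readers curious about the conversion step, the sketch below maps the format names listed above to a `qemu-img convert` command line. It is an assumption about how such a mapping could look, not kexport's actual implementation, and `convert_command` is a hypothetical helper. One detail worth knowing: qemu-img calls the VHD format `vpc`.

```python
# kexport-style format name -> qemu-img output format name
QEMU_IMG_FORMAT = {
    "qcow2": "qcow2",
    "vmdk": "vmdk",
    "vhd": "vpc",    # qemu-img's name for the VHD (Virtual PC) format
    "raw": "raw",
}

def convert_command(source, fmt, dest):
    """Build the argv for converting a raw disk or image to the target format."""
    if fmt == "ova":
        # An OVA is a tar archive of an OVF descriptor plus a vmdk,
        # so it needs packaging steps beyond a single qemu-img call.
        raise ValueError("ova requires packaging a vmdk; not a qemu-img format")
    if fmt not in QEMU_IMG_FORMAT:
        raise ValueError(f"unsupported format: {fmt}")
    return ["qemu-img", "convert", "-f", "raw",
            "-O", QEMU_IMG_FORMAT[fmt], source, dest]

print(" ".join(convert_command("/dev/sda", "qcow2", "/tmp/export/base.qcow2")))
```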

Automated export via answers file

# answers.env — build and export a golden image automatically
KLDLOAD_DISTRO=rocky
KLDLOAD_DISK=/dev/sda
KLDLOAD_HOSTNAME=template
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=admin

# Export as qcow2 and SCP to the image server
KLDLOAD_EXPORT_FORMAT=qcow2
KLDLOAD_EXPORT_SCP_HOST=images.infra.local
KLDLOAD_EXPORT_SCP_USER=root
KLDLOAD_EXPORT_SCP_PATH=/var/lib/libvirt/images/
KLDLOAD_EXPORT_SCP_KEY=/root/.ssh/id_ed25519

# The installer will:
# 1. Install Rocky Linux with ZFS on root
# 2. Configure everything (users, SSH, networking, services)
# 3. Seal the image (clear machine-id, SSH keys, enable cloud-init)
# 4. Export to qcow2
# 5. SCP the qcow2 to images.infra.local:/var/lib/libvirt/images/
# Zero interaction. One USB stick. One boot cycle.

ZFS makes cloning free. A ZFS snapshot is O(1): it completes in milliseconds regardless of image size. A ZFS clone is also O(1), because it shares all blocks with the parent snapshot until either side writes new data. So you can produce one golden image with kldload, snapshot it, and clone it 1,000 times. Each clone takes less than a second to create and uses almost no additional disk space (only metadata) until it diverges from the parent.

This is why kldload uses ZFS for everything, including the host hypervisor. You are not just getting a filesystem. You are getting an image distribution mechanism that is faster and more space-efficient than any purpose-built tool. zfs clone is faster than cp, faster than qemu-img create -b, faster than any linked clone mechanism in any hypervisor product. And the clone is a first-class dataset you can promote, snapshot, replicate, and encrypt independently.
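As a rough illustration of the snapshot-then-clone pattern, the sketch below builds the `zfs` command lines for one snapshot and N clones. It only constructs the argument lists; running them requires a ZFS host and root privileges. The dataset layout (`rpool/vms/...`) and the `clone_commands` helper are assumptions for the example.

```python
def clone_commands(golden, snapshot, names, parent="rpool/vms"):
    """Return argv lists: one `zfs snapshot`, then one `zfs clone` per name."""
    snap = f"{golden}@{snapshot}"
    commands = [["zfs", "snapshot", snap]]
    for name in names:
        # Each clone shares blocks with the snapshot until it diverges
        commands.append(["zfs", "clone", snap, f"{parent}/{name}"])
    return commands

names = [f"web-{i:02d}" for i in range(1, 4)]
for argv in clone_commands("rpool/vms/golden-rocky9", "template", names):
    print(" ".join(argv))
```

On a real host each list could be passed to `subprocess.run(argv, check=True)`.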

Cloud-init integration

When kldload seals an image for export, it configures cloud-init with multi-datasource support. The image accepts configuration from any cloud platform (AWS, GCE, Azure, OpenStack) or from local datasources (NoCloud for KVM/Proxmox, ConfigDrive for bare metal). First-boot customization — hostname, users, SSH keys, networking, scripts — is handled by cloud-init, not by kldload. kldload builds the base. cloud-init personalizes the clone.

Datasource configuration

# /etc/cloud/cloud.cfg.d/99-kldload-datasource.cfg
# Written by k_seal_image_for_clone() during export
datasource_list: [ NoCloud, ConfigDrive, OpenStack, Azure, GCE, Ec2, None ]

This means a single kldload golden image works everywhere without modification. Deploy it on KVM with a NoCloud seed ISO, on Proxmox with cloud-init drive, on AWS with EC2 metadata, on Azure with Azure datasource. Same image. Different datasource. cloud-init handles it.

NoCloud seed ISO for KVM/Proxmox

# Create a NoCloud seed ISO for a specific VM:
mkdir -p /tmp/seed
cat > /tmp/seed/meta-data << 'EOF'
instance-id: web-prod-01
local-hostname: web-prod-01
EOF

cat > /tmp/seed/user-data << 'EOF'
#cloud-config
hostname: web-prod-01
fqdn: web-prod-01.infra.local
manage_etc_hosts: true

users:
  - name: deploy
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... deploy@ci
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash

packages:
  - nginx
  - certbot

runcmd:
  - systemctl enable --now nginx
  - certbot --nginx -d web-prod-01.infra.local --agree-tos -m ops@infra.local
EOF

# Build the seed ISO
genisoimage -output /tmp/seed.iso -volid cidata -joliet -rock \
  /tmp/seed/meta-data /tmp/seed/user-data

# Attach to VM as a CDROM
virsh attach-disk web-prod-01 /tmp/seed.iso sda \
  --type cdrom --mode readonly --config

Proxmox cloud-init integration

# Import a kldload golden image into Proxmox as a template:
qm create 9000 --name kldload-rocky9-template --memory 4096 --cores 4 \
  --net0 virtio,bridge=vmbr0

# Import the qcow2 as the VM's disk
qm importdisk 9000 /var/lib/images/kldload-rocky9.qcow2 local-zfs

# Attach the imported disk
qm set 9000 --scsihw virtio-scsi-single --scsi0 local-zfs:vm-9000-disk-0

# Add cloud-init drive
qm set 9000 --ide2 local-zfs:cloudinit

# Set boot order and convert to template
qm set 9000 --boot order=scsi0 --serial0 socket --vga serial0
qm template 9000

# Clone the template with cloud-init customization
qm clone 9000 101 --name web-prod-01 --full
qm set 101 --ciuser deploy --sshkeys /root/.ssh/authorized_keys \
  --ipconfig0 ip=10.0.1.50/24,gw=10.0.1.1 --nameserver 10.0.1.1
qm start 101

The combination of kldload golden images + cloud-init gives you the same workflow as AWS AMIs, but on your own hardware. Build the image once (kldload). Store it as a template. Clone it. Customize the clone with cloud-init (hostname, SSH keys, networking, first-boot scripts). Boot. The VM is ready in 15 seconds. No Ansible run. No package downloads. No convergence. Just a clone of a known-good image, personalized by cloud-init.

This is the same pattern the public clouds use. Golden images are built ahead of time (often with Packer), stored as AMIs or platform images, launched as instances, and personalized with cloud-init. kldload gives you the same workflow on bare metal, KVM, or Proxmox. The only difference is you own the image pipeline end to end.

Packer integration

Packer does not replace kldload. kldload produces the source image that Packer builds on. kldload builds the base image (OS + ZFS + boot environments + kldload tools). Packer takes that base and layers application-specific packages, configuration, and hardening on top. The result is a Packer artifact that is a kldload golden image with your application baked in.

Packer HCL: layer Nginx on a kldload base

# kldload-nginx.pkr.hcl
# Start from a kldload golden image, add Nginx + hardening

packer {
  required_plugins {
    qemu = {
      version = ">= 1.1.0"
      source  = "github.com/hashicorp/qemu"
    }
  }
}

variable "base_image" {
  type    = string
  default = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
}

variable "output_dir" {
  type    = string
  default = "/var/lib/libvirt/images/packer-output"
}

source "qemu" "kldload-nginx" {
  # Use the kldload golden image as the base
  disk_image       = true
  iso_url          = var.base_image
  iso_checksum     = "none"
  output_directory = var.output_dir
  vm_name          = "kldload-rocky9-nginx.qcow2"
  format           = "qcow2"

  # VM configuration
  memory       = 4096
  cpus         = 4
  accelerator  = "kvm"
  machine_type = "q35"
  disk_size    = "50G"

  # Networking — cloud-init sets up SSH access
  ssh_username     = "admin"
  ssh_password     = "admin"
  ssh_timeout      = "5m"
  shutdown_command = "sudo shutdown -h now"

  # NoCloud seed ISO for cloud-init
  cd_files = ["cloud-init/meta-data", "cloud-init/user-data"]
  cd_label = "cidata"

  # QEMU flags for ZFS
  qemuargs = [
    ["-cpu", "host"],
    ["-serial", "mon:stdio"],
  ]
}

build {
  sources = ["source.qemu.kldload-nginx"]

  # Wait for cloud-init to finish
  provisioner "shell" {
    inline = ["cloud-init status --wait"]
  }

  # Install and configure Nginx
  provisioner "shell" {
    inline = [
      "sudo dnf install -y nginx certbot python3-certbot-nginx",
      "sudo systemctl enable nginx",
      "sudo firewall-cmd --permanent --add-service=http --add-service=https",
      "sudo firewall-cmd --reload",
    ]
  }

  # Copy Nginx configuration
  provisioner "file" {
    source      = "configs/nginx.conf"
    destination = "/tmp/nginx.conf"
  }

  provisioner "shell" {
    inline = [
      "sudo cp /tmp/nginx.conf /etc/nginx/nginx.conf",
      "sudo nginx -t",
    ]
  }

  # Security hardening
  provisioner "shell" {
    script = "scripts/harden.sh"
  }

  # Re-seal the image for cloning (clean cloud-init state)
  provisioner "shell" {
    inline = [
      "sudo cloud-init clean --logs",
      "sudo truncate -s 0 /etc/machine-id",
      "sudo rm -f /etc/ssh/ssh_host_*",
      "sudo rm -f /var/lib/NetworkManager/*.lease",
      "sudo rm -f /root/.bash_history",
      "sudo sync",
    ]
  }
}

# Build the Packer image:
packer init kldload-nginx.pkr.hcl
packer build kldload-nginx.pkr.hcl

# Result: /var/lib/libvirt/images/packer-output/kldload-rocky9-nginx.qcow2
# This is a kldload golden image with Nginx baked in.
# ZFS on root. Boot environments. Package holds. Snapshot timers.
# Plus Nginx, certbot, firewall rules, and security hardening.
# Ready to clone and deploy.

Packer for multiple distros

# Build the same application image on multiple kldload distro bases:
variable "distros" {
  type = map(string)
  default = {
    rocky9  = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
    debian13 = "/var/lib/libvirt/images/kldload-debian13-server.qcow2"
    ubuntu24 = "/var/lib/libvirt/images/kldload-ubuntu24-server.qcow2"
  }
}

# Use dynamic source blocks to build all three in parallel:
# packer build -parallel-builds=3 kldload-multi.pkr.hcl

The key insight: kldload handles the hard part. ZFS on root with boot environments, DKMS kernel modules, EFI boot entries, package holds, snapshot timers, WireGuard configuration — all of that is in the base image before Packer ever touches it. Packer just adds your application on top. The Packer build takes 2 minutes instead of 30 because the base is already built. And if Packer fails, the base image is still intact — you just fix your Packer config and rebuild the application layer.

Compare to using Packer alone: you would need a kickstart/preseed file to automate the OS install, wait for the full install to complete (15-30 minutes), then run your provisioners. With kldload as the base, the OS install is already done. Packer boots a pre-installed image and layers your changes. Faster, simpler, more reliable.

Terraform integration

Terraform deploys kldload golden images to infrastructure. The libvirt provider creates KVM VMs from kldload qcow2 images. The ZFS integration creates VMs from ZFS clones for instant provisioning. Terraform does not build the image — kldload or Packer does that. Terraform stamps it onto target hosts.

Terraform libvirt provider: deploy from qcow2

# main.tf — deploy kldload golden images with Terraform
terraform {
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "~> 0.8"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://root@hypervisor.infra.local/system"
}

# Upload the kldload golden image as a base volume
resource "libvirt_volume" "kldload_base" {
  name   = "kldload-rocky9-base.qcow2"
  pool   = "default"
  source = "/var/lib/libvirt/images/kldload-rocky9-server.qcow2"
  format = "qcow2"
}

# Create a cloud-init disk for the VM
resource "libvirt_cloudinit_disk" "web_init" {
  name = "web-prod-01-init.iso"
  pool = "default"

  user_data = <<-EOF
    #cloud-config
    hostname: web-prod-01
    fqdn: web-prod-01.infra.local
    manage_etc_hosts: true
    users:
      - name: deploy
        ssh_authorized_keys:
          - ${file("~/.ssh/id_ed25519.pub")}
        sudo: ALL=(ALL) NOPASSWD:ALL
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF

  network_config = <<-EOF
    version: 2
    ethernets:
      ens3:
        addresses: [10.0.1.50/24]
        gateway4: 10.0.1.1
        nameservers:
          addresses: [10.0.1.1]
  EOF
}

# Clone the base volume for this VM
resource "libvirt_volume" "web_disk" {
  name           = "web-prod-01.qcow2"
  pool           = "default"
  base_volume_id = libvirt_volume.kldload_base.id
  size           = 107374182400  # 100GB
}

# Create the VM
resource "libvirt_domain" "web_prod_01" {
  name   = "web-prod-01"
  memory = 4096
  vcpu   = 4

  cpu {
    mode = "host-passthrough"
  }

  machine = "q35"

  cloudinit = libvirt_cloudinit_disk.web_init.id

  disk {
    volume_id = libvirt_volume.web_disk.id
  }

  network_interface {
    bridge = "br0"
  }

  console {
    type        = "pty"
    target_type = "serial"
    target_port = "0"
  }

  graphics {
    type        = "vnc"
    listen_type = "address"
  }
}

Terraform with ZFS clones: instant provisioning

# deploy-zfs.tf — use ZFS clones instead of qcow2 copies
# This requires a custom null_resource + local-exec approach
# because the libvirt provider does not natively speak ZFS.

variable "vm_count" {
  default = 5
}

variable "vm_names" {
  type    = list(string)
  default = ["web-01", "web-02", "web-03", "web-04", "web-05"]
}

# Create ZFS clones from the golden image snapshot
resource "null_resource" "zfs_clone" {
  count = var.vm_count

  # Destroy-time provisioners may only reference self, count.index, or
  # each.key, so the VM name is stored in triggers and read back via self.
  triggers = {
    name = var.vm_names[count.index]
  }

  provisioner "local-exec" {
    command = <<-SCRIPT
      # Snapshot the golden image (idempotent: an existing snapshot is OK)
      zfs snapshot rpool/vms/golden-rocky9@template 2>/dev/null || true

      # Clone the snapshot: instant, zero-copy
      zfs clone rpool/vms/golden-rocky9@template \
        rpool/vms/${self.triggers.name}

      # Set refreservation=none for thin provisioning
      zfs set refreservation=none rpool/vms/${self.triggers.name}
    SCRIPT
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-SCRIPT
      virsh destroy ${self.triggers.name} 2>/dev/null || true
      virsh undefine ${self.triggers.name} --nvram 2>/dev/null || true
      zfs destroy rpool/vms/${self.triggers.name}
    SCRIPT
  }
}

# Define the VMs using virsh
resource "null_resource" "vm_define" {
  count      = var.vm_count
  depends_on = [null_resource.zfs_clone]

  provisioner "local-exec" {
    command = <<-SCRIPT
      virt-install \
        --name ${var.vm_names[count.index]} \
        --ram 4096 --vcpus 4 --cpu host \
        --machine q35 --os-variant rocky9 \
        --disk path=/dev/zvol/rpool/vms/${var.vm_names[count.index]},bus=virtio,cache=none \
        --network bridge=br0,model=virtio \
        --boot uefi --tpm backend.type=emulator,backend.version=2.0,model=tpm-crb \
        --serial pty --console pty \
        --graphics vnc --noautoconsole --import
    SCRIPT
  }
}

# 5 VMs from ZFS clones. Total creation time: ~5 seconds.
# Each VM shares all blocks with the golden image until it diverges.
# Disk usage for 5 x 100GB VMs: ~100GB (not 500GB).

The ZFS clone approach is dramatically faster and more space-efficient than copying qcow2 files. Making a full copy of a 20GB qcow2 for each of 5 VMs means 5 x 20GB = 100GB of disk I/O and minutes of waiting. (The libvirt provider's base_volume_id avoids the full copy by layering a qcow2 overlay on a backing file, but every guest read then goes through that overlay chain.) With ZFS clones, all 5 VMs are created in under 5 seconds with almost no additional disk space. The clones share every block with the parent until they write new data. This is not an optimization. It is a fundamental architectural advantage of using ZFS for VM storage.

Configuration management: Ansible, Salt, Puppet

Configuration management tools work on top of kldload base images. kldload handles the OS layer (ZFS, boot environments, kernel modules, base services). Config management handles the application layer (deploy your app, manage its config, rotate its secrets). This separation is deliberate and important.

Why image + config management beats config management alone

Config management alone means every machine rebuilds itself from scratch. Package mirrors must be reachable. GPG keys must be valid. Templates must render correctly on this specific OS version. With kldload as the base, the OS is already built and tested. Config management only handles 40 tasks (app layer) instead of 400 (full stack). Faster. More reliable. Fewer failure modes.

Image = compiled binary. Config management = runtime configuration. Both, not either/or.

What kldload handles (do NOT manage these with Ansible)

ZFS pools and datasets. Boot environment configuration. Kernel and ZFS DKMS modules. Package holds. Snapshot timers. EFI boot entries. Bootloader configuration. WireGuard base interface. eBPF tool installation. These are in the image. They are tested. They work. Do not let Ansible touch them.

If it is in the kldload image, it is not Ansible's job. Separation of concerns.

Ansible: application layer on kldload base

# inventory.yml
all:
  hosts:
    web-prod-01:
      ansible_host: 10.0.1.50
    web-prod-02:
      ansible_host: 10.0.1.51
    web-prod-03:
      ansible_host: 10.0.1.52
  vars:
    ansible_user: deploy
    ansible_become: true
    # kldload base images have SSH, sudo, and Python pre-installed
    # No bootstrap required. Ansible works immediately.

# playbooks/web-server.yml
---
- name: Configure web servers on kldload base
  hosts: all
  become: true
  tasks:

    # Application packages — NOT base OS packages (kldload handles those)
    - name: Install application packages
      ansible.builtin.dnf:
        name:
          - nginx
          - certbot
          - python3-certbot-nginx
          - redis
          - postgresql-server
        state: present

    # Application configuration
    - name: Deploy Nginx configuration
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
      notify: restart nginx

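    # PostgreSQL on Rocky ships without an initialized data directory,
    # so the service cannot start until initdb has run once.
    - name: Initialize PostgreSQL data directory
      ansible.builtin.command:
        cmd: postgresql-setup --initdb
        creates: /var/lib/pgsql/data/PG_VERSION

    # Application service
    - name: Enable and start services
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - nginx
        - redis
        - postgresql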

    # ZFS dataset for application data (uses kldload's ZFS pool)
    - name: Create application data dataset
      community.general.zfs:
        name: rpool/srv/webapp
        state: present
        extra_zfs_properties:
          compression: lz4
          recordsize: 128K
          mountpoint: /srv/webapp

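    # Application-level snapshot policy (supplements kldload's base snapshots)
    # A timer does nothing without a matching service unit, so create both.
    - name: Create application snapshot service
      ansible.builtin.copy:
        dest: /etc/systemd/system/webapp-snapshot.service
        content: |
          [Unit]
          Description=Snapshot webapp data
          [Service]
          Type=oneshot
          ExecStart=/bin/sh -c 'zfs snapshot rpool/srv/webapp@auto-$$(date +%%Y%%m%%dT%%H%%M%%S)'
      notify: reload systemd

    - name: Create application snapshot timer
      ansible.builtin.copy:
        dest: /etc/systemd/system/webapp-snapshot.timer
        content: |
          [Unit]
          Description=Snapshot webapp data every 5 minutes
          [Timer]
          OnCalendar=*:0/5
          [Install]
          WantedBy=timers.target
      notify: reload systemd

    - name: Enable application snapshot timer
      ansible.builtin.systemd:
        name: webapp-snapshot.timer
        state: started
        enabled: true
        daemon_reload: true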

  handlers:
    - name: restart nginx
      ansible.builtin.systemd:
        name: nginx
        state: restarted

    - name: reload systemd
      ansible.builtin.systemd:
        daemon_reload: true

Salt: state files on kldload base

# /srv/salt/web/init.sls
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - watch:
      - file: /etc/nginx/nginx.conf

/etc/nginx/nginx.conf:
  file.managed:
    - source: salt://web/files/nginx.conf
    - mode: 644

# Use kldload's ZFS pool for application data
rpool/srv/webapp:
  zfs.filesystem_present:
    - properties:
        compression: lz4
        recordsize: 128K
        mountpoint: /srv/webapp

Notice what these playbooks and state files do NOT contain: kernel configuration, ZFS pool creation, bootloader setup, package holds, snapshot timer configuration, boot environment management, WireGuard base setup, eBPF tool installation. All of that is in the kldload golden image. The config management handles application deployment and nothing else. This is the correct boundary.

The result: your Ansible runs take 2 minutes instead of 20. Your Salt highstate applies in 30 seconds instead of 10 minutes. Because the OS layer is already done. It was done when the image was built. It works the same on every machine. The only thing that changes machine to machine is the application layer, and that is all your config management needs to handle.
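
One way to keep that boundary honest is to lint playbooks and state files for anything the image already owns. The sketch below is only an illustration: lint_boundary and the pattern list are hypothetical, not part of kldload, and the patterns would need tuning for a real repository.

```shell
# Hypothetical patterns for things the kldload image owns.
# Extend or trim this list to match your own boundary.
KLDLOAD_OWNED_PATTERNS='zpool create|grub2?-(install|mkconfig)|dkms|efibootmgr'

# lint_boundary FILE...
# Prints offending lines as file:line:text.
# Returns 0 if clean, 1 if a pattern matched, 2 if grep itself failed.
lint_boundary() {
  grep -nHE "${KLDLOAD_OWNED_PATTERNS}" "$@"
  case $? in
    0) return 1 ;;   # grep found a match: boundary violated
    1) return 0 ;;   # no match: clean
    *) return 2 ;;   # unreadable file or bad pattern
  esac
}
```

Run it in CI before the playbooks are applied, for example lint_boundary playbooks/*.yml, and fail the job on a non-zero exit.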

CI/CD pipelines

Build kldload images in CI. Test them with KVM. Promote to production. The pipeline is: build ISO, install to VM, export golden image, test, promote. kldload's containerized build pipeline runs in any CI system with Docker or Podman.

GitHub Actions: build and test kldload images

# .github/workflows/build-golden-image.yml
name: Build Golden Image
on:
  push:
    branches: [main]
    paths:
      - 'build/**'
      - 'live-build/**'
      - 'profiles/**'

jobs:
  build-iso:
    runs-on: [self-hosted, kvm]  # Needs KVM-capable runner
    steps:
      - uses: actions/checkout@v4

      - name: Build kldload ISO
        run: |
          ./deploy.sh clean
          ./deploy.sh builder-image
          PROFILE=server ./deploy.sh build

      - name: Install to test VM
        run: |
          # Create a test disk
          qemu-img create -f qcow2 /tmp/test-disk.qcow2 50G

          # Boot the ISO with an answers file
          cat > /tmp/answers.env << 'EOF'
          KLDLOAD_DISTRO=rocky
          KLDLOAD_DISK=/dev/vda
          KLDLOAD_HOSTNAME=ci-test
          KLDLOAD_PROFILE=server
          KLDLOAD_USERNAME=ci
          KLDLOAD_PASSWORD=ci
          KLDLOAD_EXPORT_FORMAT=qcow2
          EOF

          # Create seed ISO
          mkdir -p /tmp/seed
          cp /tmp/answers.env /tmp/seed/answers.env
          genisoimage -o /tmp/seed.iso -V KLDLOAD-SEED -J -R /tmp/seed/

          # Run the install in QEMU (headless)
          timeout 1800 qemu-system-x86_64 \
            -machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
            -drive file=/tmp/test-disk.qcow2,format=qcow2,if=virtio \
            -cdrom live-build/output/kldload-free-*.iso \
            -drive file=/tmp/seed.iso,format=raw,if=virtio \
            -nographic -serial mon:stdio \
            -boot d

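      - name: Validate golden image
        run: |
          # Boot the installed image and verify
          timeout 300 qemu-system-x86_64 \
            -machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
            -drive file=/tmp/test-disk.qcow2,format=qcow2,if=virtio \
            -nographic -serial mon:stdio \
            -net nic -net user,hostfwd=tcp::2222-:22 &

          # The answers file sets a password, not a key, so a plain ssh call
          # would stop at a password prompt. sshpass must be on the runner.
          SSH="sshpass -p ci ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5 -p 2222 ci@localhost"

          # Wait for sshd instead of sleeping a fixed interval
          for attempt in $(seq 1 30); do
            ${SSH} true 2>/dev/null && break
            sleep 5
          done

          # SSH into the VM and validate
          ${SSH} 'zpool status && systemctl is-active kldload-snapshot.timer'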

      - name: Upload golden image
        if: github.ref == 'refs/heads/main'
        run: |
          scp /tmp/export/*.qcow2 images@images.infra.local:/var/lib/golden/

GitLab CI: multi-distro image pipeline

# .gitlab-ci.yml
stages:
  - build
  - test
  - promote

variables:
  PROFILE: server

build-iso:
  stage: build
  tags: [kvm, privileged]
  script:
    - ./deploy.sh builder-image
    - PROFILE=${PROFILE} ./deploy.sh build
  artifacts:
    paths:
      - live-build/output/*.iso
    expire_in: 7 days

.test-distro:
  stage: test
  tags: [kvm, privileged]
  script:
    - |
      # Create answers file for this distro
      cat > /tmp/answers.env << EOF
      KLDLOAD_DISTRO=${DISTRO}
      KLDLOAD_DISK=/dev/vda
      KLDLOAD_HOSTNAME=ci-${DISTRO}
      KLDLOAD_PROFILE=${PROFILE}
      KLDLOAD_USERNAME=ci
      KLDLOAD_PASSWORD=ci
      KLDLOAD_EXPORT_FORMAT=qcow2
      EOF

      # Install and export (see build script above)
      ./ci/install-and-export.sh

      # Validate
      ./ci/validate-image.sh /tmp/export/*.qcow2

      # GitLab only collects artifacts from inside the project directory
      mkdir -p export
      cp /tmp/export/*.qcow2 export/
  artifacts:
    paths:
      - export/*.qcow2

test-rocky:
  extends: .test-distro
  variables:
    DISTRO: rocky

test-debian:
  extends: .test-distro
  variables:
    DISTRO: debian

test-ubuntu:
  extends: .test-distro
  variables:
    DISTRO: ubuntu

promote:
  stage: promote
  only:
    - main
  script:
    - |
      # Artifacts from the test jobs are restored into export/
      for img in export/*.qcow2; do
        TIMESTAMP=$(date +%Y%m%d-%H%M%S)
        BASENAME=$(basename "$img" .qcow2)
        scp "$img" images@images.infra.local:/var/lib/golden/${BASENAME}-${TIMESTAMP}.qcow2
        # Update the "latest" symlink
        ssh images@images.infra.local \
          "ln -sf /var/lib/golden/${BASENAME}-${TIMESTAMP}.qcow2 \
                  /var/lib/golden/${BASENAME}-latest.qcow2"
      done

The CI pipeline builds the ISO, installs it to a VM, exports a golden image, validates it, and promotes it to the image server. Every merge to main produces tested, validated golden images for every supported distro. The images are timestamped and symlinked. Terraform always pulls "latest". Rollback is changing a symlink.

This is the image-baking pattern Netflix popularized with its AMI pipeline, and the pattern most large infrastructure teams follow. Build once, test once, stamp everywhere. kldload makes the "build once" part trivial because it handles ZFS on root, boot environments, and all the hard OS-level configuration that Packer alone leaves you to script yourself.
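
"Rollback is changing a symlink" can be sketched as a small function that runs on the image server. This is an assumption-laden illustration: rollback_latest is a hypothetical helper, it assumes the BASENAME-YYYYMMDD-HHMMSS.qcow2 naming used by the promote job, and it assumes "latest" currently points at the newest image.

```shell
# rollback_latest DIR BASENAME
# Repoint DIR/BASENAME-latest.qcow2 at the second-newest timestamped image.
# Timestamped names sort chronologically, so a plain sort is enough.
rollback_latest() {
  local dir="$1" base="$2"
  local link="${dir}/${base}-latest.qcow2"
  local images count previous

  # Regular files only: the "latest" symlink is skipped by -type f
  # and does not match the digit pattern either.
  images=$(find "${dir}" -maxdepth 1 -type f -name "${base}-[0-9]*.qcow2" | sort)
  count=$(printf '%s\n' "${images}" | grep -c .)
  if [ "${count}" -lt 2 ]; then
    echo "rollback_latest: need at least two timestamped images in ${dir}" >&2
    return 1
  fi

  # Second-to-last entry in the sorted list is the previous build.
  previous=$(printf '%s\n' "${images}" | tail -n 2 | head -n 1)
  ln -sfn "${previous}" "${link}"
  echo "${link} -> ${previous}"
}
```

For example, rollback_latest /var/lib/golden kldload-rocky-server repoints the symlink at the previous build and leaves both image files in place.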

Fleet management

Managing 10 machines is SSH and shell scripts. Managing 100 machines is Ansible and cron. Managing 1,000 machines is ZFS replication, WireGuard backplane, and automated snapshot policies. kldload gives you the primitives for all three scales.

ZFS replication for fleet updates

The fastest way to update a fleet of kldload hosts is ZFS send/receive. Build the updated golden image on one machine. Snapshot it. Send the incremental delta to every other machine. The delta contains only the changed blocks — typically a few hundred megabytes even for a major OS update. Each host receives the snapshot, creates a new boot environment from it, and switches to it on next reboot.

# Build machine: create an updated golden image
# (After running kupgrade or installing new packages)
zfs snapshot rpool/ROOT/rocky@v2.0-2026-04-04

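# Send the incremental delta to every fleet host.
# Each host must already hold rpool/ROOT/rocky@v1.0 from the same lineage.
# An incremental stream cannot create a dataset on its own, so -o origin
# receives the delta as a new clone (rocky-v2) of that snapshot.
for host in web-{01..50}.infra.local; do
  zfs send -i rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky@v2.0-2026-04-04 \
    | ssh root@${host} "zfs receive -o origin=rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky-v2"
done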

# On each host: set the new boot environment as default
for host in web-{01..50}.infra.local; do
  ssh root@${host} "kbe set-default rocky-v2"
done

# Rolling reboot (one at a time, verify before proceeding)
for host in web-{01..50}.infra.local; do
  echo "Rebooting ${host}..."
  ssh root@${host} "reboot"
  sleep 30
  until ssh root@${host} "kbe current" 2>/dev/null | grep -q rocky-v2; do
    sleep 5
  done
  echo "${host} is on rocky-v2"
done
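
The until loop in the rolling reboot waits forever if a host never comes back. A bounded wait is safer. The sketch below is one way to do it; wait_until and host_on_be are hypothetical helper names, and kbe current is used exactly as in the loop above.

```shell
# wait_until TIMEOUT_SECS INTERVAL_SECS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or TIMEOUT_SECS have elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      return 1
    fi
    sleep "${interval}"
  done
}

# host_on_be HOST BOOT_ENV
# Succeeds once HOST answers over SSH and reports BOOT_ENV as current.
host_on_be() {
  ssh -n -o ConnectTimeout=5 -o BatchMode=yes "root@$1" "kbe current" 2>/dev/null \
    | grep -q "$2"
}
```

In the reboot loop, wait_until 300 5 host_on_be "${host}" rocky-v2 would replace the until block, and a non-zero return is the signal to stop the rollout before touching the next host.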

WireGuard backplane for management

# Every kldload host gets a WireGuard interface for management traffic.
# This is separate from production networking. Management traffic
# is encrypted, authenticated, and travels over a dedicated
# hub-and-spoke overlay (every host peers with the management hub).

# On the management hub (your jump box / bastion):
cat > /etc/wireguard/wg-mgmt.conf << 'EOF'
[Interface]
PrivateKey = <hub-private-key>
Address = 10.100.0.1/24
ListenPort = 51820

# web-01
[Peer]
PublicKey = <web-01-public-key>
AllowedIPs = 10.100.0.10/32

# web-02
[Peer]
PublicKey = <web-02-public-key>
AllowedIPs = 10.100.0.11/32

# ... repeat for every fleet host
EOF

# On each fleet host (via answers file or Ansible):
cat > /etc/wireguard/wg-mgmt.conf << 'EOF'
[Interface]
PrivateKey = <web-01-private-key>
Address = 10.100.0.10/24

[Peer]
PublicKey = <hub-public-key>
Endpoint = hub.infra.local:51820
AllowedIPs = 10.100.0.0/24
PersistentKeepalive = 25
EOF

systemctl enable --now wg-quick@wg-mgmt

# Now you can SSH to any fleet host over WireGuard:
ssh root@10.100.0.10   # web-01 via encrypted backplane
ssh root@10.100.0.11   # web-02 via encrypted backplane

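# ZFS replication also runs over WireGuard.
# The host must already hold rocky@v1.0; -o origin receives the delta as a new clone.
zfs send -i rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky@v2.0 \
  | ssh root@10.100.0.10 "zfs receive -o origin=rpool/ROOT/rocky@v1.0 rpool/ROOT/rocky-v2"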
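
The "repeat for every fleet host" step on the hub is easy to generate from a list. A minimal sketch, assuming a hypothetical peers file with one "name public-key address" triple per line; wg_peer_block and wg_hub_peers are illustrative names, not kldload commands.

```shell
# wg_peer_block NAME PUBLIC_KEY ADDRESS
# Print one [Peer] stanza in the same layout as the hub config above.
wg_peer_block() {
  local name="$1" pubkey="$2" addr="$3"
  printf '# %s\n[Peer]\nPublicKey = %s\nAllowedIPs = %s/32\n\n' \
    "${name}" "${pubkey}" "${addr}"
}

# wg_hub_peers FILE
# FILE holds "name public-key address" per line.
# Blank lines and lines starting with # are skipped.
wg_hub_peers() {
  local name pubkey addr
  while read -r name pubkey addr; do
    case "${name}" in
      ''|'#'*) continue ;;
    esac
    wg_peer_block "${name}" "${pubkey}" "${addr}"
  done < "$1"
}
```

Appending the output of wg_hub_peers to the [Interface] section gives a complete hub config, and adding a host becomes a one-line change to the peers file.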

Sanoid/Syncoid for automated backup

# /etc/sanoid/sanoid.conf — snapshot retention policy
# kldload installs sanoid by default on desktop and server profiles

[rpool/ROOT]
  use_template = production
  recursive = yes

[rpool/srv]
  use_template = production
  recursive = yes

[rpool/home]
  use_template = production
  recursive = yes

[template_production]
  frequently = 4          # 4 x 15-minute snapshots (1 hour)
  hourly = 24             # 24 hourly snapshots (1 day)
  daily = 30              # 30 daily snapshots (1 month)
  monthly = 12            # 12 monthly snapshots (1 year)
  yearly = 2              # 2 yearly snapshots
  autosnap = yes
  autoprune = yes

# Syncoid: replicate to backup server automatically
# /etc/cron.d/kldload-syncoid

# Each cron entry must fit on one line: cron does not support
# backslash line continuations.

# Replicate boot environments every hour
0 * * * * root syncoid --recursive --no-privilege-elevation rpool/ROOT root@backup.infra.local:backup/web-01/ROOT

# Replicate service data every 15 minutes
*/15 * * * * root syncoid --recursive --no-privilege-elevation rpool/srv root@backup.infra.local:backup/web-01/srv

# Replicate home directories every hour
0 * * * * root syncoid --recursive --no-privilege-elevation rpool/home root@backup.infra.local:backup/web-01/home

# Syncoid uses incremental sends — only changed blocks transfer.
# First run: full send (minutes to hours depending on data).
# Subsequent runs: seconds to minutes (just the delta).

The fleet management story with kldload is: build one golden image. Stamp it onto every machine (USB, PXE, or ZFS clone). Manage the backplane with WireGuard. Push updates with ZFS send/receive. Back up with Sanoid/Syncoid. Every machine has boot environments, so a bad update is one kbe rollback away from being fixed. Every machine has ZFS snapshots, so a bad config change is one zfs rollback away from being fixed. Every machine has WireGuard, so management traffic is encrypted even on untrusted networks.

This is the same architecture that large enterprises use, except they pay Broadcom or Red Hat six figures a year for the privilege. kldload gives you the same thing with standard Linux tools. The technology is free. The knowledge is on this page.

Scripting patterns

Practical bash patterns for automating kldload operations. These are the building blocks for your own automation scripts. Copy them. Modify them. Chain them together.

Bulk VM creation from a golden image

#!/bin/bash
# create-fleet.sh — create N VMs from a kldload golden image
set -euo pipefail

GOLDEN="rpool/vms/golden-rocky9"
POOL="rpool/vms"
BRIDGE="br0"
RAM=4096
VCPUS=4

# Snapshot the golden image (if not already done)
zfs snapshot "${GOLDEN}@template" 2>/dev/null || true

for i in $(seq 1 "$1"); do
  NAME="web-$(printf '%02d' "$i")"
  echo "Creating ${NAME}..."

  # ZFS clone (instant, zero-copy)
  zfs clone "${GOLDEN}@template" "${POOL}/${NAME}"
  zfs set refreservation=none "${POOL}/${NAME}"

  # Create and start the VM
  virt-install \
    --name "${NAME}" \
    --ram "${RAM}" --vcpus "${VCPUS}" --cpu host \
    --machine q35 --os-variant rocky9 \
    --disk "path=/dev/zvol/${POOL}/${NAME},bus=virtio,cache=none" \
    --network "bridge=${BRIDGE},model=virtio" \
    --boot uefi \
    --tpm backend.type=emulator,backend.version=2.0,model=tpm-crb \
    --serial pty --console pty \
    --graphics vnc --noautoconsole --import

  echo "${NAME} created and started"
done

echo "Done. Created $1 VMs in $(printf '%d' "$SECONDS") seconds."
# Typical output: "Done. Created 20 VMs in 12 seconds."

Snapshot all VMs before maintenance

#!/bin/bash
# snap-all.sh — snapshot every running VM before maintenance
set -euo pipefail

TIMESTAMP=$(date +%Y%m%dT%H%M%S)
TAG="${1:-pre-maintenance}"

for vm in $(virsh list --name); do
  # Find every zvol backing this VM (a VM may have more than one disk)
  for ZVOL in $(virsh domblklist "${vm}" | awk '/zvol/ {print $2}' | sed 's|/dev/zvol/||'); do
    SNAPNAME="${ZVOL}@${TAG}-${TIMESTAMP}"
    echo "Snapshotting ${SNAPNAME}..."
    zfs snapshot "${SNAPNAME}"
  done
done

echo "All VMs snapshotted with tag: ${TAG}-${TIMESTAMP}"
echo "To rollback any VM: zfs rollback <zvol>@${TAG}-${TIMESTAMP}"

Fleet health check

#!/bin/bash
# fleet-health.sh — check health of all kldload hosts
set -euo pipefail

HOSTS_FILE="${1:-/etc/kldload/fleet-hosts.txt}"
# File format: one hostname or IP per line

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'

printf "%-25s %-10s %-15s %-10s %-10s %-15s\n" \
  "HOST" "SSH" "ZFS" "BOOT-ENV" "SNAPS" "LAST-SCRUB"

# Read the host list on fd 3. If it came in on stdin, the ssh calls
# inside the loop would swallow the rest of the file after the first host.
while IFS= read -r host <&3; do
  [[ -z "${host}" || "${host}" =~ ^# ]] && continue

  # SSH check
  if ssh -o ConnectTimeout=5 -o BatchMode=yes "root@${host}" true 2>/dev/null; then
    SSH="${GREEN}OK${NC}"
  else
    SSH="${RED}FAIL${NC}"
    printf "%-25s %-10b %-15s %-10s %-10s %-15s\n" "${host}" "${SSH}" "-" "-" "-" "-"
    continue
  fi

  # ZFS pool health
  POOL_HEALTH=$(ssh "root@${host}" "zpool status -x rpool 2>/dev/null" || echo "DEGRADED")
  if echo "${POOL_HEALTH}" | grep -q "healthy"; then
    ZFS="${GREEN}ONLINE${NC}"
  else
    ZFS="${RED}DEGRADED${NC}"
  fi

  # Current boot environment
  BOOT_ENV=$(ssh "root@${host}" "kbe current 2>/dev/null" || echo "unknown")

  # Snapshot count (-r counts snapshots of every dataset in the pool)
  SNAP_COUNT=$(ssh "root@${host}" "zfs list -r -t snapshot -H -o name rpool 2>/dev/null | wc -l" || echo "?")

  # Last scrub
  LAST_SCRUB=$(ssh "root@${host}" "zpool status rpool 2>/dev/null | grep 'scan:' | awk '{print \$NF}'" || echo "never")

  printf "%-25s %-10b %-15b %-10s %-10s %-15s\n" \
    "${host}" "${SSH}" "${ZFS}" "${BOOT_ENV}" "${SNAP_COUNT}" "${LAST_SCRUB}"

done 3< "${HOSTS_FILE}"
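
The loop above visits hosts one at a time, so a slow or unreachable host delays everything after it. For a larger fleet the checks can be fanned out as background jobs. This is a sketch under assumptions: check_host and fleet_parallel are hypothetical names, check_host does only the SSH reachability check, and one job is started per host with no concurrency cap.

```shell
# check_host HOST: print one summary line for HOST.
# Only the reachability check is shown; the ZFS, boot environment and
# scrub checks from fleet-health.sh would slot in the same way.
check_host() {
  local host="$1"
  if ssh -n -o ConnectTimeout=5 -o BatchMode=yes "root@${host}" true 2>/dev/null; then
    printf '%-25s %s\n' "${host}" "OK"
  else
    printf '%-25s %s\n' "${host}" "FAIL"
  fi
}

# fleet_parallel HOSTS_FILE
# Run check_host for every host concurrently, then print the results
# sorted by host name. Each job writes to its own file so lines never mix.
fleet_parallel() {
  local hosts_file="$1" tmp host
  tmp=$(mktemp -d)
  while IFS= read -r host; do
    case "${host}" in
      ''|'#'*) continue ;;
    esac
    check_host "${host}" > "${tmp}/${host}" &
  done < "${hosts_file}"
  wait   # block until every background check has finished

  # The glob expands in sorted order; skip cat if nothing was written.
  set -- "${tmp}"/*
  [ -e "$1" ] && cat "$@"
  rm -rf "${tmp}"
}
```

Total runtime is then roughly the slowest single host rather than the sum of all hosts.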

Cron jobs for maintenance

# /etc/cron.d/kldload-maintenance
# Standard cron jobs for a kldload production host

# ZFS scrub — weekly on Sunday at 2 AM
0 2 * * 0 root zpool scrub rpool

# Prune old snapshots (sanoid handles this, but belt-and-suspenders)
0 3 * * * root sanoid --cron

# ZFS ARC stats to Prometheus textfile collector
*/5 * * * * root /usr/local/bin/kldload-arc-stats > /var/lib/node_exporter/textfile/zfs-arc.prom

# Boot environment cleanup — remove boot envs older than 30 days
0 4 * * 0 root kbe prune --older-than 30d

# Check for ZFS pool errors and alert
# (one line per entry: cron does not support backslash continuations)
*/10 * * * * root zpool status -x rpool | grep -qv "healthy" && curl -s -X POST "https://hooks.slack.com/services/T.../B.../..." -d '{"text":"ZFS pool rpool is degraded on '$(hostname)'"}'

# Log pool error counters monthly (paranoid mode; the scrub above does the actual verification)
0 5 1 * * root zpool status -v rpool | grep -E "CKSUM|errors" >> /var/log/kldload/zfs-integrity.log

Automated golden image refresh

#!/bin/bash
# refresh-golden.sh — rebuild golden images monthly
# Run this on your build server via cron or CI trigger
set -euo pipefail

BUILD_DIR="/opt/kldload-free"
OUTPUT_DIR="/var/lib/golden"
TIMESTAMP=$(date +%Y%m%d)

cd "${BUILD_DIR}"
git pull origin main

# Rebuild the ISO with latest packages
./deploy.sh builder-image
PROFILE=server ./deploy.sh build

ISO=$(ls -t live-build/output/*.iso | head -1)

# For each distro, install + export a golden image
for DISTRO in rocky debian ubuntu centos; do
  echo "Building golden image for ${DISTRO}..."

  DISK="/tmp/golden-${DISTRO}.qcow2"
  qemu-img create -f qcow2 "${DISK}" 50G

  # Create answers file
  cat > /tmp/answers-${DISTRO}.env << EOF
KLDLOAD_DISTRO=${DISTRO}
KLDLOAD_DISK=/dev/vda
KLDLOAD_HOSTNAME=golden-${DISTRO}
KLDLOAD_PROFILE=server
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=admin
KLDLOAD_EXPORT_FORMAT=qcow2
EOF

  # Create seed ISO
  mkdir -p /tmp/seed-${DISTRO}
  cp /tmp/answers-${DISTRO}.env /tmp/seed-${DISTRO}/answers.env
  genisoimage -o /tmp/seed-${DISTRO}.iso -V KLDLOAD-SEED -J -R /tmp/seed-${DISTRO}/

  # Install (headless QEMU)
  timeout 1800 qemu-system-x86_64 \
    -machine q35,accel=kvm -cpu host -m 4096 -smp 4 \
    -drive file="${DISK}",format=qcow2,if=virtio \
    -cdrom "${ISO}" \
    -drive file=/tmp/seed-${DISTRO}.iso,format=raw,if=virtio \
    -nographic -serial mon:stdio -boot d

  # Copy to output
  OUTNAME="kldload-${DISTRO}-server-${TIMESTAMP}.qcow2"
  cp "${DISK}" "${OUTPUT_DIR}/${OUTNAME}"
  ln -sf "${OUTPUT_DIR}/${OUTNAME}" "${OUTPUT_DIR}/kldload-${DISTRO}-server-latest.qcow2"

  echo "${DISTRO}: ${OUTPUT_DIR}/${OUTNAME}"
  rm -f "${DISK}" /tmp/seed-${DISTRO}.iso /tmp/answers-${DISTRO}.env
  rm -rf /tmp/seed-${DISTRO}
done

echo "All golden images refreshed: ${TIMESTAMP}"
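
The same answers file appears in the CI jobs and in the refresh script, so it is a natural candidate for one shared function. A minimal sketch: the keys are the ones used on this page, while write_answers and make_seed_dir are hypothetical helper names.

```shell
# write_answers DISTRO HOSTNAME USERNAME PASSWORD [PROFILE]
# Print a kldload answers file to stdout. PROFILE defaults to "server".
write_answers() {
  local distro="$1" hostname="$2" username="$3" password="$4"
  local profile="${5:-server}"
  cat <<EOF
KLDLOAD_DISTRO=${distro}
KLDLOAD_DISK=/dev/vda
KLDLOAD_HOSTNAME=${hostname}
KLDLOAD_PROFILE=${profile}
KLDLOAD_USERNAME=${username}
KLDLOAD_PASSWORD=${password}
KLDLOAD_EXPORT_FORMAT=qcow2
EOF
}

# make_seed_dir DISTRO
# Write answers.env into a fresh temporary directory and print its path,
# ready to be packed into a seed ISO.
make_seed_dir() {
  local distro="$1" dir
  dir=$(mktemp -d)
  write_answers "${distro}" "golden-${distro}" admin admin > "${dir}/answers.env"
  echo "${dir}"
}
```

The seed ISO step then becomes genisoimage -o seed.iso -V KLDLOAD-SEED -J -R "$(make_seed_dir rocky)", and every pipeline builds its answers file the same way.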

virsh + ZFS: common operations

# ── Common virsh + ZFS operations ──────────────────────

# List all VMs with their ZFS storage usage
for vm in $(virsh list --all --name); do
  ZVOL=$(virsh domblklist "${vm}" 2>/dev/null | awk '/zvol/ {print $2}' | sed 's|/dev/zvol/||')
  if [[ -n "${ZVOL}" ]]; then
    USED=$(zfs get -H -o value used "${ZVOL}" 2>/dev/null || echo "?")
    REFER=$(zfs get -H -o value referenced "${ZVOL}" 2>/dev/null || echo "?")
    RATIO=$(zfs get -H -o value compressratio "${ZVOL}" 2>/dev/null || echo "?")
    STATE=$(virsh domstate "${vm}" 2>/dev/null || echo "?")
    printf "%-20s %-10s %-10s %-10s %-8s\n" "${vm}" "${STATE}" "${USED}" "${REFER}" "${RATIO}"
  fi
done

# Clone a running VM (snapshot + clone + define)
VM="web-prod-01"
CLONE="web-staging-01"
ZVOL="rpool/vms/${VM}"
TAG=$(date +%Y%m%dT%H%M%S)

zfs snapshot "${ZVOL}@clone-${TAG}"
zfs clone "${ZVOL}@clone-${TAG}" "rpool/vms/${CLONE}"
zfs set refreservation=none "rpool/vms/${CLONE}"

# Dump and modify the VM XML
virsh dumpxml "${VM}" > /tmp/${CLONE}.xml
sed -i "s/${VM}/${CLONE}/g" /tmp/${CLONE}.xml
# Generate new UUID and MAC
NEW_UUID=$(uuidgen)
NEW_MAC=$(printf '52:54:00:%02x:%02x:%02x' $((RANDOM%256)) $((RANDOM%256)) $((RANDOM%256)))
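sed -i "s|<uuid>.*</uuid>|<uuid>${NEW_UUID}</uuid>|" /tmp/${CLONE}.xml
sed -i "s|<mac address='[^']*'/>|<mac address='${NEW_MAC}'/>|" /tmp/${CLONE}.xml

# Define the clone from the modified XML
virsh define /tmp/${CLONE}.xml
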
These are not theoretical examples. They are the actual commands I run in production. The bulk VM creation script creates 20 VMs in 12 seconds. The snapshot script snapshots every running VM before maintenance in under 2 seconds. The fleet health check walks the host list over SSH and shows you the entire fleet in one table. The golden image refresh rebuilds every distro image monthly and timestamps them for rollback.

None of this requires a product. No Proxmox. No VMware. No Ansible Tower. No Jenkins. Standard bash scripts, standard Linux commands, standard ZFS operations. The only thing kldload adds is the base image and the tools that make the common operations one-liners instead of ten-liners. Everything on this page works on any kldload-installed machine with the server or desktop profile.


      
      
      

Disk labeling

Every disk has a passport

Structured disk labels encode physical location, ZFS pool membership, warranty, and RMA information. When a disk fails at 3 AM, the replacement procedure is on the label. No spreadsheet. No CMDB lookup. No guessing which disk in which slot.

PHYSICAL LOCATION
  Region: CA-WEST-1  Datacenter: YVR01  Rack: R12-08  Slot: SLOT07

ZFS INFORMATION
  Pool: prd-caw1-db-gold-nvme  VDEV: slot07  Layout: draid2:10d:2c:128s

LIFECYCLE
  Asset ID: UB-DSK-CAW1-88322  Warranty: 2028-02-12
  RMA: https://cdw.ca/rma/S6ZUNX0R123456A

      
This is not cosmetic. On a 60-disk JBOD, pulling the wrong disk from an already-degraded vdev can push it past its parity and take the pool offline. The label tells you: this is slot 7, it is part of this pool, it is in this vdev, and here is the CDW RMA link for the exact model. The person replacing it at 3 AM does not need to know ZFS. They need to read the label, pull the disk, click the link, and plug in the replacement. ZFS resilvers automatically. The label makes the human part foolproof.
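
A label like this is easy to generate from inventory data. The sketch below only renders the text in the layout shown above; render_disk_label is a hypothetical helper, and how the label is printed or attached to the disk is left open.

```shell
# render_disk_label REGION DATACENTER RACK SLOT POOL VDEV LAYOUT ASSET_ID WARRANTY RMA_URL
# Print a three-section disk label to stdout.
render_disk_label() {
  printf 'PHYSICAL LOCATION\n  Region: %s  Datacenter: %s  Rack: %s  Slot: %s\n\n' \
    "$1" "$2" "$3" "$4"
  printf 'ZFS INFORMATION\n  Pool: %s  VDEV: %s  Layout: %s\n\n' \
    "$5" "$6" "$7"
  printf 'LIFECYCLE\n  Asset ID: %s  Warranty: %s\n  RMA: %s\n' \
    "$8" "$9" "${10}"
}
```

Feeding it one row per disk from an asset export produces a label for every slot in the chassis.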


      
      
      

Putting it all together

Here is the complete automation pipeline for a production kldload deployment, from nothing to a running fleet:

1. Build
PROFILE=server ./deploy.sh build produces the kldload ISO. Containerized. Reproducible. Takes 10-20 minutes.

2. Golden image
Boot the ISO with an answers file. Install completes in 5-15 minutes. kexport produces a qcow2. The image is sealed for cloning: machine-id cleared, SSH keys removed, cloud-init enabled.

3. Packer (optional)
Layer application packages onto the golden image. Packer build takes 2-5 minutes. The result is a kldload base + your application, ready for deployment.

4. Deploy
Terraform clones the golden image via libvirt or ZFS clone. Cloud-init personalizes each instance. VM boots in 15 seconds. Ready for traffic.

5. Configure
Ansible pushes application-layer config (app deploy, secrets rotation, feature flags). 40 tasks, 2 minutes. Not 400 tasks, 20 minutes.

6. Observe
eBPF tools, Prometheus node_exporter, and Grafana dashboards are in the base image. Observability is on from boot. No additional setup.

7. Backup
Sanoid snapshots locally. Syncoid replicates to backup hosts. ZFS send/receive over WireGuard. Encrypted. Incremental. Automatic.

8. Update
Build a new golden image. zfs send the delta to every host. Create a new boot environment. Rolling reboot. Rollback is kbe rollback.
        
      

      
The entire pipeline is standard Linux commands. podman builds the ISO. qemu installs it. zfs clone deploys it. cloud-init personalizes it. ansible configures the app layer. zfs send backs it up. zfs send updates it. kbe rollback fixes it if an update goes wrong. Every tool in the chain is free, open source, and runs on any Linux machine.

There is no vendor. There is no license. There is no subscription. There is no phone-home. There is no nag screen. There is no "enterprise edition" with the features you actually need behind a paywall. The entire stack is yours. The knowledge is on this page. Build your infrastructure. Own it completely.



      