
Custom Postinstallers — from zero to 100 microVMs in seconds.

This is the advanced guide. We're going to build a complete deployment pipeline that starts with a blank disk and ends with a running Kubernetes cluster — or 100 Firecracker microVMs — or whatever you want. The secret is postinstall.sh: a hook that runs after kldload finishes installing the base system. Everything after that is yours.

This isn't theory. This is a real production pattern used to deploy 15-node Kubernetes clusters with etcd, load balancers, control planes, workers, monitoring, and GitOps — all from sealed ISOs that work without internet. You can build the same thing.

What is a postinstaller?

The hook point

kldload installs the base system: kernel, ZFS, bootloader, tools. When it's done, it looks for /root/darksite/postinstall.sh on the target system. If it exists, it runs it. That's your entry point. Everything you put in that script runs with root privileges on a freshly-installed system.

#!/bin/bash
# postinstall.sh — runs after kldload finishes the base install
# You have: root access, ZFS on root, network (if configured), all base packages
# You do: whatever you want

echo "My custom postinstaller is running!"
dnf install -y nginx
systemctl enable --now nginx
echo "<h1>Built by kldload</h1>" > /usr/share/nginx/html/index.html

# Signal completion
touch /root/.postinstall_done
poweroff

postinstall.sh is the recipe card. The base system is the kitchen. You decide what gets cooked.

Understanding chroots

What a chroot actually is

A chroot ("change root") makes a directory look like the root filesystem to a process. When kldload installs to /target, it creates a complete Linux system there. Then it does chroot /target to run commands inside that system as if it were booted.

This is how kldload installs packages, builds DKMS modules, and rebuilds the initramfs without ever booting the target system. It's the same technique every Linux installer uses — from Debian's debootstrap to Arch's pacstrap to Red Hat's anaconda.

# The installer creates a complete system at /target
debootstrap trixie /target https://deb.debian.org/debian

# Mount system filesystems so chroot commands work
mount --bind /dev  /target/dev
mount --bind /proc /target/proc
mount --bind /sys  /target/sys

# Now run commands "inside" the target system
chroot /target apt-get install -y nginx
chroot /target systemctl enable nginx

# When done, unmount and the target is a complete, bootable system
umount /target/sys /target/proc /target/dev

A chroot is a portal into the target system. You reach through it to install packages, configure services, and set up users — all before the system ever boots.
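One refinement worth adding when you script this yourself: wrap the mount/chroot/umount dance in a helper, so a failing command can never leave /target/dev bind-mounted. A minimal sketch (the helper name is my invention):

```shell
# in_target: run one command inside /target with system filesystems mounted.
# Sketch only; assumes root and an installed system at /target.
in_target() {
    mount --bind /dev  /target/dev
    mount --bind /proc /target/proc
    mount --bind /sys  /target/sys

    chroot /target "$@"
    local rc=$?

    # Unmount even when the chrooted command failed, then report its status
    umount /target/sys /target/proc /target/dev
    return $rc
}

# Usage (as root, after debootstrap):
#   in_target apt-get install -y nginx
#   in_target systemctl enable nginx
```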

The darksite pattern: baking everything in

What is a darksite?

A "darksite" is an air-gapped deployment — no internet, no upstream repos, no cloud APIs. Everything the system needs must be baked into the ISO or carried on the USB drive. This includes:

  • APT/DNF packages — a complete local repository snapshot
  • Container images — OCI tarballs loaded into containerd/Docker on first boot
  • Ansible playbooks — the entire orchestration tree
  • Helm charts — bundled for offline Kubernetes deployments
  • TLS certificates — pre-generated PKI for etcd, API server, etc.
  • WireGuard keys — hub keypairs for mesh networking
  • Configuration files — per-node or per-role configs baked in

A darksite ISO is a shipping container. Everything needed for the destination is packed inside. Nothing is downloaded at deploy time. Nothing phones home.
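The package bullet deserves a concrete shape. A hedged sketch, assuming a DNF-based build host: `dnf download` and `createrepo_c` snapshot the packages at build time, and postinstall.sh points the target at the baked-in copy with a `file://` repo:

```shell
# Build machine (has network): snapshot every package the cluster needs
#   dnf download --resolve --destdir payload/darksite/repo \
#       nginx wireguard-tools salt-minion kubelet kubeadm kubectl
#   createrepo_c payload/darksite/repo

# Target side: postinstall.sh drops a repo definition pointing at the payload
# (written to ./darksite.repo here; the real script uses /etc/yum.repos.d/)
REPO_DIR=/root/darksite/repo
cat > darksite.repo <<EOF
[darksite]
name=Darksite payload repo
baseurl=file://${REPO_DIR}
enabled=1
gpgcheck=0
EOF
```

Pair it with `dnf --disablerepo='*' --enablerepo=darksite install ...` and nothing can reach upstream even by accident.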

Payload directory structure

payload/darksite/
├── postinstall.sh              # Entry point — runs on first boot
├── apply.py                    # Cluster convergence orchestrator
├── cluster-seed/
│   └── peers.json              # Node inventory (WG IPs, roles)
├── ansible/
│   ├── ansible.cfg             # Ansible configuration
│   ├── site.yml                # Main playbook (imports all roles)
│   ├── group_vars/             # Per-group configuration
│   ├── host_vars/              # Per-host configuration
│   ├── roles/                  # Role implementations
│   │   ├── etcd_cluster/       # etcd setup + PKI
│   │   ├── k8s_pkgs/           # kubelet, kubeadm, kubectl
│   │   ├── kubeadm_init/       # Control plane initialization
│   │   ├── kubeadm_join_cp/    # Join additional control planes
│   │   ├── kubeadm_join_worker/ # Join workers
│   │   ├── lb_haproxy/         # Load balancer config
│   │   ├── prometheus_config/  # Monitoring
│   │   ├── helm/               # Helm 3 bootstrap
│   │   └── ingress_nginx/      # Ingress controller
│   └── artifacts/
│       ├── etcd-pki/           # Pre-generated etcd certificates
│       ├── join_cp.sh          # Control plane join script
│       └── join_worker.sh      # Worker join script
├── helm/
│   └── bootstrap.sh            # Helm chart installation
└── systemd/
    ├── darksite-apply.service   # Runs apply.py on first boot
    └── darksite-wg-reflector.*  # WireGuard peer sync

This entire tree gets embedded in the ISO. On first boot, postinstall.sh unpacks it and the system bootstraps itself from the payload. No internet. No external dependencies.

Example: Building a Kubernetes cluster from postinstall.sh

Here's the real-world pattern. One master node, multiple workers. Each gets a customized ISO with a role-specific postinstaller. The master opens a WireGuard enrollment window, workers connect and register, then Ansible converges the cluster.
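The seed inventory drives everything role-specific. A minimal sketch of what `cluster-seed/peers.json` might contain and how a postinstaller pulls out its own WireGuard address (the schema shown is an assumption; only the file's existence and purpose come from the payload tree above):

```shell
# Hypothetical peers.json; the real schema is whatever your ISO builder emits
cat > peers.json <<'EOF'
{ "nodes": [
    { "name": "master",    "role": "control", "wg_ip": "10.78.0.1"  },
    { "name": "worker-01", "role": "worker",  "wg_ip": "10.78.0.11" }
] }
EOF

# Pull this node's WireGuard IP out of the inventory (python3 ships in the payload)
NODE="worker-01"
MY_WG1_IP=$(python3 -c "
import json, sys
nodes = json.load(open('peers.json'))['nodes']
print(next(n['wg_ip'] for n in nodes if n['name'] == sys.argv[1]))
" "$NODE")
echo "$MY_WG1_IP"   # prints 10.78.0.11
```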

Step 1: The master postinstaller

postinstall-master.sh

#!/bin/bash
set -euo pipefail

# ── Phase 1: Base packages ──
dnf install -y python3 wireguard-tools nftables salt-master chrony

# ── Phase 2: WireGuard hub (star topology) ──
# Master is the center. Every worker connects back to us.
umask 077   # keep the private key root-only
wg genkey | tee /etc/wireguard/wg1.key | wg pubkey > /etc/wireguard/wg1.pub

cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = 10.78.0.1/16
ListenPort = 51821
PrivateKey = $(cat /etc/wireguard/wg1.key)
# Peers added dynamically during enrollment
EOF

systemctl enable --now wg-quick@wg1

# ── Phase 3: Export hub metadata ──
# Workers fetch this to know how to reach us
mkdir -p /srv/wg
cat > /srv/wg/hub.env <<EOF
HUB_LAN=$(hostname -I | awk '{print $1}')
WG1_PUB=$(cat /etc/wireguard/wg1.pub)
WG1_PORT=51821
WG1_NET=10.78.0.0/16
EOF

# ── Phase 4: Enrollment window ──
# Workers can add themselves as peers during this window
touch /srv/wg/ENROLL_ENABLED

# ── Phase 5: Salt master ──
systemctl enable --now salt-master

# ── Phase 6: Wait for workers, then converge ──
# This runs as a systemd service (darksite-apply.service)
# apply.py waits for all minions, then runs Ansible

touch /root/.postinstall_done

The master is the hub of a star. It generates keys, opens the door for workers, and orchestrates the cluster once everyone has arrived.
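The workers' enrollment step (below) calls a `wg-add-peer` helper on the master. Its implementation isn't shown here, so this is a hedged sketch of what it plausibly does, gated on the enrollment flag from Phase 4:

```shell
# Sketch of the master-side enrollment helper (assumed, not the actual code)
wg_add_peer() {
    local pub=$1 ip=$2 iface=$3

    # Refuse unless the enrollment window is open
    if [ ! -e /srv/wg/ENROLL_ENABLED ]; then
        echo "enrollment closed" >&2
        return 1
    fi

    # Add the worker as a peer and persist it into the interface config
    wg set "$iface" peer "$pub" allowed-ips "${ip}/32"
    wg-quick save "$iface"
    echo "enrolled ${ip} on ${iface}"
}
```

Closing the window is just `rm /srv/wg/ENROLL_ENABLED`; after that, stray machines can't add themselves.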

Step 2: The worker postinstaller

postinstall-worker.sh

#!/bin/bash
set -euo pipefail

# ── Phase 1: Base packages ──
dnf install -y wireguard-tools salt-minion prometheus-node-exporter

# ── Phase 2: Read hub metadata (baked into ISO) ──
# hub.env carries the hub's address and pubkey plus this node's assigned
# WireGuard IP (MY_WG1_IP), customized per node at ISO build time
source /root/darksite/cluster-seed/hub.env

# ── Phase 3: WireGuard spoke (connect back to hub) ──
umask 077   # keep the private key root-only
wg genkey | tee /etc/wireguard/wg1.key | wg pubkey > /etc/wireguard/wg1.pub

cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = ${MY_WG1_IP}/32
PrivateKey = $(cat /etc/wireguard/wg1.key)

[Peer]
PublicKey = ${WG1_PUB}
Endpoint = ${HUB_LAN}:${WG1_PORT}
AllowedIPs = ${WG1_NET}
PersistentKeepalive = 25
EOF

systemctl enable --now wg-quick@wg1

# ── Phase 4: Auto-enroll with hub ──
# SSH to master and register our WireGuard public key
ssh -o StrictHostKeyChecking=no -i /root/darksite/enroll_key \
    root@${HUB_LAN} "wg-add-peer $(cat /etc/wireguard/wg1.pub) ${MY_WG1_IP} wg1"

# ── Phase 5: Salt minion (points to master) ──
echo "master: ${HUB_LAN}" > /etc/salt/minion.d/master.conf
systemctl enable --now salt-minion

touch /root/.postinstall_done

The worker connects back to the hub, registers itself, and waits for instructions. Zero manual configuration.

Step 3: The convergence orchestrator

apply.py — runs on the master after all workers are enrolled

# Simplified convergence flow (pseudocode):

# 1. Wait for all Salt minions to check in
while minion_count() < expected_count:
    run("salt '*' test.ping")
    sleep(3)

# 2. Push SSH keys for Ansible
run("salt '*' cmd.run 'mkdir -p /home/ansible/.ssh'")
# ... distribute the ansible user's pubkey to all nodes

# 3. Run Ansible playbook
run("ansible-playbook /srv/ansible/site.yml")

# Ansible runs in order:
#   00_preflight.yml      → verify connectivity
#   02_common.yml         → kernel tuning, base packages
#   03_containerd.yml     → container runtime
#   04_k8s_packages.yml   → kubelet, kubeadm, kubectl
#   05_etcd_pki.yml       → distribute pre-generated certs
#   06_etcd_cluster.yml   → bootstrap 3-node etcd
#   07_loadbalancers.yml  → HAProxy for API server VIP
#   08_cp_init.yml        → kubeadm init (first control plane)
#   09_cp_join.yml        → join remaining control planes
#   10_worker_join.yml    → join workers
#   11_cilium.yml         → CNI networking
#   12_monitoring.yml     → Prometheus + Grafana
#   13_helm.yml           → Helm 3
#   99_verify.yml         → kubectl get nodes

The entire Ansible tree is baked into the ISO. No git clone. No downloading roles. The payload directory contains every playbook, role, template, and certificate. The cluster converges from local files.
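For orientation, the numbered plays map onto `site.yml` as plain ordered imports; a trimmed sketch (the filenames come from the list above, nothing else is assumed):

```yaml
# site.yml: the main playbook is just ordered imports
- import_playbook: 00_preflight.yml
- import_playbook: 02_common.yml
- import_playbook: 03_containerd.yml
# ... 04 through 13 in the same pattern ...
- import_playbook: 99_verify.yml
```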

The two-poweroff pattern

Why the system powers off twice

Boot 1: ISO installer
  ├── Preseed-driven install (no prompts)
  ├── Late command copies darksite payload to /target
  ├── Enables bootstrap.service
  └── POWEROFF ← first poweroff (installer done)

Boot 2: From disk (ISO ejected)
  ├── bootstrap.service runs postinstall.sh
  ├── Packages installed, WireGuard configured
  ├── Salt minion registered
  └── POWEROFF ← second poweroff (postinstall done)

Boot 3: Production
  ├── All services running
  ├── WireGuard mesh active
  ├── Salt connected to master
  └── Ready for Ansible convergence

This separation is deliberate. The first poweroff proves the base install worked. The second poweroff proves the postinstaller worked. The third boot is production. Each phase is independently verifiable. If any phase fails, you know exactly where.

Assembly line: Station 1 builds the frame. Station 2 installs the engine. Station 3 starts the car. Each station signs off before the next one starts.
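The glue between poweroff #1 and poweroff #2 is a one-shot unit. A hedged sketch of what `bootstrap.service` could look like (unit name and paths assumed from the flow above):

```ini
# /etc/systemd/system/bootstrap.service (sketch)
[Unit]
Description=Run the darksite postinstaller on first boot from disk
Wants=network-online.target
After=network-online.target
# Skip once the postinstaller has signalled completion
ConditionPathExists=!/root/.postinstall_done

[Service]
Type=oneshot
ExecStart=/bin/bash /root/darksite/postinstall.sh

[Install]
WantedBy=multi-user.target
```

The `ConditionPathExists=!` guard is what makes the `touch /root/.postinstall_done` at the end of every postinstaller meaningful: boot 3 sees the marker and skips straight to production.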

Snapshot, clone, and replicate

The golden image pattern

Once you have a working system (post-postinstall), snapshot it. That snapshot becomes your golden image. Clone it for every new node. Each clone takes milliseconds and uses zero extra space.

# After postinstall completes, snapshot the golden state
zfs snapshot rpool/ROOT/kldload-node@golden

# Clone for each new node (instant, zero space)
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-01
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-02
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-03

# Or replicate to another machine
zfs send rpool/ROOT/kldload-node@golden | ssh kvm-host zfs recv tank/golden/worker

# Duplicate the golden image locally for a KVM guest
# (use a zvol-backed golden image if the VM needs a raw block device)
zfs send rpool/ROOT/kldload-node@golden | zfs recv rpool/vms/worker-01
# Boot as a VM — instant deployment

Bake one cake. Cut as many slices as you need. Each slice is a running server.

Firecracker microVMs: 100 instances in seconds

From ZFS clone to microVM in 125ms

Firecracker is Amazon's microVM hypervisor. It boots a Linux kernel in 125 milliseconds with ~5MB of memory overhead. Combined with ZFS clones, you can spray hundreds of isolated microVMs across a machine in seconds.

#!/bin/bash
# spray-microvms.sh — deploy 100 microVMs from a golden image

GOLDEN="rpool/vms/golden-microvm"
# Firecracker boots an uncompressed vmlinux, not the compressed /boot/vmlinuz
KERNEL="/srv/firecracker/vmlinux"
COUNT=100

# Snapshot the golden image once
zfs snapshot "${GOLDEN}@base"

for i in $(seq 1 $COUNT); do
    VM_NAME="micro-$(printf '%03d' $i)"

    # Clone the golden image (instant, zero space)
    zfs clone "${GOLDEN}@base" "rpool/vms/${VM_NAME}"

    # Get the zvol device path
    ROOTFS="/dev/zvol/rpool/vms/${VM_NAME}"

    # Firecracker takes its VM definition as a JSON config, not CLI flags
    cat > "/run/${VM_NAME}.json" <<EOF
{
  "boot-source": {
    "kernel_image_path": "${KERNEL}",
    "boot_args": "console=ttyS0 root=/dev/vda ro"
  },
  "drives": [{
    "drive_id": "rootfs",
    "path_on_host": "${ROOTFS}",
    "is_root_device": true,
    "is_read_only": false
  }],
  "machine-config": { "vcpu_count": 1, "mem_size_mib": 128 },
  "network-interfaces": [{
    "iface_id": "eth0",
    "host_dev_name": "tap${i}",
    "guest_mac": "AA:FC:00:00:00:$(printf '%02X' $i)"
  }]
}
EOF

    # Launch the microVM (assumes the tap devices already exist on the host)
    firecracker --no-api --config-file "/run/${VM_NAME}.json" &

    echo "Launched ${VM_NAME}"
done

echo "Deployed ${COUNT} microVMs from golden image"
# Total time: ~15 seconds for 100 VMs
# Total extra disk: ~0 until VMs diverge (CoW)

100 shipping containers, each with its own isolated Linux kernel, deployed in the time it takes to pour a coffee. Each one uses zero extra disk until it writes something unique. That's the power of ZFS clones + Firecracker.
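A small gotcha in spray loops like this: guest MACs must stay unique and well-formed. A helper (name assumed) that encodes the VM index into the last two octets keeps them valid past 255 instances; `AA` as the first octet marks the address locally administered and unicast:

```shell
# Derive a locally administered, unicast MAC from a VM index
vm_mac() {
    printf 'AA:FC:00:00:%02X:%02X\n' $(( $1 / 256 )) $(( $1 % 256 ))
}

vm_mac 1      # AA:FC:00:00:00:01
vm_mac 300    # AA:FC:00:00:01:2C
```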

Hardware as a Service: the cron job pattern

Sell your hardware by the hour

# crontab -e

# Customer A: 6am-2pm
0  6 * * * /usr/local/bin/deploy-customer-a.sh
0 14 * * * /usr/local/bin/teardown-customer-a.sh

# Customer B: 2pm-10pm
0 14 * * * /usr/local/bin/deploy-customer-b.sh
0 22 * * * /usr/local/bin/teardown-customer-b.sh

# Nightly maintenance: 10pm-6am
0 22 * * * /usr/local/bin/scrub-and-snapshot.sh

#!/bin/bash
# deploy-customer-a.sh

# Clone from golden image (instant)
for i in $(seq 1 20); do
    zfs clone rpool/golden/customer-a@latest rpool/vms/cust-a-$(printf '%02d' $i)
done

# Boot all VMs
for vm in /dev/zvol/rpool/vms/cust-a-*; do
    virsh start "$(basename "$vm")" &
done

echo "Customer A environment live — 20 VMs deployed"

#!/bin/bash
# teardown-customer-a.sh

STAMP="teardown-$(date +%Y%m%d-%H%M)"

# Snapshot each clone for billing/audit
# (zfs snapshot -r on rpool/vms would also catch other tenants, so loop)
for ds in $(zfs list -H -o name | grep '^rpool/vms/cust-a-'); do
    zfs snapshot "${ds}@${STAMP}"
done

# Destroy all VMs (instant — CoW means no disk cleanup)
for vm in $(virsh list --name | grep cust-a); do
    virsh destroy "$vm" 2>/dev/null
    virsh undefine "$vm" --nvram 2>/dev/null
done

# Destroy clones (instant — only divergent blocks freed)
# Note: -r takes the audit snapshots with it; zfs send them to an
# archive pool first if billing needs to keep them
for ds in $(zfs list -H -o name | grep '^rpool/vms/cust-a-'); do
    zfs destroy -r "$ds"
done

echo "Customer A environment torn down"

Same hardware, three customers, three shifts. Each gets a fresh environment. Snapshots for billing. Zero residual data between tenants. This is what cloud providers do — you just cut out the middleman.

Infrastructure as Code — baked in, not bolted on

The payload IS the infrastructure

Traditional IaC pulls code from git at deploy time. Darksite IaC bakes it into the artifact. The ISO contains the complete Ansible tree:

  • Roles — etcd, containerd, kubeadm, haproxy, prometheus, helm, ingress
  • Group vars — per-role configuration (k8s versions, network CIDRs, feature flags)
  • Host vars — per-node IPs, roles, WireGuard addresses
  • Artifacts — pre-generated PKI certificates, join scripts, helm charts
  • Templates — Jinja2 templates for haproxy.cfg, prometheus.yml, etcd.conf, kubeadm-config

Nothing is downloaded at deploy time. The playbook runs against local files. The certificates are pre-generated. The container images are pre-pulled. The artifact is the deployment.
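"Pre-pulled" container images are just tarballs in the payload. A sketch with assumed image names; the `skopeo` and `ctr` lines are commented out because they need the build machine's network and the target's containerd respectively:

```shell
# Build machine: archive each image the cluster needs into the payload
mkdir -p payload/darksite/images
images="registry.k8s.io/pause:3.9 quay.io/cilium/cilium:v1.15.6"

for img in $images; do
    # Flatten "registry/name:tag" into a filesystem-safe tarball name
    out="payload/darksite/images/$(echo "$img" | tr '/:' '__').tar"
    echo "$img -> $out"
    # skopeo copy "docker://${img}" "docker-archive:${out}"    # needs network
done

# Target side, in postinstall.sh: load the archives into containerd
#   for t in /root/darksite/images/*.tar; do
#       ctr -n k8s.io images import "$t"
#   done
```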

The drop-off points

A postinstaller has natural "drop-off points" where you can stop and use the system as-is, or continue adding more layers. Each point is a valid, working system.

Level 0
Base kldload install. ZFS on root, boot environments, tools. You're here when kldload finishes. Everything below is postinstall.sh territory.
Level 1
+ packages. Install your application stack. nginx, postgres, redis, whatever. dnf install in postinstall.sh. Snapshot. Done.
Level 2
+ WireGuard. Encrypted mesh networking. Nodes can talk to each other securely. Hub-and-spoke topology. Enrollment window for adding new nodes.
Level 3
+ Salt/Ansible. Fleet orchestration. Push configs to all nodes. Run highstate. Converge the cluster. This is where single-node becomes multi-node.
Level 4
+ Kubernetes. etcd cluster, control planes, workers, CNI, ingress. A complete container platform deployed from postinstall.sh and Ansible roles baked into the ISO.
Level 5
+ Firecracker. microVMs sprayed from ZFS golden images. 125ms cold-start. 100 instances in seconds. Lambda-style workloads on bare metal. This is the ceiling — and it's not even close to the limit.
Every level above is just bash scripts and package installs. There's no proprietary orchestrator. No vendor SDK. No magic binary. postinstall.sh is bash. Ansible roles are YAML. Kubernetes is kubeadm init. Firecracker is a single binary. ZFS clones are one command. You can audit every step. You can modify every step. You can build every step yourself. That's the point.