kldload 1.1.0 — Hardware Reality
212 commits since v1.0.4. This release absorbs everything that would have been 1.0.5 (never tagged) plus the entire F44 cutover.
The platform has been re-architected. The web UI is now a single sub-tabbed console behind a Go-native single-port TLS reverse proxy (kldload-proxy) that fronts every service — Grafana, Prometheus, Headlamp, Bob, ttyd-k9s, libvirt console — on one URL with one certificate. eBPF runtime security via Tetragon is wired all the way to Grafana panels. klab — the multi-distro test sandbox — graduated from "1.0.5 promise" to "the lab you build everything on": ZFS instant-clone goldens, OpenZFS test-suite runner across 7 distros, WireGuard mesh, deterministic networking, eight Grafana dashboards.
The live environment cut over from CentOS Stream 9 (kernel 5.14, OpenZFS 2.2) to Fedora 44 (kernel 6.19, OpenZFS 2.4.1, shim 15.8, dnf5). The install path was rewritten end-to-end against real hardware until it stopped fighting the firmware and started routing through it.
curl -L -o /tmp/k.iso https://dl.kldload.com/kldload-free-latest.iso
1. Live environment: CentOS Stream 9 → Fedora 44
The 1.0.x line shipped on CentOS Stream 9 with kernel 5.14 and OpenZFS 2.2. Fine for VMs. On real hardware with newer NVMe controllers, recent NVIDIA cards, USB 3.2 sticks, and Secure Boot firmwares from 2024 onwards, the boot path was getting frayed at the edges. 1.1.0 cuts the live env over to Fedora 44.
| Component | 1.0.x | 1.1.0 |
|---|---|---|
| Kernel | 5.14.0-el9 | 6.19.14-fc44 |
| OpenZFS | 2.2.x | 2.4.1 |
| shim | 15.6 | 15.8 |
| dnf | dnf4 | dnf5 |
| Builder image | centos-stream:9 | fedora:44 |
| Init / sessions | systemd 252 | systemd 256 + dbus-broker |
| NVMe / Wi-Fi 7 / BT 5.4 firmware | partial | full |
| USB-C / Thunderbolt boot | needs rd.retry | no tuning |
| Rescue toolset | basic | gparted, testdisk, ddrescue, fsarchiver |
2. kldload-proxy — single-port TLS reverse proxy
The biggest architectural change since 1.0.0. Pre-1.1.0 the platform was a constellation of services on different ports, each with its own self-signed cert. 1.1.0 introduces a Go-native single-port TLS reverse proxy that fronts the whole stack:
    https://<host>/             → kldload-webui
    https://<host>/grafana/     → Grafana (3000)
    https://<host>/prometheus/  → Prometheus (9090)
    https://<host>/headlamp/    → k8s Headlamp (4466)
    https://<host>/console/     → libvirt VNC console
    https://<host>/k9s/         → ttyd-k9s embedded terminal
    wss://<host>/               → Bob chat over WebSocket
- One cert issued by kldload-ca covers every backend
- TLS terminated once at the proxy; backends run plain HTTP on the loopback
- WebSocket-aware: preserves Connection: Upgrade and Transfer-Encoding: chunked
- Concurrent cert issuance serialized + atomic install
- Stage-2 nginx + Headlamp scaffold: optional second proxy layer for sub-path routing
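A quick way to confirm that every HTTP route in the map above answers on the single port is to probe each path. This is a minimal sketch: `proxy_urls` and `probe_proxy` are hypothetical helper names, not kldload commands, and `-k` is only there for clients that do not yet trust the kldload CA.

```shell
#!/bin/sh
# Print one URL per HTTP route from the route map.
# The wss:// Bob route is left out: a plain GET cannot exercise a WebSocket upgrade.
proxy_urls() (
    host=$1
    for path in / /grafana/ /prometheus/ /headlamp/ /console/ /k9s/; do
        printf 'https://%s%s\n' "$host" "$path"
    done
)

# Probe each route and print "<http status> <url>".
# curl reports 000 when the connection itself fails.
probe_proxy() (
    proxy_urls "$1" | while read -r url; do
        code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "$url")
        printf '%s %s\n' "$code" "$url"
    done
)
```

Running `probe_proxy kld.example` (hostname is a placeholder) prints one status line per route; which status codes count as healthy depends on each backend's auth setup.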
3. Unified web UI — sub-tabbed workspaces
Single-page app with sub-tabbed workspaces for every resource type, served behind the proxy. Grafana, Headlamp, and k9s are embedded as iframes that share the same TLS cert. HTTPS works out of the box on port 8443 with an auto-renewed self-signed cert, and an explicit "Grant microphone access" button sits next to the mic. Install and klab subprocesses run under systemd-run transient units, so a crashing install can’t take down the UI that tracks it.
| Workspace | Sub-tabs |
|---|---|
| Kubernetes | Nodes / Pods / Deployments / Services / Events / Apply YAML |
| KVM | Overview / VMs / Networks / Storage / Snapshots / Log |
| klab | Status / Goldens / Operations / eBPF |
| Tests → ZFS Suite | Run / Results / History / Audit / Live log |
| Ansible | Playbook upload / dynamic inventory / run |
| Helm | Chart upload / repo / one-click deploy |
| ZFS | Pools / Datasets / Snapshots / Health |
| Bob AI | Voice / chat / multi-terminal / agentic |
| Observability | Grafana iframe (kiosk=1) |
Plus: encrypted credentials store — AES-256-GCM-encrypted ZFS dataset (rpool/kldload/secrets) on installed systems; live ISO refuses to adopt a target-disk rpool; secret values live as 0600 files inside the encrypted dataset and are never logged.
4. klab — the multi-distro test sandbox
By 1.1.0 klab is the substrate the rest of the platform runs on: hypervisor + multi-distro test runner + WireGuard mesh + observability stack + web UI in one. Run the real OpenZFS test suite across 7 distros from one host: CentOS Stream 9, Debian 13, Ubuntu 24.04, Fedora 44, Rocky 9, RHEL 9, Arch.
kzfs-test — OpenZFS test matrix runner
Each distro is a ZFS-cloned VM brought up from a golden in ~100ms, runs the suite, reports back. Tear down. Spawn another.
- Separate klab-ztest-<distro> goldens with every zfs-tests.sh prereq
- RHEL 9 with subscription-manager + credential redaction
- Streaming [PASS]/[FAIL]/[SKIP] output, prefixed per-distro
- Parallel cap at 80% of host cores
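The 80% parallel cap is simple integer arithmetic. The sketch below assumes rounding down with a floor of one VM; the release notes do not state the rounding rule, and `parallel_cap` is a hypothetical name.

```shell
#!/bin/sh
# Print 80% of the core count, rounded down, never below 1.
# $1 = core count (defaults to what nproc reports on this host).
parallel_cap() (
    cores=${1:-$(nproc)}
    cap=$(( cores * 80 / 100 ))
    [ "$cap" -lt 1 ] && cap=1
    echo "$cap"
)
```

Under these assumptions a 16-core host runs at most 12 distro VMs at once, and a single-core host still runs one.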
Auto debug bundles
klab-vm-debug-bundle fires automatically on any FAIL > 0 or watchdog timeout. Generates .tar.gz + paste-ready ISSUE.md with zdb, kstat, D-state stacks, all-task stacks, SMART, packages.
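The trigger condition (any FAIL, or a watchdog timeout) can be sketched as a tally over the streamed result lines. The per-distro prefix format `[distro] [PASS] name` and both function names are assumptions made for illustration; the real runner's output format may differ.

```shell
#!/bin/sh
# Count result markers on stdin and print "pass=N fail=N skip=N".
tally_results() {
    awk '
        /\[PASS\]/ { p++ }
        /\[FAIL\]/ { f++ }
        /\[SKIP\]/ { s++ }
        END { printf "pass=%d fail=%d skip=%d\n", p, f, s }
    '
}

# Succeed (bundle needed) when any test failed or the watchdog fired.
# $1 = fail count, $2 = 1 if the watchdog timed out, else 0.
needs_bundle() {
    [ "$1" -gt 0 ] || [ "$2" -eq 1 ]
}
```

A wrapper would call `needs_bundle` after each run and invoke klab-vm-debug-bundle only when it succeeds.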
WireGuard mesh + static IPs
Every klab node auto-joins a /24 mesh on first boot. Site VMs get deterministic addresses across rebuilds. Mesh is the transport for Prometheus federation + Tetragon event shipping.
ZFS Lab profile
Test-suite-first install profile. Builds the klab-ztest-* goldens on firstboot, lean K8s (1 CP, 0 workers) alongside, skips the full 4-node cluster. Ollama/Bob enabled.
5. OpenZFS observability stack
The demo you’d bring to OpenZFS maintainers. Click a test failure → see kernel stack, last zio, D-state blocked task, SMART status, ARC state — all correlated on one timeline. Bundle is paste-ready for upstream.
| Exporter | Port | What it provides |
|---|---|---|
| zfs_exporter | 9134 | per-pool/dataset metrics (fragmentation, dedup, compression, free/alloc) |
| smartctl_exporter | 9633 | SMART attributes per disk |
| ebpf_exporter | 9435 | biolatency histograms per block device |
| loki | 3100 | single-node log aggregator, 7-day retention |
| promtail | 9080 | journald + kernel + zfs-dbgmsg + klab logs → Loki |
| arcstats-exporter | textfile | ARC stats from /proc/spl/kstat/zfs/arcstats |
| zpool-scrub-exporter | textfile | scrub age/duration/errors |
| klab-exporter | textfile | VM list, IPs, golden image age, generations |
Eight Grafana dashboards bundled: zfs-pool-health, scrub-history, compression-trend, disk-health-smart, block-io-latency, kernel-messages, klab-test-matrix, klab-test-debug.
zed → Loki: /etc/zfs/zed.d/all-loki.sh pushes every zpool event (checksum error, resilver, scrub, trim, vdev state change) into Loki with {class, pool} labels.
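For readers who have not written a zedlet, the push amounts to building one JSON body for Loki's `/loki/api/v1/push` endpoint. This is a sketch of the idea, not the shipped all-loki.sh: the helper names are hypothetical, and the escaping only handles backslashes and double quotes.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# (Control characters and newlines are not handled in this sketch.)
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a Loki push body carrying {class, pool} labels.
# $1 = event class, $2 = pool name, $3 = log line, $4 = timestamp in nanoseconds.
loki_payload() {
    printf '{"streams":[{"stream":{"class":"%s","pool":"%s"},"values":[["%s","%s"]]}]}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$4" "$(json_escape "$3")"
}

# Example use inside a zedlet (zed exports ZEVENT_* variables per event):
#   loki_payload "$ZEVENT_CLASS" "${ZEVENT_POOL:-none}" "$ZEVENT_SUBCLASS" "$(date +%s)000000000" |
#       curl -s -H 'Content-Type: application/json' --data-binary @- \
#            http://127.0.0.1:3100/loki/api/v1/push
```

Keeping the label set to class and pool keeps stream cardinality low; per-event detail belongs in the log line.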
6. Tetragon — eBPF runtime security
Tetragon now ships in 1.1.0 with full Grafana plumbing.
- Process flow tracing — every exec, fork, exit captured
- Syscall flow — read/write/connect/socket events
- Packet flow attribution — packets tagged with the originating process, visible in the Traffic Map
- kprobe-based observation — kernel-level visibility, no application instrumentation
- zfs-execs TracingPolicy audits zfs/zpool/zdb/ztest/zfs-tests.sh execve events
- eBPF deep-dive demos in kube-demo (demos 22-24)
This is the "advanced messaging" surface — kernel-to-application event correlation, not just metric aggregation.
7. Bob AI — eyes, ears, and a voice
Bob graduated from "kldload-aware chat" to "agent that can actually do things on the system." Bob can answer "why is hubble-relay not ready?" by calling k8s_events → k8s_describe → k8s_logs → kernel_dmesg in sequence.
18 read-only diagnostic tools
k8s: get_pods, get_nodes, describe, logs, events. Prom: prom_query. ZFS: zfs_status, zfs_arc_stats. Host: host_vitals, top_procs, ss_sockets. CNI/eBPF: hubble_observe, cilium_*, tetragon_watch, kernel_dmesg, bpftrace_oneliner. Self: doctor_check.
Open-mic voice + WebSocket proxy
Click the mic once, talk forever. TTS self-talk ignored. Bob chat is WebSocket-proxied via kldload-proxy, not direct browser → Ollama (CORS + auth stay server-side).
Tesseract OCR + tool-call rescue
Paste a screenshot of log output, Bob reads the text without needing a 6 GB vision model. Text-emitted tool calls (LLM forgets the JSON envelope) caught and parsed instead of hallucinated.
Post-quantum SSH KEX
Every kldload sshd advertises sntrup761x25519-sha512 + mlkem768x25519-sha256 on top of the classical KEX list. Modern clients no longer print the "store now, decrypt later" advisory.
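One way to check that a host or client actually offers both hybrid algorithms is to scan its KEX list. The helper below is a hypothetical sketch; it accepts either bare names or names carrying a vendor suffix such as `@openssh.com`.

```shell
#!/bin/sh
# Read a KEX list on stdin (comma-separated or one name per line) and
# succeed only if both hybrid post-quantum algorithms are present.
has_pq_kex() (
    list=$(tr ',' '\n')
    for kex in sntrup761x25519-sha512 mlkem768x25519-sha256; do
        printf '%s\n' "$list" | grep -Eq "^${kex}(@.*)?\$" || exit 1
    done
)
```

On the server, `sshd -T | awk '$1 == "kexalgorithms" { print $2 }' | has_pq_kex` checks the effective config (needs root); on a client, `ssh -Q kex | has_pq_kex` checks what the local OpenSSH build supports.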
8. Operator console — 24-key tmux drawer
Every watchable pane is one keystroke away. The drawer reshapes the main content area via a CSS variable so panels, VMs, logs never hide behind it. Every key is a toggle.
| Group | Keys |
|---|---|
| Primary panes | F2 k9s · F3 ZFS test tail · F4 logs · F5 firehose · F6 htop · F7 k8s events · F8 hubble observe · F9 zpool iostat · F10 scratch · F11 tcplife · F12 tcptop |
| Deep-dive (Shift+F) | warnings-only events, dmesg --follow, doctor loop, iotop, kubectl top, cilium drops, zfs iostat -l, kinspect picker, tcpretrans, tcpconnect |
| Trace group (Alt+letter) | execsnoop, opensnoop, biosnoop, killsnoop, iftop, nethogs |
| HUD popups | VMs & DHCP, cluster state, ZFS pools + ARC, WireGuard + routes, disk+mem+cpu, uptime+who+last-logins |
9. Hardware enablement — laptop-ready desktop
~350 MB of hardware enablement, codecs, fonts. The 1.0.4 desktop booted clean but then you had a laptop with no WiFi, no webcam, no printer, and Liberation Sans. Not anymore.
Firmware
linux-firmware (all distros) + Fedora 43+ split workaround (explicit iwlwifi-{dvm,mvm,mld}-firmware, realtek-firmware, atheros-firmware) + fwupd for LVFS BIOS updates.
CPU / Platform
microcode_ctl, thermald, tlp, powertop, brightnessctl, ddcutil, TPM2, FIDO2, smartcard (opensc, pcsc-lite, libfido2).
GPU + video decode
Mesa DRI/Vulkan/VA, vulkan-loader, intel-media-driver, libva. Hardware H.264/H.265 decode — without it modern Firefox/Teams pin the CPU on any video stream.
Audio (PipeWire stack)
alsa-sof-firmware for Intel SOF DSP, full pipewire stack (alsa, gstreamer, libcamera, codec-aptx), wireplumber, gstreamer plugins. Modern ThinkPads/XPSes need this for any audio at all.
Wayland portal
xdg-desktop-portal + gnome/gtk backends. Without these, "Share Screen" in Zoom/Teams/Discord/Firefox/OBS produces a black rectangle.
Cameras + input
libcamera-tools for Intel IPU6 (recent ThinkPads, XPS, Framework). libwacom + Wacom drivers. bluez. Touchpad gestures via xorg-x11-drv-libinput.
Print + scan + cellular + VPN
cups, hplip, sane-backends. ModemManager, NetworkManager-wwan/ppp. NetworkManager-{openvpn,openconnect} with GNOME plugins.
180 fonts, all profiles
Liberation, DejaVu, Noto (sans/serif/mono/CJK/emoji), Cascadia Code, JetBrains Mono, Fira Code, Adobe Source, Inter, Roboto, STIX. Up from ~10 stock. Installed on every profile so SSH terminals get the full set.
10. Install reliability — the hardware-truth pass
Each item below was found by installing onto real hardware and watching it fail. Each is a discrete commit with a comment in the source explaining the failure mode.
Kernel staging (Rocky 9 fix)
Multi-kernel installs (dnf pulling in both kernel-697.el9 and kernel-611.49.1.el9_7) leave one kernel without a usable initramfs. 1.1.0 picks the highest-versioned kernel that has BOTH a vmlinuz AND a matching initramfs.
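The selection rule can be sketched in a few lines. This assumes EL/Fedora file naming (`vmlinuz-<ver>` and `initramfs-<ver>.img`) and a `sort` that supports `-V`; `pick_kernel` is a hypothetical name, not the installer's function.

```shell
#!/bin/sh
# Print the highest kernel version under $1 (default /boot) that has BOTH
# vmlinuz-<ver> and initramfs-<ver>.img; fail if no kernel qualifies.
pick_kernel() (
    bootdir=${1:-/boot}
    for img in "$bootdir"/vmlinuz-*; do
        [ -e "$img" ] || continue                 # glob matched nothing
        ver=${img##*/vmlinuz-}
        [ -e "$bootdir/initramfs-$ver.img" ] && printf '%s\n' "$ver"
    done | sort -V | tail -n 1 | grep .           # grep . fails on empty output
)
```

A kernel that is newer but lacks an initramfs is skipped rather than staged, which is the behavior the fix describes.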
NVIDIA DKMS race
Chroot-time DKMS fails on conftest macros.h corruption. Installer parks nouveau-blacklist + xorg.conf, firstboot retries on running kernel, restores parked configs, regenerates initramfs.
MOK code-signing leaf cert
Real Secure Boot fix — previous approach reused the kldload-ca root as MOK (CA:TRUE, multi-EKU, Authenticode-incompatible). 1.1.0 generates a dedicated leaf with no v3_ca extensions, single codeSigning EKU.
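A leaf of that shape can be described with a small OpenSSL config. The sketch below is illustrative only: it self-signs the leaf for brevity (the installer may issue it differently), the key size and lifetime are arbitrary, and the function and file names are hypothetical.

```shell
#!/bin/sh
# Emit an OpenSSL config for a leaf cert: not a CA, exactly one EKU (codeSigning).
# $1 = subject CN (the default is a placeholder label).
mok_leaf_cnf() {
    cat <<EOF
[ req ]
distinguished_name = dn
x509_extensions    = mok_leaf
prompt             = no

[ dn ]
CN = ${1:-example per-install MOK}

[ mok_leaf ]
basicConstraints     = critical, CA:FALSE
keyUsage             = critical, digitalSignature
extendedKeyUsage     = codeSigning
subjectKeyIdentifier = hash
EOF
}

# Generate a key and self-signed leaf in directory $1, plus a DER copy
# in the format mokutil --import expects. $2 = optional subject CN.
make_mok_leaf() (
    dir=$1
    mok_leaf_cnf "$2" > "$dir/mok.cnf"
    openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
        -config "$dir/mok.cnf" -keyout "$dir/MOK.key" -out "$dir/MOK.crt" &&
    openssl x509 -in "$dir/MOK.crt" -outform DER -out "$dir/MOK.der"
)
```

`openssl x509 -in MOK.crt -noout -text` should then show CA:FALSE and a single Code Signing usage, which is the property the fix relies on.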
Multi-sig kernel re-signing
sbsign appends rather than replaces. 1.1.0 strips the existing signature with sbattach --remove first, then signs once with the kldload MOK leaf — single-signature, portable across shim versions.
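The strip-then-sign sequence looks roughly like this with sbsigntools. It is a sketch under assumptions: `resign_kernel` and `run` are hypothetical helpers, the image is signed to a temporary file before being moved into place, and a single `sbattach --remove` is shown as described above.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Strip the existing signature, then sign once with the MOK leaf.
# $1 = kernel image, $2 = private key, $3 = certificate.
resign_kernel() (
    image=$1 key=$2 cert=$3
    run sbattach --remove "$image" &&
    run sbsign --key "$key" --cert "$cert" --output "$image.signed" "$image" &&
    run mv -f "$image.signed" "$image" &&
    run sbverify --list "$image"      # should now list exactly one signature
)
```

Setting `DRY_RUN=1` before calling `resign_kernel` prints the four commands without touching the image, which is a safe way to review the plan on a live system.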
Hostid propagation
ZFS hostid must match: live env → target /etc/hostid → initramfs → pool stamp. Final force-sync right before dracut runs ensures the initramfs is built with the right value.
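To spot a mismatch in that chain, it helps to read the 4-byte hostid file the same way on every stage. A minimal sketch, assuming a little-endian file as written on x86_64 and aarch64; `hostid_from_file` is a hypothetical name.

```shell
#!/bin/sh
# Print the hostid stored in a 4-byte hostid file as 8 hex digits.
# Assumes little-endian byte order in the file; fails on short files.
hostid_from_file() (
    set -- $(od -An -tx1 -N4 "$1")    # split the four bytes into $1..$4
    [ $# -eq 4 ] || exit 1
    printf '%s%s%s%s\n' "$4" "$3" "$2" "$1"
)
```

On the installed system, `hostid_from_file /etc/hostid` should print the same value as the `hostid` command; the copy inside the initramfs can be extracted and checked the same way.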
dracut --no-hostonly everywhere
Installer-generated initramfs ships drivers for every kernel module, not just what was loaded on the live ISO at install. No more "wrong NIC driver on first boot."
Debian Trixie GDM 48 bug
Trixie's GDM 48 has a systemd-integration bug, so the Debian desktop switched to LightDM. Added libpam-gnome-keyring, libpam-systemd, dbus-x11, xdg-desktop-portal-gnome/gtk. On Debian the nginx user is www-data, not nginx.
Rocky/RHEL/CentOS desktop GDM crash
Without gnome-session-xsession, /usr/share/xsessions/ is empty and GDM 40 on NVIDIA hardware (Wayland auto-disabled) crashes with "no session desktop files installed." The package is now in the package list.
darksite chroot file:// resolution
dnf --installroot resolves file:// against installroot. Fixed by bind-mounting /root/darksite to /run/kldload-darksite — same path inside and outside chroot.
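The bind-mount trick amounts to making one absolute path valid on both sides of the chroot boundary. A sketch, with `bind_darksite` and `run` as hypothetical helpers and the target root passed in as an argument:

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Expose $1 at the same absolute path on the host and inside target root $2.
# $3 = shared mount path (defaults to /run/kldload-darksite).
bind_darksite() (
    src=$1 target=$2 mnt=${3:-/run/kldload-darksite}
    run mkdir -p "$mnt" "$target$mnt" &&
    run mount --bind "$src" "$mnt" &&
    run mount --bind "$src" "$target$mnt"
)
```

With both mounts in place, a repo `baseurl=file:///run/kldload-darksite` points at the same files whether dnf resolves it against the host or against the installroot.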
F44 bootupd model
shim/grub2 RPMs install to /usr/lib/efi/... on F44, not /boot/efi/EFI/<distro>/. Installer extended search paths and stages files to ESP from /usr/lib/efi/.
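An extended search can be sketched as a lookup over both layouts. The preference order here (new layout first, legacy second) is an assumption, and `find_efi_binary` is a hypothetical name; the versioned subdirectories under /usr/lib/efi are discovered with `find` rather than hard-coded.

```shell
#!/bin/sh
# Print the first path to an EFI binary named $2 under root $1, searching
# /usr/lib/efi before the legacy /boot/efi/EFI location. Fail if absent.
find_efi_binary() (
    root=$1 name=$2
    for dir in "$root/usr/lib/efi" "$root/boot/efi/EFI"; do
        [ -d "$dir" ] || continue
        hit=$(find "$dir" -type f -name "$name" | sort | head -n 1)
        [ -n "$hit" ] && { printf '%s\n' "$hit"; exit 0; }
    done
    exit 1
)
```

For example, `find_efi_binary /mnt/target shimx64.efi` (target path is a placeholder) returns the file to stage onto the ESP regardless of which layout the distro uses.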
chroot command -v trap
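command is a shell builtin, not a binary on PATH, so chroot <target> command -v <tool> can fail inside the chroot even when the tool is installed. Several call sites silently treated "tool exists" as "tool doesn’t exist." Fixed by testing the tool's path directly.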
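A direct-path test avoids the builtin entirely by checking the target's filesystem from outside the chroot. A minimal sketch; `tool_in_root` is a hypothetical name and the directory list is an assumption about where tools normally live.

```shell
#!/bin/sh
# Succeed if an executable file named $2 exists under target root $1.
# No chroot and no shell builtins are involved.
tool_in_root() (
    root=$1 tool=$2
    for dir in /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin; do
        [ -x "$root$dir/$tool" ] && [ ! -d "$root$dir/$tool" ] && exit 0
    done
    exit 1
)
```

One caveat: an absolute symlink inside the target resolves against the host when tested from outside, so a check like this suits regular binaries better than alternatives-style links.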
Boot menu — rootdelay + Compatibility
Promoted compat cmdline to default, added rootdelay + rd.retry for slow USBs, added a Compatibility entry for HP / 2-second USB carriers.
11. Secure Boot architecture
The shipped chain is the smallest possible Secure Boot trust footprint for a ZFS-on-root distro. The only things we sign are zfs.ko and the staged vmlinuz, both with the kldload MOK key. Everything else is already signed by trusted vendor certs.
firmware
└ shim.efi (Microsoft-signed, distro-shipped)
  └ grubx64.efi (RH/CentOS/Rocky/Fedora distro-signed, already trusted by shim’s vendor cert)
    └ vmlinuz (distro-signed by the same vendor cert, re-signed with kldload MOK leaf in 1.1.0)
      └ zfs.ko (MOK-signed via DKMS sign_tool)
- Per-install MOK leaf cert generated fresh per install, single EKU, no CA reuse
- kldload-ca init creates the per-install CA (separate from the MOK)
- kldload-grub-refresh.path auto-refreshes /boot/efi/EFI/BOOT/grubx64.efi on distro upgrade
- kldload-secure-boot CLI: enable | disable | status | reenroll
- Unified trust root: same kldload CA roots TLS certs (browser trust) + MOK signing (kernel module trust)
12. Kubernetes / kube-cluster
- Pre-flight libvirt default network before virt-install: fixes bootstrap on fresh hosts where virbr0 has no carrier yet
- kubeconfig + kubectl published to host BEFORE step 7, so you can kubectl immediately
- Atomic download of cloud qcow2: fixes a race where parallel invocations corrupted the source image
- AI / Ollama enabled by default for k8s/kvm/zfslab tiles
- metrics-server auto-installs: powers kubectl top + live CPU/Mem overlays
- Cilium-operator scrape only fires on the node it’s actually running on
- kube-demo: 21+ interactive demos, eBPF deep-dives at 22-24
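The atomic download mentioned in the list above follows a common pattern: fetch into a temp file beside the destination, then rename. This sketch uses hypothetical helper names and is not the kube-cluster code itself.

```shell
#!/bin/sh
# Download URL $2 to path $1. Kept separate so the transport can be swapped.
fetch_to() {
    curl -fsSL -o "$1" "$2"
}

# Fetch $1 into a temp file next to $2, then rename it into place.
# A rename within one filesystem is atomic, so a parallel reader sees either
# the old image or the complete new one, never a partial write.
atomic_fetch() (
    url=$1 dest=$2
    tmp=$(mktemp "$dest.XXXXXX") || exit 1
    if fetch_to "$tmp" "$url"; then
        chmod 0644 "$tmp"             # mktemp creates the file as 0600
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"                  # leave any existing image untouched
        exit 1
    fi
)
```

Two invocations racing on the same destination each write their own temp file, so the worst case is a redundant download rather than a corrupted source image.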
13. kspawn — ZFS-native multi-runtime cluster spawner
New top-level CLI: kspawn spawn --name web --distro debian --count 5 clones 5 VMs from the klab debian golden in parallel, injects cloud-init (hostname + SSH key per node), boots them, writes a JSON manifest at /var/lib/kspawn/clusters/<name>/manifest.json. Subcommands: spawn / list / status / ssh / destroy. Because all state is derivable from the manifest + ZFS clones, there is nothing to "upgrade" — destroy and re-spawn.
14. Known issues
| Issue | Status | Workaround |
|---|---|---|
| Secure Boot direct-kernel boot: bad shim signature on Rocky 9 | tracked | Boot with SB off |
| ZFSBootMenu under SB ON | upstream-unsupported | Use direct kernel under SB; ZBM under SB-off |
| kbe activate doesn’t auto-rewrite grub.cfg under SB direct-kernel | architectural | Manual edit; tracked for 1.2 |
| Live ISO can’t SB-boot (its BOOTX64.EFI is raw GRUB) | tracked | SB OFF for installer USB; installed system enables SB |
| Rocky kernel-697 ships alongside kernel-611 (no zfs.ko on 697) | cosmetic | Installer correctly stages 611 |
| Profile gate != "core" too wide, causing bleed-through | tracked | Per-tile gating in 1.1.x |
| RHEL credentials don’t persist past install | manual | Re-register on first boot |
| Fedora installer uses metalink instead of darksite | offline-not-pure | Works; not air-gapped |
| Ubuntu golden uses Canonical's vendored zfs-2.2.2 (58% pass) | tracked | PPA switch in 1.1.x |
| Smoke matrix not re-run for 1.1.0 final | pending | sudo ./tests/lifecycle-matrix.sh |
15. Upgrade notes
Major version. Live env, kernel, ZFS, shim, dnf are all stepping forward. No in-place upgrade from 1.0.x — do a fresh install onto a different ZFS pool, or wipe and reinstall on the same disk. Restore state from ZFS snapshots if you had them.
    # Get it
    curl -L -o /tmp/kldload.iso https://dl.kldload.com/kldload-free-latest.iso
    # Burn it (USB at /dev/sda; verify with lsblk first)
    sudo bash -c 'wipefs -af /dev/sda && \
      dd if=/tmp/kldload.iso of=/dev/sda bs=4M oflag=direct \
         status=progress conv=fsync && sync && eject /dev/sda'
Full commit-by-commit changelog: RELEASE_NOTES-1.1.0.md · git log --oneline v1.0.4..v1.1.0