Pick your distro, get ZFS on root
kldload — your platform, your way, free

First-Class Infrastructure

Pick any distro. Get ZFS on root, an encrypted mesh as the network, kernel-level observability, and GPU acceleration — all running before the first user ever logs in.

That sentence is doing a lot of work. The rest of this page unpacks what it actually means and why it matters.


1. The big idea — the kernel is the foundation, not the runtime

Most servers are built backwards. You start with an empty operating system. Then you SSH in and add things — a storage layer, a VPN, monitoring agents, GPU drivers. Each piece installs on top of a working server, and each piece can break the server.

kldload flips this around. The foundation is what’s in the kernel — not what gets installed on top of it. Four things, baked into the same kernel:

OpenZFS — your storage primitives. Snapshots, copies, checksums, instant clones. The OS itself is a ZFS dataset.

WireGuard — your network. Encrypted from the kernel up, available before userspace exists. Services bind to encrypted interfaces and disappear from the internet.

eBPF + bcc — your observability. The kernel tells you what’s happening, directly, at kernel speed. No agents, no log parsing.

NVIDIA + CUDA — your GPU acceleration. AI, transcoding, container workloads — ready at boot.

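These aren’t kldload inventions. They’re existing projects, each maintained by its own community or vendor, and nearly all of them are open source (CUDA is free to use but proprietary). What’s unusual is having all four working together on the same kernel, signed for Secure Boot, surviving updates. WireGuard and eBPF are part of the mainline Linux kernel, so most distros already carry those two. OpenZFS and the NVIDIA driver are out-of-tree modules, and most distros either leave them out or offer them as add-ons that can break on a kernel update. Very few, if any, ship all four working together by default.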

That’s the entire kldload contribution: the boring engineering work of making them coexist, so you don’t have to.

Every technology here is free to use, and all but NVIDIA’s proprietary CUDA stack is open source. It is the same architecture the biggest enterprises run, without the wrapper they charge you for.

2. Two different layers — foundation in the kernel, flexibility in userspace

It’s tempting to frame this as “kldload vs. Ansible/Terraform/the IaC world.” That’s the wrong axis. They live at different layers and solve different problems — and they pair exceptionally well.

Here’s the split:

| | Kernel-baked foundation (kldload) | Userland tooling (Ansible, Terraform, Helm) |
|---|---|---|
| What it owns | Storage primitives, encrypted network, kernel observability, GPU drivers | App configuration, secrets, deployment orchestration, CI/CD, business logic |
| When it’s built | Image build time: once, signed, identical everywhere | Deploy time: per-environment, per-cluster, per-release |
| What it gives you | A booted server that is already on the encrypted mesh, already observable, already a ZFS dataset | A flexible way to describe what should be running and converge to that |
| What it can’t do | Change without rebuilding the image | Provide kernel-level features (a userspace WireGuard, for example, is 5-10× slower and not available at boot) |
| Best at | Identical foundation everywhere, with no drift in the substrate | Per-app config that should differ between dev / staging / prod |

The catch with userspace-only: if you only have userland tooling, you have a chicken-and-egg problem at the foundation. Before Ansible can run, you need a reachable host. Before that host can be reached, you need a working network. Before WireGuard is up, you need to install and start it. Before that, you need an OS with the right kernel modules. Every step needs the previous one, and every step can fail. During the 20-40 minutes the host is up but the platform isn’t, it’s a half-built attack surface.

The catch with kernel-baked only: if you only had a fixed image, you couldn’t change anything per environment. The image is the same in dev, staging, prod — but your app config, your secrets, your scale, your DNS records have to differ. Userspace tooling is how you express those differences.

How they pair: kldload boots a host that is already authenticated on the mesh, already storing on ZFS, already observable. Your IaC tool of choice can reach that host as soon as it boots (no SSH-key delivery dance, no “wait for cloud-init”) and run the app-layer deploy. The foundation is fixed by the image; the application stack stays fluid through IaC. Different concerns, different layers, both essential, no conflict.

If you already use Ansible / Terraform / Helm / Argo / Flux, none of that goes away with kldload. What changes is what they’re standing on: a substrate that’s already correct, instead of one they have to make correct.

3. Different boot & deployment options

Because the foundation is in the kernel and the artifact is self-contained, you get deployment options that traditional install-on-blank-OS workflows can’t reach:

Boot from USB

The image fits on a USB stick. Plug it into any x86_64 box — bare metal, laptop, server, or cloud VM with USB pass-through — and you have a working node in minutes. Same image, same identity, same mesh keys, anywhere.

Boot from disk after install

Traditional install path. ZFS on root, all the kernel modules compiled in, fully encrypted mesh + observability live from first boot.

PXE / netboot

Same image, served over the network. A new bare-metal node racks itself into the platform by booting — identity sealed in TPM, mesh keys derived from hardware.

Cloud VM

Upload the qcow2 / raw / VHD to any hypervisor or cloud provider. EC2, Azure, GCP, OVH, your own KVM — same image, same behavior. The mesh ties them together regardless of provider.

Live ISO (ephemeral)

Boot without installing. The platform is fully live in RAM — same encrypted mesh, same observability, same ZFS — for one-shot tasks, recovery, or air-gapped diagnostics.

Container image

The platform pieces (eBPF programs, ZFS userland, observability stack) also ship as OCI images for hybrid workloads on managed Kubernetes (EKS, GKE, AKS) where you don’t control the kernel.

In every case the foundation is the same — the encrypted mesh comes up at boot, observability attaches before userspace, storage is ZFS. The only thing that changes is how the artifact got to the hardware.


4. What this changes in practice

| | Traditional way | kldload |
|---|---|---|
| First useful packet | 20-40 minutes after the server boots | At boot, on the encrypted mesh |
| Configuration drift | Servers diverge over time as people SSH in | The image is the source of truth; nothing drifts |
| Rolling back a bad change | Undo 47 Ansible tasks; hope they all reverse cleanly | Boot the previous image. 10 seconds. |
| Disaster recovery | Provision a new control plane first, then everything else | USB stick + fresh hardware = recovered |
| Adding capacity | New server, new install, new automation run | Clone the image. Instant. Identical. |
| Attack surface at boot | Half-configured SSH + half-installed software | One UDP port. Nothing else. |

It’s not that the components are different. ZFS, WireGuard, eBPF — they exist for everyone. The difference is when they’re available. “At boot, identically, on every server” is the line traditional automation cannot cross — because traditional automation always needs a server to deploy onto first.


5. What stops being a problem

When the foundation is solid, a whole category of operational work evaporates. Some examples:

Backups

Snapshots happen automatically. Replication happens incrementally over the encrypted mesh. No agent to install, no schedule to manage, no retention policy to babysit.

Bad upgrades

Snapshot the OS before any change. If something breaks, select the old snapshot at the boot menu. Rollback time: 10 seconds.
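As a rough illustration of the snapshot-then-rollback pattern, the Python sketch below only builds the `zfs` command lines and does not run them. The dataset name `rpool/ROOT/default` and the `pre-change` label are assumptions made for the example, and the boot-menu selection described above is a separate mechanism that is not shown here.

```python
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(dataset: str, when: Optional[datetime] = None,
                  label: str = "pre-change") -> str:
    """Return a timestamped snapshot name such as pool/ds@pre-change-<UTC time>."""
    when = when or datetime.now(timezone.utc)
    return f"{dataset}@{label}-{when:%Y%m%dT%H%M%SZ}"


def snapshot_cmd(snapshot: str) -> List[str]:
    """Command line that takes the snapshot."""
    return ["zfs", "snapshot", snapshot]


def rollback_cmd(snapshot: str) -> List[str]:
    """Command line that rolls the dataset back to the snapshot.

    -r also destroys any snapshots newer than the target, so use with care.
    """
    return ["zfs", "rollback", "-r", snapshot]


# Hypothetical root dataset name, used only for illustration.
snap = snapshot_name("rpool/ROOT/default")
print(" ".join(snapshot_cmd(snap)))
print(" ".join(rollback_cmd(snap)))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without ZFS; on a real host they could be passed to `subprocess.run`.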

Staging environments

Clone production. Zero disk cost — the clone shares blocks with the original until it diverges. Test on a byte-identical copy without paying for it.
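A clone is always created from a snapshot, so "clone production" is really two steps. The sketch below builds those two `zfs` commands without running them; the dataset names `rpool/srv/app` and `rpool/staging/app` are hypothetical and stand in for whatever layout a real host uses.

```python
from typing import List


def clone_for_staging(source: str, snap_label: str, target: str) -> List[List[str]]:
    """Return the commands that snapshot `source` and clone it to `target`.

    The clone shares blocks with the snapshot until either side is written to,
    which is why it costs almost no disk space at creation time.
    """
    snapshot = f"{source}@{snap_label}"
    return [
        ["zfs", "snapshot", snapshot],       # freeze a point-in-time view
        ["zfs", "clone", snapshot, target],  # writable copy backed by that view
    ]


# Hypothetical dataset names, used only for illustration.
for cmd in clone_for_staging("rpool/srv/app", "staging-base", "rpool/staging/app"):
    print(" ".join(cmd))
```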

VMs being heavy

On ZFS, a VM is a clone of a golden image. Creating one takes two seconds. VMs become as cheap as containers.

Network trust

Services bind to the encrypted mesh, not the public NIC. The LAN is untrusted by design. Anyone listening on the wire sees encrypted UDP and nothing else.
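"Bind to the mesh, not the public NIC" comes down to which address a service passes to `bind()`. The following is a minimal sketch of that idea, assuming the mesh interface has a known address; the function name `listen_on_mesh` and the address `10.0.0.5` in the usage note are invented for illustration and are not part of kldload.

```python
import ipaddress
import socket


def listen_on_mesh(mesh_addr: str, port: int) -> socket.socket:
    """Open a TCP listener bound to one specific address.

    Wildcard addresses (0.0.0.0 or ::) are rejected, because binding to them
    would expose the service on every interface, including the public NIC.
    """
    ip = ipaddress.ip_address(mesh_addr)
    if ip.is_unspecified:
        raise ValueError("refusing wildcard bind; pass the mesh interface address")
    family = socket.AF_INET6 if ip.version == 6 else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((mesh_addr, port))
    sock.listen()
    return sock
```

On a host whose WireGuard interface holds `10.0.0.5`, `listen_on_mesh("10.0.0.5", 5432)` would accept connections only on that address. A firewall rule on the public NIC is still a sensible second layer.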

Monitoring blind spots

eBPF watches every syscall, every flow, every disk I/O — from the kernel. No agents to deploy. No log parsing. No “we didn’t have visibility into that.”
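To give a feel for what kernel-level observation looks like with bcc, here is a small sketch that prints a line whenever any process calls `execve`. It assumes the bcc Python bindings are installed and that it runs as root; the names `BPF_PROGRAM`, `trace_execve`, and `run_tracer` are chosen for this example and do not come from kldload.

```python
# The eBPF program itself is C; bcc compiles and loads it at runtime.
BPF_PROGRAM = r"""
int trace_execve(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def run_tracer() -> None:
    """Attach the probe to the execve syscall and stream trace output.

    bcc is imported here so the module can be loaded on machines without it.
    Blocks until interrupted (Ctrl-C).
    """
    from bcc import BPF  # assumption: bcc Python bindings are installed

    b = BPF(text=BPF_PROGRAM)
    # get_syscall_fnname resolves the kernel symbol for the syscall on this kernel.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")
    b.trace_print()
```

Calling `run_tracer()` as root and then starting any program in another terminal should produce one trace line per `execve`. Production tooling would typically aggregate in BPF maps rather than print, but the attach-and-observe shape is the same.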

This isn’t exhaustive. It’s a sampling of what falls away when you fix the foundation.


6. What a real kldload server looks like

Strip away the framing and a production kldload server is mundane in the best way:

  • The root filesystem is a ZFS dataset. Every service has its own dataset, tuned to its workload — databases on small blocks, logs with aggressive compression, video assets on big blocks.
  • The public NIC accepts encrypted UDP on one port. Nothing else routes.
  • All services — databases, monitoring, SSH, replication — bind to addresses on the encrypted mesh. The internet can’t see them.
  • Snapshots happen on a timer. Replication to the DR site happens automatically over the mesh.
  • Prometheus and Grafana watch everything in real time, pulling kernel-level metrics through eBPF.
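The per-service dataset tuning in the first bullet can be sketched as a table of ZFS properties turned into `zfs create` commands. The dataset names and the specific `recordsize` and `compression` values below are illustrative assumptions, not kldload defaults; the right values depend on the workload.

```python
from typing import Dict, List

# Hypothetical layout: small records for a database, stronger compression
# for logs, large records for video assets that are already compressed.
SERVICE_DATASETS: Dict[str, Dict[str, str]] = {
    "rpool/srv/postgres": {"recordsize": "16K", "compression": "lz4"},
    "rpool/srv/logs": {"recordsize": "128K", "compression": "zstd"},
    "rpool/srv/video": {"recordsize": "1M", "compression": "off"},
}


def zfs_create_cmd(dataset: str, props: Dict[str, str]) -> List[str]:
    """Build a `zfs create` command with one -o flag per property."""
    cmd = ["zfs", "create"]
    for key, value in props.items():
        cmd += ["-o", f"{key}={value}"]
    cmd.append(dataset)
    return cmd


# Print the commands rather than running them, so this is safe without ZFS.
for name, props in SERVICE_DATASETS.items():
    print(" ".join(zfs_create_cmd(name, props)))
```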

Nothing in that list is exotic. It’s well-known Linux primitives wired together correctly, from boot, without you having to set any of it up. That’s what “first-class” means here: these things aren’t features you bolted on. They’re the chassis.


7. Where to go from here

Each layer has its own masterclass page with the details:

OpenZFS →

Storage you can trust. Snapshots, replication, datasets, checksums.

WireGuard →

Encrypted networks that disappear from the internet.

eBPF & bcc →

See the system, don’t guess.

NVIDIA on Linux →

GPUs without the pain. AI, transcoding, container workloads.

This page was the why. Those are the how. The rest of the masterclass collection covers nftables, systemd, backplane design, blue/green deployments, GitOps, and a dozen other layers that compose on top.