First-Class Infrastructure
Pick any distro. Get ZFS on root, an encrypted mesh as the network, kernel-level observability, and GPU acceleration — all running before the first user ever logs in.
That sentence is doing a lot of work. The rest of this page unpacks what it actually means and why it matters.
1. The big idea — the kernel is the foundation, not the runtime
Most servers are built backwards. You start with an empty operating system. Then you SSH in and add things — a storage layer, a VPN, monitoring agents, GPU drivers. Each piece installs on top of a working server, and each piece can break the server.
kldload flips this around. The foundation is what’s in the kernel — not what gets installed on top of it. Four things, baked into the same kernel:
OpenZFS — your storage primitives. Snapshots, copies, checksums, instant clones. The OS itself is a ZFS dataset.
WireGuard — your network. Encrypted from the kernel up, available before userspace exists. Services bind to encrypted interfaces and disappear from the internet.
eBPF + bcc — your observability. The kernel tells you what’s happening, directly, at kernel speed. No agents, no log parsing.
NVIDIA + CUDA — your GPU acceleration. AI, transcoding, container workloads — ready at boot.
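These aren’t kldload inventions. They’re existing projects, each maintained by its own community or vendor (the NVIDIA driver and CUDA come from NVIDIA, and CUDA is proprietary). What’s unusual is having all four in the same kernel, working together, signed for Secure Boot, surviving updates. WireGuard and eBPF are already in mainline Linux, but OpenZFS and the NVIDIA driver are out-of-tree, and most distros leave them as add-on modules that must be rebuilt on every kernel update and can break when that happens. Few, if any, ship all four working together by default.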
That’s the entire kldload contribution: the boring engineering work of making them coexist, so you don’t have to.
2. Two different layers — foundation in the kernel, flexibility in userspace
It’s tempting to frame this as “kldload vs. Ansible/Terraform/the IaC world.” That’s the wrong axis. They live at different layers and solve different problems — and they pair exceptionally well.
Here’s the split:
| | Kernel-baked foundation (kldload) | Userland tooling (Ansible, Terraform, Helm) |
|---|---|---|
| What it owns | Storage primitives, encrypted network, kernel observability, GPU drivers | App configuration, secrets, deployment orchestration, CI/CD, business logic |
| When it’s built | Image build time — once, signed, identical everywhere | Deploy time — per-environment, per-cluster, per-release |
| What it gives you | A booted server that is already on the encrypted mesh, already observable, already a ZFS dataset | A flexible way to describe what should be running and converge to that |
| What it can’t do | Change without rebuilding the image | Provide kernel-level features (a userspace WireGuard, for example, is 5-10× slower and not available at boot) |
| Best at | Identical foundation everywhere — no drift in the substrate | Per-app config that should differ between dev / staging / prod |
The catch with userspace-only: if you only have userland tooling, you have a chicken-and-egg problem at the foundation. Before Ansible can run, you need a reachable host. Before that host can be reached, you need a working network. Before WireGuard is up, you need to install and start it. Before that, you need an OS with the right kernel modules. Every step needs the previous one, and every step can fail. During the 20-40 minutes the host is up but the platform isn’t, it’s a half-built attack surface.
The catch with kernel-baked only: if you only had a fixed image, you couldn’t change anything per environment. The image is the same in dev, staging, prod — but your app config, your secrets, your scale, your DNS records have to differ. Userspace tooling is how you express those differences.
How they pair: kldload boots a host that is already authenticated on the mesh, already storing on ZFS, already observable. Your IaC tool of choice reaches that host as soon as it boots (no SSH-key delivery dance, no “wait for cloud-init”) and runs the app-layer deploy. The foundation is fixed by the image; the application stack is fluid via IaC. Different concerns, different layers, both essential, no conflict.
3. Different boot & deployment options
Because the foundation is in the kernel and the artifact is self-contained, you get deployment options that traditional install-on-blank-OS workflows can’t reach:
Boot from USB
The image fits on a USB stick. Plug it into any x86_64 box — bare metal, laptop, server, or cloud VM with USB pass-through — and you have a working node in minutes. Same image, same identity, same mesh keys, anywhere.
Boot from disk after install
The traditional install path. ZFS on root, all the kernel modules compiled in, and the encrypted mesh and observability live from first boot.
PXE / netboot
Same image, served over the network. A new bare-metal node joins the platform simply by booting, with its identity sealed in the TPM and its mesh keys derived from hardware.
Cloud VM
Upload the qcow2 / raw / VHD to any hypervisor or cloud provider. EC2, Azure, GCP, OVH, your own KVM — same image, same behavior. The mesh ties them together regardless of provider.
Live ISO (ephemeral)
Boot without installing. The platform is fully live in RAM — same encrypted mesh, same observability, same ZFS — for one-shot tasks, recovery, or air-gapped diagnostics.
Container image
The platform pieces (eBPF programs, ZFS userland, observability stack) also ship as OCI images for hybrid workloads on managed Kubernetes (EKS, GKE, AKS) where you don’t control the kernel.
In every case the foundation is the same — the encrypted mesh comes up at boot, observability attaches before userspace, storage is ZFS. The only thing that changes is how the artifact got to the hardware.
4. What this changes in practice
| | Traditional way | kldload |
|---|---|---|
| First useful packet | 20-40 minutes after the server boots | At boot — on the encrypted mesh |
| Configuration drift | Servers diverge over time as people SSH in | The image is the source of truth — nothing drifts |
| Rolling back a bad change | Undo 47 Ansible tasks; hope they all reverse cleanly | Boot the previous image. 10 seconds. |
| Disaster recovery | Provision a new control plane first, then everything else | USB stick + fresh hardware = recovered |
| Adding capacity | New server, new install, new automation run | Clone the image. Instant. Identical. |
| Attack surface at boot | Half-configured SSH + half-installed software | One UDP port. Nothing else. |
It’s not that the components are different. ZFS, WireGuard, eBPF — they exist for everyone. The difference is when they’re available. “At boot, identically, on every server” is the line traditional automation cannot cross — because traditional automation always needs a server to deploy onto first.
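As a rough illustration of the “One UDP port. Nothing else.” row above, the sketch below renders an nftables input policy from Python. It is not kldload’s actual ruleset: the function name `render_ruleset`, the port `51820` (WireGuard’s conventional default), and the interface name `wg0` are all assumptions made for the example.

```python
def render_ruleset(wg_port: int = 51820, mesh_iface: str = "wg0") -> str:
    """Render a default-drop nftables input chain that admits only WireGuard.

    Both defaults are assumptions for illustration, not kldload settings.
    """
    if not 0 < wg_port < 65536:
        raise ValueError(f"invalid UDP port: {wg_port}")
    return f"""table inet filter {{
    chain input {{
        type filter hook input priority 0; policy drop;
        iifname "lo" accept
        ct state established,related accept
        udp dport {wg_port} accept
        iifname "{mesh_iface}" accept
    }}
}}
"""


ruleset = render_ruleset()
print(ruleset)
```

The printed text could be loaded with `nft -f` on a test machine. Everything arriving on the public NIC is dropped except the WireGuard UDP port, while traffic that has already been decrypted onto the mesh interface is accepted.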
5. What stops being a problem
When the foundation is solid, a whole category of operational work evaporates. Some examples:
Backups
Snapshots happen automatically. Replication happens incrementally over the encrypted mesh. No agent to install, no schedule to manage, no retention policy to babysit.
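To make the snapshot-plus-incremental-replication idea concrete, here is a minimal sketch that builds the underlying `zfs` commands from Python. The dataset names, snapshot names, and mesh address are hypothetical, and the helper functions are illustrative rather than part of kldload; only the `zfs snapshot`, `zfs send -i`, and `zfs receive` subcommands themselves are standard OpenZFS.

```python
import subprocess
from typing import List, Optional


def snapshot_cmd(dataset: str, name: str) -> List[str]:
    """Command that takes a point-in-time snapshot of a dataset."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset: str, new: str, prev: Optional[str] = None) -> List[str]:
    """Full send if prev is None, otherwise an incremental send from prev to new."""
    cmd = ["zfs", "send"]
    if prev is not None:
        cmd += ["-i", f"{dataset}@{prev}"]
    cmd.append(f"{dataset}@{new}")
    return cmd


def receive_cmd(mesh_host: str, target: str) -> List[str]:
    """Receive the stream on a peer reached over its (hypothetical) mesh address."""
    return ["ssh", mesh_host, "zfs", "receive", target]


def replicate(dataset: str, new: str, prev: Optional[str],
              mesh_host: str, target: str) -> None:
    """Pipe `zfs send` into `zfs receive` on the remote side."""
    send = subprocess.Popen(send_cmd(dataset, new, prev), stdout=subprocess.PIPE)
    subprocess.run(receive_cmd(mesh_host, target), stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")
```

For example, `send_cmd("rpool/data", "hourly-02", "hourly-01")` yields a stream containing only the blocks that changed between the two snapshots, which is what keeps replication incremental.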
Bad upgrades
Snapshot the OS before any change. If something breaks, select the old snapshot at the boot menu. Rollback time: 10 seconds.
Staging environments
Clone production. Zero disk cost — the clone shares blocks with the original until it diverges. Test on a byte-identical copy without paying for it.
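The “zero disk cost until it diverges” behaviour is copy-on-write. The toy model below is not ZFS code; the class names `Pool` and `Dataset` and the helper `extra_blocks` are invented for this sketch, which only shows why a fresh clone consumes no new blocks and why a write to the clone leaves the original untouched.

```python
from itertools import count
from typing import Dict, Optional


class Pool:
    """Toy block store: every write allocates a new block, never overwrites."""

    def __init__(self) -> None:
        self._ids = count()
        self.blocks: Dict[int, bytes] = {}

    def alloc(self, data: bytes) -> int:
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id


class Dataset:
    """Maps logical block indexes to block ids held in a shared pool."""

    def __init__(self, pool: Pool, block_map: Optional[Dict[int, int]] = None) -> None:
        self.pool = pool
        self.block_map: Dict[int, int] = dict(block_map or {})

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: point this dataset at a freshly allocated block.
        self.block_map[index] = self.pool.alloc(data)

    def read(self, index: int) -> bytes:
        return self.pool.blocks[self.block_map[index]]

    def clone(self) -> "Dataset":
        # Only the small index map is copied; block data stays shared.
        return Dataset(self.pool, self.block_map)


def extra_blocks(clone: Dataset, origin: Dataset) -> int:
    """Blocks referenced by the clone that the origin does not reference."""
    return len(set(clone.block_map.values()) - set(origin.block_map.values()))


pool = Pool()
prod = Dataset(pool)
for i in range(4):
    prod.write(i, b"prod-%d" % i)

staging = prod.clone()
fresh_clone_cost = extra_blocks(staging, prod)

staging.write(2, b"changed")
diverged_cost = extra_blocks(staging, prod)
```

In this model the clone costs nothing at creation and one block after a single write, while `prod.read(2)` still returns the original data.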
VMs being heavy
On ZFS, a VM is a clone of a golden image. Creating one takes two seconds. VMs become as cheap as containers.
Network trust
Services bind to the encrypted mesh, not the public NIC. The LAN is untrusted by design. Anyone listening on the wire sees encrypted UDP and nothing else.
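At the socket level, “bind to the mesh, not the public NIC” means listening on one specific address instead of the wildcard. A minimal sketch follows, assuming a hypothetical helper `listen_on_mesh`. The loopback address stands in for a mesh address so the example runs anywhere; on a real host you would pass the address assigned to the WireGuard interface.

```python
import socket

WILDCARDS = {"", "0.0.0.0"}


def listen_on_mesh(mesh_addr: str, port: int) -> socket.socket:
    """Listen only on the given address; refuse wildcard binds outright."""
    if mesh_addr in WILDCARDS:
        raise ValueError("refusing to bind to all interfaces")
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((mesh_addr, port))
    srv.listen()
    return srv


# 127.0.0.1 is a stand-in for a mesh address; port 0 lets the OS pick a free port.
server = listen_on_mesh("127.0.0.1", 0)
bound_addr, bound_port = server.getsockname()
server.close()
```

A service bound this way is unreachable through any other interface on the host, which is the property the paragraph above relies on.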
Monitoring blind spots
eBPF watches every syscall, every flow, every disk I/O — from the kernel. No agents to deploy. No log parsing. No “we didn’t have visibility into that.”
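For a sense of what kernel-side observation looks like, here is a small bcc sketch that counts `execve` calls per process for a few seconds. It is an illustration and not a kldload tool: the names `PROGRAM`, `count_execve`, and `count_execs` are made up for this example, and running it needs root privileges plus the bcc Python bindings, which are imported lazily so the module can be loaded without them.

```python
import time

# eBPF program (restricted C) compiled and loaded by bcc at runtime.
PROGRAM = r"""
BPF_HASH(counts, u32, u64);

int count_execve(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""


def count_execs(seconds: int = 5) -> dict:
    """Attach a kprobe to execve, wait, then return {pid: call_count}."""
    from bcc import BPF  # lazy import: requires bcc and root at call time

    bpf = BPF(text=PROGRAM)
    bpf.attach_kprobe(event=bpf.get_syscall_fnname("execve"),
                      fn_name="count_execve")
    time.sleep(seconds)
    return {key.value: value.value for key, value in bpf["counts"].items()}
```

Calling `count_execs(5)` as root on a machine with bcc installed would return a dictionary keyed by PID. The counting happens inside the kernel; userspace only reads the resulting map.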
This isn’t exhaustive. It’s a sampling of what falls away when you fix the foundation.
6. What a real kldload server looks like
Strip away the framing and a production kldload server is mundane in the best way:
- The root filesystem is a ZFS dataset. Every service has its own dataset, tuned to its workload — databases on small blocks, logs with aggressive compression, video assets on big blocks.
- The public NIC accepts encrypted UDP on one port. Nothing else routes.
- All services — databases, monitoring, SSH, replication — bind to addresses on the encrypted mesh. The internet can’t see them.
- Snapshots happen on a timer. Replication to the DR site happens automatically over the mesh.
- Prometheus scrapes kernel-level metrics exposed through eBPF, and Grafana displays them in real time.
Nothing in that list is exotic. It’s well-known Linux primitives wired together correctly, from boot, without you having to set any of it up. That’s what “first-class” means here: these things aren’t features you bolted on. They’re the chassis.
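The per-service dataset tuning from the first item in that list can be sketched as a table of ZFS properties turned into `zfs create` commands. The profile names, dataset paths, and property values below are illustrative assumptions, not kldload defaults; `recordsize` and `compression` are standard OpenZFS dataset properties.

```python
from typing import Dict, List

# Hypothetical tuning profiles; the values are examples, not recommendations.
PROFILES: Dict[str, Dict[str, str]] = {
    "database": {"recordsize": "16K", "compression": "lz4"},
    "logs": {"recordsize": "128K", "compression": "zstd"},
    "video": {"recordsize": "1M", "compression": "off"},
}


def create_cmd(dataset: str, props: Dict[str, str]) -> List[str]:
    """Build a `zfs create` command with one -o flag per property."""
    cmd = ["zfs", "create"]
    for key, value in sorted(props.items()):
        cmd += ["-o", f"{key}={value}"]
    cmd.append(dataset)
    return cmd


db_cmd = create_cmd("rpool/srv/db", PROFILES["database"])
print(" ".join(db_cmd))
```

Small records suit databases that do random small writes, stronger compression suits repetitive log text, and large records suit big sequential media files that are already compressed.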
7. Where to go from here
Each layer has its own masterclass page with the details:
OpenZFS →
Storage you can trust. Snapshots, replication, datasets, checksums.
WireGuard →
Encrypted networks that disappear from the internet.
eBPF & bcc →
See the system, don’t guess.
NVIDIA on Linux →
GPUs without the pain. AI, transcoding, container workloads.
This page was the why. Those are the how. The rest of the masterclass collection covers nftables, systemd, backplane design, blue/green deployments, GitOps, and a dozen other layers that compose on top.