kldload — Executive Summary

kldload is a re-packer. It takes your distro — CentOS, Debian, Ubuntu, Fedora, RHEL, Rocky, Arch, or FreeBSD — and re-packs it with ZFS on root, WireGuard, eBPF, and NVIDIA drivers compiled, signed, and integrated at build time. Nothing is patched. Nothing is forked. What comes out the other side is your distro, not ours.

OpenZFS becomes the root filesystem. Your entire OS (kernel, configs, home directories) sits on a filesystem with atomic snapshots, zero-cost clones, incremental replication, and a checksum on every block. Roll back a bad change in 2 seconds. Clone a server for testing with no extra disk until the clone diverges. Replicate to a DR site with one command. Boot environments let you upgrade fearlessly: if the new kernel breaks, select the previous one at the bootloader.

WireGuard becomes the network layer. Encrypted tunnels run as kernel interfaces that services bind to directly. SSH listens on the WireGuard address. Databases connect over the tunnel. The physical NIC carries UDP. Everything else is invisible to the internet. No VPN client. No port forwarding. No exposed services.

eBPF becomes the observability layer. Trace every TCP connection, every disk I/O, every process execution at kernel speed. Pre-installed tools (bcc-tools, bpftrace) answer questions that normally require tcpdump, strace, and hours of log parsing: one command, in real time, with low overhead.

NVIDIA drivers integrate at install time. GPU acceleration works from first boot. Multiple containers share one GPU through CUDA time-slicing, with no PCIe passthrough and no dedicated GPU per workload. AI inference, video transcoding, and compute workloads run side by side on the same card.

The difference

Without kldload, each of these is a separate post-install project. Install the OS on ext4. Add ZFS from a third-party repo and hope DKMS compiles against your kernel. Install WireGuard and manually configure tunnels. Install bcc-tools and hope the kernel headers match. Install NVIDIA drivers and debug Secure Boot module signing. Each module comes from a different source, builds independently, and breaks independently — especially on kernel updates.

kldload avoids all of that. All four modules are compiled against the exact kernel in the image, signed with the build's MOK key, and tested before the ISO is assembled. There is no post-install compilation. There is no third-party repo to add. There is no DKMS build to hope works at install time. The modules are already in the image, already signed, already loaded. DKMS is pre-configured so future kernel updates rebuild automatically. The bootloader already understands ZFS. Boot environments already work. Kernel updates are protected by automatic snapshots: if a new kernel breaks a module, boot the previous environment in 10 seconds.

The result: a production-ready system with enterprise storage, encrypted networking, kernel observability, and GPU acceleration — from a single USB stick, in about two minutes, on any of 9 distributions. No manual steps. No post-install debugging. No prayers.

How to use it — 4 steps

1. Download the ISO. Boot from USB or in a VM.

2. Build your distro — pick your options: ZFS, WireGuard, eBPF, encryption, AI.

3. Install to disk, or export as a golden image (qcow2, VMDK, VHD, OVA, raw).

4. Done. One universal image: feed it to Packer, deploy with Terraform, or upload as an AMI. Cloud-init ready. Your infrastructure now has reliable ZFS storage on tap and an encrypted WireGuard backplane. Point your services at the WireGuard interface, reduce your public internet exposure, and start moving internal traffic into the tunnel.

One ISO. Nine distros. Every platform. Whatever userland you like.

Who this is for

Beginner. Boot the USB. Click through the web UI. Select Desktop. Click Install. You get a working Linux desktop with OpenZFS on root. You don't need to know what ZFS is. Snapshots happen automatically. Boot environments protect you from bad updates without you doing anything. It's easier than a normal Linux install because the hard decisions are already made.

Intermediate. Select the Server or KVM profile. Use kvm-create, kvm-clone, kvm-snap. Set up WireGuard with the tutorials. Run the eBPF observability tools. Follow a recipe to build a NAS, a game server, or a radio station. The tools are one command each. The masterclasses explain why.

Expert. Build multi-site clusters with BGP routing, VXLAN overlays, Cilium service mesh, blue/green deployments, FIPS 140-3-oriented cryptography, custom eBPF programs, fleet management with tag-based ZFS replication. The 32 masterclasses and 3,273 pages of documentation go as deep as you want to go.

The OS itself is simple to install and use. The power underneath is enterprise-grade. You use as much or as little as you need. A first-time Linux user can install it and have a working system in two minutes. An SRE can build a globally distributed platform on it. Same ISO. Same installer. Different depth. With continued polish and security hardening, this is production infrastructure — not a hobby project pretending to be one.

How things operate now

These aren't features kldload invented. They're kernel-level capabilities that already exist: WireGuard and eBPF in the mainline Linux kernel, OpenZFS as an out-of-tree kernel module. kldload makes sure they're available from boot. When they are, the way you operate infrastructure changes fundamentally:

Storage. ZFS replaces the volume manager, the software RAID, the filesystem check, the backup script, and the replication agent. One subsystem. Self-healing, compressed, encrypted, snapshotted. You stop thinking about disks and start thinking about datasets.

Data sovereignty. Per-dataset encryption keeps data encrypted at rest, and raw sends (zfs send -w) keep it encrypted in transit: you replicate ciphertext. The receiving machine stores your data but can't read it, because the keys never leave your hands. Send it to a cloud provider, an offsite backup, a partner; it's still yours. They're storing blobs they can't open.
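
A minimal sketch of what "replicate ciphertext" looks like on the command line, assuming OpenZFS raw sends (zfs send -w), a base snapshot that already exists on the receiver, and SSH access to it. The dataset, snapshot, and host names are hypothetical, and the helper only builds the pipeline string; it does not run it.

```python
import shlex


def raw_incremental_send(dataset: str, from_snap: str, to_snap: str,
                         remote_host: str, remote_dataset: str) -> str:
    """Build a shell pipeline that replicates an encrypted dataset as ciphertext.

    zfs send -w (raw) emits blocks exactly as stored on disk, still encrypted,
    so the receiving side never needs the key. -i makes the stream incremental.
    """
    send = ["zfs", "send", "-w", "-i",
            f"{dataset}@{from_snap}", f"{dataset}@{to_snap}"]
    # -u: do not mount on arrival; without the key it could not be read anyway.
    receive = ["zfs", "receive", "-u", remote_dataset]
    remote = ["ssh", remote_host, shlex.join(receive)]
    return f"{shlex.join(send)} | {shlex.join(remote)}"


# Hypothetical names throughout.
print(raw_incremental_send("tank/secure", "mon", "tue", "dr-host", "backup/secure"))
# prints: zfs send -w -i tank/secure@mon tank/secure@tue | ssh dr-host 'zfs receive -u backup/secure'
```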

Networking. WireGuard gives you encrypted Layer 3 tunnels at the kernel level. No VPN appliance. No certificate authority. No tunnel daemon to crash. The tunnel comes up early in boot, before your services start. Applications don't know they're on an encrypted mesh and don't need to.

Observability. eBPF lets you attach probes to the running kernel — watch every syscall, every connection, every disk read. No monitoring agent. No SaaS dashboard. No monthly invoice. The kernel already knows what's happening. You just ask it.

Backup & recovery. Snapshots are instant and free. Replication is incremental and checksummed. Restore is one command. Boot environments mean you can roll back the entire OS — not just data, the operating system itself — in seconds. "Restore from backup" becomes "pick the snapshot you want."

Security. Modules are signed at build time. Every block on disk is checksummed. Secure Boot verifies the chain from firmware to bootloader to kernel to modules. A tampered module won't load. A corrupted block gets detected on read and repaired from a mirror or parity copy. This isn't a hardening checklist; it's structural.

Deployment. One USB. Offline. Everything baked in. No PXE server. No kickstart infrastructure. No internet. The image is self-contained. It works the same on bare metal, in a VM, or in any cloud.

Vendor lock-in. Pick your distro. CentOS, Debian, Ubuntu, Fedora, Rocky, RHEL, Arch. The same ZFS, the same WireGuard, the same tools on all of them. Switch distros without rebuilding your workflow. No subscription. No phone home. Your infrastructure, your choice.

None of this is magic. It's kernel modules that already exist, configured correctly, available at boot. The hard part was making it easy to install. That's what kldload does.

How operations change

These aren't theoretical benefits. They're concrete workflow improvements that happen the moment OpenZFS, WireGuard, and eBPF are the foundation instead of add-ons:

1. Test environments are actually consistent

Clone a production server for testing. The clone is byte-identical to production at the moment of the snapshot: not "configured the same way" but literally the same blocks, the same state, the same data. Test against real data without risking production. Create 5 test environments in 5 seconds. Destroy them when done. Zero disk cost until they diverge.

Workflow change: QA signs off on the actual production state, not a reconstruction of it. "Works in staging" finally means "works in production."
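
Why a clone costs nothing until it diverges can be sketched with a toy copy-on-write model. It is a simplification under stated assumptions (a fixed 128K block size, one dict entry per block, no snapshot bookkeeping) and illustrates the space accounting idea, not OpenZFS internals.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024  # assumption: ZFS's default 128K recordsize


@dataclass
class ToyDataset:
    # offset -> block contents; a clone starts out referencing the same objects
    blocks: dict = field(default_factory=dict)
    # offsets whose current block this dataset allocated itself
    owned: set = field(default_factory=set)

    def write(self, offset: int, data: bytes) -> None:
        # Copy-on-write: point at a new block; the old one is never overwritten,
        # so any dataset still referencing it is unaffected.
        self.blocks[offset] = data
        self.owned.add(offset)

    def clone(self) -> "ToyDataset":
        # Share every block, own none: the clone's unique usage starts at zero.
        return ToyDataset(blocks=dict(self.blocks))

    def used(self) -> int:
        return len(self.owned) * BLOCK_SIZE


prod = ToyDataset()
for offset in range(1000):
    prod.write(offset, b"production")

test = prod.clone()
print(test.used())     # 0
test.write(0, b"schema change")
test.write(1, b"schema change")
print(test.used())     # 262144: only the two diverged blocks
print(prod.blocks[0])  # b'production': production is untouched
```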

2. Rollback is 2 seconds, not 2 hours

Snapshot before every change. Bad deploy? zfs rollback — the entire filesystem reverts to the exact state before the change. Not "restore from backup." Not "rerun Ansible." Atomic. Instant. The system was never broken because you undid time.

Workflow change: Change management becomes fearless. Deploy on Friday if you want. The snapshot is your safety net.
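
The snapshot-before-change habit can be wrapped in a small helper. A minimal sketch, assuming the OpenZFS CLI is on PATH and the caller is allowed to run it; guarded_change and the pre-change snapshot naming scheme are hypothetical. The run parameter exists so the logic can be exercised without a real pool.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, List


def _zfs(argv: List[str]) -> None:
    subprocess.run(argv, check=True)


def guarded_change(dataset: str, change: Callable[[], None],
                   run: Callable[[List[str]], None] = _zfs) -> str:
    """Snapshot dataset, apply change(), and roll back if change() raises."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snap = f"{dataset}@pre-change-{stamp}"
    run(["zfs", "snapshot", snap])
    try:
        change()
    except Exception:
        # Plain zfs rollback (no -r) only targets the most recent snapshot.
        # That holds here unless something else (e.g. a snapshot timer)
        # snapshotted the dataset while change() was running.
        run(["zfs", "rollback", snap])
        raise
    return snap  # keep it as a recovery point
```

Called as guarded_change("tank/app", deploy), where deploy is your own deployment function and tank/app a hypothetical dataset: if deploy raises, the dataset is back at its pre-change state and the exception still propagates to the caller.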

3. Kernel updates stop being scary

Boot environments snapshot the OS before every update. New kernel breaks something? Select the previous boot environment at the bootloader. 10 seconds. No reinstall. No recovery USB. No debugging. The old kernel, the old modules, the old configs — all there, exactly as they were.

Workflow change: The update cycle goes from "schedule a maintenance window and hope" to "update, reboot, verify, done." If it breaks, boot the old one.

4. Backup is a filesystem feature, not a product

Automated snapshots run every 15 minutes. Incremental replication sends only changed blocks to a remote site over an encrypted tunnel. "Restore from backup" becomes "pick which 15-minute window you want." No backup agent. No backup server. No backup schedule to write; the snapshot policy ships preconfigured. It's a property of the filesystem.

Workflow change: The backup team doesn't exist anymore. The filesystem does it. The ops team monitors replication lag instead of managing backup jobs.
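
An incremental send can carry only changed blocks because every ZFS block pointer records the transaction group (txg) in which the block was written; blocks born after the base snapshot's txg are exactly the delta. The toy model below sketches that idea under simplifying assumptions: one txg per write, whole-block records, and no handling of frees or deleted files, which a real send stream also encodes. ToyFilesystem is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    offset: int
    birth_txg: int  # transaction group in which this block was written
    data: bytes


class ToyFilesystem:
    def __init__(self) -> None:
        self.txg = 0
        self.live: Dict[int, Block] = {}
        self.snapshots: Dict[str, Tuple[int, Dict[int, Block]]] = {}

    def write(self, offset: int, data: bytes) -> None:
        self.txg += 1  # simplification: every write gets its own txg
        self.live[offset] = Block(offset, self.txg, data)

    def snapshot(self, name: str) -> None:
        # A snapshot freezes the current block map and remembers its txg.
        self.snapshots[name] = (self.txg, dict(self.live))

    def incremental(self, base: str, target: str) -> List[Block]:
        """Blocks an incremental send from base to target would carry."""
        base_txg, _ = self.snapshots[base]
        _, view = self.snapshots[target]
        delta = [b for b in view.values() if b.birth_txg > base_txg]
        return sorted(delta, key=lambda b: b.offset)


fs = ToyFilesystem()
for offset in range(100):
    fs.write(offset, b"initial")
fs.snapshot("00:00")
fs.write(5, b"edited")
fs.write(7, b"edited")
fs.snapshot("00:15")
print([b.offset for b in fs.incremental("00:00", "00:15")])  # [5, 7]
```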

5. VM provisioning drops from minutes to milliseconds

Build one golden image. Clone it in 100ms. The clone uses zero disk until it diverges. Spin up 10 VMs from one template in under a second. Each is a full ZFS zvol with its own snapshot timeline. Delete a clone and the space comes back instantly.

Workflow change: Capacity requests go from "file a ticket, wait 2 days" to "kvm-clone template web-5" and done.

6. Database migrations become safe

Snapshot the database dataset before running the migration. If it fails, rollback. If it succeeds, keep the snapshot as a recovery point. Clone the database for testing the migration first — zero disk cost, real production data. Run 5 parallel migration tests against 5 clones simultaneously.

Workflow change: DBAs test against real production data without touching production. Five parallel attempts. Keep the winner. Destroy the losers.

7. Debugging goes from hours to seconds

Server is slow? tcpconnect shows every outbound TCP connection with the PID that made it. biolatency shows disk I/O latency as a histogram. execsnoop shows every process that starts. One command, one answer. No log files. No agents. No redeploying with more verbose logging. The kernel already knows. Ask it.

Workflow change: The incident response workflow becomes: alert fires, one eBPF command, root cause identified. No "add logging and redeploy."
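
biolatency prints latency as a power-of-two histogram: each row counts events whose latency falls between consecutive powers of two. The bucketing is sketched below in plain Python only to show how to read that output; the real tool aggregates inside the kernel with eBPF, and the sample latencies are made up.

```python
from collections import Counter
from typing import Iterable, List


def log2_bucket(usecs: int) -> int:
    """Bucket index: 0 holds 0-1, 1 holds 2-3, 2 holds 4-7, 3 holds 8-15, ..."""
    return max(usecs, 1).bit_length() - 1


def log2_histogram(latencies_usecs: Iterable[int]) -> List[str]:
    counts = Counter(log2_bucket(v) for v in latencies_usecs)
    if not counts:
        return []
    rows = []
    for idx in range(max(counts) + 1):
        low = 0 if idx == 0 else 1 << idx
        high = (1 << (idx + 1)) - 1
        n = counts.get(idx, 0)
        rows.append(f"{low:>6} -> {high:<6}: {n:<4} {'*' * n}")
    return rows


# Hypothetical sample: mostly fast reads plus one slow outlier.
for row in log2_histogram([1, 3, 3, 5, 6, 100]):
    print(row)
```

A lone entry far below the main cluster, like the 64 -> 127 row here, is the kind of outlier that otherwise hides inside an average latency figure.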

8. Internal traffic becomes encrypted by default

Services bind to WireGuard interfaces. Database connections go through the tunnel. Monitoring scrapes go through the tunnel. SSH goes through the tunnel. The physical network carries encrypted UDP and nothing else. An attacker on the LAN sees noise. No certificates to manage per-service. No VPN client on every machine. The encryption is at the kernel level.

Workflow change: The network security audit becomes trivial. Everything is encrypted. There is no unencrypted internal traffic to worry about.
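
Binding a service to the WireGuard address instead of 0.0.0.0 means it does not listen on the physical NIC's address at all. A minimal sketch follows; listen_on_tunnel is a hypothetical helper. Note that under Linux's default weak-host model, binding alone does not drop packets addressed to the tunnel IP that arrive on another interface, so pair it with a firewall rule if you need strict isolation.

```python
import socket


def listen_on_tunnel(tunnel_addr: str, port: int) -> socket.socket:
    """Open a TCP listener bound to a single address, e.g. the wg0 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Bind to the tunnel address only, not 0.0.0.0 (every local address).
    sock.bind((tunnel_addr, port))
    sock.listen()
    return sock
```

For example, listen_on_tunnel("10.100.0.1", 5432) on a host whose wg0 interface carries that (hypothetical) address.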

9. Disaster recovery becomes a timer, not a project

A systemd timer runs syncoid every hour. It sends incremental block deltas to the DR site over WireGuard. A 2TB server that changed 500MB in the last hour sends 500MB. The DR site has a byte-identical copy of everything, updated hourly. Failover = boot the replicated VMs on the DR host. RPO: 1 hour. RTO: minutes. No SAN. No shared storage. No DRBD.

Workflow change: DR testing becomes a cron job. Clone, boot, verify, destroy. Monthly. Automated. Actually tested, not "we think it works."
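
The hourly RPO can be checked with simple arithmetic. A sketch, assuming a dedicated 100 Mbit/s link to the DR site (a hypothetical figure) and the 500 MB hourly delta from the example; replication_window is a hypothetical helper.

```python
from typing import Tuple


def replication_window(changed_bytes: int, link_bits_per_s: float,
                       interval_s: float) -> Tuple[float, float, bool]:
    """Return (transfer seconds, worst-case RPO seconds, keeps_up).

    Worst case: the primary fails just before a run finishes sending, so the
    newest complete snapshot at the DR site was taken one interval plus one
    transfer time earlier.
    """
    transfer_s = changed_bytes * 8 / link_bits_per_s
    worst_rpo_s = interval_s + transfer_s
    keeps_up = transfer_s < interval_s  # otherwise runs back up behind each other
    return transfer_s, worst_rpo_s, keeps_up


print(replication_window(500_000_000, 100e6, 3600))  # (40.0, 3640.0, True)
```

Strictly, "RPO: 1 hour" is one hour plus the transfer time of the run in flight, and the schedule only holds while each delta transfers faster than the interval.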

10. Blue/green deployments cost nothing

Clone the entire production environment. Upgrade the clone. Test it. If it works, swap traffic. If it doesn't, destroy the clone. The clone used zero disk because it shared all blocks with production until the upgrade changed them. Traditional blue/green requires 2x the hardware. ZFS blue/green requires 2x nothing.

Workflow change: Release engineering gets zero-risk deployments. The clone IS the staging environment AND the potential new production. One artifact, two purposes.

11. Disk space becomes transparent

zfs list shows exact usage per dataset — per service, per VM, per user. Compression ratio visible. Snapshot cost visible. No more du -sh * disagreeing with df. No more "where did the disk space go?" ZFS tells you exactly what's using space and why.

Workflow change: Capacity planning becomes data-driven. You know exactly what's growing, how fast, and when you'll need more.
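
zfs list output is easy to consume from scripts. A sketch, assuming the OpenZFS CLI: -H prints tab-separated rows with no header, and -p prints exact byte counts instead of rounded sizes. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, Union

FIELDS = ("name", "used", "avail", "refer")


def parse_zfs_list(output: str) -> List[Dict[str, Union[str, int]]]:
    """Parse `zfs list -H -p -o name,used,avail,refer` output into dicts."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, *sizes = line.split("\t")
        row: Dict[str, Union[str, int]] = {"name": name}
        row.update({key: int(value) for key, value in zip(FIELDS[1:], sizes)})
        rows.append(row)
    return rows


def zfs_usage() -> List[Dict[str, Union[str, int]]]:
    out = subprocess.run(
        ["zfs", "list", "-H", "-p", "-o", ",".join(FIELDS)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_zfs_list(out)
```

sorted(zfs_usage(), key=lambda r: r["used"], reverse=True)[:5] lists the five largest datasets. used includes space held by snapshots and child datasets, while refer counts only the data the dataset itself currently references; the gap between the two is where "missing" space usually hides.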

12. Silent data corruption stops existing

Every block on disk is checksummed on write and verified on read. A corrupted block is detected automatically and repaired from the mirror or parity. Weekly scrubs verify every block proactively. You find out about corruption before you need the data, not after. ext4 and XFS checksum metadata at most, never file data, so they cannot do this.

Workflow change: The "silent data corruption" category of outage stops existing. ZFS detects it before you need the data.
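
Detection and repair on read can be sketched with a toy two-way mirror. The design point it mimics is that the checksum is stored apart from the data it covers (ZFS keeps it in the parent block pointer), so a damaged block cannot vouch for itself. The sketch uses SHA-256 from hashlib; OpenZFS defaults to fletcher4 and offers sha256 as a per-dataset checksum option. ToyMirror is hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ToyMirror:
    def __init__(self) -> None:
        self.sides: Tuple[Dict[int, bytes], Dict[int, bytes]] = ({}, {})
        self.checksums: Dict[int, bytes] = {}  # kept apart from the data

    def write(self, key: int, data: bytes) -> None:
        self.checksums[key] = _digest(data)  # checksummed on write
        for side in self.sides:
            side[key] = data

    def read(self, key: int) -> bytes:
        expected = self.checksums[key]
        for side in self.sides:
            data = side[key]
            if _digest(data) == expected:  # verified on read
                # Self-heal: rewrite any copy that fails verification.
                for other in self.sides:
                    if _digest(other[key]) != expected:
                        other[key] = data
                return data
        raise OSError(f"block {key}: every copy fails its checksum")


mirror = ToyMirror()
mirror.write(42, b"ledger entry")
mirror.sides[0][42] = b"ledger entrz"  # simulate silent bit rot on one disk
print(mirror.read(42))      # b'ledger entry'
print(mirror.sides[0][42])  # b'ledger entry': the bad copy was repaired
```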

13. Air-gapped deployment works out of the box

The ISO contains complete package mirrors for every supported distro. No internet required during install. No PXE server. No kickstart infrastructure. Boot from USB. Install. The system works identically whether it's in a data center with gigabit internet or on a ship with no connectivity at all.

Workflow change: Field deployments become self-contained. Ship a USB stick. Everything is on it. No dependencies.

14. GPU sharing without passthrough

NVIDIA container toolkit with CDI means multiple containers share one GPU through CUDA time-slicing. Run AI inference, video transcoding, and compute workloads simultaneously on one card. No PCIe passthrough locking the GPU to one VM. No dedicated GPU per workload. Works on any NVIDIA GPU including consumer cards.

Workflow change: The GPU budget goes from "one card per workload" to "one card, every workload." AI, transcoding, compute — same GPU.

15. Per-service storage tuning becomes trivial

Each service gets its own ZFS dataset. PostgreSQL gets recordsize=8k matching its page size. Redis gets recordsize=16k. Media storage gets recordsize=1M for sequential throughput. Each dataset has independent snapshots, compression, quotas, and replication. On ext4, everything shares one block size and one filesystem. On ZFS, every service is tuned to its workload.

Workflow change: Storage performance tuning becomes a property you set, not a project you plan.
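
The per-service layout above translates into one zfs create per dataset. A sketch that only builds the argument lists, assuming a pool named tank (hypothetical) and lz4 compression. recordsize applies to files created after it is set, and 1M records need the pool's large_blocks feature, which zpool create enables by default on current OpenZFS.

```python
from typing import Dict, List

# Hypothetical dataset names; recordsize values are the ones named above.
PROFILES: Dict[str, Dict[str, str]] = {
    "tank/postgres": {"recordsize": "8k"},  # matches PostgreSQL's 8 kB page
    "tank/redis": {"recordsize": "16k"},
    "tank/media": {"recordsize": "1M"},     # large sequential reads
}


def create_commands(profiles: Dict[str, Dict[str, str]],
                    compression: str = "lz4") -> List[List[str]]:
    """Build one `zfs create -o prop=value ... dataset` argv per service."""
    commands = []
    for dataset, props in profiles.items():
        argv = ["zfs", "create", "-o", f"compression={compression}"]
        for prop, value in props.items():
            argv += ["-o", f"{prop}={value}"]
        commands.append(argv + [dataset])
    return commands


for argv in create_commands(PROFILES):
    print(" ".join(argv))
```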

What this looks like in practice

Development & QA

A developer builds a golden image. Snapshots it. Clones it instantly for dev, staging, and QA — three identical environments created in seconds, not hours. They test, break things, iterate. When they're done, they roll back to the golden snapshot in one command. The next developer starts from the same clean state. No rebuilding. No waiting. No "works on my machine."

Build once. Clone instantly. Test freely. Roll back to clean. Ship with confidence.

Service deployment

Running a database server? Snapshot before each migration. Need to scale out? Export the image and stamp out 10 copies with unique hostnames via cloud-init. Need to roll back a bad deploy? One command. The entire system (filesystem, configuration, state) reverts to the exact moment before the change. Not "restore from backup." Instant. Atomic. Complete.

Every deployment is reversible. Every change is recoverable. Every copy is instant.
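
Stamping out copies with unique hostnames comes down to one small cloud-init user-data document per instance. A minimal sketch, assuming the image runs cloud-init and reads #cloud-config user data; hostname and preserve_hostname are standard cloud-config keys, while the web-NN naming scheme and helper names are hypothetical.

```python
from typing import Dict


def user_data(hostname: str) -> str:
    """Minimal #cloud-config document giving one instance its own hostname."""
    return "\n".join([
        "#cloud-config",
        f"hostname: {hostname}",
        "preserve_hostname: false",
        "",
    ])


def stamp_out(prefix: str, count: int) -> Dict[str, str]:
    """Map instance name -> user-data for count copies of the same image."""
    names = [f"{prefix}-{i:02d}" for i in range(1, count + 1)]
    return {name: user_data(name) for name in names}


for name, doc in stamp_out("web", 3).items():
    print(f"--- {name}\n{doc}")
```

How each document reaches its instance depends on the platform: a NoCloud seed, a hypervisor's metadata service, or a cloud provider's user-data field.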

Edge & air-gapped environments

Remote sites, factory floors, ships, field offices: places where internet is unreliable or unavailable. Build the image once with everything baked in. Deploy from USB. The encrypted mesh connects sites automatically. Replication keeps data synchronized between locations. No cloud dependency. No phoning home. It just works, anywhere.

Build it here. Deploy it there. It works the same everywhere.

Ask the kernel, not the application

Your web server is slow. Traditional debugging means adding logging, redeploying, waiting, reading logs, guessing, repeating. With kernel-level observability you skip all of that. Attach a probe to the running process — watch exactly which system calls it's making, which files it's opening, which connections are stalling, how long each disk read takes. The kernel already knows. You just have to ask it. No code changes. No redeployment. No waiting. The answer is right there, in real time, at the source.

Stop debugging your application. Start asking the kernel what your application is actually doing.

The cloud becomes just compute

You control your own storage, encryption, networking, observability. The image runs identically on bare metal, in a VM, or in any cloud. You're not renting a platform anymore. You're renting compute. Everything else is yours.

You don't have to change anything about how you work. Or you can change it all. Now you have the choice.

Or don't rent anything at all. Boot from USB. No PXE. No internet. It works the same everywhere.

KVM hypervisor with superpowers

The KVM profile turns bare metal into a production hypervisor. kvm-create spins up VMs on ZFS zvols. kvm-clone duplicates them in 100ms at zero disk cost. kvm-snap takes atomic snapshots. kvm-replicate sends incremental deltas to a DR site over WireGuard. 4 clones of a 10GB VM = 0 bytes of extra disk until they diverge. Try that on Proxmox.

The same clone operation that takes Proxmox 30 seconds takes kldload 0.1 seconds.

Multi-site infrastructure

Three offices connected by WireGuard. ZFS replication keeps data synchronized between sites — incremental, block-level, encrypted in transit. A site goes down? Boot the replicated VMs on the DR host. The data was already there. BGP exchanges routes automatically. Add a fourth site and it learns everything in seconds.

Enterprise multi-site without the enterprise price tag.

AI inference on local hardware

The AI profile installs Ollama, pulls a model, and gives you a local AI assistant that knows your infrastructure. It reads live system state — pool health, WireGuard tunnels, running services — before answering questions. Runs on GPU with NVIDIA container toolkit. Two instances share one GPU. No cloud API. No data leaves your network.

Your infrastructure has an AI that understands it. On your hardware. Private.

Compliance without consultants

OpenZFS checksums = data integrity verification on every read. WireGuard = encrypted transport. Per-dataset encryption = data sovereignty. Audit-grade snapshot history = immutable recovery points. AES-256-GCM encryption approaching FIPS 140-3. Secure Boot chain from firmware to modules. These aren't features you configure — they're properties of the platform.

The compliance checklist fills itself when the filesystem does the work.

Self-hosted everything

Replace Google Drive with Nextcloud on ZFS. Replace 1Password with Vaultwarden. Replace Slack with Matrix. Replace GitHub with Gitea. Each service on its own ZFS dataset with independent snapshots, compression, quotas. Replicate the entire stack to an offsite backup with one cron job. Your data. Your hardware. Your network.

The homelab cloud recipe builds all of this in an afternoon. On a mini PC.

Media and broadcast

Plex on ZFS with per-movie datasets. Satellite signal capture with SDR and forensic watermarking. Internet radio with 30 stations from one box. Live TV streaming with SRT/HLS/DASH. Game servers where worlds are indestructible — 15-minute snapshots mean the maximum data loss from any event is 15 minutes.

Every appliance recipe in the collection follows the same pattern. The workload changes. The superpowers don't.

The point

kldload doesn't replace your applications. It doesn't touch them. It replaces the infrastructure beneath them.

Nothing is installed into your stack. There's nothing to configure by hand. No agents. No daemons to maintain. The re-packer builds an image with encrypted networking and self-healing storage baked in at the kernel level, and you drop it in place. Your applications run on top, unchanged. The capabilities are just there, naturally, the moment the system boots.

The consequence is that entire categories of userland tooling — storage management, network encryption, monitoring, backup, security hardening, image pipelines, deployment automation — stop being products you buy, daemons you babysit, and vendors you depend on. They become properties of the operating system. They exist because the platform exists.

Use as much or as little as you want. Burn it all down and rebuild from the kernel primitives. Or apply a single pinpoint solution — add OpenZFS to an existing server, build a custom WireGuard backplane for one application, deploy eBPF observability on a single node. The platform is à la carte. Cherry-pick the pieces you need. Build custom backplanes. Combine primitives nobody's combined before.

If you want zero userland exposure, the source is there: strip out everything you don't want. Quality-of-life additions like Sanoid snapshots are configured and ready, but they're just services; remove them if they're not for you. For everyone else, it's configured to be used, learned, and built upon. 3,273 pages of documentation. 32 masterclasses. Over 125 recipes. Works on every supported distro. 100% free. Always.

The only catch is that you have to build it.
