The Full Stack

What does a kldload platform look like when you go all-out? Every layer filled in, every connection encrypted, every packet observed, every secret managed, every disk checksummed, every service authenticated, every deployment reversible. This is that document — the reference architecture for a fully deployed kldload stack, from bare metal to production workloads.

This is not a tutorial. This page is a map. It shows every technology in the stack, why it is there, what it connects to, and what it replaces. Each component links to its dedicated masterclass for the deep dive. Read this page first to understand the full picture, then drill into the individual masterclasses for implementation.

The premise: Most platforms are assembled from disconnected decisions — a firewall here, a container runtime there, certificates from somewhere, monitoring bolted on later. A kldload full stack is different. Every layer is chosen to reinforce every other layer. ZFS protects storage. WireGuard protects the network. eBPF observes the kernel. Cilium enforces policy. Keycloak authenticates users. Vault manages secrets. Sanoid snapshots everything. There are no gaps, no duct tape, and no vendor lock-in. You own every layer.

Prerequisites: none. This page is the starting point. Follow the links to go deeper.

Every technology on this page exists because it solves a specific problem better than the alternatives. Not because it is trendy, not because a vendor pushed it, not because it was the first thing that came up in a search. ZFS is here because it checksums every block, data and metadata alike, and can repair a bad copy from redundancy, which few other filesystems do. WireGuard is here because no other mainstream tunnel protocol achieves comparable security in roughly 4,000 lines of code. eBPF is here because it instruments a running kernel safely, without patches or custom kernel modules. Every choice has a reason, and this page explains those reasons. If you disagree with a choice, the individual masterclass covers the alternatives and why they were not selected.

1. The Map — Every Layer at a Glance

A fully deployed kldload platform has seven layers. Each layer depends on the ones below it and enables the ones above it. From bottom to top:

┌─────────────────────────────────────────────────────────────────┐
│  WORKLOADS                                                       │
│  Kubernetes pods, databases, application containers, AI/LLM      │
├─────────────────────────────────────────────────────────────────┤
│  ORCHESTRATION                                                   │
│  Kubernetes (Cilium CNI, CoreDNS), blue/green deploys, Packer    │
├─────────────────────────────────────────────────────────────────┤
│  OBSERVABILITY                                                   │
│  eBPF tracing, Prometheus, Grafana, Loki, Alertmanager           │
├─────────────────────────────────────────────────────────────────┤
│  SECURITY & IDENTITY                                            │
│  Keycloak SSO, Vault secrets, step-ca PKI, SELinux, nftables     │
├─────────────────────────────────────────────────────────────────┤
│  NETWORKING                                                      │
│  WireGuard backplane, BIRD BGP, VXLAN/EVPN, DNS, IPsec, HAProxy │
├─────────────────────────────────────────────────────────────────┤
│  COMPUTE                                                         │
│  KVM hypervisor, libvirt, QEMU, NVIDIA GPU passthrough           │
├─────────────────────────────────────────────────────────────────┤
│  STORAGE & OS                                                    │
│  ZFS on root, sanoid snapshots, zfs-send replication, systemd    │
└─────────────────────────────────────────────────────────────────┘
          Bare metal — kldload ISO — one install — all of this

The stack is deliberately bottom-up. You cannot orchestrate what you cannot observe, you cannot trust observations from hosts you have not secured, you cannot secure connections you have not networked, and you cannot network what you cannot store. Each layer is a foundation. Skipping a layer creates a gap that the layers above cannot compensate for. A Kubernetes cluster without proper networking is fragile. Networking without encryption is exposed. Encryption without key management is theatre. This is why the full stack matters: every layer reinforces the others.

2. Layer 1 — Storage & Operating System

Everything starts with the disk. Every byte on this platform lives on ZFS — the operating system, the virtual machines, the databases, the container images, the logs, the backups. ZFS is the foundation because it is the only filesystem that provides atomic snapshots, built-in replication, transparent compression, per-block checksumming, and native encryption in a single coherent package.

STORAGE ZFS on Root

Every kldload node boots from a ZFS pool. The root filesystem, /home, /var, swap — all ZFS datasets with independent snapshot, compression, and quota policies. Boot environments let you snapshot before upgrades and roll back in seconds if anything breaks. There is no ext4, no XFS, no LVM. One filesystem, one tool, one set of commands.

STORAGE Pool Design

Boot pool: mirror vdev across two NVMe drives. Data pool: RAIDZ2 or dRAID across the remaining drives. SLOG: Optane or high-endurance NVMe for synchronous write acceleration (databases, NFS). L2ARC: fast SSD as a read cache for datasets that exceed ARC. Every pool has ashift=12, compression=zstd, atime=off, xattr=sa.

STORAGE Sanoid — Automated Snapshots

Sanoid runs on every node. It takes hourly, daily, and monthly snapshots of every dataset according to retention policy. Syncoid replicates snapshots to a remote node for disaster recovery. zfs send is incremental: only blocks changed since the last common snapshot cross the wire. After the initial full send, keeping a DR replica of a 10 TB pool current typically costs minutes of transfer per day, not hours, depending on change rate and link speed.
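
The "minutes, not hours" figure is simple arithmetic once a change rate and link speed are fixed. A minimal sketch, assuming a hypothetical 1% daily change rate and 80% usable link throughput (illustrative numbers, not measurements from a real deployment):

```python
def incremental_transfer_minutes(pool_bytes, daily_change_fraction,
                                 link_bits_per_second, efficiency=0.8):
    """Estimate the minutes needed to ship one day's changed blocks."""
    changed_bits = pool_bytes * daily_change_fraction * 8
    usable_bits_per_second = link_bits_per_second * efficiency
    return changed_bits / usable_bits_per_second / 60


TB = 10**12
GBIT = 10**9

# Hypothetical inputs: 10 TB pool, 1% of its blocks change per day.
fast = incremental_transfer_minutes(10 * TB, 0.01, 10 * GBIT)
slow = incremental_transfer_minutes(10 * TB, 0.01, 1 * GBIT)
print(f"10 Gbit/s: {fast:.1f} min/day, 1 Gbit/s: {slow:.1f} min/day")
```

Under these assumptions the daily delta takes roughly 2 minutes at 10 Gbit/s and roughly 17 minutes at 1 Gbit/s. A higher change rate or a slower link scales the result linearly.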

STORAGE ZFS Encryption

Datasets containing secrets, user data, or compliance-scoped information use native ZFS encryption (encryption=aes-256-gcm, keyformat=passphrase or keyformat=raw with a key in Vault). Encryption happens below the snapshot layer, so snapshots work identically whether the dataset is encrypted or not, and a raw send (zfs send -w) replicates the encrypted blocks as-is, so the replica target never needs the key. Keys are loaded at boot from Vault or a local keyfile.
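
As a rough illustration of the keyformat=raw option, the sketch below generates a 32-byte key (the length a raw key must have) and builds the matching zfs create command line. The dataset name and key path are hypothetical, and retrieving the key from Vault is left out:

```python
import secrets


def generate_raw_key():
    """Return 32 random bytes, the key length keyformat=raw requires."""
    return secrets.token_bytes(32)


def zfs_create_encrypted_args(dataset, keyfile_path):
    """Build the argv for creating an encrypted dataset with a raw key file."""
    return [
        "zfs", "create",
        "-o", "encryption=aes-256-gcm",
        "-o", "keyformat=raw",
        "-o", f"keylocation=file://{keyfile_path}",
        dataset,
    ]


# Hypothetical dataset and key path, for illustration only.
key = generate_raw_key()
args = zfs_create_encrypted_args("tank/secrets", "/run/keys/tank-secrets.key")
print(len(key), " ".join(args))
```

The function only assembles the argument list; actually running it (for example with subprocess.run) requires the key file to exist at the given absolute path first.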

STORAGE systemd

Every service, timer, and mount on the platform is managed by systemd. No cron jobs, no init scripts, no screen sessions. Service dependencies are explicit. Restart policies are defined. Resource limits (cgroups) are set per unit. Journal logs are structured and queryable. This is not optional — systemd is the control plane for the operating system, and treating it as such is what makes the platform manageable.

ZFS is not just a filesystem choice. It is an architectural decision that cascades through the entire stack. VM storage? ZFS zvols with snapshots. Database storage? ZFS dataset with recordsize tuned to the database page size. Container storage? ZFS storage driver. Backups? Sanoid snapshots + syncoid replication. Disaster recovery? ZFS send to a remote pool. Encryption at rest? Native ZFS encryption. Every other storage question in the stack has a ZFS answer, and that answer is consistent, atomic, and checksummed. This is why ZFS is Layer 1.

Masterclass deep dives: ZFS · systemd

3. Layer 2 — Compute

Bare metal runs the hypervisor. KVM is built into the Linux kernel — every kldload node is a hypervisor by default. Virtual machines run on ZFS zvols. GPU passthrough is native. No VMware, no Proxmox required (though Proxmox is supported). The hypervisor is the OS.

COMPUTE KVM & libvirt

KVM provides hardware-accelerated virtualisation. libvirt manages VM lifecycle. QEMU handles device emulation. VMs are defined as XML, stored on ZFS, snapshotted atomically, and cloned instantly with zfs clone. A golden image workflow — install once, seal for cloning, deploy hundreds — reduces provisioning to seconds per VM.

COMPUTE Golden Images & Packer

Packer builds machine images from code. A single Packer template produces identical images for KVM (qcow2), Proxmox (template), AWS (AMI), and bare metal. cloud-init handles first-boot personalisation: hostname, SSH keys, network config, ZFS pool import. Every node in the fleet boots from the same image. Drift is impossible because there is nothing to drift from — the image is the source of truth.

COMPUTE GPU Passthrough

NVIDIA GPUs are passed through to VMs or containers via VFIO (for full passthrough) or vGPU (for sharing). AI inference workloads — Ollama, vLLM, text-generation-inference — run in containers with --gpus all on ZFS-backed storage. The NVIDIA driver, CUDA toolkit, and container toolkit are pre-configured in the desktop profile. DKMS rebuilds the driver on kernel updates, just like ZFS.

COMPUTE Containers — Podman & Firecracker

Podman is the container runtime. It is daemonless, rootless-capable, and uses the ZFS storage driver for copy-on-write image layers. For microVM isolation, Firecracker provides hardware-level (KVM) isolation with near-container startup times. Containers run directly on the host or inside KVM VMs; the architecture supports both. SELinux MCS labels isolate containers at the kernel level.

The reason we use KVM and not a purpose-built hypervisor like ESXi is that KVM is the kernel. There is no hypervisor layer to manage separately — the host OS is the hypervisor. A kldload node running 40 VMs is also running ZFS, WireGuard, BIRD, Prometheus, and nftables on the host. All of those tools see the VMs as processes. You can snapshot a VM's storage with zfs snapshot, you can firewall its traffic with nftables, you can trace its I/O with eBPF. The hypervisor and the platform are the same thing. This is the fundamental architectural advantage over a traditional split between hypervisor and management plane.

Masterclass deep dives: Packer & IaC · Containers · CI/CD & GitOps · KVM Tutorial · NVIDIA Tutorial

4. Layer 3 — Networking

Networking on a full kldload stack is multi-plane. Different types of traffic travel on different encrypted planes, each with its own keys, its own routing, and its own policies. Nothing crosses planes without an explicit decision. This is the backplane architecture.

NETWORK WireGuard Backplane — Multi-Plane Mesh

Every node runs multiple WireGuard interfaces, each serving a different plane: management (SSH, Ansible, monitoring), storage (ZFS replication, NFS, iSCSI), workload (pod-to-pod, service mesh), and external (public-facing traffic, IPsec to partners). Each plane is a separate WireGuard interface with separate keys. Compromise of one plane does not expose the others.

NETWORK BIRD BGP — Dynamic Routing

BIRD runs on every node and exchanges routes via BGP. No static routes, no hardcoded IPs in config files. When a new node joins, BIRD announces its networks and every other node learns the routes automatically. When a node goes down, BGP withdraws its routes and traffic reroutes. ECMP (Equal-Cost Multi-Path) distributes traffic across multiple paths. This is how hyperscalers route — BGP on every host.
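
ECMP keeps all packets of one flow on one path by hashing the flow's identifying fields and using the hash to pick among the equal-cost next hops. The sketch below illustrates the idea only; the kernel uses its own hash function, and the addresses are hypothetical:

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Choose a next hop deterministically from a flow's 5-tuple."""
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.252.0.10", "10.252.0.20", 6, 40000, 5432)
hops = ["10.252.0.2", "10.252.0.3"]
print(pick_next_hop(flow, hops))

# When BGP withdraws a route, the surviving hop takes the flow.
print(pick_next_hop(flow, hops[:1]))
```

Because the choice depends only on the flow tuple and the current hop list, packets within a flow are not reordered, while different flows spread across the available paths.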

NETWORK VXLAN & EVPN — Overlay Networks

For workloads that need Layer 2 adjacency across Layer 3 boundaries — VM migration, multi-site clusters, legacy applications that assume broadcast — VXLAN encapsulates Ethernet frames in UDP. EVPN (via BIRD or FRRouting) provides control-plane MAC/IP learning so VXLAN does not flood. The overlay runs on top of the WireGuard backplane, so it is encrypted end-to-end without VXLAN knowing.

NETWORK DNS — CoreDNS + Unbound

CoreDNS serves internal zone records (forward and reverse) for every host, VM, and service on the platform, backed by a simple zone file or etcd. Unbound provides recursive resolution with DNSSEC validation for external queries. Every node's /etc/resolv.conf points to the local Unbound instance, which forwards internal queries to CoreDNS. DNS is not optional infrastructure — it is how services find each other.
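
Forward and reverse records for a host can be derived from a single name and address pair. A small sketch using Python's standard ipaddress module; the zone name kld.internal is a hypothetical placeholder, not a name the platform defines:

```python
import ipaddress


def zone_records(hostname, address, zone="kld.internal"):
    """Return the forward (A/AAAA) and reverse (PTR) record lines for a host."""
    ip = ipaddress.ip_address(address)
    fqdn = f"{hostname}.{zone}."
    record_type = "A" if ip.version == 4 else "AAAA"
    forward = f"{fqdn} IN {record_type} {ip}"
    reverse = f"{ip.reverse_pointer}. IN PTR {fqdn}"
    return forward, reverse


forward, reverse = zone_records("infra-1", "10.250.0.1")
print(forward)
print(reverse)
```

Generating both lines from one source keeps forward and reverse zones from drifting apart, which matters for services that verify PTR records.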

NETWORK IPsec — External Connections

When you need to connect to a cloud VPN gateway (AWS, Azure, GCP), a partner's network, or a government system that requires FIPS-validated encryption, IPsec provides the tunnel. strongSwan handles IKEv2 negotiation. XFRM interfaces make IPsec tunnels route-based so they integrate with BIRD BGP. The WireGuard backplane handles internal traffic; IPsec handles the outside world.

NETWORK HAProxy & keepalived — Load Balancing

HAProxy distributes traffic across backends with health checking, TLS termination, and connection draining. keepalived provides a virtual IP (VIP) that floats between HAProxy instances: if the primary dies, the VIP moves to the standby within a few seconds at default VRRP timers, or in under a second with tuned advertisement intervals. For Kubernetes, MetalLB or Cilium's LB IPAM announces service IPs via BGP directly.
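
How quickly the VIP moves is bounded by the VRRP master-down interval. The sketch below applies the VRRPv3 (RFC 5798) formula; the priority and interval values are hypothetical examples, not settings taken from a kldload configuration:

```python
def master_down_interval(advert_interval_s, backup_priority):
    """Seconds a VRRPv3 backup waits before taking over the VIP.

    RFC 5798: skew = ((256 - priority) * advert_interval) / 256
              master_down = 3 * advert_interval + skew
    """
    if not 1 <= backup_priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew = (256 - backup_priority) * advert_interval_s / 256
    return 3 * advert_interval_s + skew


# Hypothetical backup priority of 100.
print(master_down_interval(1.0, 100))   # 1 s advertisements
print(master_down_interval(0.1, 100))   # 100 ms advertisements
```

With one-second advertisements the standby waits about 3.6 seconds before claiming the VIP; at 100 ms advertisements the same formula gives about 0.36 seconds.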

The multi-plane architecture is the single most important networking decision. Traditional infrastructure puts everything on one flat network and tries to firewall between segments. A multi-plane design puts different traffic types on cryptographically separate channels. A compromised workload cannot sniff management traffic because management traffic is on a different WireGuard interface with different keys. This is not VLAN segmentation: a VLAN is an unencrypted Layer 2 tag, and anyone who reaches a trunk port or a misconfigured switch port can read the traffic. These are separate cryptographic tunnels. An attacker confined to the workload plane sees only workload-plane traffic; management-plane packets are opaque ciphertext on the wire.

Masterclass deep dives: WireGuard · BIRD & BGP · VXLAN & EVPN · DNS · IPsec Tunnels · Backplane Networks · Load Balancing & HA

5. Layer 4 — Security & Identity

Security is not a layer you bolt on. It is woven into every other layer. But the identity layer — who you are, what you are allowed to do, what certificates you hold, what secrets you can access — is concentrated here. Every authentication decision and every secret on the platform flows through these components.

SECURITY Keycloak — Identity & SSO

Keycloak is the single source of identity. Every user, every service account, every role assignment lives in Keycloak's realm. Grafana, Vault, Kubernetes, the kldload web UI — all authenticate via OIDC tokens issued by Keycloak. One login, one set of credentials, one MFA prompt. No per-application passwords. No shared accounts. Active Directory and LDAP federate through Keycloak, so existing corporate identity works without migration.

SECURITY step-ca — Internal PKI

step-ca is the internal Certificate Authority. It issues short-lived X.509 certificates to every service via ACME (the same protocol Let's Encrypt uses). Every internal connection — database, API, metrics scrape, gRPC — is mTLS. Certificates rotate automatically every 24 hours. The CA root key lives on ZFS-encrypted storage, backed by Vault. No self-signed certificates, no curl -k, no "we'll add TLS later."
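
With 24-hour certificates, renewal has to happen well before expiry so that a failed attempt can be retried. A minimal sketch of the timing logic, assuming renewal is attempted once two-thirds of the lifetime has elapsed (a common convention; treat the fraction as an assumption rather than a documented kldload setting):

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before, not_after, fraction=2 / 3):
    """Return the moment at which renewal should first be attempted."""
    lifetime = not_after - not_before
    return not_before + lifetime * fraction


def should_renew(now, not_before, not_after, fraction=2 / 3):
    """True once the certificate has passed its renewal point."""
    return now >= renewal_time(not_before, not_after, fraction)


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
print(should_renew(issued + timedelta(hours=15), issued, expires))
print(should_renew(issued + timedelta(hours=17), issued, expires))
```

For a 24-hour certificate this places the first renewal attempt around the 16-hour mark, leaving roughly eight hours of retry headroom.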

SECURITY Let's Encrypt — Public TLS

Public-facing services (the web UI, the API gateway, any external endpoint) get certificates from Let's Encrypt via certbot with DNS-01 challenges. Wildcard certificates cover subdomains. Renewal is automatic via systemd timer. Internal services use step-ca. Public services use Let's Encrypt. There is no overlap and no gap.

SECURITY HashiCorp Vault — Secrets Management

Vault stores every secret on the platform: database credentials, API keys, Keycloak client secrets, ZFS encryption keys, TLS private keys, WireGuard private keys. Applications access secrets via Vault's API, authenticated by their Keycloak OIDC token or Kubernetes service account. Secrets are never in config files, environment variables, or git. Vault's storage backend is a ZFS-backed Raft cluster — snapshottable, replicable, encrypted.

SECURITY SELinux — Mandatory Access Control

Every node runs SELinux in enforcing mode. Every confined service (httpd, sshd, named, Java, containers) operates within its labelled domain and cannot escape it, even as root. Custom policy modules cover kldload-specific services. MCS categories isolate containers at the kernel level. semanage export captures all customisations for reproducible deployment.
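
MCS isolation works because each container gets its own pair of categories, and a process can only touch objects whose categories it holds. The sketch below enumerates unique pairs in order to show the label format and the size of the label space; real container engines pick pairs at random, so the sequential order here is purely illustrative:

```python
import itertools


def mcs_labels(max_category=1023):
    """Yield unique MCS labels of the form s0:cA,cB with A < B."""
    for low, high in itertools.combinations(range(max_category + 1), 2):
        yield f"s0:c{low},c{high}"


labels = mcs_labels()
print(next(labels))
print(next(labels))

# Number of distinct pairs available with categories c0..c1023.
print(sum(1 for _ in mcs_labels()))
```

With 1,024 categories there are 523,776 distinct pairs, which is why two containers on one host are unlikely to ever share a label.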

SECURITY nftables — Firewall

nftables provides stateful packet filtering on every node. The base policy is default-deny: only explicitly allowed traffic passes. Each WireGuard interface has its own nftables chain scoped to the plane's allowed services. Management plane allows SSH and Prometheus. Workload plane allows pod traffic. Storage plane allows ZFS send and NFS. No cross-plane traffic unless a rule says so.
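
The per-plane policy can be expressed as data and rendered into a ruleset. This is a sketch under stated assumptions: the interface names follow the plane layout used on this page, while the port lists (9100 for a metrics exporter, 2049 for NFS, 443 for workload traffic) and the chain names are hypothetical choices:

```python
# Hypothetical allow-list: interface name -> TCP ports accepted on that plane.
PLANE_PORTS = {
    "wg-mgmt": [22, 9100],
    "wg-storage": [22, 2049],
    "wg-workload": [443],
}


def chain_name(iface):
    """Derive a per-plane chain name from the interface name."""
    return iface.replace("-", "_") + "_in"


def render_ruleset(plane_ports):
    """Render a default-deny nftables table with one chain per plane."""
    lines = ["table inet filter {"]
    for iface, ports in plane_ports.items():
        port_set = ", ".join(str(port) for port in sorted(ports))
        lines += [
            f"  chain {chain_name(iface)} {{",
            f"    tcp dport {{ {port_set} }} accept",
            "  }",
        ]
    lines += [
        "  chain input {",
        "    type filter hook input priority 0; policy drop;",
        "    ct state established,related accept",
        '    iifname "lo" accept',
    ]
    for iface in plane_ports:
        lines.append(f'    iifname "{iface}" jump {chain_name(iface)}')
    lines += ["  }", "}"]
    return "\n".join(lines)


print(render_ruleset(PLANE_PORTS))
```

Traffic arriving on a plane jumps to that plane's chain, and anything not accepted there falls through to the drop policy. A real ruleset would also need to accept the WireGuard UDP listen ports on the underlying physical interface.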

SECURITY FIPS 140-3 Compliance

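For regulated environments: RHEL's FIPS mode is enabled at the kernel level. OpenSSL and GnuTLS use only FIPS-approved algorithms from validated cryptographic modules. Libreswan provides FIPS-validated IPsec. Vault's seal mechanism uses a FIPS-approved KMS. The aim is that the cryptographic stack in FIPS scope, from disk encryption to TLS to tunnel encryption, uses only approved algorithms in modules that hold NIST validation certificates. WireGuard's ChaCha20-Poly1305 and Curve25519 are not FIPS-approved, so traffic that falls under FIPS scope belongs on IPsec. This matters for government, finance, and healthcare.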

The security architecture follows a principle: every connection is authenticated, every connection is encrypted, every secret is managed, every process is confined, and every decision is auditable. Keycloak knows who you are. Vault knows what secrets you can access. step-ca proves it cryptographically. SELinux enforces it at the kernel. nftables enforces it at the network. This is defense in depth — not one wall, but five concentric walls. An attacker who compromises one layer faces another. Compromise a pod? Cilium limits its network access. Escape the container? SELinux confines the process. Escalate to root? SELinux still applies. Get a shell on the host? WireGuard keys are per-plane, so lateral movement is limited. Every layer constrains the blast radius of every other layer.

Masterclass deep dives: Keycloak & SELinux · TLS & PKI · Vault & Secrets · Security Hardening · nftables · FIPS 140-3 Compliance

6. Layer 5 — Observability

You cannot operate what you cannot see. Observability on a full kldload stack covers three pillars — metrics, logs, and traces — plus a fourth that most platforms lack: kernel-level instrumentation via eBPF.

OBSERVE eBPF — Kernel Instrumentation

eBPF programs attach to kernel tracepoints, kprobes, and network hooks to observe syscalls, network flows, disk I/O, and scheduler events — without modifying the kernel or restarting anything. On a full stack deployment, eBPF provides: Cilium's network policy enforcement, Hubble's network flow visibility, custom latency histograms via bpftrace, and Falco's runtime security detection. eBPF is the nervous system of the platform.

OBSERVE Prometheus & Alertmanager — Metrics

Prometheus scrapes metrics from every component: node_exporter (hardware/OS), ZFS exporter (pool health, I/O), kube-state-metrics (Kubernetes objects), Keycloak metrics endpoint, Vault telemetry, WireGuard exporter, HAProxy stats, and application-level metrics. Alertmanager routes alerts to Slack, PagerDuty, or email. Alert rules cover: ZFS scrub errors, pool capacity >80%, certificate expiry <7 days, node down >5 minutes, SELinux AVC denials, WireGuard handshake failures.
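
In production these conditions would be written as Prometheus alerting rules; the Python model below only makes the thresholds listed above concrete. The field names and alert names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    pool_capacity_pct: float
    cert_days_remaining: float
    seconds_since_last_scrape: float
    scrub_errors: int


def firing_alerts(status):
    """Return the names of alerts whose thresholds are exceeded."""
    alerts = []
    if status.scrub_errors > 0:
        alerts.append("ZFSScrubErrors")
    if status.pool_capacity_pct > 80:
        alerts.append("ZFSPoolCapacityHigh")
    if status.cert_days_remaining < 7:
        alerts.append("CertificateExpiringSoon")
    if status.seconds_since_last_scrape > 5 * 60:
        alerts.append("NodeDown")
    return alerts


healthy = NodeStatus(62.0, 20.0, 15.0, 0)
degraded = NodeStatus(91.5, 3.0, 15.0, 0)
print(firing_alerts(healthy))
print(firing_alerts(degraded))
```

The healthy node fires nothing; the degraded node trips the capacity and certificate-expiry thresholds but not the node-down or scrub checks.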

OBSERVE Grafana — Dashboards

Grafana provides the visual layer. Pre-built dashboards cover: ZFS pool status, WireGuard tunnel health, Kubernetes cluster overview, node resource utilisation, database query latency, Keycloak login metrics, certificate expiry timeline, and eBPF network flow maps. Grafana authenticates via Keycloak OIDC — role-based access controls who sees what. Data sources: Prometheus for metrics, Loki for logs, Tempo for traces.

OBSERVE Loki — Log Aggregation

Loki collects logs from every node and container. Promtail (or the Grafana Agent) ships systemd journal entries, container stdout, and application log files to Loki. Logs are indexed by labels (node, service, namespace) not by full text — storage-efficient and query-fast. In Grafana, you can click from a metric spike directly to the logs for that service at that time. Storage backend: ZFS dataset with compression.
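
The label-indexing idea can be shown with a toy store: only the label set is indexed, and the log text is scanned at query time within the matching streams. This is a conceptual sketch, not Loki's implementation or API, and the label values are hypothetical:

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy log store that indexes streams by label set, not by content."""

    def __init__(self):
        self._streams = defaultdict(list)

    def push(self, labels, line):
        """Append a line to the stream identified by its label set."""
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Select streams by labels, then filter their lines by substring."""
        wanted = set(selector.items())
        return [
            line
            for stream_labels, lines in self._streams.items()
            if wanted <= stream_labels
            for line in lines
            if contains in line
        ]


store = LabelIndexedLogStore()
store.push({"node": "compute-1", "service": "postgresql"}, "checkpoint complete")
store.push({"node": "compute-1", "service": "postgresql"}, "ERROR deadlock detected")
store.push({"node": "infra-1", "service": "vault"}, "ERROR seal check failed")

print(store.query({"service": "postgresql"}, contains="ERROR"))
```

The index stays small because it grows with the number of distinct label sets rather than with log volume; the cost moves to query time, where only the selected streams are scanned.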

OBSERVE Hubble — Network Flow Visibility

Hubble is Cilium's observability layer. It captures every network flow in the Kubernetes cluster — source pod, destination pod, protocol, port, verdict (allowed/denied), latency. Hubble UI shows a real-time service map. Hubble metrics feed into Prometheus. When a network policy blocks something it should not (or allows something it should not), Hubble shows you exactly what happened, which policy matched, and why.

The reason this stack includes eBPF at the observability layer and not just as a Cilium implementation detail is that eBPF changes what you can see. Traditional monitoring tells you "this HTTP request took 200ms." eBPF tells you "this HTTP request spent 3ms in the application, 12ms waiting for a TCP socket, 180ms waiting for a disk read on a ZFS dataset whose recordsize does not match the workload, and the disk read hit the L2ARC but the ARC miss rate was 40% because the ARC was sized too small." You go from knowing something is slow to knowing exactly why. That changes how you operate.

Masterclass deep dives: eBPF · Cilium · Observability

7. Layer 6 — Orchestration

Orchestration is how workloads get deployed, scaled, updated, and rolled back. On a full kldload stack, this means Kubernetes for containers and blue/green deployments for everything else.

WORKLOAD Kubernetes on KVM

The Kubernetes cluster runs on KVM VMs, not on bare metal. Control plane nodes are three VMs on separate physical hosts for HA. Worker nodes are VMs cloned from a golden image. ZFS zvols back the VM disks — snapshotting an entire worker node before a Kubernetes upgrade is one command. If the upgrade fails, roll back the zvol. The cluster is disposable. The data is not.

WORKLOAD Cilium CNI — eBPF Networking

Cilium replaces kube-proxy and provides the Container Network Interface (CNI). Pod-to-pod networking uses eBPF programs attached directly to the kernel's network stack — no iptables chains, no netfilter overhead. Network policies are enforced at the eBPF level. Cilium provides: pod networking, service load balancing, network policy, transparent encryption (WireGuard or IPsec between nodes), bandwidth management, and Hubble observability.

WORKLOAD Blue/Green Deployments

Stateless services use Kubernetes rolling deployments. Stateful infrastructure (databases, message queues, the Kubernetes cluster itself) uses blue/green: deploy the new version alongside the old, verify it works, switch traffic, keep the old version for instant rollback. ZFS snapshots make blue/green trivial for VMs — the "green" environment is a clone of the "blue" snapshot. If green fails, destroy it and blue is untouched.

WORKLOAD GitOps & Packer Pipeline

Infrastructure changes flow through git. Packer builds golden images from committed code. Kubernetes manifests are applied via Flux or ArgoCD watching a git repo. Terraform or OpenTofu manages VM provisioning. Nothing is configured by hand. If a node drifts, destroy it and redeploy from the golden image. The git repo is the source of truth. The running infrastructure is a projection of it.

Running Kubernetes on KVM VMs instead of bare metal is a deliberate choice. Bare-metal Kubernetes is faster but couples the cluster lifecycle to the hardware lifecycle. KVM decouples them. You can snapshot the entire control plane before an etcd migration, run two Kubernetes versions side by side during an upgrade, and move worker nodes between physical hosts with live storage migration. The overhead of virtualisation (1-3% CPU, negligible I/O with virtio) is the price for this operational flexibility. On a full stack deployment, that trade-off is always worth it.

Masterclass deep dives: Kubernetes · Cilium · Blue/Green & SRE · Packer & IaC · Containers · CI/CD & GitOps · Construction Kit

8. Layer 7 — Workloads

This is what everything exists to serve. The workloads are the applications, databases, APIs, and services that deliver value. Everything below this point is infrastructure. The infrastructure's job is to make workloads reliable, secure, observable, and deployable.

WORKLOAD Databases on ZFS

PostgreSQL, MySQL, Redis, and etcd run on dedicated ZFS datasets with tuned recordsize (8K for PostgreSQL, 16K for MySQL InnoDB), synchronous writes to SLOG, and hourly snapshots via Sanoid. Recovery to any snapshot is near-instant with zfs rollback. Rolling back past the most recent snapshot requires -r and destroys the snapshots taken after the target, and the granularity is the snapshot interval rather than an arbitrary point in time. Replication uses zfs send to a standby node. Client connections use mTLS certificates from step-ca. Credentials live in Vault.

WORKLOAD AI & LLM Inference

Ollama or vLLM serves language models from NVIDIA GPUs passed through to KVM VMs or exposed to containers via the NVIDIA container toolkit. Model weights are stored on ZFS datasets with recordsize=1M and compression=off (weight files are high-entropy and compress poorly). Inference APIs authenticate via Keycloak OIDC tokens. GPU utilisation metrics flow to Prometheus. This is the same infrastructure as everything else, with no special snowflake.

WORKLOAD Application Containers

Stateless microservices run in Kubernetes pods. They pull configuration from Vault, authenticate users via Keycloak, serve traffic behind HAProxy or Cilium's load balancer, emit metrics to Prometheus, send logs to Loki, and store persistent data on ZFS-backed PersistentVolumes. Network policies (Cilium) restrict which pods can talk to which. Every container runs under SELinux with a unique MCS label.

WORKLOAD NFS & iSCSI Shared Storage

For workloads that need shared filesystem access (legacy applications, some AI training frameworks), NFS is served from a ZFS dataset with NFS kernel server. iSCSI provides block-level access for VMs that need raw devices. Both run over the storage plane WireGuard interface, encrypted in transit. ZFS quotas, reservations, and snapshots apply.

Notice that every workload description above mentions at least three other layers. A database uses ZFS for storage, WireGuard for replication transport, step-ca for TLS, Vault for credentials, Sanoid for backups, Prometheus for monitoring, and SELinux for confinement. An AI inference service uses KVM for the VM, GPU passthrough for compute, ZFS for model storage, Keycloak for API auth, Cilium for network policy, and Prometheus for GPU metrics. This is what "full stack" means — not one technology, but the integration between all of them. The value is not in any individual component. It is in the fact that they are all present, all configured, and all aware of each other.

Masterclass deep dives: Databases on ZFS · Load Balancing & HA · Operations Guide · Upgrades & Boot Environments · Labeling & Assets

9. How a Request Flows Through the Stack

To make the architecture concrete, here is what happens when an external user hits an API endpoint on a fully deployed kldload platform. Every layer participates.

User's browser
  │
  │  HTTPS (Let's Encrypt certificate)
  ▼
HAProxy (TLS termination, keepalived VIP)
  │
  │  nftables: allow inbound 443, rate limit
  │  eBPF: Cilium captures flow metadata for Hubble
  ▼
Kubernetes Ingress (Cilium)
  │
  │  Cilium network policy: only allow traffic to this namespace
  │  Encryption: Cilium encrypts pod-to-pod traffic with WireGuard
  ▼
Application Pod
  │
  │  Keycloak OIDC: validates JWT access token (local signature check)
  │  Token contains: user identity, roles, client scope
  │  SELinux: pod runs as container_t:s0:c123,c456
  ▼
Application queries database
  │
  │  Vault: application fetched DB credentials at startup (dynamic secret, 1hr TTL)
  │  step-ca: mTLS client certificate authenticates to PostgreSQL
  │  WireGuard: DB connection travels on storage plane, not workload plane
  ▼
PostgreSQL on ZFS
  │
  │  ZFS: recordsize=8K, SLOG for synchronous writes
  │  Sanoid: hourly snapshots, 30-day retention
  │  SELinux: postgresql_t domain, cannot access /home, /tmp, or other services
  ▼
Response travels back up the same path
  │
  │  eBPF: latency histogram recorded by bpftrace
  │  Prometheus: request count and duration metric incremented
  │  Loki: structured log entry with request ID, user, latency
  │  Hubble: full network flow recorded (src pod → dst pod, port, verdict)
  ▼
User sees the response

Count the security boundaries that request crossed: TLS termination at the load balancer, nftables firewall, Cilium network policy, OIDC token validation, Vault-issued dynamic database credential, mTLS to the database, WireGuard plane isolation, SELinux domain confinement on every process. Count the observability touch points: eBPF flow capture, Prometheus metric, Loki log entry, Hubble flow record, and the latency histogram. Every single request is authenticated, encrypted, authorised, confined, logged, and measured. This is not "best practice" theatre — these are all running simultaneously on every request, with negligible latency impact because eBPF and WireGuard are in-kernel and Cilium bypasses iptables entirely.

10. Disaster Recovery & Backup

On a full kldload stack, disaster recovery is not a separate system. It is a property of the architecture.

ZFS Snapshots — Point-in-Time Recovery

Sanoid takes hourly snapshots of every dataset. Accidentally delete a file? Copy it back from .zfs/snapshot/, or zfs rollback the whole dataset. Corrupt a database? Roll back the dataset to before the corruption. Need the state from three weeks ago? The snapshot is there. Cost: near-zero (copy-on-write, only changed blocks consume space).

// ctrl+Z for your entire infrastructure, going back weeks.

Syncoid Replication — Off-Site Backup

Syncoid sends incremental snapshots to a remote node via zfs send | zfs receive over the storage WireGuard plane. The remote node has a complete, consistent, up-to-date copy of every dataset. If the primary site burns down, the remote site has everything up to the last sync (typically 15–60 minutes).

// Your entire infrastructure, replicated to another building, every hour, automatically.
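
An incremental send needs a base snapshot that exists on both sides. The sketch below shows that selection step and the resulting zfs send arguments; it illustrates the idea rather than reproducing Syncoid's code, and the dataset and snapshot names are hypothetical:

```python
def incremental_base(source_snaps, target_snaps):
    """Return the newest snapshot present on both sides, or None.

    source_snaps must be ordered oldest to newest.
    """
    on_target = set(target_snaps)
    for name in reversed(source_snaps):
        if name in on_target:
            return name
    return None


def plan_send(dataset, source_snaps, target_snaps):
    """Build the zfs send argv: full if no common base, else incremental."""
    latest = source_snaps[-1]
    base = incremental_base(source_snaps, target_snaps)
    if base is None:
        return ["zfs", "send", f"{dataset}@{latest}"]
    if base == latest:
        return []  # target is already up to date
    return ["zfs", "send", "-I", f"{dataset}@{base}", f"{dataset}@{latest}"]


source = ["hourly-01", "hourly-02", "hourly-03", "hourly-04"]
target = ["hourly-01", "hourly-02"]
print(plan_send("tank/db", source, target))
```

Here the target already holds the first two snapshots, so only the blocks changed between hourly-02 and hourly-04 are sent. With an empty target the plan falls back to a full send.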

Boot Environments — Safe Upgrades

Before any OS upgrade, a boot environment snapshot is created. If the upgrade breaks boot, select the previous boot environment from the bootloader and the system comes up exactly as it was. ZFS makes this atomic — the rollback is a metadata operation, not a file copy.

// "The update broke everything." — reboot, pick the previous entry, you're back in 10 seconds.

Golden Image Rebuilds — Immutable Infrastructure

If a node is compromised or corrupt beyond repair, do not fix it. Destroy it and redeploy from the golden image. Packer builds the image, cloud-init personalises it, ZFS receive restores the data from the replica. A complete node rebuild takes minutes, not hours. The infrastructure is cattle, not pets.

// Don't heal the sick cow. Get a new cow. The barn (ZFS data) is fireproof.

RPO and RTO

| Scenario | RPO (data loss) | RTO (downtime) | Mechanism |
|---|---|---|---|
| Accidental file deletion | <1 hour | Seconds | `zfs rollback` or browse `.zfs/snapshot/` |
| Database corruption | <1 hour | Minutes | `zfs rollback` dataset to pre-corruption snapshot |
| Bad OS upgrade | Zero | 1 reboot | Boot environment rollback |
| Node hardware failure | <1 hour | Minutes | Redeploy golden image + `zfs receive` from replica |
| Full site loss | <1 hour | Hours | DR site has full ZFS replica. Promote and repoint DNS. |
| Ransomware / compromise | <1 hour | Hours | Destroy all nodes. Redeploy from golden images. Restore from ZFS replicas (read-only, attacker cannot encrypt them). |

The ZFS replication model is what makes disaster recovery achievable for small teams. You are not buying a separate backup product, managing tape libraries, or paying for cloud backup storage per-GB. You are using the same filesystem you already run, sending incremental block diffs to a remote node over an encrypted tunnel, and the remote node has a live, mountable, browsable copy of everything. If your primary site goes down, the replica is not a tar.gz you need to extract — it is a running ZFS pool that you import and use. The RPO is the sync interval (configurable — 15 minutes to 24 hours). The RTO is however long it takes to boot a new node and import the pool.

11. The Numbers — What This Looks Like in Practice

Reference deployment: 6-node cluster

Node	Role	Hardware	Runs
`infra-1`	Infrastructure	64 GB RAM, 2x NVMe (mirror), 4x SSD (RAIDZ2)	Keycloak, Vault, step-ca, CoreDNS, HAProxy (primary)
`infra-2`	Infrastructure (HA)	64 GB RAM, 2x NVMe (mirror), 4x SSD (RAIDZ2)	Keycloak replica, Vault (standby), CoreDNS, HAProxy (standby)
`compute-1`	Hypervisor	256 GB RAM, 2x NVMe (mirror), 8x SSD (dRAID), NVIDIA RTX A4000	KVM VMs: K8s control plane, workers, GPU workloads
`compute-2`	Hypervisor	256 GB RAM, 2x NVMe (mirror), 8x SSD (dRAID), NVIDIA RTX A4000	KVM VMs: K8s workers, databases, application VMs
`observe-1`	Observability	128 GB RAM, 2x NVMe (mirror), 6x HDD (RAIDZ2)	Prometheus, Grafana, Loki, Alertmanager
`dr-1`	Disaster recovery	64 GB RAM, 2x NVMe (mirror), 8x HDD (RAIDZ2)	ZFS replicas (syncoid target), cold standby for all services

Network plane layout

┌─────────────────────────────────────────────────────────────────┐
│  MANAGEMENT PLANE (wg-mgmt)     10.250.0.0/24                   │
│  SSH, Ansible, Prometheus scrapes, Grafana                       │
│  Keys: per-node Curve25519 keypair                               │
├─────────────────────────────────────────────────────────────────┤
│  STORAGE PLANE (wg-storage)     10.251.0.0/24                   │
│  ZFS send/receive, NFS, iSCSI, database replication              │
│  Keys: per-node Curve25519 keypair (different from mgmt)         │
├─────────────────────────────────────────────────────────────────┤
│  WORKLOAD PLANE (wg-workload)   10.252.0.0/24                   │
│  Pod-to-pod (Cilium), service traffic, API calls                 │
│  Keys: per-node Curve25519 keypair (different from both above)   │
├─────────────────────────────────────────────────────────────────┤
│  EXTERNAL PLANE (eth0 / ipsec0)                                  │
│  Public-facing services, IPsec tunnels to partners/cloud         │
│  nftables: strict ingress filtering, DDoS mitigation             │
└─────────────────────────────────────────────────────────────────┘

BIRD BGP runs on ALL planes, exchanging routes per-plane.
Each plane is a full mesh of WireGuard tunnels.
No traffic crosses planes without an explicit nftables FORWARD rule.
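To make the plane layout concrete, here is a small sketch that derives a per-node address in each WireGuard plane and counts the tunnels a full mesh implies. The subnets and node names come from this page; the rule "node i gets host address i in every plane" is an assumption made for illustration, not a kldload convention.

```python
import ipaddress
from itertools import combinations

PLANES = {
    "wg-mgmt": ipaddress.ip_network("10.250.0.0/24"),
    "wg-storage": ipaddress.ip_network("10.251.0.0/24"),
    "wg-workload": ipaddress.ip_network("10.252.0.0/24"),
}
NODES = ["infra-1", "infra-2", "compute-1", "compute-2", "observe-1", "dr-1"]


def plane_addresses(nodes, planes):
    """Give node i (1-based) the i-th host address in every plane (assumed scheme)."""
    plan = {}
    for index, node in enumerate(nodes, start=1):
        plan[node] = {name: net[index] for name, net in planes.items()}
    return plan


def mesh_tunnels(nodes):
    """In a full mesh, every unordered pair of nodes is one tunnel."""
    return list(combinations(nodes, 2))


plan = plane_addresses(NODES, PLANES)
tunnels = mesh_tunnels(NODES)
print(plan["compute-1"]["wg-storage"])           # 10.251.0.3
print(len(tunnels), len(tunnels) * len(PLANES))  # 15 45
```

With six nodes, each plane is 15 tunnels and the three WireGuard planes together are 45, which is why peer configuration is better generated than written by hand.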

Certificate hierarchy

Let's Encrypt (public)
├── *.example.com (wildcard, 90-day, auto-renewed)
│   ├── api.example.com (HAProxy TLS termination)
│   ├── grafana.example.com (Grafana)
│   └── auth.example.com (Keycloak)

step-ca (internal)
├── Root CA (10-year, offline, ZFS-encrypted dataset in Vault)
│   └── Intermediate CA (3-year, online, step-ca server on infra-1)
│       ├── *.internal (24-hour leaf certs, ACME auto-renewed)
│       │   ├── postgres.internal (database mTLS)
│       │   ├── vault.internal (Vault API mTLS)
│       │   ├── k8s-api.internal (Kubernetes API server)
│       │   └── prometheus.internal (metrics scrape mTLS)
│       └── client certificates (service-to-service mTLS)

strongSwan IPsec CA
├── IPsec CA (intermediate issued by step-ca)
│   ├── gateway-a.example.com (site-to-site tunnel certs)
│   └── gateway-b.example.com

Secrets management map

Vault (infra-1, HA with infra-2)
├── secret/keycloak/         — admin password, DB credentials, client secrets
├── secret/postgres/         — superuser password, replication credentials
├── secret/grafana/          — Keycloak OIDC client secret
├── secret/wireguard/        — private keys for all nodes and planes
├── secret/step-ca/          — intermediate CA private key
├── secret/zfs/              — encryption passphrases per dataset
├── secret/ipsec/            — PSKs or certificate private keys
├── pki/internal/            — Vault PKI engine (alternative to step-ca)
├── database/postgres/       — dynamic credentials (Vault generates per-app creds)
└── transit/                 — encryption-as-a-service (envelope encryption for apps)
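As a rough illustration of how an application would address one of these paths, the sketch below builds (but does not send) an HTTP request for a secret. It assumes the `secret/` mount is a KV version 2 engine, whose read endpoint inserts `data/` after the mount name. The address `https://vault.internal:8200`, the path `keycloak/admin`, and the token are hypothetical placeholders.

```python
import urllib.request
from urllib.parse import quote


def kv2_read_request(addr: str, mount: str, path: str,
                     token: str) -> urllib.request.Request:
    """Build a GET request for a KV v2 secret without sending it."""
    url = (
        f"{addr.rstrip('/')}/v1/"
        f"{quote(mount.strip('/'))}/data/{quote(path.strip('/'))}"
    )
    return urllib.request.Request(
        url, headers={"X-Vault-Token": token}, method="GET"
    )


# All values below are placeholders, not real endpoints or credentials.
req = kv2_read_request(
    "https://vault.internal:8200", "secret", "keycloak/admin", "example-token"
)
print(req.get_method(), req.full_url)
```

In practice the token would come from a Vault auth method rather than a literal string, and the request would be sent over mTLS using a certificate from the internal CA.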

Look at the numbers. Six physical nodes. Four network planes. Two certificate hierarchies. One secrets manager. And yet the entire thing installs from one ISO. Every node boots from the same kldload image. The differentiation happens in cloud-init and the postinstaller script. The infra nodes get Keycloak and Vault. The compute nodes get libvirt and GPU drivers. The observe node gets Prometheus and Grafana. The DR node gets syncoid. The base layer — ZFS, WireGuard, BIRD, nftables, SELinux, eBPF, systemd — is identical on every node. This is what "re-packer" means. The same bricks build different rooms.

12. Why Each Technology — The Decision Table

Component	What it Does	Why This One	What it Replaces
ZFS	Filesystem + volume manager	Checksums, snapshots, replication, encryption, compression — one tool	ext4 + LVM + mdadm + rsync + LUKS (5 tools for one job)
WireGuard	Encrypted tunnels	Roughly 4,000 lines of code, in-kernel, Curve25519, minimal configuration	OpenVPN, IPsec for internal traffic
BIRD	BGP routing daemon	Lightweight, config-file driven, BGP on every host like hyperscalers	Static routes, OSPF, proprietary routing
Cilium	Kubernetes CNI + network policy + LB	eBPF-native, replaces kube-proxy + iptables, Hubble observability	Calico, Flannel, kube-proxy
eBPF	Kernel instrumentation	Programmable kernel observation without patching or modules	strace, DTrace, kernel modules, tcpdump
Keycloak	Identity & SSO	Open source, OIDC + SAML, federation, MFA, full-featured	Okta, Auth0, Dex, per-app auth
Vault	Secrets management	Dynamic secrets, PKI engine, transit encryption, audit log	Ansible Vault, .env files, hardcoded credentials
step-ca	Internal Certificate Authority	ACME protocol, short-lived certs, auto-renewal, lightweight	Self-signed certs, manual OpenSSL, CFSSL
SELinux	Mandatory access control	Kernel-enforced, survives root compromise, MCS for containers	AppArmor, nothing (most skip MAC entirely)
nftables	Firewall	Successor to iptables, atomic ruleset loads, sets/maps, faster	iptables, firewalld
strongSwan	IPsec VPN	IKEv2, certificate auth, XFRM interfaces, interoperable	OpenVPN for external connections
Prometheus	Metrics collection	Pull-based, PromQL, de facto standard, massive exporter ecosystem	Nagios, Zabbix, Datadog
Grafana	Dashboards & visualisation	Multi-datasource, Keycloak SSO, alerting, open source	Kibana, proprietary dashboards
Sanoid	Snapshot policy + replication	Purpose-built for ZFS, policy-driven, syncoid for send/receive	cron + zfs snapshot scripts, Bacula, Borg
Packer	Machine image builds	Multi-platform (KVM, cloud, bare metal), code-defined images	Manual installs, custom scripts, Kickstart alone

Every row in this table is a choice that compounds. ZFS makes snapshots free, which makes DR trivial, which makes blue/green safe, which makes upgrades fearless. WireGuard makes encryption cheap, which makes multi-plane practical, which makes security real instead of aspirational. Keycloak makes SSO free, which makes per-app auth unnecessary, which means one place to enforce MFA. Each technology is good individually. The stack effect — the way they reinforce each other — is what makes the full deployment greater than the sum of its parts. That compounding is the thing that is hard to see when you are evaluating technologies one at a time. This page exists so you can see all of them at once.

13. How to Get There — Build Order

You do not deploy all of this at once. The stack builds bottom-up, layer by layer. Each step is usable on its own. You can stop at any layer and have a functional platform.

Phase 1 — Foundation (Day 1)
├── Install kldload (ZFS on root, WireGuard, eBPF, nftables, SELinux)
├── Configure ZFS pool layout and Sanoid snapshot policies
├── Set up WireGuard management plane between nodes
├── Enable SELinux enforcing, configure booleans
└── Result: encrypted, snapshotted, firewalled bare metal

Phase 2 — Networking (Day 2-3)
├── Deploy BIRD BGP on all nodes
├── Add storage and workload WireGuard planes
├── Configure CoreDNS for internal name resolution
├── Set up nftables per-plane policies
└── Result: multi-plane routed network, no static routes

Phase 3 — Security (Day 4-5)
├── Deploy step-ca (internal PKI)
├── Deploy Keycloak (SSO)
├── Deploy Vault (secrets management)
├── Configure mTLS between all services
├── Move all secrets to Vault
└── Result: every connection authenticated and encrypted

Phase 4 — Compute (Day 6-7)
├── Build golden images with Packer
├── Deploy KVM VMs for Kubernetes control plane and workers
├── Install Kubernetes with Cilium CNI
├── Configure Kubernetes OIDC with Keycloak
└── Result: container orchestration on encrypted, snapshotted VMs

Phase 5 — Observability (Day 8-9)
├── Deploy Prometheus + Alertmanager
├── Deploy Grafana (Keycloak SSO)
├── Deploy Loki for log aggregation
├── Configure eBPF tracing and Hubble
├── Build dashboards, set up alert rules
└── Result: full visibility into every layer

Phase 6 — Workloads (Day 10+)
├── Deploy databases on tuned ZFS datasets
├── Deploy application containers
├── Configure load balancing (HAProxy or Cilium LB)
├── Set up blue/green deployment pipeline
├── Configure syncoid replication to DR node
└── Result: production workloads with full DR

Two weeks from bare metal to a fully deployed, encrypted, observable, authenticated, snapshottable platform running production workloads. A single engineer can do it because every tool is open source, every configuration is code, and the kldload ISO pre-installs the hard parts (ZFS DKMS, WireGuard, eBPF tooling). The individual masterclass pages are the detailed guides for each phase. This page is the map that tells you what order to read them in.
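The build order above is effectively a dependency graph: Kubernetes OIDC needs Keycloak, mTLS needs the internal CA, and so on. The sketch below encodes one reading of those dependencies and lets the standard library's `graphlib` (Python 3.9+) derive a valid order. The component names and edges are an illustrative interpretation of the phase list, not an authoritative kldload dependency map.

```python
from graphlib import TopologicalSorter

# component -> components that must already be in place (assumed edges)
DEPENDS_ON = {
    "base": set(),                      # Phase 1: kldload install, ZFS, mgmt plane
    "bird-bgp": {"base"},               # Phase 2
    "extra-planes": {"base"},           # Phase 2: storage and workload planes
    "step-ca": {"bird-bgp", "extra-planes"},  # Phase 3
    "keycloak": {"step-ca"},
    "vault": {"step-ca"},
    "kubernetes": {"keycloak", "vault"},      # Phase 4: OIDC needs Keycloak
    "grafana": {"keycloak", "kubernetes"},    # Phase 5: SSO needs Keycloak
    "workloads": {"grafana"},                 # Phase 6
    "syncoid-dr": {"extra-planes"},           # replication rides the storage plane
}


def build_order(graph):
    """Return one valid deployment order; raises CycleError on circular deps."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
print(" -> ".join(order))
```

The exact sequence is not unique (independent components can swap places), but any order it prints respects every edge, which is the property that lets you stop at any layer and still have a working platform.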

14. The Point

A fully deployed kldload platform is not a product you buy. It is a platform you build. Every component is open source. Every configuration is a text file in a git repo. Every node boots from the same ISO. You understand every layer because you built every layer.

The result is infrastructure that is encrypted at every layer (ZFS, WireGuard, mTLS, IPsec), authenticated everywhere (Keycloak SSO, Vault dynamic secrets, certificate-based mTLS), observable to the kernel (eBPF, Prometheus, Grafana, Hubble), recoverable to any point in time (ZFS snapshots, syncoid replication, boot environments), auditable (SELinux, Vault audit log, Keycloak login events, structured logging), and entirely yours.

No vendor lock-in. No licence fees. No phone-home telemetry. No cloud dependency. It runs in your rack, on your hardware, under your control. That is the full stack.

This is what kldload was built for. Not to be a single tool, but to be the foundation that makes all of these tools work together. The ISO installs the OS with ZFS, WireGuard, eBPF, and the kernel-level plumbing that everything else depends on. The masterclass pages teach you how to build each layer. This page shows you what it looks like when all the layers are filled in. It is ambitious, but it is achievable — every technology on this page has a masterclass with concrete configuration examples, and every masterclass has been written for people who build infrastructure, not people who read about it.

First-Class Infrastructure — the philosophy behind the stack
ZFS Masterclass — Layer 1 deep dive
WireGuard Masterclass — backplane encryption
BIRD & BGP Masterclass — dynamic routing
Keycloak & SELinux Masterclass — identity and access control
TLS & PKI Masterclass — certificate infrastructure
Vault & Secrets Masterclass — secrets management
eBPF Masterclass — kernel observability
Cilium Masterclass — Kubernetes networking
Kubernetes Masterclass — container orchestration
IPsec Tunnels Masterclass — external connectivity
Operations Guide Upgrades & Boot Environments — day-2 operations
Build Your Own — getting started with custom deployments
