
Construction Kit Masterclass

Every recipe on this site — NAS server, firewall, radio station, game servers, signal observatory — was built by someone who asked: what if I applied this to my thing? This masterclass answers that question directly. It teaches the underlying pattern so you can build any appliance you can imagine, not just the ones documented here.

The pattern: Pick a workload. Give it a ZFS dataset. Add WireGuard for encryption. Add nftables for isolation. Add sanoid for snapshots. Add syncoid for replication. Add systemd for lifecycle management. Add eBPF for observability. The workload doesn’t matter — the pattern is the same. This masterclass teaches you the pattern.

Once you internalize it, you stop reading recipes and start writing them. The re-packer is a set of primitives. You assemble them for your workload — whatever that workload is.

Out-of-character note: This is the page for people who read the recipes and thought “what if I applied this to my thing?” Your thing — whatever it is — gets the same superpowers as every recipe on this site: ZFS snapshots, encrypted backplane, instant clones, incremental replication, per-service datasets, boot environment rollback. The re-packer is the set of primitives. You assemble them for your workload. That’s the whole idea.

1. kldload is a re-packer, not a product

Most operating system projects ship a product. You install it, it does a specific thing, you use it that way. kldload is different. It ships primitives. ZFS for storage. WireGuard for networking. nftables for isolation. sanoid and syncoid for data lifecycle. systemd for process lifecycle. eBPF for observability. These aren’t features bolted onto a product — they’re the building blocks. The product is whatever you build with them.

The recipes on this site are existence proofs: someone built a NAS, documented the steps, and shared it. Someone else built a radio station. Someone else built a self-updating satellite DVR. Every one of those recipes is the same 10-step template applied to a different workload. The NAS recipe is the template applied to Samba. The game server recipe is the template applied to Minecraft. The radio station is the template applied to Icecast.

The re-packer doesn’t care what you’re building. It gives every workload the same capabilities: point-in-time recovery, encrypted replication, network isolation, fearless upgrades, deep observability. Your job is to pick the workload. The kit handles everything else.

Primitives, not features

ZFS is not “the backup feature.” It’s a primitive that provides copy-on-write snapshots, clones, send/receive, encryption, compression, and quotas. You compose these to get backup, DR, testing environments, encryption at rest, space efficiency — all at once.

// Feature:   “automated backup to S3”
// Primitive: zfs send pool/data@snap | zstd | ssh user@remote 'zstd -d | zfs recv backup/data'
// The primitive is more powerful. Always.
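As a minimal sketch of that primitive, the function below only prints the pipeline it would run, so it can be read and tried without a pool. `build_send_cmd` is a hypothetical helper, not a kldload tool, and every dataset, snapshot, and host name is a placeholder. Note that the stream is decompressed on the receiving side before it reaches `zfs recv`.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an incremental, compressed
# ZFS replication pipeline. All names passed in are placeholders.
build_send_cmd() (
    dataset=$1    # e.g. rpool/srv/myapp
    prev=$2       # older snapshot that already exists on the remote
    new=$3        # newer snapshot to send
    remote=$4     # e.g. backup@10.8.1.2
    target=$5     # dataset on the remote pool
    printf "zfs send -i @%s %s@%s | zstd | ssh %s 'zstd -d | zfs recv %s'\n" \
        "$prev" "$dataset" "$new" "$remote" "$target"
)

build_send_cmd rpool/srv/myapp monday tuesday backup@10.8.1.2 backup/myapp
```

Only the blocks that changed between the two snapshots cross the wire, which is why the primitive composes into backup, DR, and migration alike.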

Composition, not configuration

You don’t configure kldload to “enable NAS mode.” You compose: create a dataset, install Samba, bind to WireGuard, add a sanoid policy, write a systemd unit. Each step is independent and understandable. The composition is your appliance.

// UNIX philosophy applied to infrastructure
// Small tools, each excellent, composed by the operator

The workload is the variable

Every recipe on this site has the same structure: 10 steps, 14 layers, one workload. Change the workload and you get a different appliance. The structure doesn’t change. Once you know the structure, you can build anything in 30 minutes.

// NAS     = template(workload=Samba)
// GameSrv = template(workload=Minecraft)
// MailSrv = template(workload=Postfix+Dovecot)

Infrastructure as understanding

The recipes are documented because understanding compounds. When you know why each step exists, you can adapt it when something changes — a new version, a new requirement, a new constraint. Infrastructure-as-understanding beats infrastructure-as-copy-paste.

// Know why: recordsize=8k for OLTP databases
// Then adapt: switch to recordsize=128k for data warehouse
// Without understanding, you copy-paste forever

2. The template — every kldload appliance follows this pattern

Every appliance recipe on this site, and every appliance you’ll build yourself, follows the same 10-step sequence. The steps are not arbitrary — each one exists because skipping it creates a real problem. Install in order. Understand each step. Adapt as needed for your workload.

  1. Install kldload
    Boot the ISO. Pick your target distro (CentOS, Debian, Ubuntu, Rocky, RHEL, Fedora, Arch, Alpine). Pick your profile: desktop for workstations, server for headless appliances, core for pure ZFS-on-root with no kldload tools. ZFS on root is installed, boot environment created, WireGuard and all platform tools are available immediately.
  2. Create ZFS datasets
    One dataset per logical service. Use kdir or raw zfs create. Tune recordsize to match your I/O pattern: 8k for OLTP databases, 128k for sequential workloads, 1M for large media files. Set compression=lz4 for general data, compression=zstd for archival data, compression=off for already-compressed content. Set atime=off on any dataset where throughput matters.
  3. Install your application
    Use apt, dnf, pacman, Docker, Podman, or compile from source — whatever the workload requires. Point the application’s data directory at the ZFS dataset you created in step 2. The application doesn’t need to know anything about ZFS.
  4. Bind to WireGuard backplane
    Configure your application to listen on the WireGuard interface IP, not 0.0.0.0. Add the service port to the backplane’s allowed-services list. The service is now invisible from the internet. Management, monitoring, and replication all happen over the encrypted backplane. Clients connect via WireGuard peer. No public attack surface.
  5. Add nftables zones
    Define per-interface rules: what is allowed in on eth0 (public), what is allowed in on wg0 (management backplane), what is allowed in on wg1 (monitoring backplane), what the service is allowed to connect to outbound. Rate-limit anything public-facing. Default deny. Log drops during initial setup.
  6. Add sanoid snapshots
    Write a /etc/sanoid/sanoid.conf template for your dataset. Choose your retention policy: frequent_period=15 (a snapshot every 15 minutes), hourly=48, daily=30, monthly=12 is a sensible starting point for most services. Set frequently to the number of 15-minute snapshots you want to keep. Enable autosnap=yes and autoprune=yes. Enable the sanoid timer. Snapshots now happen automatically, forever.
  7. Add syncoid replication
    Configure a syncoid cron or systemd timer to replicate your dataset to a remote host over WireGuard. Use --no-sync-snap if you rely on sanoid for snapshot creation. The remote host receives an identical, independently snapshottable copy of your data. If the primary dies, you boot the replica and import the pool — typically under 5 minutes.
  8. Add systemd hardening
    Write a systemd unit for your service. Add hardening directives: ProtectSystem=strict, PrivateTmp=yes, NoNewPrivileges=yes, SystemCallFilter=@system-service, CapabilityBoundingSet restricted to only what the service needs. Bind-mount only the ZFS dataset the service reads and writes. The service now runs in a kernel-enforced sandbox.
  9. Add monitoring
    Install node_exporter and scrape from Prometheus on the monitoring backplane. Add service-specific metrics: application health checks, queue depths, error rates, latency histograms. Write Grafana dashboards. Configure alerting thresholds. Use eBPF tools (biolatency, tcpconnect, runqlat) for deep performance visibility when something feels wrong.
  10. Add boot environments
    Before any upgrade: kbe create pre-upgrade-$(date +%Y%m%d). Install the upgrade. Test. If it breaks, kbe activate pre-upgrade-20260115 and reboot. You’re back in 10 seconds. Boot environments are OS-level snapshots: the entire root filesystem, kernel, and bootloader chain. This is your net under the high-wire act of production upgrades.
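Step 5 is the one most often left vague, so here is a minimal sketch of the zone layout as an nftables ruleset. The function only prints the rules. The interface names, the WireGuard listen port (51820), the node_exporter port (9100), and the rate limit are assumptions for illustration; review the output before loading it with `nft -f`.

```shell
#!/bin/sh
# Sketch only: print an nftables ruleset for the zone layout in step 5.
# Interface names, ports, and the rate limit are assumptions.
emit_zones() (
    public_port=$1   # port exposed on eth0, rate-limited
    mgmt_port=$2     # service port reachable only over wg0
    cat <<EOF
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iifname "lo" accept
        iifname "eth0" udp dport 51820 accept
        iifname "eth0" tcp dport ${public_port} limit rate 50/second accept
        iifname "wg0" tcp dport { 22, ${mgmt_port} } accept
        iifname "wg1" tcp dport 9100 accept
        log prefix "nft-drop: " counter drop
    }
}
EOF
)

emit_zones 443 8000
```

The final rule logs and counts everything that was not explicitly allowed, which is the "log drops during initial setup" part of the step. Remove the log statement once the allow-list has settled.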

Concrete example: a generic web application

Suppose your workload is a Python web application backed by PostgreSQL. Here is the template applied literally:

// Web application — template applied

# Step 2: ZFS datasets
zfs create -o mountpoint=/srv/myapp     -o recordsize=128k -o compression=lz4 rpool/srv/myapp
zfs create -o mountpoint=/var/lib/pgsql -o recordsize=8k   -o compression=lz4 rpool/srv/pgsql

# Step 3: Install app
dnf install -y python3 postgresql-server gunicorn
postgresql-setup --initdb
systemctl enable --now postgresql
# point PGDATA at /var/lib/pgsql (already on ZFS)

# Step 4: WireGuard — listen on backplane only
# gunicorn bind = 10.8.0.1:8000 (not 0.0.0.0)
# PostgreSQL listen_addresses = 'localhost,10.8.0.1'

# Step 6: sanoid policy
[rpool/srv/myapp]
    use_template = production
[rpool/srv/pgsql]
    use_template = production
    recursive = yes

# Step 7: syncoid replication (hourly cron)
syncoid --no-sync-snap rpool/srv/pgsql backup@10.8.1.2:backup/myapp-pgsql

# Step 10: boot environment before every upgrade
kbe create pre-deploy-$(date +%Y%m%d-%H%M)
Out-of-character note: This 10-step template is what every recipe on the site follows. The NAS recipe is this template applied to Samba. The game server recipe is this template applied to Minecraft. The radio station recipe is this template applied to Icecast. Once you internalize the template, you can build any appliance in 30 minutes — and the resulting appliance will have stronger data protection and security than most production systems built by teams of people who didn’t use a template.
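The web application example skips step 8. A minimal sketch of a hardened unit for the gunicorn process might look like the following. The service user, the `app:app` module path, and the gunicorn binary location are assumptions; the hardening directives are the ones named in step 8, with `ReadWritePaths=` granting write access to the dataset mountpoint only.

```shell
#!/bin/sh
# Sketch only: print a hardened systemd unit for the gunicorn app.
# User name, module path (app:app), and binary path are assumptions.
emit_unit() (
    name=$1       # service name, also used as the service user
    datadir=$2    # ZFS dataset mountpoint the service may write to
    bind=$3       # backplane address:port
    cat <<EOF
[Unit]
Description=${name} (gunicorn)
After=network-online.target
Wants=network-online.target

[Service]
User=${name}
WorkingDirectory=${datadir}
ExecStart=/usr/bin/gunicorn --bind ${bind} app:app
Restart=on-failure
ProtectSystem=strict
ReadWritePaths=${datadir}
PrivateTmp=yes
NoNewPrivileges=yes
SystemCallFilter=@system-service
CapabilityBoundingSet=

[Install]
WantedBy=multi-user.target
EOF
)

emit_unit myapp /srv/myapp 10.8.0.1:8000
```

An empty `CapabilityBoundingSet=` drops every capability, which works here because port 8000 is unprivileged. A service that must bind below 1024 would need `CAP_NET_BIND_SERVICE` added back.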

3. The superpowers table — what each layer adds to any workload

When designing a new appliance, walk down this table and ask: does my workload benefit from this layer? The answer is usually yes for at least 8 of the 14 layers. Use this as your checklist.

| Layer | What It Does | Your Workload Gains | Example |
|---|---|---|---|
| ZFS datasets | Per-service storage with independently tuned properties | Independent snapshots, quotas, compression, separate I/O tuning per service | Database on recordsize=8k, media on recordsize=1M, log archive on compression=zstd |
| ZFS snapshots | Point-in-time copies, instant, near-zero cost, preserved in the pool | Rollback any service to any previous state in 2 seconds | Schema migration failed? Rollback. Griefer destroyed a game world? Rollback. Bad config push? Rollback. |
| ZFS clones | Zero-cost copy of any dataset at any snapshot point | Test on production data without touching production | Clone the database for load testing; clone the game world for mod testing; clone the VM disk for staging |
| ZFS send/receive | Block-level incremental replication over any transport | Offsite backup, disaster recovery, multi-site active/passive | Hourly replication to backup server over WireGuard; cross-datacenter DR; warm standby that imports in minutes |
| ZFS compression | Transparent, per-dataset, zero application changes | 30–60% disk savings on compressible data; faster sequential reads from cache | lz4 for general data, zstd for archive, zstd-19 for cold storage, off for video or already-compressed blobs |
| ZFS encryption | Per-dataset native encryption, keys managed independently | Sensitive data encrypted at rest with no application changes | Encrypted home directories, encrypted database, unencrypted public media cache, all on the same pool |
| Boot environments | OS-level ZFS snapshots of the entire root filesystem | Fearless upgrades; instant OS rollback; branch your OS like code | Bad kernel update? Boot the previous environment in 10 seconds. Failed package upgrade? Same. Test a config branch in a BE, discard it if it fails. |
| WireGuard | Modern encrypted tunnel, kernel-native, low overhead | Services invisible from internet; encrypted management plane; site-to-site backplane | SSH, database ports, monitoring, and replication traffic all on WireGuard; none of it exposed to the public internet |
| nftables | Per-interface stateful firewall and traffic control | Zone isolation, rate limiting, explicit allow-lists, logged drops | Management on wg0, monitoring on wg1, replication on wg2, public traffic on eth0 with strict rate limits |
| systemd hardening | Kernel-enforced process sandboxing via unit directives | Service isolated from rest of system; restricted filesystem view; capability limits | ProtectSystem=strict, NoNewPrivileges, SystemCallFilter, bind-mounted dataset; even a compromised service can’t escape its sandbox |
| eBPF / bcc-tools | Kernel-level tracing with no overhead when not active | Deep observability: I/O latency, TCP events, CPU scheduling, syscall profiles | biolatency for storage bottlenecks, tcpconnect for unexpected connections, runqlat for CPU saturation, custom probes for your application |
| sanoid | Automated snapshot scheduling with configurable retention policies | Never lose data; never think about backup scheduling; time-travel for any dataset | 15-minute frequent, 48 hourly, 30 daily, 12 monthly: automatic, pruned, always current |
| syncoid | Automated incremental ZFS replication over SSH | Offsite DR without per-dataset configuration; resumable transfers; encrypted in transit | Hourly incremental replication over WireGuard to backup site; cross-region DR; always-current warm standby |
| Cilium eBPF | Kubernetes networking with L3–L7 policy and Hubble observability | Per-pod network policies, transparent mTLS, identity-based security, service mesh without sidecars | Namespace isolation, L7 HTTP policy, Hubble flow visibility, BGP service announcement, all without modifying applications |
Out-of-character note: This table is the cheat sheet. When designing a new appliance, walk down this list and ask “does my workload benefit from this?” The answer is usually yes for at least 8 of the 14 layers. The ones you skip are intentional decisions, not defaults. That’s the difference between designed infrastructure and assembled-by-accident infrastructure.

4. Quality of life enhancements

The raw ZFS, WireGuard, and systemd commands are powerful but verbose. Repeating them exactly every time is error-prone and demotivating. kldload ships wrappers that reduce each common operation to a single readable command. The goal: every operation you might skip because it’s too annoying becomes one you actually do every time.

ksnap — snapshot before any change

Take a named snapshot of any dataset in one command. No more remembering the full ZFS snapshot syntax or the date format. Run it before every deploy, every config change, every package update.

ksnap rpool/srv/myapp pre-deploy
# replaces:
# zfs snapshot rpool/srv/myapp@pre-deploy-20260115-143022
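To make the naming convention concrete, here is a hypothetical sketch of what a ksnap-style wrapper could do. It is not the real ksnap source; it only shows how a label and a timestamp combine into the snapshot name, and it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch of a ksnap-style wrapper; not the real ksnap source.
# Builds dataset@label-YYYYMMDD-HHMMSS and prints the command it would run.
snap_cmd() (
    dataset=$1
    label=${2:-manual}                 # default label when none is given
    stamp=$(date +%Y%m%d-%H%M%S)
    printf 'zfs snapshot %s@%s-%s\n' "$dataset" "$label" "$stamp"
)

snap_cmd rpool/srv/myapp pre-deploy
```

Putting the timestamp in the name means two snapshots with the same label never collide, so the wrapper can be run before every deploy without thinking about naming.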

kdir — create a ZFS dataset, not just a directory

Creates a ZFS dataset with sane defaults (compression, atime, mountpoint) instead of a plain directory. Prevents the common mistake of putting service data outside the ZFS hierarchy.

kdir /srv/myapp
# replaces:
# zfs create -o mountpoint=/srv/myapp \
#     -o compression=lz4 -o atime=off \
#     rpool/srv/myapp

kbe — boot environment management

Create, list, activate, and destroy boot environments. One command before any system upgrade. One command to roll back. No manual ZFS clone + bootloader configuration.

kbe create pre-kernel-6.8
kbe list
kbe activate pre-kernel-6.8
# then reboot: you’re back in 10 seconds

kdf — ZFS-aware disk space

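Shows real disk usage accounting for snapshots, clones, and compression. The standard df command is misleading on ZFS: it treats every dataset as a separate filesystem sharing the same free space, and it cannot show how much space is held by snapshots or clones, or how much compression is saving.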

kdf
# shows: dataset, used, available, compression ratio,
# snapshot space, clone references (the full picture)

kst — one-command health check

Runs the full system health check: pool status, scrub status, service status, replication lag, disk SMART status, and network connectivity. Paste the output into a support request or a postmortem.

kst
# pool: ONLINE          scrub: clean       services: 7/7 running
# replication: lag 4m   SMART: all clean   WG: 3/3 peers up

kvm-create / kvm-clone / kvm-snap / kvm-replicate

Full VM lifecycle in single commands. Create a VM with ZFS-backed disk. Clone it (backed by ZFS clone, zero-copy). Snapshot it. Replicate it to another host. The VM layer and the ZFS layer are integrated by design.

kvm-clone prod-web01 staging-web01
# ZFS clone + new MAC + updated network config
# ready in <5 seconds

kai — AI that reads your system state

Reads pool status, service logs, SMART data, and replication status before answering questions. You don’t paste logs into a chat window — the AI already knows your system state when you ask.

kai "why is my write latency high?"
# reads: biolatency, pool iostat, service logs
# answers with context from your actual system

The principle: every common operation is one command

Quality of life is the difference between “I could do this” and “I actually do this every time.” Snapshots before deploys. Boot environments before upgrades. Health checks before sleep. These habits only stick when the friction is low enough to form them.

// Friction kills good habits
// One command = habit-forming
// Five flags = skipped under pressure
Out-of-character note: These tools exist because the raw ZFS/systemd/WireGuard commands are powerful but verbose. ksnap replaces zfs snapshot rpool/srv/myapp@manual-$(date +%Y%m%d-%H%M%S). kdir replaces a multi-line zfs create with its property flags. Quality of life is the difference between “I could do this” and “I actually do this every time.” The wrappers exist to make the right behaviour the easy behaviour.

5. Over-the-top use cases — what becomes possible

These are not hypothetical. Every one of these use cases is built from the same 10-step template and the same 14 layers. The pattern is always the same. The workload is the variable.

5.1 — The time-traveling database

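Take a snapshot every 15 minutes. Your database can travel back at 15-minute granularity across the last 48 hours and at daily granularity across the last 30 days. The database doesn’t know ZFS exists, and the most a bad change can cost is the work since the last snapshot.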

  • sanoid frequent policy: snapshot every 15 minutes, retain 48 hours of 15-minute snapshots
  • Clone any snapshot for forensic analysis without touching production: zfs clone rpool/srv/pgsql@sanoid-20260115-0300 rpool/srv/pgsql-forensic
  • Replicate to a read-only replica for reporting queries — no load on the primary
  • Schema migration failed at step 3 of 7? Stop the database and roll back the dataset in 2 seconds. There is no transaction log to replay, because ZFS reverts the blocks underneath the database.
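Retaining 48 hours of 15-minute snapshots is a matter of arithmetic: 48 hours × 4 snapshots per hour = 192. The sketch below derives that count and prints a sanoid template that uses it. The template name and the hourly, daily, and monthly values follow the policy quoted elsewhere on this page; treat the stanza as a starting point to check against the sanoid documentation, not as a shipped kldload default.

```shell
#!/bin/sh
# Sketch: derive sanoid's "frequently" count from a retention window.
# frequently = how many frequent snapshots to keep
# frequent_period = minutes between frequent snapshots
frequent_count() (
    hours=$1      # how long to keep frequent snapshots
    period=$2     # minutes between snapshots
    echo $(( hours * 60 / period ))
)

emit_template() (
    period=15
    cat <<EOF
[template_production]
    frequent_period = ${period}
    frequently = $(frequent_count 48 "$period")
    hourly = 48
    daily = 30
    monthly = 12
    autosnap = yes
    autoprune = yes
EOF
)

emit_template
```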

5.2 — The self-healing web application

systemd restarts the app on crash. sanoid snapshots every 15 minutes. syncoid replicates hourly. The infrastructure handles failure automatically at every layer.

  • Bad deploy? ksnap rollback rpool/srv/myapp pre-deploy-20260115-1430. Back in 2 seconds.
  • Corrupted database? ZFS rollback to the snapshot before the bad migration. No pg_dump required.
  • Server dies completely? Boot the syncoid replica. Import the ZFS pool. Start services. Under 5 minutes.
  • The application developer needs to know none of this. The infrastructure handles it at a layer below the application.

5.3 — The invisible CI/CD runner

Lives on the WireGuard backplane. Zero public attack surface. Every job runs in a clean, ephemeral ZFS clone. Build state never bleeds between jobs.

  • GitLab Runner or GitHub Actions self-hosted on a kldload server, listening on WireGuard only
  • Each CI job gets a ZFS clone of the base build environment: zfs clone rpool/ci/base@golden rpool/ci/job-$CI_JOB_ID
  • Job runs. Clone destroyed. Zero leftover state. Zero contamination between jobs.
  • Build artifacts stored on a dedicated ZFS dataset with compression=zstd — 50% savings on compiled binaries
  • Build cache on its own dataset: survives job cleanup, snapshotted to protect against corruption, independent retention policy

5.4 — The paranoid mail server

Postfix + Dovecot on ZFS. Every mailbox is a dataset. No mailbox can corrupt another. Encrypted at rest. Public interface exposes only port 25.

  • One ZFS dataset per mailbox: independent snapshots, per-user quotas, per-mailbox encryption keys
  • WireGuard backplane for IMAP — users VPN in to read mail. Port 143 is never public.
  • Public interface: SMTP port 25 only, with nftables rate limiting and connection-count limits
  • Snapshot before every filter rule change. If the sieve rules break mail delivery, rollback in 2 seconds.
  • syncoid replication to DR site. If the server dies, zpool import on the backup and mail flows in minutes — no mail lost, all mailboxes intact.

5.5 — The multi-tenant SaaS platform

Each tenant gets their own ZFS dataset and their own Kubernetes namespace. Onboarding is 5 minutes. Offboarding is clean, auditable, and leaves no data behind unless you want it to.

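  • Tenant storage: zfs create -o quota=50G -o encryption=on -o keyformat=passphrase rpool/tenants/$TENANT_ID (a new encryption root needs a keyformat; it can be omitted only if rpool/tenants is already encrypted and the tenant inherits its key)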
  • Tenant networking: Cilium network policy per namespace — tenants cannot reach each other by default
  • Onboarding: create dataset, create namespace, deploy Helm chart. Fully automated. Under 5 minutes.
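  • Offboarding: snapshot the tenant dataset, send that snapshot to an archive dataset (legal hold), then zfs destroy -r the live dataset. Clean removal. Auditable. A snapshot cannot outlive its own dataset, so the archived copy is what persists for your retention period.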
  • Per-tenant backup: syncoid runs per-dataset. Each tenant’s data replicates independently to their own DR target if required.
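Because `zfs destroy -r` removes a dataset together with its snapshots, offboarding needs an archive step before the destroy. The sketch below prints that sequence for one tenant. The archive pool name (`archive/tenants`) and the snapshot label are assumptions; `zfs send -w` sends the raw stream, so an encrypted tenant dataset stays encrypted in the archive.

```shell
#!/bin/sh
# Sketch: print the offboarding steps for one tenant.
# The archive pool name and snapshot label are assumptions.
offboard_plan() (
    tenant=$1
    ds="rpool/tenants/${tenant}"
    snap="${ds}@offboard"
    echo "zfs snapshot ${snap}"
    # -w sends the raw (still encrypted) stream to the archive
    echo "zfs send -w ${snap} | zfs recv archive/tenants/${tenant}"
    echo "zfs destroy -r ${ds}"
)

offboard_plan acme
```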

5.6 — The disaster-proof home automation

Home Assistant on ZFS. Every automation change is preceded by a snapshot. Bad automation locked you out of your house? Rollback. Immediately. All communication over WireGuard.

  • ksnap rpool/srv/hass pre-automation-change before every configuration edit
  • Rollback if the new automations break something: ksnap rollback rpool/srv/hass pre-automation-change
  • syncoid replication to a Raspberry Pi offsite. If the primary dies, swap to the replica — all automations, history, and state intact.
  • WireGuard for remote access. Your smart home traffic never touches the cloud. No Nabu Casa, no third-party relay.
  • sanoid retains 30 days of daily snapshots. Accidentally deleted a device configuration from 3 weeks ago? It’s in a snapshot.

5.7 — The research data pipeline

Ingest, process, analyze, archive — each stage on its own ZFS dataset with properties tuned to that stage’s I/O pattern. Clone any stage for reprocessing without touching the original.

  • Raw ingest: compression=off, recordsize=1M — maximize write throughput on large files
  • Processed data: compression=zstd, recordsize=128k — 60% savings on tabular data, fast sequential reads
  • Analysis working set: compression=lz4, recordsize=8k — fast random access for database-style queries
  • Archive: compression=zstd-19, recordsize=1M — maximum compression, replicated offsite, snapshotted weekly
  • Reprocess any stage: clone the dataset at the snapshot before reprocessing. The original pipeline data is untouched. If the reprocessing is wrong, discard the clone.
Out-of-character note: These are not hypothetical. Every one of these use cases is built from the same 10-step template and the same 14 superpowers. The time-traveling database is PostgreSQL + sanoid. The invisible CI runner is GitLab Runner + WireGuard + ZFS clones. The paranoid mail server is Postfix + ZFS per-mailbox + syncoid. The pattern is always the same. The workload is the variable. Once you understand the pattern, you stop asking “can kldload do X?” and start asking “how do I apply the template to X?”

6. Designing your appliance — the decision framework

Before you write a single command, answer these questions. Your answers determine your dataset layout, network topology, snapshot policy, and security posture. Decisions made at design time are free. Decisions made after data is in production are expensive.

| Question | Determines | Common answers |
|---|---|---|
| What is the workload? | Which application you install in step 3 | Web app, database, media server, IoT gateway, scientific pipeline, game server, mail server |
| What is the I/O pattern? | recordsize on your ZFS dataset | Random small I/O (OLTP) → 8k; sequential large (media, backup) → 1M; mixed → 128k |
| What data is compressible? | compression property per dataset | Text, logs, code, databases → lz4 or zstd; video, images, archives → off |
| What data needs encryption? | Which datasets get encryption=aes-256-gcm | PII, credentials, medical records, private keys → encrypt; public media, read-only mirrors → optional |
| What is your backup RPO? | sanoid frequent_period and hourly retention | 15 minutes (financial, medical), 1 hour (most services), daily (cold archive, rarely-changed data) |
| What is the DR strategy? | syncoid targets, schedule, and network topology | Single site (no syncoid), multi-site (syncoid to remote), cloud (syncoid to cloud VM over WireGuard) |
| What is the security posture? | Number of WireGuard planes and nftables zone strictness | Public service (strict ingress rules, rate limiting); internal only (backplane only, no public interface); classified (air-gapped backplane, audited access) |
| Who accesses it? | Authentication model and network exposure | Human users → TLS + auth; services → mTLS or WireGuard; both → separate planes and RBAC |
| How often does it change? | Boot environment discipline and snapshot frequency | Frequently updated → boot environments before every change; rarely updated → boot environments before major upgrades only |
| What does failure look like? | Monitoring alerts and DR runbook | Service unavailable (restart), data corruption (ZFS rollback), hardware failure (syncoid replica), site failure (full DR import) |

Work through these questions before opening a terminal. Write the answers down — they become the first section of your appliance manifest in the next step.
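As a minimal sketch of how two of those answers become a command, the functions below map an I/O pattern and a data type to the recordsize and compression values from the table, then print the resulting `zfs create`. The category names (`oltp`, `text`, and so on) are labels invented for this sketch.

```shell
#!/bin/sh
# Sketch: turn two answers from the table (I/O pattern, compressibility)
# into a zfs create command. Printed, not executed.
recordsize_for() {
    case $1 in
        oltp)       echo 8k ;;
        sequential) echo 1M ;;
        mixed)      echo 128k ;;
        *)          echo "unknown I/O pattern: $1" >&2; return 1 ;;
    esac
}

compression_for() {
    case $1 in
        text)          echo lz4 ;;
        archive)       echo zstd ;;
        precompressed) echo off ;;
        *)             echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

dataset_cmd() (
    name=$1
    rs=$(recordsize_for "$2") || exit 1
    comp=$(compression_for "$3") || exit 1
    printf 'zfs create -o recordsize=%s -o compression=%s -o atime=off rpool/srv/%s\n' \
        "$rs" "$comp" "$name"
)

dataset_cmd gitea-db oltp text
```

An unrecognised answer fails loudly instead of falling back to a default, which matches the idea that every property should be a decision.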


7. The appliance manifest — document your design

Every appliance you build should have a manifest. Not a 40-page document — a single file that answers the questions someone would ask when they need to recover your system at 2am without you present. The manifest is also the design document: fill it in before you build, and it will catch gaps in your thinking.

Manifest template

// appliance.manifest — template

name:        <appliance name>
purpose:     <one sentence>
owner:       <name, contact>
distro:      <CentOS / Debian / Rocky / Ubuntu / Fedora / Arch>
profile:     <desktop / server / core>

zfs_layout:
  rpool/srv/<name>:       recordsize=128k  compression=lz4   atime=off
  rpool/srv/<name>-db:   recordsize=8k    compression=lz4   atime=off
  rpool/srv/<name>-log:  recordsize=128k  compression=zstd  atime=off

network_topology:
  eth0:   <public interface — what ports are exposed and to whom>
  wg0:    <management backplane — who has peers, what services listen here>
  wg1:    <monitoring backplane — Prometheus scrape target, node_exporter>
  wg2:    <replication backplane — syncoid source/target, direction>

services:
  - unit:     <systemd unit name>
    listens:  <interface:port>
    dataset:  <rpool/srv/...>
    hardening: ProtectSystem=strict  NoNewPrivileges  PrivateTmp

snapshot_policy:
  template:       production
  frequent_period: 15     # minutes
  hourly:         48
  daily:          30
  monthly:        12

replication_policy:
  target:         <user@host:pool/dataset>
  schedule:       hourly
  transport:      WireGuard wg2
  encryption:     in-transit via WireGuard, at-rest via ZFS encryption

monitoring:
  alerts:
    - metric:   pool_health != ONLINE
      action:   page
    - metric:   replication_lag_minutes > 90
      action:   warn
    - metric:   service_up == 0
      action:   page

upgrade_procedure:
  1. kbe create pre-upgrade-$(date +%Y%m%d)
  2. ksnap <dataset> pre-upgrade
  3. Apply upgrade (packages / config / schema)
  4. Verify: kst + service health check
  5. If failed: kbe activate pre-upgrade-<date> && reboot

dr_procedure:
  1. On backup host: zpool import <pool>
  2. Start services: systemctl start <units>
  3. Update DNS / WireGuard peers to point to backup host
  4. Verify: kst + smoke tests
  5. RTO target: <X minutes>  RPO target: <Y minutes>
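
A manifest only helps at 2am if it is complete. As a rough sketch, the checker below confirms that a manifest file contains every top-level section from the template. The checker and its demo file are illustrative, not kldload tools.

```shell
#!/bin/sh
# Sketch: check that a manifest has every top-level section from the template.
check_manifest() (
    file=$1
    missing=0
    for key in name purpose owner distro profile zfs_layout network_topology \
               services snapshot_policy replication_policy monitoring \
               upgrade_procedure dr_procedure; do
        if ! grep -q "^${key}:" "$file"; then
            echo "missing section: ${key}"
            missing=1
        fi
    done
    exit "$missing"
)

# Demo against a deliberately incomplete manifest in a temp file.
tmp=$(mktemp)
printf 'name: demo\npurpose: test\n' > "$tmp"
check_manifest "$tmp" || echo "manifest incomplete"
rm -f "$tmp"
```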

Concrete filled-in example: personal git server

// gitea.manifest

name:        git.internal
purpose:     Self-hosted Gitea instance for team source control
owner:       Anthony Carpenter  ops@example.com
distro:      Rocky Linux 9
profile:     server

zfs_layout:
  rpool/srv/gitea:      recordsize=128k  compression=lz4   atime=off
  rpool/srv/gitea-db:   recordsize=8k    compression=lz4   atime=off
  rpool/srv/gitea-lfs:  recordsize=1M    compression=off   atime=off

network_topology:
  eth0:   TCP 80,443 (HTTPS only — Nginx reverse proxy)
  wg0:    TCP 3000 Gitea admin UI, TCP 22 SSH git push (backplane only)
  wg1:    TCP 9100 node_exporter, TCP 9101 postgres_exporter
  wg2:    syncoid source — all three datasets replicated nightly

services:
  - unit:     gitea.service
    listens:  wg0:3000
    dataset:  rpool/srv/gitea
    hardening: ProtectSystem=strict  NoNewPrivileges  User=git
  - unit:     postgresql.service
    listens:  localhost:5432
    dataset:  rpool/srv/gitea-db

snapshot_policy:
  frequent_period: 15
  hourly:          48
  daily:           30
  monthly:         12

replication_policy:
  target:    backup@10.8.2.1:tank/git
  schedule:  nightly at 02:00
  transport: wg2 (dedicated replication backplane)

monitoring:
  alerts:
    - pool DEGRADED:   page immediately
    - gitea down >1m: page immediately
    - disk usage >80%: warn 24h before action required

upgrade_procedure:
  1. kbe create pre-gitea-upgrade
  2. ksnap rpool/srv/gitea-db pre-upgrade
  3. systemctl stop gitea
  4. dnf update gitea
  5. systemctl start gitea && curl -f http://10.8.0.1:3000/api/v1/version
  6. If failed: roll back rpool/srv/gitea-db to the pre-upgrade snapshot, then kbe activate pre-gitea-upgrade && reboot

dr_procedure:
  RTO: 8 minutes  RPO: 24 hours
  1. ssh backup@10.8.2.1
  2. zpool import tank
  3. systemctl start gitea postgresql
  4. Update DNS: git.internal → 10.8.2.1
  5. Verify: kst && git clone http://git.internal/test
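
The "nightly at 02:00" schedule in this manifest could be expressed as a systemd service and timer pair. The sketch below prints both units for one dataset. The unit file names, the syncoid binary path, and the per-dataset target path under tank/git are assumptions; each of the three datasets would need its own invocation.

```shell
#!/bin/sh
# Sketch: print a service/timer pair for nightly 02:00 replication.
# Unit names, the syncoid path, and the target layout are assumptions.
emit_replication_units() (
    src=$1   # local dataset
    dst=$2   # user@host:pool/dataset on the backup side
    cat <<EOF
# /etc/systemd/system/syncoid-gitea.service
[Unit]
Description=Replicate ${src} with syncoid

[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid --no-sync-snap ${src} ${dst}

# /etc/systemd/system/syncoid-gitea.timer
[Unit]
Description=Nightly syncoid replication

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
)

emit_replication_units rpool/srv/gitea-db backup@10.8.2.1:tank/git/gitea-db
```

`Persistent=true` makes systemd run a missed replication at the next boot, which keeps the 24-hour RPO honest after downtime.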

8. From appliance to golden image

Once your appliance works exactly the way you want it, you can seal it and export it as a portable image. The image becomes your deployment artifact: import it into any hypervisor, deploy it with Packer, provision fleets with Terraform. The lifecycle is: design → build → test on clone → seal → export → deploy → replicate → monitor.

Test on a clone first

Before sealing, clone your appliance dataset to a test VM. Run your smoke tests against the clone. If everything passes, the original is ready to seal. The clone is destroyed. Production data is never touched during testing.

zfs clone rpool/srv/myapp@pre-seal rpool/test/myapp
kvm-create test-myapp --disk rpool/test/myapp
# run tests, then:
zfs destroy rpool/test/myapp

Seal with kexport

Sealing clears machine-specific state: machine-id reset, SSH host keys removed, cloud-init enabled. The sealed image is clone-ready — every deployed instance generates its own identity on first boot.

kexport --format qcow2 --output /exports/myapp.qcow2
# clears: machine-id, SSH host keys
# enables: cloud-init with multi-datasource config

Export formats

kexport produces qcow2 (KVM), vmdk (VMware), vhd (Hyper-V/Azure), ova (portable OVF), or raw. One command. The export is SCP’d to a remote host if configured, or saved locally.

kexport --format ova   # for VMware vSphere import
kexport --format vhd   # for Azure or Hyper-V
kexport --format raw   # for bare metal or dd

Packer for automated builds

Drive kldload installs via Packer for repeatable, automated golden image builds. The kldload ISO accepts an answers file via kernel parameters — Packer passes it. Zero interaction, fully automated image pipeline.

packer build gitea-rocky9.pkr.hcl
# unattended kldload install → configure → seal → export
# every build is identical, every time

Fleet deployment with Terraform

Use the golden image as a Terraform data source. Provision N instances from one image. Each gets its WireGuard keys, hostname, and cloud-init user-data. The entire fleet is identical at the OS and application layer.

module "web_fleet" {
  source    = "./modules/kldload-vm"
  image     = "gitea-rocky9-20260115.qcow2"
  count     = 5
  wg_subnet = "10.8.3.0/24"
}

The full lifecycle

Design → Build → Test (clone) → Seal → Export → Deploy (Packer/Terraform) → Replicate (syncoid) → Monitor (Prometheus/Grafana). Each stage is documented in your appliance manifest. The lifecycle is repeatable by anyone on your team.

// Day 1: you build the appliance
// Day 30: your team deploys 20 instances from the image
// Day 60: you update the image, Packer rebuilds, Terraform rolls

See the Packer & IaC Masterclass for the full automated build pipeline, Terraform module structure, and fleet management patterns.


9. Contributing your recipe

The appliance recipes on this site started as someone’s custom build. They built a thing, documented what they did, and shared it. The recipe format is simple: architecture diagram, ZFS dataset layout, install steps, configuration, verification procedure, monitoring setup. That’s it.

If you’ve built something with kldload that isn’t documented here — a workload, a use case, a clever combination of layers — it belongs here. Someone else will read it, think “I could apply this to my thing,” and build something you never imagined. That’s how the re-packer grows.

Recipe format

Architecture diagram

A simple text or ASCII diagram showing: which services run, which datasets they use, which network interfaces they listen on, where replication goes. Enough for someone to understand the topology without reading the steps.

ZFS layout

Every dataset, with properties. Pool structure. Snapshot policy. Encryption if applicable. This section alone is worth a recipe — ZFS decisions are the hardest to reverse.

Install steps

Commands only. No explanation of what dnf does. No apologies for complexity. Just: run this, then this, then this. Tested from scratch. Every command verified.

Configuration

The minimum viable config for the service. Annotated with the “why” for non-obvious options. The full config lives in the repo; the recipe links to it.

Verification

How do you know it works? What command confirms the service is healthy? What does correct output look like? A recipe without a verification section is a guess.

Monitoring

Which metrics matter for this workload? What’s the alert threshold? Which eBPF tool catches the common failure mode? One paragraph. This is the section people read at 2am.

How to submit

  • GitHub PR: github.com/kldload/kldload — add your recipe HTML to recipes/, follow the existing file structure
  • Discord: discord.gg/QX8wf38N3V — post in #recipes and someone will help format and merge it
  • Rough draft is fine: write what you built, in plain English, and open a PR. We’ll help with HTML formatting. The knowledge is the contribution — the formatting is secondary.
