kldload — your platform, your way, free

Cluster Setup & Blue/Green Deployments

This tutorial builds a sample cluster on a single kldload KVM host, then shows the ZFS superpower that makes kldload unique: duplicating entire infrastructure instantly for blue/green deployments, upgrade testing, and disaster recovery. Everything uses tools that ship with the kvm profile — kvm-create, kvm-clone, kvm-snap.

The idea: On traditional infrastructure, testing an upgrade means building a second environment from scratch: hours of work and double the hardware. On kldload, you kvm-clone every VM in your cluster in seconds. Each clone starts as a byte-identical copy that uses almost no extra disk space, because ZFS copy-on-write lets it share blocks with the original. Run the upgrade on the clone. If it works, cut traffic over. If it doesn't, destroy the clone. Production is never touched, and no time or disk is wasted.

Blue/green deployment usually means two identical production environments: one live (blue), one idle (green). You deploy to green, test, then swap traffic. The problem: maintaining two full environments is expensive. ZFS clones remove most of the storage cost. The "green" environment is a set of ZFS clones that share all blocks with blue until they diverge. As far as disk is concerned, you're not running two environments; you're running one environment with a near-zero-cost shadow copy.

The sample cluster

We'll build an 8-node cluster on a single kldload KVM host, then clone the entire thing for blue/green upgrades.

VM            Role            RAM   WG IP (wg1)   Purpose
k8s-cp-1      control plane   4GB   10.200.0.1    K8s API + etcd
k8s-cp-2      control plane   4GB   10.200.0.2    K8s API + etcd
k8s-cp-3      control plane   4GB   10.200.0.3    K8s API + etcd
k8s-worker-1  worker          8GB   10.200.0.11   Application pods
k8s-worker-2  worker          8GB   10.200.0.12   Application pods
k8s-worker-3  worker          8GB   10.200.0.13   Application pods
db-1          database        8GB   10.200.0.20   PostgreSQL on ZFS
monitor       observability   4GB   10.200.0.30   Prometheus + Grafana

Total: 48GB RAM. Fits on a single 64GB kldload KVM host.
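The RAM total can be checked with a few lines of shell arithmetic. This is a minimal sketch: the per-VM figures are copied from the table and the 64GB host size is the one stated above, while the function name `ram_total_gb` is made up for illustration.

```shell
# Add up a list of per-VM RAM sizes (in GB) and print the total.
ram_total_gb() {
  total=0
  for gb in "$@"; do
    total=$(( total + gb ))
  done
  echo "${total}"
}

# 3 control planes, 3 workers, db-1, monitor (values from the table)
blue_gb=$(ram_total_gb 4 4 4 8 8 8 8 4)
host_gb=64

echo "blue cluster: ${blue_gb}GB of ${host_gb}GB"
echo "headroom:     $(( host_gb - blue_gb ))GB"
```

The remaining 16GB is what is left for the host itself and for any extra VMs you start later.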


Step 1 — Build golden images

Two golden images cover the whole cluster. One for K8s nodes (containerd + kubeadm pre-installed), one for general services (database, monitoring). Every VM in the cluster is a clone of one of these. Building the golden images takes about 20 minutes. Cloning 8 VMs from them takes about 10 seconds.
# Golden image for K8s nodes (see the Kubernetes on KVM tutorial for full bake steps)
kvm-create golden-k8s --ram 4096 --cpus 4 --disk 40 \
  --iso /var/lib/libvirt/isos/kldload-free-latest.iso --os centos-stream9

# Install kldload (server profile), then inside the VM:
# - install containerd, kubeadm, kubectl, kubelet
# - pre-pull K8s images
# - seal: clear machine-id, hostname, SSH keys
# - poweroff

kvm-snap golden-k8s   # snapshot the golden image

# Golden image for services (database, monitoring)
kvm-create golden-svc --ram 4096 --cpus 2 --disk 40 \
  --iso /var/lib/libvirt/isos/kldload-free-latest.iso --os centos-stream9

# Install kldload (server profile), install PostgreSQL, Prometheus, Grafana
# Seal and snapshot
kvm-snap golden-svc

Step 2 — Clone the cluster (10 seconds)

# Clone K8s control plane nodes
for i in 1 2 3; do
  kvm-clone golden-k8s k8s-cp-${i}
done

# Clone K8s workers
for i in 1 2 3; do
  kvm-clone golden-k8s k8s-worker-${i}
done

# Clone service nodes
kvm-clone golden-svc db-1
kvm-clone golden-svc monitor

# 8 VMs created. Total time: ~10 seconds.
# Total extra disk used: ~0 bytes (ZFS CoW: shares blocks with golden images)

# Start everything
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  virsh start ${vm}
done

# Set hostnames
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  ssh root@${vm} "hostnamectl set-hostname ${vm}"
done
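Every loop in this tutorial repeats the same eight VM names. One way to avoid retyping them is a small helper that prints the names, with an optional prefix for cloned copies. This is only a sketch; `cluster_vms` is a name invented here, not a kldload tool.

```shell
# Print the eight cluster VM names, one per line.
# Optional first argument: a prefix such as "green-".
cluster_vms() {
  prefix="${1:-}"
  for i in 1 2 3; do echo "${prefix}k8s-cp-${i}"; done
  for i in 1 2 3; do echo "${prefix}k8s-worker-${i}"; done
  echo "${prefix}db-1"
  echo "${prefix}monitor"
}

# Example: list the names a start loop would iterate over.
for vm in $(cluster_vms); do
  echo "would start: ${vm}"
done
```

Calling `cluster_vms green-` yields the same list with every name prefixed, which matches the naming used for cloned VMs later in this tutorial.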

Step 3 — Configure the cluster

With all 8 VMs running, configure the WireGuard mesh, K8s, the database, and monitoring. Each component is covered in its own tutorial.

# Once everything is configured and running, snapshot the entire cluster
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-snap ${vm}
done

# This is your "known-good" baseline. Every VM has a rollback point.
At this point you have a working 8-node cluster: 3 K8s control planes, 3 workers, a PostgreSQL database, and a Prometheus/Grafana monitoring stack. All on WireGuard, all on ZFS. The golden images are still there — you can spin up more workers in seconds with kvm-clone golden-k8s k8s-worker-4. The "known-good" snapshots mean you can roll any node back to this exact state. Now comes the fun part.

The blue/green exercise: duplicate the entire cluster

The exercise: You need to upgrade Kubernetes from 1.30 to 1.31. On traditional infrastructure, you either upgrade in-place (risky — no rollback) or build a second cluster from scratch (hours of work, double the hardware). On kldload, you clone the entire cluster in 10 seconds.

Clone blue → green

# Clone every blue VM to create the green cluster
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-clone ${vm} green-${vm}
done

# 8 green VMs created. Total time: ~10 seconds.
# Total extra disk: ~0 bytes until green diverges from blue.

# Check disk usage: clones share blocks with the originals
zfs list -o name,used,refer -r rpool/vms | grep -E 'k8s|db|monitor|green'

# NAME                           USED   REFER
# rpool/vms/k8s-cp-1             8.2G    8.2G
# rpool/vms/green-k8s-cp-1        64K    8.2G   ← 64K! shares blocks with blue
# rpool/vms/k8s-worker-1        12.1G   12.1G
# rpool/vms/green-k8s-worker-1    64K   12.1G   ← same: near-zero until it diverges
Read those numbers again. The blue cluster uses ~70GB of disk. The green clones use ~512K in total (8 × 64K), which is essentially zero. Both clusters reference the same blocks through ZFS copy-on-write. Only when green writes something different (the K8s upgrade, new container images, updated configs) does it consume additional space, and only for the changed blocks. This is how you run blue/green on a single host without doubling your storage. Traditional blue/green requires 2x the storage; ZFS blue/green starts at close to zero extra. RAM is a different matter: clones share disk blocks, not memory, so every green VM you start needs its own RAM on top of blue's 48GB.
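To total the space used by the clones from real output instead of reading it off the listing, `zfs list -Hp` prints tab-separated fields with sizes in exact bytes, which awk can sum. The sketch below assumes the clones live under `rpool/vms` with a `green-` prefix as above; the function name `sum_green_used` is made up for illustration.

```shell
# Sum the USED column (bytes) for every dataset whose last path
# component starts with "green-". Expects `zfs list -Hp -o name,used`
# output on stdin: tab-separated, no header, sizes in bytes.
sum_green_used() {
  awk -F '\t' '
    { n = split($1, parts, "/") }
    parts[n] ~ /^green-/ { total += $2 }
    END { printf "%d\n", total }
  '
}

# On the KVM host this would be fed by:
#   zfs list -Hp -o name,used -r rpool/vms | sum_green_used
# The sample rows below mirror the listing above (64K = 65536 bytes).
printf 'rpool/vms/k8s-cp-1\t8804682957\nrpool/vms/green-k8s-cp-1\t65536\nrpool/vms/green-k8s-worker-1\t65536\n' \
  | sum_green_used
```

With the two sample clones the result is 131072 bytes; eight clones at 64K each come to 524288 bytes, or 512K.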

Upgrade green

# Start the green cluster (with different network config so it doesn't conflict)
for vm in green-k8s-cp-{1,2,3} green-k8s-worker-{1,2,3} green-db-1 green-monitor; do
  virsh start ${vm}
done

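# SSH into the first green control plane node and upgrade K8s
ssh root@green-k8s-cp-1
# Inside the VM: upgrade the kubeadm package to 1.31 first, then:
kubeadm upgrade plan
kubeadm upgrade apply v1.31.0
# ... on every other green node: upgrade kubeadm, run "kubeadm upgrade node" ...
# ... then upgrade kubelet on all green nodes and restart it ...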

# Test green: run your test suite, check pod health, verify services
kubectl get nodes
kubectl get pods --all-namespaces
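Eyeballing `kubectl get nodes` works for six nodes, but a scripted check is easier to repeat. The sketch below reads the output of `kubectl get nodes --no-headers` on stdin and fails if any node is not `Ready` or reports a different kubelet version. The function name `check_nodes` is an assumption for illustration, and the sample rows are invented.

```shell
# Usage: kubectl get nodes --no-headers | check_nodes v1.31.0
# Columns are NAME STATUS ROLES AGE VERSION.
# Exits non-zero if any node is not exactly "Ready", reports a
# different version, or if no nodes were listed at all.
check_nodes() {
  awk -v want="$1" '
    $2 != "Ready" { printf "%s: status %s\n", $1, $2; bad = 1 }
    $5 != want    { printf "%s: version %s, want %s\n", $1, $5, want; bad = 1 }
    END { if (bad || NR == 0) exit 1 }
  '
}

# Example with made-up rows: one worker has not been upgraded yet.
printf '%s\n' \
  'k8s-cp-1 Ready control-plane 5d v1.31.0' \
  'k8s-worker-1 Ready <none> 5d v1.30.0' \
  | check_nodes v1.31.0 || echo "green is not fully upgraded yet"
```

A node that is cordoned shows up as `Ready,SchedulingDisabled` and is reported too, which is usually what you want before cutting traffic over.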

Decision time

Green works — cut over

Shut down blue. Reassign green's network config to match blue's. Green becomes the new production. Blue becomes the rollback.

# Shut down blue
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  virsh shutdown ${vm}
done

# Green is now production.
# Blue VMs are still there — instant rollback if needed.
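`virsh shutdown` only asks the guest to power off and returns immediately, so it helps to confirm that blue is really down before reassigning its network config to green. A minimal sketch, polling `virsh domstate` once per second; the function name `wait_for_shutoff` and the 120-second default are assumptions.

```shell
# Wait until a VM reports "shut off".
# Usage: wait_for_shutoff <vm> [timeout_seconds]
# Returns 0 once the VM is off, 1 if the timeout expires first.
wait_for_shutoff() {
  _vm="$1"
  _left="${2:-120}"
  while [ "${_left}" -gt 0 ]; do
    if [ "$(virsh domstate "${_vm}" 2>/dev/null)" = "shut off" ]; then
      return 0
    fi
    sleep 1
    _left=$(( _left - 1 ))
  done
  return 1
}

# Example (run on the KVM host, after the shutdown loop above):
#   for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
#     wait_for_shutoff "${vm}" || echo "${vm} is still running"
#   done
```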

Green is broken — destroy it

Destroy green. Blue never stopped running. Zero downtime. Zero impact. Try again tomorrow.

# Destroy green — blue never stopped running
for vm in green-k8s-cp-{1,2,3} green-k8s-worker-{1,2,3} green-db-1 green-monitor; do
  virsh destroy ${vm} 2>/dev/null
  virsh undefine ${vm} --nvram 2>/dev/null
  zfs destroy rpool/vms/${vm}
done

# Blue is untouched. Nothing happened. Try again later.
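Because the teardown ends in `zfs destroy`, a typo in the VM name is costly. One possible safeguard is a wrapper that refuses any name without the `green-` prefix and can print the commands instead of running them. This is a sketch: `run`, `destroy_green_vm`, and the `DRY_RUN` variable are names invented here, and the dataset path follows the `rpool/vms/<name>` layout used above.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Tear down one green VM; refuse anything not named "green-...".
destroy_green_vm() {
  case "$1" in
    green-?*) ;;
    *) echo "refusing to destroy non-green VM: $1" >&2; return 1 ;;
  esac
  run virsh destroy "$1" 2>/dev/null || true          # ignore "not running"
  run virsh undefine "$1" --nvram 2>/dev/null || true
  run zfs destroy "rpool/vms/$1"
}

# Dry run: print what would happen without touching anything.
DRY_RUN=1
destroy_green_vm green-db-1
destroy_green_vm db-1 || echo "blue VM left alone"
```

Unset `DRY_RUN` (or set it to 0) to execute the commands for real.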
This is the key insight: the clone IS the test environment AND the potential new production. You didn't build a second cluster; you duplicated the first one. The green cluster has the same data, the same configs, and the same state as blue at the moment of cloning, so the upgrade runs on real data, not synthetic test data. If it works on green, you can be confident it will work in production, because green IS production (plus the upgrade). It is hard to get a more realistic upgrade test, and it takes about 10 seconds to set up.

Beyond upgrades: what else you can clone

Test a database migration

# Clone just the database
kvm-clone db-1 db-test

# Run the migration on db-test
# If it works: apply to db-1
# If it breaks: destroy db-test, nothing happened

Load testing

# Clone the entire cluster
# Point a load generator at green
# See how it performs under stress
# Destroy green when done

Security audit

# Clone the cluster
# Run penetration tests against green
# No risk to production — green is disposable

Training / onboarding

# Clone the cluster for each new team member
# They get a full copy of production to learn on
# Can't break anything real
# Destroy when done
Every one of these use cases would traditionally require building a separate environment — hours of work, dedicated hardware, configuration drift from production. With ZFS clones, it's kvm-clone and go. The clone is production-identical by definition because it IS production at the block level. When you're done, zfs destroy and the space comes back instantly. This is what "infrastructure as cattle, not pets" actually means when you have the right storage layer underneath.

Per-node rollback with kvm-snap

Blue/green is for cluster-wide changes. For single-node issues, use per-VM snapshots:

# Before updating a single node
kvm-snap k8s-worker-2

# Apply the change
ssh root@k8s-worker-2 "dnf update -y && reboot"

# If it breaks
kvm-snap k8s-worker-2 rollback

# The node is back to its exact pre-update state.
# Kubernetes reschedules pods automatically.

DR: replicate the cluster to another site

# Replicate every VM to the DR host over WireGuard
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-replicate rpool/vms/${vm} dr-host
done

# Run hourly with a systemd timer for continuous DR
# RPO = 1 hour. RTO = time to boot VMs on the DR host (minutes).
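One way to wire up the hourly run is a oneshot service plus a timer. The sketch below writes both unit files into a directory of your choice so you can inspect them before installing them. The unit name `kvm-replicate-cluster` and the script path `/usr/local/sbin/replicate-cluster.sh` (a file holding the loop above) are assumptions for illustration, not something kldload is known to ship.

```shell
# Write a oneshot service and an hourly timer into the given directory.
# Usage: write_dr_units <output-dir>
write_dr_units() {
  mkdir -p "$1" || return 1

  cat > "$1/kvm-replicate-cluster.service" <<'EOF'
[Unit]
Description=Replicate cluster VMs to the DR host

[Service]
Type=oneshot
# Hypothetical script containing the kvm-replicate loop.
ExecStart=/usr/local/sbin/replicate-cluster.sh
EOF

  cat > "$1/kvm-replicate-cluster.timer" <<'EOF'
[Unit]
Description=Hourly DR replication

[Timer]
OnCalendar=hourly
# Catch up after downtime if a scheduled run was missed.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

write_dr_units "${TMPDIR:-/tmp}/dr-units"
```

After reviewing the files, copy them to `/etc/systemd/system`, run `systemctl daemon-reload`, and enable the timer with `systemctl enable --now kvm-replicate-cluster.timer`.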
DR replication builds on the same ZFS snapshot machinery that makes kvm-clone possible, this time shipping snapshots to another host with ZFS send/receive. The first send is a full copy. Every subsequent send is incremental: only the blocks that changed since the previous snapshot. An 8-node cluster with 70GB of total disk that changes 2GB/hour sends about 2GB/hour to the DR site. Not 70GB. Not a full backup. Just the delta. Snapshot-based replication of this kind is a feature that enterprise storage vendors such as NetApp and Pure Storage charge for. In ZFS it's built in.
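The difference adds up quickly over a day. A small worked example using the figures from the paragraph above, and assuming the change rate stays constant at 2GB/hour:

```shell
full_gb=70        # total cluster disk, from the text
delta_gb_hour=2   # changed data per hour, from the text
hours=24

incremental_day=$(( delta_gb_hour * hours ))
full_every_hour=$(( full_gb * hours ))

echo "incremental sends per day: ${incremental_day}GB"
echo "full copies every hour:    ${full_every_hour}GB"
```

That is 48GB per day for incremental sends, against 1680GB per day if every hourly run shipped a full copy.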

Where to go next