Cluster Setup & Blue/Green Deployments
This tutorial builds a sample cluster on a single kldload KVM host, then shows the ZFS superpower that makes kldload unique: duplicating entire infrastructure instantly for blue/green deployments, upgrade testing, and disaster recovery. Everything uses tools that ship with the kvm profile — kvm-create, kvm-clone, kvm-snap.
The idea: On traditional infrastructure, testing an upgrade means
building a second environment from scratch — hours of work, double the hardware.
On kldload, you kvm-clone every VM in your cluster in seconds.
The clone is a byte-identical copy that uses almost no extra disk space until it diverges (ZFS copy-on-write).
Run the upgrade on the clone. If it works, cut traffic over. If it doesn't,
destroy the clone. Zero risk. Zero wasted time. Zero wasted disk.
The sample cluster
We'll build an 8-node cluster on a single kldload KVM host, then clone the entire thing for blue/green upgrades.
| VM | Role | RAM | WG IP (wg1) | Purpose |
|---|---|---|---|---|
| k8s-cp-1 | control plane | 4GB | 10.200.0.1 | K8s API + etcd |
| k8s-cp-2 | control plane | 4GB | 10.200.0.2 | K8s API + etcd |
| k8s-cp-3 | control plane | 4GB | 10.200.0.3 | K8s API + etcd |
| k8s-worker-1 | worker | 8GB | 10.200.0.11 | Application pods |
| k8s-worker-2 | worker | 8GB | 10.200.0.12 | Application pods |
| k8s-worker-3 | worker | 8GB | 10.200.0.13 | Application pods |
| db-1 | database | 8GB | 10.200.0.20 | PostgreSQL on ZFS |
| monitor | observability | 4GB | 10.200.0.30 | Prometheus + Grafana |
Total: 48GB RAM. Fits on a single 64GB kldload KVM host.
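The RAM budget can be checked with a few lines of shell. This is a minimal sketch: the per-VM figures are copied from the table above, and the variable names (`vms`, `host_gb`, `total`) are illustrative only.

```shell
# name:GB pairs, copied from the table above
vms="k8s-cp-1:4 k8s-cp-2:4 k8s-cp-3:4 k8s-worker-1:8 k8s-worker-2:8 k8s-worker-3:8 db-1:8 monitor:4"
host_gb=64

total=0
for entry in $vms; do
  # ${entry##*:} strips everything up to the last colon, leaving the GB value
  total=$(( total + ${entry##*:} ))
done

echo "VMs: ${total}GB, host headroom: $(( host_gb - total ))GB"
# prints: VMs: 48GB, host headroom: 16GB
```

The 16GB of headroom is what remains for the host OS and the ZFS ARC, so adding a node means re-running this sum first.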
Step 1 — Build golden images
# Golden image for K8s nodes (see the Kubernetes on KVM tutorial for full bake steps)
kvm-create golden-k8s --ram 4096 --cpus 4 --disk 40 \
  --iso /var/lib/libvirt/isos/kldload-free-latest.iso --os centos-stream9
# Install kldload (server profile), then inside the VM:
# - install containerd, kubeadm, kubectl, kubelet
# - pre-pull K8s images
# - seal: clear machine-id, hostname, SSH keys
# - poweroff
kvm-snap golden-k8s # snapshot the golden image
# Golden image for services (database, monitoring)
kvm-create golden-svc --ram 4096 --cpus 2 --disk 40 \
  --iso /var/lib/libvirt/isos/kldload-free-latest.iso --os centos-stream9
# Install kldload (server profile), install PostgreSQL, Prometheus, Grafana
# Seal and snapshot
kvm-snap golden-svc
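The "seal" step named in the comments above (clear machine-id, hostname, SSH keys) might look roughly like the following inside the golden VM. This is a sketch under assumptions, not kldload's own sealing procedure. It assumes a systemd-based guest, and the `seal_image` function and its optional ROOT argument are hypothetical, included so the steps can be tried against a scratch directory first.

```shell
# seal_image [ROOT]: strip per-machine identity from a filesystem tree.
# With no argument it acts on / (run as root inside the golden VM,
# immediately before poweroff). ROOT is a hypothetical convenience for dry runs.
seal_image() {
  root="${1:-}"
  # Empty the machine-id so systemd generates a fresh one on each clone's first boot.
  : > "${root}/etc/machine-id"
  # Remove SSH host keys. On CentOS Stream the sshd-keygen units normally
  # recreate them at the next boot, so every clone gets its own keys.
  rm -f "${root}"/etc/ssh/ssh_host_*
  # Reset the hostname; each clone gets its real name once it is running.
  echo "localhost" > "${root}/etc/hostname"
}
```

Inside the golden VM the call would be `seal_image` followed by `poweroff`. For a rehearsal, `seal_image /tmp/scratch-root` touches only the scratch tree.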
Step 2 — Clone the cluster (10 seconds)
# Clone K8s control plane nodes
for i in 1 2 3; do
  kvm-clone golden-k8s "k8s-cp-${i}"
done
# Clone K8s workers
for i in 1 2 3; do
  kvm-clone golden-k8s "k8s-worker-${i}"
done
# Clone service nodes
kvm-clone golden-svc db-1
kvm-clone golden-svc monitor
# 8 VMs created. Total time: ~10 seconds.
# Total extra disk used: ~0 bytes (ZFS CoW — shares blocks with golden images)
# Start everything
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  virsh start "${vm}"
done
# Set hostnames (assumes each VM has finished booting and its name resolves from the host)
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  ssh "root@${vm}" "hostnamectl set-hostname ${vm}"
done
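The same eight VM names recur in every loop in this tutorial, relying on bash brace expansion. A small helper can generate the list once and take an optional prefix for the green copies. This is a sketch; `cluster_vms` is a hypothetical name, not a tool that ships with kldload.

```shell
# cluster_vms [PREFIX]: print the eight cluster VM names, one per line.
# PREFIX is prepended to each name (for example "green-").
cluster_vms() {
  prefix="${1:-}"
  for i in 1 2 3; do echo "${prefix}k8s-cp-${i}"; done
  for i in 1 2 3; do echo "${prefix}k8s-worker-${i}"; done
  echo "${prefix}db-1"
  echo "${prefix}monitor"
}

# List the blue names, then the green names.
cluster_vms
cluster_vms green-

# Typical use in a loop (not executed here):
#   for vm in $(cluster_vms green-); do virsh start "$vm"; done
```

Keeping the list in one place means a ninth node is added in the function rather than in every loop.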
Step 3 — Configure the cluster
With all 8 VMs running, configure WireGuard mesh, K8s, database, and monitoring. See the individual tutorials for each component:
- WireGuard Mesh — connect all nodes with encrypted tunnels
- Kubernetes on KVM — kubeadm init + join on the cloned nodes
- Cilium — eBPF networking for the K8s cluster
- Backplane Networks — bind services to WG interfaces only
- Observability — Prometheus + Grafana on the monitor node
# Once everything is configured and running, snapshot the entire cluster
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-snap "${vm}"
done
# This is your "known-good" baseline. Every VM has a rollback point.
To add a node later, run kvm-clone golden-k8s k8s-worker-4. The "known-good" snapshots mean you can roll any node back to this exact state. Now comes the fun part.
The blue/green exercise: duplicate the entire cluster
The exercise: You need to upgrade Kubernetes from 1.30 to 1.31. On traditional infrastructure, you either upgrade in-place (risky — no rollback) or build a second cluster from scratch (hours of work, double the hardware). On kldload, you clone the entire cluster in 10 seconds.
Clone blue → green
# Clone every blue VM to create the green cluster
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-clone "${vm}" "green-${vm}"
done
# 8 green VMs created. Total time: ~10 seconds.
# Total extra disk: ~0 bytes until green diverges from blue.
# Check disk usage — clones share blocks with the originals
zfs list -o name,used,refer -r rpool/vms | grep -E 'k8s|db|monitor|green'
# NAME USED REFER
# rpool/vms/k8s-cp-1 8.2G 8.2G
# rpool/vms/green-k8s-cp-1 64K 8.2G ← 64K! shares blocks with blue
# rpool/vms/k8s-worker-1 12.1G 12.1G
# rpool/vms/green-k8s-worker-1 64K 12.1G ← same — near-zero until it diverges
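To watch how far green has diverged over time, the USED column can be summed across the green datasets. The sketch below assumes the `rpool/vms/<name>` layout shown above and relies on `zfs list -H` (tab-separated, no header) and `-p` (exact byte values). `sum_green_used` is a hypothetical helper name.

```shell
# sum_green_used: read "name<TAB>used_bytes" lines on stdin and print the
# total bytes used by datasets whose last path component starts with "green-".
sum_green_used() {
  awk -F '\t' '$1 ~ "/green-[^/]*$" { total += $2 } END { printf "%.0f\n", total }'
}

# Demo with the figures from the listing above (64K = 65536 bytes per fresh clone).
printf 'rpool/vms/k8s-cp-1\t8804682956\nrpool/vms/green-k8s-cp-1\t65536\nrpool/vms/green-k8s-worker-1\t65536\n' \
  | sum_green_used
# prints: 131072

# Real use on the KVM host (not executed here):
#   zfs list -Hp -o name,used -r rpool/vms | sum_green_used
```

Running it before and after the upgrade shows how much new data the green cluster has written.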
Upgrade green
# Start the green cluster (with different network config so it doesn't conflict)
for vm in green-k8s-cp-{1,2,3} green-k8s-worker-{1,2,3} green-db-1 green-monitor; do
  virsh start "${vm}"
done
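# SSH into the green control plane and upgrade K8s (the commands below run inside that session)
ssh root@green-k8s-cp-1
# Upgrade the kubeadm package to 1.31 first, then:
kubeadm upgrade plan
kubeadm upgrade apply v1.31.0
# ... upgrade kubelet on all green nodes ...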
# Test green: run your test suite, check pod health, verify services
kubectl get nodes
kubectl get pods --all-namespaces
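The "test green" step can be turned into a simple pass/fail gate before the cutover decision. The sketch below counts nodes whose STATUS column is anything other than exactly `Ready`. `not_ready_count` is a hypothetical helper, and the sample input is illustrative rather than captured output.

```shell
# not_ready_count: read `kubectl get nodes --no-headers` output on stdin and
# print how many nodes have a STATUS (second column) other than exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are deliberately counted too.
not_ready_count() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Demo with illustrative input: one node of three is NotReady.
printf 'green-k8s-cp-1 Ready control-plane 5d v1.31.0\ngreen-k8s-worker-1 NotReady <none> 5d v1.31.0\ngreen-k8s-worker-2 Ready <none> 5d v1.31.0\n' \
  | not_ready_count
# prints: 1

# Real use inside the green control plane (not executed here):
#   [ "$(kubectl get nodes --no-headers | not_ready_count)" -eq 0 ] && echo "green is healthy"
```

A node check alone is not a full test suite; it only catches the case where the upgrade left kubelets unable to rejoin.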
Decision time
Green works — cut over
Shut down blue. Reassign green's network config to match blue's. Green becomes the new production. Blue becomes the rollback.
# Shut down blue
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  virsh shutdown "${vm}"
done
# Green is now production.
# Blue VMs are still there — instant rollback if needed.
Green is broken — destroy it
Destroy green. Blue never stopped running. Zero downtime. Zero impact. Try again tomorrow.
# Destroy green — blue never stopped running
for vm in green-k8s-cp-{1,2,3} green-k8s-worker-{1,2,3} green-db-1 green-monitor; do
  virsh destroy "${vm}" 2>/dev/null
  virsh undefine "${vm}" --nvram 2>/dev/null
  zfs destroy "rpool/vms/${vm}"
done
# Blue is untouched. Nothing happened. Try again later.
Beyond upgrades: what else you can clone
Test a database migration
# Clone just the database
kvm-clone db-1 db-test
# Run the migration on db-test
# If it works: apply to db-1
# If it breaks: destroy db-test, nothing happened
Load testing
# Clone the entire cluster
# Point a load generator at green
# See how it performs under stress
# Destroy green when done
Security audit
# Clone the cluster
# Run penetration tests against green
# No risk to production — green is disposable
Training / onboarding
# Clone the cluster for each new team member
# They get a full copy of production to learn on
# Can't break anything real
# Destroy when done
kvm-clone and go. The clone is production-identical by definition because it IS production at the block level. When you're done, zfs destroy and the space comes back instantly. This is what "infrastructure as cattle, not pets" actually means when you have the right storage layer underneath.
Per-node rollback with kvm-snap
Blue/green is for cluster-wide changes. For single-node issues, use per-VM snapshots:
# Before updating a single node
kvm-snap k8s-worker-2
# Apply the change
ssh root@k8s-worker-2 "dnf update -y && reboot"
# If it breaks
kvm-snap k8s-worker-2 rollback
# The node is back to its exact pre-update state.
# Kubernetes reschedules pods automatically.
DR: replicate the cluster to another site
# Replicate every VM to the DR host over WireGuard
for vm in k8s-cp-{1,2,3} k8s-worker-{1,2,3} db-1 monitor; do
  kvm-replicate "rpool/vms/${vm}" dr-host
done
# Run hourly with a systemd timer for continuous DR
# RPO = 1 hour. RTO = time to boot VMs on the DR host (minutes).
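The hourly systemd timer mentioned above could be set up along these lines. This is a sketch under assumptions: the unit names and the wrapper script path `/usr/local/sbin/kvm-dr-replicate.sh` (which would contain the replication loop) are hypothetical, and `write_dr_units` only writes the unit files into a directory you choose.

```shell
# write_dr_units DIR: write a oneshot service and an hourly timer into DIR.
# The unit names and the ExecStart script path are illustrative assumptions.
write_dr_units() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/kvm-dr-replicate.service" <<'EOF'
[Unit]
Description=Replicate cluster VMs to the DR host

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/kvm-dr-replicate.sh
EOF
  cat > "$dir/kvm-dr-replicate.timer" <<'EOF'
[Unit]
Description=Hourly DR replication

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Real use on the KVM host (not executed here):
#   write_dr_units /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now kvm-dr-replicate.timer
```

`Persistent=true` makes systemd run a missed replication soon after the host comes back up, which keeps the effective RPO close to the one-hour target after a reboot.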
Where to go next
- KVM Virtual Machines — the kvm profile, kvm-create/clone/snap/replicate tools
- Kubernetes on KVM — golden image workflow, kubeadm, Cilium
- Backplane Networks Masterclass — build the invisible encrypted fabric
- ZFS Masterclass — advanced ZFS: send/receive, encryption, tuning
- Packer & IaC Masterclass — automate image builds, deploy to any cloud