AWS has 200+ services. You need about 12 of them. The rest exist to lock you in,
bill you for breathing, and make your infrastructure so entangled with proprietary
APIs that leaving feels like open-heart surgery. Every single critical service
AWS provides has an open-source equivalent that runs on bare metal. VPC? That's
VXLAN + Open vSwitch. Route 53? PowerDNS. ELB? HAProxy. EC2? KVM. S3? MinIO.
The difference is: when you build it yourself, there's no egress fee, no surprise
bill, and no vendor who can raise prices 30% because they feel like it.
This recipe is the advanced tier. Build Your Own Cloud
got your services running. Multi-Site Cloud replicated
them across regions. This page turns that into a production-grade cloud platform
with enterprise networking, dynamic routing, overlay networks, multi-tenant isolation,
load balancing, and an API-driven control plane. The full stack. No training wheels.
This is the capstone recipe. It ties together every masterclass on the site: ZFS for storage, WireGuard for encrypted transport, BGP for dynamic routing, VXLAN/EVPN for overlay networking, Cilium for K8s networking, eBPF for observability, nftables for firewalling, DNS for service discovery, systemd for service management, Packer for image automation, and backplane networking for the invisible infrastructure underneath. If you've read the masterclasses, this page shows you what it looks like when they're all running together as one platform.
Prerequisites: You should have completed the
Multi-Site Cloud recipe first. This builds on that
foundation — WireGuard mesh, ZFS replication, and multi-node infrastructure
are assumed to be in place.
What you're replacing
| AWS Service | Open-Source Replacement | What it actually does |
|---|---|---|
| VPC / Subnets | VXLAN + Open vSwitch | Virtual network overlays with tenant isolation |
| Route Tables / Transit Gateway | FRRouting (BGP + OSPF) | Dynamic routing between sites and networks |
| ELB / ALB / NLB | HAProxy + keepalived | Layer 4/7 load balancing with health checks |
| Route 53 | PowerDNS + CoreDNS | Authoritative + internal DNS with API |
| EC2 | KVM + libvirt | Virtual machines on bare metal |
| ECS / Fargate | Nomad or Kubernetes | Container orchestration |
| S3 | MinIO | S3-compatible object storage on ZFS |
| CloudWatch | Prometheus + Grafana + Loki | Metrics, dashboards, log aggregation |
| IAM | Keycloak | Identity, SSO, RBAC, OIDC |
| ACM (certificates) | step-ca + ACME | Internal PKI and automatic cert issuance |
| CloudFormation | Terraform + Ansible | Infrastructure as code |
| API Gateway | Kong or Traefik | API routing, rate limiting, auth |
By this recipe's own estimate, the equivalent AWS bill for this stack across 3 regions is $3,000–8,000/month.
On bare metal, the same capabilities cost $135–400/month in rentals. And you own it.
Architecture
This is no longer "three servers with WireGuard." This is a proper cloud fabric —
overlay networks carrying tenant traffic, underlay networks carrying control plane
traffic, dynamic routing protocols making forwarding decisions, and a load balancer
tier accepting traffic from the internet. Every component you'd find in an AWS
region, except it's open source and you can actually read the config files.
The underlay is the physical network (or WireGuard tunnels between sites). It carries
control plane traffic: BGP route advertisements, ZFS replication, SSH management.
The overlay is VXLAN — virtual Layer 2 networks that ride on top of the underlay.
Tenant VMs and containers live on the overlay. They think they're on their own private
LAN, but they're actually encapsulated in UDP packets flying between sites.
The underlay is the highway system. The overlay is the postal system — letters (packets) ride inside trucks (VXLAN tunnels) on the highway, but the sender and receiver only see addresses.
Why not just use more WireGuard tunnels?
WireGuard is perfect for site-to-site and remote access. But it's point-to-point: a full mesh
of N endpoints needs N(N-1)/2 tunnels, which grows on the order of N², and as a Layer 3 tunnel
it doesn't do multi-tenancy, broadcast domains, or dynamic membership. VXLAN + OVS gives you
thousands of isolated virtual networks, dynamic VTEP discovery via BGP EVPN, and the ability to
live-migrate VMs between hosts without reconfiguring anything. It's the same overlay pattern
large data centers run under the hood.
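To make the scaling argument concrete, here is a small sketch of the full-mesh arithmetic. The function name `mesh_tunnels` is illustrative and not part of any tool used in this recipe.

```shell
# Tunnels needed for a full mesh of N endpoints: N(N-1)/2.
mesh_tunnels() {
    local n="$1"
    echo $(( n * (n - 1) / 2 ))
}

# Compare a three-site setup with larger meshes.
for sites in 3 10 30; do
    echo "$sites endpoints -> $(mesh_tunnels "$sites") tunnels"
done
```

Three sites need 3 tunnels, which is manageable by hand. Thirty endpoints need 435, which is where an overlay with a control plane starts to pay for itself.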
Step 1: FRRouting — dynamic routing with BGP and OSPF
Static routes are fine for three servers. They're a nightmare for thirty. FRRouting
is the open-source routing suite widely deployed by ISPs, internet exchanges, and data
center operators. It speaks BGP, OSPF, IS-IS, BFD, and EVPN, the same protocols that
route the actual internet. We're going to use it for two things: OSPF for fast
internal convergence within a site, and BGP for policy-based routing between sites.
OSPF vs. BGP — when to use which
OSPF (Open Shortest Path First) is a link-state protocol. Every router knows the
complete topology and calculates shortest paths itself. It converges fast (sub-second with BFD)
and is perfect for internal routing within a site or campus. BGP (Border Gateway Protocol)
is a path-vector protocol. It's what connects autonomous systems on the internet — and it's
what connects your sites. BGP gives you policy control: prefer one path over another, prepend
AS paths to influence traffic, and gracefully drain a site before maintenance.
OSPF is GPS navigation within a city — it knows every street and picks the fastest route. BGP is the highway system between cities — it knows which highways exist and lets you choose based on policy (toll roads, speed, congestion).
# Install FRRouting on all nodes (CentOS/RHEL/Rocky)
dnf install -y frr frr-pythontools
# Enable the daemons we need
sed -i 's/bgpd=no/bgpd=yes/' /etc/frr/daemons
sed -i 's/ospfd=no/ospfd=yes/' /etc/frr/daemons
sed -i 's/bfdd=no/bfdd=yes/' /etc/frr/daemons
sed -i 's/zebra=no/zebra=yes/' /etc/frr/daemons
systemctl enable --now frr
OSPF — internal routing within each site
OSPF discovers neighbors automatically and builds a complete map of the internal
network. When a link goes down, BFD flags it in well under a second and every router learns of it
and recalculates paths. No manual route updates. No "oh, someone forgot to add
a static route" at 3am.
# Site A — /etc/frr/frr.conf (OSPF section)
cat >> /etc/frr/frr.conf << 'FRR'
!
! ─── OSPF: internal routing ───────────────────────────
router ospf
ospf router-id 10.10.0.1
! Advertise all internal networks
network 10.10.0.0/24 area 0.0.0.0
network 10.100.0.0/16 area 0.0.0.0
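! Only form adjacencies on the transport and management interfaces
passive-interface default
no passive-interface wg0
no passive-interface br-mgmt
!
! Fast convergence: tie OSPF adjacencies to BFD sessions
interface wg0
ip ospf bfd
!
interface br-mgmt
ip ospf bfd
!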
! BFD — sub-second failure detection
bfd
peer 10.10.0.2
no shutdown
!
peer 10.10.0.3
no shutdown
!
!
FRR
BGP — inter-site routing with policy
Each site gets its own private ASN (65001, 65002, 65003). BGP peers over the
WireGuard mesh. This gives you fine-grained control over which site handles
which traffic, the ability to drain a site for maintenance, and automatic
failover when a site goes down.
# Site A (AS 65001) — /etc/frr/frr.conf (BGP section)
cat >> /etc/frr/frr.conf << 'FRR'
!
! ─── BGP: inter-site routing ──────────────────────────
router bgp 65001
bgp router-id 10.10.0.1
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
!
! Neighbors — peer over WireGuard
neighbor 10.10.0.2 remote-as 65002
neighbor 10.10.0.2 description Site-B-Frankfurt
neighbor 10.10.0.2 bfd
neighbor 10.10.0.2 timers 10 30
!
neighbor 10.10.0.3 remote-as 65003
neighbor 10.10.0.3 description Site-C-HomeLab
neighbor 10.10.0.3 bfd
neighbor 10.10.0.3 timers 10 30
!
! Address family — advertise service networks
address-family ipv4 unicast
network 10.100.0.0/16
network 172.20.0.0/14
! Prefer local exit (lower MED = preferred)
neighbor 10.10.0.2 route-map SITE-B-OUT out
neighbor 10.10.0.3 route-map SITE-C-OUT out
! Accept all from peers
neighbor 10.10.0.2 route-map ACCEPT-ALL in
neighbor 10.10.0.3 route-map ACCEPT-ALL in
exit-address-family
!
! EVPN address family — for VXLAN overlay routing
address-family l2vpn evpn
neighbor 10.10.0.2 activate
neighbor 10.10.0.3 activate
advertise-all-vni
exit-address-family
!
! ─── Route maps ───────────────────────────────────────
route-map ACCEPT-ALL permit 10
!
route-map SITE-B-OUT permit 10
set metric 100
!
route-map SITE-C-OUT permit 10
set metric 200
!
! ─── Prefix lists (safety) ────────────────────────────
ip prefix-list INTERNAL seq 10 permit 10.0.0.0/8 le 24
ip prefix-list INTERNAL seq 20 permit 172.16.0.0/12 le 24
ip prefix-list INTERNAL seq 100 deny any
!
FRR
# Reload FRR without restarting
systemctl reload frr
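The ASNs used here (65001 to 65003) come from the private 16-bit range reserved by RFC 6996 (64512 to 65534; the 32-bit private range is 4200000000 to 4294967294). A minimal sketch of a sanity check you could run before assigning an ASN to a new site. The helper name `is_private_asn` is hypothetical.

```shell
# Return success if the ASN falls in a private-use range (RFC 6996).
is_private_asn() {
    local asn="$1"
    # 16-bit private range
    if [ "$asn" -ge 64512 ] && [ "$asn" -le 65534 ]; then
        return 0
    fi
    # 32-bit private range
    if [ "$asn" -ge 4200000000 ] && [ "$asn" -le 4294967294 ]; then
        return 0
    fi
    return 1
}

for asn in 65001 65002 65003; do
    if is_private_asn "$asn"; then
        echo "AS $asn is private: safe to use between your own sites"
    else
        echo "AS $asn is NOT private: do not use it without an allocation"
    fi
done
```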
Need to reboot Site A for maintenance? Don't just pull the plug. Use BGP to
gracefully drain traffic first. Prepend the AS path to make Site A's routes
less preferred, and traffic for any prefix that another site also advertises
shifts there in seconds. Do your work. Remove the prepend. Traffic flows back.
This is standard practice among ISPs and large network operators.
It's like putting up a "lane closed ahead" sign. Traffic merges to the other lanes before you start construction, not after.
# Drain Site A before maintenance
vtysh << 'DRAIN'
configure terminal
route-map DRAIN-OUT permit 10
set as-path prepend 65001 65001 65001
!
router bgp 65001
address-family ipv4 unicast
neighbor 10.10.0.2 route-map DRAIN-OUT out
neighbor 10.10.0.3 route-map DRAIN-OUT out
exit-address-family
!
end
clear bgp * soft out
DRAIN
echo "Site A drained — traffic is now flowing through Site B"
echo "Wait 30 seconds for convergence, then do your maintenance"
# After maintenance — restore normal routing
vtysh << 'RESTORE'
configure terminal
router bgp 65001
address-family ipv4 unicast
neighbor 10.10.0.2 route-map SITE-B-OUT out
neighbor 10.10.0.3 route-map SITE-C-OUT out
exit-address-family
!
! Remove the drain policy only after the neighbors stop referencing it
no route-map DRAIN-OUT
end
clear bgp * soft out
RESTORE
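Why prepending drains a site: all else being equal, BGP prefers the route with the shorter AS path. The sketch below models only that one tie-breaker. Real best-path selection checks weight and local preference first, and MED and others afterwards, so treat this as an illustration with hypothetical helper names rather than FRR's algorithm.

```shell
# Count the ASNs in a space-separated AS path.
as_path_len() {
    # Intentional word splitting of the path into positional parameters.
    set -- $1
    echo "$#"
}

# Print whichever path is shorter; ties go to the first argument.
prefer_path() {
    local a="$1" b="$2"
    if [ "$(as_path_len "$a")" -le "$(as_path_len "$b")" ]; then
        echo "$a"
    else
        echo "$b"
    fi
}

# Seen from Site C, for a prefix that both Site A and Site B originate:
echo "normal:  $(prefer_path "65001" "65002")"
echo "drained: $(prefer_path "65001 65001 65001 65001" "65002")"
```

After the drain, Site A's path carries its own ASN four times (the original plus three prepends), so the path through AS 65002 wins.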
Step 2: VXLAN + Open vSwitch — the network fabric
This is the heart of the cloud. VXLAN (Virtual Extensible LAN) creates isolated
Layer 2 overlay networks on top of your physical/WireGuard underlay. Each tenant,
environment, or workload gets its own VXLAN segment identified by a VNI (VXLAN
Network Identifier). There are 16 million possible VNIs. AWS calls these "VPCs."
We call them "a few OVS commands."
What VXLAN actually does
Take an Ethernet frame from a VM. Wrap it in a UDP packet. Send it across the
underlay to another host. Unwrap it. Deliver it to the destination VM. The VMs
think they're on the same Layer 2 switch, even if they're on different continents.
The encapsulation uses UDP port 4789, and each VXLAN segment is identified by a
24-bit VNI in the header, giving you 16,777,216 isolated networks. AWS defaults to a
quota of five VPCs per region. You get 16 million segments for free.
VXLAN is a tunnel that makes two switches on different continents look like they're the same switch. The VNI is the VLAN tag, but with 16 million possible values instead of 4,096.
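Two numbers fall out of the encapsulation described above: the size of the VNI space, and the MTU left for tenant traffic. A rough sketch, assuming an IPv4 underlay with no outer VLAN tag (50 bytes of overhead: inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20) and a WireGuard underlay MTU of 1420, the wg-quick default. The helper names are illustrative.

```shell
# Highest VNI representable in the 24-bit field.
VNI_MAX=$(( (1 << 24) - 1 ))

# Return success if the argument is a usable VNI.
vni_valid() {
    [ "$1" -ge 0 ] && [ "$1" -le "$VNI_MAX" ]
}

# MTU available inside the overlay for a given underlay MTU.
# Overhead: inner Ethernet (14) + VXLAN (8) + UDP (8) + outer IPv4 (20) = 50
vxlan_inner_mtu() {
    echo $(( $1 - 50 ))
}

echo "VNI space: $(( VNI_MAX + 1 )) segments"
echo "Overlay MTU over a 1500-byte LAN:      $(vxlan_inner_mtu 1500)"
echo "Overlay MTU over WireGuard (MTU 1420): $(vxlan_inner_mtu 1420)"
```

If tenant interfaces keep the default 1500-byte MTU on top of a 1420-byte WireGuard underlay, large packets will be fragmented or dropped, so it is worth setting the overlay MTU explicitly.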
# Install Open vSwitch on all nodes
dnf install -y openvswitch libibverbs
systemctl enable --now openvswitch
# Verify
ovs-vsctl show
Create the OVS bridge and VXLAN tunnels
# On Site A (10.10.0.1) — create the main OVS bridge
ovs-vsctl add-br br-overlay
# Add VXLAN tunnel ports to other sites
# key=flow means VNI is determined per-flow, not per-tunnel
ovs-vsctl add-port br-overlay vxlan-site-b -- \
set interface vxlan-site-b type=vxlan \
options:remote_ip=10.10.0.2 \
options:key=flow \
options:dst_port=4789
ovs-vsctl add-port br-overlay vxlan-site-c -- \
set interface vxlan-site-c type=vxlan \
options:remote_ip=10.10.0.3 \
options:key=flow \
options:dst_port=4789
# Verify tunnel setup
ovs-vsctl show
Create tenant networks (VNIs)
Each tenant or environment gets its own VNI. OpenFlow rules on the OVS bridge
enforce isolation — traffic from VNI 100 can never reach VNI 200 unless you
explicitly route between them.
# /usr/local/bin/cloud-network
cat > /usr/local/bin/cloud-network << 'SCRIPT'
#!/bin/bash
set -euo pipefail
ACTION="${1:-help}"
VNI="${2:-}"
NAME="${3:-}"
SUBNET="${4:-}"
case "$ACTION" in
create)
[ -z "$VNI" ] || [ -z "$NAME" ] || [ -z "$SUBNET" ] && {
echo "Usage: $0 create <vni> <name> <subnet>"
echo " e.g. $0 create 100 tenant-a 10.100.0.0/24"
exit 1
}
echo "Creating network: $NAME (VNI $VNI, subnet $SUBNET)"
# Create internal OVS port for this VNI
ovs-vsctl add-port br-overlay "vni-$VNI" \
tag="$VNI" -- \
set interface "vni-$VNI" type=internal
# Assign the gateway IP (first usable address)
GW_IP=$(echo "$SUBNET" | sed 's|0/|1/|')
ip addr add "$GW_IP" dev "vni-$VNI"
ip link set "vni-$VNI" up
# Add OpenFlow rules for VXLAN encap/decap
# Incoming: match VNI, deliver to local port
ovs-ofctl add-flow br-overlay \
"table=0,priority=100,tun_id=$VNI,actions=output:vni-$VNI"
# Outgoing: tag with VNI, send to all VXLAN tunnels
PORT_NUM=$(ovs-ofctl show br-overlay | grep "vni-$VNI" | awk -F'(' '{print $1}' | tr -d ' ')
ovs-ofctl add-flow br-overlay \
"table=0,priority=100,in_port=$PORT_NUM,actions=set_field:$VNI->tun_id,output:vxlan-site-b,output:vxlan-site-c"
# Enable DHCP for this network via dnsmasq
cat > "/etc/dnsmasq.d/vni-$VNI.conf" << EOF
interface=vni-$VNI
dhcp-range=${SUBNET%.*}.10,${SUBNET%.*}.250,255.255.255.0,12h
dhcp-option=option:router,${GW_IP%/*}
dhcp-option=option:dns-server,${SUBNET%.*}.1
EOF
systemctl restart dnsmasq
echo "Network $NAME (VNI $VNI) created"
echo " Gateway: $GW_IP"
echo " DHCP: ${SUBNET%.*}.10 - ${SUBNET%.*}.250"
;;
list)
echo "=== Active overlay networks ==="
ovs-vsctl list-ports br-overlay | grep "^vni-" | while read port; do
VNI_NUM="${port#vni-}"
IP=$(ip -4 addr show "$port" 2>/dev/null | grep inet | awk '{print $2}')
echo " VNI $VNI_NUM: $IP ($port)"
done
;;
delete)
[ -z "$VNI" ] && { echo "Usage: $0 delete <vni>"; exit 1; }
echo "Deleting network VNI $VNI"
ovs-ofctl del-flows br-overlay "tun_id=$VNI"
ovs-vsctl del-port br-overlay "vni-$VNI" 2>/dev/null || true
rm -f "/etc/dnsmasq.d/vni-$VNI.conf"
systemctl restart dnsmasq
echo "Network VNI $VNI deleted"
;;
*)
echo "Usage: $0 {create|list|delete} [VNI] [name] [subnet]"
echo ""
echo "Examples:"
echo " $0 create 100 production 10.100.0.0/24"
echo " $0 create 200 staging 10.200.0.0/24"
echo " $0 create 300 dev-team 10.30.0.0/24"
echo " $0 list"
echo " $0 delete 300"
;;
esac
SCRIPT
chmod +x /usr/local/bin/cloud-network
# Create the network fabric
cloud-network create 100 production 10.100.0.0/24
cloud-network create 200 staging 10.200.0.0/24
cloud-network create 300 development 10.30.0.0/24
cloud-network create 900 management 10.90.0.0/24
# Verify
cloud-network list
ovs-ofctl dump-flows br-overlay
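The script derives the gateway and DHCP range from the subnet with plain string manipulation, which only works for /24 networks. The sketch below isolates that logic into small helpers (hypothetical names) so the assumption is visible. Note that dnsmasq's `dhcp-option=option:router` takes a bare address, without the `/24` suffix that `ip addr add` wants.

```shell
# Strip the prefix length and the last octet: 10.100.0.0/24 -> 10.100.0
# Assumes a /24 subnet, like the cloud-network script.
net_prefix() {
    local net="${1%/*}"
    echo "${net%.*}"
}

# First usable address, as a bare IP (suitable for dnsmasq options).
gateway_ip() {
    echo "$(net_prefix "$1").1"
}

# dnsmasq dhcp-range value: .10 through .250 with a 12-hour lease.
dhcp_range() {
    local p
    p=$(net_prefix "$1")
    echo "$p.10,$p.250,255.255.255.0,12h"
}

echo "gateway:    $(gateway_ip 10.100.0.0/24)"
echo "dhcp-range: $(dhcp_range 10.100.0.0/24)"
```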
Attach VMs to overlay networks
# When creating a KVM VM, attach it to a VXLAN network:
# 1. Create an OVS port for the VM
ovs-vsctl add-port br-overlay "vm-web-01" tag=100 -- \
set interface "vm-web-01" type=internal
# 2. Use the port as the VM's network interface in libvirt XML:
#
#
#
#
#
#
#
# The VM lands on VNI 100 (production network), gets DHCP,
# and can talk to other VNI 100 VMs across all sites.
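A minimal sketch of a libvirt interface definition for an Open vSwitch bridge, assuming libvirt's native OVS support (`<virtualport type='openvswitch'/>`). With this form libvirt creates and tags the OVS port itself when the VM starts, so it may differ from the exact snippet intended for step 2. The function name `render_ovs_interface` is hypothetical.

```shell
# Print a libvirt <interface> element that plugs a VM into an OVS bridge
# on the given VLAN tag (used here to select the tenant network).
render_ovs_interface() {
    local bridge="$1" vlan="$2"
    cat <<XML
<interface type='bridge'>
  <source bridge='$bridge'/>
  <virtualport type='openvswitch'/>
  <vlan>
    <tag id='$vlan'/>
  </vlan>
  <model type='virtio'/>
</interface>
XML
}

# Example: an interface on br-overlay, tag 100 (production network).
render_ovs_interface br-overlay 100
```

The output can be pasted into the `<devices>` section of a domain definition, or saved to a file and hot-plugged with `virsh attach-device`.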
Step 3: BGP EVPN — distributed VXLAN control plane
So far, VXLAN tunnels are statically configured between sites. That works for
3 nodes. For 30, you need a control plane that automatically discovers which
VMs are on which hosts and populates MAC/IP tables accordingly. That's BGP EVPN
(Ethernet VPN) — the same protocol that data centers use to scale VXLAN to
thousands of hosts.
What EVPN actually solves
Without EVPN, every VXLAN host floods broadcast traffic to every other host to
learn MAC addresses — just like a physical switch, but across your WAN links.
That's expensive. EVPN uses BGP to advertise MAC/IP bindings: "VM with MAC
aa:bb:cc:dd:ee:ff and IP 10.100.0.5 is reachable via VTEP 10.10.0.1, VNI 100."
Every other host installs that entry in its forwarding table. No flooding.
No wasted bandwidth. Just targeted, unicast delivery.
Without EVPN, finding a VM is like shouting in a crowded room. With EVPN, it's like checking a phone book.
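The "phone book" idea can be modeled in a few lines. This is a toy sketch of what an EVPN MAC/IP advertisement conveys, not how FRR stores or exchanges routes. The table layout and function names are assumptions made for illustration.

```shell
NEWLINE='
'
# One line per advertisement: "<vni> <mac> <ip> <vtep>"
EVPN_TABLE=""

# Record that a MAC/IP lives behind a given VTEP on a given VNI.
evpn_advertise() {
    EVPN_TABLE="${EVPN_TABLE}$1 $2 $3 $4${NEWLINE}"
}

# Look up the VTEP for a MAC on a VNI. Fails if there is no entry,
# which is the case where a switch without EVPN would have to flood.
evpn_lookup() {
    printf '%s' "$EVPN_TABLE" | awk -v vni="$1" -v mac="$2" '
        $1 == vni && $2 == mac { print $4; found = 1; exit }
        END { exit !found }'
}

# The advertisement from the text above.
evpn_advertise 100 aa:bb:cc:dd:ee:ff 10.100.0.5 10.10.0.1

echo "VTEP for aa:bb:cc:dd:ee:ff on VNI 100: $(evpn_lookup 100 aa:bb:cc:dd:ee:ff)"
```

A lookup for the same MAC on VNI 200 returns nothing: entries are scoped per VNI, which is part of what keeps tenants isolated.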
Step 4: HAProxy + keepalived — production load balancing
Traffic from the internet needs to reach your services. HAProxy is the load
balancer that every high-traffic site secretly runs behind the scenes. It handles
Layer 4 (TCP) and Layer 7 (HTTP) load balancing, TLS termination, health checks,
rate limiting, and connection draining. Keepalived provides VRRP failover: if
the primary HAProxy node dies, the floating IP moves to the standby within a few
seconds (about three with the default 1-second advertisement interval).
Why HAProxy instead of nginx/Caddy?
Nginx and Caddy are great reverse proxies. HAProxy is a great load balancer.
The difference matters at scale: HAProxy has connection-aware health checks
(not just HTTP pings), graceful connection draining (finish in-flight requests
before removing a backend), sticky sessions, circuit breakers, and a runtime
API that lets you add/remove backends without reloading config. It is also known for
sustaining very large numbers of concurrent connections on modest hardware. There's a
reason it's been an industry standard for over 20 years.
# Install HAProxy and keepalived
dnf install -y haproxy keepalived
cat > /etc/haproxy/haproxy.cfg << 'HAPROXY'
# ═══════════════════════════════════════════════════════
# kldload Production Cloud — HAProxy configuration
# ═══════════════════════════════════════════════════════
global
log /dev/log local0
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 50000
user haproxy
group haproxy
daemon
# Modern TLS only
ssl-default-bind-ciphersuites TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256
ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
# Runtime API — add/remove backends without reload
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 5s
timeout client 30s
timeout server 30s
timeout http-keep-alive 10s
timeout check 10s
maxconn 10000
# Health check defaults
default-server inter 3s fall 3 rise 2
# ─── Stats dashboard ──────────────────────────────────
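listen stats
bind *:8404
mode http
# Prometheus metrics on /metrics (needs a build with the Prometheus exporter)
http-request use-service prometheus-exporter if { path /metrics }
stats enable
stats uri /
stats refresh 10s
# Allow admin actions from local connections only
stats admin if LOCALHOST
stats show-legends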
# ─── HTTPS frontend ───────────────────────────────────
frontend https-in
bind *:443 ssl crt /etc/haproxy/certs/ alpn h2,http/1.1
bind *:80
# Redirect HTTP to HTTPS
http-request redirect scheme https unless { ssl_fc }
# Rate limiting: 100 requests/10s per IP
# (http-request rules always run before use_backend, so they are listed first)
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
# Route by hostname
acl host_app hdr(host) -i app.example.com
acl host_api hdr(host) -i api.example.com
acl host_grafana hdr(host) -i grafana.example.com
acl host_minio hdr(host) -i s3.example.com
use_backend app-servers if host_app
use_backend api-servers if host_api
use_backend grafana if host_grafana
use_backend minio if host_minio
default_backend app-servers
# ─── Backend: application servers ─────────────────────
backend app-servers
balance roundrobin
option httpchk GET /health
http-check expect status 200
# Sticky sessions via cookie
cookie SERVERID insert indirect nocache
# Slow start: ramp traffic up over 60s when a server comes back
default-server inter 5s fall 3 rise 2 slowstart 60s
server app-a-1 10.100.0.10:8080 check cookie a1
server app-a-2 10.100.0.11:8080 check cookie a2
server app-b-1 10.200.0.10:8080 check cookie b1 backup
# ─── Backend: API servers ────────────────────────────
backend api-servers
balance leastconn
option httpchk GET /api/health
http-check expect status 200
server api-a-1 10.100.0.20:3000 check
server api-a-2 10.100.0.21:3000 check
server api-b-1 10.200.0.20:3000 check backup
# ─── Backend: Grafana ────────────────────────────────
backend grafana
balance roundrobin
option httpchk GET /api/health
server grafana-1 10.90.0.10:3000 check
# ─── Backend: MinIO ──────────────────────────────────
backend minio
balance leastconn
option httpchk GET /minio/health/live
http-check expect status 200
server minio-a-1 10.100.0.30:9000 check
server minio-a-2 10.100.0.31:9000 check
# ─── TCP frontend: PostgreSQL ────────────────────────
frontend postgres-in
mode tcp
bind *:5432
default_backend postgres-servers
backend postgres-servers
mode tcp
option pgsql-check user haproxy
server pg-primary 10.100.0.40:5432 check
server pg-standby 10.200.0.40:5432 check backup
HAPROXY
systemctl enable --now haproxy
# Runtime API examples — no config reload needed
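# Add a new backend server (HAProxy 2.5 or later). Dynamic servers start in
# maintenance mode with health checks disabled, so enable both afterwards.
echo "add server app-servers/app-a-3 10.100.0.12:8080 check" | \
socat stdio /run/haproxy/admin.sock
echo "enable health app-servers/app-a-3" | socat stdio /run/haproxy/admin.sock
echo "enable server app-servers/app-a-3" | socat stdio /run/haproxy/admin.sock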
# Drain a server before maintenance (finish in-flight, reject new)
echo "set server app-servers/app-a-1 state drain" | \
socat stdio /run/haproxy/admin.sock
# Check backend health
echo "show servers state" | socat stdio /run/haproxy/admin.sock
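The keepalived half of this step can be sketched as follows. This is a minimal VRRP configuration under stated assumptions: the interface name (`eth0`), the floating IP (`203.0.113.10/24`, a documentation address), the `virtual_router_id`, and the priorities are all placeholders to replace with your own values. The helper `render_keepalived_conf` is hypothetical and only prints the config.

```shell
# Print a keepalived.conf for one load balancer node.
# Args: state (MASTER|BACKUP), priority, interface, floating IP with prefix.
render_keepalived_conf() {
    local state="$1" priority="$2" iface="$3" vip="$4"
    cat <<CONF
# Demote this node if the haproxy process disappears
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_LB {
    state $state
    interface $iface
    virtual_router_id 51
    priority $priority
    advert_int 1
    virtual_ipaddress {
        $vip
    }
    track_script {
        chk_haproxy
    }
}
CONF
}

# Primary node (higher priority wins the election):
render_keepalived_conf MASTER 150 eth0 203.0.113.10/24
# On the standby, render with: BACKUP 100 eth0 203.0.113.10/24
```

Redirect the output to `/etc/keepalived/keepalived.conf` on each node and run `systemctl enable --now keepalived`. HAProxy on the standby may need `net.ipv4.ip_nonlocal_bind=1` if it binds to the floating IP explicitly rather than to `*`.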
Step 5: PowerDNS + CoreDNS — the naming layer
AWS charges you per million DNS queries. Per million. For looking up names.
PowerDNS handles authoritative DNS (your public zones) with a PostgreSQL backend
and an HTTP API for dynamic updates. CoreDNS handles internal service discovery —
every VM and container gets a DNS name automatically.
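One way the internal half might look: a CoreDNS Corefile that forwards the `consul` zone to a local Consul agent and everything else to an upstream resolver. This is a sketch under assumptions: it presumes a Consul agent answering DNS on `127.0.0.1:8600` and uses `9.9.9.9` as a placeholder upstream. The helper `render_corefile` is hypothetical and only prints the file.

```shell
# Print a Corefile. Args: Consul DNS address, upstream resolver.
render_corefile() {
    local consul_dns="$1" upstream="$2"
    cat <<COREFILE
# Service discovery: *.consul names are answered by the Consul agent
consul:53 {
    forward . $consul_dns
    cache 30
    errors
}

# Everything else goes to the upstream resolver
.:53 {
    forward . $upstream
    cache 30
    errors
}
COREFILE
}

render_corefile 127.0.0.1:8600 9.9.9.9
```

With this in place, a VM that uses CoreDNS as its resolver can look up `postgres.service.consul` without knowing about port 8600.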
Step 6: step-ca — internal PKI (your own certificate authority)
Every service in your cloud needs TLS. You're not going to get Let's Encrypt
certs for postgres-primary.cloud.internal. You need your own CA. step-ca
from Smallstep is an ACME-compatible certificate authority that runs on your
infrastructure. Services request certs automatically via the ACME protocol —
the same protocol Let's Encrypt uses — but against your internal CA.
# Install step CLI and step-ca
curl -fsSL https://dl.smallstep.com/gh-release/cli/docs-cli-install/v0.25.0/step-cli_amd64.rpm -o /tmp/step-cli.rpm
curl -fsSL https://dl.smallstep.com/gh-release/certificates/docs-ca-install/v0.25.0/step-ca_amd64.rpm -o /tmp/step-ca.rpm
dnf install -y /tmp/step-cli.rpm /tmp/step-ca.rpm
# Initialize the CA
step ca init \
--name "kldload Production CA" \
--provisioner admin \
--dns "ca.cloud.internal" \
--dns "10.90.0.1" \
--address ":8443" \
--deployment-type standalone
# Enable ACME provisioner (Let's Encrypt-compatible)
step ca provisioner add acme --type ACME
# Start the CA
systemctl enable --now step-ca
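# Any service can now request a certificate automatically.
# Note: step-ca caps TLS certificate lifetime at 24h by default, so raise the
# provisioner's maxTLSCertDuration claim before asking for 720h.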
step ca certificate "postgres.cloud.internal" \
server.crt server.key \
--ca-url https://ca.cloud.internal:8443 \
--root /root/.step/certs/root_ca.crt \
--not-after 720h
# Or use ACME (works with Caddy, HAProxy, Traefik, etc.):
# In Caddy:
# tls {
# ca https://ca.cloud.internal:8443/acme/acme/directory
# }
# Auto-renew with a cron job
cat > /etc/cron.d/cert-renew << 'EOF'
0 */12 * * * root step ca renew /etc/ssl/server.crt /etc/ssl/server.key --force 2>&1 | logger -t cert-renew
EOF
Step 7: Keycloak — identity and access management
AWS IAM is the thing that makes grown engineers cry. Keycloak does the same job —
SSO, RBAC, OIDC, SAML — but you can actually read the documentation without
needing a decoder ring. One login for Grafana, MinIO, Gitea, your apps, and
the cloud management API.
# Configure Grafana to use Keycloak SSO
cat >> /etc/grafana/grafana.ini << 'INI'
[auth.generic_oauth]
enabled = true
name = kldload SSO
client_id = grafana
client_secret = your-client-secret
scopes = openid profile email
auth_url = https://auth.example.com/realms/cloud/protocol/openid-connect/auth
token_url = https://auth.example.com/realms/cloud/protocol/openid-connect/token
api_url = https://auth.example.com/realms/cloud/protocol/openid-connect/userinfo
role_attribute_path = contains(realm_access.roles[*], 'admin') && 'Admin' || 'Viewer'
INI
# Configure MinIO to use Keycloak
mc admin config set homelab identity_openid \
config_url="https://auth.example.com/realms/cloud/.well-known/openid-configuration" \
client_id="minio" \
claim_name="policy" \
scopes="openid"
Step 8: Consul — service mesh and discovery
You have services running across three sites on overlay networks. How does
HAProxy know which backends are healthy? How does CoreDNS know which IP
belongs to postgres.cloud.internal? Consul. It's the glue — service
registration, health checking, KV store, and service mesh in one binary.
# Install Consul on all nodes
dnf install -y consul
# Server config (run on 3 or 5 nodes for quorum)
cat > /etc/consul.d/consul.hcl << 'HCL'
datacenter = "site-a"
data_dir = "/opt/consul"
server = true
bootstrap_expect = 3
bind_addr = "10.10.0.1"
client_addr = "0.0.0.0"
ui_config {
enabled = true
}
# WAN federation between sites
retry_join_wan = ["10.10.0.2", "10.10.0.3"]
# DNS interface for CoreDNS integration
ports {
dns = 8600
}
# Enable service mesh (Connect)
connect {
enabled = true
}
# TLS via step-ca
tls {
defaults {
ca_file = "/etc/consul.d/certs/ca.pem"
cert_file = "/etc/consul.d/certs/server.pem"
key_file = "/etc/consul.d/certs/server-key.pem"
verify_incoming = true
verify_outgoing = true
}
}
HCL
systemctl enable --now consul
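# Register a service with health checks.
# Consul does not load subdirectories of its config directory, so the
# definition lives directly in /etc/consul.d. The script check below also
# needs enable_local_script_checks = true in the agent config.
cat > /etc/consul.d/postgres.hcl << 'HCL'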
service {
name = "postgres"
port = 5432
tags = ["primary", "production"]
check {
id = "postgres-tcp"
name = "PostgreSQL TCP"
tcp = "localhost:5432"
interval = "5s"
timeout = "2s"
}
check {
id = "postgres-query"
name = "PostgreSQL Query"
args = ["/usr/local/bin/pg-health-check"]
interval = "10s"
timeout = "5s"
}
}
HCL
consul reload
# Query services
consul catalog services
consul catalog nodes -service=postgres
dig @127.0.0.1 -p 8600 postgres.service.consul SRV
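The "3 or 5 nodes for quorum" advice comes straight from majority arithmetic: a Raft cluster of N servers needs a majority to make progress, so it tolerates the loss of (N-1)/2 servers, rounded down. A quick sketch with illustrative function names:

```shell
# Servers that must agree for the cluster to make progress.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Servers that can fail while the cluster stays available.
failure_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

for servers in 1 3 4 5; do
    echo "$servers servers: quorum $(quorum_size "$servers"), tolerates $(failure_tolerance "$servers") failure(s)"
done
```

Four servers tolerate no more failures than three, which is why even-sized clusters are rarely worth the extra node.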
Step 9: Production observability stack
You can't run a cloud you can't see. The production stack is three pillars:
metrics (Prometheus), logs (Loki), and traces (Tempo). All feeding
into Grafana. All scraped automatically via Consul service discovery. No
more manually adding targets to prometheus.yml.
# Prometheus config — auto-discover services via Consul
cat > /etc/prometheus/prometheus.yml << 'PROM'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# ─── Auto-discovery from Consul ───────────────────────
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      # Use Consul service name as job label
      - source_labels: [__meta_consul_service]
        target_label: job
      # Use Consul node name as instance label
      - source_labels: [__meta_consul_node]
        target_label: instance
      # Add site label from Consul datacenter
      - source_labels: [__meta_consul_dc]
        target_label: site
      # Only scrape services tagged 'metrics'
      - source_labels: [__meta_consul_tags]
        regex: .*,metrics,.*
        action: keep
  - job_name: 'node-exporter'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['node-exporter']
    relabel_configs:
      - source_labels: [__meta_consul_node]
        target_label: instance
  # ─── ZFS-specific metrics ──────────────────────────
  - job_name: 'zfs-exporter'
    static_configs:
      - targets: ['10.10.0.1:9134', '10.10.0.2:9134', '10.10.0.3:9134']
        labels:
          tier: 'storage'
  # ─── HAProxy stats ────────────────────────────────
  - job_name: 'haproxy'
    static_configs:
      - targets: ['localhost:8404']
PROM
systemctl restart prometheus
Step 10: The control plane API
The difference between "a bunch of servers" and "a cloud" is an API. You need
a way to say "create a VM on VNI 100 with 4 CPUs and 8GB RAM" and have it
happen. Here's a minimal control plane that ties together everything we've built.
# /usr/local/bin/cloud-ctl — the cloud management CLI
cat > /usr/local/bin/cloud-ctl << 'SCRIPT'
#!/bin/bash
set -euo pipefail
CMD="${1:-help}"
shift || true
case "$CMD" in
# ─── Network operations ──────────────────────────
network)
cloud-network "$@"
;;
# ─── VM operations ───────────────────────────────
vm-create)
NAME="${1:?Usage: cloud-ctl vm-create <name> <vni> [cpus] [ram] [disk]}"
VNI="${2:?}"
CPUS="${3:-2}"
RAM="${4:-4}"
DISK="${5:-50}"
echo "=== Creating VM: $NAME ==="
echo " Network: VNI $VNI"
echo " Resources: ${CPUS} vCPUs, ${RAM}GB RAM, ${DISK}GB disk"
# Create a ZFS volume (zvol) for the VM disk; -V sets the volume size
zfs create -V "${DISK}G" "rpool/vms/$NAME"
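# Generate libvirt XML. This is a minimal definition: libvirt creates the OVS
# port itself (native Open vSwitch support) and tags it with the VNI, and the
# disk is the zvol created above. Add boot media or firmware settings as needed.
cat > "/tmp/$NAME.xml" << VMXML
<domain type='kvm'>
  <name>$NAME</name>
  <memory unit='GiB'>$RAM</memory>
  <vcpu>$CPUS</vcpu>
  <os>
    <type>hvm</type>
  </os>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/zvol/rpool/vms/$NAME'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br-overlay'/>
      <virtualport type='openvswitch'/>
      <vlan><tag id='$VNI'/></vlan>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
VMXML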
virsh define "/tmp/$NAME.xml"
virsh start "$NAME"
# Register with Consul
consul services register -name="vm-$NAME" \
-tag="vni-$VNI" -tag="compute" \
-meta="cpus=$CPUS" -meta="ram=${RAM}G"
echo "=== VM $NAME is running ==="
virsh dominfo "$NAME"
;;
vm-list)
echo "=== Virtual Machines ==="
virsh list --all
echo ""
echo "=== ZFS VM Volumes ==="
zfs list -r rpool/vms -o name,volsize,used 2>/dev/null || echo "No VM volumes"
;;
vm-destroy)
NAME="${1:?Usage: cloud-ctl vm-destroy <name>}"
echo "Destroying VM: $NAME"
virsh destroy "$NAME" 2>/dev/null || true
virsh undefine "$NAME" 2>/dev/null || true
zfs destroy "rpool/vms/$NAME" 2>/dev/null || true
ovs-vsctl del-port br-overlay "tap-$NAME" 2>/dev/null || true
consul services deregister -id="vm-$NAME" 2>/dev/null || true
echo "VM $NAME destroyed"
;;
vm-snapshot)
NAME="${1:?Usage: cloud-ctl vm-snapshot <name> [label]}"
LABEL="${2:-manual-$(date +%s)}"
echo "Snapshotting VM $NAME as $LABEL"
# ZFS snapshot is instant — the VM doesn't even notice
zfs snapshot "rpool/vms/$NAME@$LABEL"
echo "Snapshot created: rpool/vms/$NAME@$LABEL"
;;
vm-clone)
SRC="${1:?Usage: cloud-ctl vm-clone <src> <dst> <vni>}"
DST="${2:?}"
VNI="${3:?}"
echo "Cloning $SRC to $DST"
# Snapshot source, then clone — instant, copy-on-write
zfs snapshot "rpool/vms/$SRC@clone-$DST"
zfs clone "rpool/vms/$SRC@clone-$DST" "rpool/vms/$DST"
echo "Clone ready: $DST (near-zero space until data diverges)"
;;
# ─── Status ──────────────────────────────────────
status)
echo "=============================="
echo " Production Cloud Status"
echo " $(date '+%Y-%m-%d %H:%M:%S')"
echo "=============================="
echo ""
echo "--- Routing ---"
vtysh -c "show bgp summary" 2>/dev/null || echo "FRR not running"
echo ""
echo "--- Overlay Networks ---"
cloud-network list
echo ""
echo "--- VMs ---"
virsh list --all 2>/dev/null || echo "libvirt not running"
echo ""
echo "--- Services (Consul) ---"
consul catalog services 2>/dev/null || echo "Consul not running"
echo ""
echo "--- HAProxy ---"
echo "show stat" | socat stdio /run/haproxy/admin.sock 2>/dev/null | \
awk -F, '{printf " %-20s %-12s %s\n", $1, $2, $18}' | head -20 || \
echo "HAProxy not running"
echo ""
echo "--- ZFS ---"
zpool status -x
echo ""
echo "--- Storage ---"
zfs list -o name,used,avail,compressratio -r rpool | head -20
;;
*)
echo "cloud-ctl — kldload Production Cloud Management"
echo ""
echo "Usage: cloud-ctl <command> [args]"
echo ""
echo "Network:"
echo "  network create <vni> <name> <subnet>        Create overlay network"
echo "  network list                                List overlay networks"
echo "  network delete <vni>                        Delete overlay network"
echo ""
echo "Compute:"
echo "  vm-create <name> <vni> [cpus] [ram] [disk]  Create a VM"
echo "  vm-list                                     List all VMs"
echo "  vm-destroy <name>                           Destroy a VM"
echo "  vm-snapshot <name> [label]                  Snapshot a VM"
echo "  vm-clone <src> <dst> <vni>                  Clone a VM (instant)"
echo ""
echo "Status:"
echo " status Full cloud status"
;;
esac
SCRIPT
chmod +x /usr/local/bin/cloud-ctl
# Example workflow — deploy a web application
cloud-ctl network create 100 production 10.100.0.0/24
cloud-ctl vm-create web-01 100 4 8 50
cloud-ctl vm-create web-02 100 4 8 50
cloud-ctl vm-create db-01 100 8 32 200
# Clone a production VM for staging in under a second
cloud-ctl vm-clone web-01 staging-web-01 200
cloud-ctl vm-clone db-01 staging-db-01 200
# Snapshot before deploying
cloud-ctl vm-snapshot web-01 pre-deploy-v2.1
cloud-ctl vm-snapshot db-01 pre-deploy-v2.1
# Something broke? Rollback is instant.
# zfs rollback rpool/vms/web-01@pre-deploy-v2.1
The complete stack — AWS to open source translation
Every layer of this stack is open source, battle-tested in production at companies
far larger than yours, and runs on commodity hardware. There is no proprietary
component. No license key. No "contact sales for pricing." No vendor who can
hold your infrastructure hostage.
The total cost: $135–400/month in bare metal rentals + your home lab.
The AWS equivalent: $3,000–8,000/month, plus the invisible cost of being
locked into a platform that gets more expensive every year and harder to leave
every quarter. You're not just saving money. You're buying freedom.
| Layer | Tool | AWS Equivalent | Covered in |
|---|---|---|---|
| Network fabric | VXLAN + Open vSwitch | VPC | Step 2 |
| Dynamic routing | FRRouting (BGP + OSPF) | Route Tables / TGW | Step 1 |
| VXLAN control plane | BGP EVPN | VPC Peering | Step 3 |
| Load balancing | HAProxy + keepalived | ELB / ALB / NLB | Step 4 |
| DNS | PowerDNS + CoreDNS | Route 53 | Step 5 |
| PKI / Certificates | step-ca (ACME) | ACM | Step 6 |
| Identity / SSO | Keycloak | IAM / Cognito | Step 7 |
| Service mesh | Consul | App Mesh / Cloud Map | Step 8 |
| Observability | Prometheus + Loki + Grafana | CloudWatch | Step 9 |
| Control plane | cloud-ctl (custom) | AWS Console / CLI | Step 10 |
| Compute | KVM + libvirt | EC2 | Multi-Site recipe |
| Object storage | MinIO on ZFS | S3 | Homelab recipe |
| Block storage | ZFS zvols | EBS | Built in |
| Snapshots | ZFS snapshots | EBS Snapshots | Built in |
| Replication | Syncoid over WireGuard | Cross-Region Replication | Multi-Site recipe |
| Encryption | ZFS native + WireGuard | KMS + VPN | Built in |
Is this actually production-ready?
Every component in this stack runs in production at scale. FRRouting powers
ISP edge networks. HAProxy handles billions of requests per day at companies
like GitHub, Stack Overflow, and Airbnb. Open vSwitch is a staple of large
virtualized data centers and OpenStack clouds. Consul runs at massive scale at
HashiCorp's customers. The question isn't whether these tools are production-ready:
they've been production-ready for a decade. The question is whether you're ready to
stop paying someone else to run them for you.
The ingredients are the same ones the restaurants use. You're just cooking at home.
Where to go from here
Container orchestration — Add Nomad or Kubernetes for container workloads alongside KVM VMs. Nomad is simpler; Kubernetes has the ecosystem. Both integrate with Consul.
Firecracker microVMs: For serverless workloads, Firecracker boots a microVM in as little as 125 ms. See the Serverless / Firecracker guide.
Ceph for distributed storage — When ZFS replication isn't enough and you need active-active storage across sites, Ceph provides distributed block/object/file storage. It's complex but proven.
Terraform provider — Wrap cloud-ctl in a Terraform provider for declarative infrastructure management. Libvirt already has one.
Multi-tenant billing — If you're selling this as a service, add usage metering with Prometheus and export to a billing system.
GPU passthrough — For ML workloads, pass NVIDIA GPUs through to KVM VMs. See the NVIDIA guide.
You just built an open-source AWS. Not a toy version. Not a demo. A production
cloud platform with overlay networking, dynamic routing, load balancing, service
discovery, internal PKI, identity management, and observability. On hardware
you own. With data sovereignty you control. For a fraction of the cost.
The cloud isn't a place. It's a set of patterns. VPCs are just VXLAN.
Route 53 is just DNS. ELB is just HAProxy. IAM is just Keycloak. EC2 is just
KVM. S3 is just MinIO. The cloud providers packaged these patterns, put a web
console on top, and charge you $8,000/month for the privilege. Now you know
how the trick works. And you can do it yourself.