Load Balancing & HA Masterclass
This guide goes deep on load balancing and high availability — HAProxy, keepalived,
Traefik, and Caddy — grounded in the kldload stack. If you have web servers, APIs,
or any service that needs to stay up when a node dies, this is the masterclass for
you. Zero-to-hero: from the first frontend block to a full two-node HA cluster
with a floating IP, WireGuard backends, and ZFS config snapshots.
1. Every Production Service Needs a Load Balancer
A single server is a single point of failure. Load balancers distribute traffic across multiple backends, health-check them continuously, and remove failed ones automatically. When a backend goes down, the load balancer stops sending it traffic — in seconds, not after a human notices and reacts. When you deploy a new version of your application, the load balancer enables zero-downtime rolling updates: take one server out of rotation, upgrade it, put it back, repeat.
On kldload, the load balancer runs on ZFS: config changes are snapshottable, upgrades are rollback-safe with boot environments, and the load balancer's config can be replicated to a standby node with zfs send | zfs recv. The backend pool sits behind WireGuard: backends are reachable only over the WireGuard backplane and are invisible from the internet. Health checks travel over encrypted tunnels, and the backend pool has no public IPs. A port scan sees one IP exposing only the load balancer's ports; from the outside, the load balancer is the only thing that exists.
This masterclass covers HAProxy (the industry standard — Layer 4 and Layer 7), keepalived (VRRP floating IPs for active/passive HA), Traefik (auto-discovery for container environments), and Caddy (automatic HTTPS with the simplest config format that exists).
2. HAProxy Fundamentals
HAProxy is one of the most widely deployed load balancers in the world. GitHub, Stack Overflow, Airbnb, and hundreds of other high-traffic sites run it. It sustains very high connection and request rates on commodity hardware. The configuration typically lives in a single file, /etc/haproxy/haproxy.cfg.
Layer 4 load balancing (TCP)
HAProxy forwards TCP connections without reading the application protocol. Works for anything: HTTP, HTTPS (passthrough), MySQL, Redis, SMTP. Fast, simple, no protocol knowledge required. The load balancer sees source IP and destination port, picks a backend, and forwards.
Layer 7 load balancing (HTTP)
HAProxy parses HTTP requests before routing. This enables path-based routing, header inspection, cookie-based session affinity, rate limiting, and ACL-based access control. HAProxy reads the URL, Host header, cookies — then makes a routing decision based on what it finds.
Frontend → Backend → Server
The HAProxy config model: a frontend listens on an IP and port. It evaluates ACLs and routes to a backend. A backend contains one or more server entries — the actual upstream hosts. One frontend can route to multiple backends based on rules.
Balance algorithms
- roundrobin: send requests to each server in the backend in turn.
- leastconn: send to the server with the fewest active connections (best for long-lived connections).
- source: hash the client IP so the same client keeps landing on the same server while the server list is unchanged (IP-based stickiness).
- uri: hash the request URI so the same URL keeps hitting the same server, useful for caching tiers.
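To make the difference between these algorithms concrete, here is a toy Python model of roundrobin, leastconn, and source. It is an illustration under stated assumptions, not HAProxy's code: the real implementations also handle weights, slow start, and server state changes, and they use HAProxy's own hash functions. The SHA-256 hash here is an arbitrary stand-in.

```python
"""Toy models of HAProxy's roundrobin, leastconn and source algorithms.

Illustration only: HAProxy's real implementations also handle weights,
slow start and server state changes, and use their own hash functions.
"""
import hashlib
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        # each call returns the next server in the list, wrapping around
        return next(self._cycle)


class LeastConn:
    def __init__(self, servers):
        # active connection count per server
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        # server with the fewest active connections; ties go to the first listed
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # call when a connection to `server` closes
        self.active[server] -= 1


class SourceHash:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        # same client IP -> same digest -> same server (while the list is unchanged)
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]
```

The modulo in `SourceHash` shows why plain source hashing is fragile: adding or removing a server changes `len(self.servers)` and moves most clients to a different server. HAProxy's `hash-type consistent` option exists to limit that reshuffling.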
Install HAProxy on kldload
# CentOS / Rocky / RHEL
dnf install -y haproxy
# Debian / Ubuntu
apt-get install -y haproxy
# Enable and start
systemctl enable --now haproxy
# Config file: /etc/haproxy/haproxy.cfg
# Check config syntax without restarting
haproxy -c -f /etc/haproxy/haproxy.cfg
Basic HTTP load balancer — 3 web servers
global
log /dev/log local0
maxconn 50000
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5s
timeout client 30s
timeout server 30s
retries 3
frontend http_front
bind *:80
default_backend web_servers
backend web_servers
balance roundrobin
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
server web1 10.10.0.11:8080 check inter 2s fall 3 rise 2
server web2 10.10.0.12:8080 check inter 2s fall 3 rise 2
server web3 10.10.0.13:8080 check inter 2s fall 3 rise 2
Breaking down the server line: check enables health checking. inter 2s — check every 2 seconds.
fall 3 — mark down after 3 consecutive failures. rise 2 — mark up after 2 consecutive successes.
HAProxy will not send traffic to a server marked down.
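The rise/fall behaviour is hysteresis: one bad check does not remove a server, and one good check does not restore it. The sketch below models the documented behaviour (consecutive failures and consecutive successes) for a single server. It is a simplification: HAProxy implements this with an internal health counter, and the starting state of UP is an assumption of the model.

```python
"""Sketch of HAProxy-style rise/fall hysteresis for one server.

Models the documented behaviour: `fall` consecutive failed checks take an
UP server DOWN, `rise` consecutive successful checks bring a DOWN server
back UP. HAProxy itself uses an internal health counter.
"""


class ServerHealth:
    def __init__(self, rise=2, fall=3):
        self.rise, self.fall = rise, fall
        self.up = True      # assumption: servers start UP in this model
        self.streak = 0     # consecutive results that disagree with the current state

    def record(self, check_ok: bool) -> bool:
        """Feed one check result; return True if the state flipped."""
        if check_ok == self.up:
            self.streak = 0          # result confirms the current state: reset
            return False
        self.streak += 1
        needed = self.fall if self.up else self.rise
        if self.streak >= needed:
            self.up = not self.up
            self.streak = 0
            return True
        return False
```

With `fall 3 rise 2`, an isolated failure between successes never takes the server down. Only an unbroken run of three failures does.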
Health check types
# TCP check — just verify the port is open
server web1 10.10.0.11:8080 check
# HTTP check — verify the app returns a 200
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200
# Custom HTTP check with expected string in body
http-check expect string "OK"
# Health check over a different port (separate management port)
server web1 10.10.0.11:8080 check port 9000 inter 2s fall 3 rise 2
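Always check the config with haproxy -c before applying it, and reload with systemctl reload haproxy. That reload is zero-downtime: a new HAProxy process picks up the listening sockets with the new config, while the old process finishes its existing connections.
3. HAProxy Layer 7 Features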
Layer 7 means HAProxy reads the HTTP request before making a routing decision. This unlocks path-based routing, header inspection, rate limiting, and cookie-based session persistence — capabilities that a pure TCP load balancer cannot provide.
ACLs — route by path, header, or hostname
frontend http_front
bind *:80
bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
# Define ACLs
acl is_api path_beg /api/
acl is_static path_beg /static/ /assets/ /img/
acl is_www hdr(host) -i www.example.com example.com
acl is_api_host hdr(host) -i api.example.com
# Route based on ACLs
use_backend api_servers if is_api_host
use_backend api_servers if is_api
use_backend static_files if is_static
default_backend web_servers
backend api_servers
balance leastconn
option httpchk GET /api/health
server api1 10.10.0.21:8000 check inter 2s fall 3 rise 2
server api2 10.10.0.22:8000 check inter 2s fall 3 rise 2
backend static_files
balance roundrobin
server static1 10.10.0.31:80 check
server static2 10.10.0.32:80 check
backend web_servers
balance roundrobin
server web1 10.10.0.11:8080 check inter 2s fall 3 rise 2
server web2 10.10.0.12:8080 check inter 2s fall 3 rise 2
SSL/TLS termination
# Terminate TLS at HAProxy, forward plaintext to backends
frontend https_front
bind *:443 ssl crt /etc/haproxy/certs/example.com.pem alpn h2,http/1.1
# Redirect HTTP to HTTPS
bind *:80
redirect scheme https code 301 if !{ ssl_fc }
default_backend web_servers
# Generate a PEM from cert + key (HAProxy wants both in one file)
cat fullchain.pem privkey.pem > /etc/haproxy/certs/example.com.pem
chmod 600 /etc/haproxy/certs/example.com.pem
Rate limiting with stick tables
frontend http_front
bind *:80
# Track source IPs in a stick table
stick-table type ip size 100k expire 30s store conn_rate(10s),http_req_rate(10s)
http-request track-sc0 src
# Block if more than 100 requests in 10 seconds
acl too_many_requests sc_http_req_rate(0) gt 100
http-request deny deny_status 429 if too_many_requests
default_backend web_servers
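A stick-table rate such as http_req_rate(10s) is a per-source estimate over a sliding window, not a hard reset every 10 seconds. The Python sketch below approximates that idea and is labelled as an assumption. It counts requests in fixed 10-second periods and adds the previous period's count, weighted by how much of that period still overlaps the window. This mirrors the general approach of HAProxy's frequency counters, not their exact code. Every request is counted (as track-sc0 does), and anything above 100 gets a 429.

```python
"""Sliding-window request-rate limiter: a simplified model of a stick-table
entry tracking http_req_rate(10s) and denying with 429 above 100.

Assumption: the rate is estimated from two fixed periods, weighting the
previous period by how much of it still overlaps the sliding window.
"""
from collections import defaultdict


class SlidingRateLimiter:
    def __init__(self, limit=100, period=10.0):
        self.limit, self.period = limit, period
        # per source IP: [period_index, count_current, count_previous]
        self.table = defaultdict(lambda: [0, 0, 0])

    def _rotate(self, entry, now):
        idx = int(now // self.period)
        if idx != entry[0]:
            # the previous count survives only if it belongs to the adjacent period
            entry[2] = entry[1] if idx - entry[0] == 1 else 0
            entry[1] = 0
            entry[0] = idx

    def rate(self, src, now):
        entry = self.table[src]
        self._rotate(entry, now)
        elapsed = now - entry[0] * self.period
        overlap = (self.period - elapsed) / self.period  # share of previous period still in window
        return entry[1] + entry[2] * overlap

    def check(self, src, now):
        """Count this request for `src` and return the status it would get."""
        entry = self.table[src]
        self._rotate(entry, now)
        entry[1] += 1                       # like track-sc0: every request is counted
        return 429 if self.rate(src, now) > self.limit else 200
```

Halfway into the next period, half of the previous burst still counts. A client that sent 101 requests at t=1s is therefore judged on about 51 requests at t=15s, not on zero.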
Session persistence with cookies
backend web_servers
balance roundrobin
# Insert a cookie to pin clients to a backend
cookie SERVERID insert indirect nocache
server web1 10.10.0.11:8080 check cookie web1
server web2 10.10.0.12:8080 check cookie web2
server web3 10.10.0.13:8080 check cookie web3
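The cookie directive above can be read as a small decision procedure, sketched here in Python under simplifying assumptions. A request whose SERVERID cookie names a live server goes straight there. Otherwise HAProxy load-balances (round robin in this model) and the response carries a new SERVERID cookie. `indirect` means an existing cookie is not re-sent. A cookie pointing at a DOWN server is ignored and the request is redispatched, which matches HAProxy's default when `option persist` is not set. `nocache` (marking the response non-cacheable) is not modelled.

```python
"""Sketch of `cookie SERVERID insert indirect nocache` routing decisions.

Assumptions: simplified model; round robin stands in for the backend's
balance algorithm, and only the cookie value HAProxy inserts is returned.
"""


class CookieRouter:
    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0
        self.down = set()           # servers currently marked DOWN by health checks

    def _round_robin(self):
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no server available (HAProxy would return 503)")

    def route(self, cookies):
        """Return (server, Set-Cookie value or None) for a request's cookies."""
        pinned = cookies.get("SERVERID")
        if pinned in self.servers and pinned not in self.down:
            return pinned, None     # indirect: client already has the cookie
        server = self._round_robin()
        return server, f"SERVERID={server}"
```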
Connection limits
backend web_servers
# Never send more than 100 simultaneous connections to one server
server web1 10.10.0.11:8080 check maxconn 100
server web2 10.10.0.12:8080 check maxconn 100
# Requests above the limit wait in the backend queue instead of being rejected;
# if one waits longer than 10s, HAProxy returns 503
timeout queue 10s
WebSocket support
backend ws_servers
balance source
option http-server-close
# HAProxy handles the HTTP/1.1 Upgrade automatically; once upgraded,
# the connection is a tunnel and its idle limit is timeout tunnel
timeout tunnel 1h
server ws1 10.10.0.41:9000 check
server ws2 10.10.0.42:9000 check
4. keepalived — Floating IPs for HA
HAProxy on a single server is better than nothing, but it is still a single point of failure. keepalived solves this with VRRP (Virtual Router Redundancy Protocol): a virtual IP that floats between two servers. When the active node fails, the passive node takes over the IP within a few seconds (roughly three to four with one-second advertisements). Clients keep using the same address: no DNS changes, no reconfiguration, no human intervention. Connections that were open on the failed node are lost, so those clients must reconnect.
What VRRP does
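VRRP creates a virtual IP address that is owned by one node at a time (the MASTER). The MASTER multicasts an advertisement every advert_int seconds, and the BACKUP listens for them. If the BACKUP misses advertisements for about three intervals, it takes ownership of the virtual IP. It then sends gratuitous ARP so switches and neighbors update the IP-to-MAC mapping. New connections land on the new MASTER immediately; TCP sessions that lived on the failed node have to be re-established.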
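"About three intervals" has a precise definition in VRRPv2 (RFC 3768), the version keepalived uses by default. A BACKUP declares the MASTER dead after Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − Priority) / 256 seconds. The small Python helper below (a hypothetical name, not part of keepalived) computes it. With advert_int 1 and lb2's priority of 100 from the configuration in this section, the takeover happens about 3.6 seconds after the last advertisement lb2 heard.

```python
"""Backup takeover delay per VRRPv2 (RFC 3768).

Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
Skew_Time            = (256 - Priority) / 256   seconds
"""


def master_down_interval(advert_int: float, backup_priority: int) -> float:
    if not 1 <= backup_priority <= 254:
        raise ValueError("backup priority must be 1..254 (255 is the address owner)")
    skew = (256 - backup_priority) / 256   # higher-priority backups react slightly sooner
    return 3 * advert_int + skew


# lb2 in this section: advert_int 1, priority 100
print(master_down_interval(1, 100))
```

The skew term gives a higher-priority backup a head start, so when several backups exist, the best one claims the VIP first.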
Active/passive vs active/active
Active/passive: one server handles all traffic, the other is a hot standby. It is simple to reason about; the main risk is split-brain if the nodes stop hearing each other's advertisements. Active/active: both servers handle traffic using different virtual IPs, and each is the BACKUP for the other's VIP. It requires DNS round-robin or an external DNS load balancer to spread clients across the two VIPs. It is more complex, but both nodes carry traffic, which roughly doubles capacity.
Install keepalived
# CentOS / Rocky / RHEL
dnf install -y keepalived
# Debian / Ubuntu
apt-get install -y keepalived
systemctl enable keepalived
Two HAProxy nodes sharing a floating IP
Assume: lb1 at 10.10.0.1, lb2 at 10.10.0.2, floating VIP 10.10.0.10. Both nodes run HAProxy with identical configs. keepalived decides which one owns the VIP.
# /etc/keepalived/keepalived.conf on lb1 (MASTER)
global_defs {
router_id lb1
script_user root
enable_script_security
}
vrrp_script check_haproxy {
script "/usr/bin/pgrep haproxy"
interval 2
weight -60   # penalty must exceed the 50-point priority gap (150 - 100)
fall 2
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 150 # higher wins
advert_int 1
authentication {
auth_type PASS
auth_pass secretpass   # only the first 8 characters are used; sent in cleartext
}
virtual_ipaddress {
10.10.0.10/24
}
track_script {
check_haproxy
}
}
# /etc/keepalived/keepalived.conf on lb2 (BACKUP)
global_defs {
router_id lb2
script_user root
enable_script_security
}
vrrp_script check_haproxy {
script "/usr/bin/pgrep haproxy"
interval 2
weight -60   # penalty must exceed the 50-point priority gap (150 - 100)
fall 2
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100 # lower than master
advert_int 1
authentication {
auth_type PASS
auth_pass secretpass   # only the first 8 characters are used; sent in cleartext
}
virtual_ipaddress {
10.10.0.10/24
}
track_script {
check_haproxy
}
}
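The vrrp_script check_haproxy block is critical: it checks whether HAProxy is running,
not just whether the server is reachable. If HAProxy crashes on lb1 but the host stays up,
the script fails and its negative weight is subtracted from lb1's priority. That only
triggers failover if the penalty is larger than the priority gap between the nodes
(150 - 100 = 50 here). A weight of -20 would leave lb1 at 130, still above lb2's 100,
so the weight must be -51 or lower. Health check scripts are what make keepalived
actually reliable: check the service, not just the server.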
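The priority arithmetic is easy to get wrong, so here is a quick Python model of it. It is an assumption-level simplification of keepalived's tracking. A negative vrrp_script weight is subtracted while the script fails, and a positive weight is added while it succeeds. The node with the highest effective priority owns the VIP. Ties and priority clamping are ignored, and the helper names are hypothetical.

```python
"""Which node owns the VIP, given base priorities and track_script results.

Assumption: simplified model of keepalived's vrrp_script weights; the
highest effective priority becomes MASTER (ties resolved by dict order).
"""


def effective_priority(base: int, weight: int, script_ok: bool) -> int:
    if weight < 0:
        return base + (0 if script_ok else weight)   # penalty while failing
    return base + (weight if script_ok else 0)       # bonus while healthy


def vip_owner(nodes: dict) -> str:
    """nodes: name -> (base_priority, weight, script_ok)."""
    return max(nodes, key=lambda name: effective_priority(*nodes[name]))


# HAProxy dead on lb1, weight -20: 150 - 20 = 130 still beats 100
print(vip_owner({"lb1": (150, -20, False), "lb2": (100, -20, True)}))
# HAProxy dead on lb1, weight -60: 150 - 60 = 90 loses to 100
print(vip_owner({"lb1": (150, -60, False), "lb2": (100, -60, True)}))
```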
# Start keepalived on both nodes
systemctl start keepalived
# Verify VIP ownership on lb1
ip addr show eth0 | grep 10.10.0.10
# Simulate failure: stop HAProxy on lb1
systemctl stop haproxy
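# Verify the VIP moved to lb2 (allow several seconds: two failed script
# runs at a 2s interval, plus the VRRP takeover of roughly 3-4 seconds)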
ip addr show eth0 # on lb2 — should now show 10.10.0.10
# Check VRRP state
journalctl -u keepalived -f
5. Traefik — Auto-Discovery Load Balancer
HAProxy is the right tool when you have a stable list of backends and want maximum control. Traefik is the right tool when backends are ephemeral — Docker containers that start and stop, Kubernetes pods that reschedule. Traefik discovers backends automatically from Docker labels, Kubernetes ingress resources, or config files. Add a label to a container, Traefik routes to it. No config reload, no manual backend list maintenance.
Install Traefik on kldload
# Download the binary
curl -L https://github.com/traefik/traefik/releases/download/v3.1.0/traefik_v3.1.0_linux_amd64.tar.gz | tar xz
mv traefik /usr/local/bin/
# Or run as a container
docker run -d \
--name traefik \
-p 80:80 -p 443:443 \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /etc/traefik:/etc/traefik \
traefik:v3.1
Static config — traefik.yml
# /etc/traefik/traefik.yml
api:
  dashboard: true
  insecure: false   # true would serve the dashboard unauthenticated on :8080; keep false and expose it via a secured router
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false   # require explicit label to expose a container
  file:
    directory: /etc/traefik/dynamic
    watch: true
certificatesResolvers:
  letsencrypt:
    acme:
      email: you@example.com
      storage: /etc/traefik/acme.json
      httpChallenge:
        entryPoint: web
Docker Compose — auto-routing with labels
# docker-compose.yml
version: "3.8"
services:
  traefik:
    image: traefik:v3.1
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/traefik:/etc/traefik
    restart: unless-stopped
  webapp:
    image: nginx:alpine
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.webapp.rule=Host(`www.example.com`)"
      - "traefik.http.routers.webapp.entrypoints=websecure"
      - "traefik.http.routers.webapp.tls.certresolver=letsencrypt"
      - "traefik.http.services.webapp.loadbalancer.server.port=80"
    restart: unless-stopped
  api:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.routers.api.tls.certresolver=letsencrypt"
      - "traefik.http.services.api.loadbalancer.server.port=8000"
    restart: unless-stopped
Traefik discovers both containers, gets Let's Encrypt certificates for both hostnames, and starts routing. No Traefik restart, no config file edit. Bring up a new container with the right labels and it is routed. Stop it and Traefik removes the route.
Kubernetes IngressRoute (Traefik CRD)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: webapp
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`www.example.com`)
      kind: Rule
      services:
        - name: webapp-svc
          port: 80
    - match: Host(`api.example.com`) && PathPrefix(`/v2`)
      kind: Rule
      services:
        - name: api-svc
          port: 8000
  tls:
    certResolver: letsencrypt
6. Caddy — The Simplest HTTPS Server
Caddy is a web server and reverse proxy with one killer feature: automatic HTTPS. You write a hostname in the Caddyfile, Caddy gets a Let's Encrypt certificate, configures TLS, and handles renewal — forever. Zero certificate management. For small-to-medium deployments where you do not need HAProxy's full feature set, Caddy is the fastest path to production HTTPS.
Install Caddy on kldload
# CentOS / Rocky / RHEL
dnf install -y 'dnf-command(copr)'
dnf copr enable @caddy/caddy
dnf install -y caddy
# Debian / Ubuntu
apt-get install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
apt-get update && apt-get install -y caddy
systemctl enable --now caddy
Caddyfile — multi-site with automatic TLS in 10 lines
# /etc/caddy/Caddyfile
www.example.com {
reverse_proxy 10.10.0.11:8080 10.10.0.12:8080
}
api.example.com {
reverse_proxy 10.10.0.21:8000 10.10.0.22:8000
}
blog.example.com {
reverse_proxy localhost:2368
}
static.example.com {
root * /var/www/static
file_server
}
That is the entire config. Caddy reads the hostnames, obtains Let's Encrypt certificates for all four, configures TLS, and starts reverse-proxying. With multiple upstreams, Caddy's default load-balancing policy is random; set lb_policy round_robin to rotate through them in order. Add a health check:
www.example.com {
reverse_proxy 10.10.0.11:8080 10.10.0.12:8080 {
health_uri /health
health_interval 10s
health_timeout 5s
health_status 200
lb_policy round_robin
}
}
Caddyfile — advanced features
api.example.com {
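# Rate limiting (requires the caddy-ratelimit module; as a third-party directive
# it also needs an ordering in the global options, e.g. order rate_limit before basic_auth)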
rate_limit {
zone dynamic {
key {remote_host}
events 100
window 10s
}
}
# Add security headers
header {
Strict-Transport-Security "max-age=31536000; includeSubDomains"
X-Content-Type-Options nosniff
X-Frame-Options DENY
}
# Basic auth on a path
handle /admin/* {
basicauth {
admin $2a$14$... # bcrypt hash
}
reverse_proxy localhost:8001
}
reverse_proxy 10.10.0.21:8000 10.10.0.22:8000
}
# Reload config without restart
caddy reload --config /etc/caddy/Caddyfile
7. Health Checks Deep Dive
A load balancer without health checks is a traffic distributor, not a reliability tool. Health checks are what make failover automatic. Getting them right is the difference between a load balancer that removes failed backends in two seconds and one that keeps sending traffic to a server that returns 500 errors.
TCP health checks
# HAProxy: basic TCP check — is the port open?
server web1 10.10.0.11:8080 check inter 2s fall 3 rise 2
# This only tells you the port is open.
# It does NOT tell you the application is working.
# A web server can accept TCP connections but return 500 for every request.
HTTP health checks
# HAProxy: HTTP check — does the app return 200?
backend web_servers
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200
server web1 10.10.0.11:8080 check inter 2s fall 3 rise 2
# HAProxy: check for a specific string in the response body
http-check expect string "\"status\":\"ok\""
# HAProxy: check a JSON health endpoint (accept any 2xx status)
option httpchk GET /healthz HTTP/1.1\r\nHost:\ internal
http-check expect rstatus ^2[0-9][0-9]$
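When a backend is marked DOWN, the fastest diagnosis is to run the same check by hand from the LB node. The probe below is a hypothetical helper, not how HAProxy implements checks. It mirrors the checks above: it sends GET /health with an explicit Host header, accepts only the expected status, and optionally requires a substring in the body. A refused, reset, or timed-out connection counts as a failure, as it would for HAProxy.

```python
"""Minimal HTTP health probe mirroring `option httpchk` + `http-check expect`.

Assumption: a hand-run diagnostic helper (hypothetical), not HAProxy's
implementation of health checks.
"""
import http.client


def probe(host, port, path="/health", host_header="example.com",
          expect_status=200, expect_body=None, timeout=2.0):
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", "replace")
    except (OSError, http.client.HTTPException):
        return False            # refused, reset, timed out or garbled: check fails
    finally:
        conn.close()
    if resp.status != expect_status:
        return False            # like `http-check expect status 200`
    # like `http-check expect string ...` when expect_body is given
    return expect_body is None or expect_body in body


# Example: probe("10.10.0.11", 8080, expect_body='"status":"ok"')
```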
Application-specific health endpoint
# What a good /health endpoint checks:
# - Database connection: can we query the DB?
# - Cache connection: can we reach Redis?
# - Downstream APIs: are dependencies reachable?
# - Disk space: are we above the threshold?
# Example: Python Flask health endpoint
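import os

import psycopg2
import redis
from flask import Flask, jsonify

# Connection strings come from the environment (variable names assumed)
DATABASE_URL = os.environ["DATABASE_URL"]
REDIS_URL = os.environ["REDIS_URL"]

app = Flask(__name__)

@app.route('/health')
def health():
    checks = {}
    try:
        # Short timeouts so a hung dependency fails the check instead of stalling it
        conn = psycopg2.connect(DATABASE_URL, connect_timeout=2)
        conn.close()
        checks['database'] = 'ok'
    except Exception as e:
        checks['database'] = str(e)
    try:
        r = redis.Redis.from_url(REDIS_URL, socket_timeout=2)
        r.ping()
        checks['cache'] = 'ok'
    except Exception as e:
        checks['cache'] = str(e)
    status = 'ok' if all(v == 'ok' for v in checks.values()) else 'degraded'
    # 503 makes "http-check expect status 200" fail, so HAProxy drains this server
    http_status = 200 if status == 'ok' else 503
    return jsonify({'status': status, 'checks': checks}), http_status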
Health check timing parameters
# HAProxy server line breakdown. HAProxy has no line continuation,
# so every option goes on the one server line:
#   check            enable health checking
#   inter 2s         check interval while the server is fully UP
#   fastinter 500ms  interval while the server is in transition (going DOWN or coming UP)
#   downinter 5s     interval for servers that are fully DOWN
#   fall 3           consecutive failures before marking DOWN
#   rise 2           consecutive successes before marking UP
#   weight 10        relative weight for roundrobin
#   slowstart 60s    ramp up traffic over 60s after marking UP
server web1 10.10.0.11:8080 check inter 2s fastinter 500ms downinter 5s fall 3 rise 2 weight 10 slowstart 60s
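These timers combine into a detection time you can estimate on paper. Per HAProxy's documented interval rules, a fully UP server is checked every inter, and a server in transition is checked every fastinter (falling back to inter if unset). So the first failure is noticed up to inter after a crash, and the remaining fall − 1 checks run at fastinter. Recovery works the same way, starting from downinter. The sketch assumes each failed check returns quickly (connection refused or an error status); checks that hang until a timeout take longer. The function names are hypothetical.

```python
"""Rough worst-case times for HAProxy to change a server's state.

Assumption: failed checks return promptly; a check that hangs until its
timeout adds that timeout to each step.
"""


def worst_case_down_seconds(inter, fall, fastinter=None):
    fastinter = inter if fastinter is None else fastinter   # unset -> inter
    # first failure seen up to `inter` after the crash, then fall-1 fast checks
    return inter + (fall - 1) * fastinter


def worst_case_up_seconds(inter, rise, fastinter=None, downinter=None):
    fastinter = inter if fastinter is None else fastinter
    downinter = inter if downinter is None else downinter   # unset -> inter
    # first success seen up to `downinter` after recovery, then rise-1 fast checks
    return downinter + (rise - 1) * fastinter


# The server line above: inter 2s, fastinter 500ms, downinter 5s, fall 3, rise 2
print(worst_case_down_seconds(2, 3, 0.5))      # without fastinter: 2 + 2*2 = 6 s
print(worst_case_up_seconds(2, 2, 0.5, 5))     # then slowstart ramps over 60 s
```

Setting fastinter halves the worst-case removal time here, from 6 s to 3 s, without increasing check load on healthy servers.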
Monitoring health check state with Prometheus
# HAProxy's built-in stats page (human-readable)
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
# HAProxy 2.0+ also exposes Prometheus metrics natively (when built with the exporter):
frontend prometheus
bind *:8405
http-request use-service prometheus-exporter if { path /metrics }
no log
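# Or, for HAProxy builds without the native exporter, the standalone
# haproxy_exporter (upstream recommends the native exporter on HAProxy 2.0+).
# Host networking lets the container reach the stats page on localhost;
# the exporter listens on :9101.
docker run -d \
--name haproxy-exporter \
--network host \
prom/haproxy-exporter \
--haproxy.scrape-uri="http://localhost:8404/stats;csv"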
# Key metrics to alert on:
# haproxy_backend_status — 0=DOWN, 1=UP per backend
# haproxy_backend_active_servers — number of active servers
# haproxy_backend_http_responses_total by code — 5xx rate
# haproxy_server_check_failures_total — cumulative health check failures
8. SSL/TLS Termination and Passthrough
There are three ways to handle TLS at a load balancer. Understanding the tradeoffs is the difference between a correct architecture and a security hole.
TLS termination
The load balancer decrypts TLS, forwards plaintext to backends. Backends need no certificates. The LB can inspect HTTP content for Layer 7 routing. Used by most deployments. Tradeoff: the load balancer sees all traffic in plaintext — it is a trusted component in your architecture.
TLS passthrough
The load balancer forwards encrypted traffic without decrypting. Backend decrypts. The LB cannot inspect HTTP content — it can only route based on SNI (the hostname in the TLS ClientHello). Used when backends must hold the private key, or for end-to-end encryption compliance.
Re-encryption (mTLS)
The load balancer decrypts from the client and re-encrypts to the backend. Can use mutual TLS (mTLS) for the backend leg — the backend verifies the LB's client certificate. Full HTTP inspection at the LB, encrypted transit to backends. Best for zero-trust architectures.
HAProxy TLS passthrough (SNI routing)
frontend tls_passthrough
bind *:443
mode tcp
option tcplog
# Route by SNI without decrypting
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
acl is_api req_ssl_sni -i api.example.com
acl is_www req_ssl_sni -i www.example.com
use_backend api_tls if is_api
use_backend www_tls if is_www
backend api_tls
mode tcp
balance roundrobin
server api1 10.10.0.21:443 check
server api2 10.10.0.22:443 check
backend www_tls
mode tcp
balance roundrobin
server web1 10.10.0.11:443 check
server web2 10.10.0.12:443 check
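SNI routing works without decrypting because the hostname travels in cleartext in the very first TLS message. The Python sketch below shows what req_ssl_sni looks at. It walks the ClientHello's fixed fields (record header, handshake header, version, random, session ID, cipher suites, compression methods) to the extension list and pulls out the server_name extension. It assumes the whole ClientHello fits in the first TLS record, the same reason HAProxy waits with tcp-request inspect-delay. It uses Python's ssl module only to generate a real ClientHello in memory, with no network. The routing table mirrors the ACLs above, and the helper names are hypothetical.

```python
"""What `req_ssl_sni` inspects: the server_name extension of a TLS ClientHello.

Assumption: simplified parser; the whole ClientHello is in the first record.
"""
import ssl
import struct


def extract_sni(data: bytes):
    """Return the SNI hostname from a TLS ClientHello record, or None."""
    try:
        if data[0] != 0x16:                      # record type 22 = handshake
            return None
        (rec_len,) = struct.unpack("!H", data[3:5])
        hs = data[5:5 + rec_len]
        if hs[0] != 0x01:                        # handshake type 1 = ClientHello
            return None                          # (what req_ssl_hello_type 1 tests)
        pos = 4 + 2 + 32                         # handshake header, version, random
        pos += 1 + hs[pos]                       # session_id
        (n,) = struct.unpack("!H", hs[pos:pos + 2])
        pos += 2 + n                             # cipher_suites
        pos += 1 + hs[pos]                       # compression_methods
        (ext_total,) = struct.unpack("!H", hs[pos:pos + 2])
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack("!HH", hs[pos:pos + 4])
            pos += 4
            if ext_type == 0x0000:               # server_name extension
                # list length (2 bytes), name_type (1, 0 = host_name), name length (2)
                name_type = hs[pos + 2]
                (name_len,) = struct.unpack("!H", hs[pos + 3:pos + 5])
                if name_type == 0:
                    return hs[pos + 5:pos + 5 + name_len].decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


SNI_ROUTES = {"api.example.com": "api_tls", "www.example.com": "www_tls"}


def pick_backend(data):
    sni = extract_sni(data)
    # -i in the HAProxy ACLs means case-insensitive; None = no backend matched
    return SNI_ROUTES.get(sni.lower()) if sni else None


def client_hello_for(hostname: str) -> bytes:
    """Generate a real ClientHello with Python's ssl module (no network)."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass                                     # expected: waiting for ServerHello
    return outgoing.read()
```

A hostname that matches no ACL returns None. The frontend above has no default_backend, so HAProxy would close such a connection.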
Re-encryption to backend (HTTPS backend)
backend api_reencrypt
balance roundrobin
# Forward to backend over TLS
server api1 10.10.0.21:8443 check ssl verify required ca-file /etc/haproxy/ca.pem
server api2 10.10.0.22:8443 check ssl verify required ca-file /etc/haproxy/ca.pem
Integration with step-ca for internal TLS
# step-ca issues internal certificates for your infrastructure
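# Install step-ca on a kldload node (the step CLI used below is a separate download)
curl -L https://dl.smallstep.com/gh-release/certificates/gh-release-header/v0.27.0/step-ca_linux_0.27.0_amd64.tar.gz | tar xz
mv step-ca /usr/local/bin/
# Initialize a CA with the step CLI, then start the CA server
# (it prompts for the key password; run it in its own terminal or as a service)
step ca init --deployment-type=standalone
step-ca $(step path)/config/ca.json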
# Issue a certificate for HAProxy
step ca certificate haproxy.internal haproxy.crt haproxy.key
# Issue for backends
step ca certificate api1.internal api1.crt api1.key
# HAProxy uses the internal CA for backend verification
backend api_internal
server api1 10.10.0.21:8443 check ssl verify required ca-file /root/.step/certs/root_ca.crt
9. Load Balancing on WireGuard
This is the kldload pattern. The load balancer is the only server with a public IP. Everything behind it — web servers, APIs, databases — lives on the WireGuard backplane. The backend pool is invisible from the internet. A port scan of your public IP shows one IP with one or two open ports. The entire infrastructure is hidden.
The pattern: public load balancer (HAProxy + keepalived) with a public IP. All backends live on the WireGuard backplane — private addresses like 10.10.0.0/24. HAProxy health checks go to WireGuard addresses. Traffic from HAProxy to backends is encrypted by WireGuard at the network layer. The backends have no public IPs. There are no firewall rules to allow inbound connections to them — they are unreachable from the internet by design.
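WireGuard peers authenticate by public key. The LB sends traffic for a backend address only to the peer whose AllowedIPs cover it, and accepts packets from a peer only if their source address falls inside that peer's AllowedIPs. A host without a configured key cannot complete a handshake, so it can neither receive backplane traffic nor inject packets into it, even if it could somehow route to the backplane. This is a stronger guarantee than a firewall rule, which can be misconfigured: the backend pool is cryptographically isolated.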
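Cryptokey routing is easiest to see in miniature. The Python sketch below is a conceptual model with placeholder key names, not WireGuard code. On the way out, the destination IP selects the peer by longest-prefix match over AllowedIPs. On the way in, a decrypted packet is kept only if its source IP belongs to the AllowedIPs of the key that authenticated it. In real WireGuard, an unknown key never gets that far because its handshake fails, but the inbound check is what stops web1 from spoofing web2's address.

```python
"""WireGuard cryptokey routing in miniature.

Assumption: conceptual model with placeholder key names, not WireGuard code.
"""
import ipaddress

# peer public key (placeholder strings) -> AllowedIPs, as on the LB node
PEERS = {
    "web1-pubkey": [ipaddress.ip_network("10.10.0.11/32")],
    "web2-pubkey": [ipaddress.ip_network("10.10.0.12/32")],
}


def peer_for_destination(dst):
    """Longest-prefix match over every peer's AllowedIPs; None = no route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for key, nets in PEERS.items():
        for net in nets:
            if addr in net and (best is None or net.prefixlen > best[1]):
                best = (key, net.prefixlen)
    return best[0] if best else None


def accept_inbound(authenticated_key, src):
    """Drop packets from unknown keys or from IPs the key may not use."""
    nets = PEERS.get(authenticated_key)
    return nets is not None and any(ipaddress.ip_address(src) in n for n in nets)
```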
Full config: public LB with WireGuard backends
# WireGuard is already configured on all nodes
# LB backplane address: 10.10.0.1
# web1 backplane address: 10.10.0.11
# web2 backplane address: 10.10.0.12
# web3 backplane address: 10.10.0.13
# /etc/haproxy/haproxy.cfg on the load balancer
global
log /dev/log local0
maxconn 100000
user haproxy
group haproxy
daemon
# Stats socket for runtime API
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
defaults
log global
mode http
option httplog
option dontlognull
option forwardfor # pass X-Forwarded-For to backends
option http-server-close
timeout connect 5s
timeout client 30s
timeout server 30s
retries 3
# Public HTTPS frontend
frontend https_front
bind *:443 ssl crt /etc/haproxy/certs/example.com.pem alpn h2,http/1.1
bind *:80
redirect scheme https code 301 if !{ ssl_fc }
# Also pass the client IP as X-Real-IP (option forwardfor already adds X-Forwarded-For)
http-request set-header X-Real-IP %[src]
# Path-based routing
acl is_api path_beg /api/
use_backend api_wg if is_api
default_backend web_wg
# Web backends — all on WireGuard addresses
backend web_wg
balance leastconn
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200
# WireGuard addresses — invisible from internet
server web1 10.10.0.11:8080 check inter 2s fall 3 rise 2
server web2 10.10.0.12:8080 check inter 2s fall 3 rise 2
server web3 10.10.0.13:8080 check inter 2s fall 3 rise 2
# API backends — all on WireGuard addresses
backend api_wg
balance leastconn
option httpchk GET /api/health HTTP/1.1\r\nHost:\ api.example.com
http-check expect status 200
server api1 10.10.0.21:8000 check inter 2s fall 3 rise 2
server api2 10.10.0.22:8000 check inter 2s fall 3 rise 2
# Internal stats — bind to WireGuard address only, never public
frontend stats
bind 10.10.0.1:8404
stats enable
stats uri /stats
stats refresh 5s
stats auth admin:changeme
nftables rules on the load balancer node
# Only allow inbound 80/443 and the WireGuard port from the internet
# Stats page (8404) and SSH accessible only from the WireGuard backplane
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iif lo accept
# WireGuard
udp dport 51820 accept
# Public web traffic
tcp dport { 80, 443 } accept
# SSH and HAProxy stats only from the backplane
# (iifname matches by name, so the rule loads even before wg0 exists)
iifname "wg0" tcp dport { 22, 8404 } accept
# VRRP (protocol 112) between the keepalived pair
ip protocol 112 accept
# ICMP
ip protocol icmp accept
meta l4proto ipv6-icmp accept
}
}
10. ZFS for Load Balancer Config
HAProxy config changes can take down your entire infrastructure if they are wrong.
A misconfigured ACL, a missing ssl crt, a typo in a server address: any of these
makes HAProxy reject the new config. A reload that fails its config check leaves the
old process running, but a restart or reboot with the broken file leaves you with no
load balancer at all. On kldload, ZFS gives you an instant undo button: snapshot
before every change, roll back in seconds if something breaks.
Snapshot before config changes
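# Store HAProxy config on a dedicated dataset
# (move the current files aside first: the dataset mounts over /etc/haproxy)
mv /etc/haproxy /etc/haproxy.orig
zfs create -o mountpoint=/etc/haproxy rpool/etc/haproxy
cp -a /etc/haproxy.orig/. /etc/haproxy/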
# Snapshot before every change
zfs snapshot rpool/etc/haproxy@before-acl-change-2026-04-02
# Make the change
vim /etc/haproxy/haproxy.cfg
# Test the config
haproxy -c -f /etc/haproxy/haproxy.cfg
# If the test fails, rollback immediately
zfs rollback rpool/etc/haproxy@before-acl-change-2026-04-02
# If the test passes, reload
systemctl reload haproxy
Boot environments for HAProxy upgrades
# Create a boot environment before upgrading HAProxy
bectl create before-haproxy-upgrade
bectl mount before-haproxy-upgrade /mnt
# Upgrade HAProxy
dnf upgrade -y haproxy
# If the upgrade breaks something, boot back
bectl activate before-haproxy-upgrade
reboot
Replicate LB config to standby node
# On lb1: send config snapshots to lb2 continuously
zfs snapshot rpool/etc/haproxy@$(date +%Y%m%d-%H%M%S)
# Initial replication
zfs send rpool/etc/haproxy@initial | \
ssh lb2 "zfs recv rpool/etc/haproxy"
# Incremental replication (after every change)
LAST=$(zfs list -t snapshot -H -o name rpool/etc/haproxy | tail -2 | head -1)
NOW=$(zfs list -t snapshot -H -o name rpool/etc/haproxy | tail -1)
zfs send -i $LAST $NOW | ssh lb2 "zfs recv rpool/etc/haproxy"
# lb2 always has an up-to-date copy of the config
# failover is instant — no config drift
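The tail -2 | head -1 trick assumes lb2 received every previous snapshot. A more robust approach picks the base as the newest snapshot both nodes share. The Python helper below (hypothetical, not a ZFS tool) does that. It expects snapshot names in creation order, such as the output of zfs list -t snapshot -H -o name -s creation rpool/etc/haproxy on each node. It returns the base and target for zfs send -i, returns None when lb2 is already current, and raises when no common snapshot exists, which means a full send is needed.

```python
"""Choose the snapshot pair for the next incremental `zfs send`.

Assumption: hypothetical helper; inputs are full snapshot names in
creation order from each node.
"""


def incremental_pair(local, remote):
    """Return (base, target), None if already in sync, or raise if no common base."""
    if not local:
        raise ValueError("no local snapshots")
    target = local[-1]                       # newest snapshot on lb1
    remote_set = set(remote)
    if target in remote_set:
        return None                          # lb2 already has it
    for snap in reversed(local[:-1]):        # newest shared snapshot wins
        if snap in remote_set:
            return snap, target
    raise ValueError("no common snapshot: a full send is needed")


# Then: zfs send -i BASE TARGET | ssh lb2 "zfs recv -F rpool/etc/haproxy"
```

If the base is not the snapshot directly before the target, zfs send -I (capital I) also transfers the intermediate snapshots. With -i, lb2 gets only the final state.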
11. Cilium Load Balancing (Kubernetes)
Inside a Kubernetes cluster, Cilium replaces kube-proxy with eBPF load balancing. Every Kubernetes Service becomes an entry in an eBPF map. Lookups are O(1) hash operations in the kernel, not O(n) traversals of iptables chains. At scale this is not a minor optimization. With thousands of Services, kube-proxy's iptables mode has to rewrite and walk very large rule sets, so programming a new Service can take seconds instead of milliseconds.
Replace kube-proxy with Cilium
# Install Cilium with kube-proxy replacement
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set k8sServiceHost=10.10.0.1 \
--set k8sServicePort=6443
# Verify kube-proxy replacement is active
# (the in-pod binary is cilium-dbg on Cilium 1.15+, cilium on older releases)
kubectl exec -n kube-system ds/cilium -- \
cilium-dbg status | grep "KubeProxyReplacement"
# Should show: KubeProxyReplacement: True
DSR — Direct Server Return
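# DSR makes backends respond directly to clients, bypassing the LB node
# Eliminates the return-path bottleneck for high-throughput services
# DSR generally requires native routing (routingMode=native) rather than
# tunnel mode; check the Cilium docs for your version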
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--reuse-values \
--set loadBalancer.mode=dsr
BGP-announced LoadBalancer services
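# Cilium BGP speaker announces service IPs to your router
# LoadBalancer services get real IPs that your LAN router knows about
# The pool below only allocates the IPs; announcing them also needs Cilium's
# BGP control plane (bgpControlPlane.enabled=true) and a BGP peering policy
# (e.g. CiliumBGPPeeringPolicy) that selects these Services
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: service-pool
spec:
  cidrs:                       # called "blocks" in newer Cilium releases
    - cidr: "10.20.0.0/24"     # your LAN routable range
---
# Any service of type LoadBalancer gets an IP from this pool
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - port: 80
      targetPort: 8080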
12. Global Load Balancing (Multi-Site)
When you have more than one datacenter or site, you need global load balancing — routing users to the nearest available site and failing over between sites when one goes down. Three patterns: DNS-based failover, anycast BGP, and GeoDNS.
Cloudflare DNS failover
# Cloudflare can health-check your origin and fail over DNS automatically
# Site 1: 203.0.113.10 (primary)
# Site 2: 203.0.113.20 (failover)
# Set up health checks in Cloudflare dashboard:
# Type: HTTP, URL: https://www.example.com/health, expected: 200
# DNS records with failover:
# A www.example.com 203.0.113.10 (primary, Proxied)
# A www-failover.example.com 203.0.113.20 (secondary, DNS Only)
# Load Balancing rules (Cloudflare Load Balancing product):
# Pool 1: 203.0.113.10 (primary)
# Pool 2: 203.0.113.20 (failover)
# Origin health check on /health every 60s
# Failover: if primary pool unhealthy, route to Pool 2
PowerDNS with health checks (self-hosted)
# PowerDNS Authoritative Server + Lua records for health-aware DNS
# /etc/pdns/pdns.conf
launch=gsqlite3
gsqlite3-database=/var/lib/powerdns/pdns.db
enable-lua-records=yes
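# Lua record for health-checked failover (zone-file notation):
# www  IN  LUA  A  "ifportup(80, {'203.0.113.10', '203.0.113.20'})"
# ifportup answers with one of the listed addresses whose port 80 is open
# (picked at random by default), so it spreads load rather than preferring .10;
# if none are up it returns a random one from the list. For strict
# primary/backup ordering, ifurlup accepts a list of address sets and
# uses the first set that has a live address.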
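To make the ifportup semantics explicit, here is a Python model of the behaviour described above. It is a simplified sketch, not PowerDNS code: PowerDNS probes in the background and caches results rather than connecting on every query. The `probe` and `rng` parameters are hypothetical hooks so the selection logic can be tested without real hosts.

```python
"""Model of PowerDNS ifportup(): answer with a random address whose TCP
port is open; if none are, answer with a random address from the list.

Assumption: simplified sketch; PowerDNS caches probe results instead of
probing per query.
"""
import random
import socket


def port_open(addr, port, timeout=1.0):
    """True if a TCP connection to addr:port succeeds within `timeout`."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def ifportup(port, addresses, probe=port_open, rng=random):
    alive = [a for a in addresses if probe(a, port)]
    # no live address: fall back to any candidate rather than answering nothing
    return rng.choice(alive or list(addresses))
```

The fallback is why a site that is fully down can still be handed out. Returning some address keeps the record resolvable, at the cost of sending some clients to a dead site.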
Anycast BGP (same IP, multiple sites)
# Both sites announce the same IP prefix from different ASNs
# BGP routing selects the nearest site for each client
# If a site fails, its BGP announcement withdraws, traffic routes to the other
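# On each kldload site's border router (FRRouting config; "!" starts a comment):
router bgp 65001
bgp router-id 203.0.113.10
! FRR 7.4+ sends and accepts no eBGP routes without a policy unless this is disabled
no bgp ebgp-requires-policy
! upstream provider
neighbor 198.51.100.1 remote-as 65000
address-family ipv4 unicast
! announce your anycast prefix; it must exist in the routing table
! (e.g. a static route to blackhole) or "network" will not originate it
network 203.0.113.0/24
neighbor 198.51.100.1 activate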
# When you withdraw the announcement (site goes down or maintenance):
vtysh -c "configure" -c "router bgp 65001" \
-c "address-family ipv4 unicast" \
-c "no network 203.0.113.0/24"
Two kldload sites with Cloudflare failover
# Site 1 (primary): HAProxy + keepalived, VIP at 203.0.113.10
# Site 2 (DR): HAProxy + keepalived, VIP at 203.0.113.20
# Both sites: identical WireGuard backplane configs
# Config replication: zfs send | zfs recv over WireGuard tunnel
# Cloudflare health monitor pings /health on both sites every 60s
# DNS TTL: 60 seconds (fast failover)
# If Site 1 /health returns non-200 for 2 consecutive checks:
# Cloudflare updates DNS to 203.0.113.20
# Users start hitting Site 2 within roughly 60-180 seconds:
# one to two 60s check intervals to detect, plus up to 60s of cached DNS
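The failover window follows from three numbers: the check interval, the number of consecutive failures required, and the DNS TTL. The Python arithmetic below is a back-of-the-envelope sketch under stated assumptions. The outage can begin just before or just after a check, and resolvers honour the TTL. Clients that cache longer than the TTL will take longer. For records served through Cloudflare's proxy, clients never see the origin IP, so the TTL term largely drops out.

```python
"""Back-of-the-envelope DNS failover window.

Assumption: `failures` consecutive failed checks spaced `interval` seconds
apart are needed, and resolvers honour the record TTL.
"""


def dns_failover_window(interval, failures, ttl):
    """Return (best_case, worst_case) seconds until a client reaches Site 2."""
    best = (failures - 1) * interval      # outage starts just before a check; cache just expired
    worst = failures * interval + ttl     # just after a check; record freshly cached
    return best, worst


# 60s checks, 2 failures to fail over, 60s TTL
print(dns_failover_window(60, 2, 60))
```

To shrink the window, shorten the check interval first. Lowering the TTL below 60 s helps less, because many resolvers apply their own minimum TTL.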
13. Monitoring and Troubleshooting
HAProxy stats page
# Enable stats page (bind to backplane address, not public)
frontend stats
bind 10.10.0.1:8404
stats enable
stats uri /stats
stats refresh 5s
stats show-legends
stats show-node
stats auth admin:changeme
# Access at http://10.10.0.1:8404/stats
# Shows: frontend/backend/server state, connection rates, error rates, health check status
HAProxy runtime API via socat
# HAProxy exposes a runtime API over a Unix socket
# Enable in global section:
# stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
# Show backend status
echo "show servers state" | socat stdio /run/haproxy/admin.sock
# Drain a server (stop sending new connections, let existing finish)
echo "set server web_servers/web1 state drain" | socat stdio /run/haproxy/admin.sock
# Take a server offline for maintenance
echo "set server web_servers/web1 state maint" | socat stdio /run/haproxy/admin.sock
# Bring it back
echo "set server web_servers/web1 state ready" | socat stdio /run/haproxy/admin.sock
# Show current connections
echo "show info" | socat stdio /run/haproxy/admin.sock | grep "CurrConns"
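For scripts and alerting, the same runtime API can be used from Python instead of socat. The sketch below sends one command over the Unix socket and parses `show servers state` into dictionaries. It assumes the socket path from the `stats socket` line above. It also assumes the output layout: a version line, a '#'-prefixed header of field names, then one space-separated row per server. Fields are looked up by header name rather than fixed column position. The srv_op_state codes (0 stopped, 1 starting, 2 running, 3 stopping) follow HAProxy's management documentation. The helper names are hypothetical.

```python
"""Runtime API access without socat, plus a parser for `show servers state`.

Assumptions: socket path from the `stats socket` line above; header-driven
parsing of the '#'-prefixed field-name line.
"""
import socket

OP_STATES = {"0": "STOPPED", "1": "STARTING", "2": "RUNNING", "3": "STOPPING"}


def runtime_cmd(cmd, path="/run/haproxy/admin.sock"):
    """Send one command; HAProxy answers and closes the connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(cmd.encode() + b"\n")
        chunks = []
        while chunk := s.recv(65536):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def parse_servers_state(text):
    """Return one dict per server, keyed by the header's field names."""
    header, rows = None, []
    for line in text.splitlines():
        if line.startswith("#"):
            header = line.lstrip("# ").split()
        elif header and line.strip():
            rows.append(dict(zip(header, line.split())))
    return rows


# Example: servers not RUNNING in web_servers
# for row in parse_servers_state(runtime_cmd("show servers state web_servers")):
#     if OP_STATES.get(row["srv_op_state"]) != "RUNNING":
#         print(row["srv_name"], OP_STATES.get(row["srv_op_state"]))
```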
Common issues and fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| Connection refused on frontend port | HAProxy not running, or firewall blocking | systemctl status haproxy, check haproxy -c for config errors, check nftables rules |
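| 502 Bad Gateway | Backend closed the connection or sent an invalid or incomplete response (an application 500 is passed through as 500) | Check backend application logs and the termination state in HAProxy's log line; confirm the backend speaks the protocol HAProxy expects (HTTP vs HTTPS) |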
| 503 Service Unavailable | All backends are DOWN | Check stats page for backend health, show servers state via socat, verify health check config |
| Timeouts on long requests | HAProxy timeout too short | Increase timeout server and timeout client for slow requests and uploads; set timeout tunnel for WebSockets |
| VIP not floating after node failure | keepalived not running, or VRRP blocked | systemctl status keepalived, verify VRRP protocol (112) is not blocked by firewall |
| HAProxy reload fails | Config syntax error | haproxy -c -f /etc/haproxy/haproxy.cfg — always run this before reload; rollback ZFS snapshot if needed |
| Backends marked DOWN but they respond | Health check misconfigured | Verify health check URL, expected status code, and Host header; curl the health endpoint manually from the LB |
| Split-brain on keepalived | Both nodes become MASTER simultaneously | Check VRRP multicast reachability between nodes, verify identical virtual_router_id and auth_pass |
Config validation and debugging workflow
# 1. Always validate before reload
haproxy -c -f /etc/haproxy/haproxy.cfg
# "Configuration file is valid" means it is safe to reload
# 2. Reload without downtime
systemctl reload haproxy
# HAProxy keeps existing connections alive, loads new config
# 3. Check logs
journalctl -u haproxy -f
# HAProxy logs to syslog; look for "Server backend/server is DOWN"
# 4. Test a specific backend from the LB node
curl -v http://10.10.0.11:8080/health
# If this fails, the backend is down or the health check path is wrong
# 5. Check keepalived state
ip addr show | grep -A2 "inet 10.10.0.10"
# Should appear on exactly one node
# 6. Verify WireGuard connectivity to backends
ping 10.10.0.11
wg show # verify handshakes are recent
# 7. Check HAProxy stats for server state
echo "show servers state web_servers" | socat stdio /run/haproxy/admin.sock
Related pages
- WireGuard Masterclass — the backplane that hides your backend pool
- nftables Masterclass — firewall rules for the LB node
- Cilium Masterclass — eBPF load balancing inside Kubernetes
- BIRD & BGP Masterclass — BGP for multi-site anycast
- Observability Masterclass — Prometheus, Grafana, HAProxy metrics
- WireGuard Mesh & Multi-Site — building the backplane
- Cluster & Blue/Green — zero-downtime deploy patterns
- Monitoring Stack Glossary (355 terms): haproxy_exporter, Prometheus, Grafana