Backplane Networks Masterclass
This guide is about the invisible network that runs underneath your production services. Not the network your users see. Not the interface with a public IP. The encrypted substrate that carries SSH, database traffic, monitoring, and replication between your machines — the part that doesn't exist, as far as the internet is concerned.
If you have read the WireGuard Masterclass and understand the four-plane mesh from WireGuard Mesh & Multi-Site, this is the operational guide: how to design, build, and run that mesh in production, from a single two-node setup through to a multi-site deployment with BGP, BFD, DNS, monitoring, and ZFS replication all running through encrypted planes.
What a backplane is: A backplane is a private, encrypted network that runs underneath your production services. Your services bind to backplane addresses, so the outside world sees nothing. The physical interface has one port open: WireGuard's UDP port, which does not respond to unauthenticated traffic. From the internet's perspective, your servers don't exist. From the backplane's perspective, they're running a full production stack. Many infrastructure teams operate this way, and kldload ships the pieces needed to set it up.
What this guide builds: You start with two fresh kldload servers — no WireGuard, everything exposed. You finish with a three-node, four-plane encrypted backplane with DNS, Prometheus monitoring, ZFS replication, multi-site BGP routing, and a complete security posture. Every step builds on the last. Every config is complete and deployable.
1. What a Backplane Is and Why You Need One
Most servers are deployed the way they were in 2005: a public IP, a few iptables rules, SSH on port 22. The attacker's job is trivial — scan the port, find a vulnerability, exploit it. The defender's job is impossible — you can't patch everything, you can't predict every exploit, and every service you expose is a surface.
A backplane inverts this entirely. The physical interface carries one thing: WireGuard UDP datagrams on a single port. WireGuard does not respond to unauthenticated traffic: there is no banner, no reply to an invalid handshake, and no error message. From the scanner's perspective, the port is indistinguishable from one that is silently filtered. From the backplane's perspective, your server is running a full production stack (SSH, PostgreSQL, Prometheus, Redis, whatever you need), all bound to private WireGuard addresses that are only reachable by authenticated peers.
The physical interface
eth0 (or enp0s3, or whatever the cloud calls it) gets one inbound rule: UDP on the WireGuard port. Everything else is dropped. There is no SSH on the public interface. No HTTP. No ICMP. Just encrypted WireGuard datagrams.
The backplane interfaces
wg0, wg1, wg2, wg3: each a separate encrypted point-to-point or mesh network. SSH listens on wg1. Prometheus scrapes over wg2. ZFS replication runs over wg3. Services bind to an address in 10.200.0.0/24 through 10.203.0.0/24, not 0.0.0.0.
The access model
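To reach any service, you must first authenticate to WireGuard. No valid key means no tunnel. No tunnel means no access. The WireGuard handshake is your first authentication layer: each peer may only send from the addresses in its AllowedIPs, so a source IP seen on a backplane interface identifies the key that sent it. Services behind it should still keep their own authentication (SSH keys, database passwords).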
Why kldload
kldload installs WireGuard in the kernel (wireguard-tools + kernel module), configures wg-quick systemd units, and ships nftables for per-interface firewall rules — all in one installation step, across all supported distros.
2. Zero to Hero: Your First Backplane
Start state: two fresh kldload servers. Node A has public IP 203.0.113.10. Node B has public IP 203.0.113.20. SSH is open to the world. Everything is exposed.
End state: both servers are invisible. SSH works through the backplane.
Step 1: Generate keys on both nodes
# On node-a
umask 077
wg genkey | tee /etc/wireguard/node-a-private.key | wg pubkey > /etc/wireguard/node-a-public.key
cat /etc/wireguard/node-a-private.key # save this
cat /etc/wireguard/node-a-public.key # save this — goes into node-b's peer block
# On node-b
umask 077
wg genkey | tee /etc/wireguard/node-b-private.key | wg pubkey > /etc/wireguard/node-b-public.key
cat /etc/wireguard/node-b-private.key # save this
cat /etc/wireguard/node-b-public.key # save this — goes into node-a's peer block
Step 2: Create /etc/wireguard/wg0.conf on node-a
[Interface]
Address = 10.200.0.1/24
PrivateKey = <node-a-private-key>
ListenPort = 51820
[Peer]
PublicKey = <node-b-public-key>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25
Step 3: Create /etc/wireguard/wg0.conf on node-b
[Interface]
Address = 10.200.0.2/24
PrivateKey = <node-b-private-key>
ListenPort = 51820
[Peer]
PublicKey = <node-a-public-key>
AllowedIPs = 10.200.0.1/32
Endpoint = 203.0.113.10:51820
PersistentKeepalive = 25
Step 4: Bring up the tunnel on both nodes
# On both nodes
systemctl enable --now wg-quick@wg0
# Verify
wg show wg0
# Should show: peer with handshake within last 30s
# Test connectivity
ping -c3 10.200.0.2 # from node-a
ping -c3 10.200.0.1 # from node-b
Step 5: Move SSH to the backplane
# Edit /etc/ssh/sshd_config on both nodes. Once any ListenAddress is set, sshd
# listens ONLY on the listed addresses, so keep the public listener explicitly
# during the transition:
ListenAddress 0.0.0.0
# node-a uses its WG address (node-b uses 10.200.0.2)
ListenAddress 10.200.0.1
# IMPORTANT: do not remove the public ListenAddress yet
# Verify WG connectivity before cutting over
# Validate the config, then reload SSH (the unit is named "ssh" on Debian/Ubuntu)
sshd -t && systemctl reload sshd
# Test from node-b: open a NEW SSH session through the backplane while keeping your existing session
ssh -i ~/.ssh/id_rsa user@10.200.0.1
# If that works, remove the public ListenAddress
# Edit /etc/ssh/sshd_config and remove the 0.0.0.0 line
sshd -t && systemctl reload sshd
Step 6: Move other services to the backplane
# PostgreSQL — edit /etc/postgresql/*/main/postgresql.conf (Debian)
# or /var/lib/pgsql/data/postgresql.conf (RHEL/Rocky)
listen_addresses = '10.200.0.1' # on node-a
# Prometheus: the listen address is a command-line flag, not a prometheus.yml setting.
# Set it in the service's ExecStart (or in ARGS in /etc/default/prometheus on Debian):
--web.listen-address=10.200.0.1:9090
# nginx admin vhost — change listen directive
server {
listen 10.200.0.1:8080;
# ...
}
Step 7: Lock down the physical interface with nftables
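# /etc/nftables.conf: replace on both nodes
# (substitute your physical interface name for eth0, e.g. enp0s3)
# flush first, so re-applying the file does not stack duplicate rules
flush ruleset
table inet filter {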
chain input {
type filter hook input priority 0; policy drop;
# loopback — always allow
iifname lo accept
# established/related — allow return traffic
ct state established,related accept
# WireGuard UDP — the only open port on the physical interface
iifname eth0 udp dport 51820 accept
# everything on backplane interfaces — allow
iifname wg0 accept
# drop everything else
drop
}
chain forward {
type filter hook forward priority 0; policy drop;
}
chain output {
type filter hook output priority 0; policy accept;
}
}
# Apply
nft -f /etc/nftables.conf
systemctl enable --now nftables
Step 8: Verify from outside
# From a machine that is NOT in the backplane:
nmap -p- 203.0.113.10
# Result: all ports filtered — server does not exist
nmap -sU -p 51820 203.0.113.10
# Result: open|filtered — WireGuard responds to nothing
# From node-b (inside the backplane):
ssh 10.200.0.1 # works instantly
psql -h 10.200.0.1 -U postgres # works
curl http://10.200.0.1:8080/ # works
3. Adding a Third Node (and Beyond)
Adding a third node to a two-node mesh is straightforward. Adding a twentieth node by hand is tedious. This section covers both — the manual process for small meshes, and a script for anything larger.
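To see why hand-editing stops scaling, it helps to count the work. In a full mesh every node peers with every other node, so n nodes carry n-1 peer blocks each and form n(n-1)/2 tunnels per plane. A small sketch of that arithmetic follows; the function names are illustrative and not part of kldload.

```shell
#!/bin/bash
# Full-mesh sizing helpers (illustrative names, not part of kldload).

# Number of [Peer] blocks each node carries in an n-node full mesh.
peers_per_node() {
    local n="$1"
    echo $(( n - 1 ))
}

# Number of distinct tunnels (node pairs) in an n-node full mesh.
mesh_tunnels() {
    local n="$1"
    echo $(( n * (n - 1) / 2 ))
}

for n in 3 5 20; do
    echo "nodes=$n peers_per_node=$(peers_per_node "$n") tunnels=$(mesh_tunnels "$n")"
done
```

With four planes, multiply both figures by four: each plane is its own mesh with its own keys.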
Adding node-c manually
# On node-c: generate keys
umask 077
wg genkey | tee /etc/wireguard/node-c-private.key | wg pubkey > /etc/wireguard/node-c-public.key
# /etc/wireguard/wg0.conf on node-c:
[Interface]
Address = 10.200.0.3/24
PrivateKey = <node-c-private-key>
ListenPort = 51820
[Peer]
# node-a
PublicKey = <node-a-public-key>
AllowedIPs = 10.200.0.1/32
Endpoint = 203.0.113.10:51820
PersistentKeepalive = 25
[Peer]
# node-b
PublicKey = <node-b-public-key>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25
# Add node-c to node-a's wg0.conf:
[Peer]
# node-c
PublicKey = <node-c-public-key>
AllowedIPs = 10.200.0.3/32
Endpoint = 203.0.113.30:51820
PersistentKeepalive = 25
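# Add the same [Peer] block for node-c to node-b's wg0.conf (it is identical:
# the endpoint is node-c's address)
# On node-c: start the interface for the first time
systemctl enable --now wg-quick@wg0
# On node-a and node-b: apply the new peer without dropping existing tunnels
wg syncconf wg0 <(wg-quick strip wg0)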
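Because the peer block describing a node is the same on every other node, it can be generated once and pasted wherever it is needed. A minimal sketch, assuming the keepalive of 25 and the default port 51820 used above; the function name peer_block is hypothetical.

```shell
#!/bin/bash
# peer_block NAME PUBKEY WG_IP ENDPOINT_IP [PORT]
# Prints a [Peer] block in the layout used in this guide.
peer_block() {
    local name="$1" pubkey="$2" wg_ip="$3" endpoint="$4" port="${5:-51820}"
    printf '[Peer]\n'
    printf '# %s\n' "$name"
    printf 'PublicKey = %s\n' "$pubkey"
    printf 'AllowedIPs = %s/32\n' "$wg_ip"
    printf 'Endpoint = %s:%s\n' "$endpoint" "$port"
    printf 'PersistentKeepalive = 25\n'
}

# Example: the block that node-a and node-b both need for node-c.
peer_block node-c '<node-c-public-key>' 10.200.0.3 203.0.113.30
```

Appending the output to wg0.conf and running the syncconf command shown above applies it without restarting the interface.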
The kvm-clone trick
When deploying from a kldload golden image, the cleanest approach is to include WireGuard in the template with everything except the private key and the node-specific IP. On first boot, a cloud-init script generates the key pair, assigns the next IP from your address space, and distributes the public key to existing nodes via SSH.
# cloud-init user-data snippet (template nodes):
runcmd:
- umask 077 && wg genkey | tee /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub
- /usr/local/sbin/join-backplane.sh
add-to-mesh.sh — for anything over 5 nodes
#!/bin/bash
# add-to-mesh.sh: add a new node to the wg0 mesh
# Usage: ./add-to-mesh.sh <new-node-hostname> <new-node-wg-ip> <new-node-public-ip>
set -euo pipefail
NEW_HOST="$1"
NEW_WG_IP="$2"
NEW_ENDPOINT="$3"
EXISTING_NODES=(node-a node-b) # add node hostnames here
# Generate keys on the new node
ssh "$NEW_HOST" 'umask 077; wg genkey | tee /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub'
NEW_PUBKEY=$(ssh "$NEW_HOST" 'cat /etc/wireguard/wg0.pub')
echo "New node public key: $NEW_PUBKEY"
# Add new node as peer to all existing nodes
for node in "${EXISTING_NODES[@]}"; do
    echo "Adding $NEW_HOST to $node..."
    ssh "$node" "wg set wg0 peer '$NEW_PUBKEY' allowed-ips '${NEW_WG_IP}/32' endpoint '${NEW_ENDPOINT}:51820' persistent-keepalive 25"
    # Persist the running config to wg0.conf
    ssh "$node" "wg-quick save wg0"
done
# Build wg0.conf for the new node. wg-quick does not expand shell syntax in
# PrivateKey, so the key is loaded from its file by a PostUp hook instead.
CONF="[Interface]
Address = ${NEW_WG_IP}/24
ListenPort = 51820
PostUp = wg set %i private-key /etc/wireguard/wg0.key
"
for node in "${EXISTING_NODES[@]}"; do
    # Ask the running interface for its key; wg show does not print addresses,
    # so the backplane IP comes from ip(8)
    NODE_PUBKEY=$(ssh "$node" 'wg show wg0 public-key')
    NODE_WG_IP=$(ssh "$node" "ip -4 -o addr show dev wg0 | awk '{print \$4}' | cut -d/ -f1")
    # Public IP lookup relies on an external service; use your inventory if you have one
    NODE_ENDPOINT=$(ssh "$node" 'curl -s ifconfig.me')
    CONF+="
[Peer]
# $node
PublicKey = $NODE_PUBKEY
AllowedIPs = ${NODE_WG_IP}/32
Endpoint = ${NODE_ENDPOINT}:51820
PersistentKeepalive = 25
"
done
# Send the config over stdin so quoting inside $CONF cannot break the remote command
printf '%s\n' "$CONF" | ssh "$NEW_HOST" 'umask 077; cat > /etc/wireguard/wg0.conf'
ssh "$NEW_HOST" "systemctl enable --now wg-quick@wg0"
echo "Done. $NEW_HOST is now in the mesh."
4. Multiple Planes (Traffic Isolation)
A single WireGuard interface can carry everything, and at small scale that works. At production scale it creates two problems: performance (high-bandwidth replication traffic competes with SSH, so interactive sessions lag) and security (one key pair and one flat address space mean that compromising the management plane compromises every other kind of traffic too).
The solution is multiple planes — separate WireGuard interfaces, separate key pairs, separate address spaces, separate firewall rules. Traffic is isolated at the network layer, not the application layer.
The four-plane pattern
| Interface | Subnet | Port | Traffic |
|---|---|---|---|
| wg0 | 10.200.0.0/24 | 51820 | Enrollment / bootstrapping new nodes |
| wg1 | 10.201.0.0/24 | 51821 | Management — SSH, admin APIs, config push |
| wg2 | 10.202.0.0/24 | 51822 | Monitoring — Prometheus scrapes, Grafana, alerting |
| wg3 | 10.203.0.0/24 | 51823 | Storage — ZFS replication, database sync, bulk data |
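The table follows a regular pattern: plane N uses interface wgN, subnet 10.20N.0.0/24, and UDP port 5182N. A small sketch that encodes the pattern can be handy when generating configs for several planes; the helper names below are hypothetical.

```shell
#!/bin/bash
# Plane numbering used in this guide: plane N -> wgN, 10.20N.0.0/24, UDP 5182N.

plane_iface()  { echo "wg$1"; }
plane_subnet() { echo "10.$(( 200 + $1 )).0.0/24"; }
plane_port()   { echo $(( 51820 + $1 )); }

# Address of a node on a plane; the last octet is the same on every plane.
plane_addr() {
    local plane="$1" node_id="$2"
    echo "10.$(( 200 + plane )).0.${node_id}"
}

for p in 0 1 2 3; do
    echo "$(plane_iface "$p") $(plane_subnet "$p") port $(plane_port "$p") node-a=$(plane_addr "$p" 1)"
done
```

Keeping the mapping in one place means a config generator cannot accidentally pair wg2 with the wg1 port or subnet.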
Complete four-plane config for node-a (three-node cluster)
# /etc/wireguard/wg0.conf — enrollment plane
[Interface]
Address = 10.200.0.1/24
PrivateKey = <node-a-wg0-private>
ListenPort = 51820
[Peer]
# node-b
PublicKey = <node-b-wg0-public>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25
[Peer]
# node-c
PublicKey = <node-c-wg0-public>
AllowedIPs = 10.200.0.3/32
Endpoint = 203.0.113.30:51820
PersistentKeepalive = 25
# /etc/wireguard/wg1.conf — management plane (SSH lives here)
[Interface]
Address = 10.201.0.1/24
PrivateKey = <node-a-wg1-private> # DIFFERENT key pair from wg0
ListenPort = 51821
[Peer]
PublicKey = <node-b-wg1-public>
AllowedIPs = 10.201.0.2/32
Endpoint = 203.0.113.20:51821
PersistentKeepalive = 25
[Peer]
PublicKey = <node-c-wg1-public>
AllowedIPs = 10.201.0.3/32
Endpoint = 203.0.113.30:51821
PersistentKeepalive = 25
# /etc/wireguard/wg2.conf — monitoring plane
[Interface]
Address = 10.202.0.1/24
PrivateKey = <node-a-wg2-private>
ListenPort = 51822
[Peer]
PublicKey = <node-b-wg2-public>
AllowedIPs = 10.202.0.2/32
Endpoint = 203.0.113.20:51822
PersistentKeepalive = 25
[Peer]
PublicKey = <node-c-wg2-public>
AllowedIPs = 10.202.0.3/32
Endpoint = 203.0.113.30:51822
PersistentKeepalive = 25
# /etc/wireguard/wg3.conf — storage plane
[Interface]
Address = 10.203.0.1/24
PrivateKey = <node-a-wg3-private>
ListenPort = 51823
[Peer]
PublicKey = <node-b-wg3-public>
AllowedIPs = 10.203.0.2/32
Endpoint = 203.0.113.20:51823
PersistentKeepalive = 25
[Peer]
PublicKey = <node-c-wg3-public>
AllowedIPs = 10.203.0.3/32
Endpoint = 203.0.113.30:51823
PersistentKeepalive = 25
# Enable all four planes
for iface in wg0 wg1 wg2 wg3; do
systemctl enable --now wg-quick@$iface
done
Binding services to specific planes
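# /etc/ssh/sshd_config: SSH on the management plane
ListenAddress 10.201.0.1
# Replication targets also need sshd on the storage plane (syncoid in section 7 connects to 10.203.0.x)
ListenAddress 10.203.0.1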
# /etc/prometheus/prometheus.yml — Prometheus on monitoring plane only
# (see section 6 for full config)
# PostgreSQL — on management or storage plane depending on access pattern
# /etc/postgresql/*/main/postgresql.conf
listen_addresses = '10.201.0.1,127.0.0.1'
# node_exporter — on monitoring plane only
ExecStart=/usr/bin/node_exporter \
--web.listen-address=10.202.0.1:9100
nftables per-plane access control
# /etc/nftables.conf: four-plane rules
# flush first, so re-applying the file does not stack duplicate rules
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
iifname lo accept
ct state established,related accept
# Physical interface: only WireGuard UDP on all four ports
iifname eth0 udp dport { 51820, 51821, 51822, 51823 } accept
iifname eth0 drop
# wg0 (enrollment) — limited: only SSH and wg-management traffic
iifname wg0 tcp dport 22 accept
iifname wg0 drop
# wg1 (management): SSH, admin APIs, Grafana, DNS, ping for diagnostics
iifname wg1 tcp dport { 22, 53, 3000, 8080, 8443 } accept
iifname wg1 udp dport 53 accept
iifname wg1 icmp type echo-request accept
iifname wg1 drop
# wg2 (monitoring): Prometheus, alertmanager, node_exporter, WireGuard exporter
iifname wg2 tcp dport { 9090, 9093, 9100, 9586 } accept
iifname wg2 drop
# wg3 (storage) — ZFS replication SSH, database sync
iifname wg3 tcp dport { 22, 5432, 3306 } accept
iifname wg3 drop
}
chain forward {
type filter hook forward priority 0; policy drop;
}
chain output {
type filter hook output priority 0; policy accept;
}
}
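Each plane in the ruleset above is the same two-line pattern: accept a list of TCP ports on one interface, then drop the rest. A sketch of a generator for that pattern, assuming TCP-only services; the function name plane_rule is hypothetical, and the output is meant to be pasted into the input chain, not piped to nft directly.

```shell
#!/bin/bash
# plane_rule IFACE PORT [PORT...]
# Prints the accept/drop pair used for one plane in the ruleset above.
plane_rule() {
    local iface="$1"; shift
    local ports
    ports=$(printf '%s, ' "$@")
    ports="${ports%, }"            # strip the trailing separator
    echo "iifname $iface tcp dport { $ports } accept"
    echo "iifname $iface drop"
}

plane_rule wg3 22 5432 3306
```

Generating the rules from one list of service ports per plane makes it harder for a service to move planes without its firewall rule following.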
5. IP Addressing and DNS
The backplane address scheme is infrastructure — it should be planned once and never changed. Changing a WireGuard IP means updating every peer config on every node. Plan the space upfront, leave room to grow, and document it.
Recommended address scheme
# Four planes, each with a /24
10.200.0.0/24 wg0 — enrollment
10.201.0.0/24 wg1 — management / SSH
10.202.0.0/24 wg2 — monitoring
10.203.0.0/24 wg3 — storage
# Node assignment (consistent last-octet across all planes)
10.200.0.1 / 10.201.0.1 / 10.202.0.1 / 10.203.0.1 → node-a
10.200.0.2 / 10.201.0.2 / 10.202.0.2 / 10.203.0.2 → node-b
10.200.0.3 / 10.201.0.3 / 10.202.0.3 / 10.203.0.3 → node-c
# Multi-site: use the second octet for site
10.200.0.0/24 site-a, wg0
10.200.1.0/24 site-b, wg0
10.201.0.0/24 site-a, wg1
10.201.1.0/24 site-b, wg1
DNS for backplane hosts: Unbound on wg1
Run one Unbound instance on the management plane. All nodes use it as their resolver. You get human-readable hostnames for all backplane addresses.
# Install Unbound on node-a (the DNS server)
# CentOS/Rocky/RHEL
dnf install -y unbound
# Debian/Ubuntu
apt install -y unbound
# /etc/unbound/unbound.conf — backplane DNS server on node-a
server:
interface: 10.201.0.1 # only listen on management plane
interface: 127.0.0.1
access-control: 10.201.0.0/24 allow # only wg1 peers can query
access-control: 127.0.0.1/32 allow
# Local zone for backplane hostnames
local-zone: "mgmt." static
local-data: "node-a.mgmt. IN A 10.201.0.1"
local-data: "node-b.mgmt. IN A 10.201.0.2"
local-data: "node-c.mgmt. IN A 10.201.0.3"
local-zone: "mon." static
local-data: "node-a.mon. IN A 10.202.0.1"
local-data: "node-b.mon. IN A 10.202.0.2"
local-data: "node-c.mon. IN A 10.202.0.3"
local-zone: "store." static
local-data: "node-a.store. IN A 10.203.0.1"
local-data: "node-b.store. IN A 10.203.0.2"
local-data: "node-c.store. IN A 10.203.0.3"
# Forward public DNS to upstream
forward-zone:
name: "."
forward-addr: 1.1.1.1
forward-addr: 8.8.8.8
systemctl enable --now unbound
# On all other nodes: point resolver at the backplane DNS server
# /etc/resolv.conf (or managed by systemd-resolved / NetworkManager)
nameserver 10.201.0.1
search mgmt mon store
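Every node needs one record per zone, and the zones map onto the planes in a fixed way (mgmt to 10.201, mon to 10.202, store to 10.203). A short sketch that prints the three local-data lines for a new node, so they can be pasted into unbound.conf; the function name is hypothetical.

```shell
#!/bin/bash
# local_data_lines NODE LAST_OCTET
# Prints the three Unbound local-data records for one node, following the
# zone-to-plane mapping above: mgmt -> 10.201, mon -> 10.202, store -> 10.203.
local_data_lines() {
    local node="$1" octet="$2"
    local zones=(mgmt mon store)
    local i
    for i in 0 1 2; do
        printf 'local-data: "%s.%s. IN A 10.%d.0.%s"\n' \
            "$node" "${zones[$i]}" "$(( 201 + i ))" "$octet"
    done
}

local_data_lines node-d 4
```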
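Dynamic registration with a PostUp hook
When a node brings up its WireGuard interface, a PostUp hook can register its address with the DNS server automatically. This is useful for dynamic environments where IPs are assigned at boot. Unbound does not accept RFC 2136 dynamic updates, so nsupdate cannot be used against it; the hook below adds the record with unbound-control on the DNS server instead. Records added this way live in memory and are lost when Unbound restarts.
# /etc/wireguard/wg1.conf: on any node that needs dynamic DNS registration
[Interface]
Address = 10.201.0.5/24
PrivateKey = <private-key>
ListenPort = 51821
PostUp = /usr/local/sbin/register-dns.sh wg1 10.201.0.5 $(hostname -s).mgmt.
PreDown = /usr/local/sbin/deregister-dns.sh $(hostname -s).mgmt.
#!/bin/bash
# /usr/local/sbin/register-dns.sh
# Usage: register-dns.sh <iface> <ip> <fqdn>
# Assumes remote-control is enabled in unbound.conf on node-a and that this
# node can SSH to node-a over wg1. deregister-dns.sh is the same call with
# only the local_data_remove step.
set -euo pipefail
IFACE="$1"; IP="$2"; FQDN="$3"
ssh root@10.201.0.1 "unbound-control local_data_remove $FQDN && unbound-control local_data $FQDN 300 IN A $IP"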
6. Monitoring Through the Backplane
Prometheus scrapes over wg2. Grafana is accessible over wg1. node_exporter binds only to the monitoring plane address. No exporter or Prometheus endpoint is visible on the public interface or on the management plane; the Grafana UI is the only monitoring component exposed on wg1.
node_exporter on every node — bound to wg2 only
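# /etc/systemd/system/node_exporter.service.d/backplane.conf
# (use each node's own wg2 address; start after wg2 so the address exists to bind)
[Unit]
After=wg-quick@wg2.service
[Service]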
ExecStart=
ExecStart=/usr/bin/node_exporter \
--web.listen-address=10.202.0.1:9100 \
--collector.systemd \
--collector.processes \
--no-collector.wifi
# Apply
systemctl daemon-reload
systemctl restart node_exporter
# Verify — should NOT appear on public or management interfaces
ss -tlnp | grep 9100
# tcp LISTEN 0 128 10.202.0.1:9100 *:* users:(("node_exporter",...))
Prometheus scrape config using backplane addresses
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Prometheus itself bound to monitoring plane
# Start with: --web.listen-address=10.202.0.1:9090
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'node-a.mon:9100'
          - 'node-b.mon:9100'
          - 'node-c.mon:9100'
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^.]+)\.mon:.*'
        target_label: instance
        replacement: '$1'
  - job_name: 'wireguard'
    static_configs:
      - targets:
          - 'node-a.mon:9586' # prometheus-wireguard-exporter
          - 'node-b.mon:9586'
          - 'node-c.mon:9586'
  - job_name: 'prometheus'
    static_configs:
      - targets: ['node-a.mon:9090']
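The relabel rule in the node job rewrites the instance label from the scrape address. As a rough illustration of what the regex does, the bash sketch below applies the same pattern to a target string; it only mimics the matching step and is not how Prometheus itself evaluates rules. The function name is hypothetical.

```shell
#!/bin/bash
# instance_label ADDRESS
# Mimics the relabel rule above: if ADDRESS fully matches ([^.]+)\.mon:.*
# print the first capture group, otherwise print nothing.
instance_label() {
    local addr="$1"
    # Prometheus anchors relabel regexes at both ends, hence ^ and $ here.
    local re='^([^.]+)\.mon:.*$'
    if [[ "$addr" =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

instance_label 'node-a.mon:9100'   # prints node-a
instance_label '10.202.0.1:9100'   # prints nothing: no .mon suffix
```

When the regex does not match, Prometheus leaves the instance label at its default, the full scrape address, so targets listed by IP keep an address-style instance name.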
WireGuard handshake monitoring
# Install prometheus-wireguard-exporter
# Available at: https://github.com/MindFlavor/prometheus_wireguard_exporter
# /etc/systemd/system/prometheus-wireguard-exporter.service
[Unit]
Description=Prometheus WireGuard Exporter
After=network.target wg-quick@wg2.service
[Service]
# -a takes the listen address and -p the port; confirm with --help for your exporter version
ExecStart=/usr/local/bin/prometheus_wireguard_exporter \
-a 10.202.0.1 \
-p 9586 \
-n /etc/wireguard/wg0.conf \
-n /etc/wireguard/wg1.conf \
-n /etc/wireguard/wg2.conf \
-n /etc/wireguard/wg3.conf
Restart=always
[Install]
WantedBy=multi-user.target
# Prometheus alert rule: peer handshake too old
groups:
  - name: wireguard
    rules:
      - alert: WireGuardPeerHandshakeStale
        expr: time() - wireguard_latest_handshake_seconds > 180
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "WireGuard peer handshake stale on {{ $labels.instance }}"
          description: "Peer {{ $labels.public_key }} last handshake {{ $value | humanizeDuration }} ago"
Grafana on the management plane
# /etc/grafana/grafana.ini
[server]
# management plane only (comment kept on its own line so it cannot be read as part of the value)
http_addr = 10.201.0.1
http_port = 3000
# Datasource: connect Grafana to Prometheus over monitoring plane
# URL: http://10.202.0.1:9090
7. ZFS Replication Through the Backplane
ZFS replication (via syncoid) is the highest-bandwidth traffic on your backplane. A daily snapshot delta can be anywhere from a few gigabytes to tens of gigabytes. This traffic belongs on wg3 (storage plane) — isolated from management, monitoring, and enrollment traffic.
SSH keys for replication — dedicated key pair
# On node-a (the source): generate a dedicated replication key
ssh-keygen -t ed25519 -f /etc/zfs/replication.key -N '' -C 'zfs-replication@node-a'
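# On node-b (the target): install the public key with a restricted command
cat >> /root/.ssh/authorized_keys <<'EOF'
command="/usr/sbin/zfs receive -F -d rpool",no-port-forwarding,no-X11-forwarding,no-agent-forwarding <replication-pubkey-here>
EOF
# The command= restriction means this key can ONLY run zfs receive.
# Even if the private key is stolen, it cannot open a shell.
# Caveat: a key locked to one command suits a plain "zfs send | ssh" pipeline.
# syncoid runs several other commands on the target (zfs list, zfs get, and so on),
# so for syncoid the forced command must be a wrapper that checks
# $SSH_ORIGINAL_COMMAND against an allowlist, or the restriction must give way
# to a dedicated user with zfs allow delegation.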
syncoid configuration — storage plane only
# Test manually first
syncoid \
--sshkey /etc/zfs/replication.key \
--sshoption "BindAddress=10.203.0.1" \
rpool/data root@10.203.0.2:rpool/replicas/node-a
# The destination 10.203.0.2 already routes over wg3; BindAddress=10.203.0.1 also pins
# the source address to the storage plane, so replication traffic never uses wg1
Systemd timer for automated replication
# /etc/systemd/system/zfs-replication.service
[Unit]
Description=ZFS syncoid replication to node-b
After=network.target wg-quick@wg3.service
Wants=wg-quick@wg3.service
[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid \
--sshkey /etc/zfs/replication.key \
--sshoption "BindAddress=10.203.0.1" \
--no-privilege-elevation \
--compress lz4 \
rpool/data root@10.203.0.2:rpool/replicas/node-a
StandardOutput=journal
StandardError=journal
# /etc/systemd/system/zfs-replication.timer
[Unit]
Description=Run ZFS replication every 4 hours
[Timer]
OnBootSec=15min
OnUnitActiveSec=4h
RandomizedDelaySec=5min
[Install]
WantedBy=timers.target
# Enable
systemctl daemon-reload
systemctl enable --now zfs-replication.timer
Monitor replication lag via Prometheus
#!/bin/bash
# /usr/local/sbin/zfs-replication-check.sh
# Outputs a metric: zfs_replication_lag_seconds{dataset="..."}
# Run this on the replication target against the received dataset: the age of
# its newest snapshot is how far the replica trails the source.
DATASET="rpool/replicas/node-a"
# -p prints creation as a Unix timestamp, so no date parsing is needed
SNAP_TIME=$(zfs list -H -p -t snapshot -o creation -s creation "$DATASET" 2>/dev/null | tail -1)
# No snapshots yet: fall back to 0 so the lag shows up as very large
SNAP_TIME="${SNAP_TIME:-0}"
NOW=$(date +%s)
LAG=$((NOW - SNAP_TIME))
echo "# HELP zfs_replication_lag_seconds Age of the most recent ZFS snapshot"
echo "# TYPE zfs_replication_lag_seconds gauge"
echo "zfs_replication_lag_seconds{dataset=\"$DATASET\"} $LAG"
# Expose via node_exporter textfile collector (node_exporter must run with
# --collector.textfile.directory=/var/lib/node_exporter/textfile_collector)
# /etc/systemd/system/zfs-replication-metrics.service
[Service]
Type=oneshot
# Write to a temp file and rename it, so node_exporter never reads a partial file
ExecStart=/bin/bash -c '/usr/local/sbin/zfs-replication-check.sh > /var/lib/node_exporter/textfile_collector/zfs_replication.prom.tmp && mv /var/lib/node_exporter/textfile_collector/zfs_replication.prom.tmp /var/lib/node_exporter/textfile_collector/zfs_replication.prom'
# /etc/systemd/system/zfs-replication-metrics.timer
[Timer]
OnBootSec=1min
OnUnitActiveSec=5min
[Install]
WantedBy=timers.target
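The write-then-rename step in that ExecStart line matters because node_exporter may read the directory at any moment. A minimal sketch of the same pattern as a reusable function, run here against a scratch directory; the function name and the sample value are illustrative only.

```shell
#!/bin/bash
# write_prom_file DIR NAME VALUE
# Writes one gauge to DIR/NAME.prom via a temp file and a rename.
write_prom_file() {
    local dir="$1" name="$2" value="$3"
    local tmp="$dir/$name.prom.tmp"
    {
        echo "# TYPE $name gauge"
        echo "$name $value"
    } > "$tmp"
    mv "$tmp" "$dir/$name.prom"   # rename is atomic within one filesystem
}

# Example against a scratch directory instead of the real collector path.
demo_dir=$(mktemp -d)
write_prom_file "$demo_dir" zfs_replication_lag_seconds 1234
cat "$demo_dir/zfs_replication_lag_seconds.prom"
rm -rf "$demo_dir"
```

The temp file must live in the same directory as the final file; a rename across filesystems degrades to copy-and-delete and loses the atomicity.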
8. Backplane Security Hardening
A properly built backplane has a small, well-defined attack surface. This section covers the hardening steps that take it from "pretty secure" to "as secure as you can reasonably make it."
Pre-shared keys on every peer (post-quantum protection)
# Generate a PSK for each peer pair (umask keeps the file readable by root only)
umask 077
wg genpsk > /etc/wireguard/psk-node-a-node-b.key
# Add to both sides of the peer block
[Peer]
PublicKey = <pubkey>
PresharedKey = <psk-value>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
A PSK is a symmetric secret shared between two peers, on top of the asymmetric key exchange. If quantum computers ever break Curve25519 (WireGuard's DH algorithm), the PSK layer remains secure as long as the PSK itself hasn't been compromised. The PSK adds no latency and negligible CPU overhead; the only cost is distributing one more secret per peer pair.
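Since a PSK belongs to a pair of peers, both sides need to agree on which file holds it. One way to avoid mismatches, sketched below under the file naming used above, is to sort the two node names before building the filename; the function name psk_file is hypothetical.

```shell
#!/bin/bash
# psk_file NODE1 NODE2
# Prints the PSK filename for a peer pair. The names are sorted first, so
# both peers derive the same filename regardless of argument order.
psk_file() {
    local first second
    if [[ "$1" < "$2" ]]; then
        first="$1"; second="$2"
    else
        first="$2"; second="$1"
    fi
    echo "/etc/wireguard/psk-${first}-${second}.key"
}

psk_file node-b node-a   # prints /etc/wireguard/psk-node-a-node-b.key
```

A three-node mesh needs three such files per plane, one for each pair of nodes.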
Key rotation strategy
#!/bin/bash
# rotate-keys.sh — rotate WireGuard keys on all nodes quarterly
# Run on each node; distribute new public keys to peers
set -euo pipefail
IFACE="${1:-wg0}"
KEYFILE="/etc/wireguard/${IFACE}.key"
PUBFILE="/etc/wireguard/${IFACE}.pub"
# Generate new key pair
umask 077
wg genkey | tee "${KEYFILE}.new" | wg pubkey > "${PUBFILE}.new"
NEW_PUB=$(cat "${PUBFILE}.new")
echo "New public key for ${IFACE}: $NEW_PUB"
echo "Distribute this key to all peers before proceeding."
echo "Press Enter when peers are updated, Ctrl-C to abort."
read -r
# Atomically replace the key
mv "${KEYFILE}.new" "${KEYFILE}"
mv "${PUBFILE}.new" "${PUBFILE}"
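# Load the new private key into the running interface. If the interface's .conf
# embeds PrivateKey inline, update it there too, or the old key returns on restart.
wg set "$IFACE" private-key "$KEYFILE"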
echo "Keys rotated. Verify handshakes: wg show $IFACE"
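To decide when a rotation is due, the age of the key file can be compared against the rotation period. A small sketch of that check, assuming the quarterly schedule mentioned in the script header is taken as 90 days; the function names are hypothetical and the timestamps in the example are fixed values for illustration.

```shell
#!/bin/bash
# key_age_days KEY_MTIME_EPOCH NOW_EPOCH
# Whole days elapsed between the key's modification time and now.
key_age_days() {
    echo $(( ($2 - $1) / 86400 ))
}

# rotation_due KEY_MTIME_EPOCH NOW_EPOCH [MAX_DAYS]
# Succeeds (exit 0) when the key is at least MAX_DAYS old (default 90).
rotation_due() {
    local max_days="${3:-90}"
    [ "$(key_age_days "$1" "$2")" -ge "$max_days" ]
}

# Example with fixed timestamps: a key created 100 days before "now".
now=1700000000
created=$(( now - 100 * 86400 ))
if rotation_due "$created" "$now"; then
    echo "rotate: key is $(key_age_days "$created" "$now") days old"
fi
```

On a node with GNU coreutils, the modification time can be read with stat -c %Y on the key file and the current time with date +%s.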
Detecting unauthorized IPs on backplane interfaces
# nftables: drop traffic from IPs not in our AllowedIPs list
# This is defense-in-depth — WireGuard already enforces AllowedIPs at the crypto level
# This nftables rule catches any misconfiguration
table inet backplane-guard {
set wg1-allowed-peers {
type ipv4_addr
elements = { 10.201.0.1, 10.201.0.2, 10.201.0.3 }
}
chain wg1-input {
type filter hook input priority -10; policy accept;
iifname wg1 ip saddr != @wg1-allowed-peers log prefix "BACKPLANE-UNEXPECTED: " drop
}
}
# Monitor for unexpected backplane IPs in the kernel log
journalctl -k -f | grep BACKPLANE-UNEXPECTED
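Watching the log live works for spot checks; for a summary, the logged lines can be reduced to a count per source address. A sketch that does this with standard text tools, assuming the usual netfilter log layout in which each line carries a SRC= field; the function name and the sample lines are made up for illustration.

```shell
#!/bin/bash
# unexpected_sources
# Reads kernel log lines on stdin and prints "count source-ip" for every
# line tagged BACKPLANE-UNEXPECTED, most frequent first.
unexpected_sources() {
    grep 'BACKPLANE-UNEXPECTED: ' \
        | grep -o 'SRC=[0-9.]*' \
        | cut -d= -f2 \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}

# Sample lines (values are made up).
unexpected_sources <<'EOF'
kernel: BACKPLANE-UNEXPECTED: IN=wg1 OUT= MAC= SRC=10.201.0.99 DST=10.201.0.1 LEN=60 PROTO=TCP SPT=40000 DPT=22
kernel: BACKPLANE-UNEXPECTED: IN=wg1 OUT= MAC= SRC=10.201.0.99 DST=10.201.0.1 LEN=60 PROTO=TCP SPT=40001 DPT=22
kernel: BACKPLANE-UNEXPECTED: IN=wg1 OUT= MAC= SRC=10.201.0.77 DST=10.201.0.1 LEN=60 PROTO=TCP SPT=40002 DPT=22
EOF
```

On a live system the input would come from journalctl -k without -f, so that the pipeline terminates.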
fail2ban: not needed — here is why
fail2ban scans logs for failed authentication attempts and bans the source IP. On a properly configured backplane, there are no authentication attempts to fail. WireGuard does not respond to unauthenticated traffic. There is no banner, no challenge, no error message. The scanner sends a UDP datagram to port 51820. WireGuard checks the cryptographic handshake. If the peer is unknown, the datagram is silently discarded. No log entry. Nothing to scan. fail2ban has nothing to do.
9. Multi-Site Backplanes
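Two sites, each with its own local mesh, connected by a site-to-site WireGuard tunnel with BGP routing between them. Each site has its own subnets. Each site's management plane can reach the other's. Failure of the site-to-site link does not affect intra-site connectivity. The gateway nodes route between sites, so unlike the other nodes they need IP forwarding enabled (net.ipv4.ip_forward = 1), UDP port 51830 open on the physical interface, BGP (TCP 179) and BFD (UDP 3784) accepted on wg-site, and forward-chain rules that permit inter-site traffic; the firewall configs in earlier sections drop all forwarded packets. Non-gateway nodes reach the remote site through their local gateway, so the gateway's peer entry on each node must include the remote site's subnets in AllowedIPs.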
Site subnet design
# Site A — datacenter
10.200.0.0/24 site-a wg0 (enrollment)
10.201.0.0/24 site-a wg1 (management)
10.202.0.0/24 site-a wg2 (monitoring)
10.203.0.0/24 site-a wg3 (storage)
# Site B — cloud / secondary site
10.200.1.0/24 site-b wg0
10.201.1.0/24 site-b wg1
10.202.1.0/24 site-b wg2
10.203.1.0/24 site-b wg3
# Site-to-site tunnel: dedicated interface
10.254.0.0/30 site-to-site link (wg-site)
10.254.0.1 site-a gateway
10.254.0.2 site-b gateway
Site-to-site WireGuard tunnel
# /etc/wireguard/wg-site.conf on site-a gateway node
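[Interface]
Address = 10.254.0.1/30
PrivateKey = <site-a-gateway-private>
ListenPort = 51830
# Stop wg-quick from installing static routes for AllowedIPs, so that the
# routes learned over BGP decide what is reachable through this tunnel
Table = off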
[Peer]
# site-b gateway
PublicKey = <site-b-gateway-public>
AllowedIPs = 10.254.0.0/30, 10.200.1.0/24, 10.201.1.0/24, 10.202.1.0/24, 10.203.1.0/24
Endpoint = <site-b-public-ip>:51830
PersistentKeepalive = 10
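The AllowedIPs line for a site gateway follows directly from the subnet design: the /30 link network plus the remote site's four plane subnets. A sketch that builds the list from a site number, assuming the third-octet-per-site scheme above (site-a is 0, site-b is 1); the function name is hypothetical.

```shell
#!/bin/bash
# site_allowed_ips SITE_ID
# Prints the AllowedIPs value for the tunnel peer that fronts site SITE_ID:
# the /30 link network plus that site's four plane subnets (10.20N.SITE.0/24).
site_allowed_ips() {
    local site="$1" plane list="10.254.0.0/30"
    for plane in 0 1 2 3; do
        list+=", 10.$(( 200 + plane )).${site}.0/24"
    done
    echo "$list"
}

# On the site-a gateway, the peer is site-b (site 1).
echo "AllowedIPs = $(site_allowed_ips 1)"
```

The site-b gateway uses the mirror image: the same function called with site 0.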
BGP between sites with FRRouting
# /etc/frr/frr.conf on site-a gateway
frr defaults traditional
hostname site-a-gw
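router bgp 65001
bgp router-id 10.254.0.1
no bgp default ipv4-unicast
! Recent FRR releases exchange no eBGP routes unless a policy is attached;
! disable that check here, or attach route-maps to the neighbor instead
no bgp ebgp-requires-policy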
neighbor 10.254.0.2 remote-as 65002
neighbor 10.254.0.2 description "site-b gateway"
neighbor 10.254.0.2 timers 10 30
neighbor 10.254.0.2 timers connect 10
address-family ipv4 unicast
network 10.200.0.0/24
network 10.201.0.0/24
network 10.202.0.0/24
network 10.203.0.0/24
neighbor 10.254.0.2 activate
neighbor 10.254.0.2 soft-reconfiguration inbound
exit-address-family
# /etc/frr/frr.conf on site-b gateway
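router bgp 65002
bgp router-id 10.254.0.2
no bgp default ipv4-unicast
! Recent FRR releases exchange no eBGP routes unless a policy is attached;
! disable that check here, or attach route-maps to the neighbor instead
no bgp ebgp-requires-policy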
neighbor 10.254.0.1 remote-as 65001
neighbor 10.254.0.1 description "site-a gateway"
neighbor 10.254.0.1 timers 10 30
neighbor 10.254.0.1 timers connect 10
address-family ipv4 unicast
network 10.200.1.0/24
network 10.201.1.0/24
network 10.202.1.0/24
network 10.203.1.0/24
neighbor 10.254.0.1 activate
neighbor 10.254.0.1 soft-reconfiguration inbound
exit-address-family
BFD for fast failure detection
# /etc/frr/frr.conf: add BFD to the BGP neighbor (on both gateways; swap the AS number and peer address on site-b)
# Requires bgpd=yes and bfdd=yes in /etc/frr/daemons
router bgp 65001
neighbor 10.254.0.2 bfd
bfd
peer 10.254.0.2
detect-multiplier 3
receive-interval 300
transmit-interval 300
!
# BFD detects link failure in 300ms * 3 = 900ms
# Without BFD: BGP hold timer = 30s — you wait 30 seconds to detect failure
# With BFD: failure detected in under 1 second, BGP reconverges immediately
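The detection time quoted above is the transmit interval multiplied by the detect multiplier. A tiny sketch of the arithmetic, useful when trying other interval values; the function name is hypothetical.

```shell
#!/bin/bash
# bfd_detect_ms INTERVAL_MS MULTIPLIER
# Detection time: MULTIPLIER consecutive intervals with no packet received.
bfd_detect_ms() {
    echo $(( $1 * $2 ))
}

echo "BFD at 300 ms x 3: $(bfd_detect_ms 300 3) ms"
echo "BGP hold timer alone: $(( 30 * 1000 )) ms"
```

Shorter intervals detect failure faster but are more likely to flap on a lossy inter-site path, so the values are a trade-off and not a fixed recommendation.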
Verify multi-site routing
# On site-a, verify you can reach site-b subnets
ping 10.201.1.1 # site-b management plane
ssh root@10.201.1.1 # SSH to site-b node-a over management plane
# Check BGP learned routes
vtysh -c "show ip bgp"
# Should show site-b subnets learned from 10.254.0.2
# Check BFD status
vtysh -c "show bfd peers"
10. The Dark Mode Pattern
Dark mode is the maximum-stealth configuration: no public services, no DNS records, no discoverable ports, no response to unauthenticated traffic of any kind. The only visible thing on the internet is silence where your servers should be.
What dark mode looks like to a scanner
# An attacker runs a comprehensive scan
nmap -sS -sU -p- --open 203.0.113.10
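# Result: "Host seems down". With -Pn added (skip host discovery): no TCP port open, and every
# UDP port reported as open|filtered, so the WireGuard ports are indistinguishable from the rest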
masscan 203.0.113.10 -p0-65535
# Result: 0 open ports
# They try ICMP
ping -c4 203.0.113.10
# Result: 0 packets received (ICMP dropped at nftables)
# They try IPv6
nmap -6 2001:db8::1
# Result: 0 open ports
# They check DNS
dig @8.8.8.8 node-a.yoursite.com
# Result: NXDOMAIN (no DNS records, no PTR records)
The complete dark mode nftables config
# /etc/nftables.conf — maximum stealth
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
# loopback only
iifname lo accept
# established/related
ct state established,related accept
# WireGuard UDP — the only open port
iifname eth0 udp dport { 51820, 51821, 51822, 51823 } accept
# All four WG planes — allow full traffic within the backplane
iifname { wg0, wg1, wg2, wg3 } accept
# Drop everything else — no ICMP, no response, nothing
# Don't use 'reject' — that gives the scanner confirmation
drop
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
# Apply and verify
nft -f /etc/nftables.conf
# Test: nmap from outside → 0 open ports
# Test: ping from outside → 0 responses
No DNS records — passive stealth
# Don't register A records for production nodes in public DNS
# If you need to reach them externally, use a bastion with a public record
# (the bastion's only open port is WireGuard — same dark mode applies)
# For IPv6: disable SLAAC so eth0 never autoconfigures a global address from router advertisements
# /etc/sysctl.d/99-ipv6-stealth.conf
net.ipv6.conf.eth0.accept_ra = 0
net.ipv6.conf.eth0.autoconf = 0
sysctl -p /etc/sysctl.d/99-ipv6-stealth.conf
11. Troubleshooting Backplane Issues
The diagnostic sequence for any backplane problem is four steps: check the tunnel, check the path, check the service, check the firewall. Working through them in order resolves the large majority of backplane issues.
The diagnostic ladder
# Step 1: Is the WireGuard tunnel up?
wg show wg1
# Look for:
# - "latest handshake: X seconds ago" (should be under 180)
# - If no handshake: check keys, check endpoint, check UDP port on both sides
# - If handshake > 180s: tunnel is stale, likely firewall blocking UDP
# Step 2: Is the tunnel actually passing traffic?
ping -I wg1 10.201.0.2
# Use -I to force the ping over a specific interface
# If ping fails but handshake is present: check AllowedIPs, check routing
# Step 3: Is the service listening on the right address?
ss -tlnp 'sport = :22'
# Should show: 10.201.0.x:22 (not 0.0.0.0:22)
# If showing 0.0.0.0:22: the service listens on every interface, including the public one; set ListenAddress
# If nothing is listed for the backplane address: sshd is not bound to it (it may have started before the WireGuard interface was up)
# Step 4: Is the firewall allowing it?
nft list chain inet filter input
# Check: is the port allowed on the right interface?
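Step 1 of the ladder can be automated. wg show accepts a latest-handshakes argument that prints one line per peer with the public key and the handshake time as a Unix timestamp, which is easy to compare against the 180-second threshold. A sketch of that check, with canned input in place of a live interface; the function name is hypothetical.

```shell
#!/bin/bash
# stale_peers NOW_EPOCH [MAX_AGE_SECONDS]
# Reads "public-key<TAB>epoch" lines on stdin, the format printed by
# `wg show <interface> latest-handshakes`, and prints peers whose last
# handshake is older than MAX_AGE_SECONDS (default 180) or never happened.
stale_peers() {
    local now="$1" max_age="${2:-180}" key ts
    while read -r key ts; do
        [ -n "$key" ] || continue
        if [ "$ts" -eq 0 ]; then
            echo "$key never"
        elif [ $(( now - ts )) -gt "$max_age" ]; then
            echo "$key $(( now - ts ))s"
        fi
    done
}

# Example with canned input (keys shortened for readability).
printf 'AAAA=\t1700000250\nBBBB=\t1700000000\nCCCC=\t0\n' | stale_peers 1700000300
```

On a live node the equivalent call is wg show wg1 latest-handshakes piped into stale_peers "$(date +%s)".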
Common problems and fixes
| Symptom | Most likely cause | Fix |
|---|---|---|
| No handshake after key exchange | Wrong public key in peer block | Re-copy the public key, check for whitespace |
| Handshake present, ping fails | AllowedIPs too narrow | Check that destination IP is in AllowedIPs on source |
| SSH refused on backplane IP | sshd still bound to 0.0.0.0 | Add ListenAddress to sshd_config, reload |
| Replication slow or timing out | syncoid using wrong interface | Add --sshoption BindAddress=10.203.0.x |
| Prometheus gaps in metrics | wg2 handshake stale on one peer | Check wg show wg2, restart wg-quick@wg2 |
| New node can't join mesh | UDP port blocked by host firewall | Check nftables on both sides, check cloud security group |
| Tunnel down after reboot | wg-quick service not enabled | systemctl enable wg-quick@wg0 (and wg1, wg2, wg3) |
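The "sshd still bound to 0.0.0.0" row generalizes to every service: anything listening on a wildcard address is reachable on every interface the firewall permits. As a rough sketch, the hypothetical helper `wildcard_listeners` below scans `ss -tlnH` output for TCP listeners bound to `0.0.0.0`, `[::]`, or `*`.

```shell
#!/bin/sh
# wildcard_listeners (hypothetical helper): read `ss -tlnH` output on stdin
# and print the local address of every TCP listener bound to a wildcard
# address. Peer-address columns such as "0.0.0.0:*" are skipped because
# the pattern requires a numeric port.
wildcard_listeners() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(0\.0\.0\.0|\[::\]|\*):[0-9]+$/) { print $i; break }
    }'
}

# Example:
#   ss -tlnH | wildcard_listeners
```

On a fully converted node the output should be empty. Listeners on specific addresses, including 127.0.0.1 and the wg1 to wg3 addresses, are not reported.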
Debugging WireGuard handshake failures
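# Enable kernel WireGuard debug logging temporarily
# (requires root, a kernel built with CONFIG_DYNAMIC_DEBUG, and debugfs mounted at /sys/kernel/debug)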
modprobe wireguard
echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg -wT | grep wireguard
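# Look for (exact wording varies by kernel version):
# "Invalid handshake initiation" or "Invalid handshake response" → key mismatch, or a replayed initiation (for example after the initiator's clock jumped backwards)
# "did not complete after N seconds, retrying" → the endpoint IP/port is unreachable or UDP is blocked
# "Invalid MAC of handshake" → the sender does not hold this node's correct public key
# "Sending cookie response" → handshake rate limiting is active because the node is under load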
# Disable debug logging when done
echo module wireguard -p > /sys/kernel/debug/dynamic_debug/control
12. Complete Backplane Reference
A complete three-node, four-plane deployment with all components configured. Copy, adapt, and deploy.
Deployment checklist: new node to the backplane
- Generate four WireGuard key pairs (one per plane)
- Assign consistent last-octet IP addresses across all four planes
- Add peer blocks to all existing nodes (all four planes)
- Create wg0.conf through wg3.conf in /etc/wireguard on the new node
- Enable wg-quick@wg0 through wg-quick@wg3 (systemctl enable --now)
- Verify handshakes on all four planes: wg show wg0; wg show wg1; wg show wg2; wg show wg3
- Update /etc/nftables.conf to include new plane IPs in allowed-peers sets
- Update Prometheus scrape targets (wg2 address)
- Configure node_exporter to bind to wg2 address only
- Configure sshd ListenAddress to wg1 address only
- Update Unbound local-data records on DNS server (wg1 address)
- Test: ping all plane addresses from all existing nodes
- Test: SSH from existing node via wg1 address
- Test: Prometheus scrapes new node via wg2 address
- Verify: nmap from outside shows zero open ports
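The "consistent last-octet" step in the checklist can be made mechanical. The sketch below assumes the addressing scheme from this guide's quick reference (10.200.0.0/24 through 10.203.0.0/24, ports 51820 through 51823); `plane_plan` is a hypothetical helper name, not part of kldload or WireGuard.

```shell
#!/bin/sh
# plane_plan OCTET (hypothetical helper)
# Prints "<interface> <address/prefix> <listen-port>" for each of the four
# planes, using the same last octet on every plane.
plane_plan() {
    octet="$1"
    case "$octet" in
        ''|*[!0-9]*) echo "usage: plane_plan <1-254>" >&2; return 1 ;;
    esac
    if [ "$octet" -lt 1 ] || [ "$octet" -gt 254 ]; then
        echo "octet out of range: $octet" >&2
        return 1
    fi
    for plane in 0 1 2 3; do
        echo "wg${plane} 10.20${plane}.0.${octet}/24 5182${plane}"
    done
}

# Example: address plan for a fourth node
#   plane_plan 4
```

Keeping the last octet identical across planes means a node's identity is readable from any of its addresses, which simplifies firewall sets and DNS records.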
Quarterly key rotation checklist
- Generate new key pairs on each node (use rotate-keys.sh from section 8)
- Collect new public keys from all nodes
- Update peer blocks on all nodes for each plane — do not activate yet
- Schedule a maintenance window (30 minutes, all tunnels will briefly drop)
- Apply new keys simultaneously (wg-quick down/up, or wg syncconf fed with the output of wg-quick strip)
- Verify handshakes on all four planes within 60 seconds
- Test SSH, Prometheus scrapes, and replication path
- Rotate PSKs separately (wg genpsk, update both ends of each peer pair)
- Document rotation date
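Once the rotation date is documented, a small check can flag when the next rotation is due. This is a sketch under assumptions: it treats "quarterly" as 90 days, relies on GNU `date -d`, and `rotation_due` is a hypothetical helper name.

```shell
#!/bin/sh
# rotation_due LAST_DATE [TODAY] [MAX_DAYS] (hypothetical helper)
# Prints the number of whole days since LAST_DATE and returns 0 if that
# age is at least MAX_DAYS (default 90, an assumed value for "quarterly").
# Returns 1 if rotation is not yet due, 2 if a date cannot be parsed.
# Assumes GNU date (the -d option).
rotation_due() {
    last=$(date -u -d "$1" +%s) || return 2
    today=$(date -u -d "${2:-now}" +%s) || return 2
    max_days="${3:-90}"
    age_days=$(( (today - last) / 86400 ))
    echo "$age_days"
    [ "$age_days" -ge "$max_days" ]
}

# Example:
#   rotation_due 2025-01-01 && echo "key rotation is due"
```

The exit status makes the helper usable from cron or a monitoring check, with the printed age available for the alert message.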
Summary config: node-a, all four planes
# Quick reference — node-a addresses across all planes
wg0 (enrollment): 10.200.0.1/24, port 51820
wg1 (management): 10.201.0.1/24, port 51821
wg2 (monitoring): 10.202.0.1/24, port 51822
wg3 (storage): 10.203.0.1/24, port 51823
# Services and their planes:
sshd → wg1 10.201.0.1:22
node_exporter → wg2 10.202.0.1:9100
prometheus → wg2 10.202.0.1:9090 (scrapes over wg2)
grafana → wg1 10.201.0.1:3000
unbound (DNS) → wg1 10.201.0.1:53
postgresql → wg1 10.201.0.1:5432 (or wg3 for bulk replication)
syncoid → wg3 bound to 10.203.0.1
# nftables: physical interface eth0 allows only:
# UDP 51820, 51821, 51822, 51823 — the four WireGuard ports
# Everything else: drop (no reject, no ICMP, nothing)
Related pages
- WireGuard Masterclass — deep dive on keys, routing, and WireGuard internals
- WireGuard Mesh & Multi-Site — the mesh generator and multi-site topology patterns
- Networking tutorial — VXLAN, BGP, eBPF dataplane fundamentals
- nftables Masterclass — per-interface firewall rules and set-based policy
- Observability Masterclass — Prometheus, Grafana, and alerting over the monitoring plane