
Backplane Networks Masterclass

This guide is about the invisible network that runs underneath your production services. Not the network your users see. Not the interface with a public IP. The encrypted substrate that carries SSH, database traffic, monitoring, and replication between your machines — the part that doesn't exist, as far as the internet is concerned.

If you have read the WireGuard Masterclass and understand the four-plane mesh from WireGuard Mesh & Multi-Site, this is the operational guide: how to design, build, and run that mesh in production, from a single two-node setup through to a multi-site deployment with BGP, BFD, DNS, monitoring, and ZFS replication all running through encrypted planes.

What a backplane is: a network your services bind to that the outside world never sees. The physical interface has exactly one open port, WireGuard's UDP port, and WireGuard does not respond to unauthenticated traffic. From the internet's perspective, your servers don't exist. From the backplane's perspective, they're running a full production stack. This is how many well-run infrastructure teams operate, and kldload makes it straightforward to build.

What this guide builds: You start with two fresh kldload servers — no WireGuard, everything exposed. You finish with a three-node, four-plane encrypted backplane with DNS, Prometheus monitoring, ZFS replication, multi-site BGP routing, and a complete security posture. Every step builds on the last. Every config is complete and deployable.

The concept is simple: stop exposing services on the internet. SSH, databases, monitoring and internal APIs all move to an encrypted WireGuard network that only your machines can reach. The physical interface becomes a dumb pipe that carries WireGuard UDP, and everything else is invisible. This single architectural decision removes most of the attack surface a typical server exposes. You are not installing a firewall in front of exposed services; you are making the services unreachable from outside the encrypted tunnel in the first place. The difference is architectural, not cosmetic.

1. What a Backplane Is and Why You Need One

Most servers are deployed the way they were in 2005: a public IP, a few iptables rules, SSH on port 22. The attacker's job is trivial — scan the port, find a vulnerability, exploit it. The defender's job is impossible — you can't patch everything, you can't predict every exploit, and every service you expose is a surface.

A backplane inverts this entirely. The physical interface carries one thing: WireGuard UDP datagrams on a single port. WireGuard does not respond to unauthenticated traffic. There is no banner, no handshake and no error message. To a scanner, the port is indistinguishable from a filtered one (nmap reports it as open|filtered). From the backplane's perspective, your server is running a full production stack (SSH, PostgreSQL, Prometheus, Redis, whatever you need), all bound to private WireGuard addresses that only authenticated peers can reach.

The physical interface

eth0 (or enp0s3, or whatever the cloud calls it) gets one inbound rule: UDP on the WireGuard port. Everything else is dropped. There is no SSH on the public interface. No HTTP. No ICMP. Just encrypted WireGuard datagrams.

// nmap -Pn -p- your-server → all ports filtered
// wg show → 3 peers, all connected

The backplane interfaces

wg0, wg1, wg2 and wg3 are each a separate encrypted point-to-point or mesh network. SSH listens on wg1. Prometheus scrapes over wg2. ZFS replication runs over wg3. Each service binds to its plane's address (10.200.x.x through 10.203.x.x), not to 0.0.0.0.

// sshd ListenAddress 10.201.0.1
// postgres listen_addresses = '10.201.0.1'
// node_exporter --web.listen-address=10.202.0.1:9100

The access model

To reach any service, you must first authenticate to WireGuard. No valid key means no tunnel. No tunnel means no access. The WireGuard handshake IS your authentication layer — everything behind it can trust peer IPs implicitly.

// Traditional: firewall → auth → service
// Backplane: WireGuard key → service (auth is the transport)
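Peer IPs can stand in for identity because of WireGuard's cryptokey routing. WireGuard accepts an inbound packet from a peer only if the packet's source address falls inside that peer's AllowedIPs. The hypothetical helper below is a minimal sketch that answers the same question from a config file: which PublicKey owns this address? It matches exact /32 entries only, whereas WireGuard itself does longest-prefix matching. For the live mapping, use wg show <iface> allowed-ips.

```shell
#!/bin/sh
# peer_for_ip: print the PublicKey of the [Peer] block whose AllowedIPs
# contains <ip>/32. Hypothetical helper; exact /32 matches only.
# Usage: peer_for_ip 10.201.0.2 < /etc/wireguard/wg1.conf
peer_for_ip() {
  awk -v want="$1/32" '
    /^\[Peer\]/      { in_peer = 1; key = ""; hit = 0; next }
    /^\[Interface\]/ { in_peer = 0; next }
    in_peer && /^PublicKey[ \t]*=/ {
      line = $0
      sub(/^PublicKey[ \t]*=[ \t]*/, "", line)
      key = line
    }
    in_peer && /^AllowedIPs[ \t]*=/ {
      line = $0
      sub(/^AllowedIPs[ \t]*=[ \t]*/, "", line)
      n = split(line, cidrs, "[ \t]*,[ \t]*")
      for (i = 1; i <= n; i++) if (cidrs[i] == want) hit = 1
    }
    # Print as soon as one [Peer] block has both a matching address and a key
    hit && key != "" { print key; exit }
  '
}
```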

Why kldload

kldload installs WireGuard in the kernel (wireguard-tools + kernel module), configures wg-quick systemd units, and ships nftables for per-interface firewall rules — all in one installation step, across all supported distros.

// same config, same commands, same result
// CentOS, Debian, Ubuntu, Rocky, RHEL, Fedora, Arch

2. Zero to Hero: Your First Backplane

Start state: two fresh kldload servers. Node A has public IP 203.0.113.10. Node B has public IP 203.0.113.20. SSH is open to the world. Everything is exposed. End state: both servers are invisible. SSH works through the backplane.

Step 1: Generate keys on both nodes

# On node-a
umask 077
wg genkey | tee /etc/wireguard/node-a-private.key | wg pubkey > /etc/wireguard/node-a-public.key
cat /etc/wireguard/node-a-private.key   # save this
cat /etc/wireguard/node-a-public.key    # save this — goes into node-b's peer block

# On node-b
umask 077
wg genkey | tee /etc/wireguard/node-b-private.key | wg pubkey > /etc/wireguard/node-b-public.key
cat /etc/wireguard/node-b-private.key   # save this
cat /etc/wireguard/node-b-public.key    # save this — goes into node-a's peer block

Step 2: Create /etc/wireguard/wg0.conf on node-a

[Interface]
Address = 10.200.0.1/24
PrivateKey = <node-a-private-key>
ListenPort = 51820

[Peer]
PublicKey = <node-b-public-key>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25

Step 3: Create /etc/wireguard/wg0.conf on node-b

[Interface]
Address = 10.200.0.2/24
PrivateKey = <node-b-private-key>
ListenPort = 51820

[Peer]
PublicKey = <node-a-public-key>
AllowedIPs = 10.200.0.1/32
Endpoint = 203.0.113.10:51820
PersistentKeepalive = 25
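Steps 2 and 3 are the same file with the roles swapped, which is exactly where copy-paste mistakes creep in. The sketch below renders either node's config from one source. It assumes a hypothetical render_wg_conf helper and a plain-text node table with one line per node: name, backplane IP, public IP, public key. Instead of pasting the private key into the file, it uses the PostUp form documented in wg-quick(8) to load the key file generated in Step 1. The /24 prefix and the PersistentKeepalive value mirror the configs above.

```shell
#!/bin/sh
# render_wg_conf: print a wg-quick config for one node of a full mesh.
# Hypothetical helper. Reads a node table on stdin, one node per line:
#   <name> <wg-ip> <public-ip> <public-key>
# Usage: render_wg_conf node-a 51820 < nodes.txt > /etc/wireguard/wg0.conf
render_wg_conf() {
  self="$1"; port="$2"
  awk -v self="$self" -v port="$port" '
    NF < 4 || /^#/ { next }
    { name[NR] = $1; wgip[NR] = $2; pub[NR] = $3; key[NR] = $4; last = NR }
    END {
      for (i = 1; i <= last; i++) if (name[i] == self) me = i
      if (!me) { print "unknown node: " self > "/dev/stderr"; exit 1 }
      print "[Interface]"
      print "Address = " wgip[me] "/24"
      print "ListenPort = " port
      # The private key stays in the Step 1 key file instead of being pasted in
      print "PostUp = wg set %i private-key /etc/wireguard/" self "-private.key"
      for (i = 1; i <= last; i++) {
        if (i == me || name[i] == "") continue
        print ""
        print "[Peer]"
        print "# " name[i]
        print "PublicKey = " key[i]
        print "AllowedIPs = " wgip[i] "/32"
        print "Endpoint = " pub[i] ":" port
        print "PersistentKeepalive = 25"
      }
    }
  '
}
```

Running render_wg_conf node-b 51820 < nodes.txt produces node-b's file from the same table, so the two configs cannot drift apart.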

Step 4: Bring up the tunnel on both nodes

# On both nodes
systemctl enable --now wg-quick@wg0

# Verify
wg show wg0
# Should show: peer with handshake within last 30s

# Test connectivity
ping -c3 10.200.0.2   # from node-a
ping -c3 10.200.0.1   # from node-b
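Reading wg show wg0 is easy with two peers and tedious with twenty. The command wg show <iface> latest-handshakes prints one peer per line: the public key, then a Unix timestamp, or 0 if there has never been a handshake. The sketch below flags every peer whose last handshake is older than a threshold. With PersistentKeepalive = 25 there is always traffic, so a healthy peer re-handshakes roughly every two minutes. The 180-second threshold used here is an assumption, not a WireGuard constant.

```shell
#!/bin/sh
# stale_peers: read `wg show <iface> latest-handshakes` output on stdin
# (public key, tab, Unix timestamp; 0 means "never") and print each peer
# whose last handshake is older than <max-age> seconds.
# Usage: wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180
stale_peers() {
  now="$1"; max_age="$2"
  awk -v now="$now" -v max="$max_age" '
    NF >= 2 {
      if ($2 == 0)             print $1 " never"
      else if (now - $2 > max) print $1 " " (now - $2) "s"
    }
  '
}
```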

Step 5: Move SSH to the backplane

# Edit /etc/ssh/sshd_config on each node. Once any ListenAddress is set, sshd
# stops listening on 0.0.0.0, so keep the public address listed during cutover.
# node-a:
ListenAddress 203.0.113.10
ListenAddress 10.200.0.1
# node-b: use 203.0.113.20 and 10.200.0.2 instead

# At boot sshd must start after wg-quick@wg0, or it cannot bind 10.200.0.1:
# add After=wg-quick@wg0.service via systemctl edit sshd (ssh on Debian/Ubuntu)

# Check the syntax, then reload SSH (existing sessions stay open)
sshd -t && systemctl reload sshd

# Test: from node-b, open a NEW SSH session through the backplane while keeping your existing session
ssh -i ~/.ssh/id_rsa user@10.200.0.1

# If that works, remove the public ListenAddress line (203.0.113.10) and reload
sshd -t && systemctl reload sshd
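Before removing the public address, confirm that sshd will actually listen on the backplane. sshd -T prints the effective configuration with one lowercase keyword per line. The hypothetical helper below succeeds only if a listenaddress entry matches the given IP. It assumes the "listenaddress <ip>:<port>" form that sshd uses for IPv4 addresses.

```shell
#!/bin/sh
# sshd_listens_on: succeed only if `sshd -T` output (on stdin) has a
# listenaddress entry for <ip>. Hypothetical helper.
# Usage: sshd -T | sshd_listens_on 10.200.0.1 && echo "backplane listener configured"
sshd_listens_on() {
  awk -v ip="$1" '
    $1 == "listenaddress" {
      addr = $2
      sub(/:[0-9]+$/, "", addr)   # strip the :port suffix
      if (addr == ip) found = 1
    }
    END { exit !found }
  '
}
```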

Step 6: Move other services to the backplane

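# PostgreSQL: edit /etc/postgresql/*/main/postgresql.conf (Debian)
# or /var/lib/pgsql/data/postgresql.conf (RHEL/Rocky)
listen_addresses = '10.200.0.1'   # on node-a; takes effect after a restart, not a reload
# and allow backplane clients in pg_hba.conf, for example:
# host  all  all  10.200.0.0/24  scram-sha-256

# Prometheus: --web.listen-address is a command-line flag, not a prometheus.yml
# setting. Set it in the service's arguments (for example ARGS in /etc/default/prometheus on Debian):
--web.listen-address=10.200.0.1:9090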

# nginx admin vhost — change listen directive
server {
  listen 10.200.0.1:8080;
  # ...
}
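After Step 6, the quickest audit is to look for anything still bound to a wildcard address. The sketch below prints every TCP listener on 0.0.0.0, [::] or *. These are the services that would become reachable again if the firewall were ever flushed. It assumes the column layout of ss -tlnH: state, two queue columns, local address, peer address. The -H flag drops the header line.

```shell
#!/bin/sh
# wildcard_listeners: read `ss -tlnH` output on stdin and print the local
# address of every listener bound to all interfaces. Hypothetical helper.
# Usage: ss -tlnH | wildcard_listeners
wildcard_listeners() {
  awk '$1 == "LISTEN" && ($4 ~ /^0\.0\.0\.0:/ || $4 ~ /^\[::\]:/ || $4 ~ /^\*:/) { print $4 }'
}
```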

Step 7: Lock down the physical interface with nftables

# /etc/nftables.conf: replace on both nodes
# (RHEL/Rocky: nftables.service loads /etc/sysconfig/nftables.conf instead)
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    # loopback — always allow
    iifname lo accept

    # established/related — allow return traffic
    ct state established,related accept

    # WireGuard UDP — the only open port on the physical interface
    iifname eth0 udp dport 51820 accept

    # everything on backplane interfaces — allow
    iifname wg0 accept

    # drop everything else
    drop
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
  }

  chain output {
    type filter hook output priority 0; policy accept;
  }
}

# Apply
nft -f /etc/nftables.conf
systemctl enable --now nftables

Step 8: Verify from outside

# From a machine that is NOT in the backplane
# (-Pn skips host discovery, which would otherwise just report the host as down)
nmap -Pn -p- 203.0.113.10
# Result: all ports filtered, server does not exist

nmap -Pn -sU -p 51820 203.0.113.10
# Result: open|filtered, WireGuard answers nothing

# From node-b (inside the backplane):
ssh 10.200.0.1   # works instantly
psql -h 10.200.0.1 -U postgres   # works
curl http://10.200.0.1:8080/     # works

You just made two servers invisible. From the outside, nmap finds no responsive ports. SSH and PostgreSQL both work through the backplane, but from the internet these servers don't exist. This takes about 10 minutes and costs nothing. The key insight: you did not add a firewall on top of exposed services. You moved the services to an address that is only reachable through an authenticated, encrypted tunnel, then dropped everything on the public interface except the tunnel's UDP port. There is no service to attack on the public interface, nothing to brute force and nothing to scan. The one open port, UDP 51820, answers unauthenticated probes with silence.

3. Adding a Third Node (and Beyond)

Adding a third node to a two-node mesh is straightforward. Adding a twentieth node by hand is tedious. This section covers both — the manual process for small meshes, and a script for anything larger.

Adding node-c manually

# On node-c: generate keys
umask 077
wg genkey | tee /etc/wireguard/node-c-private.key | wg pubkey > /etc/wireguard/node-c-public.key

# /etc/wireguard/wg0.conf on node-c:
[Interface]
Address = 10.200.0.3/24
PrivateKey = <node-c-private-key>
ListenPort = 51820

[Peer]
# node-a
PublicKey = <node-a-public-key>
AllowedIPs = 10.200.0.1/32
Endpoint = 203.0.113.10:51820
PersistentKeepalive = 25

[Peer]
# node-b
PublicKey = <node-b-public-key>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25

# Add node-c to node-a's wg0.conf:
[Peer]
# node-c
PublicKey = <node-c-public-key>
AllowedIPs = 10.200.0.3/32
Endpoint = 203.0.113.30:51820
PersistentKeepalive = 25

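# Add node-c to node-b's wg0.conf (the identical [Peer] block)

# Apply on node-a and node-b without restarting the interface (no downtime)
wg syncconf wg0 <(wg-quick strip wg0)

# On node-c, bring the interface up for the first time
# (and install the Step 7 nftables ruleset there too)
systemctl enable --now wg-quick@wg0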

The kvm-clone trick

When deploying from a kldload golden image, the cleanest approach is to include WireGuard in the template with everything except the private key and the node-specific IP. On first boot, a cloud-init script generates the key pair, assigns the next IP from your address space, and distributes the public key to existing nodes via SSH.

# cloud-init user-data snippet (template nodes):
runcmd:
  - umask 077 && wg genkey | tee /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub
  - /usr/local/sbin/join-backplane.sh
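join-backplane.sh is not shown here, but its "assign the next IP" step can be sketched. The hypothetical helper below reads any text that contains addresses and prints the first free host address in a /24 prefix. Suitable input is wg show wg0 allowed-ips output from an existing node, plus that node's own ip -4 -o addr show dev wg0 line. The sketch assumes every node's address appears in that input. Two nodes joining at the same moment could still pick the same address, so serialize enrollment.

```shell
#!/bin/sh
# next_free_ip: print the first unused host address in <prefix>.1-254.
# Every a.b.c.d or a.b.c.d/len token on stdin that falls inside <prefix>
# counts as taken. Hypothetical helper.
# Usage: { wg show wg0 allowed-ips; ip -4 -o addr show dev wg0; } | next_free_ip 10.200.0
next_free_ip() {
  awk -v prefix="$1" '
    {
      for (f = 1; f <= NF; f++) {
        n = split($f, o, ".")
        if (n == 4 && (o[1] "." o[2] "." o[3]) == prefix) used[o[4] + 0] = 1
      }
    }
    END {
      for (i = 1; i <= 254; i++)
        if (!(i in used)) { print prefix "." i; exit 0 }
      exit 1   # prefix is full
    }
  '
}
```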

add-to-mesh.sh — for anything over 5 nodes

#!/bin/bash
# add-to-mesh.sh: add a new node to the wg0 mesh
# Usage: ./add-to-mesh.sh <new-node-hostname> <new-node-wg-ip> <new-node-public-ip>
# Run from inside the backplane: existing nodes only accept SSH on backplane addresses.

set -euo pipefail

NEW_HOST="$1"
NEW_WG_IP="$2"
NEW_ENDPOINT="$3"
EXISTING_NODES=(node-a node-b)   # add node hostnames here

# Generate keys on the new node
ssh "$NEW_HOST" 'umask 077; wg genkey | tee /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub'
NEW_PUBKEY=$(ssh "$NEW_HOST" 'cat /etc/wireguard/wg0.pub')

echo "New node public key: $NEW_PUBKEY"

# Add new node as peer to all existing nodes
for node in "${EXISTING_NODES[@]}"; do
  echo "Adding $NEW_HOST to $node..."
  ssh "$node" "wg set wg0 peer '$NEW_PUBKEY' allowed-ips '${NEW_WG_IP}/32' endpoint '${NEW_ENDPOINT}:51820' persistent-keepalive 25"
  # Persist the running config to wg0.conf (comments in the file are not kept)
  ssh "$node" "wg-quick save wg0"
done

# Build wg0.conf for the new node. wg-quick does not expand $(...) inside
# PrivateKey, so load the key from its file in a PostUp hook instead.
CONF="[Interface]
Address = ${NEW_WG_IP}/24
ListenPort = 51820
PostUp = wg set %i private-key /etc/wireguard/wg0.key
"

for node in "${EXISTING_NODES[@]}"; do
  NODE_PUBKEY=$(ssh "$node" 'wg show wg0 public-key')
  # wg show does not print interface addresses; ask iproute2 instead
  NODE_WG_IP=$(ssh "$node" "ip -4 -o addr show dev wg0 | awk '{print \$4}' | cut -d/ -f1")
  # Public IP as seen from the internet; wrong behind NAT, so override it there
  NODE_ENDPOINT=$(ssh "$node" 'curl -s ifconfig.me')
  CONF+="
[Peer]
# $node
PublicKey = $NODE_PUBKEY
AllowedIPs = ${NODE_WG_IP}/32
Endpoint = ${NODE_ENDPOINT}:51820
PersistentKeepalive = 25
"
done

# Send the config over stdin instead of interpolating it into a remote command line
printf '%s' "$CONF" | ssh "$NEW_HOST" 'umask 077; cat > /etc/wireguard/wg0.conf'
ssh "$NEW_HOST" "systemctl enable --now wg-quick@wg0"
echo "Done. $NEW_HOST is now in the mesh."

The manual threshold is about 5 to 7 nodes. Below that, copy-pasting peer blocks takes 5 minutes and is easy to reason about. Above that, you need a script or config management. Every node needs a peer block for every other node, so the total grows quadratically and hand edits become error-prone. The script above scales to around 20 nodes. Past that, look at a coordination service (Consul, etcd) or kldload's built-in fleet enrollment, which handles key distribution automatically. The sweet spot for the manual approach is the two-to-five node setup that most teams actually run: a couple of app servers, a database and a monitoring host. That mesh fits in your head and in a handful of config files.
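The arithmetic behind that threshold is simple. Each node carries one peer block for every other node on every plane. An n-node mesh therefore has n(n-1) peer blocks and n(n-1)/2 tunnels per plane. The small helper below is hypothetical and does the multiplication for illustration.

```shell
#!/bin/sh
# mesh_counts: peer-block and tunnel counts for a full mesh.
# Args: <nodes> [planes, default 1]. Hypothetical helper for illustration.
mesh_counts() {
  n="$1"; planes="${2:-1}"
  per_node=$((n - 1))                      # peer blocks in each node's file, per plane
  blocks=$((n * (n - 1) * planes))         # peer blocks across all files
  pairs=$((n * (n - 1) / 2 * planes))      # distinct tunnels (one per node pair per plane)
  echo "nodes=$n planes=$planes per_node_per_plane=$per_node peer_blocks=$blocks tunnels=$pairs"
}
```

A five-node, four-plane mesh already has 80 peer blocks. At twenty nodes, the total is 1,520.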

4. Multiple Planes (Traffic Isolation)

One WireGuard interface carries everything. That works at small scale. At production scale, it creates two problems: performance (high-bandwidth replication traffic competing with SSH makes interactive sessions feel degraded) and security (a compromised management plane means a compromised everything plane).

The solution is multiple planes — separate WireGuard interfaces, separate key pairs, separate address spaces, separate firewall rules. Traffic is isolated at the network layer, not the application layer.

The four-plane pattern

Interface	Subnet	Port	Traffic
wg0	10.200.0.0/24	51820	Enrollment / bootstrapping new nodes
wg1	10.201.0.0/24	51821	Management — SSH, admin APIs, config push
wg2	10.202.0.0/24	51822	Monitoring — Prometheus scrapes, Grafana, alerting
wg3	10.203.0.0/24	51823	Storage — ZFS replication, database sync, bulk data

Complete four-plane config for node-a (three-node cluster)

# /etc/wireguard/wg0.conf — enrollment plane
[Interface]
Address = 10.200.0.1/24
PrivateKey = <node-a-wg0-private>
ListenPort = 51820

[Peer]
# node-b
PublicKey = <node-b-wg0-public>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820
PersistentKeepalive = 25

[Peer]
# node-c
PublicKey = <node-c-wg0-public>
AllowedIPs = 10.200.0.3/32
Endpoint = 203.0.113.30:51820
PersistentKeepalive = 25

# /etc/wireguard/wg1.conf — management plane (SSH lives here)
[Interface]
Address = 10.201.0.1/24
PrivateKey = <node-a-wg1-private>   # DIFFERENT key pair from wg0
ListenPort = 51821

[Peer]
PublicKey = <node-b-wg1-public>
AllowedIPs = 10.201.0.2/32
Endpoint = 203.0.113.20:51821
PersistentKeepalive = 25

[Peer]
PublicKey = <node-c-wg1-public>
AllowedIPs = 10.201.0.3/32
Endpoint = 203.0.113.30:51821
PersistentKeepalive = 25

# /etc/wireguard/wg2.conf — monitoring plane
[Interface]
Address = 10.202.0.1/24
PrivateKey = <node-a-wg2-private>
ListenPort = 51822

[Peer]
PublicKey = <node-b-wg2-public>
AllowedIPs = 10.202.0.2/32
Endpoint = 203.0.113.20:51822
PersistentKeepalive = 25

[Peer]
PublicKey = <node-c-wg2-public>
AllowedIPs = 10.202.0.3/32
Endpoint = 203.0.113.30:51822
PersistentKeepalive = 25

# /etc/wireguard/wg3.conf — storage plane
[Interface]
Address = 10.203.0.1/24
PrivateKey = <node-a-wg3-private>
ListenPort = 51823

[Peer]
PublicKey = <node-b-wg3-public>
AllowedIPs = 10.203.0.2/32
Endpoint = 203.0.113.20:51823
PersistentKeepalive = 25

[Peer]
PublicKey = <node-c-wg3-public>
AllowedIPs = 10.203.0.3/32
Endpoint = 203.0.113.30:51823
PersistentKeepalive = 25

# Enable all four planes
for iface in wg0 wg1 wg2 wg3; do
  systemctl enable --now wg-quick@$iface
done

Binding services to specific planes

# /etc/ssh/sshd_config: SSH on the management plane, plus the storage plane
# for syncoid replication (section 7)
ListenAddress 10.201.0.1
ListenAddress 10.203.0.1

# Prometheus: monitoring plane only. --web.listen-address is a command-line flag,
# not a prometheus.yml setting: --web.listen-address=10.202.0.1:9090
# (see section 6 for full config)

# PostgreSQL: on management or storage plane depending on access pattern
# /etc/postgresql/*/main/postgresql.conf
listen_addresses = '10.201.0.1,127.0.0.1'

# node_exporter: monitoring plane only (systemd drop-in; full version in section 6)
ExecStart=
ExecStart=/usr/bin/node_exporter \
  --web.listen-address=10.202.0.1:9100

nftables per-plane access control

# /etc/nftables.conf: four-plane rules
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    iifname lo accept
    ct state established,related accept

    # Physical interface: only WireGuard UDP on all four ports
    iifname eth0 udp dport { 51820, 51821, 51822, 51823 } accept
    iifname eth0 drop

    # wg0 (enrollment) — limited: only SSH and wg-management traffic
    iifname wg0 tcp dport 22 accept
    iifname wg0 drop

    # wg1 (management): SSH, admin APIs, DNS, Grafana
    iifname wg1 tcp dport { 22, 53, 3000, 8080, 8443 } accept
    iifname wg1 udp dport 53 accept
    iifname wg1 drop

    # wg2 (monitoring): Prometheus, alertmanager, node_exporter, WireGuard exporter
    iifname wg2 tcp dport { 9090, 9093, 9100, 9586 } accept
    iifname wg2 drop

    # wg3 (storage) — ZFS replication SSH, database sync
    iifname wg3 tcp dport { 22, 5432, 3306 } accept
    iifname wg3 drop
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
  }

  chain output {
    type filter hook output priority 0; policy accept;
  }
}
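The per-plane rules above all follow one pattern: a few accepted ports, then a catch-all drop. The hypothetical generator below is a minimal sketch that renders that pattern from a small table, so adding a port to a plane becomes a one-line change. Paste its output into the input chain after the physical-interface rules. Then check the whole file with nft -c -f /etc/nftables.conf, which validates the ruleset without applying it.

```shell
#!/bin/sh
# plane_rules: turn a per-plane port table into nftables input-chain rules.
# Hypothetical helper. Input lines: <iface> <proto>:<port>[,<port>...] ...
#   wg1 tcp:22,53,3000,8080,8443 udp:53
# Each plane gets its accept rules followed by a catch-all drop.
plane_rules() {
  awk '
    NF >= 2 && $1 !~ /^#/ {
      iface = $1
      for (i = 2; i <= NF; i++) {
        if (split($i, pp, ":") != 2 || pp[2] == "") {
          print "bad spec: " $i > "/dev/stderr"
          continue
        }
        ports = pp[2]
        gsub(/,/, ", ", ports)
        printf "    iifname %s %s dport { %s } accept\n", iface, pp[1], ports
      }
      printf "    iifname %s drop\n", iface
    }
  '
}
```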

Four planes sounds like overengineering until you have had the incident. The scenario: Prometheus is scraping 15 nodes every 15 seconds. node_exporter on one host has a bug that makes it return 200MB of metrics, and Prometheus pulls that payload again on every scrape. The single wg0 interface saturates and SSH becomes unusable. You cannot diagnose the problem because the diagnostic tools use the same overloaded tunnel. With four planes, the flood stays on wg2, while wg1 (management) keeps its own tunnel, its own firewall rules and its own UDP flow. SSH stays responsive. You can pull up Grafana (which is also on wg1) and see exactly what is happening. The failure domain is isolated by design: four interfaces, four failure domains. Management keeps working even when data traffic is melting down. The planes do still share the physical link, so a flood large enough to fill the uplink itself also needs a rate limit.

5. IP Addressing and DNS

The backplane address scheme is infrastructure — it should be planned once and never changed. Changing a WireGuard IP means updating every peer config on every node. Plan the space upfront, leave room to grow, and document it.

Recommended address scheme

# Four planes, each with a /24
10.200.0.0/24   wg0 — enrollment
10.201.0.0/24   wg1 — management / SSH
10.202.0.0/24   wg2 — monitoring
10.203.0.0/24   wg3 — storage

# Node assignment (consistent last-octet across all planes)
10.200.0.1 / 10.201.0.1 / 10.202.0.1 / 10.203.0.1  →  node-a
10.200.0.2 / 10.201.0.2 / 10.202.0.2 / 10.203.0.2  →  node-b
10.200.0.3 / 10.201.0.3 / 10.202.0.3 / 10.203.0.3  →  node-c

# Multi-site: use the third octet for the site
10.200.0.0/24   site-a, wg0
10.200.1.0/24   site-b, wg0
10.201.0.0/24   site-a, wg1
10.201.1.0/24   site-b, wg1
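The scheme is regular enough to compute. The second octet is 200 plus the plane number, the third octet is the site and the fourth is the node. Each plane's listen port is 51820 plus the plane number. The helpers below are a hypothetical sketch of that mapping, useful in scripts that render configs.

```shell
#!/bin/sh
# Address scheme helpers (hypothetical): second octet = 200 + plane,
# third octet = site, fourth octet = node number; listen port = 51820 + plane.
backplane_addr() {
  plane="$1"; site="$2"; node="$3"
  if [ "$plane" -lt 0 ] || [ "$plane" -gt 3 ] || [ "$site" -lt 0 ] || [ "$site" -gt 255 ] ||
     [ "$node" -lt 1 ] || [ "$node" -gt 254 ]; then
    echo "out of range: plane=$plane site=$site node=$node" >&2
    return 1
  fi
  echo "10.$((200 + plane)).$site.$node"
}

plane_port() {
  echo $((51820 + $1))
}

# Example: node-b's four site-A addresses
#   for p in 0 1 2 3; do backplane_addr "$p" 0 2; done
```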

DNS for backplane hosts: Unbound on wg1

Run one Unbound instance on the management plane. All nodes use it as their resolver. You get human-readable hostnames for all backplane addresses.

# Install Unbound on node-a (the DNS server)
# CentOS/Rocky/RHEL
dnf install -y unbound

# Debian/Ubuntu
apt install -y unbound

# /etc/unbound/unbound.conf — backplane DNS server on node-a
server:
  interface: 10.201.0.1       # only listen on management plane
  interface: 127.0.0.1
  ip-freebind: yes            # lets Unbound bind 10.201.0.1 before wg1 is up at boot
  access-control: 10.201.0.0/24 allow   # only wg1 peers can query
  access-control: 127.0.0.1/32 allow

  # Local zone for backplane hostnames
  local-zone: "mgmt." static
  local-data: "node-a.mgmt.  IN A 10.201.0.1"
  local-data: "node-b.mgmt.  IN A 10.201.0.2"
  local-data: "node-c.mgmt.  IN A 10.201.0.3"

  local-zone: "mon." static
  local-data: "node-a.mon.   IN A 10.202.0.1"
  local-data: "node-b.mon.   IN A 10.202.0.2"
  local-data: "node-c.mon.   IN A 10.202.0.3"

  local-zone: "store." static
  local-data: "node-a.store. IN A 10.203.0.1"
  local-data: "node-b.store. IN A 10.203.0.2"
  local-data: "node-c.store. IN A 10.203.0.3"

# Forward public DNS to upstream (forward-zone is its own clause, not part of server:)
forward-zone:
  name: "."
  forward-addr: 1.1.1.1
  forward-addr: 8.8.8.8

systemctl enable --now unbound

# On all other nodes: point resolver at the backplane DNS server
# /etc/resolv.conf (or managed by systemd-resolved / NetworkManager)
nameserver 10.201.0.1
search mgmt. mon. store.
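Hand-writing one local-data line per node per zone is the same growing chore as peer blocks. The sketch below assumes a hypothetical node list with a hostname and a node number on each line. It prints the local-zone and local-data lines for the mgmt., mon. and store. zones, using the consistent last-octet scheme: 10.201, 10.202 and 10.203 for the three zones.

```shell
#!/bin/sh
# unbound_local_data: print Unbound local-zone/local-data lines for the
# mgmt., mon. and store. zones. Hypothetical helper.
# Input lines: <hostname> <node-number>. Arg: site number (default 0).
# Usage: unbound_local_data 0 < nodes.txt
unbound_local_data() {
  site="${1:-0}"
  awk -v site="$site" '
    NF >= 2 && $1 !~ /^#/ { host[++n] = $1; idx[n] = $2 }
    END {
      split("mgmt mon store", zone, " ")   # zone z lives on 10.(200+z).x.x
      for (z = 1; z <= 3; z++) {
        printf "  local-zone: \"%s.\" static\n", zone[z]
        for (i = 1; i <= n; i++)
          printf "  local-data: \"%s.%s. IN A 10.%d.%s.%s\"\n", host[i], zone[z], 200 + z, site, idx[i]
      }
    }
  '
}
```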

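Dynamic registration with a PostUp hook

When a node brings up its WireGuard interface, a PostUp hook registers its address with the DNS server automatically. This is useful in dynamic environments where IPs are assigned at boot. One catch: Unbound is a resolver and does not accept RFC 2136 dynamic updates, so nsupdate against it is refused. The hook below uses Unbound's remote-control channel (unbound-control) instead. Records added this way live in memory and disappear when Unbound restarts, so permanent hosts still belong in unbound.conf.

# /etc/wireguard/wg1.conf: on any node that needs dynamic DNS registration
[Interface]
Address = 10.201.0.5/24
PrivateKey = <private-key>
ListenPort = 51821
PostUp = /usr/local/sbin/register-dns.sh wg1 10.201.0.5 $(hostname -s).mgmt.
PreDown = /usr/local/sbin/deregister-dns.sh $(hostname -s).mgmt.

#!/bin/bash
# /usr/local/sbin/register-dns.sh
# Usage: register-dns.sh <iface> <ip> <fqdn>
# Assumes the DNS server enables remote control on the management plane:
#   remote-control:
#     control-enable: yes
#     control-interface: 10.201.0.1
# that the control certificates from unbound-control-setup are installed on
# this node, and that tcp/8953 is allowed on wg1.
set -euo pipefail
IFACE="$1"; IP="$2"; FQDN="$3"
CTL=(unbound-control -s 10.201.0.1@8953)
"${CTL[@]}" local_data_remove "$FQDN"
"${CTL[@]}" local_data "$FQDN 300 IN A $IP"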

The number one complaint after building a WireGuard mesh is: "I cannot remember which IP is which server." DNS fixes this. Run one Unbound instance on the management plane and point all nodes at it. Now you SSH to node-b.mgmt instead of 10.201.0.2, and you point Prometheus at node-c.mon:9100 instead of 10.202.0.3:9100. The zone names are short by design: .mgmt, .mon and .store are quick to type and tell you which plane an address is on. With the PostUp hook, nodes register themselves when the tunnel comes up. A new node is resolvable by name as soon as its wg1 interface is up, with no manual DNS edits.

6. Monitoring Through the Backplane

Prometheus scrapes over wg2. Grafana is accessible over wg1. node_exporter binds only to the monitoring plane address. Apart from Grafana's web UI on wg1, nothing monitoring-related is visible on the public interface or on the management plane.

node_exporter on every node — bound to wg2 only

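# /etc/systemd/system/node_exporter.service.d/backplane.conf
# (10.202.0.1 is node-a; use each node's own wg2 address)
[Unit]
# the listen address only exists once wg2 is up
After=wg-quick@wg2.service
Wants=wg-quick@wg2.service

[Service]
ExecStart=
ExecStart=/usr/bin/node_exporter \
  --web.listen-address=10.202.0.1:9100 \
  --collector.systemd \
  --collector.processes \
  --collector.textfile.directory=/var/lib/node_exporter/textfile_collector \
  --no-collector.wifi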

# Apply
systemctl daemon-reload
systemctl restart node_exporter

# Verify — should NOT appear on public or management interfaces
ss -tlnp | grep 9100
# tcp LISTEN 0 128 10.202.0.1:9100  *:*  users:(("node_exporter",...))

Prometheus scrape config using backplane addresses

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Prometheus itself bound to monitoring plane
# Start with: --web.listen-address=10.202.0.1:9090

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'node-a.mon:9100'
          - 'node-b.mon:9100'
          - 'node-c.mon:9100'
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^.]+)\.mon:.*'
        target_label: instance
        replacement: '$1'

  - job_name: 'wireguard'
    static_configs:
      - targets:
          - 'node-a.mon:9586'   # prometheus-wireguard-exporter
          - 'node-b.mon:9586'
          - 'node-c.mon:9586'

  - job_name: 'prometheus'
    static_configs:
      - targets: ['node-a.mon:9090']
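The relabel rule in the node job turns node-b.mon:9100 into an instance label of node-b. Prometheus anchors relabel regexes at both ends, and the sed sketch below mirrors that so you can test a pattern from the shell. When the regex does not match, the rule leaves instance alone, so it keeps its default value, the target address. sed likewise returns non-matching input unchanged.

```shell
#!/bin/sh
# instance_label: apply the node job's relabel regex ([^.]+)\.mon:.* -> $1
# to a target address, anchored at both ends like Prometheus does.
instance_label() {
  printf '%s\n' "$1" | sed -E 's/^([^.]+)\.mon:.*$/\1/'
}
```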

WireGuard handshake monitoring

# Install prometheus-wireguard-exporter
# Available at: https://github.com/MindFlavor/prometheus_wireguard_exporter

# /etc/systemd/system/prometheus-wireguard-exporter.service
[Unit]
Description=Prometheus WireGuard Exporter
After=network.target wg-quick@wg2.service

[Service]
# -l sets the listen address and -p the port (-a toggles sudo; it is not an address)
ExecStart=/usr/local/bin/prometheus_wireguard_exporter \
  -l 10.202.0.1 -p 9586 \
  -n /etc/wireguard/wg0.conf \
  -n /etc/wireguard/wg1.conf \
  -n /etc/wireguard/wg2.conf \
  -n /etc/wireguard/wg3.conf
Restart=always

[Install]
WantedBy=multi-user.target

# Prometheus alert rule: peer handshake too old
groups:
  - name: wireguard
    rules:
      - alert: WireGuardPeerHandshakeStale
        expr: time() - wireguard_latest_handshake_seconds > 180
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "WireGuard peer handshake stale on {{ $labels.instance }}"
          description: "Peer {{ $labels.public_key }} last handshake {{ $value | humanizeDuration }} ago"

Grafana on the management plane

# /etc/grafana/grafana.ini
[server]
http_addr = 10.201.0.1   # management plane only
http_port = 3000

# Datasource: connect Grafana to Prometheus over monitoring plane
# URL: http://10.202.0.1:9090

Monitoring traffic belongs on wg2, not wg1. This is not theoretical. Prometheus scraping 20 nodes every 15 seconds generates a constant stream of small connections. If that traffic shares wg1 with SSH, interactive packets queue behind scrape traffic in the same tunnel, which is enough to make interactive work feel slightly off. Large metric payloads make it worse: node_exporter with all collectors enabled can return 300KB or more. Separate planes, separate experience: wg2 can be as noisy as it wants while wg1 stays quiet and responsive. This is why the four-plane pattern exists.

7. ZFS Replication Through the Backplane

ZFS replication (via syncoid) is the highest-bandwidth traffic on your backplane. A daily snapshot delta can be anywhere from a few gigabytes to tens of gigabytes. This traffic belongs on wg3 (storage plane) — isolated from management, monitoring, and enrollment traffic.

SSH keys for replication — dedicated key pair

# On node-a (the source): generate a dedicated replication key
ssh-keygen -t ed25519 -f /etc/zfs/replication.key -N '' -C 'zfs-replication@node-a'

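# On node-b (the target): install the public key, pinned to node-a's storage-plane address
cat >> /root/.ssh/authorized_keys <<'EOF'
from="10.203.0.1",restrict <replication-pubkey-here>
EOF

# from= accepts this key only from 10.203.0.1 (node-a on wg3); restrict disables
# port, agent and X11 forwarding and PTY allocation.
# A forced command="zfs receive ..." is not used here: syncoid runs several commands
# on the target (zfs list, zfs get and others), and a forced command would replace them all.
# For tighter containment, run syncoid as a non-root user with zfs allow delegation.
# sshd on node-b must also listen on 10.203.0.2 (ListenAddress).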

syncoid configuration — storage plane only

# Test manually first
syncoid \
  --sshkey /etc/zfs/replication.key \
  --sshoption "BindAddress=10.203.0.1" \
  rpool/data root@10.203.0.2:rpool/replicas/node-a

# Targeting 10.203.0.2 is what keeps replication on wg3: the kernel routes it via
# wg3's 10.203.0.0/24 route. BindAddress=10.203.0.1 additionally pins the source
# address to node-a's storage-plane IP, which is the client address node-b sees.
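You can check directly whether replication really rides on wg3. ip route get 10.203.0.2 reports the interface and source address the kernel would use. The hypothetical helper below extracts those two fields, and on node-a you would expect wg3 and 10.203.0.1. It assumes the usual "dev <iface> src <addr>" tokens in ip route get output.

```shell
#!/bin/sh
# route_dev_src: read `ip route get <addr>` output on stdin and print
# "<dev> <src>". Fails if no route (no dev token) was reported.
# Usage: ip route get 10.203.0.2 | route_dev_src   # expect: wg3 10.203.0.1
route_dev_src() {
  awk '
    { for (i = 1; i < NF; i++) { if ($i == "dev") dev = $(i + 1); if ($i == "src") src = $(i + 1) } }
    END { if (dev == "") exit 1; print dev, src }
  '
}
```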

Systemd timer for automated replication

# /etc/systemd/system/zfs-replication.service
[Unit]
Description=ZFS syncoid replication to node-b
After=network.target wg-quick@wg3.service
Wants=wg-quick@wg3.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid \
  --sshkey /etc/zfs/replication.key \
  --sshoption "BindAddress=10.203.0.1" \
  --no-privilege-elevation \
  --compress lz4 \
  rpool/data root@10.203.0.2:rpool/replicas/node-a
StandardOutput=journal
StandardError=journal

# /etc/systemd/system/zfs-replication.timer
[Unit]
Description=Run ZFS replication every 4 hours

[Timer]
OnBootSec=15min
OnUnitActiveSec=4h
RandomizedDelaySec=5min

[Install]
WantedBy=timers.target

# Enable
systemctl daemon-reload
systemctl enable --now zfs-replication.timer

Monitor replication lag via Prometheus

#!/bin/bash
# /usr/local/sbin/zfs-replication-check.sh
# Run on the TARGET (node-b): the newest snapshot on the replica is the last one
# syncoid delivered, and a received snapshot keeps its source creation time.
# Outputs a metric: zfs_replication_lag_seconds{dataset="rpool/replicas/node-a"}
set -euo pipefail

DATASET="${1:-rpool/replicas/node-a}"
# -p prints creation as Unix seconds; -d 1 limits to this dataset's own snapshots
SNAP_TIME=$(zfs list -Hp -t snapshot -d 1 -o creation -s creation "$DATASET" | tail -n 1)
NOW=$(date +%s)
LAG=$((NOW - ${SNAP_TIME:-0}))   # no snapshot at all reports a huge lag

echo "# HELP zfs_replication_lag_seconds Age of the newest snapshot on the replica"
echo "# TYPE zfs_replication_lag_seconds gauge"
echo "zfs_replication_lag_seconds{dataset=\"$DATASET\"} $LAG"

# Expose via the node_exporter textfile collector
# (node_exporter needs --collector.textfile.directory pointing at this directory)

# /etc/systemd/system/zfs-replication-metrics.service
[Service]
Type=oneshot
ExecStart=/bin/bash -c '/usr/local/sbin/zfs-replication-check.sh > /var/lib/node_exporter/textfile_collector/zfs_replication.prom.tmp && mv /var/lib/node_exporter/textfile_collector/zfs_replication.prom.tmp /var/lib/node_exporter/textfile_collector/zfs_replication.prom'

# /etc/systemd/system/zfs-replication-metrics.timer
[Timer]
OnBootSec=1min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target

ZFS replication is the highest-bandwidth traffic on your backplane by a wide margin. A snapshot delta can run to tens of gigabytes, or more if you have a busy database. Running it over the same WireGuard interface as SSH and Prometheus is a bad time. The interface saturates, SSH sessions stall, and replication fails when timeouts expire before the transfer completes. wg3 (storage plane) exists specifically for this use case. It is a separate tunnel with its own kernel interface, firewall rules and UDP flow, so a bulk transfer on wg3 does not share a tunnel with your SSH session on wg1. The planes still share the physical link, so if replication can fill it, cap it with syncoid's --source-bwlimit or --target-bwlimit options. The routing detail that matters: syncoid targets 10.203.0.2, so the kernel sends the stream via wg3. The BindAddress option additionally pins the source address to node-a's storage-plane IP.

8. Backplane Security Hardening

A properly built backplane has a small, well-defined attack surface. This section covers the hardening steps that take it from "pretty secure" to "as secure as you can reasonably make it."

Pre-shared keys on every peer (post-quantum protection)

# Generate a PSK for each peer pair (umask keeps the key file private)
umask 077
wg genpsk > /etc/wireguard/psk-node-a-node-b.key

# Add the same PSK to the matching [Peer] block on both nodes
[Peer]
PublicKey = <pubkey>
PresharedKey = <psk-value>
AllowedIPs = 10.200.0.2/32
Endpoint = 203.0.113.20:51820

A PSK is a symmetric secret shared between two peers and mixed into the handshake on top of the asymmetric key exchange. If quantum computers ever break Curve25519 (WireGuard's Diffie-Hellman function), the PSK layer remains secure as long as the PSK itself hasn't been compromised. The PSK adds no latency and negligible CPU overhead. Its cost is operational: one more secret per peer pair to distribute and rotate.
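PSKs are per peer pair, so a plane with n nodes needs n(n-1)/2 of them. The sketch below enumerates those pairs. Its file naming, which adds the plane to the psk-node-a-node-b.key example, is a hypothetical convention. Each file must end up on both nodes of its pair.

```shell
#!/bin/sh
# psk_pairs: list one PSK file per unordered pair of nodes on a plane.
# Hypothetical naming: /etc/wireguard/psk-<plane>-<node1>-<node2>.key
# Example (on a trusted admin host):
#   umask 077
#   for f in $(psk_pairs wg1 node-a node-b node-c); do wg genpsk > "$f"; done
psk_pairs() {
  plane="$1"; shift
  i=0
  for a in "$@"; do
    i=$((i + 1)); j=0
    for b in "$@"; do
      j=$((j + 1))
      # only pairs with b after a, so each pair appears once
      if [ "$j" -gt "$i" ]; then
        echo "/etc/wireguard/psk-${plane}-${a}-${b}.key"
      fi
    done
  done
}
```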

Key rotation strategy

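#!/bin/bash
# rotate-keys.sh: rotate this node's WireGuard key for one interface (e.g. quarterly)
# Run on each node, then distribute the new public key to its peers.
# Run it from the console or a session on a different plane: once peers switch to
# the new public key, the tunnel on $IFACE is down until this node switches too.

set -euo pipefail
IFACE="${1:-wg0}"
KEYFILE="/etc/wireguard/${IFACE}.key"
PUBFILE="/etc/wireguard/${IFACE}.pub"

# Generate new key pair
umask 077
wg genkey | tee "${KEYFILE}.new" | wg pubkey > "${PUBFILE}.new"

NEW_PUB=$(cat "${PUBFILE}.new")
echo "New public key for ${IFACE}: $NEW_PUB"
echo "Update this node's PublicKey on all peers, then press Enter (Ctrl-C to abort)."
read -r

# Atomically replace the key files
mv "${KEYFILE}.new" "${KEYFILE}"
mv "${PUBFILE}.new" "${PUBFILE}"

# Apply the new private key in place (no interface restart), then persist it.
# Replacing the key file alone does nothing when ${IFACE}.conf has an inline
# PrivateKey; wg-quick save writes the running config, including PrivateKey,
# back to /etc/wireguard/${IFACE}.conf (comments in that file are not kept).
wg set "$IFACE" private-key "$KEYFILE"
wg-quick save "$IFACE"
echo "Keys rotated. Verify handshakes: wg show $IFACE"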
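Each peer must then swap the rotated node's old public key for the new one. The hypothetical helper below prints the commands for review rather than running them. Adding the new peer with the same AllowedIPs first moves those addresses to it. The old entry is then removed, and wg-quick save persists the result. If the pair uses a PSK, the new peer entry also needs preshared-key <file>.

```shell
#!/bin/sh
# peer_update_cmds: print the commands a peer runs after a node rotates its key.
# Hypothetical helper; review the output, then run it on the peer.
# Args: <iface> <old-pubkey> <new-pubkey> <allowed-ips> <endpoint>
peer_update_cmds() {
  iface="$1"; old="$2"; new="$3"; allowed="$4"; endpoint="$5"
  # Adding the new peer first takes over the AllowedIPs from the old entry
  printf 'wg set %s peer %s allowed-ips %s endpoint %s persistent-keepalive 25\n' \
    "$iface" "$new" "$allowed" "$endpoint"
  printf 'wg set %s peer %s remove\n' "$iface" "$old"
  printf 'wg-quick save %s\n' "$iface"
}
```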

Detecting unauthorized IPs on backplane interfaces

# nftables: drop traffic from IPs not in our AllowedIPs list
# This is defense-in-depth — WireGuard already enforces AllowedIPs at the crypto level
# This nftables rule catches any misconfiguration

table inet backplane-guard {
  set wg1-allowed-peers {
    type ipv4_addr
    elements = { 10.201.0.1, 10.201.0.2, 10.201.0.3 }
  }

  chain wg1-input {
    type filter hook input priority -10; policy accept;
    iifname wg1 ip saddr != @wg1-allowed-peers log prefix "BACKPLANE-UNEXPECTED: " drop
  }
}

# Monitor for unexpected backplane IPs in the kernel log
journalctl -k -f | grep BACKPLANE-UNEXPECTED

fail2ban: not needed — here is why

fail2ban scans logs for failed authentication attempts and bans the source IP. On a properly configured backplane, there are no authentication attempts to fail. WireGuard does not respond to unauthenticated traffic. There is no banner, no challenge, no error message. The scanner sends a UDP datagram to port 51820. WireGuard checks the cryptographic handshake. If the peer is unknown, the datagram is silently discarded. No log entry. Nothing to scan. fail2ban has nothing to do.

On a properly configured backplane, fail2ban has nothing to act on. WireGuard does not respond to unauthenticated traffic, so there is nothing to brute force. The realistic attack reduces to one thing: steal a private key. A stolen key gives access to one plane, because keys are per-plane and nftables isolates the planes. Pre-shared keys add a second layer that would hold even if a future quantum computer broke Curve25519. The full security model, in priority order: crypto first (WireGuard + PSK), firewall second (nftables drops everything except WireGuard UDP on the physical interface), monitoring third (stale handshake alerts). Intrusion detection systems, WAFs and fail2ban are built to protect services exposed on the public network. On a WireGuard backplane, that exposure does not exist.

9. Multi-Site Backplanes

Two sites, each with its own local mesh, connected by a site-to-site WireGuard tunnel with BGP routing between them. Each site has its own subnets. Each site's management plane can reach the other's. Failure of the site-to-site link does not affect intra-site connectivity.

Site subnet design

# Site A — datacenter
10.200.0.0/24   site-a wg0 (enrollment)
10.201.0.0/24   site-a wg1 (management)
10.202.0.0/24   site-a wg2 (monitoring)
10.203.0.0/24   site-a wg3 (storage)

# Site B — cloud / secondary site
10.200.1.0/24   site-b wg0
10.201.1.0/24   site-b wg1
10.202.1.0/24   site-b wg2
10.203.1.0/24   site-b wg3

# Site-to-site tunnel: dedicated interface
10.254.0.0/30   site-to-site link (wg-site)
10.254.0.1      site-a gateway
10.254.0.2      site-b gateway
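The plan encodes the plane in the second octet (200 plus the plane number) and the site in the third octet. Here is a minimal sketch of that arithmetic. It assumes any further sites continue the pattern (a hypothetical third site, index 2, would use 10.20x.2.0/24) and that a host keeps the same last octet on every plane. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the numbering scheme above:
#   second octet = 200 + plane number (wg0..wg3), third octet = site index (A=0, B=1)
plane_subnet() {    # plane_subnet <plane 0-3> <site index>
  printf '10.%d.%d.0/24\n' $((200 + $1)) "$2"
}

plane_address() {   # plane_address <plane 0-3> <site index> <host octet>
  printf '10.%d.%d.%d\n' $((200 + $1)) "$2" "$3"
}

site_subnets() {    # all four plane subnets of one site, comma-separated
  local site="$1" plane out=""
  for plane in 0 1 2 3; do
    out="${out}${out:+, }$(plane_subnet "$plane" "$site")"
  done
  printf '%s\n' "$out"
}
```

`site_subnets 1` prints the four site-b prefixes that follow 10.254.0.0/30 in the site-a gateway's wg-site AllowedIPs line.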

Site-to-site WireGuard tunnel

# /etc/wireguard/wg-site.conf on site-a gateway node
[Interface]
Address = 10.254.0.1/30
PrivateKey = <site-a-gateway-private>
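ListenPort = 51830
# Let FRR install the remote-site routes (set the same on site-b). Without
# Table = off, wg-quick adds a static route for every AllowedIPs prefix, and
# those routes take precedence over the BGP-learned ones
Table = off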

[Peer]
# site-b gateway
PublicKey = <site-b-gateway-public>
AllowedIPs = 10.254.0.0/30, 10.200.1.0/24, 10.201.1.0/24, 10.202.1.0/24, 10.203.1.0/24
Endpoint = <site-b-public-ip>:51830
PersistentKeepalive = 10

BGP between sites with FRRouting

# /etc/frr/frr.conf on site-a gateway
frr defaults traditional
hostname site-a-gw

router bgp 65001
  bgp router-id 10.254.0.1
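  no bgp default ipv4-unicast
  ! "frr defaults traditional" enables RFC 8212 checks: an eBGP session with no
  ! inbound/outbound route-maps sends and accepts no prefixes. Disable the check
  ! (or attach route-maps) on both gateways.
  no bgp ebgp-requires-policy

  neighbor 10.254.0.2 remote-as 65002
  neighbor 10.254.0.2 description "site-b gateway"
  neighbor 10.254.0.2 timers 10 30
  neighbor 10.254.0.2 timers connect 10

  address-family ipv4 unicast
    network 10.200.0.0/24
    network 10.201.0.0/24
    network 10.202.0.0/24
    network 10.203.0.0/24
    neighbor 10.254.0.2 activate
    neighbor 10.254.0.2 soft-reconfiguration inbound
  exit-address-family

# /etc/frr/frr.conf on site-b gateway
frr defaults traditional
hostname site-b-gw

router bgp 65002
  bgp router-id 10.254.0.2
  no bgp default ipv4-unicast
  no bgp ebgp-requires-policy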

  neighbor 10.254.0.1 remote-as 65001
  neighbor 10.254.0.1 description "site-a gateway"
  neighbor 10.254.0.1 timers 10 30
  neighbor 10.254.0.1 timers connect 10

  address-family ipv4 unicast
    network 10.200.1.0/24
    network 10.201.1.0/24
    network 10.202.1.0/24
    network 10.203.1.0/24
    neighbor 10.254.0.1 activate
    neighbor 10.254.0.1 soft-reconfiguration inbound
  exit-address-family
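The two stanzas are mirror images: swap the AS numbers and addresses, then originate the other site's four /24s. Here is a minimal sketch that renders either side from six arguments, assuming the site numbering above. `render_bgp` is hypothetical, and it emits only the `router bgp` stanza, not a complete frr.conf. It also emits `no bgp ebgp-requires-policy`, because under the traditional defaults FRR exchanges no prefixes on an eBGP session that has no route-maps.

```shell
#!/usr/bin/env bash
# Hypothetical generator for one gateway's "router bgp" stanza, following the
# site numbering above (plane N, site S -> 10.(200+N).S.0/24).
# Usage: render_bgp <local-as> <local-ip> <remote-as> <remote-ip> <local-site> <peer-description>
render_bgp() {
  local las="$1" lip="$2" ras="$3" rip="$4" site="$5" desc="$6" plane
  printf 'router bgp %s\n' "$las"
  printf '  bgp router-id %s\n' "$lip"
  printf '  no bgp default ipv4-unicast\n'
  # With "frr defaults traditional", eBGP sessions without route-maps exchange
  # no prefixes (RFC 8212), so the generated stanza turns that check off.
  printf '  no bgp ebgp-requires-policy\n\n'
  printf '  neighbor %s remote-as %s\n' "$rip" "$ras"
  printf '  neighbor %s description "%s"\n' "$rip" "$desc"
  printf '  neighbor %s timers 10 30\n' "$rip"
  printf '  neighbor %s timers connect 10\n\n' "$rip"
  printf '  address-family ipv4 unicast\n'
  for plane in 0 1 2 3; do
    # Originate this site's /24 for each plane
    printf '    network 10.%d.%d.0/24\n' $((200 + plane)) "$site"
  done
  printf '    neighbor %s activate\n' "$rip"
  printf '    neighbor %s soft-reconfiguration inbound\n' "$rip"
  printf '  exit-address-family\n'
}
```

`render_bgp 65001 10.254.0.1 65002 10.254.0.2 0 "site-b gateway"` gives the site-a side, and `render_bgp 65002 10.254.0.2 65001 10.254.0.1 1 "site-a gateway"` gives site-b. Generating both from one script keeps the AS numbers and timers from drifting apart.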

BFD for fast failure detection

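# /etc/frr/frr.conf: add BFD to the BGP neighbor on both gateways
# (site-a shown; on site-b use router bgp 65002 and peer 10.254.0.1)
# Needs the BFD daemon: set bfdd=yes in /etc/frr/daemons and restart FRR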
router bgp 65001
  neighbor 10.254.0.2 bfd

bfd
  peer 10.254.0.2
    detect-multiplier 3
    receive-interval 300
    transmit-interval 300
  !

# BFD detects link failure in 300ms * 3 = 900ms
# Without BFD: BGP hold timer = 30s — you wait 30 seconds to detect failure
# With BFD: failure detected in under 1 second, BGP reconverges immediately
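The 900 ms figure assumes both gateways use the same timers. Per RFC 5880 (section 6.8.4, asynchronous mode), each side's detection time is the remote detect-multiplier times the larger of its own receive-interval and the remote's transmit-interval. Here is a minimal sketch of that arithmetic for an established session. `bfd_detect_ms` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Local detection time per RFC 5880 section 6.8.4 (asynchronous mode, session Up):
#   remote detect-multiplier x max(local receive-interval, remote transmit-interval)
bfd_detect_ms() {   # bfd_detect_ms <remote-multiplier> <local-rx-ms> <remote-tx-ms>
  local mult="$1" rx="$2" tx="$3" interval
  interval=$(( rx > tx ? rx : tx ))
  echo $(( mult * interval ))
}
```

`bfd_detect_ms 3 300 300` gives 900, matching the comment above. If site-b were left at `transmit-interval 1000`, `bfd_detect_ms 3 300 1000` gives 3000, so site-a would need three seconds to notice a dead link. Keep the timers identical on both gateways.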

Verify multi-site routing

# On site-a, verify you can reach site-b subnets
ping 10.201.1.1   # site-b management plane
ssh root@10.201.1.1   # SSH to site-b node-a over management plane

# Check BGP learned routes
vtysh -c "show ip bgp"
# Should show site-b subnets learned from 10.254.0.2

# Check BFD status
vtysh -c "show bfd peers"
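Reading `show ip bgp` by eye works for one peer, but a scripted check is easier to run from cron or a monitoring job. Here is a minimal sketch with two assumptions. First, it relies on FRR's usual table layout, where a valid, best route line starts with `*>` followed by the prefix. Second, it expects one path per prefix, as in this two-site setup; a best path printed on a continuation line without its prefix would be missed. `missing_bgp_prefixes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: print each expected prefix that is not a valid, best ("*>")
# route in "show ip bgp" output read from stdin. Exit status 1 if any are missing.
missing_bgp_prefixes() {   # missing_bgp_prefixes <prefix> [<prefix> ...]
  awk -v want="$*" '
    BEGIN { n = split(want, prefixes, " ") }
    /\*>/ { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END {
      missing = 0
      for (j = 1; j <= n; j++) if (!(prefixes[j] in seen)) { print prefixes[j]; missing = 1 }
      exit missing
    }
  '
}
# Example, on the site-a gateway:
#   vtysh -c "show ip bgp" | missing_bgp_prefixes 10.200.1.0/24 10.201.1.0/24 10.202.1.0/24 10.203.1.0/24
```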

Multi-site backplanes are where the full kldload networking stack comes together. WireGuard provides encrypted transport between sites: the wg-site interface carries the logical backplane across the physical internet path. BGP exchanges routes so each site's gateway knows about the other's subnets. BFD detects link failures in under a second instead of waiting out the 30-second BGP hold timer. VXLAN (optional) extends Layer 2 across sites for live VM migration, so a VM keeps its IP as it crosses the WireGuard tunnel. Each layer has one job. WireGuard encrypts. BGP routes. BFD detects. VXLAN extends. When the site-to-site link fails, BFD takes the BGP session down within a second, the remote routes are withdrawn, and each site's local mesh keeps running. When the link returns or a rebooted gateway comes back up, BGP re-establishes the session (retrying every 10 seconds with the connect timer above), re-advertises the routes, and traffic flows again. This is what "resilient infrastructure" looks like at the network layer.

10. The Dark Mode Pattern

Dark mode is the maximum-stealth configuration: no public services, no DNS records, no discoverable ports, no response to unauthenticated traffic of any kind. The only visible thing on the internet is silence where your servers should be.

What dark mode looks like to a scanner

# An attacker runs a comprehensive scan
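nmap -sS -sU -p- --open 203.0.113.10
# Result: "Host seems down" (no host-discovery probe gets a reply)
# With -Pn: nothing answers. No TCP port is open; UDP ports, WireGuard's included,
# can only be reported open|filtered, the state nmap gives any silently dropped probe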

masscan 203.0.113.10 -p0-65535
# Result: 0 open ports

# They try ICMP
ping -c4 203.0.113.10
# Result: 0 packets received (ICMP dropped at nftables)

# They try IPv6
nmap -6 2001:db8::1
# Result: 0 open ports

# They check DNS
dig @8.8.8.8 node-a.yoursite.com
# Result: NXDOMAIN (no DNS records, no PTR records)

The complete dark mode nftables config

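# /etc/nftables.conf: maximum stealth
# Start from an empty ruleset so reloading this file never duplicates rules
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    # loopback
    iifname lo accept

    # established/related
    ct state established,related accept

    # IPv6 neighbor discovery: without it neighbor resolution fails in both
    # directions and IPv6 traffic (WireGuard over IPv6 included) stops.
    # Hop limit 255 means on-link senders only, so off-link scanners still get silence.
    icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert } ip6 hoplimit 255 accept

    # WireGuard UDP: the only open port
    # (site gateways also accept the wg-site port, 51830)
    iifname eth0 udp dport { 51820, 51821, 51822, 51823 } accept

    # All four WG planes: allow full traffic within the backplane
    # (site gateways also accept wg-site here for BGP on TCP 179 and BFD on UDP 3784,
    # and need forward rules between wg-site and the local planes)
    iifname { wg0, wg1, wg2, wg3 } accept

    # Drop everything else: no ICMP, no response, nothing
    # Don't use 'reject': that gives the scanner confirmation
    drop
  }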

  chain forward { type filter hook forward priority 0; policy drop; }
  chain output  { type filter hook output  priority 0; policy accept; }
}

# Apply and verify
nft -f /etc/nftables.conf
# Test: nmap from outside → 0 open ports
# Test: ping from outside → 0 responses
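To make the "0 open ports" test scriptable, run nmap from a machine outside the backplane with grepable output, then count only the ports in the definitive `open` state. Here is a minimal sketch. It assumes nmap's `-oG -` format: tab-separated fields, including a `Ports:` field of comma-separated `port/state/proto/...` entries. `count_open_ports` is hypothetical. It deliberately does not count `open|filtered`, because against a host that drops everything, every probed UDP port (WireGuard's included) reports that state.

```shell
#!/usr/bin/env bash
# Hypothetical external check: count ports that nmap reports as definitively "open"
# in grepable output (nmap ... -oG -) read on stdin. "open|filtered" is not counted.
count_open_ports() {
  awk -F '\t' '
    {
      for (f = 1; f <= NF; f++) {
        if ($f !~ /^Ports: /) continue
        # Entries look like 22/open/tcp//ssh/// and are separated by ", "
        n = split(substr($f, 8), entries, ", ")
        for (i = 1; i <= n; i++) {
          split(entries[i], part, "/")
          if (part[2] == "open") nopen++
        }
      }
    }
    END { print nopen + 0 }
  '
}
# Example, from a machine outside the backplane (root needed for -sS/-sU):
#   nmap -Pn -sS -sU -p 22,53,443,51820-51823 -oG - 203.0.113.10 | count_open_ports
```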

No DNS records — passive stealth

# Don't register A records for production nodes in public DNS
# If you need to reach them externally, use a bastion with a public record
# (the bastion's only open port is WireGuard — same dark mode applies)

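# For IPv6: ignore router advertisements and disable SLAAC, so the host never
# picks up an autoconfigured address (EUI-64 SLAAC addresses embed the MAC and
# are guessable). Configure a static address and a static default route instead.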
# /etc/sysctl.d/99-ipv6-stealth.conf
net.ipv6.conf.eth0.accept_ra = 0
net.ipv6.conf.eth0.autoconf = 0

sysctl -p /etc/sysctl.d/99-ipv6-stealth.conf
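`sysctl -p` applies the values, but it is worth confirming they are live. Here is a minimal sketch that reads the two keys from /proc/sys, assuming the interface is eth0 as above. `check_ipv6_stealth` and its optional second argument (an alternative root, used only for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check that the IPv6 stealth sysctls are live on an interface.
# The optional second argument replaces /proc/sys (only useful for testing).
check_ipv6_stealth() {   # check_ipv6_stealth <iface> [sysctl-root]
  local iface="$1" root="${2:-/proc/sys}" key val bad=0
  for key in accept_ra autoconf; do
    if ! val=$(cat "$root/net/ipv6/conf/$iface/$key" 2>/dev/null); then
      echo "$key: unreadable"; bad=1; continue
    fi
    [ "$val" = 0 ] || { echo "$key=$val (expected 0)"; bad=1; }
  done
  return "$bad"
}
```

Run `check_ipv6_stealth eth0` after boot. Turning off autoconf does not immediately remove an address that was already autoconfigured; `ip -6 addr show dev eth0 scope global` lists what is still assigned.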

Dark mode is not paranoia; it is the posture many government and financial networks use for their internal systems. If your servers do not exist on the internet, they cannot be port-scanned, cannot be fingerprinted, cannot be exploited via exposed services. The visible attack surface is zero ports: the only listener is WireGuard, and it answers nothing it cannot authenticate. WireGuard is the airlock: authenticated peers get in, everything else gets nothing. Not a rejection, not an error, not a RST. Silence. The scanner cannot tell the difference between a server that is down, a server that is firewalled, and a server that is running a full production stack invisible behind WireGuard. That ambiguity is itself a security property: it wastes attacker time and defeats fingerprinting tools. The cost of this posture is small: two core nftables decisions (accept WireGuard UDP, drop everything else) and about fifteen minutes of configuration. The return: WireGuard's handshake is the only part of your infrastructure left in the internet's threat model.

11. Troubleshooting Backplane Issues

The diagnostic sequence for any backplane problem is four steps: check the tunnel, check the path, check the service, check the firewall. This covers 95% of all backplane issues.

The diagnostic ladder

# Step 1: Is the WireGuard tunnel up?
wg show wg1
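# Look for:
# - "latest handshake: X seconds ago": while traffic or PersistentKeepalive is
#   flowing, WireGuard re-handshakes about every 2 minutes, so under 180 is healthy
# - If no handshake: check keys, check endpoint, check UDP port on both sides
# - If handshake > 180s on a peer with keepalive: tunnel is stale, likely a firewall
#   blocking UDP (an idle peer without keepalive can legitimately go quiet)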

# Step 2: Is the tunnel actually passing traffic?
ping -I wg1 10.201.0.2
# Use -I to force the ping over a specific interface
# If ping fails but handshake is present: check AllowedIPs, check routing

# Step 3: Is the service listening on the right address?
ss -tlnp 'sport = :22'
# Should show: 10.201.0.x:22 (not 0.0.0.0:22 or [::]:22)
# If it shows a wildcard address, the service is not bound to the backplane: update ListenAddress

# Step 4: Is the firewall allowing it?
nft list chain inet filter input
# Check: is the port allowed on the right interface?
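Step 1 can be automated across all peers, which also gives you the "stale handshake alert" from the security model. Here is a minimal sketch that reads `wg show <iface> latest-handshakes`, which prints a public key, a tab, and the Unix time of the last handshake (0 if there has been none). `stale_peers` is hypothetical. The 180-second threshold only makes sense for peers with PersistentKeepalive or steady traffic.

```shell
#!/usr/bin/env bash
# Hypothetical stale-handshake check. Input on stdin: "wg show <iface> latest-handshakes"
# (public key, TAB, Unix time of the last handshake; 0 means never).
# Prints "<key> <age>s" or "<key> never" per stale peer; exit status 1 if any.
stale_peers() {   # stale_peers <max-age-seconds> [now, default: current time]
  local max="$1" now="${2:-$(date +%s)}"
  awk -F '\t' -v max="$max" -v now="$now" '
    NF < 2         { next }                      # ignore blank or malformed lines
    $2 == 0        { print $1 " never"; bad = 1; next }
    now - $2 > max { print $1 " " (now - $2) "s"; bad = 1 }
    END            { exit bad }
  '
}
```

`wg show wg1 latest-handshakes | stale_peers 180` prints one line per stale peer and exits non-zero, which is enough to drive a cron alert.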

Common problems and fixes

Symptom	Most likely cause	Fix
No handshake after key exchange	Wrong public key in peer block	Re-copy the public key, check for whitespace
Handshake present, ping fails	AllowedIPs too narrow	Check that destination IP is in AllowedIPs on source
SSH refused on backplane IP	sshd still bound to 0.0.0.0	Add ListenAddress to sshd_config, reload
Replication slow or timing out	syncoid sending over the wrong plane	Point syncoid at the peer's wg3 address (10.203.0.x); add --sshoption BindAddress=10.203.0.x to pin the source address
Prometheus gaps in metrics	wg2 handshake stale on one peer	Check wg show wg2, restart wg-quick@wg2
New node can't join mesh	UDP port blocked by host firewall	Check nftables on both sides, check cloud security group
Stale handshake after reboot	wg-quick service not enabled	systemctl enable wg-quick@wg0 (and wg1, wg2, wg3)
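The "still bound to 0.0.0.0" row applies to any service, not just sshd. Here is a minimal sketch that scans `ss -tlnH` output (no header; column 4 is the local address and port) for listeners on a wildcard address. Such a listener would be reachable on eth0 if the firewall ever let it through. `wildcard_listeners` is hypothetical. The same function works on `ss -ulnH` output for UDP services such as Unbound.

```shell
#!/usr/bin/env bash
# Hypothetical audit: print listeners bound to a wildcard address (0.0.0.0, [::], *).
# Input on stdin: "ss -tlnH" (or "ss -ulnH") output, where column 4 is Local Address:Port.
# Exit status 1 if any wildcard listener is found.
wildcard_listeners() {
  awk '
    {
      addr = $4
      sub(/:[^:]*$/, "", addr)            # strip the trailing :port
      if (addr == "0.0.0.0" || addr == "[::]" || addr == "*") { print $4; found = 1 }
    }
    END { exit found }
  '
}
# Example: ss -tlnH | wildcard_listeners && echo "no wildcard TCP listeners"
```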

Debugging WireGuard handshake failures

# Enable kernel WireGuard debug logging temporarily
# (requires CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug)
modprobe wireguard
echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg -wT | grep wireguard

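# Look for:
# "Invalid MAC of handshake" → the sender has the wrong public key for this node
# "Invalid handshake initiation" → sender's key is not a peer here, or a replayed/stale initiation
# "Handshake for peer ... did not complete after" → our initiations get no reply: endpoint or UDP path
# "Packet has unallowed src IP" → source address is outside that peer's AllowedIPs
# "Sending cookie response" → handshake load protection (cookie replies) is active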

# Disable debug logging when done
echo module wireguard -p > /sys/kernel/debug/dynamic_debug/control
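When the debug stream is busy, a small filter helps. Here is a minimal sketch that maps debug lines to likely causes. `wg_triage` is hypothetical. The matched strings follow the debug messages in the upstream Linux WireGuard driver, so check them against your kernel version.

```shell
#!/usr/bin/env bash
# Hypothetical triage filter: read WireGuard dynamic-debug lines on stdin and print
# a likely cause for each recognised message. Unrecognised lines are ignored.
wg_triage() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"Invalid MAC of handshake"*)
        echo "sender has the wrong public key for this node (or scanner noise): $line" ;;
      *"Invalid handshake initiation"*)
        echo "sender key is not a peer here, or replayed/stale initiation: $line" ;;
      *"did not complete after"*)
        echo "our initiations get no reply: check endpoint and UDP path: $line" ;;
      *"unallowed src IP"*)
        echo "source address outside that peer's AllowedIPs: $line" ;;
      *"Sending cookie response"*)
        echo "handshake load protection (cookie replies) active: $line" ;;
    esac
  done
}
# Example: dmesg -wT | grep --line-buffered wireguard | wg_triage
```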

Ninety percent of backplane problems are one of three things: (1) wrong key in the peer block — happens most often when copy-pasting from a terminal and picking up a trailing newline; (2) firewall blocking WireGuard UDP — your cloud provider's security group, not nftables, is the culprit 80% of the time; (3) service still bound to 0.0.0.0 instead of the backplane address. The diagnostic sequence — wg show, ping over WG, ss -tlnp, nft list ruleset — answers all three questions. Four commands, four answers. If none of those explain the problem, dmesg | grep wireguard is the next step. The kernel logging for WireGuard is remarkably informative once debug mode is enabled — it tells you exactly why it rejected or accepted a handshake.

12. Complete Backplane Reference

Checklists and a quick reference for a three-node, four-plane deployment with all components configured. Copy, adapt, and deploy.

Deployment checklist: new node to the backplane

Generate four WireGuard key pairs (one per plane)
Assign consistent last-octet IP addresses across all four planes
Add peer blocks to all existing nodes (all four planes)
Create four wg.conf files on the new node
Enable wg-quick@wg0 through wg-quick@wg3 (systemctl enable --now)
Verify handshakes on all four planes: wg show wg0; wg show wg1; wg show wg2; wg show wg3
Update /etc/nftables.conf to include new plane IPs in allowed-peers sets
Update Prometheus scrape targets (wg2 address)
Configure node_exporter to bind to wg2 address only
Configure sshd ListenAddress to wg1 address only
Update Unbound local-data records on DNS server (wg1 address)
Test: ping all plane addresses from all existing nodes
Test: SSH from existing node via wg1 address
Test: Prometheus scrapes new node via wg2 address
Verify: nmap from outside shows zero open ports
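The "add peer blocks to all existing nodes" step is where copy-paste key errors creep in. Here is a minimal sketch that generates the block. It assumes single-site numbering (10.(200+N).0.<host>), the port-per-plane layout from the quick reference (51820 + plane number), one /32 per peer, and an optional PSK. `peer_block` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the [Peer] block each existing node appends to wgN.conf
# when a new node joins. Assumes site-0 numbering and port 51820 + plane number.
peer_block() {   # peer_block <plane 0-3> <host octet> <public key> <endpoint ip> [psk]
  local plane="$1" host="$2" pub="$3" endpoint="$4" psk="${5:-}"
  printf '[Peer]\n'
  printf '# node .%s, wg%s\n' "$host" "$plane"
  printf 'PublicKey = %s\n' "$pub"
  [ -n "$psk" ] && printf 'PresharedKey = %s\n' "$psk"
  printf 'AllowedIPs = 10.%d.0.%d/32\n' $((200 + plane)) "$host"
  printf 'Endpoint = %s:%d\n' "$endpoint" $((51820 + plane))
}
```

Run it once per plane with the new node's public key for that plane, append the output to the matching wgN.conf on each existing node, then reload with `wg syncconf wgN <(wg-quick strip wgN)`.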

Quarterly key rotation checklist

Generate new key pairs on each node (use rotate-keys.sh from section 8)
Collect new public keys from all nodes
Update peer blocks on all nodes for each plane — do not activate yet
Schedule a maintenance window (30 minutes, all tunnels will briefly drop)
Apply new keys on all nodes at the same time (wg-quick down/up, or wg syncconf wgN <(wg-quick strip wgN) to reload without taking the interface down)
Verify handshakes on all four planes within 60 seconds
Test SSH, Prometheus scrapes, and replication path
Rotate PSKs separately (wg genpsk, update both ends of each peer pair)
Document rotation date
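The apply step has to happen on every node inside the window. Here is a minimal sketch of the per-node part, assuming wg-quick style configs in /etc/wireguard. `apply_all_planes` and the `DRY_RUN` switch are hypothetical. The sketch writes the stripped config to a temporary file first. That way a failed `wg-quick strip` never hands `wg syncconf` an empty config, which could strip the interface's peers.

```shell
#!/usr/bin/env bash
# Hypothetical per-node apply step for the rotation window.
# Reloads each plane from its edited /etc/wireguard/wgN.conf without taking the
# interface down. Set DRY_RUN=1 to print the commands instead of running them.
apply_all_planes() {
  local iface tmp
  for iface in wg0 wg1 wg2 wg3; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "wg-quick strip $iface > \$tmp && wg syncconf $iface \$tmp"
      continue
    fi
    # mktemp creates the file mode 0600, which matters: it will hold the private key
    tmp=$(mktemp) || return 1
    # wg-quick strip drops wg-quick-only keys (Address, DNS, ...) that wg cannot parse;
    # only a successfully produced config is handed to syncconf
    if wg-quick strip "$iface" > "$tmp"; then
      wg syncconf "$iface" "$tmp" || echo "syncconf failed on $iface" >&2
    else
      echo "wg-quick strip failed on $iface; left unchanged" >&2
    fi
    rm -f "$tmp"
  done
}
```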

Summary config: node-a, all four planes

# Quick reference — node-a addresses across all planes
wg0 (enrollment):  10.200.0.1/24, port 51820
wg1 (management):  10.201.0.1/24, port 51821
wg2 (monitoring):  10.202.0.1/24, port 51822
wg3 (storage):     10.203.0.1/24, port 51823

# Services and their planes:
sshd           → wg1   10.201.0.1:22
node_exporter  → wg2   10.202.0.1:9100
prometheus     → wg2   10.202.0.1:9090  (scrapes over wg2)
grafana        → wg1   10.201.0.1:3000
unbound (DNS)  → wg1   10.201.0.1:53
postgresql     → wg1   10.201.0.1:5432  (or wg3 for bulk replication)
syncoid        → wg3   bound to 10.203.0.1

# nftables: physical interface eth0 allows only:
# UDP 51820, 51821, 51822, 51823 — the four WireGuard ports
# Everything else: drop (no reject, no ICMP, nothing)

WireGuard Masterclass — deep dive on keys, routing, and WireGuard internals
WireGuard Mesh & Multi-Site — the mesh generator and multi-site topology patterns
Networking tutorial — VXLAN, BGP, eBPF dataplane fundamentals
nftables Masterclass — per-interface firewall rules and set-based policy
Observability Masterclass — Prometheus, Grafana, and alerting over the monitoring plane
