DNS Masterclass

This guide goes deep on DNS — the infrastructure layer that every network request depends on before anything else happens. Whether you are running a single kldload node at home, a multi-site fleet behind WireGuard, or a Kubernetes cluster with CoreDNS, DNS determines whether your services find each other. This page covers the full stack: how DNS actually works, recursive resolvers, authoritative servers, split-horizon, service discovery, DNSSEC, WireGuard name resolution, and production fleet architecture.

What this page covers: the DNS resolution chain from stub resolver to root servers, Unbound as a recursive resolver with DNS-over-TLS, BIND9 for authoritative internal zones, split-horizon views for internal/external routing, SRV-based service discovery, CoreDNS for Kubernetes, DNSSEC validation and zone signing, DNS for WireGuard meshes, Pi-hole and blocklist DNS, dig debugging, and a complete production DNS architecture for a kldload fleet.

1. DNS Is the Foundation of Everything

Every network request starts with a DNS query. Before your browser loads a page, before your API calls a backend, before your container pulls an image — DNS resolves a name to an IP. If DNS breaks, everything breaks. Understanding DNS means understanding the first thing that happens in every network interaction.

DNS is not just a lookup table. It is a distributed, hierarchical, cached, delegated naming system that spans the entire internet. It has been running since 1983. Every protocol you use sits on top of it. HTTP, HTTPS, SMTP, SSH, gRPC, every service mesh, every container registry — all of them start with a DNS query.

DNS is among the most critical infrastructure services and among the least understood. Many sysadmins can configure a static IP in their sleep but cannot explain the difference between an authoritative server and a recursive resolver. They know that "DNS went down" is bad, but they do not know what went down or why, so they cannot fix it quickly and they cannot design around it. This page fixes that. Once you understand the resolution chain, most DNS problems can be narrowed to a single hop within minutes using dig +trace.

2. How DNS Actually Works

DNS resolution is a chain. When your application asks "what is the IP of api.example.com?", it does not go directly to the authoritative server for example.com. It goes through a hierarchy designed so that no single server needs to know everything.

The resolution chain

client (stub resolver)
   |
   |  "What is api.example.com?"
   v
recursive resolver (your Unbound, or 1.1.1.1, or 8.8.8.8)
   |
   |  "Who knows about .com?"
   v
root servers (13 anycast groups, e.g. a.root-servers.net)
   |
   |  "Ask the .com TLD servers"
   v
.com TLD servers (e.g. a.gtld-servers.net)
   |
   |  "Ask ns1.example.com; they are authoritative"
   v
ns1.example.com (the authoritative nameserver for example.com)
   |
   |  "api.example.com is 203.0.113.50"
   v
recursive resolver (caches the answer for TTL seconds)
   |
   v
client (gets the IP, opens TCP connection)
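The chain can be modelled in a few lines of Python. This is a toy sketch, not a resolver: the ROOT and SERVERS tables and the server labels are made up for illustration, and real resolution also involves glue records, retries and caching.

```python
# Toy model of the resolution chain. Every table below is illustrative:
# each "server" knows only its own slice of the namespace.
ROOT = {"com.": "tld-com"}  # the root knows TLD delegations only
SERVERS = {
    "tld-com": {"delegations": {"example.com.": "ns1.example.com"}},
    "ns1.example.com": {"records": {"api.example.com.": "203.0.113.50"}},
}


def resolve(name):
    """Walk root -> TLD -> authoritative; return (answer, servers_asked)."""
    asked = ["root"]
    tld = name.rstrip(".").split(".")[-1] + "."
    server = ROOT[tld]  # referral from the root to the TLD servers
    while True:
        asked.append(server)
        data = SERVERS[server]
        records = data.get("records", {})
        if name in records:
            return records[name], asked  # authoritative answer
        # Otherwise follow the referral for the zone that contains the name.
        zone = next(z for z in data["delegations"] if name.endswith("." + z))
        server = data["delegations"][zone]


answer, path = resolve("api.example.com.")
print(answer, path)
```

The list of servers asked is the same sequence that dig +trace prints for a real name.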

Caching at every level

Every layer caches. The recursive resolver caches the answer for the duration of the TTL (Time To Live) in the DNS record. If the TTL is 300 seconds, every query for that name in the next five minutes gets an instant answer from cache — no upstream queries at all. Your operating system has a stub resolver that may cache too. Your browser has its own DNS cache. TTL is the knob that controls the tradeoff between freshness and performance.

Low TTL (30–60s): changes propagate fast. Useful when you need to cut over quickly — before a migration, or when using DNS for failover. Cost: more queries, slightly higher latency on cache misses.

High TTL (3600s, 86400s): changes are slow to propagate. Useful for stable records — MX, NS, SPF. Benefit: almost every query hits cache. Cost: if you need to change the IP, clients will keep hitting the old one for up to 24 hours.
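The freshness-versus-performance tradeoff is easiest to see in a minimal cache. The sketch below is an assumption-laden model, not how any particular resolver is implemented: the class name TtlCache and its min_ttl/max_ttl parameters are hypothetical, loosely mirroring the minimum and maximum TTL limits that resolvers expose.

```python
import time


class TtlCache:
    """Minimal resolver-style cache: entries expire TTL seconds after insert."""

    def __init__(self, min_ttl=0, max_ttl=86400, clock=time.monotonic):
        self.min_ttl = min_ttl
        self.max_ttl = max_ttl
        self.clock = clock  # injectable so the behaviour can be tested
        self._store = {}

    def put(self, name, value, ttl):
        # Clamp the record's TTL into [min_ttl, max_ttl] before storing.
        ttl = max(self.min_ttl, min(ttl, self.max_ttl))
        self._store[name] = (value, self.clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None  # cache miss: a real resolver would query upstream
        value, expires = entry
        if self.clock() >= expires:
            del self._store[name]  # expired: treat as a miss
            return None
        return value
```

With a 300-second TTL, a lookup at second 299 is served from cache and a lookup at second 300 goes upstream again. A non-zero min_ttl keeps short-lived records cached longer than their publisher intended, which matters when you rely on low TTLs for failover.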

Record types

A / AAAA

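The most common records. A maps a name to an IPv4 address. AAAA maps to IPv6. A single name can have multiple A records; the resolver returns all of them (often rotating the order between responses), and clients generally try them in the order received, moving to the next address if a connection fails.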

api.example.com. 300 IN A    203.0.113.50
api.example.com. 300 IN AAAA 2001:db8::1

CNAME

Canonical name — an alias. www.example.com CNAME example.com means "resolve www by resolving example.com." CNAMEs chain. They cannot be used at the zone apex (you cannot CNAME the root domain itself). MX and NS records cannot point to CNAMEs.

www.example.com. 300 IN CNAME example.com.
; Resolving www follows to example.com, then resolves A

MX

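Mail exchange: where to deliver email for a domain. Each record carries a preference number, and senders try the lowest number first, falling back to higher numbers if that host is unreachable. Multiple MX records provide redundancy. The target must be a hostname that has A/AAAA records, never a CNAME.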

example.com. 3600 IN MX 10 mail1.example.com.
example.com. 3600 IN MX 20 mail2.example.com.

TXT

Free-form text. Used for SPF (email sender policy), DKIM (email signing key), DMARC, domain verification (Let's Encrypt DNS-01, Google Search Console), and any custom metadata you want to attach to a domain.

example.com.        IN TXT "v=spf1 include:_spf.google.com ~all"
_dmarc.example.com. IN TXT "v=DMARC1; p=reject"

SRV

Service record: maps a service name to a host and port. Format: priority, weight, port, target. Used for service discovery without a service mesh. _http._tcp.api.infra.local tells clients where the HTTP service for api lives.

_http._tcp.api.infra.local. IN SRV 0 10 8080 api01.infra.local.
; priority=0, weight=10, port=8080, target=api01.infra.local.

NS / SOA

NS records delegate a zone to specific nameservers. SOA (Start of Authority) contains zone metadata: primary nameserver, admin email (written with a dot in place of the @), serial number, refresh/retry/expire timers, and the minimum field, which resolvers now use as the TTL for negative (NXDOMAIN) answers. Every zone has exactly one SOA.

example.com. IN NS  ns1.example.com.
example.com. IN SOA ns1.example.com. admin.example.com. 2026040101 3600 900 604800 300

PTR

Pointer record — reverse DNS. Maps an IP address back to a hostname. Stored under the special in-addr.arpa (IPv4) or ip6.arpa (IPv6) domains. Used by mail servers, logging systems, and security tools to resolve IPs to names.

; Reverse for 10.100.10.50:
50.10.100.10.in-addr.arpa. IN PTR api.infra.local.
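The owner name of a PTR record is the address written backwards under the reverse domain. A short sketch using Python's standard ipaddress module shows the construction; the helper name ptr_name is hypothetical.

```python
import ipaddress


def ptr_name(ip):
    """Return the reverse-lookup owner name for an IPv4 or IPv6 address."""
    # reverse_pointer reverses octets (IPv4) or nibbles (IPv6) and appends
    # in-addr.arpa or ip6.arpa; the trailing dot makes the name absolute.
    return ipaddress.ip_address(ip).reverse_pointer + "."


print(ptr_name("10.100.10.50"))
print(ptr_name("2001:db8::1"))
```

For IPv6 the name has 32 single-nibble labels, which is why reverse zones for IPv6 are almost always generated rather than typed.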

The entire internet depends on 13 root server identities, a.root-servers.net through m.root-servers.net. There are not 13 physical machines: each identity is an anycast address announced by many servers, well over a thousand instances worldwide in total. Your query reaches whichever instance is closest in routing terms. A query for "api.example.com" hits the root, which says "ask the .com servers," which says "ask ns1.example.com." Three hops. After the first query, caching means later queries for the same name are answered locally until the TTL expires. The system is designed so that no one server is a bottleneck: the root only knows TLD delegations, the TLD only knows second-level delegations, and the authoritative server only knows its own zone. Hierarchical delegation scales because each level only manages its slice.

3. Recursive Resolvers on kldload

Unbound is the standard recursive resolver for kldload fleets. It is fast, secure, supports DNS-over-TLS, DNSSEC validation, and local caching. One Unbound instance on your network serves all nodes — they send queries to it, it caches results and queries upstream on cache misses.

Install Unbound

# CentOS / RHEL / Rocky
dnf install -y unbound

# Debian / Ubuntu
apt install -y unbound

# Enable and start
systemctl enable --now unbound

# Verify
dig @127.0.0.1 example.com +short

Basic configuration

# /etc/unbound/unbound.conf

server:
  # Listen on all interfaces (restrict with interface: if needed)
  interface: 0.0.0.0
  port: 53

  # Allow queries from your network
  access-control: 127.0.0.0/8 allow
  access-control: 10.0.0.0/8 allow
  access-control: 172.16.0.0/12 allow
  access-control: 192.168.0.0/16 allow

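  # Cache settings
  # cache-min-ttl raises any shorter record TTL to 60s, so a record published
  # with a 30s TTL for fast failover is still cached for a full minute
  cache-min-ttl: 60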
  cache-max-ttl: 86400
  msg-cache-size: 64m
  rrset-cache-size: 128m

  # Privacy: hide server identity and version
  hide-identity: yes
  hide-version: yes

  # DNSSEC validation (see section 8)
  auto-trust-anchor-file: "/var/lib/unbound/root.key"

  # Log level (0=minimal, 2=verbose, 5=debug)
  verbosity: 1

  # Prefetch popular records before TTL expires
  prefetch: yes
  prefetch-key: yes

  # Use 0x20 bit randomization to defeat cache poisoning
  use-caps-for-id: yes

Forwarding mode vs full recursive

Unbound can operate in two modes. Full recursive (default with no forward-zone) queries root servers directly for every cache miss — completely independent, no third-party resolver in the path. Forwarding mode sends cache misses to an upstream resolver (1.1.1.1, 8.8.8.8, your ISP's resolver) and caches the results.

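# Forwarding mode: send cache misses to Cloudflare over DNS over TLS
# Add to unbound.conf:

server:
  # CA bundle used to verify the upstream certificate against the name after "#"
  # (Debian/Ubuntu path shown; RHEL-family systems use /etc/pki/tls/certs/ca-bundle.crt)
  tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

forward-zone:
  name: "."
  forward-addr: 1.1.1.1@853#cloudflare-dns.com   # DNS over TLS
  forward-addr: 1.0.0.1@853#cloudflare-dns.com
  forward-tls-upstream: yes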

# Full recursive (no forward-zone block)
# Unbound queries root servers directly.
# Get the root hints file:
curl -o /etc/unbound/root.hints https://www.internic.net/domain/named.root

# Reference it in unbound.conf:
server:
  root-hints: "/etc/unbound/root.hints"

DNS over TLS upstream

# Check that resolution works through the forwarder:
dig @127.0.0.1 cloudflare.com +short
# An answer proves forwarding works, not that TLS is in use. To confirm DoT,
# watch upstream traffic (for example with tcpdump): it should go to port 853, not 53.

# If using systemd-resolved as stub on the local machine:
# Point it at Unbound instead of the default
# /etc/systemd/resolved.conf:
[Resolve]
DNS=127.0.0.1
DNSStubListener=no

# Apply the change, then symlink resolv.conf:
systemctl restart systemd-resolved
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

Running your own recursive resolver means repeat lookups are answered from your own cache, and only cache misses leave your network. In full recursive mode no third-party resolver (your ISP's, Cloudflare's, Google's) sees your full query stream; each authoritative server sees only the queries for its own zones. Those upstream queries are plain DNS, though, so anyone on the network path, including your ISP, can still observe them. For a fleet of kldload nodes, one Unbound instance caches resolutions for the entire fleet, which is faster and more private than every node querying the internet individually. Forwarding with DoT is a reasonable middle ground: the path is encrypted, you gain the upstream's cache and avoid the latency of cold recursive queries, but that one upstream sees everything you ask. For internal names, neither mode reaches the internet at all: stub zones (see section 4) intercept them locally.

4. Authoritative DNS with BIND or NSD

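A recursive resolver answers questions by asking other servers. An authoritative server answers questions from its own data: it is the source of truth for a zone. For your internal domain (infra.local, cluster.home, kldload.internal), you need an authoritative server so that nodes can resolve each other by name. One caveat on naming: .local is reserved for multicast DNS (RFC 6762), so hosts running Avahi or systemd-resolved with mDNS enabled may not send .local lookups to your unicast DNS server. The examples on this page keep infra.local; a suffix such as home.arpa (RFC 8375) avoids the conflict.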

BIND9

The most widely deployed DNS server. Does everything: recursion, authoritative serving, dynamic updates (DDNS), DNSSEC signing, views (split-horizon), zone transfers, and catalog zones. The right choice for internal infrastructure where you need DHCP integration, dynamic registration, and split-horizon views.

// BIND9: the Swiss Army knife of DNS
// Full-featured; the BIND lineage dates to the 1980s and BIND 9 has been in production since 2000

NSD

Authoritative-only. NSD does not do recursion — it only serves zones from disk. This makes it simpler, faster for pure authoritative serving, and harder to misconfigure as an open resolver. The right choice when you want a dedicated authoritative server and use Unbound separately for recursion.

// NSD: does one thing (serve zones) and does it fast
// No recursion = no open resolver risk

Install BIND9

# CentOS / RHEL / Rocky
dnf install -y bind bind-utils

# Debian / Ubuntu
apt install -y bind9 bind9-utils dnsutils

# Enable
systemctl enable --now named

Configure BIND9 as authoritative for infra.local

# /etc/named.conf  (CentOS path — Debian uses /etc/bind/named.conf)

options {
    directory "/var/named";
    listen-on { 127.0.0.1; 10.100.10.1; };   # your DNS server's IPs
    listen-on-v6 { none; };

    # This server is authoritative only — no recursion for external queries
    recursion no;
    allow-query { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.0/8; };

    # Allow dynamic updates from localhost (for nsupdate / DHCP).
    # Address-based ACLs can be spoofed over UDP; prefer a TSIG key outside a lab.
    allow-update { 127.0.0.1; };

    # Disable zone transfers except to secondary
    allow-transfer { none; };

    # DNSSEC (section 8)
    dnssec-validation auto;
};

# Forward zone — infra.local
zone "infra.local" IN {
    type master;
    file "/var/named/infra.local.zone";
    allow-update { 127.0.0.1; 10.100.10.0/24; };
};

# Reverse zone — 10.100.10.x
zone "10.100.10.in-addr.arpa" IN {
    type master;
    file "/var/named/10.100.10.rev";
    allow-update { 127.0.0.1; 10.100.10.0/24; };
};

Zone file: infra.local with all kldload nodes

; /var/named/infra.local.zone  (zone files use ";" for comments, not "#")
$TTL 300
@   IN SOA  ns1.infra.local.  admin.infra.local. (
              2026040101  ; serial (YYYYMMDDnn — increment on every change)
              3600        ; refresh
              900         ; retry
              604800      ; expire
              300 )       ; minimum TTL

; Nameservers
@       IN NS   ns1.infra.local.

; DNS server itself
ns1     IN A    10.100.10.1

; kldload nodes
node1   IN A    10.100.10.10
node2   IN A    10.100.10.11
node3   IN A    10.100.10.12
node4   IN A    10.100.10.13
node5   IN A    10.100.10.14

; Service aliases
monitor IN A    10.100.10.10    ; Grafana/Prometheus lives on node1
api     IN A    10.100.10.50    ; load balancer VIP
db      IN A    10.100.10.20    ; database primary
db-replica IN A 10.100.10.21   ; database replica

; WireGuard interface names (optional)
node1-wg IN A  10.200.0.1
node2-wg IN A  10.200.0.2
node3-wg IN A  10.200.0.3

; /var/named/10.100.10.rev  (reverse zone)
$TTL 300
@   IN SOA  ns1.infra.local.  admin.infra.local. (
              2026040101 3600 900 604800 300 )

@       IN NS   ns1.infra.local.

; PTR records — IP last-octet → hostname
1   IN PTR  ns1.infra.local.
10  IN PTR  node1.infra.local.
11  IN PTR  node2.infra.local.
12  IN PTR  node3.infra.local.
20  IN PTR  db.infra.local.
50  IN PTR  api.infra.local.

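# Zones that accept dynamic updates must be frozen before hand-editing,
# otherwise BIND's journal and the file on disk diverge
rndc freeze infra.local

# (edit the zone file and bump the serial)

# Check syntax before loading
named-checkconf /etc/named.conf
named-checkzone infra.local /var/named/infra.local.zone

# Thaw reloads the zone and re-enables dynamic updates
# (for static zones, plain "rndc reload" is enough)
rndc thaw infra.local

# Test
dig @10.100.10.1 node1.infra.local +short
dig @10.100.10.1 -x 10.100.10.10 +short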
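The zone file's serial uses the YYYYMMDDnn convention and must increase on every hand edit, or secondaries will not notice the change. A small helper can compute the next value; the function name next_serial is hypothetical, and the sketch assumes the serial is only ever bumped through this helper or by BIND itself.

```python
from datetime import date


def next_serial(current, today=None):
    """Return the next YYYYMMDDnn serial, always greater than `current`."""
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100  # e.g. 2026040100
    if current < base:
        return base + 1  # first change today -> ...01
    # Already changed today (or the stored serial is ahead of the clock):
    # just count upwards so the serial never goes backwards.
    return current + 1


print(next_serial(2026040101, date(2026, 4, 1)))
print(next_serial(2026040101, date(2026, 4, 2)))
```

Zones that receive dynamic updates have their serial incremented by BIND automatically, so this only matters for edits made by hand or by a generator script.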

BIND does everything: recursion, authoritative, dynamic updates, DNSSEC. NSD does one thing, serve authoritative zones, and does it faster than BIND for that specific use case. For an internal domain (infra.local, cluster.home), BIND9 is the right choice because you will want dynamic updates from DHCP and split-horizon views. You can also run Unbound for recursion and BIND for authoritative serving side by side: configure Unbound with a stub-zone that sends queries for infra.local to BIND, and BIND handles everything authoritative while Unbound handles everything else. If Unbound validates DNSSEC and the internal zone is unsigned, also mark it with domain-insecure: "infra.local" so its answers are not rejected as bogus. The two work well together.

5. Split-Horizon DNS

Split-horizon (also called split-brain DNS) means the same domain name resolves differently depending on who asks. External clients get a public IP. Internal clients get a private IP. One domain, two answers, controlled by the DNS server based on the source address of the query.

This is the standard pattern when a service is reachable from both sides of a firewall. Your API load balancer has a public IP for the internet and a private IP for internal services. Without split-horizon, internal services hairpin through the public IP: their packets leave the server, hit your firewall, and come back in. With split-horizon, internal traffic stays on the LAN.

BIND9 views configuration

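# /etc/named.conf: split-horizon with views
# Once any view is defined, every zone (including infra.local and its reverse
# zone from section 4) must be declared inside a view.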

acl "internal" {
    10.0.0.0/8;
    172.16.0.0/12;
    192.168.0.0/16;
    127.0.0.0/8;
};

acl "external" {
    any;
};

# INTERNAL VIEW — seen by LAN clients
view "internal" {
    match-clients { internal; };
    recursion yes;                      # allow recursion for internal clients

    zone "example.com" IN {
        type master;
        file "/var/named/example.com.internal.zone";
    };

    # Forward everything else to Unbound for recursion
    zone "." IN {
        type forward;
        forwarders { 10.100.10.1 port 5300; };   # Unbound on alt port
    };
};

# EXTERNAL VIEW — seen by everyone else
view "external" {
    match-clients { external; };
    recursion no;                       # no recursion for external clients

    zone "example.com" IN {
        type master;
        file "/var/named/example.com.external.zone";
    };
};

Zone files with different answers

; /var/named/example.com.internal.zone
$TTL 300
@   IN SOA ns1.example.com. admin.example.com. (2026040101 3600 900 604800 300)
@   IN NS  ns1.example.com.
ns1     IN A    10.100.10.1     ; an in-zone nameserver needs an address record or the zone will not load

; Internal clients get private IPs — traffic stays on LAN
api     IN A    10.100.10.50    ; private load balancer VIP
www     IN A    10.100.10.51    ; private web server
db      IN A    10.100.10.20    ; internal DB (never exposed externally)

; /var/named/example.com.external.zone
$TTL 300
@   IN SOA ns1.example.com. admin.example.com. (2026040101 3600 900 604800 300)
@   IN NS  ns1.example.com.
ns1     IN A    203.0.113.10    ; public address of this DNS server (illustrative)

; External clients get the public IP
api     IN A    203.0.113.10    ; public load balancer
www     IN A    203.0.113.10    ; same public IP, different vhost
; db has no external record — it simply does not exist outside

# Verify split-horizon is working:

# From inside the network (should get 10.100.10.50)
dig @10.100.10.1 api.example.com +short

# From outside (or simulate with a public resolver)
dig @8.8.8.8 api.example.com +short
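The decision a views-enabled server makes for each query can be sketched as follows. This is a simplified model under stated assumptions: the ZONES dictionary stands in for the two zone files, and the function name answer is hypothetical. BIND itself evaluates match-clients in the order the views appear in named.conf.

```python
import ipaddress

# Same ranges as the "internal" ACL.
INTERNAL = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]

# Stand-ins for the internal and external zone files.
ZONES = {
    "internal": {
        "api.example.com.": "10.100.10.50",
        "db.example.com.": "10.100.10.20",
    },
    "external": {"api.example.com.": "203.0.113.10"},
}


def answer(client_ip, name):
    """Pick a view from the query's source address, then look the name up."""
    src = ipaddress.ip_address(client_ip)
    view = "internal" if any(src in net for net in INTERNAL) else "external"
    return view, ZONES[view].get(name)  # None models NXDOMAIN


print(answer("10.100.10.77", "api.example.com."))
print(answer("198.51.100.7", "db.example.com."))
```

The second call shows the point made in the external zone file: db has no record in that view, so an outside client gets no answer at all.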

Without split-horizon, internal services that call api.example.com send traffic to the public IP, the firewall translates it back to the private IP (NAT hairpin), and it arrives at the server, adding a trip through the firewall for every internal call. With split-horizon, the resolver returns the private IP directly and the traffic never leaves the LAN. The performance improvement is real. There is also a security benefit: internal services that should never be reachable externally simply have no external DNS record, so the external view reveals no hostnames for them. Treat that as a complement to firewall rules, not a replacement: hiding a name does not block access to the address behind it.

6. Service Discovery with DNS

DNS-based service discovery predates Consul, Kubernetes, and every service mesh that has ever been built. SRV records encode both the host and the port for a service. A client queries _http._tcp.api.infra.local and gets back a target hostname and port, then resolves the hostname to an address. No configuration files, no hardcoded ports, no service registry daemon required.

SRV records for service discovery

; In /var/named/infra.local.zone, add SRV records.
; Every target (api01, api02) also needs its own A record in the zone.

; _service._proto.name  TTL  IN SRV  priority  weight  port  target
_http._tcp.api      IN SRV  0 10 8080  api01.infra.local.
_http._tcp.api      IN SRV  0 10 8080  api02.infra.local.   ; second instance
_grpc._tcp.api      IN SRV  0 10 9090  api01.infra.local.
_https._tcp.grafana IN SRV  0 10 3000  monitor.infra.local.

# Query SRV records
dig _http._tcp.api.infra.local SRV

# Output:
# _http._tcp.api.infra.local.  300  IN  SRV  0 10 8080 api01.infra.local.
# _http._tcp.api.infra.local.  300  IN  SRV  0 10 8080 api02.infra.local.

# The client reads the SRV records, picks one (weight-based), resolves the
# target A record, and connects to host:port. No service registry needed.
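Client-side selection among SRV records can be sketched as below. It follows the idea in RFC 2782 (use the lowest priority group, then choose by weight) but simplifies the weight-0 handling, so treat it as an approximation. The function name pick_srv and the backup.infra.local record are hypothetical.

```python
import random


def pick_srv(records, rng=random):
    """records: list of (priority, weight, port, target). Returns (target, port)."""
    lowest = min(r[0] for r in records)
    group = [r for r in records if r[0] == lowest]  # only the best priority
    weights = [r[1] for r in group]
    if sum(weights) == 0:
        chosen = rng.choice(group)  # all weights zero: pick uniformly
    else:
        chosen = rng.choices(group, weights=weights, k=1)[0]
    return chosen[3], chosen[2]


records = [
    (0, 10, 8080, "api01.infra.local."),
    (0, 10, 8080, "api02.infra.local."),
    (10, 10, 8080, "backup.infra.local."),  # hypothetical fallback instance
]
print(pick_srv(records))
```

A higher-priority-number record such as the backup entry is only used when a client has failed to reach every target in the lower-numbered group; that retry loop is left out here.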

Dynamic DNS updates with nsupdate

# Register a service on boot using nsupdate
# This sends a dynamic update to BIND9

nsupdate << EOF
server 10.100.10.1
zone infra.local
update add api03.infra.local 300 A 10.100.10.53
update add _http._tcp.api.infra.local 300 SRV 0 10 8080 api03.infra.local.
send
EOF

kldload firstboot integration

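#!/bin/bash
# /etc/kldload/firstboot.d/50-register-dns.sh
# Runs once on first boot after install. Registers this node in DNS.
set -euo pipefail

HOSTNAME=$(hostname -s)
# First IPv4 address on eth0; adjust the interface name for your hardware
IP=$(ip -4 addr show eth0 | awk '/inet /{print $2; exit}' | cut -d/ -f1)
DNS_SERVER="10.100.10.1"
ZONE="infra.local"

# Refuse to send an update with an empty address
[ -n "${IP}" ] || { echo "No IPv4 address found on eth0" >&2; exit 1; }

nsupdate << EOF
server ${DNS_SERVER}
zone ${ZONE}
update delete ${HOSTNAME}.${ZONE} A
update add ${HOSTNAME}.${ZONE} 300 A ${IP}
send
EOF

echo "Registered ${HOSTNAME}.${ZONE} → ${IP}"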
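If registration is driven from a provisioning tool rather than a shell script, the same update can be built as text and piped to nsupdate. The sketch below only constructs and validates the script; the function name nsupdate_script is hypothetical, and actually sending it (shown in the trailing comment) assumes nsupdate is installed and the server accepts updates from this host.

```python
import ipaddress


def nsupdate_script(server, zone, host, ip, ttl=300):
    """Build the text fed to nsupdate to replace a host's A record."""
    ipaddress.IPv4Address(ip)  # raises ValueError on an empty or malformed address
    fqdn = f"{host}.{zone}."
    return "\n".join(
        [
            f"server {server}",
            f"zone {zone}",
            f"update delete {fqdn} A",  # drop any stale address first
            f"update add {fqdn} {ttl} A {ip}",
            "send",
            "",
        ]
    )


script = nsupdate_script("10.100.10.1", "infra.local", "node1", "10.100.10.10")
print(script)
# To send it:
#   subprocess.run(["nsupdate"], input=script, text=True, check=True)
```

Validating the address before building the script avoids sending an update with an empty address when interface detection fails.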

Consul DNS interface

# If using Consul for service discovery, Consul serves DNS on port 8600
# Configure Unbound to send *.consul queries to Consul:

# /etc/unbound/unbound.conf
server:
  do-not-query-localhost: no      # Unbound refuses to query 127.0.0.1 by default
  domain-insecure: "consul"       # .consul is unsigned; skip DNSSEC validation for it

stub-zone:
  name: "consul"
  stub-addr: 127.0.0.1@8600

# Now "dig web.service.consul" resolves to all healthy web instances
# Consul returns only healthy instances; unhealthy ones are removed from DNS
dig web.service.consul
dig _http._tcp.web.service.consul SRV

Before Kubernetes and Consul, there was DNS-based service discovery. It still works. An SRV record says "the HTTP service for api.infra.local is on api01.infra.local, port 8080." The client queries DNS, gets the target and port, resolves the target to an IP, and connects directly. No service mesh, no sidecar, no orchestrator. For a kldload fleet without Kubernetes, DNS-SD is the simplest service discovery that actually works. The limitation is that DNS caches: if a service goes down, clients that cached the SRV record will still try to connect until the TTL expires. Consul addresses this with health checking: it removes unhealthy instances from DNS responses as soon as a check fails. For a small fleet with stable services, pure DNS is fine. For dynamic fleets, Consul plus DNS is the right answer.

7. CoreDNS and Kubernetes DNS

Most Kubernetes clusters run CoreDNS as the in-cluster DNS server. Every pod gets /etc/resolv.conf configured to point at the CoreDNS ClusterIP. Every service gets a DNS name automatically. Understanding how CoreDNS works lets you customize it: forward external queries through your Unbound resolver, integrate with your internal BIND server, and tune caching.

How pod DNS resolution works

# Inside a pod, /etc/resolv.conf looks like:
nameserver 10.96.0.10      # CoreDNS ClusterIP
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# DNS names for services follow this pattern:
# <service>.<namespace>.svc.cluster.local

# Examples:
# Service "web" in namespace "production":
dig web.production.svc.cluster.local

# Same namespace: short name works due to search domains
# (dig ignores the search list unless you pass +search):
dig +search web

# Pod-to-pod (less common, usually use service DNS):
# <pod-ip-dashes>.<namespace>.pod.cluster.local
dig 10-100-1-5.production.pod.cluster.local
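The search and ndots:5 lines determine which names the pod's stub resolver actually sends. The sketch below models the usual glibc-style behaviour; it is an approximation, and the function name candidates is hypothetical.

```python
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidates(name, search=SEARCH, ndots=5):
    """Return the fully qualified names tried, in order, for `name`."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, no search list
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # "enough" dots: try as-is first
    return expanded + [absolute]  # otherwise walk the search list first


print(candidates("web"))
print(candidates("google.com"))
```

With ndots:5, an external name such as google.com has fewer than five dots, so it is tried against all three cluster suffixes before the real query goes out. Writing the name with a trailing dot skips those extra lookups.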

Inspect CoreDNS configuration

# CoreDNS config lives in a ConfigMap
kubectl -n kube-system get configmap coredns -o yaml

# Default Corefile looks like:
# .:53 {
#     errors
#     health
#     ready
#     kubernetes cluster.local in-addr.arpa ip6.arpa {
#         pods insecure
#         fallthrough in-addr.arpa ip6.arpa
#     }
#     prometheus :9153
#     forward . /etc/resolv.conf      ← external queries go HERE
#     cache 30
#     loop
#     reload
#     loadbalance
# }

Forward external queries to your Unbound instance

# Edit the CoreDNS ConfigMap
kubectl -n kube-system edit configmap coredns

# Change the forward line to point at your Unbound:
# forward . 10.100.10.1 {
#     prefer_udp
# }

# Or use the kubectl patch approach:
kubectl -n kube-system patch configmap coredns --type merge -p '
{
  "data": {
    "Corefile": ".:53 {\n    errors\n    health\n    ready\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\n        pods insecure\n        fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n    forward . 10.100.10.1\n    cache 30\n    loop\n    reload\n    loadbalance\n}\n"
  }
}'
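Hand-escaping newlines inside the JSON patch is error-prone. One alternative is to keep the Corefile readable and let a script produce the JSON. This is a sketch only: the function name coredns_patch is hypothetical, and the Corefile text simply mirrors the one shown in the patch command.

```python
import json

COREFILE = """\
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . %(upstream)s
    cache 30
    loop
    reload
    loadbalance
}
"""


def coredns_patch(upstream):
    """Return a JSON merge patch that sets the Corefile's forward target."""
    corefile = COREFILE % {"upstream": upstream}
    return json.dumps({"data": {"Corefile": corefile}})


print(coredns_patch("10.100.10.1"))
```

The output can be passed to kubectl patch with --type merge -p in place of the inline string.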

# CoreDNS picks up ConfigMap changes through the reload plugin;
# allow a minute or two for the new Corefile to take effect

Stub domains — forward *.infra.local to your BIND server

# Add a second server block so pods can resolve your internal domain
# (CoreDNS's equivalent of a stub zone). Edit the CoreDNS ConfigMap and add:

# .:53 {
#     ... existing config ...
# }
#
# infra.local:53 {
#     errors
#     cache 30
#     forward . 10.100.10.1    ← forward infra.local queries to BIND
# }

# After this, from any pod:
dig node1.infra.local          # resolves via BIND
dig api.infra.local            # resolves via BIND
dig google.com                 # resolves via Unbound → internet

Every Kubernetes pod gets DNS automatically via CoreDNS. But by default, external queries go to whatever /etc/resolv.conf says on the node — often a cloud provider's resolver or the host's systemd-resolved. On a kldload Kubernetes cluster, configure CoreDNS to forward to your Unbound instance so all DNS goes through your resolver, cached and private. Your nodes already use Unbound. Your pods should too. The stub-zone trick is particularly powerful: pods can resolve your internal infra.local names and external internet names through a single DNS infrastructure, and your BIND server is the single source of truth for internal names regardless of whether the query came from a bare-metal service, a VM, or a Kubernetes pod.

8. DNSSEC — Signed DNS

DNSSEC adds cryptographic signatures to DNS records. A resolver that validates DNSSEC can prove that the answer it received is authentic — it came from the authoritative server and was not modified in transit. DNS spoofing, cache poisoning, and BGP hijacking attacks that redirect DNS cannot forge a valid DNSSEC signature.

Enable DNSSEC validation in Unbound

# Unbound validates DNSSEC by default when auto-trust-anchor-file is set
# /etc/unbound/unbound.conf:

server:
  auto-trust-anchor-file: "/var/lib/unbound/root.key"
  val-log-level: 2          # log DNSSEC validation failures

# Initialize the trust anchor (done automatically by unbound-anchor on install)
unbound-anchor -a /var/lib/unbound/root.key

# Test: query a DNSSEC-signed domain
dig @127.0.0.1 cloudflare.com +dnssec

# Look for "ad" flag in the flags line:
# ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
# "ad" = Authenticated Data — DNSSEC validation passed

# Test validation failure (should get SERVFAIL):
dig @127.0.0.1 dnssec-failed.org +dnssec
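Checking for the ad flag is easy to automate in a monitoring script. The sketch below parses dig's text output; the function name has_ad_flag is hypothetical, and it assumes the standard ";; flags:" header line that dig prints.

```python
import re


def has_ad_flag(dig_output):
    """Return True if dig's header reports the Authenticated Data flag."""
    match = re.search(r"^;; flags:([^;]*);", dig_output, re.MULTILINE)
    if not match:
        return False  # no header line found (e.g. +short output)
    return "ad" in match.group(1).split()


sample = ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1"
print(has_ad_flag(sample))
```

The flag is set by the resolver, so it is only meaningful when the path between the client and that resolver is trusted, for example a query to 127.0.0.1.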

Sign your own zones with BIND9

# Step 1: Generate zone signing keys (ZSK and KSK)
cd /var/named

# Key Signing Key (KSK) — signs the DNSKEY records themselves
dnssec-keygen -a ECDSAP256SHA256 -f KSK infra.local
# Produces Kinfra.local.+013+XXXXX.key and .private

# Zone Signing Key (ZSK) — signs all other records
dnssec-keygen -a ECDSAP256SHA256 infra.local

# Step 2: Include keys in the zone file
# Add to /var/named/infra.local.zone:
$INCLUDE Kinfra.local.+013+YYYYY.key   ; KSK public key
$INCLUDE Kinfra.local.+013+ZZZZZ.key   ; ZSK public key

# Step 3: Sign the zone
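# -3 takes the NSEC3 salt in hex; /dev/urandom avoids blocking on a low-entropy host.
# RFC 9276 recommends an empty salt instead, which is written as: -3 -
dnssec-signzone -A -3 $(head -c 1000 /dev/urandom | sha1sum | cut -b 1-16) \
  -N INCREMENT -o infra.local -t \
  /var/named/infra.local.zone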

# Produces infra.local.zone.signed
# Update named.conf to use the signed file:
zone "infra.local" IN {
    type master;
    file "/var/named/infra.local.zone.signed";
};

Inline signing (easier — BIND manages keys automatically)

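# BIND 9.16 and later: let a dnssec-policy manage keys and signatures
# (this replaces the older "auto-dnssec maintain", which only used keys you had
# already generated and has been removed from recent BIND releases)
# named.conf:
zone "infra.local" IN {
    type master;
    file "/var/named/infra.local.zone";
    inline-signing yes;
    dnssec-policy default;               # BIND creates a key and keeps the zone signed
    key-directory "/var/named/keys/";
};

# The key directory must exist and be writable by the named user
# (the user is "bind" on Debian/Ubuntu)
mkdir -p /var/named/keys
chown named:named /var/named/keys
rndc reconfig

# Show the keys BIND created and their state
rndc dnssec -status infra.local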

Key rollover

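# With dnssec-policy, BIND rolls keys on the schedule set by the key lifetimes
# in the policy. The built-in "default" policy uses a single key with unlimited
# lifetime, so nothing rolls until you define a policy with finite lifetimes
# or start a rollover by hand.

# Find the key ID, then start a manual rollover (KEYID is a placeholder)
rndc dnssec -status infra.local
rndc dnssec -rollover -key KEYID infra.local

# A KSK or CSK rollover also needs the new DS record published at the parent.
# For an internal zone with no signed parent, update the trust anchor on your
# validating resolvers instead. Then tell BIND the DS is in place:
rndc dnssec -checkds -key KEYID published infra.local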

DNSSEC prevents DNS spoofing: an attacker cannot return a fake IP for your domain if the zone is signed and the resolver validates signatures. Enabling validation in Unbound is one config line. Signing your own internal zones is more work but worth considering for any domain serving production traffic, especially where DNS traffic might traverse untrusted networks (WireGuard tunnels, cloud inter-region links). An internal zone has no signed parent to publish its DS record, so validating resolvers only check it if you configure the zone's key as a trust anchor on them. The weak link is key management: if you lose the private key, you cannot re-sign, and validation will fail until you regenerate and re-publish. Store private keys in a secrets manager (Vault, SOPS, age-encrypted ZFS dataset) and back them up. BIND9's automated key management (dnssec-policy in 9.16 and later) removes most of the operational complexity: it handles key generation, scheduled rollover, and signature refresh. Enable it, then monitor it like any other service.

9. DNS for WireGuard Networks

WireGuard gives you encrypted IP connectivity between nodes. It does not give you name resolution. After building a WireGuard mesh, every operator immediately hits the same problem: "I have to remember IP addresses." DNS fixes this. The solution is straightforward: run a DNS server on one node in the mesh, configure all peers to use it, and give every node a hostname.

The problem

# After WireGuard setup, you have IPs like:
# node1: 10.200.0.1
# node2: 10.200.0.2
# db:    10.200.0.10

# You SSH to: ssh todd@10.200.0.10   (who is this?)
# You want:   ssh todd@db.wg         (obvious)

Solution 1: dnsmasq on the WireGuard gateway

# Install dnsmasq on the WG hub node
dnf install -y dnsmasq   # CentOS/Rocky/RHEL
apt install -y dnsmasq   # Debian/Ubuntu

# /etc/dnsmasq.conf
interface=wg0                           # listen on WG interface
bind-interfaces
no-dhcp-interface=wg0                   # no DHCP, DNS only
domain=wg                               # short-name domain: "node1.wg"
local=/wg/                              # serve wg zone locally

# Static assignments
address=/node1.wg/10.200.0.1
address=/node2.wg/10.200.0.2
address=/node3.wg/10.200.0.3
address=/db.wg/10.200.0.10
address=/monitor.wg/10.200.0.20

# Reverse DNS
ptr-record=1.0.200.10.in-addr.arpa,node1.wg
ptr-record=10.0.200.10.in-addr.arpa,db.wg

systemctl enable --now dnsmasq
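For more than a handful of peers, the address= and ptr-record= lines are easier to generate from an inventory than to maintain by hand. A minimal sketch, assuming the inventory is a plain dictionary of short name to tunnel address; the function name dnsmasq_lines is hypothetical.

```python
import ipaddress


def dnsmasq_lines(hosts, domain="wg"):
    """Turn {short_name: ip} into matching address= and ptr-record= lines."""
    lines = []
    for name, ip in sorted(hosts.items()):
        fqdn = f"{name}.{domain}"
        reverse = ipaddress.ip_address(ip).reverse_pointer
        lines.append(f"address=/{fqdn}/{ip}")
        lines.append(f"ptr-record={reverse},{fqdn}")
    return lines


inventory = {"node1": "10.200.0.1", "db": "10.200.0.10"}
print("\n".join(dnsmasq_lines(inventory)))
```

Generating both lines from the same entry keeps forward and reverse records from drifting apart, which is the usual failure when they are edited separately.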

Configure all WireGuard peers to use the DNS server

# In each peer's /etc/wireguard/wg0.conf:
[Interface]
Address = 10.200.0.2/24
PrivateKey = ...
DNS = 10.200.0.1           # hub node runs dnsmasq
                           # this sets DNS for the wg0 interface when it comes up

# The DNS line in wg-quick configs sets resolv.conf when the tunnel comes up.
# It is removed when the tunnel goes down.

Solution 2: Unbound with WireGuard-specific zones

# On your existing Unbound resolver, add a local-zone for the WG subnet.
# These options belong in the server: clause of /etc/unbound/unbound.conf

server:
  local-zone: "wg." static
  local-data: "node1.wg. A 10.200.0.1"
  local-data: "node2.wg. A 10.200.0.2"
  local-data: "node3.wg. A 10.200.0.3"
  local-data: "db.wg.    A 10.200.0.10"

  # Reverse DNS
  local-zone: "0.200.10.in-addr.arpa." static
  local-data-ptr: "10.200.0.1 node1.wg"
  local-data-ptr: "10.200.0.10 db.wg"

systemctl restart unbound

Dynamic registration on WireGuard interface up

#!/bin/bash
# /etc/wireguard/wg0-up.sh: called by PostUp in wg0.conf (must be executable)
# Registers this node in BIND when the WG interface comes up

WG_IP=$(ip -4 addr show wg0 | awk '/inet /{print $2; exit}' | cut -d/ -f1)
HOSTNAME=$(hostname -s)
DNS_SERVER="10.100.10.1"

nsupdate << EOF
server ${DNS_SERVER}
zone infra.local
update delete ${HOSTNAME}-wg.infra.local A
update add ${HOSTNAME}-wg.infra.local 300 A ${WG_IP}
send
EOF

# In /etc/wireguard/wg0.conf:
[Interface]
PostUp = /etc/wireguard/wg0-up.sh
# PreDown must name the DNS server explicitly; "nsupdate -l" only talks to a
# named instance running on the local machine
PreDown = printf 'server 10.100.10.1\nzone infra.local\nupdate delete %s-wg.infra.local A\nsend\n' "$(hostname -s)" | nsupdate

The most common complaint after setting up a WireGuard mesh: "I have to remember IP addresses." DNS fixes this. Run a lightweight dnsmasq or Unbound on one node, configure all WireGuard peers to use it as DNS, and suddenly you can SSH to db.wg instead of 10.200.0.10. For a static fleet (nodes do not change often), dnsmasq with static address lines is the simplest possible solution: one or two config lines per node. For a dynamic fleet where nodes join and leave regularly, dynamic DNS updates via nsupdate on WireGuard PostUp are more appropriate. The WireGuard DNS= line in wg-quick configs is underused: it sets the resolver for the tunnel interface automatically when you bring the tunnel up. Use it.

10. Pi-hole and Blocklist DNS

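DNS-level ad and tracker blocking works by answering queries for known ad/tracker domains with 0.0.0.0 or NXDOMAIN instead of their real IPs. The client tries to connect, gets nothing, and the ad never loads. This covers every device and application that uses your resolver (phones, TVs, IoT devices) with nothing installed on the client. Clients that hard-code their own resolver or use built-in DNS-over-HTTPS bypass it unless you also block or redirect that traffic at the firewall.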

Install Pi-hole

# Pi-hole is a DNS sinkhole — it runs a modified dnsmasq with blocklists
# Install on a dedicated node or alongside other services

# Quick install (requires curl)
curl -sSL https://install.pi-hole.net | bash

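# Or use the containerized version:
# Publishing port 53 needs root (or a lowered net.ipv4.ip_unprivileged_port_start).
# WEBPASSWORD is read by Pi-hole v5 images; v6 images use
# FTLCONF_webserver_api_password instead, so check which version "latest" pulls.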
podman run -d --name pihole \
  -p 53:53/udp -p 53:53/tcp \
  -p 8080:80 \
  -e TZ="America/Toronto" \
  -e WEBPASSWORD="changeme" \
  -v pihole_data:/etc/pihole \
  -v dnsmasq_data:/etc/dnsmasq.d \
  --restart=unless-stopped \
  pihole/pihole:latest

Point all kldload nodes at Pi-hole

# Via NetworkManager (persistent across reboots)
nmcli con mod "Wired connection 1" ipv4.dns "10.100.10.5"
nmcli con mod "Wired connection 1" ipv4.ignore-auto-dns yes
nmcli con up "Wired connection 1"

# Verify
cat /etc/resolv.conf
# nameserver 10.100.10.5

# Or set fleet-wide via DHCP server (dnsmasq/ISC-DHCP)
# dhcp-option=6,10.100.10.5   # option 6 = DNS server

Unbound with blocklists (no Pi-hole required)

# Download a blocklist in Unbound format
# (rpz-zone or local-zone: entries)
curl -o /etc/unbound/blocklist.conf \
  https://raw.githubusercontent.com/nicehash/unbound-blocklist/main/blocklist.conf

# /etc/unbound/unbound.conf:
include: "/etc/unbound/blocklist.conf"

# Blocklist entries look like:
# local-zone: "ads.example.com" always_nxdomain
# local-zone: "tracker.example.net" always_nxdomain

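# Auto-update the blocklist weekly:
cat > /etc/cron.weekly/update-unbound-blocklist << 'EOF'
#!/bin/bash
set -euo pipefail
new=/etc/unbound/blocklist.conf.new
# -f makes curl fail on HTTP errors, so an error page never replaces the list
curl -fsS -o "$new" \
  https://raw.githubusercontent.com/nicehash/unbound-blocklist/main/blocklist.conf
mv "$new" /etc/unbound/blocklist.conf
# Reload only if the full config still parses
unbound-checkconf && systemctl reload unbound
EOF
chmod +x /etc/cron.weekly/update-unbound-blocklist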

DNS-level blocking needs nothing installed on the client. One Unbound instance with a blocklist protects the entire network, including IoT devices that cannot run ad blockers, smart TVs that phone home, and printers that call back to manufacturer servers. The exceptions are applications that connect to hard-coded IPs or ship their own encrypted DNS, because they never ask your resolver. Pi-hole adds a web UI for managing blocklists and seeing query statistics. For a kldload fleet, Unbound with a local-zone blocklist is more integrated: you already have Unbound running, so you only add the blocklist. Pi-hole is better if you want the management UI or want to allow or block individual domains per device.
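
Many public blocklists are published in hosts-file format ("0.0.0.0 domain") rather than as Unbound config. The sketch below converts such a list into the local-zone ... always_nxdomain lines shown earlier. The function name hosts_to_unbound and the assumption that the input is hosts-format are illustrative; adjust the filter to whatever list you actually use.

```shell
#!/usr/bin/env bash
# Sketch: convert a hosts-format blocklist on stdin into Unbound local-zone lines.
# hosts_to_unbound is a hypothetical helper name.

hosts_to_unbound() {
  awk '
    # Accept "0.0.0.0 domain" or "127.0.0.1 domain" lines, ignore everything else
    ($1 == "0.0.0.0" || $1 == "127.0.0.1") && NF >= 2 {
      d = tolower($2)
      sub(/\.$/, "", d)                       # drop a trailing dot
      if (d == "0.0.0.0" || d == "localhost.localdomain") next
      # Require a plausible hostname with at least one dot (skips "localhost")
      if (d !~ /^[a-z0-9_-]+(\.[a-z0-9_-]+)+$/) next
      # Emit each domain once
      if (!seen[d]++) printf "local-zone: \"%s\" always_nxdomain\n", d
    }
  '
}

# Example with placeholder domains
hosts_to_unbound <<'EOF'
# sample list
127.0.0.1 localhost
0.0.0.0 ads.example.com
0.0.0.0 ADS.example.com
0.0.0.0 tracker.example.net # inline comment
EOF
```

Deduplicating and lowercasing matters because Unbound rejects a config that defines the same local-zone twice.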

11. DNS Debugging

dig is the essential DNS debugging tool. Learn it. The alternatives (nslookup, host) print a simplified view that hides information you need. dig shows you the full DNS response: header flags, TTLs, and every section exactly as returned by the server.

Essential dig commands

# Basic lookup
dig example.com

# Short answer only
dig example.com +short

# Query a specific server
dig @10.100.10.1 node1.infra.local

# Query for a specific record type
dig example.com MX
dig example.com TXT
dig example.com AAAA
dig example.com NS
dig example.com SOA

# Reverse DNS lookup
dig -x 10.100.10.10

# Show query time and server used
dig example.com +stats

# Disable recursion (ask the server what it knows directly — useful for auth servers)
dig @ns1.example.com example.com +norec

dig +trace — the most powerful debugging command

# +trace follows the entire resolution chain from root servers down
# If any step fails, you see exactly where
dig api.example.com +trace

# Example output:
# .                          518359  IN  NS  a.root-servers.net.
# a.root-servers.net.        1234    IN  A   198.41.0.4
#
# com.                       172800  IN  NS  a.gtld-servers.net.
# [Received 1174 bytes from 198.41.0.4 in 12 ms]
#
# example.com.               172800  IN  NS  ns1.example.com.
# [Received 512 bytes from 192.5.6.30 in 8 ms]
#
# api.example.com.           300     IN  A   203.0.113.50
# [Received 68 bytes from 205.251.196.1 in 2 ms]
#
# Three hops: root → .com TLD → authoritative. Total: 22ms.
# If any step returned SERVFAIL or no response, you see exactly which hop failed.
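
The per-hop timings can be pulled out of the trace mechanically. The sketch below reads dig +trace output on stdin and sums the "Received ... in N ms" lines. trace_latency is a hypothetical helper name. Real dig output also includes a first "Received" line for the root NS list fetched from your local resolver, so a live trace usually shows one more hop than the abbreviated example above.

```shell
#!/usr/bin/env bash
# Sketch: summarise per-hop latency from `dig +trace` output read on stdin.
# trace_latency is a hypothetical helper name.

trace_latency() {
  awk '
    /Received [0-9]+ bytes from/ {
      for (i = 1; i < NF; i++) {
        if ($i == "from") server = $(i + 1)
        if ($i == "in")   ms = $(i + 1)
      }
      sub(/#.*/, "", server)   # strip the "#53(name)" suffix real dig appends
      hop++
      total += ms
      printf "hop %d %s %d ms\n", hop, server, ms
    }
    END { printf "total %d ms over %d hops\n", total, hop }
  '
}

# Example using the timing lines from the trace above
trace_latency <<'EOF'
[Received 1174 bytes from 198.41.0.4 in 12 ms]
[Received 512 bytes from 192.5.6.30 in 8 ms]
[Received 68 bytes from 205.251.196.1 in 2 ms]
EOF
```

Against a live lookup, run: dig api.example.com +trace | trace_latency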

DNSSEC debugging

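# Check DNSSEC validation (query a validating resolver, e.g. your Unbound)
dig cloudflare.com +dnssec
# Look for the "ad" flag (Authenticated Data) in the header flags

# Use drill for DNSSEC chain verification (install ldns-utils)
# -S chases the signatures up to the trust anchor given with -k
drill -S -k /var/lib/unbound/root.key cloudflare.com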

# Test a known-bad DNSSEC domain
dig @127.0.0.1 dnssec-failed.org
# Should return SERVFAIL because signatures are intentionally broken

tcpdump for DNS

# Capture all DNS traffic (port 53)
tcpdump -n port 53

# More readable — decode DNS queries and responses
tcpdump -n -i eth0 'udp port 53' -v

# Capture DNS to a file for analysis
tcpdump -n -i any port 53 -w /tmp/dns.pcap

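# Count DNS packets per second over a 30-second sample (useful during debug)
# -l line-buffers tcpdump output; timeout ends the capture so sort can finish
timeout 30 tcpdump -l -n -i any port 53 2>/dev/null | \
  awk '{print $1}' | cut -d. -f1 | uniq -c | sort -rn | head -20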

Common DNS error codes

Result	Meaning	Likely cause
NOERROR	Query succeeded	Normal. Check the ANSWER section for the actual records.
NXDOMAIN	Name does not exist	Typo in the name, missing DNS record, wrong zone, split-horizon not configured.
SERVFAIL	Server failed to resolve	Upstream resolver unreachable, DNSSEC validation failure, broken delegation, expired zone.
REFUSED	Server refused the query	ACL blocked the source IP. Check `access-control` in Unbound or `allow-query` in BIND.
NODATA	Name exists but no records of that type	Querying for AAAA on a v4-only host, or MX for a domain with no mail config. Not an RCODE of its own: dig shows status NOERROR with an empty ANSWER section.
Timeout	No response	Firewall blocking UDP/53, DNS server down, wrong IP, network unreachable. No RCODE is returned at all.
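
The table can be applied mechanically to dig output. The sketch below reads a full (not +short) dig response on stdin and prints one of the results above. dig_status is a hypothetical helper name. It treats NOERROR with zero answers as NODATA, which is a simplification: a referral from an authoritative server queried with +norec looks the same.

```shell
#!/usr/bin/env bash
# Sketch: classify a dig response read on stdin.
# dig_status is a hypothetical helper name.

dig_status() {
  awk '
    /->>HEADER<<-/ {
      # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1234"
      for (i = 1; i < NF; i++)
        if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
    }
    /^;; flags:/ {
      # e.g. ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1"
      for (i = 1; i < NF; i++)
        if ($i == "ANSWER:") answers = $(i + 1) + 0
    }
    END {
      if (status == "") print "TIMEOUT"        # no header: nothing answered
      else if (status == "NOERROR" && answers == 0) print "NODATA"
      else print status
    }
  '
}

# Example: a canned header for an AAAA query against a v4-only name
dig_status <<'EOF'
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
EOF
```

Against a live server: dig @10.100.10.1 node1.infra.local AAAA | dig_status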

dig +trace is the most powerful DNS debugging command. It shows the full resolution chain from root servers to authoritative, including every referral and every response time. When someone reports "my DNS does not work", run dig +trace api.example.com and read the output hop by hop. If the root hop succeeds but the TLD hop fails, the TLD servers are unreachable from your network. If TLD succeeds but the authoritative hop fails, the NS records are wrong or your authoritative server is down. If the authoritative hop succeeds but returns the wrong IP, you have a split-horizon misconfiguration or a stale record. Because +trace performs the iteration itself, it bypasses your resolver's cache and forwarders; if the trace succeeds but ordinary lookups still fail, the fault is in the resolver, so query it directly with dig @resolver. The debug loop is: dig +trace, find the failing hop, fix that hop, repeat. It takes two minutes once you know the tool.

12. Production DNS Architecture for a kldload Fleet

Putting it all together. A kldload fleet needs: fast recursive resolution with caching, authoritative service for internal zones, split-horizon for external domains, DNSSEC validation, and fallback for resolver failure. Here is the complete architecture and the concrete configs to build it.

Architecture overview

┌─────────────────────────────────────────┐
│            kldload Fleet DNS            │
└─────────────────────────────────────────┘

All nodes                Unbound (node1 + node2, HA pair)
/etc/resolv.conf         port 53
  → 10.100.10.10         ├── cache: all queries
  → 10.100.10.11         ├── forward *.infra.local → BIND (10.100.10.1)
                         ├── forward *.consul → Consul (127.0.0.1:8600)
                         ├── validate DNSSEC
                         └── forward . → 1.1.1.1@853 (DoT) [cache misses]

BIND9 (node1, authoritative for internal)
├── infra.local (A, AAAA, PTR, SRV, MX)
├── wg.infra.local (WireGuard addresses)
├── view "internal" → private IPs
└── view "external" → public IPs (for split-horizon)

External traffic → public authoritative (Cloudflare DNS, Route53, etc.)
                 → DNSSEC-signed external zone

Unbound configuration for the fleet resolver

# /etc/unbound/unbound.conf on node1 and node2 (identical config)

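server:
  # 0.0.0.0 claims port 53 on every address. On a host that also runs BIND9,
  # list specific interface: addresses instead so the two do not collide.
  interface: 0.0.0.0
  port: 53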
  access-control: 10.0.0.0/8 allow
  access-control: 172.16.0.0/12 allow
  access-control: 192.168.0.0/16 allow
  access-control: 127.0.0.0/8 allow
  access-control: 0.0.0.0/0 refuse

  # Cache
  cache-min-ttl: 60
  cache-max-ttl: 86400
  msg-cache-size: 128m
  rrset-cache-size: 256m
  prefetch: yes
  prefetch-key: yes

  # Privacy and security
  hide-identity: yes
  hide-version: yes
  use-caps-for-id: yes
  harden-glue: yes
  harden-dnssec-stripped: yes

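  # DNSSEC validation
  auto-trust-anchor-file: "/var/lib/unbound/root.key"
  val-log-level: 2

  # Internal zones are unsigned, so exempt them from validation
  domain-insecure: "infra.local"
  domain-insecure: "consul"
  domain-insecure: "10.in-addr.arpa"

  # Unbound answers private reverse zones itself by default; turn that off
  # so the reverse stub-zones below are actually consulted
  local-zone: "10.in-addr.arpa." nodefault

  # Consul listens on loopback, which Unbound refuses to query by default
  do-not-query-localhost: no

  # Local zone overrides (blocklist entries go here if not using pihole)
  # local-zone: "ads.example.com" always_nxdomain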
# Forward internal domains to BIND9
stub-zone:
  name: "infra.local"
  stub-addr: 10.100.10.1@53

stub-zone:
  name: "10.100.10.in-addr.arpa"
  stub-addr: 10.100.10.1@53

stub-zone:
  name: "0.200.10.in-addr.arpa"
  stub-addr: 10.100.10.1@53

# Forward Consul service discovery queries
stub-zone:
  name: "consul"
  stub-addr: 127.0.0.1@8600

# Forward everything else to upstream via DoT
# (verifying the #cloudflare-dns.com name needs tls-cert-bundle set in server:)
forward-zone:
  name: "."
  forward-addr: 1.1.1.1@853#cloudflare-dns.com
  forward-addr: 1.0.0.1@853#cloudflare-dns.com
  forward-tls-upstream: yes

BIND9 named.conf for internal zones

# /etc/named.conf on the authoritative node

acl "internal" {
  10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.0/8;
};

options {
  directory "/var/named";
  listen-on { 10.100.10.1; 127.0.0.1; };
  recursion no;
  allow-query { internal; };
  allow-transfer { none; };
  dnssec-validation auto;
};

# Once any view is defined, BIND requires every zone to live inside a view,
# so the internal-only zones go in the "internal" view.
view "internal" {
  match-clients { internal; };

  zone "infra.local" IN {
    type master;
    file "/var/named/infra.local.zone";
    allow-update { 127.0.0.1; 10.100.10.0/24; };
  };

  zone "10.100.10.in-addr.arpa" IN {
    type master;
    file "/var/named/10.100.10.rev";
    allow-update { 127.0.0.1; 10.100.10.0/24; };
  };

  zone "0.200.10.in-addr.arpa" IN {
    type master;
    file "/var/named/10.200.0.rev";    # WireGuard reverse zone
  };

  # Split-horizon: internal answers for the external domain
  zone "example.com" IN {
    type master;
    file "/var/named/example.com.internal.zone";
  };
};

view "external" {
  match-clients { any; };
  recursion no;
  allow-query { any; };    # overrides the internal-only default in options
  zone "example.com" IN {
    type master;
    file "/var/named/example.com.external.zone";
  };
};

Point all nodes at the resolvers via NetworkManager

# Run this in a kldload postinstaller or firstboot script on every node

# Primary resolver: node1, secondary: node2 (failover)
# (use "10.100.10.100" instead if you deploy the keepalived VIP)
# Look the connection name up once so both commands act on the same one
CON="$(nmcli -t -f NAME con show --active | head -1)"
nmcli con mod "$CON" \
  ipv4.dns "10.100.10.10 10.100.10.11" \
  ipv4.ignore-auto-dns yes

nmcli con up "$CON"

# Verify
resolvectl status
# DNS Servers: 10.100.10.10
#              10.100.10.11

High availability: keepalived for the resolver VIP

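# Run keepalived on node1 and node2 to provide a single VIP for DNS
# All nodes point at 10.100.10.100 (the VIP)

# /etc/keepalived/keepalived.conf on node1 (MASTER):

# Define the check before the instance that tracks it
vrrp_script chk_unbound {
  # +time/+tries keep a failing check shorter than the 5 s interval
  script "dig @127.0.0.1 cloudflare.com +short +time=2 +tries=1 > /dev/null 2>&1"
  interval 5
  weight -100
}

vrrp_instance DNS_VIP {
  state MASTER
  interface eth0
  virtual_router_id 53
  priority 200
  advert_int 1
  virtual_ipaddress {
    10.100.10.100/24
  }
  track_script {
    chk_unbound
  }
}

# node2 uses state BACKUP and a priority between 100 and 200 (for example 150),
# so it outranks node1 once a failed check drops node1 to 200 - 100 = 100.
# If node1's Unbound fails, the VIP moves to node2 automatically.
# All nodes still resolve DNS; they just hit node2 instead.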
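
One limitation of a bare dig check is that dig generally exits 0 whenever any reply arrives, including SERVFAIL, so a resolver that is running but cannot resolve still looks healthy. The sketch below is a stricter check that passes only when the response header reports NOERROR. The function name unbound_healthy and the DIG override are assumptions for illustration; to use it with keepalived you would save it as a script that calls the function and point the vrrp_script at that file.

```shell
#!/usr/bin/env bash
# Sketch: health check that requires a NOERROR answer from the local resolver.
# unbound_healthy and the DIG override are hypothetical names.

unbound_healthy() {
  local name="${1:-cloudflare.com}" dig_bin="${DIG:-dig}"
  # No +short here: the header line carrying "status:" must be printed
  "$dig_bin" @127.0.0.1 "$name" +time=2 +tries=1 2>/dev/null \
    | grep -q 'status: NOERROR'
}

# As the last line of a keepalived check script:
#   unbound_healthy || exit 1
```

Probing an external name ties failover to upstream reachability as well as to Unbound itself. Probing an internal name such as node1.infra.local instead limits the check to the local resolver and BIND.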

The complete picture: every kldload node sends DNS queries to the VIP 10.100.10.100. The VIP is owned by node1 (or node2 on failover). Unbound on node1 checks its cache first — most queries return from cache in under 1ms. Cache misses for internal names (infra.local) go to BIND9 on the same node. Cache misses for external names go to Cloudflare over DNS-over-TLS — encrypted, private, cached for future queries. DNSSEC validation runs on every external response. Split-horizon means internal calls to api.example.com get the private LAN IP, not the public IP. Kubernetes pods forward through CoreDNS to this same Unbound, so cluster DNS and node DNS share the same cache and the same private resolver. WireGuard peers get DNS names via stub zones in Unbound that delegate to BIND. The whole fleet — bare-metal nodes, VMs, Kubernetes pods, WireGuard peers — resolves names through one consistent, cached, validated DNS infrastructure.
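
A quick way to confirm the whole chain works is to ask every resolver address for one name of each kind (internal, external, split-horizon) and flag any that return nothing. The sketch below does that. check_fleet_dns, the DIG override, and the argument layout are assumptions for illustration; the addresses and names in the usage line come from the configs above.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test a set of resolvers against a set of names.
# check_fleet_dns and the DIG override are hypothetical names.

DIG="${DIG:-dig}"

check_fleet_dns() {
  local resolvers="$1" names="$2"
  local r n answer failures=0
  for r in $resolvers; do
    for n in $names; do
      # One short attempt so a dead resolver fails fast; drop ";;" status lines
      answer="$("$DIG" "@$r" "$n" +short +time=2 +tries=1 2>/dev/null | grep -v '^;')"
      if [[ -n "$answer" ]]; then
        printf 'ok    %s  %s -> %s\n' "$r" "$n" "${answer%%$'\n'*}"
      else
        printf 'FAIL  %s  %s\n' "$r" "$n"
        failures=$((failures + 1))
      fi
    done
  done
  # Exit status 1 if anything failed, 0 otherwise
  return "$(( failures > 0 ))"
}

# Usage:
#   check_fleet_dns "10.100.10.100 10.100.10.10 10.100.10.11" \
#                   "node1.infra.local cloudflare.com api.example.com"
```

Checking the VIP and both real addresses separately shows whether a failure is in one Unbound instance or in the shared upstream.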

Networking tutorial — VLANs, bonding, BGP, and the network stack that DNS sits on
WireGuard Masterclass — the mesh that needs DNS names
WireGuard Mesh & Multi-Site — multi-site WireGuard with per-site DNS
Kubernetes on KVM — the cluster where CoreDNS runs
Cilium Masterclass — L7 DNS policy enforcement at the eBPF layer
Monitoring Stack — Unbound metrics in Prometheus and Grafana
Security — DNS security: DNSSEC, response policy zones, DNS-over-TLS
