nftables Masterclass

This guide covers nftables from first principles to production deployment — the unified packet classification framework that replaced iptables, ip6tables, arptables, and ebtables across every modern Linux distro. If you are running kldload for KVM hosting, WireGuard plane isolation, container networking, or anything that touches the network stack, this is the firewall layer underneath all of it.

What this page covers: nftables architecture and terminology, writing rules from scratch, sets and maps for scalable policy, NAT (masquerade, DNAT, SNAT), per-interface trust boundaries, connection tracking internals, rate limiting and brute-force protection, logging and packet tracing, atomic persistence, Salt and Ansible automation, a complete kldload KVM host ruleset, and a troubleshooting reference.

Prerequisites: basic Linux networking familiarity (interfaces, IP addresses, ports). No prior iptables knowledge required — and if you have it, you will find nftables is a cleaner model.

1. nftables replaced iptables

nftables is the Linux kernel's packet classification framework. It replaced iptables, ip6tables, arptables, and ebtables with one unified tool. Every modern distro ships it. CentOS Stream 9, Debian 13, Ubuntu 24.04, Fedora 41, Rocky Linux 9 — all of them use nftables as the kernel-side firewall layer. If you're still writing iptables rules, you're writing for a deprecated API.

nftables is not a reimplementation of iptables with a different syntax. It is a fundamentally different architecture: user-defined tables and chains, a single expression language for both matching and actions, native sets and maps built into the kernel, and atomic ruleset replacement as a first-class operation. The old model was a fixed set of tables (filter, nat, mangle, raw) with fixed hook points and separate tools for IPv4, IPv6, ARP, and bridge. nftables replaces all of that with one address-family-aware framework.

The legacy iptables kernel interface (x_tables) still exists, but on most modern distros the iptables command is actually iptables-nft: a frontend that keeps the iptables syntax and converts your commands into nftables operations. Writing iptables rules means writing for a translation layer on top of the real thing. Write nftables directly. You get cleaner syntax, better performance, and access to features such as native sets, maps, and meters, which iptables can only approximate with add-ons like ipset and hashlimit.

2. nftables vs iptables vs firewalld

Understanding the relationship between these three is essential before you write a single rule.

iptables

Fixed table and chain layout (filter/nat/mangle/raw). Separate tools for IPv4, IPv6, ARP, bridge. Rules are match-and-target: each rule tests a condition and, on match, takes one action. Chains are linear lists — every packet walks the list until a rule matches. No native data structures. Adding a rule is a syscall that replaces the entire ruleset.

// Architecture: fixed schema, sequential scan
// Scale: O(n) per packet, O(n) per rule change

nftables

User-defined tables and chains. One tool for all address families. Rules are expressions: multiple match conditions and multiple actions in a single rule. Built-in sets (hash tables, bitmaps, and red-black trees for intervals) give O(1) lookups for plain address sets and O(log n) lookups for interval sets. Atomic ruleset replacement: nft -f loads the entire file as one transaction. Native rate meters, verdict maps, and connection tracking integration.

// Architecture: user-defined schema, kernel data structures
// Scale: O(1) set lookups, atomic batch updates

firewalld

A zone-based frontend that generates nftables (or iptables-nft) rules underneath. Zones map interfaces and source addresses to trust levels. Works well for simple allow/deny rules on desktop and server installs. Not suited for complex per-interface policy, dynamic sets, rate limiting, or custom NAT configurations.

// firewalld is a layer on top of nftables
// It writes nftables rules so you don't have to
// Until you need something it can't express

Feature	iptables	nftables	firewalld
IPv4 + IPv6	Separate tools	Unified (`inet` family)	Unified (via nftables)
Large IP lists	O(n) per packet	O(1) hash set	Via ipset (awkward)
Atomic load	No (sequential)	Yes (`nft -f`)	Yes (via nftables)
Rate limiting	limit / hashlimit modules	Native limits and meters	Rich rules only (basic)
Per-interface policy	-i / -o match	iifname / oifname	Zones (coarse)
Status	Deprecated API	Current standard	Active (uses nftables)

firewalld is fine for simple port allow/deny on a workstation or basic server. But the moment you need per-interface policies, rate limiting, connection tracking manipulation, or sets with thousands of entries, you need raw nftables. kldload's KVM profile, WireGuard plane isolation, and Kubernetes node firewalling all use nftables directly. firewalld and nftables can coexist on the same system — firewalld manages its own tables, and your hand-written tables sit alongside them — but mixing the two gets confusing fast. Pick one and own the ruleset.

3. nftables fundamentals

nftables has four core concepts: families, tables, chains, and rules. Get these right and everything else follows.

Address families

Family	Handles	Use when
`inet`	IPv4 + IPv6	Almost always — one chain for both protocols
`ip`	IPv4 only	IPv4-specific rules that must not apply to IPv6
`ip6`	IPv6 only	IPv6-specific rules that must not apply to IPv4
`arp`	ARP frames	ARP spoofing protection
`bridge`	Ethernet bridge traffic	Filtering between bridged VMs
`netdev`	Per-device ingress	Early drop before routing (DDoS mitigation)

The inet family handles both IPv4 and IPv6 in a single chain. Always use inet unless you have a specific reason not to. Writing separate ip and ip6 tables means maintaining two copies of every rule and keeping them in sync forever. The only reason to use separate families is if a rule genuinely needs different behavior for IPv4 vs IPv6, which is rare in practice.

Tables, chains, and rules

A table is a namespace. It has a name, a family, and contains chains. Tables are completely independent — you can have multiple tables in the same family, and they all process packets. There is no implicit table like iptables' filter. You create your own.

A chain is an ordered list of rules attached to a netfilter hook. Chains can be base chains (attached to a kernel hook — they see packets) or regular chains (called explicitly via jump or goto from other rules). Base chains need three properties:

type — filter (allow/drop), nat (address translation), route (mark for policy routing)
hook — where in the packet path the chain runs
priority — order relative to other chains at the same hook (lower = earlier)

The five hooks for inet/ip/ip6:

Hook	When it runs	Typical use
`prerouting`	Before routing decision	DNAT, raw conntrack bypass
`input`	Packets destined for this host	Host firewall (allow/deny inbound)
`forward`	Packets being routed through this host	Router/firewall for VM traffic
`output`	Packets generated by this host	Outbound filtering (rarely needed)
`postrouting`	After routing decision, before egress	SNAT, masquerade

The chain policy is the default verdict when no rule matches: accept (permissive) or drop (deny-by-default). For a firewall, set input chain policy to drop and add rules to allow specific traffic. For a router's forward chain, also set policy to drop and allow specific flows.

Minimal ruleset with line-by-line explanation

# /etc/nftables.conf — minimal host firewall

table inet filter {                      # table named "filter", inet family (v4+v6)

  chain input {
    type filter hook input priority 0; policy drop;
    # hook: input (packets destined for this host)
    # priority 0: standard filter priority
    # policy drop: deny everything unless a rule allows it

    iifname "lo" accept                  # always accept loopback
    ct state invalid drop                # drop malformed/unknown connections
    ct state established,related accept  # allow replies to our outbound connections
    ip protocol icmp accept              # allow ICMP (ping, traceroute, path MTU)
    ip6 nexthdr icmpv6 accept            # allow ICMPv6 (NDP, router discovery)
    tcp dport 22 accept                  # allow SSH
    # policy drop handles everything else
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
    # drop all forwarded traffic by default (not a router)
  }

  chain output {
    type filter hook output priority 0; policy accept;
    # allow all outbound — tighten if needed
  }

}

Load it: nft -f /etc/nftables.conf. Verify: nft list ruleset.

4. Writing rules

nftables rules follow a consistent pattern: match expressions then statement (action). Multiple match expressions in the same rule are implicitly ANDed — all must match for the statement to execute.

Selectors

Selector	Matches	Example
`iifname`	Incoming interface name	`iifname "eth0"`
`oifname`	Outgoing interface name	`oifname "wg0"`
`ip saddr`	Source IP address	`ip saddr 10.0.0.0/8`
`ip daddr`	Destination IP address	`ip daddr 192.168.1.1`
`tcp dport`	TCP destination port	`tcp dport { 80, 443 }`
`udp dport`	UDP destination port	`udp dport 51820`
`ct state`	Connection tracking state	`ct state established,related`
`meta l4proto`	Layer 4 protocol	`meta l4proto tcp`

Actions

Action	Effect
`accept`	Allow the packet, stop rule evaluation
`drop`	Silently discard the packet
`reject`	Discard and send ICMP unreachable (tells the sender immediately)
`log prefix "tag: "`	Log to dmesg/syslog, continue evaluation
`counter`	Increment packet/byte counter, continue evaluation
`queue`	Send to userspace via NFQUEUE (for IDS/IPS)
`jump <chain>`	Evaluate another chain; return here on `return`
`goto <chain>`	Evaluate another chain; do not return
`return`	Return to calling chain (from a jumped chain)

Common rule examples

# Allow SSH
tcp dport 22 accept

# Allow HTTP and HTTPS (inline anonymous set)
tcp dport { 80, 443 } accept

# Allow WireGuard (UDP 51820)
udp dport 51820 accept

# Allow ICMP ping (IPv4)
ip protocol icmp icmp type echo-request accept

# Allow ICMPv6 (IPv6 — required for NDP, router solicitation)
ip6 nexthdr icmpv6 accept

# Allow established/related (reply traffic) — put this near the top
ct state established,related accept

# Drop invalid connection state
ct state invalid drop

# Allow SSH only from a specific subnet
ip saddr 10.10.0.0/16 tcp dport 22 accept

# Allow SSH from specific interface only
iifname "wg1" tcp dport 22 accept

# Log and drop everything else
log prefix "nftables drop: " flags all drop

The most common mistake when writing a new ruleset: forgetting ct state established,related accept. Without it, you can initiate outbound connections but the reply packets get dropped by your input chain policy. Your SSH session to a remote host starts, sends the SYN, gets a SYN-ACK back — and drops it. This one rule should be at or near the top of every input chain that has a drop policy. Put it before the specific allow rules so established traffic fast-paths through without walking the rest of the chain.
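
The effect of the missing rule can be sketched with a toy model of chain evaluation. This is a simplified illustration in Python, not how the kernel is implemented: the `evaluate` function, the packet dictionary, and its field names are all assumptions made for the example.

```python
# Toy model of an input chain with "policy drop". Not kernel code.

def evaluate(chain, packet, policy="drop"):
    """Return the verdict of the first rule whose predicate matches, else the policy."""
    for predicate, verdict in chain:
        if predicate(packet):
            return verdict
    return policy

# Hypothetical packet: the SYN-ACK answering an outbound SSH connection.
# Its destination port is our ephemeral source port, not 22.
reply = {"iifname": "eth0", "proto": "tcp", "dport": 49152, "ct_state": "established"}

# Chain that forgot the established/related rule
without_ct = [
    (lambda p: p["iifname"] == "lo", "accept"),
    (lambda p: p["proto"] == "tcp" and p["dport"] == 22, "accept"),
]

# Same chain with the conntrack rule near the top
with_ct = [
    (lambda p: p["iifname"] == "lo", "accept"),
    (lambda p: p["ct_state"] in ("established", "related"), "accept"),
    (lambda p: p["proto"] == "tcp" and p["dport"] == 22, "accept"),
]

print(evaluate(without_ct, reply))  # drop
print(evaluate(with_ct, reply))     # accept
```

In the first chain the reply matches no rule and falls through to the drop policy, which is exactly the failure described above.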

5. Sets and maps — dynamic rule tables

Sets and maps are the feature that most separates nftables from iptables. They are kernel-side data structures (hash tables, bitmaps, or red-black trees for intervals) that can hold addresses, ports, ranges, or any matchable value. Rules reference sets by name. The kernel performs a single lookup, not a linear scan.

Named sets

table inet filter {

  # Named set of blocked IPs
  set blocklist {
    type ipv4_addr
    flags dynamic, timeout
    timeout 24h      # entries expire automatically after 24 hours
  }

  # Named set of trusted management IPs
  set mgmt_hosts {
    type ipv4_addr
    elements = { 10.10.1.5, 10.10.1.6, 10.10.1.7 }
  }

  chain input {
    type filter hook input priority 0; policy drop;

    # Drop any IP in the blocklist
    ip saddr @blocklist drop

    # Allow SSH only from management hosts
    ip saddr @mgmt_hosts tcp dport 22 accept

    ct state established,related accept
    iifname "lo" accept
  }

}

# Add an IP to the blocklist at runtime (no ruleset reload)
nft add element inet filter blocklist { 203.0.113.42 }

# Add an IP with a custom timeout
nft add element inet filter blocklist { 203.0.113.99 timeout 1h }

# Remove an IP
nft delete element inet filter blocklist { 203.0.113.42 }

Interval sets (IP ranges)

set private_ranges {
  type ipv4_addr
  flags interval                           # enables CIDR and range notation
  elements = {
    10.0.0.0/8,
    172.16.0.0/12,
    192.168.0.0/16
  }
}

# Block traffic from outside private ranges reaching internal services
ip saddr != @private_ranges tcp dport 9090 drop  # block public access to Prometheus
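
To check which addresses an interval set like private_ranges would cover before loading it, the membership test can be reproduced with Python's standard ipaddress module. This is only a sketch for reasoning about the ranges: the function names are made up for the example, and the Python version scans the ranges one by one, which the kernel's interval set does not.

```python
import ipaddress

# Same three ranges as the private_ranges set above
PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def in_private_ranges(addr):
    """True if addr falls inside any of the configured ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_RANGES)

def verdict_for_prometheus(saddr):
    """Mirror of: ip saddr != @private_ranges tcp dport 9090 drop"""
    return "continue" if in_private_ranges(saddr) else "drop"

print(in_private_ranges("172.31.255.255"))     # True, last address of 172.16.0.0/12
print(in_private_ranges("172.32.0.1"))         # False
print(verdict_for_prometheus("203.0.113.42"))  # drop
```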

Verdict maps

# Map source IP to a per-IP action (e.g., different policies per client)
map client_policy {
  type ipv4_addr : verdict
  elements = {
    10.10.0.5 : accept,
    10.10.0.6 : drop,
    10.10.0.7 : jump custom_chain
  }
}

ip saddr vmap @client_policy

Sets are why nftables scales. An iptables chain with 10,000 IP rules is 10,000 sequential comparisons — every packet walks the entire list until it finds a match. An nftables set with 10,000 IPs is a hash lookup: O(1) regardless of set size. If you are managing blocklists, allowlists, geo-blocks, or any large collection of IP addresses, sets are not optional — they are the correct tool. Populating an iptables chain with 10,000 rules also takes seconds and causes a brief window of incomplete enforcement. Populating an nftables set with 10,000 elements is a single atomic operation that completes in milliseconds.
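
The difference in lookup cost can be illustrated with a small Python analogy: a list walked entry by entry stands in for a chain of per-IP rules, and a Python set stands in for a hash-backed nftables set. The addresses are generated test data and the helper names are hypothetical; the numbers count comparisons, not real packet latency.

```python
import ipaddress

def build_addresses(count):
    """Generate `count` consecutive IPv4 addresses starting at 10.0.0.0."""
    base = int(ipaddress.ip_address("10.0.0.0"))
    return [ipaddress.ip_address(base + i) for i in range(count)]

def linear_scan(rules, addr):
    """Walk the list rule by rule; return (matched, comparisons made)."""
    comparisons = 0
    for rule_addr in rules:
        comparisons += 1
        if rule_addr == addr:
            return True, comparisons
    return False, comparisons

addresses = build_addresses(10_000)
blockset = set(addresses)      # hash-based, like a plain nftables set
last = addresses[-1]

matched, comparisons = linear_scan(addresses, last)
print(matched, comparisons)    # True 10000
print(last in blockset)        # True, found by a single hash lookup
```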

6. NAT — masquerade, DNAT, SNAT

NAT rules live in a nat type chain. For most use cases you need two chains: one at prerouting for DNAT (changing the destination) and one at postrouting for SNAT/masquerade (changing the source). First, enable IP forwarding:

sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/99-forwarding.conf

Masquerade (outbound NAT)

table inet nat {

  chain postrouting {
    type nat hook postrouting priority 100; policy accept;

    # Masquerade all traffic from the VM bridge going out eth0
    iifname "virbr0" oifname "eth0" masquerade

    # Masquerade WireGuard traffic going out the uplink
    iifname "wg0" oifname "eth0" masquerade
  }

}

Destination NAT (port forwarding)

table inet nat {

  chain prerouting {
    type nat hook prerouting priority -100; policy accept;

    # Forward external port 8080 to internal VM at 192.168.122.10:80
    # (an inet table needs the explicit "ip" keyword to pick the address family)
    iifname "eth0" tcp dport 8080 dnat ip to 192.168.122.10:80

    # Forward WireGuard's DNS queries to local resolver
    iifname "wg0" udp dport 53 redirect to :5353

    # DNAT to a different port on the same host (transparent proxy)
    tcp dport 80 redirect to :3128
  }

  chain postrouting {
    type nat hook postrouting priority 100; policy accept;
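    # Only needed when the DNAT target would otherwise reply directly to the
    # client (hairpin NAT, or a VM whose default route is not this host).
    # It rewrites the source so replies come back through us, at the cost of
    # hiding the real client IP from the VM.
    ip daddr 192.168.122.10 masquerade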
  }

}

Static SNAT

# SNAT all traffic from internal range to a specific public IP
# (use instead of masquerade when you have a static external IP)
ip saddr 192.168.122.0/24 oifname "eth0" snat to 203.0.113.5

NAT on kldload matters for three scenarios. KVM bridged VMs: VMs on virbr0 need masquerade to reach the internet. The host rewrites their source IP so replies come back to the host and get forwarded. WireGuard site-to-site: traffic from a remote site's subnet may need masquerade at the receiving end so return traffic uses the right path. Containers: Docker and Podman write their own nftables NAT rules when you map ports. Run nft list ruleset after starting a container with -p 8080:80 and you will see a DNAT rule. If you have your own ruleset, understand which tables Docker and Podman create so your forward chain policy does not silently block container traffic.

7. Per-interface policies — the kldload pattern

The most powerful use of nftables in the kldload stack is per-interface trust policies. Each WireGuard tunnel represents a separate trust domain — management plane, replication plane, metrics plane, user plane — and each domain gets exactly the traffic it needs and nothing more. nftables iifname makes this a single keyword.

eth0 — public internet

Only WireGuard UDP should arrive here. Everything else is dropped. SSH, Prometheus, ZFS replication, internal APIs — none of these should be reachable on the public interface.

// eth0: accept udp dport 51820 (WireGuard)
// eth0: drop everything else

wg1 — management plane

SSH access for operators. Only admin IPs are in this tunnel. Only port 22. No other services. Even if someone has a WireGuard key for wg1, they can only reach SSH.

// wg1: accept tcp dport 22
// wg1: drop everything else

wg2 — metrics plane

Prometheus scrape traffic. Only the metrics collector has a key for this tunnel. Prometheus node exporter (9100), ZFS exporter (9134), WireGuard exporter (9586). No SSH, no APIs.

// wg2: accept tcp dport { 9100, 9134, 9586 }
// wg2: drop everything else

wg3 — replication plane

ZFS replication traffic only. Only the replication peer has a key for this tunnel. Only the ZFS replication daemon port. No management, no metrics.

// wg3: accept tcp dport 8023 (sanoid/syncoid)
// wg3: drop everything else

table inet filter {

  chain input {
    type filter hook input priority 0; policy drop;

    # Universal: loopback, established, ICMP
    iifname "lo" accept
    ct state invalid drop
    ct state established,related accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    # eth0 (public): WireGuard UDP only
    iifname "eth0" udp dport 51820 accept
    iifname "eth0" drop

    # wg1 (management plane): SSH only
    iifname "wg1" tcp dport 22 accept
    iifname "wg1" drop

    # wg2 (metrics plane): Prometheus exporters only
    iifname "wg2" tcp dport { 9100, 9134, 9586 } accept
    iifname "wg2" drop

    # wg3 (replication plane): ZFS replication only
    iifname "wg3" tcp dport 8023 accept
    iifname "wg3" drop

    # virbr0 (VM bridge): DHCP + DNS for VMs
    iifname "virbr0" udp dport { 53, 67 } accept

  }

}

This is the cornerstone of kldload's security model. Each WireGuard interface is a separate trust boundary with its own firewall rules. A compromised metrics collector, even one with a valid WireGuard key, cannot SSH into the host: on wg2 only the exporter ports are accepted, and the iifname "wg2" drop rule discards everything else, including port 22. A compromised replication peer cannot reach Prometheus for the same reason. The nftables iifname selector makes this trivial to express and trivial to audit: one line per permission, one interface per trust domain. Read the full WireGuard plane architecture in the WireGuard Masterclass.
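
Because the pattern is one accept rule plus one drop rule per interface, it lends itself to being generated from data. The following is a minimal sketch of that idea, assuming a hypothetical `PLANES` mapping and `render_plane_rules` helper; the interface names and ports are copied from the ruleset above, and the output is only the per-interface rule lines, not a complete chain.

```python
# interface -> (protocol, allowed destination ports), taken from the ruleset above
PLANES = {
    "eth0": ("udp", [51820]),
    "wg1": ("tcp", [22]),
    "wg2": ("tcp", [9100, 9134, 9586]),
    "wg3": ("tcp", [8023]),
}

def render_plane_rules(planes):
    """Return nft rule lines: one accept and one trailing drop per interface."""
    lines = []
    for ifname, (proto, ports) in planes.items():
        if len(ports) == 1:
            port_expr = str(ports[0])
        else:
            # several ports become an anonymous set
            port_expr = "{ " + ", ".join(str(p) for p in ports) + " }"
        lines.append(f'iifname "{ifname}" {proto} dport {port_expr} accept')
        lines.append(f'iifname "{ifname}" drop')
    return lines

for line in render_plane_rules(PLANES):
    print(line)
```

Keeping the permissions in one table like this makes the audit question ("what can reach this host on wg2?") a lookup rather than a read-through of the chain.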

8. Connection tracking (ct)

Connection tracking (conntrack) is the subsystem that makes stateful firewalling possible. It tracks every active TCP connection, UDP "connection" (by 5-tuple), and ICMP exchange, so that reply packets can be matched back to the originating flow and allowed through without a specific rule.

Connection states

State	Meaning	Typical action
`new`	First packet of a new connection (e.g., TCP SYN)	Accept if the destination is allowed
`established`	Packet part of an already-accepted connection	Accept unconditionally
`related`	New connection related to an existing one (FTP data, ICMP error)	Accept unconditionally
`invalid`	Packet that doesn't match any known connection and isn't new	Drop immediately (before other rules)

Connection tracking helpers

Some protocols (FTP, SIP, TFTP) open secondary connections during the session. conntrack helpers parse the control channel to learn which secondary connection is coming, then mark it as related so it can be accepted automatically:

# Enable FTP helper (parses PORT/PASV commands to allow data connections)
table inet filter {
  ct helper ftp-21 {
    type "ftp" protocol tcp
    l3proto ip
  }

  chain prerouting_helpers {
    type filter hook prerouting priority -150; policy accept;
    tcp dport 21 ct helper set "ftp-21"
  }
}

Tuning conntrack

# View current conntrack table size and usage
sysctl net.netfilter.nf_conntrack_max
sysctl net.netfilter.nf_conntrack_count

# Increase conntrack table for a high-NAT KVM host (VMs + containers)
echo "net.netfilter.nf_conntrack_max = 524288" >> /etc/sysctl.d/99-conntrack.conf
sysctl -p /etc/sysctl.d/99-conntrack.conf

# Reduce TCP timeout for faster table turnover under load
echo "net.netfilter.nf_conntrack_tcp_timeout_established = 1800" >> /etc/sysctl.d/99-conntrack.conf

# Check for conntrack exhaustion
dmesg | grep "nf_conntrack: table full"

Bypassing conntrack

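# Skip conntrack for high-throughput traffic (e.g., storage replication)
# Reduces CPU overhead when you don't need stateful tracking for that flow.
# Caveat: untracked packets never match "ct state established,related",
# so each direction of the flow needs its own explicit stateless rule.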
table inet raw {
  chain prerouting {
    type filter hook prerouting priority -300; policy accept;
    iifname "wg3" notrack     # bypass conntrack for ZFS replication plane
  }
}

conntrack is why you can write "allow established replies" as a single rule instead of needing explicit rules for every return packet. But conntrack has a fixed-size table, and that table lives in kernel memory. On a kldload KVM host doing NAT for 50 VMs plus several containers, with active ZFS replication flows and WireGuard tunnels, you can exhaust the default conntrack table size under load. The symptom is not a gradual degradation — it is sudden random connection drops, and the evidence is "nf_conntrack: table full, dropping packet" in dmesg. If you see that, double or quadruple nf_conntrack_max. The memory cost is low (roughly 300 bytes per entry). Set it proactively on any host doing significant NAT.
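
The memory cost of a larger table is simple arithmetic. The sketch below uses the rough 300-bytes-per-entry figure from the paragraph above, which is an approximation rather than a measured value; the two smaller table sizes are arbitrary comparison points, not documented defaults.

```python
BYTES_PER_ENTRY = 300  # rough per-entry figure quoted above, an approximation

def conntrack_memory_mib(max_entries, bytes_per_entry=BYTES_PER_ENTRY):
    """Approximate kernel memory used by a completely full conntrack table, in MiB."""
    return max_entries * bytes_per_entry / (1024 * 1024)

for entries in (65536, 262144, 524288):
    print(entries, conntrack_memory_mib(entries), "MiB")
# 65536 18.75 MiB
# 262144 75.0 MiB
# 524288 150.0 MiB
```

Under that assumption, the 524288-entry setting used earlier costs about 150 MiB only when the table is completely full.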

9. Rate limiting and traffic shaping

nftables rate limiting runs in the kernel at packet time — before any userspace application sees the traffic. This makes it strictly faster and more reliable than application-level rate limiting.

Per-rule rate limiting

# Limit new SSH connections: 3 per minute (brute-force protection)
tcp dport 22 ct state new limit rate 3/minute burst 5 packets accept
tcp dport 22 ct state new drop   # drop anything over the limit

# Limit ICMP ping rate (prevent ICMP flood)
ip protocol icmp icmp type echo-request limit rate 10/second burst 20 packets accept
ip protocol icmp icmp type echo-request drop

# Limit DNS queries (UDP) — useful on resolvers under query flood
udp dport 53 limit rate 100/second accept
udp dport 53 drop

Per-source-IP rate limiting with meters

# Per-source-IP SSH rate limit — each IP gets its own 3/minute budget
# Uses a dynamic meter (hash map keyed by source IP)
tcp dport 22 ct state new \
  meter ssh_rate { ip saddr limit rate 3/minute burst 5 packets } accept
tcp dport 22 ct state new drop

# SYN flood mitigation: limit new TCP connections per source
tcp flags syn ct state new \
  meter syn_flood { ip saddr limit rate 20/second burst 50 packets } accept
tcp flags syn ct state new \
  log prefix "syn-flood: " drop

# Auto-add SSH brute-force sources to the blocklist set
tcp dport 22 ct state new \
  meter ssh_brute { ip saddr limit rate 5/minute burst 10 packets } accept
tcp dport 22 ct state new \
  add @blocklist { ip saddr timeout 1h } drop

Rate limiting at the nftables level runs in the kernel on every packet: the decision is made before any socket buffer is touched and before any process is woken up. fail2ban reads log files, parses them with regex, and then adds iptables/nftables rules. By the time fail2ban acts, the attacker may already have sent thousands of packets and your sshd has generated a log line for each attempt. An nftables limit throttles a source as soon as its burst allowance is used up. The trade-off is that a rate limit only counts new connections and cannot tell a failed login from a successful one, which fail2ban can. For throttling SSH brute-force traffic, nftables acts earlier and more cheaply than fail2ban. You can run both, with fail2ban as a longer-term ban layer and nftables meters as the immediate throttle, but the nftables meter should always be the first line of defense.
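
The behaviour of a per-source meter can be approximated with a token bucket per source address. The sketch below is a simplified model for building intuition, not the kernel's exact accounting: the class names are hypothetical, and it assumes the bucket starts full, holds at most `burst` tokens, and refills at the configured rate.

```python
class TokenBucket:
    """Simplified token bucket: at most `burst` tokens, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, burst, now=0.0):
        self.rate_per_second = rate_per_second
        self.burst = burst
        self.tokens = float(burst)   # assumption: the bucket starts full
        self.last = now

    def allow(self, now):
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerSourceMeter:
    """One bucket per source address, created on first sight (like a meter keyed on ip saddr)."""

    def __init__(self, rate_per_second, burst):
        self.rate_per_second = rate_per_second
        self.burst = burst
        self.buckets = {}

    def allow(self, saddr, now):
        bucket = self.buckets.get(saddr)
        if bucket is None:
            bucket = TokenBucket(self.rate_per_second, self.burst, now)
            self.buckets[saddr] = bucket
        return bucket.allow(now)


# Roughly "limit rate 3/minute burst 5 packets", keyed by source address
meter = PerSourceMeter(rate_per_second=3 / 60, burst=5)
attacker = [meter.allow("203.0.113.42", now=0.0) for _ in range(8)]
print(attacker)                           # five True values, then three False
print(meter.allow("10.10.1.5", now=0.0))  # True: other sources have their own budget
```

The point of keying by source is visible in the last line: one noisy address exhausts only its own bucket.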

10. Logging and debugging

Inspect the live ruleset

# Show every table, chain, set, and rule currently loaded
nft list ruleset

# Show a specific table
nft list table inet filter

# Show a specific chain
nft list chain inet filter input

# Show a specific set
nft list set inet filter blocklist

# Watch ruleset changes in real time (shows add/delete events)
nft monitor

Counters

# Add counters to rules to see packet/byte hit counts
tcp dport 22 counter accept
ct state established,related counter accept
counter drop          # count everything that hits the default drop

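# Named counters: declared once in the table, referenced from any number of rules,
# and readable by name. They are reset when the ruleset is flushed and reloaded.
counter ssh_accepted {}                            # declaration, inside the table
tcp dport 22 counter name ssh_accepted accept      # rule, inside a chain
nft list counter inet filter ssh_accepted          # shell: read the current value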

Logging

# Log dropped packets with a prefix (visible in dmesg and journalctl)
log prefix "nft-drop: " flags all drop

# Log, then accept. The log statement itself is non-terminal; accept ends evaluation.
# Matches come first so that only SSH packets are logged.
tcp dport 22 log prefix "nft-ssh: " accept

# Log with level (emerg, alert, crit, err, warn, notice, info, debug)
tcp flags syn ct state new log level warn prefix "syn-flood: " drop

# View the logs
journalctl -k | grep "nft-"
dmesg | grep "nft-"

nftrace — packet path tracing

# Enable tracing for specific packets (e.g., trace SSH traffic from 10.0.0.5)
# Add this rule at the TOP of your input chain (before other rules)
ip saddr 10.0.0.5 tcp dport 22 meta nftrace set 1

# Then watch the trace output in another terminal
nft monitor trace

# Output shows every rule the packet matches:
# trace id 3a1b2c4d inet filter input rule tcp dport 22 accept (verdict accept)
# trace id 3a1b2c4d inet filter input verdict accept

# When done, look up the trace rule's handle and remove the rule
nft -a list chain inet filter input
nft delete rule inet filter input handle <handle-number>

nftrace is the single most useful debugging tool in nftables. When a packet is not doing what you expect (wrong rule matching, unexpected drop, traffic going through the wrong chain), add meta nftrace set 1 to a rule that matches the traffic you want to trace, then run nft monitor trace. You see the packet's path through the entire ruleset, rule by rule, chain by chain, with the final verdict. The closest iptables equivalent is the TRACE target, which is far clumsier to read. The more common old workflow was: add a LOG rule, send traffic, check dmesg, make a guess, repeat. With nftrace you see exactly which rule made which decision. Use it first, not last.

11. Persistence and automation

Atomic ruleset replacement

# Load entire ruleset from file (atomic — all-or-nothing, no partial state)
nft -f /etc/nftables.conf

# Test syntax without loading
nft -c -f /etc/nftables.conf

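# Save current live ruleset to file (overwrites the file; the dump does not
# include "flush ruleset", so add that line back at the top before reloading)
nft list ruleset > /etc/nftables.conf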

# Enable at boot
systemctl enable nftables
systemctl start nftables

# Reload after changing the conf file
systemctl reload nftables

The nftables.conf file must start by flushing the existing ruleset, or rules will accumulate on each reload:

# Always start your conf file with this
flush ruleset

# Then define your tables
table inet filter { ... }
table inet nat { ... }

Salt and Ansible automation

# Ansible: template the ruleset and reload atomically
# tasks/nftables.yml
- name: Deploy nftables ruleset
  template:
    src: nftables.conf.j2
    dest: /etc/nftables.conf
    mode: '0600'
    # Check the rendered file before it replaces the live one
    validate: nft -c -f %s
  notify: reload nftables

# handlers/main.yml
- name: reload nftables
  service:
    name: nftables
    state: reloaded

# Salt: managed file + service reload
nftables_conf:
  file.managed:
    - name: /etc/nftables.conf
    - source: salt://firewall/nftables.conf.jinja
    - template: jinja
    - mode: 600

nftables_reload:
  cmd.wait:
    - name: systemctl reload nftables
    - watch:
        - file: nftables_conf

Atomic replacement is the key automation advantage over iptables. With iptables, you run sequential commands — add rule, add rule, add rule — and there is a window between each command where the ruleset is partially loaded. If the process is interrupted halfway, you end up with half your rules. If two Ansible runs overlap, rules accumulate. nftables loads the entire file as one atomic transaction: either the whole ruleset loads successfully or nothing changes. This is not a nice-to-have for automation — it is a requirement for safe fleet-wide firewall management. Template the file, validate with nft -c -f, then load. The entire operation is safe to run on a running system with active connections.
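
The validate-then-load sequence can be wrapped in a few lines for use from a deployment script. This is a minimal sketch, assuming nft is on the PATH and the script runs with sufficient privileges; `run_nft` and `deploy` are hypothetical helper names, and the only nft flags used are the `-c` and `-f` options shown earlier.

```python
import subprocess

def run_nft(args):
    """Run nft with the given arguments and return its exit status."""
    return subprocess.run(["nft", *args], check=False).returncode

def deploy(path, runner=run_nft):
    """Check the file first; load it only if the check passes.

    Returns True when the ruleset was loaded, False otherwise. The live
    ruleset is untouched if the check fails, and because nft applies the
    file as one transaction, a failed load leaves it untouched as well.
    """
    if runner(["-c", "-f", path]) != 0:
        return False
    return runner(["-f", path]) == 0

# Example call (needs root and a real file):
# deploy("/etc/nftables.conf")
```

The `runner` parameter exists so the sequencing can be exercised without touching a real firewall, for example by passing a function that records its arguments and returns a chosen exit code.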

12. Complete kldload KVM host ruleset

A full production-ready nftables.conf for a kldload KVM host: WireGuard planes for management, metrics, and replication; bridged VMs with NAT; SSH brute-force protection; and counters for all critical paths.

# /etc/nftables.conf — kldload KVM host
# Interfaces:
#   eth0   — public internet uplink
#   wg1    — management plane (SSH, admin access)
#   wg2    — metrics plane (Prometheus scrape)
#   wg3    — replication plane (ZFS syncoid)
#   virbr0 — KVM VM bridge (192.168.122.0/24)
#
# Assumes WireGuard is already configured. See the WireGuard Masterclass.


flush ruleset

# ============================================================
# Filter table — host firewall
# ============================================================
table inet filter {

  # Dynamic blocklist: IPs added at runtime, expire after 1 hour
  set blocklist {
    type ipv4_addr
    flags dynamic, timeout
    timeout 1h
  }

  # Input chain — packets destined for this host
  chain input {
    type filter hook input priority 0; policy drop;

    # Fundamentals
    iifname "lo" accept
    ct state invalid drop
    ct state established,related counter accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    # Drop known-bad sources immediately
    ip saddr @blocklist counter drop

    # eth0 (public uplink): WireGuard only — drop everything else hard
    iifname "eth0" udp dport 51820 counter accept
    iifname "eth0" counter drop

    # wg1 (management): SSH with rate limiting and auto-block
    iifname "wg1" tcp dport 22 ct state new \
      meter ssh_mgmt { ip saddr limit rate 5/minute burst 10 packets } \
      counter accept
    iifname "wg1" tcp dport 22 ct state new \
      add @blocklist { ip saddr } \
      log prefix "ssh-brute: " drop
    iifname "wg1" counter drop

    # wg2 (metrics): Prometheus exporters only
    iifname "wg2" tcp dport { 9100, 9134, 9586 } counter accept
    iifname "wg2" counter drop

    # wg3 (replication): ZFS replication port only
    iifname "wg3" tcp dport 8023 counter accept
    iifname "wg3" counter drop

    # virbr0 (VM bridge): DHCP and DNS for guests
    iifname "virbr0" udp dport { 53, 67 } accept
    iifname "virbr0" tcp dport 53 accept

    # Log and count all unexpected drops
    log prefix "nft-input-drop: " counter drop
  }

  # Forward chain — VM traffic routed through this host
  chain forward {
    type filter hook forward priority 0; policy drop;

    ct state invalid drop
    ct state established,related counter accept

    # Allow VMs on virbr0 to reach the internet
    iifname "virbr0" oifname "eth0" counter accept
    iifname "eth0" oifname "virbr0" ct state established,related accept

    # Allow VMs to reach WireGuard-connected resources
    iifname "virbr0" oifname "wg1" accept
    iifname "virbr0" oifname "wg2" accept
    iifname "virbr0" oifname "wg3" accept

    log prefix "nft-fwd-drop: " counter drop
  }

  # Output chain — packets originating from this host
  chain output {
    type filter hook output priority 0; policy accept;
    # Accept all outbound — tighten per your threat model
  }

}

# ============================================================
# NAT table — masquerade for VMs
# ============================================================
table inet nat {

  chain prerouting {
    type nat hook prerouting priority -100; policy accept;
    # Add DNAT rules here as needed, e.g.:
    # iifname "eth0" tcp dport 8080 dnat ip to 192.168.122.10:80
  }

  chain postrouting {
    type nat hook postrouting priority 100; policy accept;
    # Masquerade VM traffic going out the public interface
    ip saddr 192.168.122.0/24 oifname "eth0" masquerade
    # Masquerade WireGuard traffic if site-to-site masquerade is needed
    iifname "wg0" oifname "eth0" masquerade
  }

}

Deploy it:

# Test syntax first
nft -c -f /etc/nftables.conf

# Load atomically
nft -f /etc/nftables.conf

# Verify
nft list ruleset

# Enable at boot
systemctl enable --now nftables

13. Troubleshooting

Common errors

"Error: table already exists"

You tried to create a table that already exists without flushing first. Either start your .conf file with flush ruleset, or delete the table first: nft delete table inet filter. If you're adding to an existing ruleset, use nft add table (idempotent) instead of nft create table.

// Fix: always start nftables.conf with: flush ruleset

"Operation not permitted"

nftables requires CAP_NET_ADMIN. Run as root, or use sudo. If you're running inside a container, the container needs CAP_NET_ADMIN granted explicitly. Rootless containers cannot modify the host nftables ruleset — only their own network namespace.

// Fix: sudo nft -f /etc/nftables.conf

Rules not matching — iifname check

Interface names must match exactly, including case. Check with ip link show. Common mistake: writing "eth0" when the interface is actually "ens3", "enp2s0", or "bond0". WireGuard interfaces are named exactly as you configured them in wg-quick — check /etc/wireguard/.

// Check: ip link show | grep ": "

Rules not matching — chain hook priority

If two chains are attached to the same hook, the one with the lower priority number runs first. NAT prerouting should use priority -100 (before the default 0). Raw/notrack should use priority -300. If your DNAT isn't working, check that it runs before the filter chain.

// Standard priorities: raw=-300, mangle=-150, dstnat=-100, filter=0, srcnat=100
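
When several tables attach chains to the same hook, working out the execution order by hand gets error-prone. The sketch below models the ordering rule only (lower priority number runs first at a given hook); the chain names are hypothetical and the tuples would have to be filled in from your own nft list ruleset output.

```python
def run_order(chains, hook):
    """chains: list of (name, hook, priority). Return names in execution order at one hook."""
    at_hook = [c for c in chains if c[1] == hook]
    return [name for name, _, _ in sorted(at_hook, key=lambda c: c[2])]

# Hypothetical base chains from three tables
chains = [
    ("filter_prerouting", "prerouting", 0),
    ("nat_prerouting", "prerouting", -100),
    ("raw_prerouting", "prerouting", -300),
    ("nat_postrouting", "postrouting", 100),
]

print(run_order(chains, "prerouting"))
# ['raw_prerouting', 'nat_prerouting', 'filter_prerouting']
```

With these values the DNAT chain runs before the filter chain at prerouting, which is the order the troubleshooting note above asks you to confirm.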

Migration from iptables

# iptables-translate: convert a single iptables rule to nftables syntax
iptables-translate -A INPUT -p tcp --dport 22 -j ACCEPT
# output: nft add rule ip filter INPUT tcp dport 22 counter accept

# ip6tables-translate for IPv6 rules
ip6tables-translate -A INPUT -p tcp --dport 22 -j ACCEPT

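# iptables-restore-translate: convert an entire saved iptables ruleset
# (-f names the input file and the translation goes to stdout; /tmp/iptables.rules
# is just an example path. Review the result before loading it.)
iptables-save > /tmp/iptables.rules
iptables-restore-translate -f /tmp/iptables.rules > /etc/nftables.conf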

# If you're using firewalld and want to see what nftables rules it generates
nft list ruleset
# firewalld creates its own tables (e.g., "firewalld" table)
# Your hand-written tables coexist — both are evaluated

# Check whether iptables is actually iptables-nft (translation layer)
iptables --version
# "iptables v1.8.x (nf_tables)" = iptables-nft, translating to nftables
# "iptables v1.8.x (legacy)"   = iptables-legacy, direct xtables

Debug checklist

Is the rule loaded? — nft list ruleset. Check the exact chain, exact rule text.
Is the interface name right? — ip link show. Interface names are case-sensitive and must match exactly.
Which rule dropped the packet? — Add meta nftrace set 1 to a matching rule, run nft monitor trace.
Is conntrack interfering? — Check ct state: is the packet arriving as invalid? Run conntrack -L to inspect the table.
Is another table also processing the packet? — nft list ruleset shows all tables. Docker, Podman, and libvirt all write their own. Your forward chain drop policy may be overriding theirs, or theirs may be accepting packets before your drop runs.
Is IP forwarding enabled? — sysctl net.ipv4.ip_forward must be 1 for NAT and VM routing to work.
Is conntrack full? — dmesg | grep nf_conntrack. If you see "table full", increase nf_conntrack_max.
