nftables Masterclass
This guide covers nftables from first principles to production deployment — the unified packet classification framework that replaced iptables, ip6tables, arptables, and ebtables across every modern Linux distro. If you are running kldload for KVM hosting, WireGuard plane isolation, container networking, or anything that touches the network stack, this is the firewall layer underneath all of it.
What this page covers: nftables architecture and terminology, writing rules from scratch, sets and maps for scalable policy, NAT (masquerade, DNAT, SNAT), per-interface trust boundaries, connection tracking internals, rate limiting and brute-force protection, logging and packet tracing, atomic persistence, Salt and Ansible automation, a complete kldload KVM host ruleset, and a troubleshooting reference.
Prerequisites: basic Linux networking familiarity (interfaces, IP addresses, ports). No prior iptables knowledge required — and if you have it, you will find nftables is a cleaner model.
1. nftables replaced iptables
nftables is the Linux kernel's packet classification framework. It replaced iptables, ip6tables, arptables, and ebtables with one unified tool. Every modern distro ships it. CentOS Stream 9, Debian 13, Ubuntu 24.04, Fedora 41, Rocky Linux 9 — all of them use nftables as the kernel-side firewall layer. If you're still writing iptables rules, you're writing for a deprecated API.
nftables is not a reimplementation of iptables with a different syntax. It is a fundamentally different architecture: user-defined tables and chains, a single expression language for both matching and actions, native sets and maps built into the kernel, and atomic ruleset replacement as a first-class operation. The old model was a fixed set of tables (filter, nat, mangle, raw) with fixed hook points and separate tools for IPv4, IPv6, ARP, and bridge. nftables replaces all of that with one address-family-aware framework.
On most current distributions the iptables command is actually iptables-nft, a frontend that converts your commands into nftables operations. Writing iptables rules means writing for a translation layer on top of the real thing. Write nftables directly. You get cleaner syntax, better performance, and access to features (sets, maps, meters) that have no iptables equivalent.
2. nftables vs iptables vs firewalld
Understanding the relationship between these three is essential before you write a single rule.
iptables
Fixed table and chain layout (filter/nat/mangle/raw). Separate tools for IPv4, IPv6, ARP, bridge. Rules are match-and-target: each rule tests a condition and, on match, takes one action. Chains are linear lists — every packet walks the list until a rule matches. No native data structures. Adding a rule is a syscall that replaces the entire ruleset.
nftables
User-defined tables and chains. One tool for all address families. Rules are expressions: multiple match conditions and multiple actions in a single rule. Built-in sets (hash tables for exact matches, tree-based structures for intervals), so a lookup is O(1) for exact keys and never a linear scan. Atomic ruleset replacement: one transaction loads the entire file. Native rate meters, verdict maps, and connection tracking integration.
firewalld
A zone-based frontend that generates nftables (or iptables-nft) rules underneath. Zones map interfaces and source addresses to trust levels. Works well for simple allow/deny rules on desktop and server installs. Not suited for complex per-interface policy, dynamic sets, rate limiting, or custom NAT configurations.
| Feature | iptables | nftables | firewalld |
|---|---|---|---|
| IPv4 + IPv6 | Separate tools | Unified (inet family) | Unified (via nftables) |
| Large IP lists | O(n) per packet | O(1) hash set | Via ipset (awkward) |
| Atomic load | No (sequential) | Yes (nft -f) | Yes (via nftables) |
| Rate limiting | hashlimit module | Native meters | Not exposed |
| Per-interface policy | -i / -o match | iifname / oifname | Zones (coarse) |
| Status | Deprecated API | Current standard | Active (uses nftables) |
3. nftables fundamentals
nftables has four core concepts: families, tables, chains, and rules. Get these right and everything else follows.
Address families
| Family | Handles | Use when |
|---|---|---|
| inet | IPv4 + IPv6 | Almost always: one chain for both protocols |
| ip | IPv4 only | IPv4-specific rules that must not apply to IPv6 |
| ip6 | IPv6 only | IPv6-specific rules that must not apply to IPv4 |
| arp | ARP frames | ARP spoofing protection |
| bridge | Ethernet bridge traffic | Filtering between bridged VMs |
| netdev | Per-device ingress | Early drop before routing (DDoS mitigation) |
The inet family handles both IPv4 and IPv6 in a single chain. Always use inet unless you have a specific reason not to. Writing separate ip and ip6 tables means maintaining two copies of every rule and keeping them in sync forever. The only reason to use separate families is a rule that genuinely needs different behavior for IPv4 and IPv6, which is rare in practice.
Tables, chains, and rules
A table is a namespace. It has a name, a family, and contains chains. Tables are completely independent — you can have multiple tables in the same family, and they all process packets. There is no implicit table like iptables' filter. You create your own.
A chain is an ordered list of rules attached to a netfilter hook. Chains can be base chains (attached to a kernel hook — they see packets) or regular chains (called explicitly via jump or goto from other rules). Base chains need three properties:
- type: filter (allow/drop), nat (address translation), route (mark for policy routing)
- hook: where in the packet path the chain runs
- priority: order relative to other chains at the same hook (lower = earlier)
The five hooks for inet/ip/ip6:
| Hook | When it runs | Typical use |
|---|---|---|
| prerouting | Before routing decision | DNAT, raw conntrack bypass |
| input | Packets destined for this host | Host firewall (allow/deny inbound) |
| forward | Packets being routed through this host | Router/firewall for VM traffic |
| output | Packets generated by this host | Outbound filtering (rarely needed) |
| postrouting | After routing decision, before egress | SNAT, masquerade |
The chain policy is the default verdict when no rule matches: accept (permissive) or drop (deny-by-default). For a firewall, set input chain policy to drop and add rules to allow specific traffic. For a router's forward chain, also set policy to drop and allow specific flows.
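The split between base chains and regular chains described above can be sketched as follows. This is a minimal illustration rather than part of the kldload ruleset: the table name example_jump, the chain names in_public and in_mgmt, and the output path are assumptions made up for the example. The script only writes the file, so you can check it as root with nft -c -f before loading anything.

```shell
#!/bin/sh
# Sketch: one base chain that hands packets to two regular chains.
# Table name, chain names, and the output path are illustrative assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-jump-example.nft"

cat > "$out" <<'EOF'
table inet example_jump {
    # Regular chains: no hook, so they only see packets sent via jump/goto.
    chain in_public {
        udp dport 51820 accept
    }

    chain in_mgmt {
        tcp dport 22 accept
    }

    # Base chain: attached to the input hook, sees every inbound packet.
    chain input {
        type filter hook input priority 0; policy drop;

        iifname "lo" accept
        ct state established,related accept

        # jump comes back here when the regular chain issues no verdict,
        # so unmatched packets still fall through to the drop policy.
        iifname "eth0" jump in_public
        iifname "wg1" jump in_mgmt
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

Only the base chain carries a policy. A regular chain that runs out of rules returns to its caller, which is why the drop policy on input still covers everything the two helper chains did not accept.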
Minimal ruleset with line-by-line explanation
    # /etc/nftables.conf: minimal host firewall
    table inet filter {                              # table named "filter", inet family (v4+v6)
        chain input {
            type filter hook input priority 0; policy drop;
            # hook: input (packets destined for this host)
            # priority 0: standard filter priority
            # policy drop: deny everything unless a rule allows it

            iifname "lo" accept                      # always accept loopback
            ct state invalid drop                    # drop malformed/unknown connections
            ct state established,related accept      # allow replies to our outbound connections
            ip protocol icmp accept                  # allow ICMP (ping, traceroute, path MTU)
            ip6 nexthdr icmpv6 accept                # allow ICMPv6 (NDP, router discovery)
            tcp dport 22 accept                      # allow SSH
            # policy drop handles everything else
        }

        chain forward {
            type filter hook forward priority 0; policy drop;
            # drop all forwarded traffic by default (not a router)
        }

        chain output {
            type filter hook output priority 0; policy accept;
            # allow all outbound; tighten if needed
        }
    }
Load it: nft -f /etc/nftables.conf. Verify: nft list ruleset.
4. Writing rules
nftables rules follow a consistent pattern: match expressions then statement (action). Multiple match expressions in the same rule are implicitly ANDed — all must match for the statement to execute.
Selectors
| Selector | Matches | Example |
|---|---|---|
| iifname | Incoming interface name | iifname "eth0" |
| oifname | Outgoing interface name | oifname "wg0" |
| ip saddr | Source IP address | ip saddr 10.0.0.0/8 |
| ip daddr | Destination IP address | ip daddr 192.168.1.1 |
| tcp dport | TCP destination port | tcp dport { 80, 443 } |
| udp dport | UDP destination port | udp dport 51820 |
| ct state | Connection tracking state | ct state established,related |
| meta l4proto | Layer 4 protocol | meta l4proto tcp |
Actions
| Action | Effect |
|---|---|
| accept | Allow the packet, stop rule evaluation |
| drop | Silently discard the packet |
| reject | Discard and send ICMP unreachable (tells the sender immediately) |
| log prefix "tag: " | Log to dmesg/syslog, continue evaluation |
| counter | Increment packet/byte counter, continue evaluation |
| queue | Send to userspace via NFQUEUE (for IDS/IPS) |
| jump <chain> | Evaluate another chain; return here on return |
| goto <chain> | Evaluate another chain; do not return |
| return | Return to calling chain (from a jumped chain) |
Common rule examples
    # Allow SSH
    tcp dport 22 accept

    # Allow HTTP and HTTPS (inline anonymous set)
    tcp dport { 80, 443 } accept

    # Allow WireGuard (UDP 51820)
    udp dport 51820 accept

    # Allow ICMP ping (IPv4)
    ip protocol icmp icmp type echo-request accept

    # Allow ICMPv6 (IPv6; required for NDP, router solicitation)
    ip6 nexthdr icmpv6 accept

    # Allow established/related (reply traffic); put this near the top
    ct state established,related accept

    # Drop invalid connection state
    ct state invalid drop

    # Allow SSH only from a specific subnet
    ip saddr 10.10.0.0/16 tcp dport 22 accept

    # Allow SSH from specific interface only
    iifname "wg1" tcp dport 22 accept

    # Log and drop everything else
    log prefix "nftables drop: " flags all drop
Do not omit ct state established,related accept. Without it, you can initiate outbound connections but the reply packets get dropped by your input chain policy. Your SSH session to a remote host starts, sends the SYN, gets a SYN-ACK back, and the firewall drops it. This one rule should be at or near the top of every input chain that has a drop policy. Put it before the specific allow rules so established traffic fast-paths through without walking the rest of the chain.
5. Sets and maps: dynamic rule tables
Sets and maps are the feature that most separates nftables from iptables. They are kernel-side data structures (hash tables for exact matches, tree-based structures for intervals) that can hold addresses, ports, ranges, or any matchable value. Rules reference sets by name. The kernel performs a single lookup, not a linear scan.
Named sets
    table inet filter {
        # Named set of blocked IPs
        set blocklist {
            type ipv4_addr
            flags dynamic, timeout
            timeout 24h            # entries expire automatically after 24 hours
        }

        # Named set of trusted management IPs
        set mgmt_hosts {
            type ipv4_addr
            elements = { 10.10.1.5, 10.10.1.6, 10.10.1.7 }
        }

        chain input {
            type filter hook input priority 0; policy drop;

            # Drop any IP in the blocklist
            ip saddr @blocklist drop

            # Allow SSH only from management hosts
            ip saddr @mgmt_hosts tcp dport 22 accept

            ct state established,related accept
            iifname "lo" accept
        }
    }

    # Add an IP to the blocklist at runtime (no ruleset reload)
    nft add element inet filter blocklist { 203.0.113.42 }

    # Add an IP with a custom timeout
    nft add element inet filter blocklist { 203.0.113.99 timeout 1h }

    # Remove an IP
    nft delete element inet filter blocklist { 203.0.113.42 }
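Adding addresses one command at a time does not scale to a feed of hundreds of entries. One way to bulk-load the blocklist set above is to turn a plain text list into a single batch file and load it with nft -f. A minimal sketch, assuming a hypothetical input format (one IPv4 address per line, # comments allowed) and made-up file paths:

```shell
#!/bin/sh
# Sketch: convert a list of IPv4 addresses into one nft batch file.
# The input format and both paths are assumptions for illustration.
set -eu

list="${TMPDIR:-/tmp}/blocklist.txt"
batch="${TMPDIR:-/tmp}/blocklist.nft"

# Example input; in practice this would come from a feed or a log parser.
cat > "$list" <<'EOF'
# known scanners
203.0.113.42
203.0.113.99

198.51.100.7
EOF

# Strip comments and blank lines, then join the addresses with ", ".
elements=$(sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$list" | paste -sd, - | sed 's/,/, /g')

# One "add element" command covering every address in the list.
printf 'add element inet filter blocklist { %s }\n' "$elements" > "$batch"

cat "$batch"
```

Load the result as root with nft -f on the batch file. Because the whole file is applied as one transaction, either every address lands in the set or none does. Entries added this way pick up the set's default timeout.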
Interval sets (IP ranges)
    set private_ranges {
        type ipv4_addr
        flags interval          # enables CIDR and range notation
        elements = { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 }
    }

    # Block traffic from outside private ranges reaching internal services
    ip saddr != @private_ranges tcp dport 9090 drop    # block public access to Prometheus
Verdict maps
    # Map source IP to a per-IP action (e.g., different policies per client)
    map client_policy {
        type ipv4_addr : verdict
        elements = { 10.10.0.5 : accept,
                     10.10.0.6 : drop,
                     10.10.0.7 : jump custom_chain }
    }

    ip saddr vmap @client_policy
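Verdict maps are not limited to addresses. Keying one on the incoming interface name gives a single lookup that routes each packet to the chain for its trust domain. A minimal sketch with an anonymous map; the table name example_vmap, the chain names, and the output path are assumptions, not names used elsewhere in this guide:

```shell
#!/bin/sh
# Sketch: dispatch by incoming interface with an anonymous verdict map.
# Table name, chain names, and the output path are illustrative assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-iif-vmap-example.nft"

cat > "$out" <<'EOF'
table inet example_vmap {
    chain in_public {
        udp dport 51820 accept
    }

    chain in_mgmt {
        tcp dport 22 accept
    }

    chain in_metrics {
        tcp dport { 9100, 9134, 9586 } accept
    }

    chain input {
        type filter hook input priority 0; policy drop;

        iifname "lo" accept
        ct state established,related accept

        # One lookup picks the per-interface chain. Interfaces missing
        # from the map get no verdict here and hit the drop policy.
        iifname vmap {
            "eth0" : jump in_public,
            "wg1"  : jump in_mgmt,
            "wg2"  : jump in_metrics
        }
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

Adding a new plane then means one new chain and one new map entry, and the input chain itself stays the same length.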
6. NAT — masquerade, DNAT, SNAT
NAT rules live in a nat type chain. For most use cases you need two chains: one at prerouting for DNAT (changing the destination) and one at postrouting for SNAT/masquerade (changing the source). First, enable IP forwarding:
    sysctl -w net.ipv4.ip_forward=1
    echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/99-forwarding.conf
Masquerade (outbound NAT)
    table inet nat {
        chain postrouting {
            type nat hook postrouting priority 100; policy accept;

            # Masquerade all traffic from the VM bridge going out eth0
            iifname "virbr0" oifname "eth0" masquerade

            # Masquerade WireGuard traffic going out the uplink
            iifname "wg0" oifname "eth0" masquerade
        }
    }
Destination NAT (port forwarding)
    table inet nat {
        chain prerouting {
            type nat hook prerouting priority -100; policy accept;

            # Forward external port 8080 to internal VM at 192.168.122.10:80
            # ("dnat ip to" states the address family explicitly; in an inet
            # table nft cannot always infer it from the rule)
            iifname "eth0" tcp dport 8080 dnat ip to 192.168.122.10:80

            # Forward WireGuard's DNS queries to local resolver
            iifname "wg0" udp dport 53 redirect to :5353

            # DNAT to a different port on the same host (transparent proxy)
            tcp dport 80 redirect to :3128
        }

        chain postrouting {
            type nat hook postrouting priority 100; policy accept;

            # Required when DNAT'ing to a different host: rewrite source so
            # the VM replies to us, not directly to the external client
            ip daddr 192.168.122.10 masquerade
        }
    }
Static SNAT
    # SNAT all traffic from internal range to a specific public IP
    # (use instead of masquerade when you have a static external IP)
    ip saddr 192.168.122.0/24 oifname "eth0" snat to 203.0.113.5
KVM: VMs on virbr0 need masquerade to reach the internet. The host rewrites their source IP so replies come back to the host and get forwarded. WireGuard site-to-site: traffic from a remote site's subnet may need masquerade at the receiving end so return traffic uses the right path. Containers: Docker and Podman write their own nftables NAT rules when you map ports. Run nft list ruleset after starting a container with -p 8080:80 and you will see a DNAT rule. If you have your own ruleset, understand which tables Docker and Podman create so your forward chain policy does not silently block container traffic.
7. Per-interface policies: the kldload pattern
The most powerful use of nftables in the kldload stack is per-interface trust policies. Each WireGuard tunnel represents a separate trust domain — management plane, replication plane, metrics plane, user plane — and each domain gets exactly the traffic it needs and nothing more. nftables iifname makes this a single keyword.
eth0 — public internet
Only WireGuard UDP should arrive here. Everything else is dropped. SSH, Prometheus, ZFS replication, internal APIs — none of these should be reachable on the public interface.
wg1 — management plane
SSH access for operators. Only admin IPs are in this tunnel. Only port 22. No other services. Even if someone has a WireGuard key for wg1, they can only reach SSH.
wg2 — metrics plane
Prometheus scrape traffic. Only the metrics collector has a key for this tunnel. Prometheus node exporter (9100), ZFS exporter (9134), WireGuard exporter (9586). No SSH, no APIs.
wg3 — replication plane
ZFS replication traffic only. Only the replication peer has a key for this tunnel. Only the ZFS replication daemon port. No management, no metrics.
    table inet filter {
        chain input {
            type filter hook input priority 0; policy drop;

            # Universal: loopback, established, ICMP
            iifname "lo" accept
            ct state invalid drop
            ct state established,related accept
            ip protocol icmp accept
            ip6 nexthdr icmpv6 accept

            # eth0 (public): WireGuard UDP only
            iifname "eth0" udp dport 51820 accept
            iifname "eth0" drop

            # wg1 (management plane): SSH only
            iifname "wg1" tcp dport 22 accept
            iifname "wg1" drop

            # wg2 (metrics plane): Prometheus exporters only
            iifname "wg2" tcp dport { 9100, 9134, 9586 } accept
            iifname "wg2" drop

            # wg3 (replication plane): ZFS replication only
            iifname "wg3" tcp dport 8023 accept
            iifname "wg3" drop

            # virbr0 (VM bridge): DHCP + DNS for VMs
            iifname "virbr0" udp dport { 53, 67 } accept
        }
    }
Each plane ends with its own drop rule, so traffic arriving on wg2 is dropped before a rule for any other plane can accept it. A compromised metrics collector cannot reach SSH, and a compromised replication peer cannot reach Prometheus. The nftables iifname selector makes this trivial to express and trivial to audit: one line per permission, one interface per trust domain. Read the full WireGuard plane architecture in the WireGuard Masterclass.
8. Connection tracking (ct)
Connection tracking (conntrack) is the subsystem that makes stateful firewalling possible. It tracks every active TCP connection, UDP "connection" (by 5-tuple), and ICMP exchange, so that reply packets can be matched back to the originating flow and allowed through without a specific rule.
Connection states
| State | Meaning | Typical action |
|---|---|---|
| new | First packet of a new connection (e.g., TCP SYN) | Accept if the destination is allowed |
| established | Packet part of an already-accepted connection | Accept unconditionally |
| related | New connection related to an existing one (FTP data, ICMP error) | Accept unconditionally |
| invalid | Packet that doesn't match any known connection and isn't new | Drop immediately (before other rules) |
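The typical actions in the table can be collapsed into a single verdict-map lookup on ct state. A minimal sketch, assuming a throwaway table name (example_ct) and output path that are not part of the original ruleset; the script only writes the file so it can be checked as root with nft -c -f.

```shell
#!/bin/sh
# Sketch: handle conntrack states with one verdict map instead of
# separate rules. Table name and output path are assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-ct-vmap-example.nft"

cat > "$out" <<'EOF'
table inet example_ct {
    chain input {
        type filter hook input priority 0; policy drop;

        iifname "lo" accept

        # established and related are accepted, invalid is dropped.
        # "new" is deliberately absent from the map, so new packets fall
        # through and are accepted only if a later rule allows them.
        ct state vmap { established : accept, related : accept, invalid : drop }

        tcp dport 22 accept
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

This behaves like the separate ct state invalid drop and ct state established,related accept rules used elsewhere in this guide; which spelling to use is a matter of taste.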
Connection tracking helpers
Some protocols (FTP, SIP, TFTP) open secondary connections during the session. conntrack helpers parse the control channel to learn which secondary connection is coming, then mark it as related so it can be accepted automatically:
    # Enable FTP helper (parses PORT/PASV commands to allow data connections)
    table inet filter {
        ct helper ftp-21 {
            type "ftp" protocol tcp
            l3proto ip
        }

        chain prerouting_helpers {
            type filter hook prerouting priority -150; policy accept;
            tcp dport 21 ct helper set "ftp-21"
        }
    }
Tuning conntrack
    # View current conntrack table size and usage
    sysctl net.netfilter.nf_conntrack_max
    sysctl net.netfilter.nf_conntrack_count

    # Increase conntrack table for a high-NAT KVM host (VMs + containers)
    echo "net.netfilter.nf_conntrack_max = 524288" >> /etc/sysctl.d/99-conntrack.conf

    # Reduce TCP timeout for faster table turnover under load
    echo "net.netfilter.nf_conntrack_tcp_timeout_established = 1800" >> /etc/sysctl.d/99-conntrack.conf

    # Apply both settings (run this after the file contains every line)
    sysctl -p /etc/sysctl.d/99-conntrack.conf

    # Check for conntrack exhaustion
    dmesg | grep "nf_conntrack: table full"
Bypassing conntrack
    # Skip conntrack for high-throughput forwarded traffic (e.g., storage replication)
    # Reduces CPU overhead when you don't need stateful tracking for that flow
    table inet raw {
        chain prerouting {
            type filter hook prerouting priority -300; policy accept;
            iifname "wg3" notrack    # bypass conntrack for ZFS replication plane
        }
    }
Raise nf_conntrack_max before the table fills. The memory cost is low (roughly 300 bytes per entry, so about 150 MiB for 524288 entries). Set it proactively on any host doing significant NAT.
9. Rate limiting and traffic shaping
nftables rate limiting runs in the kernel at packet time, before any userspace application sees the traffic. Packets over the limit are discarded without waking a socket or a process, which makes it cheaper than application-level rate limiting and independent of the application's health.
Per-rule rate limiting
    # Limit new SSH connections: 3 per minute (brute-force protection)
    tcp dport 22 ct state new limit rate 3/minute burst 5 packets accept
    tcp dport 22 ct state new drop    # drop anything over the limit

    # Limit ICMP ping rate (prevent ICMP flood)
    ip protocol icmp icmp type echo-request limit rate 10/second burst 20 packets accept
    ip protocol icmp icmp type echo-request drop

    # Limit DNS queries (UDP); useful on resolvers under query flood
    udp dport 53 limit rate 100/second accept
    udp dport 53 drop
Per-source-IP rate limiting with meters
    # Per-source-IP SSH rate limit: each IP gets its own 3/minute budget
    # Uses a dynamic meter (hash map keyed by source IP)
    tcp dport 22 ct state new \
        meter ssh_rate { ip saddr limit rate 3/minute burst 5 packets } accept
    tcp dport 22 ct state new drop

    # SYN flood mitigation: limit new TCP connections per source
    tcp flags syn ct state new \
        meter syn_flood { ip saddr limit rate 20/second burst 50 packets } accept
    tcp flags syn ct state new \
        log prefix "syn-flood: " drop

    # Auto-add SSH brute-force sources to the blocklist set
    tcp dport 22 ct state new \
        meter ssh_brute { ip saddr limit rate 5/minute burst 10 packets } accept
    tcp dport 22 ct state new \
        add @blocklist { ip saddr timeout 1h } drop
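The same per-source budget can also be written with a named dynamic set, a form the nft man page uses in its set statement examples. The advantage is that the state lives in a set you can list and flush by name. A minimal sketch, assuming a hypothetical table name (example_ratelimit), set name (ssh_rate), and output path; the 10-minute timeout and the size are arbitrary values chosen for the example:

```shell
#!/bin/sh
# Sketch: per-source SSH rate limit using a named dynamic set.
# Table name, set name, timeout, size, and output path are assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-dynset-ratelimit-example.nft"

cat > "$out" <<'EOF'
table inet example_ratelimit {
    # One entry per source address, created from the packet path.
    # Entries expire 10 minutes after creation; size bounds memory use.
    set ssh_rate {
        type ipv4_addr
        size 65536
        flags dynamic
        timeout 10m
    }

    chain input {
        type filter hook input priority 0; policy accept;

        # Each source gets its own 3/minute budget (burst 5). Packets
        # over the budget match "limit rate over" and are dropped.
        tcp dport 22 ct state new add @ssh_rate { ip saddr limit rate over 3/minute burst 5 packets } drop

        # New SSH connections still within budget reach this rule.
        tcp dport 22 ct state new accept
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

Once loaded, nft list set inet example_ratelimit ssh_rate shows which sources currently hold an entry, which is harder to see with an anonymous meter.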
10. Logging and debugging
Inspect the live ruleset
    # Show every table, chain, set, and rule currently loaded
    nft list ruleset

    # Show a specific table
    nft list table inet filter

    # Show a specific chain
    nft list chain inet filter input

    # Show a specific set
    nft list set inet filter blocklist

    # Watch ruleset changes in real time (shows add/delete events)
    nft monitor
Counters
    # Add counters to rules to see packet/byte hit counts
    tcp dport 22 counter accept
    ct state established,related counter accept
    counter drop    # count everything that hits the default drop

    # Named counters: declared in the table and readable by name, so they
    # outlive edits to individual rules (a flush ruleset still removes them)
    counter ssh_accepted {}
    tcp dport 22 counter name ssh_accepted accept

    # Read a named counter from the shell
    nft list counter inet filter ssh_accepted
Logging
    # Log dropped packets with a prefix (visible in dmesg and journalctl)
    log prefix "nft-drop: " flags all drop

    # Log and accept (log is non-terminal, so evaluation continues to accept).
    # The match must come before log, otherwise every packet is logged.
    tcp dport 22 log prefix "nft-ssh: " accept

    # Log with level (emerg, alert, crit, err, warn, notice, info, debug)
    tcp flags syn ct state new log level warn prefix "syn-flood: " drop

    # View the logs
    journalctl -k | grep "nft-"
    dmesg | grep "nft-"
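A log rule on the default drop path can flood the kernel log during a scan. Putting a limit in front of the log statement caps the log volume without changing what gets dropped. A minimal sketch, assuming a hypothetical table name (example_log), an arbitrary 10/minute budget, and a made-up output path:

```shell
#!/bin/sh
# Sketch: rate-limited logging on the drop path.
# Table name, rate, and output path are illustrative assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-log-limit-example.nft"

cat > "$out" <<'EOF'
table inet example_log {
    chain input {
        type filter hook input priority 0; policy drop;

        iifname "lo" accept
        ct state established,related accept

        # Log at most 10 packets per minute (burst of 5). The rule has
        # no verdict, so evaluation continues with the next rule.
        limit rate 10/minute burst 5 packets log prefix "nft-drop: " flags all

        # Everything that reaches this point is counted and dropped,
        # whether or not it was logged.
        counter drop
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

The counter on the final rule still records every dropped packet, so the rate limit costs you log lines, not visibility into totals.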
nftrace — packet path tracing
    # Enable tracing for specific packets (e.g., trace SSH traffic from 10.0.0.5)
    # Add this rule at the TOP of your input chain (before other rules)
    ip saddr 10.0.0.5 tcp dport 22 meta nftrace set 1

    # Then watch the trace output in another terminal
    nft monitor trace

    # Output shows every rule the packet matches:
    # trace id 3a1b2c4d inet filter input rule tcp dport 22 accept (verdict accept)
    # trace id 3a1b2c4d inet filter input verdict accept

    # When done, remove the trace rule
    # (nft -a list chain inet filter input prints the rule handles)
    nft delete rule inet filter input handle <handle-number>
Add meta nftrace set 1 to a rule that matches the traffic you want to trace, then run nft monitor trace. You see the packet's path through the entire ruleset, rule by rule, chain by chain, with the final verdict. iptables offered only a TRACE target that wrote to the kernel log, so the usual workflow was: add a LOG rule, send traffic, check dmesg, make a guess, repeat. With nftrace you see exactly which rule made which decision. Use it first, not last.
11. Persistence and automation
Atomic ruleset replacement
    # Load entire ruleset from file (atomic: all-or-nothing, no partial state)
    nft -f /etc/nftables.conf

    # Test syntax without loading
    nft -c -f /etc/nftables.conf

    # Save current live ruleset to file
    nft list ruleset > /etc/nftables.conf

    # Enable at boot
    systemctl enable nftables
    systemctl start nftables

    # Reload after changing the conf file
    systemctl reload nftables
The nftables.conf file must start by flushing the existing ruleset, or rules will accumulate on each reload:
    # Always start your conf file with this
    flush ruleset

    # Then define your tables
    table inet filter { ... }
    table inet nat { ... }
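flush ruleset removes every table on the host, including any created by Docker, Podman, libvirt, or firewalld. On a host where those tools manage their own tables, a narrower option is to reset only the table you own. A minimal sketch, assuming a hypothetical table name (host_fw) and output path; the rules inside are placeholders:

```shell
#!/bin/sh
# Sketch: replace one table atomically without touching other tables.
# The table name host_fw and the output path are assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-scoped-reload-example.nft"

cat > "$out" <<'EOF'
# Create the table if it does not exist (a no-op when it does), then
# delete it, so the definition below always starts from an empty table.
# Tables owned by other tools are left alone.
add table inet host_fw
delete table inet host_fw

table inet host_fw {
    chain input {
        type filter hook input priority 0; policy drop;

        iifname "lo" accept
        ct state established,related accept
        tcp dport 22 accept
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

The add-then-delete pair avoids an error on first load, when the table does not exist yet. The whole file is still applied as one transaction by nft -f, so there is no window in which the table is missing.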
Salt and Ansible automation
    # Ansible: template the ruleset and reload atomically
    # tasks/nftables.yml
    - name: Deploy nftables ruleset
      template:
        src: nftables.conf.j2
        dest: /etc/nftables.conf
        mode: '0600'
      notify: reload nftables

    - name: Validate nftables syntax
      command: nft -c -f /etc/nftables.conf
      changed_when: false

    # Salt: managed file + service reload
    nftables_conf:
      file.managed:
        - name: /etc/nftables.conf
        - source: salt://firewall/nftables.conf.jinja
        - template: jinja
        - mode: 600

    nftables_reload:
      cmd.wait:
        - name: systemctl reload nftables
        - watch:
          - file: nftables_conf
Template the file, validate it with nft -c -f, then load. The entire operation is safe to run on a running system with active connections.
12. Complete kldload KVM host ruleset
A full production-ready nftables.conf for a kldload KVM host: WireGuard planes for management, metrics, and replication; bridged VMs with NAT; SSH brute-force protection; and counters for all critical paths.
    # /etc/nftables.conf: kldload KVM host
    # Interfaces:
    #   eth0   - public internet uplink
    #   wg1    - management plane (SSH, admin access)
    #   wg2    - metrics plane (Prometheus scrape)
    #   wg3    - replication plane (ZFS syncoid)
    #   virbr0 - KVM VM bridge (192.168.122.0/24)
    #
    # Assumes WireGuard is already configured. See the WireGuard Masterclass.

    flush ruleset

    # ============================================================
    # Filter table: host firewall
    # ============================================================
    table inet filter {

        # Dynamic blocklist: IPs added at runtime, expire after 1 hour
        set blocklist {
            type ipv4_addr
            flags dynamic, timeout
            timeout 1h
        }

        # Input chain: packets destined for this host
        chain input {
            type filter hook input priority 0; policy drop;

            # Fundamentals
            iifname "lo" accept
            ct state invalid drop
            ct state established,related counter accept
            ip protocol icmp accept
            ip6 nexthdr icmpv6 accept

            # Drop known-bad sources immediately
            ip saddr @blocklist counter drop

            # eth0 (public uplink): WireGuard only, drop everything else hard
            iifname "eth0" udp dport 51820 counter accept
            iifname "eth0" counter drop

            # wg1 (management): SSH with rate limiting and auto-block
            iifname "wg1" tcp dport 22 ct state new \
                meter ssh_mgmt { ip saddr limit rate 5/minute burst 10 packets } \
                counter accept
            iifname "wg1" tcp dport 22 ct state new \
                add @blocklist { ip saddr } \
                log prefix "ssh-brute: " drop
            iifname "wg1" counter drop

            # wg2 (metrics): Prometheus exporters only
            iifname "wg2" tcp dport { 9100, 9134, 9586 } counter accept
            iifname "wg2" counter drop

            # wg3 (replication): ZFS replication port only
            iifname "wg3" tcp dport 8023 counter accept
            iifname "wg3" counter drop

            # virbr0 (VM bridge): DHCP and DNS for guests
            iifname "virbr0" udp dport { 53, 67 } accept
            iifname "virbr0" tcp dport 53 accept

            # Log and count all unexpected drops
            log prefix "nft-input-drop: " counter drop
        }

        # Forward chain: VM traffic routed through this host
        chain forward {
            type filter hook forward priority 0; policy drop;

            ct state invalid drop
            ct state established,related counter accept

            # Allow VMs on virbr0 to reach the internet
            iifname "virbr0" oifname "eth0" counter accept
            iifname "eth0" oifname "virbr0" ct state established,related accept

            # Allow VMs to reach WireGuard-connected resources
            iifname "virbr0" oifname "wg1" accept
            iifname "virbr0" oifname "wg2" accept
            iifname "virbr0" oifname "wg3" accept

            log prefix "nft-fwd-drop: " counter drop
        }

        # Output chain: packets originating from this host
        chain output {
            type filter hook output priority 0; policy accept;
            # Accept all outbound; tighten per your threat model
        }
    }

    # ============================================================
    # NAT table: masquerade for VMs
    # ============================================================
    table inet nat {
        chain prerouting {
            type nat hook prerouting priority -100; policy accept;
            # Add DNAT rules here as needed, e.g.:
            # iifname "eth0" tcp dport 8080 dnat ip to 192.168.122.10:80
        }

        chain postrouting {
            type nat hook postrouting priority 100; policy accept;

            # Masquerade VM traffic going out the public interface
            ip saddr 192.168.122.0/24 oifname "eth0" masquerade

            # Masquerade WireGuard traffic if site-to-site masquerade is needed
            iifname "wg0" oifname "eth0" masquerade
        }
    }
Deploy it:
    # Test syntax first
    nft -c -f /etc/nftables.conf

    # Load atomically
    nft -f /etc/nftables.conf

    # Verify
    nft list ruleset

    # Enable at boot
    systemctl enable --now nftables
13. Troubleshooting
Common errors
"Error: table already exists"
You tried to create a table that already exists without flushing first. Either start your .conf file with flush ruleset, or delete the table first: nft delete table inet filter. If you're adding to an existing ruleset, use nft add table (idempotent) instead of nft create table.
"Operation not permitted"
nftables requires CAP_NET_ADMIN. Run as root, or use sudo. If you're running inside a container, the container needs CAP_NET_ADMIN granted explicitly. Rootless containers cannot modify the host nftables ruleset — only their own network namespace.
Rules not matching — iifname check
Interface names must match exactly, including case. Check with ip link show. Common mistake: writing "eth0" when the interface is actually "ens3", "enp2s0", or "bond0". WireGuard interfaces are named exactly as you configured them in wg-quick — check /etc/wireguard/.
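The interface-name check can be automated by comparing every name a ruleset file mentions against the interfaces that actually exist. A minimal sketch: the function name list_unknown_ifaces is hypothetical, and it assumes interface names appear in double quotes after iifname or oifname, as they do throughout this guide.

```shell
#!/bin/sh
# Sketch: report interface names used in a ruleset file that do not
# exist on this host. The function name is a made-up helper.
#
# usage: list_unknown_ifaces CONF [NETDIR]
#   CONF    ruleset file to scan
#   NETDIR  directory with one entry per interface (default /sys/class/net)
list_unknown_ifaces() {
    conf="$1"
    netdir="${2:-/sys/class/net}"

    # Pull out every quoted name after iifname/oifname, de-duplicate,
    # and print the ones with no matching entry in NETDIR.
    # Wildcard names such as "wg*" are reported too, since no interface
    # has that literal name.
    grep -oE '[io]ifname "[^"]+"' "$conf" | cut -d'"' -f2 | sort -u |
    while IFS= read -r name; do
        [ -e "$netdir/$name" ] || printf '%s\n' "$name"
    done
}
```

Run it as list_unknown_ifaces /etc/nftables.conf. Empty output means every referenced interface exists right now; a name in the output is either a typo or an interface that has not come up yet, such as a WireGuard tunnel that starts later in boot.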
Rules not matching — chain hook priority
If two chains are attached to the same hook, the one with the lower priority number runs first. NAT prerouting should use priority -100 (before the default 0). Raw/notrack should use priority -300. If your DNAT isn't working, check that it runs before the filter chain.
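Recent nft releases also accept symbolic priority names, which removes the need to remember the numbers. A minimal sketch, assuming a hypothetical table name (example_prio), made-up chain names, and an output path chosen for the example; the numeric values in the comments are the ones quoted above:

```shell
#!/bin/sh
# Sketch: the same hook ordering written with symbolic priority names.
# Table name, chain names, and output path are illustrative assumptions.
set -eu

out="${TMPDIR:-/tmp}/nft-priority-example.nft"

cat > "$out" <<'EOF'
table inet example_prio {
    chain notrack_pre {
        type filter hook prerouting priority raw; policy accept;     # -300
    }

    chain dnat_pre {
        type nat hook prerouting priority dstnat; policy accept;     # -100
    }

    chain filter_in {
        type filter hook input priority filter; policy accept;       # 0
    }

    chain snat_post {
        type nat hook postrouting priority srcnat; policy accept;    # 100
    }
}
EOF

echo "wrote $out (check with: nft -c -f $out)"
```

If nft -c -f rejects the names, the installed nft predates them and the numeric form is the fallback.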
Migration from iptables
    # iptables-translate: convert a single iptables rule to nftables syntax
    iptables-translate -A INPUT -p tcp --dport 22 -j ACCEPT
    # output: nft add rule ip filter INPUT tcp dport 22 counter accept

    # ip6tables-translate for IPv6 rules
    ip6tables-translate -A INPUT -p tcp --dport 22 -j ACCEPT

    # iptables-restore-translate: convert an entire saved iptables ruleset
    # (-f names the iptables-save dump to read; the nft ruleset goes to stdout.
    # The dump path below is only an example.)
    iptables-save > /tmp/iptables.rules
    iptables-restore-translate -f /tmp/iptables.rules > /etc/nftables.conf

    # If you're using firewalld and want to see what nftables rules it generates
    nft list ruleset
    # firewalld creates its own tables (e.g., "firewalld" table)
    # Your hand-written tables coexist; both are evaluated

    # Check whether iptables is actually iptables-nft (translation layer)
    iptables --version
    # "iptables v1.8.x (nf_tables)" = iptables-nft, translating to nftables
    # "iptables v1.8.x (legacy)" = iptables-legacy, direct xtables
Debug checklist
- Is the rule loaded? Run nft list ruleset. Check the exact chain, exact rule text.
- Is the interface name right? Run ip link show. Interface names are case-sensitive and must match exactly.
- Which rule dropped the packet? Add meta nftrace set 1 to a matching rule, then run nft monitor trace.
- Is conntrack interfering? Check ct state: is the packet arriving as invalid? Run conntrack -L to inspect the table.
- Is another table also processing the packet? nft list ruleset shows all tables. Docker, Podman, and libvirt all write their own. A drop in any base chain is final, while an accept only ends evaluation of that chain, so your forward chain drop policy can block traffic their tables accepted.
- Is IP forwarding enabled? sysctl net.ipv4.ip_forward must be 1 for NAT and VM routing to work.
- Is conntrack full? Run dmesg | grep nf_conntrack. If you see "table full", increase nf_conntrack_max.