ZFS Zero to Hero
Complete operational guide — from empty disk to replicating datasets between nodes. Every command, every option, every config. No shortcuts.
Works on CentOS/RHEL and Debian. Commands are identical on both.
Part 1: Pools
Create a pool
# Single disk (no redundancy)
zpool create -o ashift=12 -O compression=lz4 -O acltype=posixacl -O xattr=sa -O relatime=on rpool /dev/sda
# Mirror (2 disks, survives 1 failure)
zpool create -o ashift=12 -O compression=lz4 -O acltype=posixacl -O xattr=sa rpool mirror /dev/sda /dev/sdb
# 3-way mirror (3 disks, survives 2 failures)
zpool create -o ashift=12 -O compression=lz4 rpool mirror /dev/sda /dev/sdb /dev/sdc
# RAIDZ1 (3+ disks, 1 parity, survives 1 failure)
zpool create -o ashift=12 -O compression=lz4 rpool raidz1 /dev/sda /dev/sdb /dev/sdc
# RAIDZ2 (4+ disks, 2 parity, survives 2 failures)
zpool create -o ashift=12 -O compression=lz4 rpool raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# RAIDZ3 (5+ disks, 3 parity, survives 3 failures)
zpool create -o ashift=12 -O compression=lz4 rpool raidz3 /dev/sd{a,b,c,d,e}
# Striped mirrors (4 disks, 2 mirror vdevs, fast + redundant)
zpool create -o ashift=12 -O compression=lz4 rpool \
mirror /dev/sda /dev/sdb \
mirror /dev/sdc /dev/sdd
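The examples above use /dev/sdX names for brevity, but those can change between boots. For production pools, the stable /dev/disk/by-id paths are safer; the disk IDs below are placeholders, substitute your own:
ls -l /dev/disk/by-id/ | grep -v part   # stable identifiers for whole disks
zpool create -o ashift=12 -O compression=lz4 rpool \
    mirror /dev/disk/by-id/ata-DISK_SERIAL_1 /dev/disk/by-id/ata-DISK_SERIAL_2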
Pool creation options explained
| Option | Value | Why |
|---|---|---|
| ashift=12 | 4K sector alignment | Matches modern drives. Never use 9 (512-byte sectors). Use 13 for some NVMe. |
| compression=lz4 | Fast, ~1.5-2x ratio | Always on. Zero reason to disable. |
| acltype=posixacl | POSIX ACLs | Required for systemd, containers, most apps. |
| xattr=sa | Store xattrs in dnodes | Faster than directory-based xattrs. |
| relatime=on | Relaxed atime updates | Reduces write amplification. |
| normalization=formD | Unicode normalization | Consistent filename handling. |
| dnodesize=auto | Variable dnode size | Better metadata performance. |
| autotrim=on | Automatic TRIM | For SSDs. Omit for spinning rust. |
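Unsure which ashift a drive needs? Check the reported sector sizes first (smartctl comes from the smartmontools package, which may need installing):
lsblk -o NAME,PHY-SEC,LOG-SEC          # physical vs logical sector size per device
smartctl -i /dev/sda | grep -i sector
# 4096-byte physical sectors mean ashift=12; many drives report 512 even when the
# media is 4K, which is why 12 is the safe default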
Pool operations
# List pools
zpool list
zpool list -v # verbose — shows vdev layout
# Pool health + config
zpool status rpool
zpool status -v rpool # verbose — shows individual disk status
# Pool I/O stats (live, 2 second interval)
zpool iostat rpool 2
zpool iostat -v rpool 2 # per-vdev breakdown
# Pool history (every command ever run on this pool)
zpool history rpool
zpool history rpool | tail -20 # last 20 commands
# Scrub (verify all checksums — run weekly)
zpool scrub rpool
zpool status rpool | grep scan # check scrub progress
# Import / export
zpool export rpool # detach pool (for migration or unmount)
zpool import # list available pools
zpool import rpool # re-import
zpool import -d /dev/disk/by-id rpool # import by disk ID (more reliable)
# Upgrade pool features
zpool upgrade rpool
# Destroy pool (DESTRUCTIVE)
zpool destroy rpool
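To actually run that weekly scrub without remembering it, a minimal cron entry works; the binary path below assumes /usr/sbin, and some distros already ship a scrub timer:
# /etc/cron.d/zpool-scrub: scrub every Sunday at 02:00
0 2 * * 0 root /usr/sbin/zpool scrub rpool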
Add devices to existing pool
# Add a mirror vdev (expand capacity)
zpool add rpool mirror /dev/sde /dev/sdf
# Add a cache device (L2ARC — read cache on SSD)
zpool add rpool cache /dev/nvme0n1
# Add a log device (SLOG — synchronous write log)
zpool add rpool log mirror /dev/nvme1n1 /dev/nvme2n1   # mirror across two separate devices
# Add a special vdev (metadata + small blocks on fast storage)
zpool add rpool special mirror /dev/nvme0n1p4 /dev/nvme1n1p4
zfs set special_small_blocks=64K rpool
# Replace a failed disk
zpool replace rpool /dev/sda /dev/sdg
zpool status rpool # watch resilver progress
# Remove a device (works for cache, log, special, and mirror/stripe top-level vdevs, not raidz)
zpool remove rpool /dev/nvme0n1
# Take a device offline / online
zpool offline rpool /dev/sda
zpool online rpool /dev/sda
Part 2: Datasets
Create datasets
# Basic dataset
zfs create rpool/data
# With mountpoint
zfs create -o mountpoint=/srv/app rpool/srv/app
# With compression
zfs create -o mountpoint=/srv/logs -o compression=zstd rpool/srv/logs
# With quota (limit size)
zfs create -o mountpoint=/home/alice -o quota=50G rpool/home/alice
# With reservation (guaranteed space)
zfs create -o mountpoint=/srv/db -o reservation=100G rpool/srv/db
# With recordsize tuned for workload
zfs create -o mountpoint=/srv/postgres -o recordsize=8k rpool/srv/postgres # PostgreSQL
zfs create -o mountpoint=/srv/mysql -o recordsize=16k rpool/srv/mysql # MySQL
zfs create -o mountpoint=/srv/media -o recordsize=1M rpool/srv/media # large files
# Non-mountable (container for child datasets)
zfs create -o canmount=off -o mountpoint=none rpool/ROOT
# Encrypted dataset
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase rpool/srv/secrets
# All options at once
zfs create \
-o mountpoint=/srv/production \
-o compression=lz4 \
-o quota=500G \
-o reservation=200G \
-o recordsize=128k \
-o atime=off \
-o logbias=throughput \
rpool/srv/production
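The encrypted dataset created above prompts for its passphrase at creation time. After a reboot the key must be loaded again before the dataset can mount:
zfs get keystatus rpool/srv/secrets     # "unavailable" until the key is loaded
zfs load-key rpool/srv/secrets          # prompts for the passphrase
zfs mount rpool/srv/secrets
# to lock it again:
zfs unmount rpool/srv/secrets
zfs unload-key rpool/srv/secrets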
Dataset properties
# List all datasets
zfs list
zfs list -r rpool # recursive from rpool
zfs list -o name,used,avail,compress,mountpoint # custom columns
# Get a property
zfs get compression rpool/data
zfs get all rpool/data # all properties
zfs get compressratio rpool # how much compression is saving
# Set a property
zfs set compression=zstd rpool/srv/archive
zfs set quota=100G rpool/home/alice
zfs set atime=off rpool/srv/database
zfs set recordsize=8k rpool/srv/postgres
# Inherit from parent
zfs inherit compression rpool/data
# Mount / unmount
zfs mount rpool/data
zfs unmount rpool/data
zfs mount -a # mount all datasets
Dataset properties reference
| Property | Values | Use case |
|---|---|---|
| compression | lz4, zstd, gzip-9, off | lz4 for general, zstd for archives, off for pre-compressed |
| recordsize | 4k–1M | 8k=PostgreSQL, 16k=MySQL, 128k=general, 1M=media |
| quota | size or none | Limit dataset size |
| reservation | size or none | Guarantee space for dataset |
| atime | on, off | off for databases and containers |
| logbias | latency, throughput | throughput for sequential writes |
| sync | standard, always, disabled | disabled only if you accept data loss |
| canmount | on, off, noauto | noauto for boot environments |
| mountpoint | path or none | Where the dataset mounts |
| encryption | aes-256-gcm, off | Per-dataset encryption |
| dedup | on, off, verify | WARNING: uses massive RAM. Usually not worth it. |
| snapdir | hidden, visible | visible exposes .zfs/snapshot to users |
| special_small_blocks | 0–1M | Route small blocks to special vdev |
Part 3: Snapshots
Create snapshots
# Single dataset
zfs snapshot rpool/data@mysnap
# With timestamp
zfs snapshot rpool/data@$(date +%Y%m%d-%H%M%S)
# Recursive (all child datasets)
zfs snapshot -r rpool@full-backup-$(date +%Y%m%d)
# Multiple datasets
zfs snapshot rpool/home@backup rpool/srv@backup rpool/var/log@backup
List snapshots
# All snapshots
zfs list -t snapshot
# With size and creation date
zfs list -t snapshot -o name,used,refer,creation -S creation
# Snapshots for a specific dataset
zfs list -t snapshot -r rpool/home
# Count snapshots
zfs list -t snapshot -H | wc -l
# Space used by snapshots
zfs get usedbysnapshots rpool
Access snapshot data
# Browse snapshot contents (without rollback)
ls /home/.zfs/snapshot/
ls /home/.zfs/snapshot/mysnap/alice/documents/
# Make .zfs directory visible
zfs set snapdir=visible rpool/home
# Copy a file from a snapshot
cp /home/.zfs/snapshot/mysnap/alice/important.txt /home/alice/important.txt
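To see what changed since a snapshot without browsing .zfs by hand, zfs diff compares a snapshot against the live dataset, or against another snapshot:
zfs diff rpool/home@mysnap                  # changes since @mysnap (M=modified, +=added, -=removed, R=renamed)
zfs diff rpool/home@snap1 rpool/home@snap2  # changes between two snapshots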
Rollback
# Rollback to most recent snapshot
zfs rollback rpool/data@mysnap
# Rollback destroying intermediate snapshots
zfs rollback -r rpool/data@old-snapshot
# Rollback destroying intermediate snapshots AND clones
zfs rollback -rR rpool/data@old-snapshot
Destroy snapshots
# Single snapshot
zfs destroy rpool/data@mysnap
# Range of snapshots
zfs destroy rpool/data@snap1%snap5
# All snapshots matching a pattern
zfs list -t snapshot -H -o name | grep "auto-" | xargs -n1 zfs destroy
# Destroy recursively
zfs destroy -r rpool@full-backup-20260322
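A common pruning pattern is keeping only the newest N snapshots of a dataset. A minimal sketch, assuming snapshot names contain "auto-"; tools like sanoid (covered in Part 6) handle this more robustly:
# keep the 7 newest auto- snapshots of rpool/data, destroy everything older
zfs list -H -t snapshot -o name -S creation -d 1 rpool/data \
  | grep "@auto-" \
  | tail -n +8 \
  | xargs -r -n1 zfs destroy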
Part 4: Clones
Create clones
# Snapshot first (required — clones come from snapshots)
zfs snapshot rpool/srv/production@clone-src
# Clone
zfs clone rpool/srv/production@clone-src rpool/srv/staging
# Clone starts at near-zero space
zfs list rpool/srv/staging # USED will be ~0
Clone properties
# Clone inherits parent properties but can be changed
zfs set mountpoint=/srv/staging rpool/srv/staging
zfs set quota=50G rpool/srv/staging
# Check clone origin
zfs get origin rpool/srv/staging
Promote a clone
# Make the clone independent (no longer depends on origin snapshot)
zfs promote rpool/srv/staging
# Now the original depends on the clone's snapshot
# The clone becomes the "real" dataset
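You can confirm the dependency swap:
zfs get origin rpool/srv/staging      # now "-" (independent of any snapshot)
zfs get origin rpool/srv/production   # now points at rpool/srv/staging@clone-src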
Destroy a clone
# Must destroy the clone before the origin snapshot
zfs destroy rpool/srv/staging
zfs destroy rpool/srv/production@clone-src
Part 5: Boot Environments
How they work
Boot environments are ZFS datasets under rpool/ROOT/.
ZFSBootMenu detects them and lets you choose which one to boot.
# Current boot environment
zpool get bootfs rpool
# List all boot environments
zfs list -r rpool/ROOT -o name,used,mountpoint,creation
# The active one has mountpoint=/
zfs get mountpoint rpool/ROOT/default
Create a boot environment
# Snapshot the current root
zfs snapshot rpool/ROOT/default@before-upgrade
# Clone it as a new BE
zfs clone rpool/ROOT/default@before-upgrade rpool/ROOT/safe-rollback
Switch boot environment
# Set which BE to boot next
zpool set bootfs=rpool/ROOT/safe-rollback rpool
# Reboot into it
reboot
# At the ZFSBootMenu screen, you can also select BEs interactively
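After the reboot, confirm which BE is actually running:
findmnt -no SOURCE /    # shows the dataset mounted at /
zpool get bootfs rpool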
Rollback a broken upgrade
# Option 1: from command line (if you can still boot)
# bootfs must name a dataset, not a snapshot; point it at the BE cloned from the pre-upgrade snapshot
zpool set bootfs=rpool/ROOT/safe-rollback rpool
reboot
# Option 2: from kldload live ISO
krecovery import rpool
krecovery list-be
krecovery activate rpool/ROOT/default@before-upgrade
reboot
Part 6: Replication
Local replication (to a backup disk)
# Create a backup pool on a second disk
zpool create backup /dev/sdb
# Full initial send
zfs snapshot -r rpool@backup-initial
zfs send -R rpool@backup-initial | zfs receive -F backup/rpool
# Incremental daily send
zfs snapshot -r rpool@backup-day2
zfs send -R -i rpool@backup-initial rpool@backup-day2 | zfs receive -F backup/rpool
# Verify
zfs list -r backup/rpool
Remote replication (over SSH)
# Full send to remote host
zfs snapshot -r rpool@replicate
zfs send -R rpool@replicate | ssh backup-server zfs receive -F tank/backup/rpool
# Incremental
zfs snapshot -r rpool@replicate-2
zfs send -R -i rpool@replicate rpool@replicate-2 | ssh backup-server zfs receive -F tank/backup/rpool
# Compressed transfer
zfs send -R rpool@replicate | zstd -3 | ssh backup-server "zstd -d | zfs receive -F tank/backup"
# With bandwidth limit (10MB/s)
zfs send -R rpool@replicate | pv -L 10m | ssh backup-server zfs receive -F tank/backup
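Large initial sends over a slow link can die halfway. OpenZFS supports resumable receives: pass -s on the receive, and if the transfer breaks the target records a resume token you can feed back to zfs send. A per-dataset sketch; replace the placeholder token with the real value:
zfs send rpool/srv/data@replicate | ssh backup-server zfs receive -s -F tank/backup/data
# if the connection drops, fetch the resume token from the target...
ssh backup-server zfs get -H -o value receive_resume_token tank/backup/data
# ...and resume the send from the source with it
zfs send -t <resume-token> | ssh backup-server zfs receive -s -F tank/backup/data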
Replication over WireGuard
This is where kldload shines — two kldloadOS nodes with WireGuard form a private encrypted channel. Replication traffic never touches the public internet.
# Setup: Node A (10.200.0.1) and Node B (10.200.0.2) connected via wg0
# On Node A: send to Node B over the WireGuard tunnel
zfs snapshot -r rpool@replicate
zfs send -R rpool@replicate | ssh 10.200.0.2 zfs receive -F rpool-backup
# Incremental replication (daily cron job)
zfs snapshot -r rpool@daily-$(date +%Y%m%d)
PREV=$(zfs list -t snapshot -H -o name -S creation | grep "rpool@daily-" | sed -n '2p')
zfs send -R -i "$PREV" rpool@daily-$(date +%Y%m%d) | \
ssh 10.200.0.2 zfs receive -F rpool-backup
Automated replication with syncoid
# Install syncoid (part of sanoid, pre-installed on kldloadOS free)
# syncoid handles incremental tracking automatically
# Replicate a dataset
syncoid rpool/srv/data backup-server:tank/backup/data
# Replicate recursively
syncoid -r rpool backup-server:tank/backup/rpool
# Replicate over WireGuard
syncoid -r rpool 10.200.0.2:rpool-backup
# Dry run (show what would be sent)
syncoid -r --no-sync-snap --dryrun rpool backup-server:tank/backup
# Cron job — every hour
echo '0 * * * * root syncoid -r rpool 10.200.0.2:rpool-backup' >> /etc/crontab
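syncoid only moves snapshots; its companion sanoid creates and prunes them on a schedule. A minimal /etc/sanoid/sanoid.conf sketch, where the dataset name and retention counts are examples to adapt:
[rpool/srv]
        use_template = production
        recursive = yes

[template_production]
        hourly = 24
        daily = 14
        monthly = 3
        autosnap = yes
        autoprune = yes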
Replication patterns
Pattern 1: Push backup (A → B)
Node A pushes snapshots to Node B.
# On Node A (cron)
syncoid -r rpool nodeB:tank/backup
Pattern 2: Pull backup (B pulls from A)
Node B pulls snapshots from Node A. Better for security — backup server initiates.
# On Node B (cron)
syncoid -r nodeA:rpool tank/backup
Pattern 3: Bidirectional (A ↔ B)
Both nodes replicate to each other. Different datasets in each direction.
# On Node A
syncoid rpool/srv/app nodeB:rpool/srv/app-replica
# On Node B
syncoid rpool/srv/db nodeA:rpool/srv/db-replica
Pattern 4: Fan-out (A → B, C, D)
One source replicates to multiple targets.
# On Node A
for target in nodeB nodeC nodeD; do
syncoid -r rpool/srv/data ${target}:tank/backup/data &
done
wait
Pattern 5: Chain (A → B → C)
A replicates to B, B replicates to C. Geographic distribution.
# On Node A
syncoid -r rpool nodeB:tank/replica
# On Node B
syncoid -r tank/replica nodeC:tank/offsite
Part 7: Two-Node Setup (Complete Example)
Build two kldloadOS nodes, connect them with WireGuard, and replicate data between them.
Step 1: Install both nodes
Boot the kldload ISO on two machines. Install with Server profile.
- Node A: hostname node-a, IP 10.100.10.10
- Node B: hostname node-b, IP 10.100.10.20
Step 2: Set up WireGuard
On Node A:
umask 077
wg genkey | tee /etc/wireguard/private.key | wg pubkey > /etc/wireguard/public.key
cat /etc/wireguard/public.key # copy this
On Node B:
umask 077
wg genkey | tee /etc/wireguard/private.key | wg pubkey > /etc/wireguard/public.key
cat /etc/wireguard/public.key # copy this
Node A — /etc/wireguard/wg0.conf:
[Interface]
Address = 10.200.0.1/24
ListenPort = 51820
PrivateKey = <node-a-private-key>
[Peer]
PublicKey = <node-b-public-key>
AllowedIPs = 10.200.0.2/32
Endpoint = 10.100.10.20:51820
PersistentKeepalive = 25
Node B — /etc/wireguard/wg0.conf:
[Interface]
Address = 10.200.0.2/24
ListenPort = 51820
PrivateKey = <node-b-private-key>
[Peer]
PublicKey = <node-a-public-key>
AllowedIPs = 10.200.0.1/32
Endpoint = 10.100.10.10:51820
PersistentKeepalive = 25
Both nodes:
systemctl enable --now wg-quick@wg0
ping 10.200.0.2 # from Node A
ping 10.200.0.1 # from Node B
Step 3: Set up SSH keys
# On Node A
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id admin@10.200.0.2
# On Node B
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id admin@10.200.0.1
Step 4: Create application datasets
On Node A:
zfs create -o mountpoint=/srv/app rpool/srv/app
zfs create -o mountpoint=/srv/db -o recordsize=8k rpool/srv/db
echo "production data" > /srv/app/config.txt
Step 5: Initial replication
# On Node A — full send to Node B over WireGuard
zfs snapshot -r rpool/srv@initial
# -u on the receive: the -R stream carries the source mountpoints, so don't let the
# replica auto-mount over Node B's own /srv paths
zfs send -R rpool/srv@initial | ssh 10.200.0.2 zfs receive -u -F rpool/srv-replica
Verify on Node B:
zfs list -r rpool/srv-replica
zfs set mountpoint=/srv-replica/app rpool/srv-replica/app   # give the replica its own path
zfs mount rpool/srv-replica/app
cat /srv-replica/app/config.txt # should show "production data"
Step 6: Incremental replication
# On Node A — make changes
echo "updated config" > /srv/app/config.txt
echo "new data" > /srv/db/records.csv
# Snapshot and send incremental
zfs snapshot -r rpool/srv@update1
zfs send -R -i rpool/srv@initial rpool/srv@update1 | \
ssh 10.200.0.2 zfs receive -F rpool/srv-replica
Verify on Node B:
cat /srv-replica/app/config.txt # should show "updated config"
Step 7: Automate with syncoid
# On Node A — set up hourly replication
cat > /etc/cron.d/zfs-replicate << 'EOF'
0 * * * * root syncoid -r rpool/srv 10.200.0.2:rpool/srv-replica 2>&1 | logger -t zfs-replicate
EOF
Step 8: Failover
If Node A dies, Node B has the replica:
# On Node B
zfs set mountpoint=/srv/app rpool/srv-replica/app
zfs set mountpoint=/srv/db rpool/srv-replica/db
zfs mount -a
# Node B is now serving production data
# When Node A recovers, reverse the replication direction
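A sketch of that reversal, assuming Node A is back and its stale rpool/srv has been rolled back (or destroyed) to the last snapshot both sides still share:
# On Node B — push the now-authoritative replica back to Node A
syncoid -r rpool/srv-replica 10.200.0.1:rpool/srv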
Part 8: Monitoring
# Pool health (add to monitoring)
zpool status -x # only shows pools with problems
# Space usage
zfs list -o name,used,avail,refer,compressratio
# Snapshot space
zfs get usedbysnapshots rpool
# ARC stats
arc_summary # if available
cat /proc/spl/kstat/zfs/arcstats | grep -E "^hits|^misses|^size|^c_max"
# ARC hit rate calculation
awk '/^hits/{h=$3} /^misses/{m=$3} END{printf "ARC hit rate: %.1f%%\n", h/(h+m)*100}' /proc/spl/kstat/zfs/arcstats
# I/O latency (with eBPF)
zfsslower 1 # operations slower than 1ms
biolatency # block device latency histogram
# Prometheus node_exporter ZFS metrics
curl -s localhost:9100/metrics | grep zfs
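For a cheap alert hook, wrap zpool status -x in a script and run it from cron; the script path and log tag below are arbitrary:
#!/bin/sh
# /usr/local/bin/zfs-health: log an error if any pool is degraded or faulted
STATUS=$(zpool status -x)
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | logger -t zfs-health -p daemon.err
fi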
Quick Reference
| I want to… | Command |
|---|---|
| Create a pool | zpool create -o ashift=12 -O compression=lz4 rpool mirror /dev/sda /dev/sdb |
| Create a dataset | zfs create -o mountpoint=/srv/app rpool/srv/app |
| Snapshot everything | zfs snapshot -r rpool@$(date +%Y%m%d-%H%M%S) |
| List snapshots | zfs list -t snapshot -o name,used,creation -S creation |
| Rollback | zfs rollback rpool/srv/app@before-change |
| Clone | zfs snapshot rpool/x@src && zfs clone rpool/x@src rpool/x-clone |
| Replicate to remote | zfs send -R rpool@snap \| ssh remote zfs receive -F tank/backup |
| Incremental replicate | zfs send -R -i @snap1 rpool@snap2 \| ssh remote zfs receive -F tank/backup |
| Automated replication | syncoid -r rpool remote:tank/backup |
| Pool health | zpool status rpool |
| Scrub | zpool scrub rpool |
| Check compression | zfs get compressratio rpool |
| Boot environment | zfs snapshot rpool/ROOT/default@safe && zpool set bootfs=rpool/ROOT/default rpool |