Snapshots & Replication — the killer feature.
ZFS snapshots are instantaneous, read-only, point-in-time copies of a dataset. They consume
almost no space at creation and grow only as the live data diverges. Combined with
zfs send/recv, snapshots become the foundation for block-level incremental
replication that is typically faster and more space-efficient than file-level
backup tools. This is the single feature that makes ZFS worth the complexity.
Snapshots are NOT backups.
Snapshots protect against accidental deletion and logical corruption.
They do NOT protect against hardware failure, pool corruption, or site-wide disasters.
If the pool fails, all snapshots are lost with it.
Always use zfs send/recv to replicate snapshots to a separate system.
A snapshot on the same pool is an undo button, not disaster recovery.
How snapshots work — copy-on-write
ZFS uses a copy-on-write (COW) transactional model. When you write new data, ZFS never overwrites existing blocks. It writes new blocks to free space, then atomically updates the block pointer tree to reference the new location. The old blocks remain on disk, unreferenced by the live filesystem — but still referenced by any snapshot that existed at the time.
This is why snapshots are instant: creating a snapshot simply freezes the current block pointer tree. No data blocks are copied; the only cost is a small metadata transaction. Snapshots only consume space when the live filesystem overwrites or deletes data that the snapshot still references, because those old blocks cannot be freed until the snapshot is destroyed.
Creating snapshots
Snapshot names follow the format dataset@snapname. The name after @
is arbitrary, but a consistent naming convention saves you when you have thousands.
Use timestamps, purpose labels, or both.
# Create a single snapshot
zfs snapshot rpool/srv/data@before-upgrade
# Snapshot with a timestamp name
zfs snapshot rpool/srv/data@$(date +%Y-%m-%d_%H%M%S)
# Recursive snapshot — snapshots every child dataset too
zfs snapshot -r rpool/home@nightly-2026-04-04
# Multiple datasets in one atomic operation
zfs snapshot rpool/srv/db@pre-migration rpool/srv/app@pre-migration
The -r flag is your friend for consistent backups.
If you snapshot rpool/home without -r, child datasets like
rpool/home/todd are not included. You'll discover this the hard way when you
try to restore and half your data is missing. Always use -r for trees.
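A naming convention is easiest to keep when a single helper builds every name. The sketch below assumes a hypothetical snap_name function and a label plus UTC timestamp layout; neither is part of ZFS. It only prints the command, so nothing is created until you drop the echo.

```shell
#!/bin/sh
# Hypothetical helper: build "dataset@label-YYYY-MM-DD_HHMMSS" using UTC time.
snap_name() {
    dataset=$1
    label=$2
    printf '%s@%s-%s\n' "$dataset" "$label" "$(date -u +%Y-%m-%d_%H%M%S)"
}

# Print (not run) a recursive snapshot command for a dataset tree.
echo zfs snapshot -r "$(snap_name rpool/home nightly)"
```

Fixed-width timestamps like this sort the same way alphabetically and chronologically, which keeps listings and scripts predictable.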
Listing and inspecting snapshots
# List all snapshots, sorted by creation time
zfs list -t snapshot -o name,creation,used,refer -s creation
# List snapshots for a specific dataset
zfs list -t snapshot -r rpool/srv/data
# List snapshots with space accounting details
zfs list -t snapshot -o name,used,written,refer -r rpool/srv/data
# Count snapshots per dataset
zfs list -t snapshot -o name | awk -F@ '{print $1}' | sort | uniq -c | sort -rn
# Show only snapshots consuming more than 1GB
zfs list -t snapshot -o name,used -s used -r rpool | awk '$2 ~ /[0-9].*[GT]/'
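Matching on the G or T suffix works for a quick look, but scripts are usually safer with exact byte counts. The sketch below assumes input in the tab-separated form that zfs list -Hp -t snapshot -o name,used prints (name, then bytes); big_snapshots is a hypothetical helper name and the sample values are made up.

```shell
#!/bin/sh
# Keep only snapshots whose USED exceeds a byte threshold (default 1 GiB).
# Input: "name<TAB>used-bytes" lines on stdin.
big_snapshots() {
    min_bytes=${1:-1073741824}
    awk -F '\t' -v min="$min_bytes" \
        '$2 + 0 > min + 0 { printf "%s\t%.1f GiB\n", $1, $2 / 1073741824 }'
}

# Sample input standing in for real zfs output.
printf 'rpool/srv/data@monday\t12884901888\nrpool/srv/data@tuesday\t524288\n' | big_snapshots
```

On a live system the function would be fed with zfs list -Hp -t snapshot -o name,used -r rpool instead of the printf line.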
The .zfs/snapshot/ hidden directory at the root of every dataset lets you
browse snapshot contents without rolling back. This is read-only access —
you can copy files out, diff against current, or let users self-recover deleted files.
# Browse a snapshot (read-only, no rollback)
ls /srv/data/.zfs/snapshot/before-upgrade/
# Recover a single file from a snapshot
cp /home/todd/.zfs/snapshot/nightly-2026-04-04/important.doc /home/todd/
# Diff between a snapshot and live data
diff -r /srv/data/.zfs/snapshot/before-upgrade/config/ /srv/data/config/
The .zfs directory doesn't show up in ls -a
by default. It's a virtual directory that ZFS injects. You have to access it directly:
ls /data/.zfs/snapshot/. If you want it visible in directory listings, set
zfs set snapdir=visible rpool/data. Most people leave it hidden so automated
tools (rsync, find, backup agents) don't accidentally traverse every snapshot.
Snapshot space accounting — USED vs REFER vs WRITTEN
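Snapshot space accounting confuses everyone. The three properties you need to understand: USED is the space held only by this snapshot (what destroying it alone would free), REFER is the total data reachable through the snapshot, and WRITTEN is the data written between the previous snapshot and this one.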
# Show all three properties
zfs list -t snapshot -o name,used,refer,written -r rpool/srv/data
# Example output (illustrative values):
# NAME USED REFER WRITTEN
# rpool/srv/data@monday 12G 450G 450G
# rpool/srv/data@tuesday 1.2G 452G 14G
# rpool/srv/data@wednesday 0B 455G 4.2G
#
# monday: 12G of blocks are unique to this snapshot (freeable)
# monday WRITTEN equals REFER because it is the first snapshot
# tuesday: 1.2G unique, most blocks shared with monday or wednesday
# wednesday: 0B unique, live dataset still references same blocks
# wednesday WRITTEN 4.2G: 4.2G was written between tuesday and wednesday
Comparing snapshots with zfs diff
zfs diff shows file-level changes between two snapshots, or between a snapshot
and the live dataset. The output uses single-character prefixes:
+ (created), - (removed), M (modified),
R (renamed).
# Changes between two snapshots
zfs diff rpool/srv/data@monday rpool/srv/data@tuesday
# Changes from a snapshot to the live dataset
zfs diff rpool/srv/data@before-upgrade
# Example output:
# M /srv/data/config/app.conf
# + /srv/data/config/new-feature.conf
# - /srv/data/tmp/old-cache.db
# R /srv/data/logs/app.log -> /srv/data/logs/app.log.1
This is invaluable for forensics. After an incident, zfs diff tells you exactly
what changed — which files were modified, deleted, or created. No audit daemon required.
The information comes directly from the block pointer tree.
Destroying snapshots
# Destroy a single snapshot
zfs destroy rpool/srv/data@before-upgrade
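# Destroy all snapshots matching a pattern (zfs destroy has no wildcards, so filter the list; dry run first)
zfs list -H -t snapshot -o name -d 1 rpool/srv/data | grep '@autosnap_' | xargs -r -n1 zfs destroy -nv
zfs list -H -t snapshot -o name -d 1 rpool/srv/data | grep '@autosnap_' | xargs -r -n1 zfs destroy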
# Destroy a range of snapshots (inclusive)
zfs destroy rpool/srv/data@monday%wednesday
# Recursive destroy — all datasets in the tree
zfs destroy -r rpool/home@old-snap
# Deferred destroy — mark for deletion, freed when no longer referenced
zfs destroy -d rpool/srv/data@held-snap
The % range syntax destroys every snapshot from the first name through the second, inclusive, in creation order (not alphabetical order).
The -n (dry run) and -v (verbose) flags are essential —
always preview before bulk-destroying snapshots. You cannot undo a zfs destroy.
Use zfs list -t snapshot -o name,used -s used to find the
actual space hogs before destroying anything.
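For simple count-based retention, the preview step can be scripted. This is a minimal sketch, assuming snapshot names arrive oldest first (the order zfs list -H -t snapshot -o name -s creation produces); prune_commands is a hypothetical helper, and it only prints dry-run commands rather than destroying anything.

```shell
#!/bin/sh
# Read snapshot names oldest-first on stdin and print a dry-run destroy
# command for everything except the newest $1 snapshots.
prune_commands() {
    keep=$1
    awk -v keep="$keep" '
        { names[NR] = $0 }
        END {
            for (i = 1; i <= NR - keep; i++)
                print "zfs destroy -nv " names[i]
        }'
}

# Sample names standing in for real zfs list output.
printf '%s\n' pool/data@a pool/data@b pool/data@c pool/data@d | prune_commands 2
```

Review the printed commands, then rerun them without -n once the list looks right. Held snapshots will still refuse to be destroyed.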
Snapshot holds
A hold prevents a snapshot from being destroyed. This is critical for
replication workflows: you don't want a retention policy pruning a snapshot that's still
needed as the incremental base for the next zfs send.
# Place a hold on a snapshot
zfs hold keep rpool/srv/data@important-snap
# List holds
zfs holds rpool/srv/data@important-snap
# Attempt to destroy a held snapshot — will fail
zfs destroy rpool/srv/data@important-snap
# cannot destroy 'rpool/srv/data@important-snap': dataset is busy
# Release the hold, then destroy
zfs release keep rpool/srv/data@important-snap
zfs destroy rpool/srv/data@important-snap
# Recursive hold on all datasets in a tree
zfs hold -r replication rpool/home@nightly-2026-04-04
zrepl manages holds automatically, and Syncoid can do so when run with its --use-hold option. If you're building custom replication scripts, always hold the base snapshot before sending, and release it only after confirming the receive succeeded.
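For a custom script, the hold discipline might look like the sketch below. It holds the new snapshot before sending (it becomes the next incremental base) and releases the previous base only if the whole pipeline succeeds. The function names, the repl hold tag, and the host and dataset arguments are illustrative assumptions. DRY_RUN defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default); otherwise run it with pipefail
# so a failed zfs send also fails the pipeline.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        bash -o pipefail -c "$*"
    fi
}

# Usage: replicate_held BASE_SNAPSHOT NEW_SNAPSHOT HOST DEST_DATASET
replicate_held() {
    base=$1 new=$2 host=$3 dest=$4
    run "zfs hold repl $new" &&
    run "zfs send -i $base $new | ssh $host zfs recv $dest" &&
    run "zfs release repl $base"
}

replicate_held rpool/srv/data@monday rpool/srv/data@tuesday backup-host tank/srv/data
```

Because the steps are chained with &&, a failed send or receive leaves the old base held, so the next run can still send incrementally from it.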
Rollback — rewinding the filesystem
zfs rollback reverts a dataset to the exact state of a snapshot.
All data written after that snapshot is permanently destroyed.
There is no undo for a rollback.
# Rollback to the most recent snapshot (no intermediate snapshots are destroyed)
zfs rollback rpool/srv/data@before-upgrade
# Rollback to an older snapshot — requires -r to destroy intermediates
zfs rollback -r rpool/srv/data@monday
# WARNING: this destroys all snapshots between @monday and now
# Rollback including clones of intermediate snapshots — nuclear option
zfs rollback -rR rpool/srv/data@last-known-good
Rollback with -r is destructive.
Without -r, ZFS only allows rollback to the most recent snapshot.
If you need to go further back, -r destroys every snapshot between the target
and now. If any of those snapshots have clones, you need -rR which also destroys
the clones. Always snapshot the current state before rolling back so you have
a way forward if the rollback was wrong.
# Safe rollback pattern: snapshot current state first
zfs snapshot rpool/srv/data@before-rollback-$(date +%s)
zfs rollback -r rpool/srv/data@known-good
I rarely use zfs rollback.
It's a blunt instrument. Instead, I clone the snapshot, test the old state, and if it's
what I need, I promote the clone. Or I just copy files out of .zfs/snapshot/.
Rollback destroys data. Clones and copies don't. Use rollback only when you're certain
everything after the snapshot is garbage.
Clones & promotion
A clone is a writable copy of a snapshot. Like a snapshot, it shares all blocks with the original — a clone of a 500GB snapshot uses near-zero extra space until you start writing. Clones are full datasets: they can be mounted, snapshotted, and served just like any other ZFS dataset.
# Clone a snapshot into a new dataset
zfs clone rpool/srv/data@before-upgrade rpool/srv/data-test
# The clone is writable and mountable immediately
ls /srv/data-test/
echo "test change" > /srv/data-test/canary.txt
# Clone has a dependency: you cannot destroy the origin snapshot
zfs destroy rpool/srv/data@before-upgrade
# cannot destroy: snapshot has dependent clones
# Promote the clone — it becomes the independent dataset
zfs promote rpool/srv/data-test
# Now rpool/srv/data depends on rpool/srv/data-test, not the reverse
Promotion reverses the parent-child relationship. After promotion, the clone becomes the independent dataset and the original becomes the dependent. This is how you "branch" a filesystem: clone, test changes, promote if they work. The original can then be destroyed if no longer needed.
zfs send / receive — block-level replication
zfs send serializes a snapshot (or the delta between two snapshots) into a
byte stream. zfs receive consumes that stream and reconstructs the dataset.
This operates at the block level: it doesn't traverse the directory
tree or open files. That usually makes it faster than file-level tools (rsync, tar, cp), particularly for incremental transfers and trees with many small files.
Full send
# Full send to a local pool
zfs send rpool/srv/data@baseline | zfs recv backup/srv/data
# Full send to a remote machine over SSH
zfs send rpool/srv/data@baseline | ssh backup-host "zfs recv tank/srv/data"
# Recursive send — includes all child datasets and their snapshots
zfs send -R rpool/srv@baseline | ssh backup-host "zfs recv -F tank/srv"
# With progress reporting via pv
zfs send -R rpool/srv@baseline | pv -rtab | ssh backup-host "zfs recv -F tank/srv"
Incremental send
Incremental sends transmit only the blocks that changed between two snapshots. This is the core of efficient replication — the first send is large (full dataset), but every subsequent send is just the delta.
# Incremental send: only blocks changed between monday and tuesday
zfs send -i rpool/srv/data@monday rpool/srv/data@tuesday | \
ssh backup-host "zfs recv tank/srv/data"
# Incremental with -I: includes all intermediate snapshots
zfs send -I rpool/srv/data@monday rpool/srv/data@friday | \
ssh backup-host "zfs recv tank/srv/data"
# Recursive incremental
zfs send -R -i rpool/srv@monday rpool/srv@tuesday | \
ssh backup-host "zfs recv -F tank/srv"
Use -F on the receive side to force overwrite. Add -L (large blocks) on the send side when recordsize is set above 128K (e.g., 1M for sequential workloads).
The -i vs -I distinction matters more
than you'd think. If you use -i (lowercase) and have taken multiple snapshots since
the last sync, you'll send only the final delta but the intermediate snapshots won't exist on
the receiver. This means you can't use those intermediates as a base for future incrementals.
-I (uppercase) sends all intermediates and is almost always what you want.
The common flags combo for production replication: zfs send -R -w -c -L.
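Choosing the incremental base comes down to finding the newest snapshot that exists on both sides. The sketch below is one way to do that by name, assuming two files that list snapshot names oldest first; latest_common is a hypothetical helper. Name matching is a simplification: comparing each snapshot's guid property is more robust, since two unrelated snapshots can share a name.

```shell
#!/bin/sh
# Print the newest snapshot name present in both lists.
# $1: file of source snapshot names, oldest first
# $2: file of destination snapshot names
latest_common() {
    src_list=$1 dst_list=$2
    awk '
        NR == FNR { on_dst[$0] = 1; next }   # first file: destination names
        ($0 in on_dst) { last = $0 }         # second file: source, oldest first
        END { if (last != "") print last }
    ' "$dst_list" "$src_list"
}

# Sample lists standing in for `zfs list -H -t snapshot -o name -s creation`.
src=$(mktemp) dst=$(mktemp)
printf '%s\n' monday tuesday wednesday friday > "$src"
printf '%s\n' monday tuesday > "$dst"
base=$(latest_common "$src" "$dst")
echo "zfs send -I rpool/srv/data@$base rpool/srv/data@friday"
rm -f "$src" "$dst"
```

If the function prints nothing, there is no common snapshot and only a full send can re-establish the chain.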
Resumable send / receive
Large sends over unreliable networks (WAN, satellite, VPN) can fail midway. OpenZFS 0.7+ supports resumable send via tokens. If a receive is interrupted, ZFS records how far it got. You can resume from that point instead of starting over.
# Start a receive with -s to enable resume tokens
zfs send -R rpool/srv@snap | ssh remote "zfs recv -s -F tank/srv"
# If interrupted, check for a resume token on the receiver
ssh remote "zfs get receive_resume_token tank/srv"
# Resume the send using the token
token=$(ssh remote "zfs get -H -o value receive_resume_token tank/srv")
zfs send -t "$token" | ssh remote "zfs recv -s -F tank/srv"
# Abort a partial receive and discard the token
ssh remote "zfs recv -A tank/srv"
The resume token already encodes the original send options (-c, -L, -w, etc.),
so you cannot change them when resuming: resume with zfs send -t alone, or abort and restart with the new flags.
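A replication script can branch on the token value before each run. This is a minimal sketch, assuming the token was fetched with zfs get -H -o value receive_resume_token, which prints a single dash when no partial receive exists. next_send_command is a hypothetical helper and the token string in the demo is made up.

```shell
#!/bin/sh
# Print the send command to use next: resume if a token exists, else start fresh.
# $1: value of receive_resume_token ("-" or empty means none)
# $2: snapshot to send when starting from scratch
next_send_command() {
    token=$1 snapshot=$2
    if [ -n "$token" ] && [ "$token" != "-" ]; then
        echo "zfs send -t $token"
    else
        echo "zfs send -R $snapshot"
    fi
}

next_send_command "-" rpool/srv@snap
next_send_command "1-abc-example" rpool/srv@snap
```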
Encrypted replication
The -w (raw) flag sends encrypted datasets as ciphertext. The receiving side
stores the encrypted blocks without ever seeing the plaintext. This enables replication to
untrusted backup servers, cloud storage, or off-site hosts where you don't control physical
security.
# Source has encrypted dataset
zfs get encryption rpool/srv/secrets
# NAME PROPERTY VALUE SOURCE
# rpool/srv/secrets encryption aes-256-gcm -
# Raw send — ciphertext only
zfs send -w rpool/srv/secrets@snap | ssh untrusted "zfs recv tank/secrets"
# The receiver has the data but cannot read it
ssh untrusted "zfs mount tank/secrets"
# cannot mount: encryption key not loaded
# Incremental raw send
zfs send -w -i rpool/srv/secrets@snap1 rpool/srv/secrets@snap2 | \
ssh untrusted "zfs recv tank/secrets"
Raw send works with both incremental and full sends. The receiver sees encrypted blocks,
compressed blocks (if -c is also used), and nothing else. Properties that
reveal data structure (like used) are still visible, but actual file contents
are opaque.
Bookmarks
A bookmark is a lightweight reference to a snapshot's transaction group (TXG) that persists after the snapshot is destroyed. Bookmarks take zero space. Their purpose: serve as the base for incremental sends even after the source snapshot has been pruned.
# Create a bookmark from a snapshot
zfs bookmark rpool/srv/data@monday rpool/srv/data#monday
# List bookmarks
zfs list -t bookmark -r rpool/srv/data
# Now you can destroy the snapshot — the bookmark remains
zfs destroy rpool/srv/data@monday
# Incremental send using the bookmark as the base
zfs send -i rpool/srv/data#monday rpool/srv/data@tuesday | \
ssh backup "zfs recv tank/srv/data"
# Destroy a bookmark
zfs destroy rpool/srv/data#monday
The workflow: take a snapshot, replicate it, create a bookmark, destroy the local snapshot to free space, keep the bookmark as the incremental base. The remote still has the full snapshot. Next time, send incrementally from the bookmark to the new snapshot. This is how you keep the source system lean while maintaining an unbroken replication chain.
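The cycle described above can be written down as a dry-run sketch. The function name and its arguments are illustrative assumptions, and the function only prints the commands in order so the sequence can be reviewed before anything is run.

```shell
#!/bin/sh
# Print one bookmark-based replication cycle.
# $1 dataset, $2 previous bookmark name, $3 new snapshot name,
# $4 backup host, $5 destination dataset
bookmark_cycle() {
    ds=$1 prev=$2 new=$3 host=$4 dest=$5
    echo "zfs snapshot ${ds}@${new}"
    echo "zfs send -i ${ds}#${prev} ${ds}@${new} | ssh ${host} zfs recv ${dest}"
    echo "zfs bookmark ${ds}@${new} ${ds}#${new}"
    echo "zfs destroy ${ds}@${new}"
}

bookmark_cycle rpool/srv/data monday tuesday backup tank/srv/data
```

In a real script each step would need to succeed before the next runs; in particular, the local snapshot should only be destroyed after the receive and the bookmark creation have both been confirmed.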
Performance tuning for send / receive
# Use mbuffer to smooth I/O and add progress reporting
zfs send -R rpool/srv@snap | mbuffer -s 128k -m 1G | \
ssh backup "mbuffer -s 128k -m 1G | zfs recv -F tank/srv"
# Use pv for simple progress and throughput display
zfs send -R rpool/srv@snap | pv -rtab | ssh backup "zfs recv -F tank/srv"
# Compress the stream in transit (when source data is uncompressed)
zfs send rpool/srv@snap | lz4 | ssh backup "lz4 -d | zfs recv tank/srv"
# Use pigz for multi-threaded compression
zfs send rpool/srv@snap | pigz -3 | ssh backup "pigz -d | zfs recv tank/srv"
# Limit bandwidth to avoid saturating the link
zfs send rpool/srv@snap | pv -L 50M | ssh backup "zfs recv tank/srv"
| Scenario | Recommended pipeline | Notes |
|---|---|---|
| LAN (1–10 Gbps) | zfs send -c -L \| ssh \| zfs recv | Use -c to skip recompression. SSH is the bottleneck; consider ssh -c aes128-gcm@openssh.com for a faster cipher. |
| WAN (slow link) | zfs send -c \| lz4 \| mbuffer \| ssh \| mbuffer \| lz4 -d \| zfs recv -s | Compress in transit. Use mbuffer on both ends. Enable resume tokens (zfs recv -s). |
| Initial seed (very large) | Physical transport (disk ship) | zfs send -R > /mnt/transport/seed.zfs: send to a portable drive, ship it, zfs recv on the other end. Continue with incremental sends once online. |
| Encrypted to untrusted | zfs send -w -c -L \| ssh \| zfs recv | Raw send. Data stays encrypted in transit and at rest on the receiver. |
Use ssh -c aes128-gcm@openssh.com or chacha20-poly1305@openssh.com
for up to 2–3x throughput over slower ciphers. On a 10G LAN, SSH itself becomes the bottleneck before the disks do.
If you're replicating locally between pools on the same machine, skip SSH entirely:
zfs send | zfs recv.
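Whether to seed over the network or ship a disk is mostly arithmetic. The sketch below estimates transfer time from dataset size and link speed, assuming the link is the only bottleneck and ignoring protocol overhead and compression. transfer_hours is a hypothetical helper.

```shell
#!/bin/sh
# Estimate hours needed to move SIZE_GIB over a link of MBIT_PER_S.
# Usage: transfer_hours SIZE_GIB MBIT_PER_S
transfer_hours() {
    awk -v gib="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", gib * 1073741824 * 8 / (mbps * 1000000) / 3600 }'
}

transfer_hours 10240 100     # 10 TiB over a 100 Mbit/s WAN link
transfer_hours 10240 10000   # 10 TiB over a 10 Gbit/s LAN
```

When the first figure runs to many days, physical transport for the initial seed followed by incremental sends is usually the more practical route.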
Automation: Sanoid & Syncoid
Manual snapshots and replication are fine for one-off operations, but production systems need automated retention. Sanoid manages snapshot creation and pruning. Syncoid manages replication. Together they replace complex cron scripts with a declarative config.
Sanoid — automated snapshot management
Defines snapshot policies per dataset via a simple INI config. Automatically creates and prunes snapshots based on retention rules. Runs via cron or systemd timer.
# /etc/sanoid/sanoid.conf
[rpool/home]
use_template = production
recursive = yes
[rpool/srv/data]
use_template = production
[rpool/var/log]
use_template = short-retention
autosnap = yes
autoprune = yes
[template_production]
autosnap = yes
autoprune = yes
hourly = 48
daily = 30
weekly = 8
monthly = 12
yearly = 2
[template_short-retention]
hourly = 24
daily = 7
weekly = 0
monthly = 0
yearly = 0
# Run sanoid manually (usually runs via cron every 15 minutes)
sanoid --cron
# Dry run: --readonly simulates, --verbose shows what would be created/pruned
sanoid --cron --readonly --verbose
# Monitor sanoid status
sanoid --monitor-snapshots --monitor-health
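It helps to know how many snapshots a template will keep per dataset before applying it recursively. The sketch below sums the retention counts of one section of a sanoid.conf read from stdin. max_snapshots is a hypothetical helper, and it only understands the five retention keys used in this guide.

```shell
#!/bin/sh
# Sum hourly/daily/weekly/monthly/yearly counts for one config section.
# Usage: max_snapshots SECTION_NAME < sanoid.conf
max_snapshots() {
    awk -F '=' -v want="[$1]" '
        /^\[/ { active = ($0 == want); next }
        active && $1 ~ /^[ \t]*(hourly|daily|weekly|monthly|yearly)[ \t]*$/ { total += $2 }
        END { print total + 0 }'
}

max_snapshots template_production <<'EOF'
[template_production]
autosnap = yes
autoprune = yes
hourly = 48
daily = 30
weekly = 8
monthly = 12
yearly = 2
EOF
```

For the production template above this prints 100, so a recursive policy over 50 datasets settles at roughly 5,000 snapshots once retention fills up.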
Syncoid — automated replication
Wraps zfs send/recv for secure, incremental replication over SSH. Automatically
determines the common snapshot, sends the incremental delta, and handles resume tokens.
One command replaces pages of shell scripting.
# Replicate a single dataset
syncoid rpool/srv/data backup-host:tank/srv/data
# Recursive replication of an entire tree
syncoid --recursive rpool/home backup-host:tank/home
# With compressed send and no-sync-snap (use existing sanoid snapshots)
syncoid --recursive --no-sync-snap --sendoptions="-w -c -L" \
rpool/srv backup-host:tank/srv
# Exclude specific datasets
syncoid --recursive --exclude="rpool/tmp" --exclude="rpool/cache" \
rpool backup-host:tank
zrepl — daemon-based replication
For complex setups: bi-directional sync, resume tokens, network drop resilience, many-to-many replication. YAML config. Runs as a daemon with built-in monitoring endpoints. Ideal for managing many ZFS hosts at scale.
# /etc/zrepl/zrepl.yml (simplified push job)
jobs:
  - name: "push-to-backup"
    type: push
    connect:
      type: ssh+stdinserver
      host: backup-host
      user: root
      identity_file: /root/.ssh/zrepl_key
    filesystems:
      "rpool/srv<": true
      "rpool/tmp": false
    snapshotting:
      type: periodic
      interval: 15m
      prefix: zrepl_
    pruning:
      keep_sender:
        - type: not_replicated
        - type: last_n
          count: 10
      keep_receiver:
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d
Boot environments
A boot environment is a snapshot + clone of the root filesystem that you can boot into. Before a kernel upgrade, OS update, or risky configuration change, create a boot environment. If the update breaks the system, reboot into the previous environment. This is the ZFS equivalent of VM snapshots, but for bare metal.
# Create a boot environment before a major update
zfs snapshot rpool/ROOT/centos@pre-kernel-update
zfs clone rpool/ROOT/centos@pre-kernel-update rpool/ROOT/centos-rollback
# If the update breaks things, set the bootfs property and reboot
zpool set bootfs=rpool/ROOT/centos-rollback rpool
reboot
# With a boot environment manager such as zectl (example on a dnf-based system):
zectl create pre-upgrade
dnf update -y
# If broken:
zectl activate pre-upgrade
reboot
# List boot environments
zectl list
kldload configures ZFS-on-root with a rpool/ROOT/<distro> dataset structure
specifically to enable boot environments. The bootloader (systemd-boot or GRUB) reads the
bootfs pool property to determine which dataset to mount as /.
See the Boot Chain page
for full details.
Real-world scenarios
Ransomware recovery
Ransomware encrypts your files. With hourly ZFS snapshots, you roll back to the last clean snapshot and lose at most one hour of work. Ransomware running as an ordinary user cannot encrypt or alter snapshots because they are read-only at the kernel level. An attacker with root could still destroy them, which is one more reason to replicate to a separate system.
# Find the last clean snapshot (check timestamps vs. infection time)
zfs list -t snapshot -o name,creation -r rpool/srv/data | grep "2026-04-04"
# Verify it's clean
ls /srv/data/.zfs/snapshot/autosnap_2026-04-04_09:00:00_hourly/
# Roll back
zfs rollback -r rpool/srv/data@autosnap_2026-04-04_09:00:00_hourly
Database-consistent snapshots
ZFS snapshots are crash-consistent (equivalent to pulling the power plug). For true application consistency, freeze the database before snapshotting.
# PostgreSQL: checkpoint + snapshot
psql -c "CHECKPOINT;"
zfs snapshot rpool/srv/pgdata@consistent-$(date +%s)
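# MySQL/MariaDB: flush + lock + snapshot + unlock
# The read lock is released when the client session ends, so the snapshot
# is taken from inside the same session via the client's "system" command.
mysql <<EOF
FLUSH TABLES WITH READ LOCK;
system zfs snapshot rpool/srv/mysql@consistent-$(date +%s)
UNLOCK TABLES;
EOF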
# For ext4/XFS on a zvol: fsfreeze (blocks all I/O during snapshot); ZFS datasets themselves generally reject fsfreeze
fsfreeze --freeze /srv/data
zfs snapshot rpool/srv/data@frozen-$(date +%s)
fsfreeze --unfreeze /srv/data
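# Script pattern for automated consistent snapshots (PostgreSQL 14 and earlier;
# PostgreSQL 15+ renamed these functions to pg_backup_start/pg_backup_stop,
# which must both be called from a single session)
#!/bin/bash
psql -c "SELECT pg_start_backup('zfs-snap', true);" 2>/dev/null
zfs snapshot -r rpool/srv/pgdata@backup-$(date +%Y%m%d-%H%M%S)
psql -c "SELECT pg_stop_backup();" 2>/dev/null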
Dev/test branching with clones
Clone production data for development without doubling storage. Each developer gets a writable copy that shares blocks with the original.
# Snapshot production
zfs snapshot rpool/srv/app@dev-branch
# Create per-developer clones
zfs clone rpool/srv/app@dev-branch rpool/dev/alice
zfs clone rpool/srv/app@dev-branch rpool/dev/bob
zfs clone rpool/srv/app@dev-branch rpool/dev/carol
# Each clone starts identical, diverges independently
# Total extra space: only the sum of changes across all clones
# When done, destroy dev clones
zfs destroy rpool/dev/alice
zfs destroy rpool/dev/bob
zfs destroy rpool/dev/carol
zfs destroy rpool/srv/app@dev-branch
Migration via send / receive
Moving a dataset to a new server, new pool, or new datacenter. With -R, send/receive carries everything: data, snapshots, properties, permissions, ACLs, xattrs.
# Full migration to a new server
zfs snapshot -r rpool/srv@migrate
zfs send -R -w -c -L rpool/srv@migrate | \
ssh new-server "zfs recv -F tank/srv"
# Incremental catch-up (run until cutover)
zfs snapshot -r rpool/srv@migrate-final
zfs send -R -I rpool/srv@migrate rpool/srv@migrate-final | \
ssh new-server "zfs recv -F tank/srv"
# Physical transport for initial seed (sneakernet)
zfs send -R rpool/srv@migrate > /mnt/usb/srv-seed.zfs
# Ship the drive, then on the new server:
zfs recv -F tank/srv < /mnt/usb/srv-seed.zfs
# Then incremental sync over the network to catch up
Disaster recovery
Full-site DR with automated replication to a remote datacenter.
# Cron job: replicate every 15 minutes to DR site (a crontab entry must stay on one line)
*/15 * * * * syncoid --recursive --no-sync-snap --sendoptions="-w -c -L" rpool/srv dr-host:tank/srv
# On DR failover: import the pool and adjust mountpoints
zpool import tank
zfs set mountpoint=/srv tank/srv
# Service is back online with at most 15 minutes of data loss (RPO=15m)
# Test DR regularly: clone on the DR side, boot a test VM from it
ssh dr-host "zfs clone tank/srv/app@latest tank/dr-test/app"
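An RPO target is only meaningful if something checks it. The sketch below reports whether the newest snapshot on the DR side is older than a threshold. It assumes tab-separated name and creation epoch input, the form zfs list -Hp -t snapshot -o name,creation prints; rpo_check is a hypothetical helper and the sample timestamps are made up.

```shell
#!/bin/sh
# Check the age of the newest snapshot against a maximum age in seconds.
# Usage: rpo_check MAX_AGE_SECONDS NOW_EPOCH < "name<TAB>creation-epoch" lines
rpo_check() {
    max_age=$1 now=$2
    awk -F '\t' -v max="$max_age" -v now="$now" '
        $2 + 0 > newest { newest = $2 + 0; name = $1 }
        END {
            if (name == "") { print "FAIL no snapshots"; exit 1 }
            age = now - newest
            if (age > max) { print "FAIL " name " is " age "s old"; exit 1 }
            print "OK " name " is " age "s old"
        }'
}

# Sample input: newest snapshot is 400 seconds old, limit is 900 seconds.
printf 'tank/srv@a\t1000\ntank/srv@b\t1600\n' | rpo_check 900 2000
```

On the DR host the input would come from zfs list and the second argument from date +%s. The non-zero exit status on failure makes it easy to wire into cron mail or a monitoring check.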
Snapshots vs backups — understanding the difference
| Property | Local snapshot | Replicated snapshot (send/recv) | Traditional backup (rsync, tar, Veeam) |
|---|---|---|---|
| Protects against | Accidental deletion, logical corruption, user error | All of the above + hardware failure, site disaster | All of the above (if off-site) |
| Does NOT protect against | Pool loss, disk failure, site disaster, root compromise | Simultaneous compromise of both sites | Depends on backup integrity testing |
| Recovery speed | Instant (rollback or clone) | Minutes to hours (recv or import) | Hours to days (restore over network) |
| Space efficiency | Excellent (COW, only stores deltas) | Excellent (incremental sends) | Poor to moderate (full copies or dedup overhead) |
| Granularity | Block-level, any snapshot frequency | Block-level, limited by replication schedule | File-level, limited by backup window |
| Integrity verification | Automatic (ZFS checksums every block) | Automatic (checksums verified on receive) | Manual (must run restore tests) |
The correct strategy is both: local snapshots for instant recovery from user errors, plus replicated snapshots to a separate system for disaster recovery. If you only have local snapshots, you have undo, not backup. If you only have off-site replication, recovery from user error requires a network round-trip instead of being instant.
Common pitfalls
Snapshot without autoprune = full pool
Automated snapshots without automated pruning will fill your pool. Every snapshot retains old blocks.
Over months, the cumulative USED grows until the pool hits 80%+ and performance craters. Always pair
autosnap with autoprune in Sanoid.
Destroying snapshots in wrong order
If snapshot B depends on snapshot A as the incremental base for replication, destroying A breaks the replication chain. Use holds to protect replication bases. Syncoid and zrepl manage this automatically.
Recursive rollback surprises
zfs rollback -r destroys all intermediate snapshots. If those snapshots have clones
(dev/test branches, boot environments), you need -rR which also destroys the clones.
Always snapshot current state before rolling back.
Forgetting -r on recursive operations
Snapshotting rpool/home without -r does not snapshot child datasets.
You'll discover this during restore when rpool/home/user has no snapshots.
Similarly, zfs send without -R skips child datasets.
Pool 90%+ full with many snapshots
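When a pool is nearly full, ZFS performance degrades severely and you may not be able to
destroy snapshots (destroying requires free space for metadata updates). Prevention: keep an
empty dataset with a reservation, for example zfs create -o refreservation=10G rpool/reserved (a placeholder name), and shrink or destroy it when you need emergency free space for maintenance operations. A reservation set on the pool's root dataset does not help, because it covers every descendant dataset as well.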
Sending to a dataset that's actively mounted and modified
Using zfs recv -F on a dataset that processes are actively writing to causes conflicts.
The receive overwrites the dataset. Use a dedicated receive dataset that nothing else touches, then
clone or rename when ready.
Assuming REFER = cost
A snapshot with REFER=500GB and USED=2GB costs 2GB, not 500GB. REFER is what the snapshot sees. USED is what it uniquely holds. Destroying it frees USED, not REFER.
Not testing restores
A replication job that's been running for two years has never been tested unless you've actually done a restore. Clone a recent snapshot on the DR host, mount it, verify the data. Do this quarterly. An untested backup is not a backup.
kldload snapshot defaults
kldload's desktop and server profiles install Sanoid automatically with the following default policy. The core profile does not install Sanoid (stock distro, no k* tooling).
rpool/home and rpool/srv are snapshotted
To customize, edit /etc/sanoid/sanoid.conf on the installed system.
To add replication, add a Syncoid cron job pointing to your backup host. The kldload
web UI can configure this during install if you provide a backup host target.
Quick reference
| Operation | Command |
|---|---|
| Create snapshot | zfs snapshot pool/data@name |
| Recursive snapshot | zfs snapshot -r pool/data@name |
| List snapshots | zfs list -t snapshot -o name,used,refer -s creation |
| Browse snapshot | ls /data/.zfs/snapshot/name/ |
| Rollback | zfs rollback pool/data@name |
| Destroy snapshot | zfs destroy pool/data@name |
| Destroy range | zfs destroy pool/data@first%last |
| Hold snapshot | zfs hold tag pool/data@name |
| Release hold | zfs release tag pool/data@name |
| Clone snapshot | zfs clone pool/data@name pool/clone |
| Promote clone | zfs promote pool/clone |
| Create bookmark | zfs bookmark pool/data@name pool/data#name |
| Full send | zfs send pool/data@name \| zfs recv dest/data |
| Incremental send | zfs send -i pool/data@old pool/data@new \| zfs recv dest/data |
| Recursive send | zfs send -R pool@snap \| zfs recv -F dest |
| Encrypted send | zfs send -w pool/data@name \| zfs recv dest/data |
| Resume interrupted | zfs send -t TOKEN \| zfs recv -s dest/data |
| Diff snapshots | zfs diff pool/data@old pool/data@new |
| Diff vs live | zfs diff pool/data@snap |