Snapshots & Replication

ZFS Wiki

Snapshots & Replication — the killer feature.

ZFS snapshots are instantaneous, read-only, point-in-time copies of a dataset. They cost zero space at creation and grow only as the live data diverges. Combined with zfs send/recv, snapshots become the foundation for block-level incremental replication that is faster, more reliable, and more space-efficient than any file-level backup tool. This is the single feature that makes ZFS worth the complexity.

Snapshots are NOT backups.

Snapshots protect against accidental deletion and logical corruption. They do NOT protect against hardware failure, pool corruption, or site-wide disasters. If the pool fails, all snapshots are lost with it. Always use zfs send/recv to replicate snapshots to a separate system. A snapshot on the same pool is an undo button, not disaster recovery.

How snapshots work — copy-on-write

ZFS uses a copy-on-write (COW) transactional model. When you write new data, ZFS never overwrites existing blocks. It writes new blocks to free space, then atomically updates the block pointer tree to reference the new location. The old blocks remain on disk, unreferenced by the live filesystem — but still referenced by any snapshot that existed at the time.

This is why snapshots are instant: creating a snapshot simply freezes the current block pointer tree. No data is copied and no data blocks are read or rewritten; the only cost is a small metadata update. Snapshots only consume space when the live filesystem overwrites or deletes data that the snapshot still references, because those old blocks cannot be freed until the snapshot is destroyed.

People hear "instant" and "zero space" and think it's magic. It's not — it's just that ZFS was designed from day one to never overwrite in place. Every write is already a copy. A snapshot just says "don't free the old copy." The elegance is in the block pointer tree, not in some special snapshot mechanism.

Creating snapshots

Snapshot names follow the format dataset@snapname. The name after @ is arbitrary, but a consistent naming convention saves you when you have thousands. Use timestamps, purpose labels, or both.

# Create a single snapshot
zfs snapshot rpool/srv/data@before-upgrade

# Snapshot with a timestamp name
zfs snapshot rpool/srv/data@$(date +%Y-%m-%d_%H%M%S)

# Recursive snapshot — snapshots every child dataset too
zfs snapshot -r rpool/home@nightly-2026-04-04

# Multiple datasets in one atomic operation
zfs snapshot rpool/srv/db@pre-migration rpool/srv/app@pre-migration

The -r flag is your friend for consistent backups. If you snapshot rpool/home without -r, child datasets like rpool/home/todd are not included. You'll discover this the hard way when you try to restore and half your data is missing. Always use -r for trees.
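A naming convention is easier to keep when a small helper builds the names for you. The sketch below shows one possible convention, not anything ZFS requires; the function name snapname and the label_timestamp layout are assumptions made for illustration.

```shell
# Hypothetical helper: build "dataset@label_timestamp" snapshot names.
# $1 = dataset, $2 = purpose label, $3 = optional timestamp (defaults to now)
snapname() {
    dataset=$1
    label=$2
    stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}
    printf '%s@%s_%s\n' "$dataset" "$label" "$stamp"
}
```

Used as zfs snapshot -r "$(snapname rpool/home nightly)", every snapshot carries both its purpose and its creation time, which also makes the names sort sensibly.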

Listing and inspecting snapshots

# List all snapshots, sorted by creation time
zfs list -t snapshot -o name,creation,used,refer -s creation

# List snapshots for a specific dataset
zfs list -t snapshot -r rpool/srv/data

# List snapshots with space accounting details
zfs list -t snapshot -o name,used,written,refer -r rpool/srv/data

# Count snapshots per dataset
zfs list -t snapshot -o name | awk -F@ '{print $1}' | sort | uniq -c | sort -rn

# Show only snapshots consuming more than 1GB
zfs list -t snapshot -o name,used -s used -r rpool | awk '$2 ~ /[0-9].*[GT]/'

The .zfs/snapshot/ hidden directory at the root of every dataset lets you browse snapshot contents without rolling back. This is read-only access — you can copy files out, diff against current, or let users self-recover deleted files.

# Browse a snapshot (read-only, no rollback)
ls /srv/data/.zfs/snapshot/before-upgrade/

# Recover a single file from a snapshot
cp /home/todd/.zfs/snapshot/nightly-2026-04-04/important.doc /home/todd/

# Diff between a snapshot and live data
diff -r /srv/data/.zfs/snapshot/before-upgrade/config/ /srv/data/config/

The .zfs directory doesn't show up in ls -a by default. It's a virtual directory that ZFS injects. You have to access it directly: ls /data/.zfs/snapshot/. If you want it visible in directory listings, set zfs set snapdir=visible rpool/data. Most people leave it hidden so automated tools (rsync, find, backup agents) don't accidentally traverse every snapshot.

Snapshot space accounting — USED vs REFER vs WRITTEN

Snapshot space accounting confuses everyone. The three properties you need to understand:

USED

Space unique to this snapshot — blocks referenced only by this snapshot and no other snapshot or the live dataset. Destroying this snapshot frees exactly this much space. Often reported as 0B for recent snapshots because the live dataset still references the same blocks.

REFER

Total data the snapshot can see — the amount of data reachable from this snapshot's block pointer tree. This is the size of the dataset at the time the snapshot was taken. Does NOT indicate how much space the snapshot consumes uniquely.

WRITTEN

Data written between the previous snapshot and this one, that is, the space this snapshot references that the previous snapshot did not. For the first snapshot it equals REFER. On the live dataset, written reports how much has changed since the most recent snapshot, which makes it a quick estimate of the next incremental send size.

# Show all three properties
zfs list -t snapshot -o name,used,refer,written -r rpool/srv/data

# Example output (illustrative values):
# NAME                              USED  REFER  WRITTEN
# rpool/srv/data@monday              12G   450G     450G
# rpool/srv/data@tuesday            1.2G   452G      14G
# rpool/srv/data@wednesday            0B   455G     4.2G
#
# monday: 12G of blocks are unique to this snapshot (freeable)
# tuesday: 1.2G unique; most blocks shared with monday or wednesday
# wednesday: 0B unique; live dataset still references same blocks
# WRITTEN: 14G was written between monday and tuesday, 4.2G between tuesday and wednesday

# How much the live dataset has changed since the newest snapshot
zfs get -H -o value written rpool/srv/data

The most common mistake: someone sees a snapshot with REFER=500GB and panics thinking it's using 500GB of disk. It's not. REFER is what the snapshot sees, not what it costs. USED is what it costs. A snapshot with USED=0B and REFER=500GB is essentially free — the live dataset still has identical data. Space only accumulates as the live data diverges. The second most common mistake: assuming destroying a snapshot frees REFER space. It frees USED space. If USED is 12G, you get back 12G. Period.
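The awk one-liner earlier on this page matches on the human-readable unit suffix, which is fragile. A slightly sturdier sketch, assuming the parsable output of zfs list -Hp (tab-separated, sizes in bytes); the function name big_snaps is made up for this example.

```shell
# Hypothetical helper: print snapshots whose USED exceeds a byte threshold.
# Reads "name<TAB>used_bytes" lines on stdin, as produced by:
#   zfs list -Hp -t snapshot -o name,used -r rpool/srv/data
# $1 = threshold in bytes
big_snaps() {
    awk -F'\t' -v min="$1" '($2 + 0) > (min + 0) { print $1 "\t" $2 }'
}
```

For example, zfs list -Hp -t snapshot -o name,used -r rpool | big_snaps 1073741824 lists every snapshot that uniquely holds more than 1 GiB.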

Comparing snapshots with zfs diff

zfs diff shows file-level changes between two snapshots, or between a snapshot and the live dataset. The output uses single-character prefixes: + (created), - (removed), M (modified), R (renamed).

# Changes between two snapshots
zfs diff rpool/srv/data@monday rpool/srv/data@tuesday

# Changes from a snapshot to the live dataset
zfs diff rpool/srv/data@before-upgrade

# Example output:
# M       /srv/data/config/app.conf
# +       /srv/data/config/new-feature.conf
# -       /srv/data/tmp/old-cache.db
# R       /srv/data/logs/app.log -> /srv/data/logs/app.log.1

This is invaluable for forensics. After an incident, zfs diff tells you exactly what changed — which files were modified, deleted, or created. No audit daemon required. The information comes directly from the block pointer tree.
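On a busy dataset the raw zfs diff output can run to thousands of lines, so a summary is a useful first look. A minimal sketch, assuming the four prefixes described above; diff_summary is a hypothetical name, not a ZFS command.

```shell
# Hypothetical helper: count zfs diff entries by change type.
# Reads zfs diff output on stdin; the first field is one of M, +, -, R.
diff_summary() {
    awk '{ n[$1]++ }
         END {
             printf "modified=%d created=%d removed=%d renamed=%d\n",
                    n["M"], n["+"], n["-"], n["R"]
         }'
}
```

Pipe zfs diff rpool/srv/data@before-upgrade into it to see at a glance whether an incident touched ten files or ten thousand.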

Destroying snapshots

# Destroy a single snapshot
zfs destroy rpool/srv/data@before-upgrade

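# Destroy all snapshots matching a name prefix (zfs destroy has no wildcards,
# so select with grep; dry run first)
zfs list -H -t snapshot -o name -d 1 rpool/srv/data | grep '@autosnap_' | xargs -n1 zfs destroy -nv
zfs list -H -t snapshot -o name -d 1 rpool/srv/data | grep '@autosnap_' | xargs -n1 zfs destroy

# Destroy a range of snapshots (inclusive)
zfs destroy rpool/srv/data@monday%wednesday

# Recursive destroy: all datasets in the tree
zfs destroy -r rpool/home@old-snap

# Deferred destroy: mark for deletion, freed when no longer referenced
zfs destroy -d rpool/srv/data@held-snap

The % range syntax destroys every snapshot created between the two named snapshots, inclusive. The range follows creation order, not alphabetical order, and % is not a wildcard: leaving one side empty means "from the oldest" or "through the newest". The -n (dry run) and -v (verbose) flags are essential; always preview before bulk-destroying snapshots. You cannot undo a zfs destroy.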

I have seen people destroy snapshots to "free disk space" and be confused when the pool usage doesn't drop. Remember: destroying a snapshot only frees its USED space (blocks unique to that snapshot). If two snapshots share the same blocks, destroying one moves the unique-reference burden to the other. The space isn't freed until the last reference is gone. Run zfs list -t snapshot -o name,used -s used to find the actual space hogs before destroying anything.
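Choosing what to destroy is the risky part, so it can help to compute the list separately and look at it before anything is deleted. A sketch of a keep-the-newest-N selector; prune_candidates and its arguments are assumptions for illustration, and the input is expected oldest first.

```shell
# Hypothetical helper: pick snapshots to prune, keeping the newest N.
# stdin: snapshot names, oldest first, e.g. from
#   zfs list -H -t snapshot -o name -s creation -d 1 rpool/srv/data
# $1 = snapshot-name prefix (the part after @), $2 = how many of the newest to keep
prune_candidates() {
    awk -F'@' -v prefix="$1" -v keep="$2" '
        index($2, prefix) == 1 { match_list[++count] = $0 }
        END { for (i = 1; i <= count - keep; i++) print match_list[i] }'
}
```

Feed the result to xargs -n1 zfs destroy -nv first, and drop the -n only once the preview looks right.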

Snapshot holds

A hold prevents a snapshot from being destroyed. This is critical for replication workflows: you don't want a retention policy pruning a snapshot that's still needed as the incremental base for the next zfs send.

# Place a hold on a snapshot
zfs hold keep rpool/srv/data@important-snap

# List holds
zfs holds rpool/srv/data@important-snap

# Attempt to destroy a held snapshot — will fail
zfs destroy rpool/srv/data@important-snap
# cannot destroy 'rpool/srv/data@important-snap': dataset is busy

# Release the hold, then destroy
zfs release keep rpool/srv/data@important-snap
zfs destroy rpool/srv/data@important-snap

# Recursive hold on all datasets in a tree
zfs hold -r replication rpool/home@nightly-2026-04-04

zrepl manages its own holds, and Syncoid can hold the most recently replicated snapshot when run with --use-hold. If you're building custom replication scripts, always hold the base snapshot before sending, and release it only after confirming the receive succeeded.

Rollback — rewinding the filesystem

zfs rollback reverts a dataset to the exact state of a snapshot. All data written after that snapshot is permanently destroyed. There is no undo for a rollback.

# Rollback to the most recent snapshot (safe — no intermediate snapshots)
zfs rollback rpool/srv/data@before-upgrade

# Rollback to an older snapshot — requires -r to destroy intermediates
zfs rollback -r rpool/srv/data@monday
# WARNING: this destroys all snapshots between @monday and now

# Rollback including clones of intermediate snapshots — nuclear option
zfs rollback -rR rpool/srv/data@last-known-good

Rollback with -r is destructive.

Without -r, ZFS only allows rollback to the most recent snapshot. If you need to go further back, -r destroys every snapshot between the target and now. If any of those snapshots have clones, you need -rR which also destroys the clones. Always snapshot the current state before rolling back so you have a way forward if the rollback was wrong.

# Safe rollback pattern: snapshot current state first
zfs snapshot rpool/srv/data@before-rollback-$(date +%s)
zfs rollback -r rpool/srv/data@known-good

In practice, I almost never use zfs rollback. It's a blunt instrument. Instead, I clone the snapshot, test the old state, and if it's what I need, I promote the clone. Or I just copy files out of .zfs/snapshot/. Rollback destroys data. Clones and copies don't. Use rollback only when you're certain everything after the snapshot is garbage.
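Before reaching for rollback -r, it is worth seeing exactly which snapshots would be lost. A minimal sketch, assuming a creation-ordered list of snapshot names on stdin; rollback_casualties is a hypothetical helper, and it prints nothing if the target is the newest snapshot or is not in the list at all.

```shell
# Hypothetical helper: list snapshots newer than the rollback target.
# stdin: snapshot names, oldest first, e.g. from
#   zfs list -H -t snapshot -o name -s creation -d 1 rpool/srv/data
# $1 = full name of the rollback target
rollback_casualties() {
    awk -v target="$1" '
        found { print }
        $0 == target { found = 1 }'
}
```

Everything it prints is what zfs rollback -r would destroy on the way back to the target.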

Clones & promotion

A clone is a writable copy of a snapshot. Like a snapshot, it shares all blocks with the original — a clone of a 500GB snapshot uses near-zero extra space until you start writing. Clones are full datasets: they can be mounted, snapshotted, and served just like any other ZFS dataset.

# Clone a snapshot into a new dataset
zfs clone rpool/srv/data@before-upgrade rpool/srv/data-test

# The clone is writable and mountable immediately
ls /srv/data-test/
echo "test change" > /srv/data-test/canary.txt

# Clone has a dependency: you cannot destroy the origin snapshot
zfs destroy rpool/srv/data@before-upgrade
# cannot destroy: snapshot has dependent clones

# Promote the clone — it becomes the independent dataset
zfs promote rpool/srv/data-test
# Now rpool/srv/data depends on rpool/srv/data-test, not the reverse

Promotion reverses the parent-child relationship. After promotion, the clone becomes the independent dataset and the original becomes the dependent. This is how you "branch" a filesystem: clone, test changes, promote if they work. The original can then be destroyed if no longer needed.

Clones are unbelievably useful for dev/test. Need to test a database migration? Clone the production dataset, run the migration on the clone, validate. If it breaks, destroy the clone. If it works, promote. Total extra space used: only the delta from the migration. I've used this pattern to test PostgreSQL major version upgrades on 2TB databases with less than 5GB of overhead.

zfs send / receive — block-level replication

zfs send serializes a snapshot (or the delta between two snapshots) into a byte stream. zfs receive consumes that stream and reconstructs the dataset. This operates at the block level — it doesn't traverse the directory tree or open files. It's faster and more reliable than any file-level tool (rsync, tar, cp).

Full send

# Full send to a local pool
zfs send rpool/srv/data@baseline | zfs recv backup/srv/data

# Full send to a remote machine over SSH
zfs send rpool/srv/data@baseline | ssh backup-host "zfs recv tank/srv/data"

# Recursive send — includes all child datasets and their snapshots
zfs send -R rpool/srv@baseline | ssh backup-host "zfs recv -F tank/srv"

# With progress reporting via pv
zfs send -R rpool/srv@baseline | pv -rtab | ssh backup-host "zfs recv -F tank/srv"

Incremental send

Incremental sends transmit only the blocks that changed between two snapshots. This is the core of efficient replication — the first send is large (full dataset), but every subsequent send is just the delta.

# Incremental send: only blocks changed between monday and tuesday
zfs send -i rpool/srv/data@monday rpool/srv/data@tuesday | \
  ssh backup-host "zfs recv tank/srv/data"

# Incremental with -I: includes all intermediate snapshots
zfs send -I rpool/srv/data@monday rpool/srv/data@friday | \
  ssh backup-host "zfs recv tank/srv/data"

# Recursive incremental
zfs send -R -i rpool/srv@monday rpool/srv@tuesday | \
  ssh backup-host "zfs recv -F tank/srv"

-i (lowercase)

Incremental from snapshot A to snapshot B. Only the delta between those two specific snapshots. The receiver must already have snapshot A.

-I (uppercase)

Incremental from A to B, including all intermediate snapshots. The receiver must already have A; it gets every snapshot between A and B, plus B itself. Use this when you've taken multiple snapshots since the last replication.

-R (replication)

Recursive replication stream. Includes all child datasets, all their snapshots, and all properties. The receiver mirrors the full dataset tree. Use -F on the receive side to force overwrite.

-w (raw)

Send encrypted data as raw ciphertext. The receiver cannot read the data without the key. Essential for replicating to untrusted targets.

-c (compressed)

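Send compressed blocks as-is without decompressing and recompressing. Saves CPU on both sides and bandwidth in between. The receiver stores the blocks as sent, so it must support the source's compression algorithm, but its own compression property does not have to match.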

-L (large blocks)

Include blocks larger than 128K. Needed when recordsize is set above 128K (e.g., 1M for sequential workloads).

-e (embedded)

Include embedded data blocks (small blocks stored in the block pointer itself). Slightly more efficient for datasets with many tiny files.

The -i vs -I distinction matters more than you'd think. If you use -i (lowercase) and have taken multiple snapshots since the last sync, you'll send only the final delta but the intermediate snapshots won't exist on the receiver. This means you can't use those intermediates as a base for future incrementals. -I (uppercase) sends all intermediates and is almost always what you want. The common flags combo for production replication: zfs send -R -w -c -L.
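Any incremental send starts with one question: what is the newest snapshot both sides still have? Tools like Syncoid work this out for you; the sketch below shows the idea for hand-rolled scripts. The function name latest_common and the two-file interface are assumptions for illustration.

```shell
# Hypothetical helper: find the newest snapshot present on both sides.
# $1 = file of source snapshot names (the part after @), oldest first
# $2 = file of destination snapshot names, any order
# Prints the name, or returns 1 if there is no common snapshot.
latest_common() {
    awk 'NR == FNR { on_dest[$0] = 1; next }
         ($0 in on_dest) { last = $0 }
         END { if (last != "") print last; else exit 1 }' "$2" "$1"
}
```

The two files can be produced with zfs list -H -t snapshot -o name -s creation -d 1 DATASET | cut -d@ -f2 on each side. The result becomes the base for zfs send -I; if the function returns 1, the chain is broken and a full send is needed.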

Resumable send / receive

Large sends over unreliable networks (WAN, satellite, VPN) can fail midway. OpenZFS 0.7+ supports resumable send via tokens. If a receive is interrupted, ZFS records how far it got. You can resume from that point instead of starting over.

# Start a receive with -s to enable resume tokens
zfs send -R rpool/srv@snap | ssh remote "zfs recv -s -F tank/srv"

# If interrupted, check for a resume token on the receiver
ssh remote "zfs get receive_resume_token tank/srv"

# Resume the send using the token
token=$(ssh remote "zfs get -H -o value receive_resume_token tank/srv")
zfs send -t "$token" | ssh remote "zfs recv -s -F tank/srv"

# Abort a partial receive and discard the token
ssh remote "zfs recv -A tank/srv"

Resumable send is a game changer for initial seeding of remote replicas. Sending 1TB over a 100Mbps WAN takes roughly a day (about 22 hours at line rate). Without resume, a network blip at hour 20 means starting over. With resume, you lose minutes, not hours. One gotcha: the resume token encodes the send flags. You can't change flags (add -w, etc.) when resuming; you must use the same flags as the original send, or abort and restart.
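Seeding time is easy to misjudge by an order of magnitude, so it pays to do the arithmetic before starting. A small sketch that assumes the link runs at full line rate with no protocol overhead; transfer_hours is a hypothetical helper.

```shell
# Hypothetical helper: hours needed to move N bytes at a given line rate.
# $1 = size in bytes, $2 = link speed in bits per second
transfer_hours() {
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / bps / 3600 }'
}
```

transfer_hours 1000000000000 100000000 prints 22.2, so 1 TB at 100 Mbit/s is already close to a full day. Real links rarely sustain line rate, so treat the result as a lower bound.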

Encrypted replication

The -w (raw) flag sends encrypted datasets as ciphertext. The receiving side stores the encrypted blocks without ever seeing the plaintext. This enables replication to untrusted backup servers, cloud storage, or off-site hosts where you don't control physical security.

# Source has encrypted dataset
zfs get encryption rpool/srv/secrets
# NAME                PROPERTY    VALUE        SOURCE
# rpool/srv/secrets   encryption  aes-256-gcm  -

# Raw send — ciphertext only
zfs send -w rpool/srv/secrets@snap | ssh untrusted "zfs recv tank/secrets"

# The receiver has the data but cannot read it
ssh untrusted "zfs mount tank/secrets"
# cannot mount: encryption key not loaded

# Incremental raw send
zfs send -w -i rpool/srv/secrets@snap1 rpool/srv/secrets@snap2 | \
  ssh untrusted "zfs recv tank/secrets"

Raw send works with both incremental and full sends. The receiver stores the blocks exactly as they sit on the source: compressed first, then encrypted. Metadata such as dataset and snapshot names and properties like used remains visible, but actual file contents are opaque.

Bookmarks

A bookmark is a lightweight reference to a snapshot's transaction group (TXG) that persists after the snapshot is destroyed. Bookmarks take zero space. Their purpose: serve as the base for incremental sends even after the source snapshot has been pruned.

# Create a bookmark from a snapshot
zfs bookmark rpool/srv/data@monday rpool/srv/data#monday

# List bookmarks
zfs list -t bookmark -r rpool/srv/data

# Now you can destroy the snapshot — the bookmark remains
zfs destroy rpool/srv/data@monday

# Incremental send using the bookmark as the base
zfs send -i rpool/srv/data#monday rpool/srv/data@tuesday | \
  ssh backup "zfs recv tank/srv/data"

# Destroy a bookmark
zfs destroy rpool/srv/data#monday

The workflow: take a snapshot, replicate it, create a bookmark, destroy the local snapshot to free space, keep the bookmark as the incremental base. The remote still has the full snapshot. Next time, send incrementally from the bookmark to the new snapshot. This is how you keep the source system lean while maintaining an unbroken replication chain.

Bookmarks are one of ZFS's most underused features. Most people keep snapshots around on the source just so they have an incremental base. That costs space. Bookmarks cost next to nothing. If your retention policy prunes snapshots faster on the source than the replication cycle runs, bookmarks are the fix. Syncoid can create them for you with --create-bookmark (used together with --no-sync-snap).
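In a custom replication script the bookmark usually takes the same name as the snapshot it replaces, with # in place of @. A tiny sketch of that mapping; bookmark_for is a hypothetical helper, and the same-name convention is a choice, not a ZFS requirement.

```shell
# Hypothetical helper: derive the bookmark name for a snapshot.
# "pool/fs@snap" -> "pool/fs#snap"; returns 1 if the argument has no @
bookmark_for() {
    case $1 in
        *@*) printf '%s#%s\n' "${1%@*}" "${1##*@}" ;;
        *)   echo "not a snapshot name: $1" >&2; return 1 ;;
    esac
}
```

After a confirmed receive, zfs bookmark "$snap" "$(bookmark_for "$snap")" followed by zfs destroy "$snap" keeps the incremental base without keeping the blocks.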

Performance tuning for send / receive

# Use mbuffer to smooth I/O and add progress reporting
zfs send -R rpool/srv@snap | mbuffer -s 128k -m 1G | \
  ssh backup "mbuffer -s 128k -m 1G | zfs recv -F tank/srv"

# Use pv for simple progress and throughput display
zfs send -R rpool/srv@snap | pv -rtab | ssh backup "zfs recv -F tank/srv"

# Compress the stream in transit (when source data is uncompressed)
zfs send rpool/srv@snap | lz4 | ssh backup "lz4 -d | zfs recv tank/srv"

# Use pigz for multi-threaded compression
zfs send rpool/srv@snap | pigz -3 | ssh backup "pigz -d | zfs recv tank/srv"

# Limit bandwidth to avoid saturating the link
zfs send rpool/srv@snap | pv -L 50M | ssh backup "zfs recv tank/srv"

| Scenario | Recommended pipeline | Notes |
| --- | --- | --- |
| LAN (1–10 Gbps) | `zfs send -c -L \| ssh \| zfs recv` | Use `-c` to skip recompression. SSH is the bottleneck; consider `ssh -c aes128-gcm@openssh.com` for a faster cipher. |
| WAN (slow link) | `zfs send -c \| mbuffer \| ssh \| mbuffer \| zfs recv -s` | Use mbuffer on both ends and enable resume tokens (`zfs recv -s`). With `-c` the stream is already compressed; add `lz4` in transit only when the source data is uncompressed. |
| Initial seed (very large) | Physical transport (disk ship) | `zfs send -R > /mnt/transport/seed.zfs`: send to a portable drive, ship it, `zfs recv` on the other end. Continue with incrementals once online. |
| Encrypted to untrusted | `zfs send -w -c -L \| ssh \| zfs recv` | Raw send. Data stays encrypted in transit and at rest on the receiver. |

The single biggest performance improvement for send/recv over SSH: pick a fast cipher. Recent OpenSSH releases default to chacha20-poly1305@openssh.com, which runs in software; on CPUs with AES acceleration, aes128-gcm@openssh.com is usually noticeably faster, so benchmark both on your hardware. On a 10G LAN, SSH itself becomes the bottleneck before the disks do. If you're replicating locally between pools on the same machine, skip SSH entirely: zfs send | zfs recv.

Automation: Sanoid & Syncoid

Manual snapshots and replication are fine for one-off operations, but production systems need automated retention. Sanoid manages snapshot creation and pruning. Syncoid manages replication. Together they replace complex cron scripts with a declarative config.

Sanoid — automated snapshot management

Defines snapshot policies per dataset via a simple INI config. Automatically creates and prunes snapshots based on retention rules. Runs via cron or systemd timer.

# /etc/sanoid/sanoid.conf
[rpool/home]
  use_template = production
  recursive = yes

[rpool/srv/data]
  use_template = production

[rpool/var/log]
  use_template = short-retention
  autosnap = yes
  autoprune = yes

[template_production]
  autosnap = yes
  autoprune = yes
  hourly = 48
  daily = 30
  weekly = 8
  monthly = 12
  yearly = 2

[template_short-retention]
  hourly = 24
  daily = 7
  weekly = 0
  monthly = 0
  yearly = 0

# Run sanoid manually (usually runs via cron every 15 minutes)
sanoid --cron

# Dry run: show what would be created/pruned without changing anything
sanoid --cron --readonly --verbose

# Monitor sanoid status
sanoid --monitor-snapshots --monitor-health

Syncoid — automated replication

Wraps zfs send/recv for secure, incremental replication over SSH. Automatically determines the common snapshot, sends the incremental delta, and handles resume tokens. One command replaces pages of shell scripting.

# Replicate a single dataset
syncoid rpool/srv/data backup-host:tank/srv/data

# Recursive replication of an entire tree
syncoid --recursive rpool/home backup-host:tank/home

# With raw, compressed, large-block send and no-sync-snap (use existing sanoid snapshots)
# --sendoptions takes the zfs send flags without leading dashes
syncoid --recursive --no-sync-snap --sendoptions="w c L" \
  rpool/srv backup-host:tank/srv

# Exclude specific datasets
syncoid --recursive --exclude="rpool/tmp" --exclude="rpool/cache" \
  rpool backup-host:tank

Syncoid auto-detects the common snapshot, sends the delta, and handles resume tokens. No scripting required.

zrepl — daemon-based replication

For complex setups: bi-directional sync, resume tokens, network drop resilience, many-to-many replication. YAML config. Runs as a daemon with built-in monitoring endpoints. Ideal for managing many ZFS hosts at scale.

# /etc/zrepl/zrepl.yml (simplified push job)
jobs:
  - name: "push-to-backup"
    type: push
    connect:
      type: ssh+stdinserver
      host: backup-host
      user: root
      identity_file: /root/.ssh/zrepl_key
    filesystems:
      "rpool/srv<": true
      "rpool/tmp": false
    snapshotting:
      type: periodic
      interval: 15m
      prefix: zrepl_
    pruning:
      keep_sender:
        - type: not_replicated
        - type: last_n
          count: 10
      keep_receiver:
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d

Sanoid/Syncoid for simple setups. zrepl for production fleets with complex topologies.

kldload installs Sanoid + Syncoid by default on desktop and server profiles. The default policy is 48 hourly, 30 daily, 8 weekly, 12 monthly snapshots. This is generous but safe — you'll never regret having too many snapshots, only too few. If disk space is tight, reduce hourly to 24 first. Never reduce daily below 7. The one mistake I see constantly: people disable autoprune and wonder why their pool is full six months later. Sanoid creates snapshots. Autoprune deletes old ones. You need both.
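When tuning retention it helps to know how many snapshots a policy keeps per dataset, because that number multiplies across every child dataset. A minimal sketch that sums the counters of one template; retention_total is a hypothetical helper and expects only the lines of a single section on stdin.

```shell
# Hypothetical helper: total snapshots a Sanoid-style policy retains per dataset.
# Reads "key = value" lines on stdin and sums the retention counters;
# other keys (autosnap, autoprune, ...) are ignored.
retention_total() {
    awk -F'=' '
        { key = $1; gsub(/[ \t]/, "", key) }
        key ~ /^(hourly|daily|weekly|monthly|yearly)$/ { total += $2 }
        END { print total + 0 }'
}
```

Fed the template_production section shown earlier (48 + 30 + 8 + 12 + 2), it prints 100. With recursive = yes on a tree of 50 datasets, that is around 5,000 snapshots at steady state.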

Boot environments

A boot environment is a snapshot + clone of the root filesystem that you can boot into. Before a kernel upgrade, OS update, or risky configuration change, create a boot environment. If the update breaks the system, reboot into the previous environment. This is the ZFS equivalent of VM snapshots, but for bare metal.

# Create a boot environment before a major update
zfs snapshot rpool/ROOT/centos@pre-kernel-update
zfs clone rpool/ROOT/centos@pre-kernel-update rpool/ROOT/centos-rollback

# If the update breaks things, set the bootfs property and reboot
zpool set bootfs=rpool/ROOT/centos-rollback rpool
reboot

# With a boot environment manager such as zectl (zsys and beadm work along the same lines):
zectl create pre-upgrade
dnf update -y
# If broken:
zectl activate pre-upgrade
reboot

# List boot environments
zectl list

kldload configures ZFS-on-root with a rpool/ROOT/<distro> dataset structure specifically to enable boot environments. The bootloader (systemd-boot or GRUB) reads the bootfs pool property to determine which dataset to mount as /. See the Boot Chain page for full details.

Boot environments are the single best argument for ZFS on root. Every other filesystem requires you to image the whole disk (Clonezilla, dd) or manage LVM snapshots (which are slow and fragile). ZFS boot environments are instant, space-efficient, and you can have dozens of them. I keep the last three kernel updates as boot environments. Cost: near zero until the live root diverges.

Real-world scenarios

Ransomware recovery

Ransomware encrypts your files. With hourly ZFS snapshots, you roll back to the last clean snapshot and lose at most one hour of work. The ransomware cannot encrypt snapshots because snapshots are read-only at the kernel level — no userspace process can modify them.

# Find the last clean snapshot (check timestamps vs. infection time)
zfs list -t snapshot -o name,creation -r rpool/srv/data | grep "2026-04-04"

# Verify it's clean
ls /srv/data/.zfs/snapshot/autosnap_2026-04-04_09:00:00_hourly/

# Roll back
zfs rollback -r rpool/srv/data@autosnap_2026-04-04_09:00:00_hourly

Snapshots are your ransomware insurance. But only if they're replicated off-host — a root compromise can still zfs destroy.
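Picking the last clean snapshot by eye gets tedious with hundreds of hourlies. A sketch that selects the newest snapshot created before an estimated infection time, assuming zfs list -Hp output where creation is printed as Unix epoch seconds; last_snapshot_before is a hypothetical helper, and its answer is only a starting point that still needs to be verified by browsing the snapshot.

```shell
# Hypothetical helper: newest snapshot created before a given time.
# stdin: "name<TAB>creation_epoch" lines, e.g. from
#   zfs list -Hp -t snapshot -o name,creation -d 1 rpool/srv/data
# $1 = estimated infection time as Unix epoch seconds
# Prints the snapshot name, or returns 1 if none is old enough.
last_snapshot_before() {
    awk -F'\t' -v cutoff="$1" '
        ($2 + 0) < (cutoff + 0) && ($2 + 0) >= best_time { best = $1; best_time = $2 + 0 }
        END { if (best != "") print best; else exit 1 }'
}
```

If the infection time is uncertain, err on the early side: losing an extra hour of work is cheaper than restoring encrypted files.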

Database-consistent snapshots

ZFS snapshots are crash-consistent (equivalent to pulling the power plug). For true application consistency, freeze the database before snapshotting.

# PostgreSQL: checkpoint + snapshot
psql -c "CHECKPOINT;"
zfs snapshot rpool/srv/pgdata@consistent-$(date +%s)

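# MySQL/MariaDB: flush + lock + snapshot + unlock
# The lock is released when the client session ends, so all three steps
# must run inside one mysql session (system runs a shell command)
mysql <<EOF
FLUSH TABLES WITH READ LOCK;
system zfs snapshot rpool/srv/mysql@consistent-$(date +%s)
UNLOCK TABLES;
EOF

# For a non-ZFS filesystem on a zvol (ext4, XFS): freeze, snapshot the zvol, thaw
# ZFS datasets do not need this; their snapshots are already atomic
# (/mnt/vol and rpool/vols/data are example names)
fsfreeze --freeze /mnt/vol
zfs snapshot rpool/vols/data@frozen-$(date +%s)
fsfreeze --unfreeze /mnt/vol

# Script pattern for automated consistent snapshots (PostgreSQL 15+)
# pg_backup_start/pg_backup_stop must run in the same session; releases
# before 15 name them pg_start_backup/pg_stop_backup
#!/bin/bash
snap="rpool/srv/pgdata@backup-$(date +%Y%m%d-%H%M%S)"
psql -v ON_ERROR_STOP=1 <<EOF
SELECT pg_backup_start('zfs-snap', true);
\! zfs snapshot -r $snap
SELECT pg_backup_stop();
EOF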

Crash-consistent is fine for most workloads. Application-consistent is required for databases that don't journal or have long-running transactions.

Dev/test branching with clones

Clone production data for development without doubling storage. Each developer gets a writable copy that shares blocks with the original.

# Snapshot production
zfs snapshot rpool/srv/app@dev-branch

# Create per-developer clones
zfs clone rpool/srv/app@dev-branch rpool/dev/alice
zfs clone rpool/srv/app@dev-branch rpool/dev/bob
zfs clone rpool/srv/app@dev-branch rpool/dev/carol

# Each clone starts identical, diverges independently
# Total extra space: only the sum of changes across all clones

# When done, destroy dev clones
zfs destroy rpool/dev/alice
zfs destroy rpool/dev/bob
zfs destroy rpool/dev/carol
zfs destroy rpool/srv/app@dev-branch

Migration via send / receive

Moving a dataset to a new server, new pool, or new datacenter. Send/receive preserves everything: data, snapshots, properties, permissions, ACLs, xattrs.

# Full migration to a new server
zfs snapshot -r rpool/srv@migrate
zfs send -R -w -c -L rpool/srv@migrate | \
  ssh new-server "zfs recv -F tank/srv"

# Incremental catch-up (run until cutover)
zfs snapshot -r rpool/srv@migrate-final
zfs send -R -I rpool/srv@migrate rpool/srv@migrate-final | \
  ssh new-server "zfs recv -F tank/srv"

# Physical transport for initial seed (sneakernet)
zfs send -R rpool/srv@migrate > /mnt/usb/srv-seed.zfs
# Ship the drive, then on the new server:
zfs recv -F tank/srv < /mnt/usb/srv-seed.zfs
# Then incremental sync over the network to catch up

Never rsync a ZFS dataset. zfs send preserves block layout, snapshots, and properties. rsync throws all of that away.

Disaster recovery

Full-site DR with automated replication to a remote datacenter.

# Cron job: replicate every 15 minutes to DR site
# (a crontab entry must be a single line; backslash continuation is not supported)
*/15 * * * * syncoid --recursive --no-sync-snap --sendoptions="w c L" rpool/srv dr-host:tank/srv

# On DR failover: import the pool and adjust mountpoints
zpool import tank
zfs set mountpoint=/srv tank/srv
# Service is back online with at most 15 minutes of data loss (RPO=15m)

# Test DR regularly: clone on the DR side, boot a test VM from it
ssh dr-host "zfs clone tank/srv/app@latest tank/dr-test/app"

Snapshots vs backups — understanding the difference

| Property | Local snapshot | Replicated snapshot (send/recv) | Traditional backup (rsync, tar, Veeam) |
| --- | --- | --- | --- |
| Protects against | Accidental deletion, logical corruption, user error | All of the above + hardware failure, site disaster | All of the above (if off-site) |
| Does NOT protect against | Pool loss, disk failure, site disaster, root compromise | Simultaneous compromise of both sites | Depends on backup integrity testing |
| Recovery speed | Instant (rollback or clone) | Minutes to hours (recv or import) | Hours to days (restore over network) |
| Space efficiency | Excellent (COW, only stores deltas) | Excellent (incremental sends) | Poor to moderate (full copies or dedup overhead) |
| Granularity | Block-level, any snapshot frequency | Block-level, limited by replication schedule | File-level, limited by backup window |
| Integrity verification | Automatic (ZFS checksums every block) | Automatic (checksums verified on receive) | Manual (must run restore tests) |

The correct strategy is both: local snapshots for instant recovery from user errors, plus replicated snapshots to a separate system for disaster recovery. If you only have local snapshots, you have undo, not backup. If you only have off-site replication, recovery from user error requires a network round-trip instead of being instant.

The 3-2-1 rule applies to ZFS too: 3 copies, 2 different media, 1 off-site. The live pool with its local snapshots is copy #1; snapshots on the same pool do not count as a separate copy. Replicated snapshots on a second ZFS host are copy #2 (different media, potentially off-site). For true paranoia, send to a third location or dump critical snapshots to cold storage. Most people stop at two and that's fine for everything except regulated environments.

Common pitfalls

Snapshot without autoprune = full pool

Automated snapshots without automated pruning will fill your pool. Every snapshot retains old blocks. Over months, the cumulative USED grows until the pool hits 80%+ and performance craters. Always pair autosnap with autoprune in Sanoid.

Destroying snapshots in wrong order

If snapshot B depends on snapshot A as the incremental base for replication, destroying A breaks the replication chain. Use holds or bookmarks to protect replication bases. zrepl does this on its own; with Syncoid, enable --use-hold.

Recursive rollback surprises

zfs rollback -r destroys all intermediate snapshots. If those snapshots have clones (dev/test branches, boot environments), you need -rR which also destroys the clones. Always snapshot current state before rolling back.

Forgetting -r on recursive operations

Snapshotting rpool/home without -r does not snapshot child datasets. You'll discover this during restore when rpool/home/user has no snapshots. Similarly, zfs send without -R skips child datasets.

Pool 90%+ full with many snapshots

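When a pool is nearly full, ZFS performance degrades severely and you may not be able to destroy snapshots (destroying requires free space for metadata updates). Prevention: keep an empty dataset with a reservation, for example zfs create -o refreservation=10G -o mountpoint=none rpool/reserved, and shrink or destroy it when you need emergency headroom. A reservation on the pool's root dataset does not help, because every other dataset is its descendant and draws from the same reservation.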

Sending to a dataset that's actively mounted and modified

Using zfs recv -F on a dataset that processes are actively writing to causes conflicts. The receive overwrites the dataset. Use a dedicated receive dataset that nothing else touches, then clone or rename when ready.

Assuming REFER = cost

A snapshot with REFER=500GB and USED=2GB costs 2GB, not 500GB. REFER is what the snapshot sees. USED is what it uniquely holds. Destroying it frees USED, not REFER.

Not testing restores

A replication job that's been running for two years has never been tested unless you've actually done a restore. Clone a recent snapshot on the DR host, mount it, verify the data. Do this quarterly. An untested backup is not a backup.
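For the nearly-full-pool pitfall above, a simple capacity check in cron catches the problem long before destroys start failing. A minimal sketch computed from the root dataset's used and avail values; pool_pct_used is a hypothetical helper, and the figure can differ slightly from what zpool list reports because the two account for space differently.

```shell
# Hypothetical helper: percent of pool space in use, from used/avail bytes.
# stdin: one "used<TAB>avail" line, e.g. from
#   zfs list -Hp -o used,avail rpool
pool_pct_used() {
    awk -F'\t' '($1 + $2) > 0 { printf "%d\n", $1 * 100 / ($1 + $2) }'
}
```

A check such as [ "$(zfs list -Hp -o used,avail rpool | pool_pct_used)" -ge 80 ] can then trigger a warning while there is still room to prune.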

kldload snapshot defaults

kldload's desktop and server profiles install Sanoid automatically with the following default policy. The core profile does not install Sanoid (stock distro, no k* tooling).

Hourly

48 snapshots retained (2 days of hourly resolution)

Daily

30 snapshots retained (1 month of daily resolution)

Weekly

8 snapshots retained (2 months of weekly resolution)

Monthly

12 snapshots retained (1 year of monthly resolution)

Recursive

Enabled — all child datasets under rpool/home and rpool/srv are snapshotted

Autoprune

Enabled — old snapshots are automatically deleted when retention limits are exceeded

To customize, edit /etc/sanoid/sanoid.conf on the installed system. To add replication, add a Syncoid cron job pointing to your backup host. The kldload web UI can configure this during install if you provide a backup host target.

Quick reference

| Operation | Command |
| --- | --- |
| Create snapshot | `zfs snapshot pool/data@name` |
| Recursive snapshot | `zfs snapshot -r pool/data@name` |
| List snapshots | `zfs list -t snapshot -o name,used,refer -s creation` |
| Browse snapshot | `ls /data/.zfs/snapshot/name/` |
| Rollback | `zfs rollback pool/data@name` |
| Destroy snapshot | `zfs destroy pool/data@name` |
| Destroy range | `zfs destroy pool/data@first%last` |
| Hold snapshot | `zfs hold tag pool/data@name` |
| Release hold | `zfs release tag pool/data@name` |
| Clone snapshot | `zfs clone pool/data@name pool/clone` |
| Promote clone | `zfs promote pool/clone` |
| Create bookmark | `zfs bookmark pool/data@name pool/data#name` |
| Full send | `zfs send pool/data@name \| zfs recv dest/data` |
| Incremental send | `zfs send -i pool/data@old pool/data@new \| zfs recv dest/data` |
| Recursive send | `zfs send -R pool@snap \| zfs recv -F dest` |
| Encrypted send | `zfs send -w pool/data@name \| zfs recv dest/data` |
| Resume interrupted | `zfs send -t TOKEN \| zfs recv -s dest/data` |
| Diff snapshots | `zfs diff pool/data@old pool/data@new` |
| Diff vs live | `zfs diff pool/data@snap` |

If you take away one thing from this page: snapshots + send/recv is a complete data protection strategy that works at the block level, verifies integrity with checksums, supports encryption end-to-end, and costs nothing except the storage for changed blocks. Every other backup tool is doing a worse version of what ZFS does natively. The only things you still need beyond ZFS itself are scheduling (Sanoid/Syncoid) and the 3-2-1 discipline to actually replicate off-site. The technology is solved. The human process is the hard part.
