ZFS vs Everything Else


ZFS vs Everything Else — the middleware graveyard.

Traditional Linux storage is a layer cake: partitions, volume managers, RAID arrays, filesystems, encryption wrappers, snapshot tools, caching layers — each with its own config syntax, failure modes, and on-call runbooks. ZFS replaces all of them with a single, integrated storage platform. This page is a comprehensive, honest comparison of ZFS against every legacy filesystem, volume manager, and RAID system you might be running today.

This is the page I wish existed when I was evaluating ZFS in 2016. Every comparison I found was either a ZFS fanboy puff piece or a dismissive "it's fine for NAS" from someone who never ran it in production. This page is honest. ZFS wins most comparisons, but not all, and I'll tell you exactly where the legacy tools are simpler or more appropriate.

The philosophical difference

ZFS is not "just a filesystem" the way ext4 is a filesystem. ZFS is a storage platform. It merges the volume manager, RAID controller, filesystem, snapshot engine, replication system, caching layer, compression engine, and encryption subsystem into one coherent, transactional whole. Every component shares the same on-disk format, the same transaction model, and the same checksum tree.

Legacy Linux storage treats each layer as independent: mdadm doesn't know about ext4. LVM doesn't know about LUKS. ext4 doesn't know about mdadm. When something breaks, you're debugging three or four tools that have no awareness of each other. When you want a snapshot, you need LVM thin provisioning or a separate tool like snapper. When you want replication, you need rsync or borgbackup, which operate at the file level and scale poorly.

ZFS's integrated design means every operation — writes, checksums, compression, encryption, snapshots, replication — happens in one atomic transaction group (TXG). There is no window where the filesystem is inconsistent. There is no fsck. There is no "I hope the RAID rebuild finishes before another disk dies."

The layer cake problem — ext4 + LVM + mdadm

The traditional Linux "enterprise" storage stack looks like this: mdadm assembles physical disks into a RAID array. LVM carves that array into logical volumes. LUKS encrypts each volume. ext4 or XFS sits on top. That's four independent layers, each with its own tools, its own failure modes, and zero awareness of the layers above or below it.

What goes wrong with the layer cake

Silent corruption propagates. ext4 checksums its metadata but not your data. If a disk returns corrupt data, ext4 stores it faithfully. During a RAID1 check or repair, mdadm can see that two copies differ but has no way to tell which one is correct, so it effectively picks one. LVM doesn't checksum anything. LUKS encrypts whatever it receives, corrupt or not. The corruption is now encrypted, replicated, and backed up. You discover it six months later when you try to open a file.

Snapshots are painful. LVM snapshots exist, but they're copy-on-write at the block level with severe performance degradation. LVM thin snapshots are better but add another layer of complexity and have their own failure modes. Neither integrates with replication.

Expansion is error-prone. Growing the stack means: growing the mdadm array, then pvresize, then lvextend, then resize2fs or xfs_growfs. Miss a step and you've got mismatched sizes. Shrinking is worse — do the steps in reverse order or lose data.

ZFS equivalent: zpool add + done. One command. One tool. One failure domain.

Here's the same operation — creating redundant, encrypted storage — in both stacks:

# Legacy: mdadm + LUKS + LVM + ext4 (10 commands, plus the config files below)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --detail --scan >> /etc/mdadm.conf
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 crypt0
pvcreate /dev/mapper/crypt0
vgcreate vg0 /dev/mapper/crypt0
lvcreate -L 100G -n data vg0
mkfs.ext4 /dev/vg0/data
mkdir -p /data
mount /dev/vg0/data /data
# ...plus fstab, crypttab, mdadm.conf, dracut/initramfs updates

# ZFS: one command
zpool create -o ashift=12 \
  -O compression=lz4 -O atime=off -O encryption=aes-256-gcm \
  -O keyformat=passphrase -O keylocation=prompt \
  tank mirror /dev/sda /dev/sdb

I've set up the legacy stack hundreds of times. Every time, I forget one step — the crypttab entry, the dracut rebuild, the fstab UUID. With ZFS, I run one command and walk away. The mountpoint is a dataset property. The encryption is a dataset property. The RAID is the pool topology. There is nothing else to configure.

What ZFS replaces — the complete list

LVM, VG, LV, PV

Gone. ZFS is its own volume manager. zpools define disk topology. No mapping layers. No /dev/mapper.

cryptsetup, LUKS

Gone. Native per-dataset encryption. Separate keys per dataset. Replicate encrypted without decrypting.

mdadm (software RAID)

Gone. ZFS has RAIDZ1/2/3, mirrors, stripes, hot spares, and distributed-spare RAID (dRAID). With checksumming.

ext4, XFS, btrfs

Gone. ZFS is its own copy-on-write, transactional, self-healing filesystem. No fsck. Ever.

fsck

Gone. ZFS maintains on-disk consistency at all times. Boot after a crash. It's fine.

/etc/fstab

Mostly gone. Mountpoints are dataset properties. zfs mount -a handles the rest. Only EFI needs fstab.

rsync, rsnapshot, borg

Gone. zfs send/recv does block-level incremental replication. Only changed blocks. Delta-aware. Encrypted.

bcache, lvmcache

Gone. ARC (RAM cache) + L2ARC (SSD read cache) + SLOG (dedicated device for the ZFS intent log). All native.

snapper, timeshift, btrbk

Gone. ZFS didn't invent the snapshot, but it made snapshots a first-class, nearly free operation. Instant. Atomic. Mountable.

fuse-overlayfs, aufs

Gone. ZFS clones are real copy-on-write dataset copies. No overlay hacks. No FUSE performance penalty.

quota, edquota

Gone. quota=50G on a dataset. One command. Tracks used, referenced, snapshot consumption. Done.

testdisk, photorec

Gone. ZFS doesn't lose data silently. Checksums. Self-healing. Snapshots. If you need recovery tools, something went very wrong.

tune2fs, xfs_growfs

Gone. ZFS properties are dynamic, inheritable, and live. zfs set recordsize=1M tank/media. No remount. No reboot. (A new recordsize applies to data written after the change.)

ZFS vs ext4 — the default vs the future

ext4 is the default filesystem on most Linux distributions. It's stable, fast, well-understood, and has been in production since 2008. It is also a filesystem and nothing else — no volume management, no RAID, no checksums, no snapshots, no replication.

Feature	ext4	ZFS
Architecture	Journaling filesystem only	Integrated volume manager + filesystem + RAID
Data checksums	Metadata checksums only (metadata_csum); no data checksums	fletcher4 (default) or SHA-256 on every block, data and metadata
Self-healing	No — corrupt data stays corrupt	Yes — reads from good copy on checksum mismatch (mirrors/RAIDZ)
Snapshots	No (requires LVM thin)	Instant, atomic, effectively unlimited; space is used only as data diverges
Replication	rsync (file-level, slow)	`zfs send/recv` (block-level, incremental, encrypted)
RAID	Requires mdadm or hardware RAID	Built-in mirrors, RAIDZ1/2/3, dRAID
Encryption	fscrypt (per-directory) or a LUKS wrapper	Native per-dataset AES-256-GCM
Compression	No	LZ4, ZSTD, gzip — transparent, per-dataset
Max filesystem size	1 EiB (theoretical)	256 quadrillion ZiB, i.e. 2¹²⁸ bytes (theoretical)
Max file size	16 TiB	16 EiB
Shrink	Offline only (unmount, then `resize2fs`)	Limited: `zpool remove` for mirror and single-disk vdevs only
fsck required	Journal replay after an unclean shutdown; a full `e2fsck` after errors can take hours on large volumes	No: always consistent due to copy-on-write + TXG
RAM requirements	Minimal	No hard minimum; "1 GB per TB" is a loose rule of thumb, and more RAM means a bigger ARC
Kernel inclusion	In-tree since 2.6.28	Out-of-tree DKMS module (license incompatibility)
Distro support	Universal	Ubuntu native; others via DKMS or kldload

Where ext4 wins: simplicity, minimal resource usage, universal kernel inclusion, and the ability to shrink filesystems. For a 512 MB embedded device or a throwaway cloud instance that stores nothing important, ext4 is perfectly fine. It boots, it works, it's boring.

Where ext4 loses: everything else. No data checksums means silent data corruption goes undetected. No snapshots means no quick rollback. No built-in RAID means you need mdadm. No block-level encryption means you need LUKS (fscrypt covers individual directories only). No replication means you need rsync. Each addition is another layer with its own failure modes.

I still use ext4 for /boot and the EFI system partition. It's the right tool for small, static partitions that UEFI firmware and bootloaders need to read. For everything else — root, home, data, VMs, containers — ZFS.

ZFS vs XFS — the metadata champion

XFS is the default filesystem on RHEL, CentOS, Rocky, and Fedora Server (Fedora Workstation has defaulted to Btrfs since Fedora 33). It was designed by SGI for high-performance, large-scale storage. XFS excels at metadata performance: it handles millions of files in a single directory better than any other legacy filesystem. It's the only legacy filesystem that gives ZFS honest competition in some workloads.

Feature	XFS	ZFS
Metadata performance	Excellent — B+ tree allocation groups, delayed allocation	Good — improved dramatically with special vdevs on SSD
Data checksums	No (metadata CRCs in v5 format since 2013, but no data checksums)	Yes — every block checksummed
Self-healing	No	Yes (with redundancy)
Snapshots	No	Yes — instant, unlimited
Reflinks / CoW copies	Yes (since kernel 4.9) — instant file copies	Yes — clones and snapshots are CoW
RAID	Requires mdadm or hardware RAID	Built-in
Compression	No	LZ4, ZSTD, gzip
Online grow	Yes (`xfs_growfs`)	Yes (`zpool add` or `zpool attach`)
Online shrink	No	No
Max filesystem size	8 EiB	256 quadrillion ZiB (2¹²⁸ bytes)
Parallel I/O	Excellent — allocation groups enable independent parallel writes	Good — multiple vdevs enable parallel I/O
Repair tool	`xfs_repair` — fast and reliable	No fsck needed — `zpool scrub` for proactive verification
Production history	Since 1994 (IRIX), Linux since 2001	Since 2005 (Solaris); Linux port since 2010, declared production-ready in 2013

Where XFS wins: raw metadata throughput on workloads with millions of small files (mail servers, build caches, package repositories). XFS allocation groups allow truly parallel metadata operations across different regions of the disk. XFS is also battle-hardened in enterprise Linux — Red Hat has invested decades into xfs_repair and xfsprogs.

Where XFS loses: no checksums on data (only metadata CRCs), no snapshots, no built-in RAID, no compression, no encryption, no replication. XFS is an excellent filesystem. But it's only a filesystem. You still need the full layer cake around it.

If someone tells me they run XFS on bare metal with mdadm RAID10, I respect that. It's a solid stack. But the moment they need snapshots, replication, or compression, they're bolting on new tools. ZFS gives you all of it from day one. And it checksums the data, which XFS still doesn't.

ZFS vs Btrfs — the closest competitor

Btrfs is the only Linux filesystem that honestly competes with ZFS on features. It has copy-on-write, snapshots, checksums, built-in RAID, compression, and subvolumes. It's in-tree in the Linux kernel. On paper, it's everything ZFS is, but GPL-licensed and natively integrated. In practice, the story is more complicated.

Feature	Btrfs	ZFS
License	GPL — in-tree kernel module	CDDL — out-of-tree DKMS
Copy-on-write	Yes	Yes
Checksums	CRC32C (default), xxHash, SHA-256, BLAKE2b	fletcher4 (default), SHA-256, SHA-512, Skein, Edon-R, BLAKE3
Self-healing	Yes (with redundancy)	Yes (with redundancy)
Snapshots	Yes — subvolume snapshots, writable	Yes — dataset snapshots (read-only) + clones (writable)
Compression	LZO, ZLIB, ZSTD	LZ4, GZIP, ZSTD, LZJB, ZLE
Encryption	No native (fscrypt proposed but unmerged)	Native AES-256-GCM per-dataset
RAID 0/1/10	Stable	Stable (striped vdevs, mirrors)
RAID 5/6 (parity)	BROKEN — write hole, data loss risk	RAIDZ1/2/3 — stable since 2005, no write hole
Send/receive	Yes — subvolume-based incremental send	Yes — dataset-based incremental send
Deduplication	Out-of-band, via userspace tools such as duperemove or bees	Inline (real-time) but RAM-hungry; block cloning since 2.2
Quotas	qgroups (complex, historically buggy)	Simple per-dataset quota/refquota/reservation
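Max filesystem size	16 EiB	256 quadrillion ZiB (2¹²⁸ bytes)
Online shrink	Yes	No (whole-vdev removal only, see next row)
Device removal	Yes (`btrfs device remove`)	Limited (`zpool remove` for mirror and single-disk top-level vdevs; not in pools containing RAIDZ)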
RAM requirements	Lower than ZFS	Higher — ARC wants RAM
Maturity	Declared stable in 2013; RAID5/6 still not production-ready in 2026	Production since 2005 (Solaris); Linux port since 2010, production-ready since 2013

Where Btrfs wins: kernel inclusion (no DKMS headaches), online shrink, device removal, lower RAM requirements, and writable snapshots by default. Btrfs subvolumes are also more flexible than ZFS datasets for certain container and flatpak workflows. SUSE has run Btrfs as the default root filesystem since 2014 — for RAID1 and single-disk configurations, it's genuinely production-ready.

Where Btrfs loses: the RAID5/6 write hole is the elephant in the room. Btrfs parity RAID has a known bug where a crash during a partial stripe write can produce inconsistent parity. This has been documented since 2013 and remains unfixed in 2026. If you need parity RAID, Btrfs is not an option. ZFS RAIDZ has never had this problem — its full-stripe writes and copy-on-write design make a write hole impossible.

Btrfs also lacks native encryption (you need LUKS underneath, defeating the integrated design). Btrfs qgroups are notoriously complex and have had performance regressions. And Btrfs has a history of data loss bugs in edge cases that has eroded trust, even as the codebase has matured significantly since 2020.

I actually like Btrfs for laptop root filesystems — it's in-tree, snapshots work great with snapper, and RAID1 is solid. But the moment you need parity RAID, encryption, or you're storing data you really cannot lose, ZFS is the only answer. The write hole in Btrfs RAID5/6 has been open for over a decade. That tells you everything about the project's priorities.

ZFS vs mdadm — software RAID

mdadm is the Linux software RAID implementation. It operates at the block layer, below the filesystem. It knows nothing about the data it stores — just blocks. This is both its strength (simplicity, flexibility) and its fatal flaw (no data integrity).

Feature	mdadm	ZFS
RAID levels	0, 1, 4, 5, 6, 10	Stripe, mirror, RAIDZ1/2/3, dRAID
Data checksums	No — relies on disk firmware	Yes — every block
Write hole (RAID5/6)	Yes, unless a write journal (`--write-journal`) or partial parity log (`--consistency-policy=ppl`, RAID5 only) is configured	No: copy-on-write eliminates the write hole by design
Rebuild intelligence	Rebuilds entire disk, even empty space	Only rebuilds allocated blocks
Hot spare activation	Automatic within an array; sharing spares between arrays needs `mdadm --monitor` and a spare-group	Automatic (hot spares via `zed`, or dRAID distributed spares)
Scrub	`echo check > /sys/block/md0/md/sync_action`	`zpool scrub tank` — verifies checksums, repairs from good copies
Monitoring	mdmonitor daemon + email	`zpool status`, `zed` daemon, JSON events
Filesystem awareness	None — just blocks	Fully integrated — RAID and filesystem are one
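
The Scrub and Monitoring rows translate into very little scripting. A minimal sketch, assuming a pool named tank as elsewhere on this page. `pools_healthy` is a hypothetical helper, not part of ZFS: it matches the summary line that `zpool status -x` prints when nothing is wrong, and it reads from stdin so it can be exercised without a pool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the text on stdin is the
# "nothing wrong" summary printed by `zpool status -x`.
pools_healthy() {
    grep -Eq "^(all pools are healthy|pool '[^']+' is healthy)$"
}

# On a real system (not run here):
#   zpool scrub tank        # start a scrub; it runs in the background
#   zpool status -x | pools_healthy || echo "ZFS pool needs attention" >&2

# Self-check with canned output, no pool required:
printf '%s\n' "all pools are healthy" | pools_healthy && echo "healthy"
```

In practice `zed` already sends notifications; a check like this is only useful as a belt-and-braces cron job or a monitoring probe.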

The write hole is mdadm's most dangerous problem. In RAID5/6, a power failure during a write can leave parity inconsistent with data. On next boot, mdadm has no way to know which blocks are correct. The write-intent bitmap only shortens the resync afterwards; closing the hole takes a write journal or partial parity log, which many arrays are never configured with. ZFS's copy-on-write design rules the write hole out: new data is always written to new locations, and the uberblock pointer is updated atomically.
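
The failure is easier to see with numbers. This is a toy sketch, not mdadm code: one stripe on a three-disk RAID5, with single bytes standing in for blocks and XOR as the parity function. The values are arbitrary.

```shell
#!/bin/sh
# Toy 3-disk RAID5 stripe: two data blocks and one XOR parity block.
d1=165            # data block on disk 1 (0xA5)
d2=60             # data block on disk 2 (0x3C)
p=$(( d1 ^ d2 ))  # parity block on disk 3

# Power fails mid-update: the new d1 reaches disk 1,
# but the matching parity write never happens.
d1=255            # 0xFF is now on disk 1; p still describes the OLD d1

# Later, disk 2 dies and is rebuilt from the survivors.
rebuilt_d2=$(( d1 ^ p ))

echo "original d2: $d2"
echo "rebuilt  d2: $rebuilt_d2"
```

The rebuilt block comes out as 102 instead of 60. Note which block was damaged: d2 was never being written. That is what makes the write hole nasty, and why a design that never overwrites a live stripe in place sidesteps it.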

# mdadm: create RAID1 + filesystem (multiple tools, multiple steps)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
mount /dev/md0 /data

# ZFS: one command, same result, plus checksums + snapshots + compression
zpool create -o ashift=12 -O compression=lz4 tank mirror /dev/sda /dev/sdb

ZFS vs hardware RAID — the vendor trap

Hardware RAID controllers (Dell PERC, HP Smart Array, LSI MegaRAID, Broadcom) move RAID computation to a dedicated chip. For decades, this was considered the "enterprise" approach. In 2026, hardware RAID is a liability for ZFS — and increasingly for everything else too.

Feature	Hardware RAID	ZFS
Data checksums	No — RAID controller doesn't checksum data	Yes — end-to-end
Write hole	Mitigated by BBU/supercap (when battery is healthy)	Impossible by design
BBU dependency	Yes — dead battery = write hole returns	No battery needed
Controller failure	Need a compatible replacement controller to read the array	Import pool on any machine with ZFS
Vendor lock-in	Proprietary on-disk format — locked to controller vendor	Open format — portable across any OpenZFS platform
Visibility	OS sees one virtual disk; per-disk SMART and error counters only through vendor tools or controller passthrough	Full visibility into every disk: SMART, error counters, I/O stats
Snapshots	No	Yes
Cost	$200–$2000 per controller + battery	Free (use HBA in IT/JBOD mode)
Firmware bugs	Opaque — firmware bugs have caused silent data corruption	Open source — bugs are visible, reported, and fixed publicly

The controller failure scenario is the killer. If your Dell PERC H740 dies, you need another H740 (or compatible) to read the array. If that model is discontinued, you're buying used cards on eBay and praying. With ZFS, you pull the disks, put them in any machine running OpenZFS, and zpool import tank. Done.

The BBU dependency is the second killer. Hardware RAID controllers rely on a battery backup unit to protect the write cache during power failure. Batteries degrade. When the BBU reports degraded, the controller disables write-back caching and performance falls off a cliff. Or worse: the battery is dead but the controller doesn't report it, and you have an unprotected write cache. ZFS doesn't need a battery because copy-on-write never overwrites live data.

If you have a server with a hardware RAID controller, flash it to IT mode (HBA passthrough) or replace it with an HBA. ZFS needs to see the raw disks. Dell PERC in HBA mode, LSI 9300/9400 in IT mode — these are the standard approaches. Never run ZFS on top of a hardware RAID virtual disk. You lose all of ZFS's self-healing and per-disk visibility.

ZFS vs LVM — the volume manager

LVM2 is the standard Linux volume manager. It provides logical volumes, thin provisioning, snapshots (of a sort), and the ability to span or stripe across multiple disks. ZFS replaces LVM entirely — datasets are the equivalent of logical volumes, but with far more capabilities.

Feature	LVM2	ZFS
Thin provisioning	Yes (LVM thin)	Yes — datasets are thin by default
Snapshots	CoW snapshots (thick LVs: severe performance penalty; thin: better but complex)	Instant, zero overhead, unlimited
Snapshot performance	Classic LVM snapshots degrade performance 30–80%	Zero performance impact
Quotas	LV size is the quota	Per-dataset quota, refquota, reservation — granular control
Checksums	No	Yes
Compression	No	Yes
Shrink	Yes (`lvreduce --resizefs`, or `resize2fs` then `lvreduce`)	No
Complexity	PV → VG → LV → filesystem (four concepts)	Pool → dataset (two concepts)
Replication	No built-in replication	`zfs send/recv`

Where LVM wins: shrinking volumes (as long as the filesystem on top can shrink too), deep integration with every Linux distro's installer, and simpler mental model for admins who only need basic volumes. LVM also integrates with LUKS and mdadm in well-documented ways.

Where LVM loses: LVM classic snapshots are notoriously slow — every write to the origin volume triggers a copy-on-write to the snapshot exception store, degrading performance by 30–80%. LVM thin snapshots are better but add significant complexity (thin pools, metadata volumes, autoextend thresholds). ZFS snapshots are free — zero performance impact, zero configuration.

ZFS vs Ceph — local vs distributed

Ceph is a distributed storage system that provides block (RBD), object (RADOS), and file (CephFS) storage across a cluster of machines. Comparing ZFS to Ceph is comparing a local storage platform to a distributed one — they solve different problems, but the comparison comes up constantly because both are used for "serious" storage.

Feature	Ceph	ZFS
Scope	Distributed across multiple nodes	Local to one machine (or replicated via send/recv)
Minimum nodes	3 (for quorum)	1
Operational complexity	High — MON, OSD, MDS, MGR daemons; CRUSH maps; PG placement	Low — `zpool` and `zfs` commands
Self-healing	Yes — re-replicates on node failure	Yes — resilvers on disk failure
Snapshots	RBD snapshots, CephFS snapshots	Dataset snapshots
Scale	Petabytes across hundreds of nodes	Petabytes on a single node (practical limit ~2 PB)
Network dependency	Requires dedicated storage network (10GbE minimum, 25GbE recommended)	None — local I/O
Latency	Network-bound (100µs–1ms typical)	Disk-bound (10–100µs NVMe, 1–5ms HDD)
Use case	Multi-tenant cloud, OpenStack/Kubernetes PVs, geographically distributed data	Single-node servers, NAS, VM hosts, databases, workstations

Ceph wins when you need data accessible across multiple machines simultaneously, when you need to survive entire node failures without service interruption, or when you're building a cloud platform that serves block storage to hundreds of VMs.

ZFS wins when you need local storage performance, operational simplicity, or you're running on a single machine. The two are rarely stacked: Ceph's BlueStore backend writes to raw block devices and does its own checksumming and compression, so putting ZFS underneath an OSD mostly duplicates work.

Ceph is amazing technology, but it's a full-time job. I've seen teams of three engineers dedicated solely to Ceph operations. If you don't have that staffing, ZFS + zfs send/recv to a remote backup gives you 90% of the resilience at 10% of the operational cost. Don't deploy Ceph unless you genuinely need distributed storage.

ZFS vs DRBD — synchronous replication

DRBD (Distributed Replicated Block Device) provides synchronous block-level replication between two nodes. It's often used for database HA: primary writes to local disk and DRBD simultaneously replicates every write to the secondary. If the primary dies, the secondary has an identical copy.

Feature	DRBD	ZFS send/recv
Replication mode	Synchronous (Protocol C) or async	Asynchronous (snapshot-based incremental)
RPO	Zero (sync mode — no data loss on failover)	Last snapshot interval (typically 1–15 minutes)
Write latency impact	Every write waits for remote acknowledge (adds network RTT)	None — replication is decoupled from writes
Bandwidth	Continuous — mirrors every write in real time	Batched — only transfers changed blocks per snapshot
Complexity	Moderate — DRBD resource config, Pacemaker/Corosync for failover	Low — cron job or sanoid/syncoid
Multi-target	Yes (DRBD 9 supports 2+ secondaries, but complex)	Yes — send to multiple targets trivially
Checksums	Network CRC only — no on-disk data checksums	Full on-disk checksums on both sides

DRBD wins when you absolutely need zero RPO — database HA clusters where losing even one transaction is unacceptable. Synchronous replication guarantees the secondary has every committed write.

ZFS wins when you can tolerate a few minutes of potential data loss (which is most workloads). zfs send/recv is dramatically simpler to operate, doesn't impact write latency, and includes checksums on both sides. For most server replication, syncoid --no-sync-snap tank/data root@remote:backup/data in a cron job (with sanoid taking the snapshots) is all you need.
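
Underneath any wrapper, snapshot-based replication is one pipeline per dataset: send the delta between two snapshots, receive it on the other side. A minimal sketch that only prints the pipeline rather than running it. `replication_cmd` is a hypothetical helper, and the dataset, snapshot, and host names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the incremental replication pipeline
# for one dataset instead of executing it.
# Usage: replication_cmd DATASET PREV_SNAP NEW_SNAP REMOTE_HOST TARGET_DATASET
replication_cmd() {
    printf 'zfs send -i @%s %s@%s | ssh %s zfs recv %s\n' \
        "$2" "$1" "$3" "$4" "$5"
}

replication_cmd tank/data prev now remote backup/data
```

The worst-case RPO of this approach is roughly the snapshot interval plus the time the transfer takes, which is the trade against DRBD's synchronous mode.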

The master comparison table

Every major feature across the four filesystems in one table. This is the reference.

Feature	ext4	XFS	Btrfs	ZFS
Data checksums	No	No	Yes	Yes
Metadata checksums	CRC32C (metadata_csum)	CRC32C	Yes	Yes
Self-healing	No	No	Yes*	Yes
Copy-on-write	No	Reflink only	Yes	Yes
Snapshots	No	No	Yes	Yes
Compression	No	No	Yes	Yes
Encryption	fscrypt	No	No	Yes
Built-in RAID	No	No	Partial**	Yes
Volume management	No	No	Yes	Yes
Send/receive	No	No	Yes	Yes
Online shrink	No (offline only)	No	Yes	No
Kernel in-tree	Yes	Yes	Yes	No
Parity RAID stable	N/A	N/A	No	Yes
RAM hungry	No	No	Moderate	Yes
Boot support	Universal	Universal	GRUB only	GRUB or ZFSBootMenu

* Btrfs self-healing requires RAID1/10 profiles. RAID5/6 is not reliable.
** Btrfs RAID 0/1/10 is stable. RAID 5/6 has a known write hole and is not production-safe.

RAID implementation comparison

Feature	mdadm	Hardware RAID	Btrfs RAID	ZFS RAID
Write hole	Yes (journal or PPL closes it)	BBU mitigates	Yes (RAID5/6)	No (CoW)
Checksums	No	No	Yes	Yes
Scrub intelligence	Block-level check only	Controller-dependent	Checksum-verified	Checksum-verified + auto-repair
Rebuild speed (12 disks)	Hours	Hours	Hours	Minutes (dRAID) or hours (RAIDZ)
Portability	Any Linux	Same controller model only	Any Linux with Btrfs	Any OS with OpenZFS
Per-disk visibility	Yes	No (controller abstracts)	Yes	Yes
Mixed disk sizes	Uses smallest	Uses smallest	Flexible	Uses smallest per vdev
Hot spare	Spare devices in the array	Controller config	Manual	`zpool add <pool> spare <disk>` + dRAID distributed spares

Migration paths to ZFS

You can't convert an existing filesystem to ZFS in-place. Migration always involves creating a new ZFS pool and copying data. Here are the practical paths for each legacy system.

From ext4 / XFS (single disk or LVM)

# 1. Create ZFS pool on new disk(s), plus a dataset to hold the data
zpool create -o ashift=12 -O compression=lz4 -O atime=off tank mirror /dev/sdc /dev/sdd
zfs create tank/data

# 2. Copy data preserving permissions, xattrs, ACLs
rsync -avxHAX --progress /old-mount/ /tank/data/

# 3. Verify
diff -r /old-mount/ /tank/data/

# 4. Update fstab/mountpoints, reboot, decommission old disks
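
For the verify step, a checksum manifest is an alternative to diff -r that you can also save and re-check later. A minimal sketch, assuming GNU coreutils (`sha256sum`). `tree_manifest` and `same_contents` are hypothetical helpers, and they compare file contents only, not ownership, permissions, xattrs, or symlinks.

```shell
#!/bin/sh
# Hypothetical helper: print "sha256  ./relative/path" for every regular
# file under a directory, sorted by path so two trees can be compared.
tree_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Succeeds only if both directories exist and hold the same regular files
# with the same contents.
same_contents() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]
}

# On a real migration (paths from the steps above, not run here):
#   same_contents /old-mount /tank/data && echo "copy verified"
```

Like diff -r, this reads every byte on both sides, so on large datasets it takes about as long as the copy did.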

From mdadm RAID

# 1. Back up mdadm config
mdadm --detail --scan > /root/mdadm-backup.conf

# 2. Create ZFS pool on separate disks
zpool create -o ashift=12 -O compression=lz4 tank mirror /dev/sde /dev/sdf

# 3. Copy data
rsync -avxHAX /old-raid-mount/ /tank/data/

# 4. Stop mdadm array after verification
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1 /dev/sdb1

# 5. Optionally add old disks to the ZFS pool
zpool add tank mirror /dev/sda /dev/sdb

From hardware RAID

# 1. Copy data to external storage (step 3 reuses the array's disks)
rsync -avxHAX /old-mount/ /backup/

# 2. Flash RAID controller to IT/HBA mode (or replace with HBA)
#    Dell PERC: use Dell firmware utility
#    LSI: use sas2flash or sas3flash
#    This exposes raw disks to the OS

# 3. Create ZFS pool on the now-exposed raw disks
zpool create -o ashift=12 -O compression=lz4 \
  tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# 4. Restore data
rsync -avxHAX /backup/ /tank/data/

From Btrfs

# 1. Create ZFS pool on new disks
zpool create -o ashift=12 -O compression=zstd tank mirror /dev/sdc /dev/sdd

# 2. Recreate the subvolume structure as ZFS datasets before copying into it
zfs create tank/data
zfs create tank/data/home
zfs create tank/data/var

# 3. Copy from a read-only snapshot so the source can't change mid-copy
# (btrfs send/recv is Btrfs-only and can't be received into ZFS)
btrfs subvolume snapshot -r /btrfs-mount /btrfs-mount/migration-snap
rsync -avxHAX /btrfs-mount/migration-snap/ /tank/data/

# 4. Nested subvolumes are not part of that snapshot; copy each one separately
rsync -avxHAX /btrfs-mount/home/ /tank/data/home/
rsync -avxHAX /btrfs-mount/var/ /tank/data/var/

Every migration is the same pattern: create ZFS pool, rsync data, verify, switch. The boring part is waiting for rsync. For large datasets (10+ TB), do an initial rsync while the old system is live, then do a final incremental rsync during a maintenance window. Downtime measured in minutes, not hours.

When NOT to use ZFS

ZFS is the right answer for most storage needs. But not all. Here are the cases where legacy tools are simpler or more appropriate. Being honest about this makes the rest of the page more credible.

Embedded systems / tiny VMs

ZFS wants RAM. The ARC alone consumes 1–4 GB on a typical system. If you're running a 256 MB container or a resource-constrained IoT device, ext4 is the right choice. Don't force ZFS into environments where RAM is precious and data integrity isn't critical.

Ephemeral cloud instances

A spot instance that lives for 20 minutes to run a batch job doesn't need ZFS. The instance will be destroyed before checksums or snapshots provide any value. Use whatever the AMI ships with (usually ext4 or XFS).

Windows-only environments

OpenZFS on Windows exists but is experimental. If your entire infrastructure is Windows Server and you need a filesystem, NTFS or ReFS is the practical choice. Don't shoehorn ZFS into a Windows shop.

/boot and EFI system partition

UEFI firmware reads the ESP as FAT32. GRUB reads /boot as ext4 (or Btrfs or ZFS, but with caveats). For maximum compatibility, keep /boot on ext4 and ESP on FAT32. These are small, static partitions where ZFS provides no benefit.

Environments where kernel DKMS is forbidden

Some security-hardened or compliance-bound environments prohibit out-of-tree kernel modules. ZFS on Linux is a DKMS module (or kABI-tracking RPM). If your security policy bans DKMS, you can't run ZFS. Btrfs or ext4 are your options.

When you need online shrink

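ZFS pools shrink only in limited ways. zpool remove can evacuate mirror and single-disk top-level vdevs, but not in a pool that contains RAIDZ vdevs, and a RAIDZ vdev itself cannot be removed. If your workflow requires regularly reclaiming pool space by removing disks, Btrfs or LVM handles this better. Plan ZFS pools to grow.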

I run ext4 on /boot, FAT32 on the ESP, and ZFS on everything else. That's the right answer for 95% of Linux machines. The remaining 5% are either too small for ZFS (embedded), too ephemeral (cloud spot instances), or too locked down (FIPS environments that ban DKMS). Know your constraints. Use the right tool.

Operational complexity comparison

One of ZFS's biggest advantages isn't a feature — it's the operational simplicity of having one tool instead of five. Here's what common storage tasks look like in each stack.

Task	Legacy stack	ZFS
Create redundant storage	`mdadm --create` + `pvcreate` + `vgcreate` + `lvcreate` + `mkfs` (5 commands)	`zpool create tank mirror sda sdb` (1 command)
Take a snapshot	`lvcreate --snapshot` (LVM) or install snapper/timeshift	`zfs snapshot tank/data@now`
Replicate to remote	`rsync -avz /data/ remote:/backup/` (file-level, slow)	`zfs send -i @prev tank/data@now | ssh remote zfs recv backup/data`
Check data integrity	No tool (ext4/XFS have no data checksums)	`zpool scrub tank`
Replace failed disk	`mdadm --manage /dev/md0 --fail` + `--remove` + `--add` + wait for rebuild	`zpool replace tank /dev/sda /dev/sde`
Enable compression	Not possible (ext4/XFS don't support it)	`zfs set compression=zstd tank/data`
Set quota	`edquota` (per-user) or LV size limit	`zfs set quota=100G tank/data`
Expand storage	`mdadm --grow` + `pvresize` + `lvextend` + `resize2fs` (4 commands, order matters)	`zpool add tank mirror sdc sdd` (1 command, instant)
Rollback after bad update	Restore from backup (minutes to hours)	`zfs rollback tank/root@before-update` (seconds)
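
The last row is the one I lean on most, so here is a sketch of it as a wrapper: snapshot, run the risky command, roll back if it fails. `snapshot_name` and `guarded_run` are hypothetical helpers, and the ZFS variable is an assumption of this sketch that lets you swap the real zfs binary for echo as a dry run.

```shell
#!/bin/sh
# Hypothetical helper: build a timestamped snapshot name for a dataset.
# Usage: snapshot_name DATASET LABEL [TIMESTAMP]
snapshot_name() {
    printf '%s@%s-%s\n' "$1" "$2" "${3:-$(date +%Y%m%d-%H%M%S)}"
}

# Hypothetical wrapper: snapshot DATASET, run the given command,
# and roll back to the snapshot if the command fails.
# Usage: guarded_run DATASET COMMAND [ARGS...]
guarded_run() {
    dataset=$1; shift
    snap=$(snapshot_name "$dataset" before-update)
    "${ZFS:-zfs}" snapshot "$snap" || return 1
    if "$@"; then
        return 0
    fi
    "${ZFS:-zfs}" rollback "$snap"
    return 1
}

# Dry run: print the zfs subcommands instead of executing them,
# with `false` standing in for an update that fails.
ZFS=echo
guarded_run tank/root false || echo "update failed, rolled back"
```

Plain zfs rollback only goes back to the most recent snapshot, which holds here as long as nothing else snapshots the dataset while the command runs.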

Scalability limits

Limit	ext4	XFS	Btrfs	ZFS
Max filesystem size	1 EiB	8 EiB	16 EiB	256 quadrillion ZiB (2¹²⁸ bytes)
Max file size	16 TiB	8 EiB	16 EiB	16 EiB
Max files	4 billion (fixed at mkfs)	2⁶⁴	2⁶⁴	2⁴⁸ per directory
Max filename length	255 bytes	255 bytes	255 bytes	255 bytes
Max snapshots	N/A	N/A	Unlimited (subvolume-based)	Unlimited (2⁶⁴ theoretical)
Max disks per pool	N/A	N/A	N/A	Hundreds (practical), limited by memory

In practice, you'll hit hardware limits (RAM, CPU, disk count) long before you hit ZFS's theoretical limits. The practical ceiling for a single ZFS pool is around 2 PB on current hardware — beyond that, you're looking at distributed solutions like Ceph or Lustre.

Performance characteristics

Raw sequential throughput is not where ZFS shines compared to ext4 or XFS. ZFS's performance advantages come from compression (less data written to disk), ARC caching (hot data served from RAM), and the special vdev (metadata on SSD). Here's an honest assessment.

Sequential writes

ext4 and XFS are 5–15% faster for raw sequential writes on the same hardware. ZFS's copy-on-write overhead and checksum computation add latency. However, with compression enabled (LZ4 compresses faster than disk I/O), ZFS often writes less data to disk, making it faster in practice for compressible workloads.

Sequential reads

Roughly equivalent across all filesystems. The bottleneck is disk speed, not filesystem overhead. ZFS's ARC gives it an advantage for repeated reads of the same data.

Random IOPS (mirrors)

ZFS mirrors perform comparably to mdadm RAID1 for random I/O. The ARC gives ZFS an edge for read-heavy workloads with a hot working set. For pure random writes, ext4 on mdadm has a slight edge due to less copy-on-write overhead.

Random IOPS (RAIDZ)

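RAIDZ has a significant random I/O penalty. Every block is spread across all the disks in the vdev, so a RAIDZ vdev delivers roughly the random IOPS of a single disk no matter how wide it is. This is not a ZFS bug; it's the price of full-stripe, copy-on-write parity with no read-modify-write cycle. For random I/O workloads, use mirrors. See Pool Design for details.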
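
A back-of-envelope way to compare layouts, under a loud assumption: random-write IOPS scale with the number of top-level vdevs, not the number of disks. `est_random_iops` is a hypothetical helper, and the 150 IOPS per disk is an assumed figure for a spinning disk, not a measurement.

```shell
#!/bin/sh
# Rule-of-thumb estimate only:
#   pool random-write IOPS ~ (number of top-level vdevs) x (per-disk IOPS)
# Usage: est_random_iops TOTAL_DISKS DISKS_PER_VDEV PER_DISK_IOPS
est_random_iops() {
    echo $(( ($1 / $2) * $3 ))
}

# Twelve disks at an assumed 150 random IOPS each:
echo "6 x 2-way mirror : $(est_random_iops 12 2 150)"
echo "2 x 6-wide RAIDZ2: $(est_random_iops 12 6 150)"
```

Same twelve disks, three times the estimated random-write IOPS from the mirror layout. Real numbers depend on recordsize, caching, and the workload, so treat this as a way to compare shapes, not to predict throughput.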

Metadata operations

XFS wins for raw metadata throughput (millions of creates/deletes in a single directory). ZFS is competitive with a special vdev on SSD. Without a special vdev on spinning disks, ZFS metadata operations can be 2–5x slower than XFS.

Memory tradeoff

ZFS uses RAM aggressively for ARC. This is a feature, not a bug — unused RAM is wasted RAM. ARC is adaptive and releases memory under pressure. But on systems with 4–8 GB RAM, the ARC competes with application memory. Set zfs_arc_max to cap it.
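
zfs_arc_max is a ZFS kernel module parameter on Linux and takes a value in bytes. A minimal sketch of capping the ARC at 4 GiB; `gib_to_bytes` and `arc_max_option` are hypothetical helpers, and the 4 GiB figure is only an example.

```shell
#!/bin/sh
# Convert a whole number of GiB to bytes, the unit zfs_arc_max expects.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the modprobe.d line that caps the ARC at the given size in GiB.
arc_max_option() {
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$1")"
}

# Example: the line for a 4 GiB cap.
arc_max_option 4

# To persist it, add that line to /etc/modprobe.d/zfs.conf (as root).
# To apply it right away without a reboot (as root):
#   gib_to_bytes 4 > /sys/module/zfs/parameters/zfs_arc_max
```

If the root filesystem is on ZFS, the module loads from the initramfs, so the modprobe.d change likely needs an initramfs rebuild before it takes effect at boot.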

The verdict

ZFS is not just a filesystem — it's a storage platform. It replaces the entire legacy stack: partitions, volume managers, RAID arrays, encryption wrappers, snapshot tools, caching layers, integrity checkers, and replication systems. One tool. One command syntax. One failure domain.

The legacy tools aren't bad. ext4 is rock-solid. XFS is fast. mdadm works. LVM is flexible. But they're independent layers that don't talk to each other, don't checksum data, and require you to be the integration layer. You are the glue. You are the one who has to remember the right order of operations for expanding storage, the right incantation for LVM snapshots, the right flags for mdadm --grow.

ZFS eliminates the glue. It gives you an integrated system where every component — RAID, volume management, filesystem, checksums, compression, encryption, snapshots, replication — is designed to work together from the ground up. Once you've operated a ZFS system, the legacy layer cake feels like what it is: the past.

Ready to switch? The ZFS Zero to Hero tutorial walks you through creating your first pool, and Pool Design helps you choose the right VDEV layout. Or just download kldload and let the installer handle it all.
