kldload — your platform, your way, free

ZFS vs Everything Else — the middleware graveyard.

Traditional Linux storage is a layer cake: partitions, volume managers, RAID arrays, filesystems, encryption wrappers, snapshot tools, caching layers — each with its own config syntax, failure modes, and on-call runbooks. ZFS replaces all of them with a single, integrated storage platform. This page is a comprehensive, honest comparison of ZFS against every legacy filesystem, volume manager, and RAID system you might be running today.

This is the page I wish existed when I was evaluating ZFS in 2016. Every comparison I found was either a ZFS fanboy puff piece or a dismissive "it's fine for NAS" from someone who never ran it in production. This page is honest. ZFS wins most comparisons, but not all, and I'll tell you exactly where the legacy tools are simpler or more appropriate.

The philosophical difference

ZFS is not "just a filesystem" the way ext4 is a filesystem. ZFS is a storage platform. It merges the volume manager, RAID controller, filesystem, snapshot engine, replication system, caching layer, compression engine, and encryption subsystem into one coherent, transactional whole. Every component shares the same on-disk format, the same transaction model, and the same checksum tree.

Legacy Linux storage treats each layer as independent: mdadm doesn't know about ext4. LVM doesn't know about LUKS. ext4 doesn't know about mdadm. When something breaks, you're debugging three or four tools that have no awareness of each other. When you want a snapshot, you need LVM thin provisioning or a separate tool like snapper. When you want replication, you need rsync or borgbackup, which operate at the file level and scale poorly.

ZFS's integrated design means every operation — writes, checksums, compression, encryption, snapshots, replication — happens in one atomic transaction group (TXG). There is no window where the filesystem is inconsistent. There is no fsck. There is no "I hope the RAID rebuild finishes before another disk dies."

The layer cake problem — ext4 + LVM + mdadm

The traditional Linux "enterprise" storage stack looks like this: mdadm assembles physical disks into a RAID array. LVM carves that array into logical volumes. LUKS encrypts each volume. ext4 or XFS sits on top. That's four independent layers, each with its own tools, its own failure modes, and zero awareness of the layers above or below it.

What goes wrong with the layer cake

Silent corruption propagates. ext4 checksums its metadata but not your data. If a disk returns corrupt data, ext4 stores it faithfully. mdadm has no way to know which copy is correct during a RAID1 rebuild, so it picks one arbitrarily. LVM doesn't checksum anything. LUKS encrypts whatever it receives, corrupt or not. The corruption is now encrypted, replicated, and backed up. You discover it six months later when you try to open a file.

Snapshots are painful. LVM snapshots exist, but they're copy-on-write at the block level with severe performance degradation. LVM thin snapshots are better but add another layer of complexity and have their own failure modes. Neither integrates with replication.

Expansion is error-prone. Growing the stack means: growing the mdadm array, then pvresize, then lvextend, then resize2fs or xfs_growfs. Miss a step and you've got mismatched sizes. Shrinking is worse — do the steps in reverse order or lose data.

ZFS equivalent: zpool add + done. One command. One tool. One failure domain.

Here's the same operation — creating redundant, encrypted storage — in both stacks:

# Legacy: mdadm + LUKS + LVM + ext4 (10 commands, plus config file edits)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --detail --scan >> /etc/mdadm.conf
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 crypt0
pvcreate /dev/mapper/crypt0
vgcreate vg0 /dev/mapper/crypt0
lvcreate -L 100G -n data vg0
mkfs.ext4 /dev/vg0/data
mkdir -p /data
mount /dev/vg0/data /data
# ...plus fstab, crypttab, mdadm.conf, dracut/initramfs updates

# ZFS: one command
zpool create -o ashift=12 \
  -O compression=lz4 -O atime=off -O encryption=aes-256-gcm \
  -O keyformat=passphrase -O keylocation=prompt \
  tank mirror /dev/sda /dev/sdb
I've set up the legacy stack hundreds of times. Every time, I forget one step — the crypttab entry, the dracut rebuild, the fstab UUID. With ZFS, I run one command and walk away. The mountpoint is a dataset property. The encryption is a dataset property. The RAID is the pool topology. There is nothing else to configure.

What ZFS replaces — the complete list

LVM, VG, LV, PV
Gone. ZFS is its own volume manager. zpools define disk topology. No mapping layers. No /dev/mapper.
cryptsetup, LUKS
Gone. Native per-dataset encryption. Separate keys per dataset. Replicate encrypted without decrypting.
mdadm (software RAID)
Gone. ZFS has RAIDZ1/2/3, mirrors, stripes, hot spares, and distributed parity (dRAID). With checksumming.
ext4, XFS, btrfs
Gone. ZFS is its own copy-on-write, transactional, self-healing filesystem. No fsck. Ever.
fsck
Gone. ZFS maintains on-disk consistency at all times. Boot after a crash. It's fine.
/etc/fstab
Mostly gone. Mountpoints are dataset properties. zfs mount -a handles the rest. Only EFI needs fstab.
rsync, rsnapshot, borg
Gone. zfs send/recv does block-level incremental replication. Only changed blocks. Delta-aware. Encrypted.
bcache, lvmcache
Gone. ARC (RAM cache) + L2ARC (SSD read cache) + SLOG (dedicated log device for synchronous writes). All native.
snapper, timeshift, btrbk
Gone. ZFS made the snapshot a first-class filesystem primitive. Instant. Atomic. Mountable.
fuse-overlayfs, aufs
Gone. ZFS clones are real copy-on-write dataset copies. No overlay hacks. No FUSE performance penalty.
quota, edquota
Gone. quota=50G on a dataset. One command. Tracks used, referenced, snapshot consumption. Done.
testdisk, photorec
Gone. ZFS doesn't lose data silently. Checksums. Self-healing. Snapshots. If you need recovery tools, something went very wrong.
tune2fs, xfs_growfs
Gone. ZFS properties are dynamic, inheritable, and live. zfs set recordsize=1M tank/media. No remount. No reboot.
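
To make the list above concrete, here is a minimal sketch of quotas, compression, snapshots, and clones driven purely by dataset commands. The pool name tank, the dataset name projects, and the run helper are assumptions for illustration. The script is dry-run by default, so it only prints what it would execute.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only on a machine with a real pool (here assumed to be "tank").
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

dataset_demo() {
  pool="$1"
  run zfs create "$pool/projects"                       # replaces lvcreate + mkfs + fstab
  run zfs set quota=50G "$pool/projects"                # replaces edquota
  run zfs set compression=zstd "$pool/projects"         # live, no remount
  run zfs snapshot "$pool/projects@before-change"       # replaces snapper/timeshift
  run zfs clone "$pool/projects@before-change" "$pool/projects-test"  # writable CoW copy
  run zfs get quota,compression,used,referenced "$pool/projects"
}

dataset_demo tank
```

Every line maps to one legacy tool from the list, and none of them needs a config file or a remount.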

ZFS vs ext4 — the default vs the future

ext4 is the default filesystem on most Linux distributions. It's stable, fast, well-understood, and has been in production since 2008. It is also a filesystem and nothing else — no volume management, no RAID, no checksums, no snapshots, no replication.

| Feature | ext4 | ZFS |
|---|---|---|
| Architecture | Journaling filesystem only | Integrated volume manager + filesystem + RAID |
| Data checksums | No (CRC32C on metadata and journal only, via metadata_csum) | fletcher4 (default) or SHA-256 on every block, data and metadata |
| Self-healing | No: corrupt data stays corrupt | Yes: reads from good copy on checksum mismatch (mirrors/RAIDZ) |
| Snapshots | No (requires LVM) | Instant, atomic, space consumed only as data diverges |
| Replication | rsync (file-level, slow) | zfs send/recv (block-level, incremental, encrypted) |
| RAID | Requires mdadm or hardware RAID | Built-in mirrors, RAIDZ1/2/3, dRAID |
| Encryption | fscrypt (file-level, per directory) or LUKS wrapper | Native per-dataset AES-256-GCM |
| Compression | No | LZ4, ZSTD, gzip: transparent, per-dataset |
| Max filesystem size | 1 EiB (theoretical) | 256 quadrillion ZiB (2^128 bytes, theoretical) |
| Max file size | 16 TiB | 16 EiB |
| Shrink | Yes, but offline only (resize2fs on an unmounted filesystem) | No: pools cannot be shrunk |
| fsck required | Journal replay after unclean shutdown; a full e2fsck, when needed, can take hours on large volumes | No: always consistent due to copy-on-write + TXG |
| RAM requirements | Minimal | 1 GB per TB of storage (rule of thumb); more = better ARC |
| Kernel inclusion | In-tree since 2.6.28 | Out-of-tree DKMS module (license incompatibility) |
| Distro support | Universal | Ubuntu native; others via DKMS or kldload |

Where ext4 wins: simplicity, minimal resource usage, universal kernel inclusion, and the ability to shrink filesystems. For a 512 MB embedded device or a throwaway cloud instance that stores nothing important, ext4 is perfectly fine. It boots, it works, it's boring.

Where ext4 loses: everything else. No data checksums means silent data corruption goes undetected. No snapshots means no quick rollback. No built-in RAID means you need mdadm. Encryption means fscrypt for individual directories or LUKS for the whole volume. No replication means you need rsync. Each addition is another layer with its own failure modes.

I still use ext4 for /boot, next to the FAT32 EFI system partition. These are small, static partitions that UEFI firmware and bootloaders need to read. For everything else, meaning root, home, data, VMs, and containers, I use ZFS.

ZFS vs XFS — the metadata champion

XFS is the default filesystem on RHEL, CentOS, Rocky, and Fedora Server. It was designed by SGI for high-performance, large-scale storage. XFS excels at metadata performance: it handles millions of files in a single directory better than any other legacy filesystem. It's the only legacy filesystem that gives ZFS honest competition in some workloads.

| Feature | XFS | ZFS |
|---|---|---|
| Metadata performance | Excellent: B+ tree allocation groups, delayed allocation | Good: improved dramatically with special vdevs on SSD |
| Data checksums | No (metadata CRCs in v5 format since 2013, but no data checksums) | Yes: every block checksummed |
| Self-healing | No | Yes (with redundancy) |
| Snapshots | No | Yes: instant, unlimited |
| Reflinks / CoW copies | Yes (since kernel 4.9): instant file copies | Yes: clones and snapshots are CoW |
| RAID | Requires mdadm or hardware RAID | Built-in |
| Compression | No | LZ4, ZSTD, gzip |
| Online grow | Yes (xfs_growfs) | Yes (zpool add or zpool attach) |
| Online shrink | No | No |
| Max filesystem size | 8 EiB | 256 quadrillion ZiB (2^128 bytes) |
| Parallel I/O | Excellent: allocation groups enable independent parallel writes | Good: multiple vdevs enable parallel I/O |
| Repair tool | xfs_repair: fast and reliable | No fsck needed; zpool scrub for proactive verification |
| Production history | Since 1994 (IRIX), Linux since 2001 | Since 2005 (Solaris); Linux port started in 2010, first production-ready release in 2013 |

Where XFS wins: raw metadata throughput on workloads with millions of small files (mail servers, build caches, package repositories). XFS allocation groups allow truly parallel metadata operations across different regions of the disk. XFS is also battle-hardened in enterprise Linux — Red Hat has invested decades into xfs_repair and xfsprogs.

Where XFS loses: no checksums on data (only metadata CRCs), no snapshots, no built-in RAID, no compression, no encryption, no replication. XFS is an excellent filesystem. But it's only a filesystem. You still need the full layer cake around it.

If someone tells me they run XFS on bare metal with mdadm RAID10, I respect that. It's a solid stack. But the moment they need snapshots, replication, or compression, they're bolting on new tools. ZFS gives you all of it from day one. And it checksums the data, which XFS still doesn't.

ZFS vs Btrfs — the closest competitor

Btrfs is the only Linux filesystem that honestly competes with ZFS on features. It has copy-on-write, snapshots, checksums, built-in RAID, compression, and subvolumes. It's in-tree in the Linux kernel. On paper, it's everything ZFS is, but GPL-licensed and natively integrated. In practice, the story is more complicated.

| Feature | Btrfs | ZFS |
|---|---|---|
| License | GPL, in-tree kernel module | CDDL, out-of-tree DKMS |
| Copy-on-write | Yes | Yes |
| Checksums | CRC32C (default), xxHash, SHA-256, BLAKE2b | fletcher4 (default), SHA-256, SHA-512, Skein, Edon-R, BLAKE3 |
| Self-healing | Yes (with redundancy) | Yes (with redundancy) |
| Snapshots | Yes: subvolume snapshots, writable | Yes: dataset snapshots (read-only) + clones (writable) |
| Compression | LZO, ZLIB, ZSTD | LZ4, GZIP, ZSTD, LZJB, ZLE |
| Encryption | No native (fscrypt proposed but unmerged) | Native AES-256-GCM per-dataset |
| RAID 0/1/10 | Stable | Stable (striped vdevs, mirrors) |
| RAID 5/6 (parity) | Not production-safe: write hole, data loss risk | RAIDZ1 since 2005, RAIDZ2/3 added later; no write hole |
| Send/receive | Yes: subvolume-based incremental send | Yes: dataset-based incremental send |
| Deduplication | Out-of-band, via userspace tools such as duperemove or bees | Inline (real-time) but RAM-hungry; block cloning since 2.2 |
| Quotas | qgroups (complex, historically buggy) | Simple per-dataset quota/refquota/reservation |
| Max filesystem size | 16 EiB | 256 quadrillion ZiB (2^128 bytes) |
| Online shrink | Yes | No |
| Device removal | Yes (btrfs device remove) | Limited (zpool remove handles single-disk and mirror top-level vdevs; not available in pools with RAIDZ vdevs) |
| RAM requirements | Lower than ZFS | Higher: ARC wants RAM |
| Maturity | Declared stable in 2013; RAID5/6 still not production-ready in 2026 | Production since 2005 (Solaris); OpenZFS on Linux since 2013 |

Where Btrfs wins: kernel inclusion (no DKMS headaches), online shrink, device removal, lower RAM requirements, and writable snapshots by default. Btrfs subvolumes are also more flexible than ZFS datasets for certain container and flatpak workflows. SUSE has run Btrfs as the default root filesystem since 2014 — for RAID1 and single-disk configurations, it's genuinely production-ready.

Where Btrfs loses: the RAID5/6 write hole is the elephant in the room. Btrfs parity RAID has a known bug where a crash during a partial stripe write can produce inconsistent parity. This has been documented since 2013 and remains unfixed in 2026. If you need parity RAID, Btrfs is not an option. ZFS RAIDZ has never had this problem — its full-stripe writes and copy-on-write design make a write hole impossible.

Btrfs also lacks native encryption (you need LUKS underneath, defeating the integrated design). Btrfs qgroups are notoriously complex and have had performance regressions. And Btrfs has a history of data loss bugs in edge cases that has eroded trust, even as the codebase has matured significantly since 2020.

I actually like Btrfs for laptop root filesystems — it's in-tree, snapshots work great with snapper, and RAID1 is solid. But the moment you need parity RAID, encryption, or you're storing data you really cannot lose, ZFS is the only answer. The write hole in Btrfs RAID5/6 has been open for over a decade. That tells you everything about the project's priorities.

ZFS vs mdadm — software RAID

mdadm is the Linux software RAID implementation. It operates at the block layer, below the filesystem. It knows nothing about the data it stores — just blocks. This is both its strength (simplicity, flexibility) and its fatal flaw (no data integrity).

| Feature | mdadm | ZFS |
|---|---|---|
| RAID levels | 0, 1, 4, 5, 6, 10 | Stripe, mirror, RAIDZ1/2/3, dRAID |
| Data checksums | No: relies on disk firmware | Yes: every block |
| Write hole (RAID5/6) | Yes, unless a write journal device or a partial parity log (PPL) is configured; the write-intent bitmap only speeds up resync | No: copy-on-write eliminates write hole by design |
| Rebuild intelligence | Rebuilds entire disk, even empty space | Only rebuilds allocated blocks |
| Hot spare activation | Automatic for spares inside the array; shared spare groups need mdadm.conf and mdadm --monitor | Automatic via zed (hot spares) or dRAID distributed spares |
| Scrub | echo check > /sys/block/md0/md/sync_action | zpool scrub tank: verifies checksums, repairs from good copies |
| Monitoring | mdmonitor daemon + email | zpool status, zed daemon, JSON events |
| Filesystem awareness | None: just blocks | Fully integrated: RAID and filesystem are one |

The write hole is mdadm's most dangerous problem. In RAID5/6, a power failure during a write can leave parity inconsistent with data. On next boot, mdadm has no way to know which blocks are correct. The write-intent bitmap only shortens the resync after a crash; closing the hole requires a dedicated write journal device (--write-journal) or, for RAID5, a partial parity log (--consistency-policy=ppl). ZFS's copy-on-write design avoids the write hole entirely: new data is always written to new locations, and the uberblock pointer is updated atomically.

# mdadm: create RAID1 + filesystem (multiple tools, multiple steps)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
mount /dev/md0 /data

# ZFS: one command, same result, plus checksums + snapshots + compression
zpool create -o ashift=12 -O compression=lz4 tank mirror /dev/sda /dev/sdb
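
Monitoring is the other half of replacing mdmonitor. The sketch below classifies the output of zpool status -x, which on current OpenZFS prints "all pools are healthy" when nothing needs attention. The function name pool_health and the OK/NO_POOLS/ATTENTION labels are assumptions for illustration, not part of ZFS.

```shell
# Classify the text printed by `zpool status -x`.
# Anything other than the two known "quiet" messages is treated as needing attention.
pool_health() {
  case "$1" in
    "all pools are healthy") echo "OK" ;;
    "no pools available")    echo "NO_POOLS" ;;
    *)                       echo "ATTENTION" ;;
  esac
}

# Only query the live system when the zpool binary is present.
if command -v zpool >/dev/null 2>&1; then
  pool_health "$(zpool status -x 2>&1)"
else
  echo "zpool not installed; skipping live check"
fi
```

Run from cron or a systemd timer, a non-OK result can trigger whatever alerting you already use; zed remains the native way to react to pool events.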

ZFS vs hardware RAID — the vendor trap

Hardware RAID controllers (Dell PERC, HP Smart Array, LSI MegaRAID, Broadcom) move RAID computation to a dedicated chip. For decades, this was considered the "enterprise" approach. In 2026, hardware RAID is a liability for ZFS — and increasingly for everything else too.

| Feature | Hardware RAID | ZFS |
|---|---|---|
| Data checksums | No: RAID controller doesn't checksum data | Yes: end-to-end |
| Write hole | Mitigated by BBU/supercap (when battery is healthy) | Impossible by design |
| BBU dependency | Yes: dead battery = write hole returns | No battery needed |
| Controller failure | Need a compatible replacement controller to read the array | Import pool on any machine with ZFS |
| Vendor lock-in | Proprietary on-disk format, locked to controller vendor | Open format, portable across any OpenZFS platform |
| Visibility | OS sees one virtual disk; per-disk SMART and error counters only through vendor tools or controller passthrough | Full visibility into every disk: SMART, error counters, I/O stats |
| Snapshots | No | Yes |
| Cost | $200–$2000 per controller + battery | Free (use HBA in IT/JBOD mode) |
| Firmware bugs | Opaque: firmware bugs have caused silent data corruption | Open source: bugs are visible, reported, and fixed publicly |

The controller failure scenario is the killer. If your Dell PERC H740 dies, you need another H740 (or compatible) to read the array. If that model is discontinued, you're buying used cards on eBay and praying. With ZFS, you pull the disks, put them in any machine running OpenZFS, and zpool import tank. Done.

The BBU dependency is the second killer. Hardware RAID controllers rely on a battery backup unit to protect the write cache during power failure. Batteries degrade. When the BBU reports degraded, the controller disables write-back caching and performance falls off a cliff. Or worse: the battery is dead but the controller doesn't report it, and you have an unprotected write cache. ZFS doesn't need a battery because copy-on-write never overwrites live data.

If you have a server with a hardware RAID controller, flash it to IT mode (HBA passthrough) or replace it with an HBA. ZFS needs to see the raw disks. Dell PERC in HBA mode, LSI 9300/9400 in IT mode — these are the standard approaches. Never run ZFS on top of a hardware RAID virtual disk. You lose all of ZFS's self-healing and per-disk visibility.

ZFS vs LVM — the volume manager

LVM2 is the standard Linux volume manager. It provides logical volumes, thin provisioning, snapshots (of a sort), and the ability to span or stripe across multiple disks. ZFS replaces LVM entirely — datasets are the equivalent of logical volumes, but with far more capabilities.

| Feature | LVM2 | ZFS |
|---|---|---|
| Thin provisioning | Yes (LVM thin) | Yes: datasets are thin by default |
| Snapshots | CoW snapshots (thick LVs: severe performance penalty; thin: better but complex) | Instant, zero overhead, unlimited |
| Snapshot performance | Classic LVM snapshots degrade performance 30–80% | Zero performance impact |
| Quotas | LV size is the quota | Per-dataset quota, refquota, reservation: granular control |
| Checksums | No | Yes |
| Compression | No | Yes |
| Shrink | Yes (lvreduce + resize2fs) | No |
| Complexity | PV → VG → LV → filesystem (four concepts) | Pool → dataset (two concepts) |
| Replication | No built-in replication | zfs send/recv |

Where LVM wins: the ability to shrink volumes (ZFS pools cannot shrink), deep integration with every Linux distro's installer, and simpler mental model for admins who only need basic volumes. LVM also integrates with LUKS and mdadm in well-documented ways.

Where LVM loses: LVM classic snapshots are notoriously slow — every write to the origin volume triggers a copy-on-write to the snapshot exception store, degrading performance by 30–80%. LVM thin snapshots are better but add significant complexity (thin pools, metadata volumes, autoextend thresholds). ZFS snapshots are free — zero performance impact, zero configuration.
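
Free snapshots accumulate quickly, so retention is the one piece you still have to decide. As a rough sketch of a keep-the-newest-N policy: the helper below reads snapshot names sorted oldest first and prints the ones that fall outside the retention window. The function name snapshots_to_prune and the sample snapshot names are hypothetical; tools such as sanoid implement full retention policies.

```shell
# Read snapshot names on stdin (oldest first) and print every name
# except the newest $1 entries.
snapshots_to_prune() {
  keep="$1"
  awk -v keep="$keep" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }
  '
}

# Demo with hypothetical snapshot names instead of a live pool:
# keeps the two newest (wed, thu) and prints the two oldest.
printf '%s\n' tank/data@mon tank/data@tue tank/data@wed tank/data@thu |
  snapshots_to_prune 2
```

On a real system the input would come from zfs list -H -d 1 -t snapshot -o name -s creation tank/data, and each printed name would be reviewed before being passed to zfs destroy.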

ZFS vs Ceph — local vs distributed

Ceph is a distributed storage system that provides block (RBD), object (RADOS Gateway), and file (CephFS) storage across a cluster of machines. Comparing ZFS to Ceph is comparing a local storage platform to a distributed one. They solve different problems, but the comparison comes up constantly because both are used for "serious" storage.

| Feature | Ceph | ZFS |
|---|---|---|
| Scope | Distributed across multiple nodes | Local to one machine (or replicated via send/recv) |
| Minimum nodes | 3 (for quorum) | 1 |
| Operational complexity | High: MON, OSD, MDS, MGR daemons; CRUSH maps; PG placement | Low: zpool and zfs commands |
| Self-healing | Yes: re-replicates on node failure | Yes: resilvers on disk failure |
| Snapshots | RBD snapshots, CephFS snapshots | Dataset snapshots |
| Scale | Petabytes across hundreds of nodes | Petabytes on a single node (practical limit ~2 PB) |
| Network dependency | Requires dedicated storage network (10GbE minimum, 25GbE recommended) | None: local I/O |
| Latency | Network-bound (100µs–1ms typical) | Disk-bound (10–100µs NVMe, 1–5ms HDD) |
| Use case | Multi-tenant cloud, OpenStack/Kubernetes PVs, geographically distributed data | Single-node servers, NAS, VM hosts, databases, workstations |

Ceph wins when you need data accessible across multiple machines simultaneously, when you need to survive entire node failures without service interruption, or when you're building a cloud platform that serves block storage to hundreds of VMs.

ZFS wins when you need local storage performance, operational simplicity, or you're running on a single machine. The two are rarely stacked: Ceph's BlueStore backend writes to raw block devices and provides its own checksumming and compression, so putting OSDs on ZFS zvols adds a second copy-on-write layer for little gain.

Ceph is amazing technology, but it's a full-time job. I've seen teams of three engineers dedicated solely to Ceph operations. If you don't have that staffing, ZFS + zfs send/recv to a remote backup gives you 90% of the resilience at 10% of the operational cost. Don't deploy Ceph unless you genuinely need distributed storage.

ZFS vs DRBD — synchronous replication

DRBD (Distributed Replicated Block Device) provides synchronous block-level replication between two nodes. It's often used for database HA: primary writes to local disk and DRBD simultaneously replicates every write to the secondary. If the primary dies, the secondary has an identical copy.

| Feature | DRBD | ZFS send/recv |
|---|---|---|
| Replication mode | Synchronous (Protocol C) or async | Asynchronous (snapshot-based incremental) |
| RPO | Zero (sync mode: no data loss on failover) | Last snapshot interval (typically 1–15 minutes) |
| Write latency impact | Every write waits for remote acknowledge (adds network RTT) | None: replication is decoupled from writes |
| Bandwidth | Continuous: mirrors every write in real time | Batched: only transfers changed blocks per snapshot |
| Complexity | Moderate: DRBD resource config, Pacemaker/Corosync for failover | Low: cron job or sanoid/syncoid |
| Multi-target | Yes (DRBD 9 supports 2+ secondaries, but complex) | Yes: send to multiple targets trivially |
| Checksums | Network CRC only, no on-disk data checksums | Full on-disk checksums on both sides |

DRBD wins when you absolutely need zero RPO — database HA clusters where losing even one transaction is unacceptable. Synchronous replication guarantees the secondary has every committed write.

ZFS wins when you can tolerate a few minutes of potential data loss (which is most workloads). zfs send/recv is dramatically simpler to operate, doesn't impact write latency, and includes checksums on both sides. For most server replication, a cron job running syncoid --no-sync-snap tank/data root@remote:backup/data (with your own dataset and host names) is all you need.
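
For readers who want to see what such a replication job does underneath, here is a minimal sketch of one incremental cycle using plain zfs send/recv. The function name replicate_increment, the snapshot names, and the host backup@remote.example are hypothetical. It prints the commands by default and only executes them with DRY_RUN=0.

```shell
# Dry-run by default; set DRY_RUN=0 to execute against real datasets.
DRY_RUN="${DRY_RUN:-1}"

# Args: dataset, previous snapshot name, new snapshot name, ssh host, target dataset.
# Assumes the previous snapshot already exists on both sides.
replicate_increment() {
  dataset="$1"; prev="$2"; now="$3"; host="$4"; target="$5"
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ zfs snapshot $dataset@$now"
    echo "+ zfs send -i @$prev $dataset@$now | ssh $host zfs recv -u $target"
    return 0
  fi
  # Take the new snapshot, then ship only the blocks changed since $prev.
  zfs snapshot "$dataset@$now" &&
    zfs send -i "@$prev" "$dataset@$now" | ssh "$host" zfs recv -u "$target"
}

replicate_increment tank/data hourly-01 hourly-02 backup@remote.example backup/data
```

The RPO in the table above is simply the interval at which this cycle runs: a job every 15 minutes can lose at most about 15 minutes of writes.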

The master comparison table

Every major feature across every storage technology in one table. This is the reference.

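| Feature | ext4 | XFS | Btrfs | ZFS |
|---|---|---|---|---|
| Data checksums | No | No | Yes | Yes |
| Metadata checksums | CRC32C (metadata_csum) | CRC32C | Yes | Yes |
| Self-healing | No | No | Yes* | Yes |
| Copy-on-write | No | Reflink only | Yes | Yes |
| Snapshots | No | No | Yes | Yes |
| Compression | No | No | Yes | Yes |
| Encryption | fscrypt (file-level) | No | No | Yes |
| Built-in RAID | No | No | Partial** | Yes |
| Volume management | No | No | Yes | Yes |
| Send/receive | No | No | Yes | Yes |
| Online shrink | No (offline only) | No | Yes | No |
| Kernel in-tree | Yes | Yes | Yes | No |
| Parity RAID stable | N/A | N/A | No | Yes |
| RAM hungry | No | No | Moderate | Yes |
| Boot support | Universal | Universal | GRUB only | GRUB or ZFSBootMenu |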

* Btrfs self-healing requires RAID1/10 profiles. RAID5/6 is not reliable.
** Btrfs RAID 0/1/10 is stable. RAID 5/6 has a known write hole and is not production-safe.

RAID implementation comparison

| Feature | mdadm | Hardware RAID | Btrfs RAID | ZFS RAID |
|---|---|---|---|---|
| Write hole | Yes (write journal or PPL mitigates) | BBU mitigates | Yes (RAID5/6) | No (CoW) |
| Checksums | No | No | Yes | Yes |
| Scrub intelligence | Block-level check only | Controller-dependent | Checksum-verified | Checksum-verified + auto-repair |
| Rebuild speed (12 disks) | Hours | Hours | Hours | Minutes (dRAID) or hours (RAIDZ) |
| Portability | Any Linux | Compatible controller only | Any Linux with Btrfs | Any OS with OpenZFS |
| Per-disk visibility | Yes | No (controller abstracts) | Yes | Yes |
| Mixed disk sizes | Uses smallest | Uses smallest | Flexible | Uses smallest per vdev |
| Hot spare | Spare devices in the array | Controller config | Manual | Spare vdevs + dRAID distributed spares |

Migration paths to ZFS

You can't convert an existing filesystem to ZFS in-place. Migration always involves creating a new ZFS pool and copying data. Here are the practical paths for each legacy system.

From ext4 / XFS (single disk or LVM)

# 1. Create ZFS pool on new disk(s)
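zpool create -o ashift=12 -O compression=lz4 -O atime=off tank mirror /dev/sdc /dev/sdd
zfs create tank/data   # a dedicated dataset, so the data can be snapshotted and replicated on its own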

# 2. Copy data preserving permissions, xattrs, ACLs
rsync -avxHAX --progress /old-mount/ /tank/data/

# 3. Verify
diff -r /old-mount/ /tank/data/

# 4. Update fstab/mountpoints, reboot, decommission old disks

From mdadm RAID

# 1. Back up mdadm config
mdadm --detail --scan > /root/mdadm-backup.conf

# 2. Create ZFS pool on separate disks
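zpool create -o ashift=12 -O compression=lz4 tank mirror /dev/sde /dev/sdf
zfs create tank/data   # a dedicated dataset, so the data can be snapshotted and replicated on its own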

# 3. Copy data
rsync -avxHAX /old-raid-mount/ /tank/data/

# 4. Stop mdadm array after verification
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1 /dev/sdb1

# 5. Optionally add old disks to the ZFS pool
zpool add tank mirror /dev/sda /dev/sdb

From hardware RAID

# 1. Copy data to external storage (mounted here at /backup)
rsync -avxHAX /old-mount/ /backup/

# 2. Flash RAID controller to IT/HBA mode (or replace with HBA)
#    Dell PERC: use Dell firmware utility
#    LSI: use sas2flash or sas3flash
#    This exposes raw disks to the OS

# 3. Create ZFS pool on the now-exposed raw disks
zpool create -o ashift=12 -O compression=lz4 \
  tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
zfs create tank/data

# 4. Restore data
rsync -avxHAX /backup/ /tank/data/

From Btrfs

# 1. Create ZFS pool on new disks, plus datasets that mirror the subvolume layout
zpool create -o ashift=12 -O compression=zstd tank mirror /dev/sdc /dev/sdd
zfs create tank/data
zfs create tank/data/home
zfs create tank/data/var

# 2. Take a read-only snapshot so the copy is consistent, then rsync from it
# (btrfs send/recv is Btrfs-only and can't receive into ZFS)
btrfs subvolume snapshot -r /btrfs-mount /btrfs-mount/migration-snap
rsync -avxHAX /btrfs-mount/migration-snap/ /tank/data/

# 3. Copy nested subvolumes separately: Btrfs snapshots do not recurse into
# them, so home and var show up as empty directories in migration-snap
rsync -avxHAX /btrfs-mount/home/ /tank/data/home/
rsync -avxHAX /btrfs-mount/var/ /tank/data/var/
Every migration is the same pattern: create ZFS pool, rsync data, verify, switch. The boring part is waiting for rsync. For large datasets (10+ TB), do an initial rsync while the old system is live, then do a final incremental rsync during a maintenance window. Downtime measured in minutes, not hours.
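
The two-pass pattern described above can be sketched as a small wrapper. The function name migrate_pass, the pass labels, and the paths are assumptions for illustration; the rsync flags are the same ones used in the migration steps, with --delete added for the final pass and -n --checksum for a read-only verification. It prints commands by default.

```shell
# Dry-run by default; set DRY_RUN=0 to actually copy.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Args: source dir, destination dir, pass name (initial | final | verify).
migrate_pass() {
  src="$1"; dst="$2"; pass="$3"
  case "$pass" in
    initial) run rsync -avxHAX "$src/" "$dst/" ;;                        # old system still live
    final)   run rsync -avxHAX --delete "$src/" "$dst/" ;;               # maintenance window, services stopped
    verify)  run rsync -avxHAXn --checksum --delete "$src/" "$dst/" ;;   # lists differences, changes nothing
    *) echo "unknown pass: $pass" >&2; return 1 ;;
  esac
}

migrate_pass /old-mount /tank/data initial
migrate_pass /old-mount /tank/data final
migrate_pass /old-mount /tank/data verify
```

The final pass only transfers what changed since the initial one, which is why the downtime stays short even for large datasets.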

When NOT to use ZFS

ZFS is the right answer for most storage needs. But not all. Here are the cases where legacy tools are simpler or more appropriate. Being honest about this makes the rest of the page more credible.

Embedded systems / tiny VMs

ZFS wants RAM. The ARC alone consumes 1–4 GB on a typical system. If you're running a 256 MB VM or a resource-constrained IoT device, ext4 is the right choice. Don't force ZFS into environments where RAM is precious and data integrity isn't critical.

Ephemeral cloud instances

A spot instance that lives for 20 minutes to run a batch job doesn't need ZFS. The instance will be destroyed before checksums or snapshots provide any value. Use whatever the AMI ships with (usually ext4 or XFS).

Windows-only environments

OpenZFS on Windows exists but is experimental. If your entire infrastructure is Windows Server and you need a filesystem, NTFS or ReFS is the practical choice. Don't shoehorn ZFS into a Windows shop.

/boot and EFI system partition

UEFI firmware reads the ESP as FAT32. GRUB reads /boot as ext4 (or Btrfs or ZFS, but with caveats). For maximum compatibility, keep /boot on ext4 and ESP on FAT32. These are small, static partitions where ZFS provides no benefit.

Environments where kernel DKMS is forbidden

Some security-hardened or compliance-bound environments prohibit out-of-tree kernel modules. ZFS on Linux is a DKMS module (or kABI-tracking RPM). If your security policy bans DKMS, you can't run ZFS. Btrfs or ext4 are your options.

When you need online shrink

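ZFS pools are hard to shrink. zpool remove can evacuate single-disk and mirror top-level vdevs, but it does not work in pools that contain RAIDZ vdevs. If your workflow requires regularly reclaiming pool space by removing disks, Btrfs or LVM handles this better. Plan ZFS pools to grow, not shrink.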

I run ext4 on /boot, FAT32 on the ESP, and ZFS on everything else. That's the right answer for 95% of Linux machines. The remaining 5% are either too small for ZFS (embedded), too ephemeral (cloud spot instances), or too locked down (FIPS environments that ban DKMS). Know your constraints. Use the right tool.

Operational complexity comparison

One of ZFS's biggest advantages isn't a feature — it's the operational simplicity of having one tool instead of five. Here's what common storage tasks look like in each stack.

| Task | Legacy stack | ZFS |
|---|---|---|
| Create redundant storage | mdadm --create + pvcreate + vgcreate + lvcreate + mkfs (5 commands) | zpool create tank mirror sda sdb (1 command) |
| Take a snapshot | lvcreate --snapshot (LVM) or install snapper/timeshift | zfs snapshot tank/data@now |
| Replicate to remote | rsync -avz /data/ remote:/backup/ (file-level, slow) | zfs send -i @prev tank/data@now \| ssh remote zfs recv backup/data |
| Check data integrity | No tool (ext4/XFS have no data checksums) | zpool scrub tank |
| Replace failed disk | mdadm --manage /dev/md0 --remove + --add + wait for rebuild | zpool replace tank /dev/sda /dev/sde |
| Enable compression | Not possible (ext4/XFS don't support it) | zfs set compression=zstd tank/data |
| Set quota | edquota (per-user) or LV size limit | zfs set quota=100G tank/data |
| Expand storage | mdadm --grow + pvresize + lvextend + resize2fs (4 commands, order matters) | zpool add tank mirror sdc sdd (1 command, instant) |
| Rollback after bad update | Restore from backup (minutes to hours) | zfs rollback tank/root@before-update (seconds) |

Scalability limits

| Limit | ext4 | XFS | Btrfs | ZFS |
|---|---|---|---|---|
| Max filesystem size | 1 EiB | 8 EiB | 16 EiB | 256 quadrillion ZiB (2^128 bytes) |
| Max file size | 16 TiB | 8 EiB | 16 EiB | 16 EiB |
| Max files | 4 billion (fixed at mkfs) | 2^64 | 2^64 | 2^48 |
| Max filename length | 255 bytes | 255 bytes | 255 bytes | 255 bytes |
| Max snapshots | N/A | N/A | Unlimited (subvolume-based) | Unlimited (2^64 theoretical) |
| Max disks per pool | N/A | N/A | N/A | Hundreds (practical), limited by memory |

In practice, you'll hit hardware limits (RAM, CPU, disk count) long before you hit ZFS's theoretical limits. The practical ceiling for a single ZFS pool is around 2 PB on current hardware — beyond that, you're looking at distributed solutions like Ceph or Lustre.

Performance characteristics

Raw sequential throughput is not where ZFS shines compared to ext4 or XFS. ZFS's performance advantages come from compression (less data written to disk), ARC caching (hot data served from RAM), and the special vdev (metadata on SSD). Here's an honest assessment.

Sequential writes

ext4 and XFS are 5–15% faster for raw sequential writes on the same hardware. ZFS's copy-on-write overhead and checksum computation add latency. However, with compression enabled (LZ4 compresses faster than disk I/O), ZFS often writes less data to disk, making it faster in practice for compressible workloads.

Sequential reads

Roughly equivalent across all filesystems. The bottleneck is disk speed, not filesystem overhead. ZFS's ARC gives it an advantage for repeated reads of the same data.

Random IOPS (mirrors)

ZFS mirrors perform comparably to mdadm RAID1 for random I/O. The ARC gives ZFS an edge for read-heavy workloads with a hot working set. For pure random writes, ext4 on mdadm has a slight edge due to less copy-on-write overhead.

Random IOPS (RAIDZ)

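RAIDZ has a significant random I/O penalty. Each block is striped across every disk in the vdev, so a RAIDZ vdev delivers roughly the random IOPS of a single disk, no matter how wide it is. This is not a ZFS bug: it follows from RAIDZ's variable-width full-stripe writes, the same design that avoids the read-modify-write cycle and the write hole. For random I/O workloads, use mirrors. See Pool Design for details.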
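
A back-of-the-envelope calculation shows why mirrors are preferred for random I/O. It relies on the common rule of thumb that each top-level vdev, mirror or RAIDZ, contributes roughly one disk's worth of random write IOPS. The function name estimate_write_iops and the figure of 150 IOPS per disk are assumptions for illustration, not measurements.

```shell
# Rule-of-thumb estimate: pool random write IOPS ~= number of vdevs * per-disk IOPS.
# Args: total disks, disks per vdev, random IOPS of one disk.
estimate_write_iops() {
  disks="$1"; vdev_width="$2"; disk_iops="$3"
  echo $(( disks / vdev_width * disk_iops ))
}

# Twelve HDDs at an assumed 150 random IOPS each:
echo "6 x 2-way mirrors: $(estimate_write_iops 12 2 150) IOPS"   # 6 vdevs
echo "2 x 6-wide RAIDZ2: $(estimate_write_iops 12 6 150) IOPS"   # 2 vdevs
```

The same twelve disks yield about three times the random write throughput as mirrors, at the cost of usable capacity. Real numbers depend on recordsize, caching, and workload.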

Metadata operations

XFS wins for raw metadata throughput (millions of creates/deletes in a single directory). ZFS is competitive with a special vdev on SSD. Without a special vdev on spinning disks, ZFS metadata operations can be 2–5x slower than XFS.

Memory tradeoff

ZFS uses RAM aggressively for ARC. This is a feature, not a bug — unused RAM is wasted RAM. ARC is adaptive and releases memory under pressure. But on systems with 4–8 GB RAM, the ARC competes with application memory. Set zfs_arc_max to cap it.
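
zfs_arc_max takes a value in bytes, which is easy to get wrong by hand. A small sketch of the arithmetic, assuming you want to cap the ARC at a percentage of installed RAM; the function names and the 25% figure are illustrative choices, not recommendations.

```shell
# Convert "RAM in GiB" and "percent of RAM for ARC" into bytes.
arc_max_bytes() {
  ram_gib="$1"; percent="$2"
  echo $(( ram_gib * 1024 * 1024 * 1024 * percent / 100 ))
}

# Format the result as a kernel module option line.
arc_conf_line() {
  echo "options zfs zfs_arc_max=$(arc_max_bytes "$1" "$2")"
}

# 8 GiB machine, ARC capped at 25% (2 GiB)
arc_conf_line 8 25
```

The printed line belongs in /etc/modprobe.d/zfs.conf and takes effect when the module is next loaded (rebuild the initramfs if ZFS is loaded from it). To change the cap on a running system, write the byte value to /sys/module/zfs/parameters/zfs_arc_max.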

The verdict

ZFS is not just a filesystem — it's a storage platform. It replaces the entire legacy stack: partitions, volume managers, RAID arrays, encryption wrappers, snapshot tools, caching layers, integrity checkers, and replication systems. One tool. One command syntax. One failure domain.

The legacy tools aren't bad. ext4 is rock-solid. XFS is fast. mdadm works. LVM is flexible. But they're independent layers that don't talk to each other, don't checksum data, and require you to be the integration layer. You are the glue. You are the one who has to remember the right order of operations for expanding storage, the right incantation for LVM snapshots, the right flags for mdadm --grow.

ZFS eliminates the glue. It gives you an integrated system where every component — RAID, volume management, filesystem, checksums, compression, encryption, snapshots, replication — is designed to work together from the ground up. Once you've operated a ZFS system, the legacy layer cake feels like what it is: the past.

Ready to switch? The ZFS Zero to Hero tutorial walks you through creating your first pool, and Pool Design helps you choose the right VDEV layout. Or just download kldload and let the installer handle it all.