ZFS Overview — kldload

ZFS — The Last Word in Filesystems

ZFS is not a filesystem. Not in the way you think of ext4, XFS, or NTFS as filesystems. ZFS is an integrated storage platform — a volume manager, a RAID controller, a filesystem, a cache manager, a compression engine, a checksumming layer, and a replication system fused into a single, coherent whole. It replaces the entire traditional storage stack with one piece of software that manages everything from raw physical disks to mounted directories.

This is the landing page for the kldload ZFS Wiki — the most complete introduction to ZFS on the internet. It covers what ZFS is, where it came from, how it works under the hood, why it matters, and how to get started in five minutes. Every section links deeper into the wiki for hands-on details. If you read this page and nothing else, you'll understand ZFS better than 95% of the people who use it.

I started with ZFS on FreeBSD 8, came back to it again at the death of Solaris, and I'm on it now on Linux. I've watched it eat every other storage solution alive. The reason I built kldload around ZFS isn't because it's trendy — it's because after running it for years, I can't imagine going back to anything else. This page is the honest, complete introduction I wish someone had given me at the start.

ZFS by the Numbers

128-bit Address space. Can store 256 quadrillion zettabytes.

20 yrs Public since OpenSolaris (2005); in production since Solaris 10 6/06 (2006).

2 Commands to manage everything. zpool + zfs.

0 Blocks returned unverified. Every block is checksummed and checked on read.

O(1) Snapshot cost. Same near-instant creation for any dataset, any size.

2.3+ Current OpenZFS. Block cloning (2.2), RAIDZ expansion (2.3).

8 Linux distros kldload installs on ZFS root.

~2x Typical LZ4 ratio on compressible data. Free performance.

The Origin Story — Sun Microsystems, 2001–2005

In 2001, two engineers at Sun Microsystems — Jeff Bonwick and Matt Ahrens — started building a filesystem from scratch. Not an incremental improvement. Not a patch on UFS. A clean-sheet design that asked: what would storage look like if we threw away every assumption from the last 30 years and started over?

The context matters. By 2001, the Unix storage stack was a patchwork of tools from different decades. fdisk from the 1980s created partitions. md (later mdadm) assembled software RAID. LVM carved logical volumes. mkfs formatted filesystems. Each tool had its own syntax, its own state files, its own failure modes. None of them knew about each other. A RAID controller didn't know which blocks belonged to which files. The filesystem didn't know that the data underneath was mirrored. Every layer was flying blind.

Bonwick and Ahrens decided the problem wasn't any single tool — it was the layering itself. Their insight: if one system manages everything from physical disks to mounted directories, it can make guarantees that no stack of independent tools ever could. It can checksum data and verify the checksum at read time. It can self-heal corruption from a mirror copy. It can take instant snapshots with zero I/O. It can compress transparently. It can do all of this because it controls the entire path from application write to disk platter.

The name ZFS originally stood for "Zettabyte File System", a nod to its 128-bit address space, which can theoretically store 256 quadrillion zettabytes. Bonwick's own illustration of that scale: physically populating a 128-bit storage pool would take more energy than boiling the oceans. Sun later stopped expanding the acronym, and the tagline became "the last word in filesystems", Z being the last letter of the alphabet. The ambition was to build something so complete that nobody would ever need to build another filesystem again.

ZFS's source was released under the CDDL (Common Development and Distribution License) as part of OpenSolaris in late 2005, and ZFS shipped in production in Solaris 10 6/06 (Update 2) in June 2006. It was immediately recognized as a generational leap. While the rest of the industry was bolting features onto 1990s-era designs, Sun shipped a filesystem that solved problems most people didn't even know they had.

Jeff Bonwick's original blog posts about ZFS are legendary reading. He described the design philosophy as: "we wanted to make administering storage as easy as filling a glass of water." That philosophy shows in everything — two commands (zpool and zfs) manage everything. No fdisk, no mdadm, no lvcreate, no mkfs. Just pools and datasets. That simplicity is the result of incredible engineering depth. If you ever get the chance to read Bonwick's 2005 blog post "ZFS: The Last Word in Filesystems," do it. It reads like a manifesto, and twenty years later, every claim in it has held up.

The Journey — Sun to Oracle to OpenZFS

ZFS has had one of the most dramatic histories in open-source software. It survived a corporate acquisition, a license war, a community fork, and a cross-platform unification. Understanding the history explains why ZFS is what it is today.

2001

Development begins at Sun Microsystems. Jeff Bonwick and Matt Ahrens start the ZFS project. The team grows to include Mark Shellenbaum, Mark Maybee, Neil Perrin, and Bill Moore. They work in secret for four years, building the entire system before anyone outside Sun sees a line of code.

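2005–2006

ZFS goes public. The source is released under the CDDL as part of OpenSolaris in November 2005, and ZFS ships in Solaris 10 6/06 (Update 2) in June 2006. The storage world immediately recognizes it as a generational leap. Apple later ports ZFS to Mac OS X: Leopard (2007) ships with read-only support, but full read/write support never ships publicly, and Apple drops the project in 2009 (licensing concerns and internal politics are usually blamed).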

2007

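FreeBSD integrates ZFS. Pawel Jakub Dawidek ports ZFS to FreeBSD; it lands in the source tree in 2007, ships as experimental in FreeBSD 7.0 (2008), and is considered production-ready in FreeBSD 8.0. This makes FreeBSD the first non-Solaris platform with full ZFS support. FreeBSD's ZFS integration remains the most mature on any non-illumos platform to this day.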

2007–2009

Sun keeps adding major features. Deduplication lands in OpenSolaris in late 2009; native encryption arrives later, in Oracle's Solaris 11. The community grows, and FreeNAS (now TrueNAS) adopts ZFS as its storage backend. NetApp sues Sun over ZFS patents in 2007 (the case is settled in 2010).

2010

Oracle acquires Sun for $7.4 billion. Within months, Oracle closes the OpenSolaris source code. The open-source community is cut off from future ZFS improvements. This is the near-death moment for open-source ZFS.

2010–2013

The illumos fork. The community forks the last open-source OpenSolaris code into illumos. Garrett D'Amore, Bryan Cantrill, and others keep ZFS alive. Joyent (now Samsung) builds their SmartOS cloud platform on illumos + ZFS. Delphix builds database virtualization on it. Meanwhile, Brian Behlendorf's team at LLNL (Lawrence Livermore National Laboratory), which began porting ZFS to Linux in 2008, develops it as an out-of-tree kernel module, typically built with DKMS.

2013

ZFS on Linux reaches production quality. The "ZoL" project (ZFS on Linux) declares itself ready for production workloads. Adoption begins in HPC, scientific computing, and enterprise storage. Packages exist for the major distros, though mostly from the project's own repositories and PPAs rather than from the distros themselves.

2016

Ubuntu ships ZFS in the kernel. Starting with 16.04 LTS, Canonical includes the ZFS module in their kernel packages. Their legal team concludes that distributing the CDDL-licensed module alongside the GPL-licensed kernel is permissible; others, notably the Software Freedom Conservancy, disagree. This dramatically accelerates ZFS adoption on Linux.

2020

OpenZFS unification. OpenZFS 2.0 merges the Linux and FreeBSD codebases into a single repository under the OpenZFS umbrella (illumos keeps its own ZFS and exchanges changes with OpenZFS). One codebase, multiple platforms. The same release brings persistent L2ARC and zstd compression. Feature development accelerates dramatically. The project is now the definitive open-source ZFS implementation, with contributions from iXsystems, Klara Inc., Delphix, LLNL, and dozens of independent developers.

2021

OpenZFS 2.1 ships dRAID. Distributed RAID spreads parity and spare capacity across all disks, so a rebuild writes to every disk in parallel and finishes far faster than a traditional resilver onto a single spare. Also: the compatibility pool property restricts a pool to a named feature set so it stays importable on older systems.

2023–2024

OpenZFS 2.2 ships block cloning. Copy a file in near-zero time by sharing block pointers. BLAKE3 checksums. Linux 6.x kernel compatibility. Significant performance improvements across the board.

2025–2026

OpenZFS 2.3 ships RAIDZ expansion, letting you add a disk to an existing RAIDZ vdev: the first time RAIDZ topology has been mutable. Fast dedup dramatically reduces dedup's memory overhead, and direct I/O can bypass the ARC. Continued platform improvements. The project is more active than at any point in its history, with over 200 contributors on GitHub.

Oracle's acquisition of Sun nearly killed open-source ZFS. It's one of the great what-ifs of open source history. But the community response (illumos, ZoL, and eventually the OpenZFS unification) produced something better than what Sun alone would have built. The competition between illumos and Linux implementations pushed both forward. ZFS is stronger today because it survived Oracle. That said, Oracle still runs ZFS in Solaris 11, and their version has features that haven't made it to OpenZFS. But the OpenZFS community moves faster now, and since the 2020 unification every improvement lands on Linux and FreeBSD at the same time. The future belongs to OpenZFS, not Oracle's closed fork.

What ZFS Actually Is — Not Just a Filesystem

The most common mistake people make about ZFS is calling it a "filesystem" and comparing it to ext4 or XFS. That's like comparing a smartphone to a calculator — they overlap, but they're not the same category. ZFS is seven things in one:

Volume manager

Replaces LVM. Pools aggregate disks, datasets share space dynamically. No fixed-size partitions.

RAID controller

Replaces mdadm. Mirrors, RAIDZ1/2/3, dRAID built in. No hardware RAID card needed (or wanted).

Filesystem

Replaces ext4/XFS. POSIX-compliant, with COW, snapshots, clones, and per-dataset properties.

Cache manager

ARC (RAM cache), L2ARC (SSD cache), SLOG (write intent log). Intelligent, self-tuning.

Compression engine

Transparent per-dataset compression. LZ4, ZSTD, GZIP. Often improves performance.

Checksumming layer

Every block verified on every read. Self-healing from redundant copies. End-to-end integrity.

Replication system

Block-level send/receive. Incremental replication. Encrypted raw streams. No rsync needed.

Traditional Linux storage is a layer cake. Physical disks are partitioned with fdisk or parted. Partitions are assembled into RAID arrays with mdadm. RAID arrays are carved into logical volumes with LVM. Logical volumes are formatted with mkfs.ext4 or mkfs.xfs. Each layer is a separate tool, a separate configuration, a separate failure domain.

The traditional Linux storage stack:

Physical Disks → Partition Table (GPT) → mdadm RAID → LVM → ext4/XFS

Five layers. Five tools. Five failure modes. Five places where a misconfiguration corrupts your data.

The ZFS storage stack:

Physical Disks → ZFS

One layer. Two commands. Zero ambiguity.

ZFS replaces all of it. No partition tables to manage by hand (on Linux, give ZFS a whole disk and it writes its own GPT label; it can also use existing partitions). No separate RAID controller (ZFS has mirrors, RAIDZ1/2/3, and dRAID built in). No volume manager (ZFS pools dynamically allocate space to datasets). No separate filesystem format (ZFS is the filesystem). No separate cache layer (ARC, L2ARC, and SLOG are native ZFS features).

This integration isn't just convenient; it's architecturally superior. Because RAIDZ knows where each block lives, it writes variable-width stripes and never needs RAID-5-style read-modify-write cycles. Because the redundancy layer knows which blocks are allocated, a resilver copies only live data instead of the whole disk, and every copy is verified against its checksum end-to-end. When the cache manager knows about both, it can make intelligent decisions about what to keep in memory. No bolted-together stack of independent tools can achieve this.

Copy-on-Write — The Paradigm That Changes Everything

Every traditional filesystem (ext4, XFS, NTFS) uses in-place writes. When you modify a file, the new data overwrites the old data at the same location on disk. If power fails mid-write, you can get a partially written block: corruption. That's why journals exist: they write the change to a log first, then apply it. But in their default modes, ext4 and XFS journal only metadata. Data corruption from interrupted writes is still possible on ext4 unless you mount with data=journal, which most people don't because it writes every data block twice.

ZFS uses copy-on-write (COW). When you modify a block, ZFS doesn't touch the original. It writes the new block to a new location on disk, then updates the block pointer in the parent, which is itself written copy-on-write, all the way up to the uberblock. Only then is the old location freed. The on-disk data is always in a consistent state. There is no window where a power failure can leave you with half-written garbage. You either have the old data or the new data. Always.

This single design choice enables everything else ZFS does:

Instant snapshots

A snapshot is just a saved set of block pointers. Since old blocks are never overwritten, creating a snapshot copies no data: it writes a small amount of metadata with the next transaction group, so it completes almost instantly regardless of dataset size.

Instant clones

A clone is a writable snapshot. It shares all blocks with the original and only allocates new space when blocks diverge. Clone a 500GB dataset in milliseconds, use zero additional space until you change something.

Atomic transactions

ZFS groups writes into transaction groups (TXGs) and commits each group atomically. The uberblock, ZFS's equivalent of a superblock, is written last. If power fails before the uberblock update, the transaction group never happened. If it fails after, the group is complete. There is no inconsistent middle state.

No fsck. Ever.

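Because the on-disk state is always consistent, there is no need for a filesystem check after an unclean shutdown. The pool imports in seconds. No fsck. No multi-hour scan of a 10TB filesystem. At most, ZFS replays its intent log (ZIL) to recover synchronous writes that were acknowledged but not yet committed in a transaction group.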

# Traditional filesystem: modify in place, pray power doesn't fail
# ZFS: write new block, update pointer, free old block
#
# Visualized:
#
#   ext4 write:    [block A] --overwrite--> [block A'] (old data gone)
#   ZFS  write:    [block A] (kept)   +   [block B] (new data at new location)
#                  pointer updated: parent now points to B
#                  block A freed (or kept if snapshot exists)
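The write path above can be mimicked with ordinary files. This is a toy analogy, not ZFS code: the names `blk.A`, `blk.B`, and `live` are made up for illustration, a symlink stands in for the parent's block pointer, and `mv` (which uses `rename(2)` within one filesystem) stands in for the atomic pointer update. At no point does a reader of `live` see a half-written block.

```shell
#!/bin/sh
# Toy model of copy-on-write. Hypothetical names: blk.A, blk.B, live.
# A symlink plays the role of the parent's block pointer; mv's rename(2)
# plays the role of the atomic pointer update.
set -eu

store=$(mktemp -d)
cd "$store"

printf 'version 1\n' > blk.A      # original block A
ln -s blk.A live                  # parent pointer -> A

# Modify: never overwrite blk.A. Write the new data to a new location first...
printf 'version 2\n' > blk.B
# ...then swing the pointer in a single atomic rename.
ln -s blk.B live.tmp
mv -f live.tmp live

cat live                          # prints the new data: version 2
cat blk.A                         # old block is untouched (a snapshot would keep it)

rm -f blk.A                       # no snapshot holds A, so "free" it
```

If a crash hit between writing `blk.B` and the `mv`, `live` would still point at the intact old block, which is the property the real uberblock update provides.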

The ZFS Storage Stack — Disks to Datasets

Understanding ZFS means understanding four layers: disks, vdevs, pools, and datasets. Every ZFS deployment follows this hierarchy.

Layer 1: Physical Disks

Raw block devices. HDDs, SSDs, NVMe, even files (for testing). ZFS consumes them directly. You don't partition them first — though kldload does create a small EFI partition for booting, the rest of the disk is given to ZFS as a raw partition.

Layer 2: VDEVs (Virtual Devices)

Disks are grouped into vdevs. A vdev is the redundancy unit. A 2-disk mirror is one vdev. A 6-disk RAIDZ2 is one vdev. A single disk is one vdev (no redundancy). You can also have special-purpose vdevs: SLOG (write intent log), L2ARC (read cache), and special (metadata acceleration). A pool is made of one or more vdevs. Data is striped across vdevs. If any vdev is lost (all disks in that vdev fail beyond its redundancy level), the entire pool is lost.

Vdevs are the redundancy boundary. Pools are the stripe boundary. Never confuse the two.
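Because data is striped across every vdev, the pool survives only if every vdev stays within its own redundancy. A small sketch of that rule, assuming you describe each vdev as `tolerance:failed` (how many disk failures it can absorb, and how many of its disks have failed). The function name and input format are invented for this example.

```shell
#!/bin/sh
# pool_survives: hypothetical helper. Each argument describes one vdev as
# "tolerance:failed". A 2-way mirror or RAIDZ1 tolerates 1 failure,
# a 3-way mirror or RAIDZ2 tolerates 2, RAIDZ3 tolerates 3, a single disk 0.
pool_survives() {
  for vdev in "$@"; do
    tolerance=${vdev%%:*}
    failed=${vdev#*:}
    # One vdev past its redundancy level takes the whole pool with it.
    if [ "$failed" -gt "$tolerance" ]; then
      return 1
    fi
  done
  return 0
}

# Two 2-way mirrors, one disk dead in each: still fine.
pool_survives 1:1 1:1 && echo "pool survives"
# Same pool, but both disks of mirror-0 dead: the pool is lost.
pool_survives 1:2 1:0 || echo "pool lost"
```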

Layer 3: Pool (zpool)

A pool is a collection of vdevs that presents a single, unified storage space. All vdevs in a pool contribute their capacity. Data is striped across vdevs for performance. The pool is the top-level container. You interact with it via zpool commands: zpool create, zpool status, zpool scrub, zpool add, zpool iostat.
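The capacity side of the same picture: each vdev contributes its usable space and the pool adds them up. This is a back-of-the-envelope sketch, assuming equal-sized disks and ignoring RAIDZ padding, metadata, and reserved slop space, so real `zpool list`/`zfs list` figures come out somewhat lower. `vdev_usable` is a hypothetical helper.

```shell
#!/bin/sh
# vdev_usable KIND NDISKS DISK_TB -> rough usable TB for one vdev.
# Assumes equal-sized disks; ignores RAIDZ padding, metadata and slop space.
vdev_usable() {
  kind=$1 ndisks=$2 size=$3
  case $kind in
    disk)   echo "$size" ;;                       # no redundancy
    mirror) echo "$size" ;;                       # every disk holds a full copy
    raidz1) echo $(( (ndisks - 1) * size )) ;;    # one disk's worth of parity
    raidz2) echo $(( (ndisks - 2) * size )) ;;
    raidz3) echo $(( (ndisks - 3) * size )) ;;
    *) echo "unknown vdev kind: $kind" >&2; return 1 ;;
  esac
}

# Pool capacity is the sum over its vdevs, e.g. two mirrors of 8 TB disks:
pool_tb=$(( $(vdev_usable mirror 2 8) + $(vdev_usable mirror 2 8) ))
echo "two 2x8TB mirrors: ~${pool_tb} TB usable"
# versus a single 6-disk RAIDZ2 of 4 TB disks:
echo "6x4TB RAIDZ2: ~$(vdev_usable raidz2 6 4) TB usable"
```

Both layouts land near 16 TB, but the mirrors give more IOPS and faster resilvers, while the RAIDZ2 vdev survives any two disk failures.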

Layer 4: Datasets and Zvols

On top of a pool live datasets (POSIX filesystems you mount and use) and zvols (block devices for VMs, iSCSI, swap). Datasets are the unit of management: each has its own properties (compression, encryption, quota, mountpoint, plus user properties that tools such as snapshot schedulers can read). Datasets form a hierarchy and inherit properties from their parents. You interact with them via zfs commands: zfs create, zfs snapshot, zfs send, zfs get, zfs set.

# The full stack in one example:
#
#   Physical:    /dev/sda  /dev/sdb  /dev/sdc  /dev/sdd  /dev/nvme0n1  /dev/nvme1n1
#                    |         |         |         |           |             |
#   VDEVs:      [ mirror-0 ]      [ mirror-1 ]        [ special mirror  ]
#                sda + sdb          sdc + sdd          nvme0n1 + nvme1n1
#                    |                  |                      |
#   Pool:       [=================== rpool ===================]
#                    |
#   Datasets:   rpool/ROOT/centos   (mountpoint=/)
#               rpool/home          (mountpoint=/home)
#               rpool/home/alice    (mountpoint=/home/alice, encryption=on)
#               rpool/var/log       (mountpoint=/var/log, quota=10G)
#               rpool/vms           (zvol for VM disk images)

Key Concepts — The Complete Map

Here is every major ZFS concept, what it does, and where to learn more. This is your roadmap to the rest of the wiki.

Pools & VDEVs

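The foundation. Pools aggregate vdevs. Vdevs provide redundancy. Mirror for IOPS. RAIDZ for capacity. dRAID for large arrays. Pool topology is hard to change later (some top-level vdevs can be removed and RAIDZ vdevs can be widened, but RAIDZ can't be converted to mirrors or shrunk), so choose carefully.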

Pool Design & VDEV Layout →

Datasets

Mountable POSIX filesystems with independent properties. Compression, encryption, quotas, reservations, record size, ACLs — all per-dataset. Datasets inherit from their parent and form a hierarchy. This is the unit of management in ZFS.

ZFS Zero to Hero →

Zvols

Block devices backed by ZFS. Used for VM disk images, swap, iSCSI targets. Get all ZFS features (snapshots, replication, compression) but present as /dev/zvol/... rather than a mounted filesystem.

KVM Virtual Machines →

Snapshots

Point-in-time, read-only captures of a dataset. Created instantly (metadata-only operation). Cost zero space until blocks diverge. Accessible via the hidden .zfs/snapshot/ directory. The foundation of backup, rollback, and replication.

Snapshots & Replication →

Clones

Writable copies of a snapshot. Instant creation, zero initial space. Share all unchanged blocks with the source. Perfect for testing, dev environments, and VM templates. Promote a clone to make it independent of the source.

Snapshots Guide →

Send & Receive

Stream a dataset (or incremental changes since a snapshot) to another pool, another machine, or a file. Block-level, encrypted end-to-end if desired. This is ZFS-native replication. No rsync. No file-level crawl. The entire dataset with all metadata in one stream.

Snapshots & Replication →

Checksumming

Every block has a checksum stored in its parent block pointer. Every read is verified. If the checksum doesn't match, ZFS knows the data is corrupt before returning it to your application. Default: fletcher4 (fast, not cryptographic). Available cryptographic hashes: sha256, sha512, skein, edonr, blake3.

ZFS vs Everything Else →

Self-Healing

On a redundant pool (mirror or RAIDZ), when ZFS detects a checksum mismatch during a read, it automatically fetches the correct copy from another disk and repairs the bad block in place. No admin intervention. No downtime. The corruption is fixed before your application even knows it happened.

ZFS vs Everything Else →

Scrubbing

zpool scrub reads every block in the pool and verifies every checksum. On a redundant pool, it repairs any corruption it finds. Run it weekly or monthly. It's a proactive integrity check that catches problems before they become data loss.

Tuning for Workloads →

Compression

Transparent, per-dataset compression. lz4 is the default — nearly free CPU cost with ~2x compression on typical data. zstd offers better ratios for archival workloads. Compression often improves performance because fewer blocks means fewer disk I/Os. Always leave it on.

Compression & Dedup →

Encryption

Native, per-dataset encryption (AES-256-GCM). Each dataset can have its own key. Encrypted datasets can be sent/received without decryption (raw send). Keys can be passphrases, keyfiles, or external key management systems. Encryption is set at dataset creation — it cannot be added later.

ZFS Encryption →

ARC (Adaptive Replacement Cache)

ZFS's read cache lives in RAM. ARC is far smarter than the Linux page cache — it uses a combination of recency and frequency to decide what stays cached. ARC grows to fill available RAM and shrinks under memory pressure. On a 64GB server, you might see 50GB of ARC. That's not a memory leak — that's your data being served at RAM speed.

Memory & ARC →

L2ARC (Level 2 ARC)

An SSD-backed extension of ARC for when RAM isn't enough. Sits between RAM and spinning disks. Useful when your working set exceeds RAM but you still need fast reads. Not a write cache. Before OpenZFS 2.0 it started empty after every reboot; since 2.0, persistent L2ARC reloads its contents when the pool is imported.

Memory & ARC →

SLOG (Separate ZFS Intent Log)

Accelerates synchronous writes by moving the ZFS Intent Log to a fast, power-loss-protected device. Only helps sync-write workloads (databases, NFS, iSCSI). Not a general write cache. Use a device with power loss protection (typically enterprise NVMe); consumer SSDs without it defeat the purpose.

Pool Design →

Special VDEV

An SSD-based vdev that stores pool metadata and optionally small files. Dramatically accelerates ls, find, du, and metadata-heavy operations on HDD-based pools. Must be mirrored — losing an unmirrored special vdev loses the entire pool.

Pool Design →

Properties & Inheritance

Every dataset has properties: compression, encryption, quota, reservation, recordsize, atime, exec, setuid, and dozens more. Properties are inherited from parent datasets. Set a property on a parent and all children inherit it. Override on a child to diverge. zfs get all pool/dataset shows everything. zfs inherit resets to parent.

ZFS Zero to Hero →

Data Integrity — Why ZFS Exists

Data integrity is the reason ZFS was built. Not performance. Not features. The guarantee that when you read data back, it's exactly what you wrote. Every other ZFS feature — snapshots, compression, encryption, caching — is secondary to this core mission.

The problem ZFS solves is called silent data corruption (also known as bit rot). Hard drives lie. SATA cables flip bits. RAID controllers have firmware bugs. RAM without ECC can corrupt data in transit. None of these necessarily produce I/O errors. Your application gets back data that looks fine but is subtly wrong. A JPEG with a few corrupted pixels. A database row with a garbled field. A binary that segfaults randomly. You don't find out until weeks or months later, long after your backups have been overwritten with the corrupted version.

ZFS solves this with three mechanisms:

End-to-end checksumming

Every block has a checksum stored in its parent block's pointer, not alongside the data itself. This is critical: if the checksum lived next to the data, a misdirected write (correct data landing at the wrong location) could replace both the data and its checksum together, leaving a block that is wrong but self-consistent and making the corruption undetectable. ZFS stores checksums in the parent, which stores its checksum in its parent, all the way up to the uberblock (the root of the Merkle tree). A single bit flip anywhere in the tree is detected when the block is read.
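A toy model of the parent-stores-the-child's-checksum idea, using `sha256sum` from GNU coreutils on ordinary files. It is an analogy under stated assumptions: the files `blk1`, `blk2`, `parent`, and `uber` are invented, and real ZFS keeps checksums (fletcher4 by default) inside block pointers rather than in files. Corrupting a data block leaves the block itself looking plausible, but the checksum one level up no longer matches.

```shell
#!/bin/sh
# Toy Merkle tree with sha256sum (GNU coreutils). Hypothetical file names.
set -eu

tree=$(mktemp -d)
cd "$tree"

printf 'hello\n' > blk1                  # data blocks
printf 'world\n' > blk2
sha256sum blk1 blk2 > parent             # parent holds its children's checksums
sha256sum parent > uber                  # root holds the parent's checksum

printf 'hellp\n' > blk1                  # silent corruption: one byte changed

# Verify top-down, the way a read walks block pointers from the root.
if ! sha256sum --check --quiet uber >/dev/null 2>&1; then
  result="metadata corrupted"
elif ! sha256sum --check --quiet parent >/dev/null 2>&1; then
  result="data corruption detected"
else
  result="clean"
fi
echo "$result"                           # data corruption detected
```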

Self-healing with redundancy

When ZFS detects a checksum mismatch on a read, and the pool has redundancy (mirror or RAIDZ), ZFS reads the block from another copy. If that copy's checksum is valid, ZFS returns the good data to your application and overwrites the bad copy with the good one. The corruption is repaired transparently, during normal operation, with no downtime and no human intervention.

# See how many blocks ZFS has repaired automatically
zpool status tank
#   NAME         STATE     READ WRITE CKSUM
#   tank         ONLINE       0     0     0
#     mirror-0   ONLINE       0     0     0
#       sda      ONLINE       0     0     2   <-- 2 checksum errors, auto-repaired
#       sdb      ONLINE       0     0     0
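For monitoring, the per-device CKSUM column is easy to pull out of `zpool status`. A minimal sketch, assuming the column layout shown above (NAME, STATE, READ, WRITE, CKSUM); `cksum_errors` is a hypothetical helper that reads the text on stdin so you can test it against saved output. For a quick yes/no health check, `zpool status -x` already reports only pools with problems.

```shell
#!/bin/sh
# cksum_errors: hypothetical helper. Reads `zpool status` text on stdin and
# prints "device count" for every vdev or disk whose CKSUM column is not 0.
# The numeric check on $5 skips lines like " state: ONLINE".
cksum_errors() {
  awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
       $5 ~ /^[0-9]/ && $5 != "0" { print $1, $5 }'
}

# Example with the output shown above (real use: zpool status tank | cksum_errors)
cksum_errors <<'EOF'
  NAME         STATE     READ WRITE CKSUM
  tank         ONLINE       0     0     0
    mirror-0   ONLINE       0     0     0
      sda      ONLINE       0     0     2
      sdb      ONLINE       0     0     0
EOF
```

Large counts can appear abbreviated (for example `1.2K`); the sketch still reports them because it only checks that the field starts with a digit and isn't `0`.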

Proactive scrubbing

zpool scrub reads every block on every disk and verifies every checksum. On a redundant pool, it repairs any corruption it finds. Without scrubbing, a corrupted block might sit undetected until someone reads it — by which time the redundant copy might also be corrupted. Scrubbing is ZFS's proactive defense against accumulated bit rot.

# Start a scrub
zpool scrub rpool

# Check scrub progress
zpool status rpool
#   scan: scrub in progress since Sat Apr  4 02:00:01 2026
#     1.23T scanned at 456M/s, 892G issued at 334M/s, 1.88T total
#     0 repaired, 47.45% done, 00:52:14 to go

# Schedule weekly scrubs (kldload does this by default)
systemctl enable --now zfs-scrub-weekly@rpool.timer

I've caught real corruption with scrubs. Twice on SATA cables that were slightly loose, once on a drive that had a firmware bug that only manifested on certain LBA ranges. In all three cases, ZFS detected and repaired the corruption automatically. On ext4 with mdraid, those same failures would have been silent — the data would have been wrong and I'd never have known. This is not theoretical. This is why I won't run production on anything else.

Performance Features

ZFS is not just safe — it's fast. The same architectural choices that enable data integrity also enable performance optimizations that traditional storage stacks can't match.

ARC — the smartest cache in the building

The Adaptive Replacement Cache is ZFS's in-RAM read cache. Unlike the Linux page cache (an LRU approximation built on active and inactive lists), ARC explicitly tracks both recency and frequency of access. A file read once doesn't evict a file read a hundred times. ARC automatically tunes the balance between recently-accessed and frequently-accessed data. On a server with 64GB of RAM, ARC can grow to 40–50GB, depending on zfs_arc_max. This is not a memory leak. It's your hot data being served at RAM speed. ARC gives memory back when the kernel signals memory pressure, though shrinking is not always instantaneous.

# Check ARC statistics
arc_summary

# Key metrics to watch:
#   ARC size:          current cache size
#   Target size (max): maximum cache will grow to
#   ARC hit ratio:     percentage of reads served from cache (aim for >90%)
#   Demand data hits:  cache hits for actual application reads

# Limit ARC to 8GB at runtime (resets on reboot; useful for VM hosts or memory-constrained systems)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
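The value written to `zfs_arc_max` is in bytes (8 GiB = 8 × 1024³ = 8589934592). Writing to `/sys/module/...` only lasts until the next reboot; the usual way to make it persistent is a module option in `/etc/modprobe.d/zfs.conf`. A small sketch that computes the number and prints that line; the helper name `arc_max_line` is made up for this example, and on a ZFS-root system you may also need to rebuild the initramfs for the option to apply at boot.

```shell
#!/bin/sh
# arc_max_line GIB: hypothetical helper that prints a modprobe.d line capping
# the ARC at GIB gibibytes. zfs_arc_max takes a value in bytes.
arc_max_line() {
  bytes=$(( $1 * 1024 * 1024 * 1024 ))
  echo "options zfs zfs_arc_max=${bytes}"
}

arc_max_line 8
# options zfs zfs_arc_max=8589934592
# To persist (overwrites any existing file): arc_max_line 8 | sudo tee /etc/modprobe.d/zfs.conf
```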

Compression — less I/O, more throughput

LZ4 compression is so fast that the CPU time to compress/decompress a block is usually less than the disk time saved by not reading/writing those bytes. On typical data (config files, logs, source code, documents), LZ4 achieves 2–3x compression. At 2.5x, a 1TB dataset occupies about 400GB on disk, and reads and writes move proportionally fewer bytes to and from the disks. Compression improves performance. Leave it on. Always.

# Check compression ratio on a dataset
zfs get compressratio,used,logicalused rpool/home
#   NAME        PROPERTY       VALUE  SOURCE
#   rpool/home  compressratio  2.14x  -
#   rpool/home  used           18.2G  -
#   rpool/home  logicalused    38.9G  -
# You're storing 38.9GB of data in 18.2GB of space.

# Change compression algorithm per-dataset
zfs set compression=zstd rpool/archive    # better ratio, more CPU
zfs set compression=lz4 rpool/home        # fast, good ratio (default)
zfs set compression=off rpool/vms/images  # pre-compressed VM images
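The reported compressratio is roughly logicalused divided by used, so the example numbers check out: 38.9 / 18.2 ≈ 2.14. A short sketch that recomputes such a ratio; `ratio` is a hypothetical helper, and `awk` does the floating-point division the shell can't. For exact inputs, `zfs get -Hp` prints byte counts instead of rounded sizes.

```shell
#!/bin/sh
# ratio LOGICAL PHYSICAL: hypothetical helper; prints LOGICAL/PHYSICAL as "N.NNx".
ratio() {
  awk -v l="$1" -v p="$2" 'BEGIN { printf "%.2fx\n", l / p }'
}

ratio 38.9 18.2    # 2.14x, matching the compressratio shown above
ratio 100 50       # 2.00x
```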

Prefetch & aggregation

ZFS detects sequential read patterns and prefetches upcoming blocks before your application asks for them. It also aggregates small writes into large transactions (transaction groups, or TXGs) and flushes them periodically. This transforms random I/O patterns into sequential disk writes, which is dramatically faster on spinning disks and reduces write amplification on SSDs.

Adaptive record size

The recordsize property sets the maximum block size per dataset. Databases with 8KB pages get recordsize=8K. Media streaming gets recordsize=1M. General workloads use the default 128K. Matching record size to workload eliminates read amplification (reading more data than needed) and write amplification (rewriting more data than changed).

# Tune recordsize per workload
zfs set recordsize=8K    rpool/srv/postgres   # PostgreSQL 8K pages
zfs set recordsize=16K   rpool/srv/mysql      # MySQL/InnoDB 16K pages
zfs set recordsize=1M    rpool/srv/media      # large sequential reads
zfs set recordsize=128K  rpool/home           # general purpose (default)
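The read-amplification argument is simple arithmetic: if the application reads a small page from a larger record, ZFS has to read (and checksum) the whole record. A rough sketch, assuming aligned I/O, full-size records, no compression, and a cold cache; `read_amp` is a hypothetical helper. The same ratio applies to a small in-place update, which forces a read-modify-write of the whole record.

```shell
#!/bin/sh
# read_amp RECORDSIZE_KIB IO_KIB: hypothetical helper. Rough ratio of bytes
# ZFS must read to bytes the application asked for, assuming aligned I/O,
# full-size records, no compression and nothing cached.
read_amp() {
  rs=$1 io=$2
  if [ "$io" -ge "$rs" ]; then
    echo 1                       # I/O covers whole records: no amplification
  else
    echo $(( rs / io ))          # whole record read for a partial request
  fi
}

echo "8K page, 128K records:  $(read_amp 128 8)x"    # 16x
echo "8K page, 8K records:    $(read_amp 8 8)x"      # 1x
echo "16K page, 128K records: $(read_amp 128 16)x"   # 8x
```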

Administrative Simplicity — Two Commands

The entire ZFS administration surface is two commands:

`zpool` — manages the physical layer

Create pools. Add disks. Replace failed disks. Check pool health. Run scrubs. View I/O statistics. Import and export pools. Everything about the physical storage.

zpool create rpool mirror sda sdb        # create a mirrored pool
zpool status                              # show pool health and disk status
zpool iostat -v 5                         # live I/O statistics every 5 seconds
zpool scrub rpool                         # verify all checksums
zpool replace rpool sda sdc               # hot-replace a failed disk
zpool add rpool mirror sde sdf            # expand pool with a new mirror pair
zpool history                             # audit log of every pool operation

`zfs` — manages the logical layer

Create datasets. Set properties. Take snapshots. Send/receive replication streams. Mount, unmount, encrypt, decrypt. Everything about how data is organized and managed.

zfs create rpool/home/alice                # create a dataset
zfs set quota=100G rpool/home/alice        # limit to 100GB
zfs set compression=zstd rpool/archive     # per-dataset compression
zfs snapshot rpool/home/alice@before-risky # point-in-time snapshot
zfs rollback rpool/home/alice@before-risky # undo everything since snapshot
zfs send rpool/home/alice@snap | \
  ssh backup zfs recv tank/backup/alice    # replicate to remote machine
zfs get all rpool/home/alice               # show every property
zfs list -t all -r rpool                   # list everything in the pool
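The `zfs send` above is a full stream. After that first copy, only the changes between two snapshots need to travel, using `zfs send -i`. A sketch of a helper that builds (but does not run) that pipeline; `repl_cmd`, the host `backup`, and the snapshot names `monday`/`tuesday` are hypothetical. The receiving dataset must already have the older snapshot, and if it was modified since then, `zfs recv` refuses unless you force a rollback with `-F`.

```shell
#!/bin/sh
# repl_cmd DATASET PREV NEXT HOST TARGET: hypothetical helper that prints the
# incremental replication pipeline instead of executing it (a dry run).
repl_cmd() {
  ds=$1 prev=$2 next=$3 host=$4 target=$5
  printf 'zfs send -i @%s %s@%s | ssh %s zfs recv %s\n' \
    "$prev" "$ds" "$next" "$host" "$target"
}

repl_cmd rpool/home/alice monday tuesday backup tank/backup/alice
# zfs send -i @monday rpool/home/alice@tuesday | ssh backup zfs recv tank/backup/alice
```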

That's it. No fdisk. No mdadm. No pvcreate, vgcreate, lvcreate. No mkfs. No resize2fs. No fsck. Two commands replace an entire toolchain of legacy utilities, each with its own syntax, its own configuration files, and its own failure modes.

This is the thing that hooks people. You go from needing to remember mdadm --create --level=1 --raid-devices=2 and pvcreate and vgcreate and lvcreate -L 50G and mkfs.ext4 and editing /etc/fstab and running resize2fs when you need more space... to just zpool create and zfs create. Five tools become two. Twelve steps become two. And the two are consistent, predictable, and well-documented. Once you internalize zpool for physical and zfs for logical, you never look back.

The Properties System — Inheritance and Override

Every ZFS dataset has dozens of properties that control its behavior. Properties follow an inheritance model: set a property on a parent dataset, and all children inherit it. Override on a specific child to diverge. Reset a child to re-inherit from its parent.

# Set compression on the pool — all datasets inherit it
zfs set compression=lz4 rpool
zfs get compression rpool/home         # inherited from rpool: lz4
zfs get compression rpool/var/log      # inherited from rpool: lz4

# Override on a specific dataset
zfs set compression=zstd rpool/archive
zfs get compression rpool/archive      # local: zstd (overridden)

# Check where a property value came from
zfs get -H -o name,property,value,source compression rpool/home
# rpool/home    compression    lz4    inherited from rpool

# Reset to inherited value
zfs inherit compression rpool/archive
# Now rpool/archive inherits lz4 from rpool again
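The rule behind the SOURCE column: a dataset uses its own local value if it has one, otherwise the value of its nearest ancestor that has one. A sketch of that lookup, assuming the locally set values from the example above live in a hypothetical `local_value` table; real ZFS also distinguishes `default` and `received` sources, which this sketch reduces to a single fallback.

```shell
#!/bin/sh
# Hypothetical table of locally set compression values (rpool and rpool/archive,
# as in the example above). Returns non-zero when a dataset has no local value.
local_value() {
  case $1 in
    rpool)         echo lz4 ;;
    rpool/archive) echo zstd ;;
    *)             return 1 ;;
  esac
}

# effective DATASET: walk up the dataset path until a local value is found.
effective() {
  ds=$1
  while :; do
    if value=$(local_value "$ds"); then
      if [ "$ds" = "$1" ]; then
        echo "$value local"
      else
        echo "$value inherited from $ds"
      fi
      return 0
    fi
    case $ds in
      */*) ds=${ds%/*} ;;                   # strip the last path component
      *)   echo "(unset) default"; return 0 ;;
    esac
  done
}

effective rpool/home            # lz4 inherited from rpool
effective rpool/archive         # zstd local
effective rpool/archive/2024    # zstd inherited from rpool/archive
```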

This is how you manage storage policy at scale. Set your defaults at the pool level. Override per-dataset only where workloads demand it. Every property is visible via zfs get all — nothing is hidden in config files or undocumented settings.

Properties divide into two categories: native properties (managed by ZFS itself: compression, encryption, quota, recordsize) and user properties (arbitrary key-value pairs you define: zfs set com.company:backup=daily rpool/srv). User properties are useful for automation — tag datasets with metadata and let scripts query them.
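A sketch of that kind of automation: select every dataset tagged for daily backups. It assumes the tab-separated output of `zfs get -H -o name,value -r com.company:backup rpool`, where datasets without the property show `-`. User properties inherit like native ones, so children of a tagged dataset are picked up too. `daily_datasets` and the sample rows are hypothetical.

```shell
#!/bin/sh
# daily_datasets: hypothetical helper. stdin is the tab-separated output of
#   zfs get -H -o name,value -r com.company:backup rpool
# and it prints the names of datasets whose value is "daily".
daily_datasets() {
  awk -F '\t' '$2 == "daily" { print $1 }'
}

# Sample input standing in for the real command (rpool/srv/web inherits "daily").
printf 'rpool\t-\nrpool/srv\tdaily\nrpool/srv/web\tdaily\nrpool/home\tweekly\n' \
  | daily_datasets
# rpool/srv
# rpool/srv/web
```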

ZFS in the Enterprise — Who Runs It

ZFS is not experimental. It's not hobbyist software. It runs in production at scale across every industry. Here's who trusts their data to ZFS:

Proxmox VE

The most popular open-source virtualization platform uses ZFS as a first-class storage backend. Proxmox + ZFS powers thousands of production hypervisors running VMs and containers. ZFS snapshots integrate with Proxmox backup, live migration, and replication. If you run Proxmox, you're already in the ZFS ecosystem.

iXsystems / TrueNAS

iXsystems builds TrueNAS (formerly FreeNAS), the most widely deployed ZFS-based NAS platform. TrueNAS CORE runs on FreeBSD + ZFS. TrueNAS SCALE runs on Linux + OpenZFS. iXsystems is also one of the largest contributors to the OpenZFS project and employs several core OpenZFS developers. They sell enterprise storage appliances built entirely on ZFS.

Netflix

Netflix's Open Connect CDN serves a significant fraction of global internet traffic. Their content delivery appliances run FreeBSD with ZFS, serving streaming content from ZFS pools tuned for high-throughput sequential reads. When you watch Netflix, ZFS is delivering your video.

Joyent (Samsung)

Joyent built their entire SmartOS cloud platform on illumos + ZFS. SmartOS uses ZFS for everything: OS boot (from ZFS), container storage (ZFS datasets), VM storage (ZFS zvols), and backup (ZFS send/receive). Samsung acquired Joyent in 2016.

Delphix

Delphix built a database virtualization platform on ZFS. They use ZFS clones to create instant, space-efficient copies of production databases for development and testing. A 10TB Oracle database cloned in seconds, using near-zero additional space. Delphix is one of the largest contributors to OpenZFS.

Klara Inc.

Klara (founded by FreeBSD developers) provides commercial OpenZFS development, consulting, and support. They contribute significant features to OpenZFS including RAIDZ expansion, block cloning, and performance improvements. If a major OpenZFS feature landed in the last few years, Klara probably helped build it.

Lawrence Livermore National Laboratory

LLNL started the ZFS on Linux port, which is now part of OpenZFS. It still uses ZFS for high-performance computing storage, and its HPC clusters depend on ZFS for data integrity and performance on petabyte-scale datasets.

The entire FreeBSD ecosystem

FreeBSD has shipped ZFS as a first-class, in-kernel filesystem since 2008. It's the default root filesystem recommendation. Every FreeBSD server, every pfSense firewall with ZFS, every FreeNAS/TrueNAS box, every FreeBSD jail host — they all run ZFS. FreeBSD's ZFS integration is the most mature on any platform.

The "who uses ZFS" question matters because of the licensing FUD. People hear "not in the mainline Linux kernel" and assume it's risky or unsupported. Meanwhile, Netflix serves a significant fraction of the internet's traffic on it. Proxmox runs thousands of production hypervisors on it. iXsystems sells enterprise storage appliances on it. The code is battle-tested at a scale most organizations will never reach. The licensing situation is a legal nuance, not a technical risk. If your legal team approves CDDL, there is no safer storage platform available.

ZFS on Linux — OpenZFS and DKMS

ZFS is not part of the Linux kernel. It ships as an out-of-tree kernel module built via DKMS (Dynamic Kernel Module Support). When you install a new kernel, DKMS recompiles the ZFS module against the new kernel headers. Usually this works on the first try. When it doesn't — missing headers, compiler mismatch, ABI change — the module fails to build and ZFS doesn't load on next boot.
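A failed DKMS build is easy to miss, so checking that a ZFS module exists for a newly installed kernel before rebooting is cheap insurance. This minimal sketch assumes modules live under `/lib/modules/<version>/`. The subdirectory varies (`extra/` for kmod builds, `updates/dkms/` on many DKMS setups), so the sketch searches the whole tree for that kernel. The function name is hypothetical. Pass the new kernel's version, which before a reboot is not what `uname -r` reports.

```shell
# zfs_module_for_kernel KVER [MODROOT]
# Prints the path of a zfs.ko (possibly compressed, e.g. zfs.ko.xz or .zst)
# built for kernel KVER and returns 0; returns 1 if none is found.
zfs_module_for_kernel() {
    kver=$1
    modroot=${2:-/lib/modules}
    [ -d "$modroot/$kver" ] || return 1
    path=$(find "$modroot/$kver" -type f -name 'zfs.ko*' 2>/dev/null | head -n 1)
    [ -n "$path" ] || return 1
    echo "$path"
}

# Example: check every installed kernel before rebooting.
#   for k in /lib/modules/*; do
#       k=${k##*/}
#       zfs_module_for_kernel "$k" >/dev/null || echo "WARNING: no ZFS module for $k"
#   done
```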

The reason ZFS isn't in the mainline kernel is licensing. ZFS is licensed under Sun's CDDL (Common Development and Distribution License). The Linux kernel is GPLv2. The FSF and some kernel developers consider these licenses incompatible for linked distribution. Linus Torvalds has said he will not merge ZFS code without explicit legal assurance from Oracle. Ubuntu ships prebuilt ZFS modules with its kernel packages, because Canonical's legal team considers this permissible. Debian ships ZFS as a DKMS package (zfs-dkms) in its contrib archive. Fedora and RHEL do not ship ZFS at all. Users install it from the OpenZFS project's own repositories, which provide DKMS packages and, for RHEL, kABI-tracking kmod packages.

Current OpenZFS releases: OpenZFS 2.2 (first released in late 2023) introduced block cloning, and its 2.2.x point releases added support for newer Linux 6.x kernels. OpenZFS 2.3 (released January 2025) added RAIDZ expansion, fast dedup, and Direct I/O. Each release's notes state the range of Linux kernels it supports, so check them before moving to a new kernel.

# Check your OpenZFS version
zfs --version
# zfs-2.2.7-1
# zfs-kmod-2.2.7-1

# Check module is loaded
lsmod | grep zfs
# zfs                  4308992  6
# spl                   135168  1 zfs

# Show module metadata (path, version, license)
modinfo zfs | head -5
# filename:       /lib/modules/6.8.0/extra/zfs/zfs.ko
# version:        2.2.7-1
# license:        CDDL
# author:         OpenZFS
# description:    ZFS
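After an upgrade, the userland tools and the loaded kernel module can briefly disagree, because new packages are installed while the old module stays loaded until reboot. This minimal sketch compares the two lines printed by `zfs --version`. It assumes the `zfs-X` / `zfs-kmod-X` format shown above, and the function name is hypothetical.

```shell
# zfs_versions_match
# Reads `zfs --version` output on stdin; returns 0 if the userland (zfs-*)
# and kernel module (zfs-kmod-*) versions are identical, 1 otherwise.
zfs_versions_match() {
    awk '
        /^zfs-kmod-/ { kmod = substr($0, length("zfs-kmod-") + 1); next }
        /^zfs-/      { user = substr($0, length("zfs-") + 1) }
        END {
            if (user == "" || kmod == "") { print "could not parse versions"; exit 1 }
            if (user != kmod) { print "mismatch: userland " user ", kmod " kmod; exit 1 }
            print "ok: " user
        }'
}

# Usage: zfs --version | zfs_versions_match || echo "reboot to load the new module"
```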

What Makes ZFS Different from Everything Else

Other filesystems have copied individual ZFS features. Btrfs has copy-on-write and snapshots. XFS has scalability. ext4 has stability. But none of them are ZFS, and here's why:

Integration, not aggregation

Btrfs bolted RAID onto a filesystem. mdraid + LVM + ext4 stacks independent tools. ZFS was designed as one integrated system from day one. The RAID layer knows about the filesystem. The filesystem knows about the cache. The cache knows about the checksums. Every layer cooperates. This is why ZFS can self-heal, why snapshots are instant, why compression actually speeds things up. You can't get these properties from independent tools that don't know about each other.

20 years of production hardening

ZFS shipped in 2005. It has been in continuous production use for two decades. Every edge case, every failure mode, every corruption scenario has been found and fixed by millions of machines running billions of hours of I/O. Btrfs is younger, less deployed, and has had stability issues with RAID5/6 that persist to this day. mdadm is stable but dumb — it doesn't know about the data it's protecting. ZFS has the combined experience of Sun, Oracle, Netflix, iXsystems, and the entire FreeBSD ecosystem burned into its code.

The checksumming guarantee

ext4 (with metadata_csum) checksums metadata but not file data. XFS (v5 format) checksums metadata but not data. Btrfs checksums both, but each metadata block carries its own checksum in its header, and data checksums live in a separate checksum tree. ZFS stores every block's checksum in its parent block pointer, forming a Merkle tree rooted at the uberblock. A block that is intact but stale, or written to the wrong place, therefore still fails verification. Short of a checksum collision, no arrangement of corrupted bits escapes detection.

Operational simplicity at scale

A 100-server fleet with mdraid + LVM + ext4 requires managing hundreds of mdadm configs, /etc/fstab entries, LVM metadata, and fsck schedules. The same fleet with ZFS requires zpool status and zfs list. Pools are self-describing. Datasets are self-documenting. Properties show where they came from. History is built in. The operational overhead at scale is dramatically lower.

I'm not saying ZFS is perfect. The licensing situation is real. DKMS breaks sometimes. Pool design is permanent. Memory usage surprises people. These are real trade-offs and they're documented honestly in the section below. But the gap between ZFS and everything else is not close. It's not "ZFS is 10% better." It's "ZFS is a fundamentally different class of system." Once you've used it, going back to ext4 + mdraid + LVM feels like going back to horses after driving a car. The car has its own problems, but you're never going back to horses.

The Quick Comparison — ZFS vs Everything Else

For the full comparison, see ZFS vs Everything Else. Here's the snapshot:

Feature	ZFS	ext4 + mdraid + LVM	btrfs	XFS + mdraid + LVM
Data checksumming	Every block, Merkle tree	Metadata only	Yes, data + metadata	Metadata only
Self-healing	Automatic with redundancy	No	Yes, with redundant profiles	No
Snapshots	Instant, unlimited	LVM snapshots (slow, fragile)	Instant, COW-based	LVM snapshots (slow, fragile)
Compression	Per-dataset, transparent	No	Per-mount or per-file, transparent	No
Native encryption	Per-dataset, AES-256-GCM	LUKS (whole-volume only)	Not yet	LUKS (whole-volume only)
Block-level replication	`zfs send/recv`	rsync (file-level, slow)	`btrfs send/recv`	rsync (file-level, slow)
RAID5/6 stability	RAIDZ1/2/3 production-stable	mdraid stable	RAID5/6 write hole (unsafe)	mdraid stable
fsck required	Never	Yes (hours on large FS)	Rarely, but possible	Yes
In mainline kernel	No (DKMS / out-of-tree)	Yes	Yes	Yes
Production maturity	20 years, massive scale	Decades, ubiquitous	Improving, RAID5/6 still risky	Decades, enterprise proven

The one row ZFS loses is "in mainline kernel." That's real, and it has real consequences: DKMS can break, some distros don't package it, and corporate legal teams get nervous. But look at the rest of the table. On every data-safety row, ZFS matches or beats the alternatives. If you care about your data being correct (not just present, but correct), ZFS gives you the strongest guarantee available. ext4, XFS, and mdraid trust the hardware not to lie. Hardware lies.

How kldload Leverages ZFS

kldload exists because ZFS is the best storage platform available and installing it on Linux is unreasonably hard. Every kldload install — across all eight supported distros (CentOS Stream, Debian, Ubuntu, Fedora, RHEL, Rocky Linux, Arch Linux, Alpine Linux) — boots on ZFS on root. Not a data partition on ZFS with ext4 for boot. The entire OS, from / to /home to /var/log, lives on ZFS datasets.

Pre-built ZFS module

kldload builds the OpenZFS kernel module at image creation time, matching the exact kernel version baked into the ISO. No DKMS compilation at install time. No missing headers. No compiler mismatches. The module is ready. The DKMS package is installed for future kernel updates, but the critical first boot doesn't depend on it.

ZFSBootMenu

kldload uses ZFSBootMenu instead of GRUB for ZFS-on-root systems. ZFSBootMenu understands ZFS natively — it can list boot environments, roll back to snapshots, boot into clones, and manage multiple OS installs on the same pool. A bad kernel update is a 15-second rollback, not a rescue USB adventure.


Sane defaults

Every pool kldload creates uses ashift=12, compression=lz4, acltype=posixacl, xattr=sa, dnodesize=auto, and autotrim=on. Datasets are split by function (/home, /var/log, /tmp, /srv) with appropriate properties per dataset. /tmp gets sync=disabled, exec=off, setuid=off, and devices=off for security hardening.

Automated snapshots

Every kldload install takes a factory snapshot at install time — your known-good baseline. Hourly automatic snapshots are enabled by default (keeping 48 hours). kldload tools take pre-upgrade snapshots before package operations. You always have a rollback point.

Cross-distro consistency

Whether you install CentOS, Debian, Ubuntu, Fedora, RHEL, Rocky, Arch, or Alpine, the ZFS configuration is identical. Same pool properties. Same dataset layout. Same snapshot automation. Same boot chain. The distro is the userland; ZFS is the foundation. Move between distros without re-learning your storage.

Five-Minute Quick Start — See ZFS in Action

Reading about ZFS is one thing. Running it is another. Here's a complete hands-on walkthrough you can run right now, as root, on any Linux machine with the OpenZFS packages installed. It uses sparse files as vdevs, so no real disks are required.

# 1. Create two 1GB files to simulate disks
truncate -s 1G /tmp/disk1.img /tmp/disk2.img

# 2. Create a mirrored pool (two-disk mirror, like RAID1)
zpool create -o ashift=12 testpool mirror /tmp/disk1.img /tmp/disk2.img

# 3. Enable compression (always do this)
zfs set compression=lz4 testpool

# 4. Create some datasets
zfs create testpool/data
zfs create testpool/data/important
zfs create testpool/scratch

# 5. Check what you've built
zpool status testpool
zfs list -r testpool

# 6. Write some data
cp /etc/hosts /testpool/data/important/
echo "Hello, ZFS" > /testpool/scratch/hello.txt

# 7. Take a snapshot (instant, regardless of data size)
zfs snapshot testpool/data/important@backup1

# 8. Delete the data
rm /testpool/data/important/hosts

# 9. Roll back to the snapshot (instant recovery)
zfs rollback testpool/data/important@backup1
cat /testpool/data/important/hosts  # It's back.

# 10. Check compression
zfs get compressratio testpool

# 11. See the snapshot in the hidden .zfs directory
ls /testpool/data/important/.zfs/snapshot/backup1/

# 12. Clean up
zpool destroy testpool
rm /tmp/disk1.img /tmp/disk2.img

That's a mirrored pool, three datasets with inheritance, transparent compression, an instant snapshot, instant rollback, and browseable snapshot history. In twelve steps. No partition tables. No mdadm. No LVM. No mkfs. No fstab. No fsck. This is what Jeff Bonwick meant by "as easy as filling a glass of water."

Every time I demo ZFS to someone who's been managing mdraid + LVM + ext4, the snapshot rollback is the moment their expression changes. They're used to "I deleted the file, it's gone, where's the backup tape?" With ZFS it's zfs rollback and the file is back. Sub-second, regardless of dataset size. It's not magic — it's the copy-on-write architecture making "undo" a first-class operation. But it feels like magic the first time you see it.

Under the Hood — How ZFS Manages Data

Understanding ZFS at the architectural level isn't necessary to use it, but it explains why ZFS behaves the way it does. Here's what happens when your application writes a file.

Transaction groups (TXGs)

ZFS batches writes into transaction groups. Instead of writing each block immediately, ZFS accumulates writes in memory for up to zfs_txg_timeout seconds (default: 5). When enough dirty data has built up or the timeout fires, the entire group is committed to disk as an atomic unit. This converts random application writes into large sequential disk writes, which is dramatically better for both HDDs and SSDs.

# Watch per-vdev I/O in real time (writes arrive in bursts as each TXG syncs)
zpool iostat -v 1

# The TXG timeout (default 5 seconds, rarely needs tuning)
cat /sys/module/zfs/parameters/zfs_txg_timeout
# 5
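The batching idea can be shown with a toy model. Given write timestamps in seconds, it assigns each write to a fixed TXG window and counts writes per TXG. It ignores the dirty-data limits that can close a TXG early, and real TXG boundaries are not aligned to clock intervals. The timestamps and the function name are illustrative.

```shell
# txg_batches TIMEOUT
# Reads one write timestamp (seconds, may be fractional) per line on stdin and
# prints "txg N: K write(s)" for each non-empty window of TIMEOUT seconds.
txg_batches() {
    awk -v timeout="$1" '
        { txg = int($1 / timeout); count[txg]++; if (txg > max) max = txg }
        END { for (t = 0; t <= max; t++) if (t in count) printf "txg %d: %d write(s)\n", t, count[t] }'
}

# printf '0.1\n0.5\n4.9\n5.2\n11.0\n' | txg_batches 5
```

In this model, five application writes turn into three disk commits. That is the effect the real TXG pipeline has on bursty, random write traffic.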

The Merkle tree

ZFS organizes all data in a Merkle tree (a hash tree). Every data block has a checksum. That checksum is stored in the block's parent pointer. The parent has its own checksum stored in its parent. This chain extends all the way up to the uberblock, the root of the entire pool. A single corrupted bit anywhere in the tree no longer matches the checksum held by its parent, so ZFS catches it the moment that block is read.

This is fundamentally different from ext4, which checksums metadata but not data. It also differs from btrfs, which keeps each metadata block's checksum in that block's own header rather than in the parent pointer. With the checksum in the parent, a misdirected write (where the drive writes to the wrong location) is detected on the next read. If the checksum lived inside the block it protects, a misdirected write could carry a perfectly valid checksum along with the wrong data, making the corruption invisible.
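This two-level sketch of parent-held checksums uses `sha256sum` from GNU coreutils. Each "block" file's checksum is recorded in a parent file, and the root checksum covers that parent record. It is an illustration, not ZFS's on-disk format. ZFS's default data checksum is fletcher4 (SHA-256 is an option), and its tree is much deeper. The function names are hypothetical.

```shell
# build_parent BLOCK...
# Records one "checksum  path" line per child block. These lines play the
# role of the block pointers a ZFS parent block holds for its children.
build_parent() {
    sha256sum "$@"
}

# root_of PARENT_FILE
# Checksum of the parent record itself: the value the next level up (and,
# eventually, the uberblock) would store.
root_of() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# verify_blocks PARENT_FILE
# Re-reads every child and compares it with the checksum held by the parent.
# Prints "path: FAILED" for mismatches and returns non-zero if any failed.
verify_blocks() {
    sha256sum -c --quiet "$1"
}

# Demo of a misdirected write:
#   printf 'block A' > a; printf 'block B' > b
#   build_parent a b > parent
#   cp b a                  # B's self-consistent data lands where A belongs
#   verify_blocks parent    # a: FAILED (the parent still expects A's checksum)
```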

The uberblock — root of trust

The uberblock is ZFS's root block. It contains the transaction group number, a timestamp, and a pointer (with checksum) to the root of the block tree. Each vdev label holds a ring of uberblock slots (128 on pools with 512-byte sectors, 32 with ashift=12), and ZFS writes new uberblocks round-robin. On pool import, ZFS finds the uberblock with the highest valid transaction group number, which is the most recent consistent state. Because copy-on-write never overwrites live data in place, the pool is always in a consistent state. There is no journal to replay, no fsck to run.

# View the current uberblock
zdb -u rpool
#   Uberblock[37]
#       magic          = 0x00bab10c
#       version        = 5000
#       txg            = 1847293
#       guid_sum       = 8412736491827364918
#       timestamp      = 1775268247 UTC = Sat Apr  4 02:04:07 2026
#       rootbp = [L0 DMU objset] ...
#       checkpoint_txg = 0
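The import-time rule above (take the valid uberblock with the highest txg) can be sketched as a simple scan. The input format is hypothetical: one `slot txg status` line per slot, where `ok` means the slot's own checksum verified. Real ZFS also reads the ring from every vdev label and breaks ties using further fields such as the timestamp.

```shell
# pick_uberblock
# Reads "slot txg status" lines on stdin and prints the slot and txg of the
# newest uberblock whose status is "ok"; returns 1 if none is valid.
pick_uberblock() {
    awk '
        $3 == "ok" && (best == "" || $2 + 0 > best + 0) { best = $2; slot = $1 }
        END {
            if (best == "") exit 1
            print "slot " slot " txg " best
        }'
}

# printf '28 1847292 ok\n29 1847293 bad\n' | pick_uberblock
```

If the newest slot was torn by a power loss mid-write, its checksum fails and the scan falls back to the previous TXG. That fallback is why an interrupted commit never leaves the pool inconsistent.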

ZIL and SLOG — the write path

When an application requests a synchronous write (databases, NFS, anything using O_SYNC or fsync()), the application expects the data to be on stable storage before the write call returns. ZFS handles this via the ZIL (ZFS Intent Log). The ZIL writes a compact record of the pending operation to log blocks on the pool's disks. Once the ZIL write completes, ZFS tells the application the write is safe. Later, the TXG flush writes the full data blocks. If power fails between the ZIL write and the TXG flush, ZFS replays the ZIL when the pool is next imported and the dataset mounted, recovering the in-flight transactions.

A SLOG (Separate Log device) is simply the ZIL on a dedicated, fast device instead of the pool's main disks. An NVMe drive with power-loss protection is the ideal SLOG. The SLOG only accelerates synchronous writes. Asynchronous writes (which are the majority for most workloads) bypass the ZIL entirely and go straight into TXGs.

The SLOG is the most misunderstood component in ZFS. People add SSDs as SLOGs thinking they'll speed up all writes. They won't. The SLOG only helps synchronous writes: fsync(), O_SYNC, NFS, iSCSI, databases, and any dataset set to sync=always. If your workload is mostly async (file servers, media streaming, general Linux usage), a SLOG does nothing. I've watched people spend $400 on an Optane drive for a SLOG on a media server and wonder why performance didn't change. Know your workload before you buy hardware.
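Before buying a SLOG, measure what synchronous writes actually cost on the pool in question. This rough sketch times N small writes in a target directory, once buffered and once with O_DSYNC on every write. It assumes GNU `dd` (for `oflag=dsync`) and GNU `date` (for `%N` nanoseconds). It measures latency only and does not tell you whether your applications issue sync writes. The function name and defaults are hypothetical.

```shell
# sync_write_cost DIR [COUNT]
# Writes COUNT 4 KiB blocks to a scratch file in DIR twice: once buffered
# (async) and once with O_DSYNC on every write, printing elapsed ms for each.
sync_write_cost() {
    dir=$1 count=${2:-200}
    f="$dir/.sync_write_cost.$$"
    for mode in async sync; do
        if [ "$mode" = sync ]; then flag=oflag=dsync; else flag=; fi
        start=$(date +%s%N)
        # Random input, so compression cannot turn all-zero writes into holes.
        dd if=/dev/urandom of="$f" bs=4k count="$count" $flag 2>/dev/null
        end=$(date +%s%N)
        echo "$mode $(( (end - start) / 1000000 )) ms"
        rm -f "$f"
    done
}

# Example: sync_write_cost /srv/db 500
```

A large gap between the two lines on a dataset that really sees fsync-heavy traffic is the case a SLOG addresses. A small gap, or a workload that never syncs, means the money is better spent elsewhere.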

The Honest Trade-offs

Read these before you commit. Every technology has trade-offs. ZFS tells you about them upfront instead of letting you discover them at 3 AM.

Licensing. ZFS is CDDL. Linux is GPL. The two are widely considered incompatible for linked distribution. This is why ZFS isn't in the mainline kernel and is unlikely ever to be. It ships as an out-of-tree module. The code is production-ready: Oracle, Netflix, Joyent, and the entire FreeBSD ecosystem run it in production. But some organizations have legal teams that won't approve CDDL on GPL systems. Ask your legal team before you're deep into a project, not after.

Pool design is permanent. ashift cannot be changed on an existing vdev. A RAIDZ vdev can only grow, one disk at a time, through RAIDZ expansion (OpenZFS 2.3 and later). It can never shrink or change parity level. A mirror cannot become RAIDZ. Moving from RAIDZ1 to RAIDZ2 means creating a new pool and zfs send/recv everything over. Vdev layout is the one decision you can't undo. Get it right the first time. Read the Pool Design page before you run zpool create.

Memory. ZFS uses RAM aggressively for caching (ARC). This is a feature — it's why ZFS is fast. But it surprises people whose monitoring alerts on "high memory usage." ARC releases memory under pressure, but tools like free and Grafana will show 80% used when the system is fine. Tune zfs_arc_max if you run memory-sensitive workloads alongside ZFS. See the Memory & ARC page.
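This minimal sketch separates ARC from application memory. It assumes the standard OpenZFS kstat file `/proc/spl/kstat/zfs/arcstats`, which has rows of `name type data`: `size` is the current ARC size and `c_max` its ceiling. The cap itself is set with a module option. Both function names are hypothetical.

```shell
# arc_summary_mib [ARCSTATS]
# Prints the current ARC size and its maximum (c_max) in MiB.
arc_summary_mib() {
    awk '$1 == "size"  { size = $3 }
         $1 == "c_max" { cmax = $3 }
         END { printf "arc_size=%d MiB arc_max=%d MiB\n", size / 1048576, cmax / 1048576 }' \
        "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# arc_max_line GIB
# Prints a modprobe option line capping the ARC at GIB gibibytes.
arc_max_line() {
    echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
}

# Example:
#   arc_summary_mib
#   arc_max_line 8 | sudo tee -a /etc/modprobe.d/zfs.conf
```

Subtracting `arc_size` from the "used" figure in `free` gives a truer picture of application memory. The modprobe line applies the next time the module loads, and on ZFS root the initramfs may need regenerating first. Writing the byte value to `/sys/module/zfs/parameters/zfs_arc_max` changes the cap at runtime.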

Kernel updates. When the kernel updates, the ZFS module must be rebuilt. DKMS handles this automatically, unless it doesn't. Missing headers, ABI changes, and gcc version mismatches can all break the build, and the failure is easy to miss. On a machine with ZFS only on data pools, the system boots, ZFS doesn't load, and monitoring may still say green. On ZFS root, the new kernel can't mount / at all. kldload mitigates this by pre-building the module at image time, but if you patch deployed machines in place, DKMS is still in the path. Best practice: treat machines as immutable. Rebuild the image, don't patch in place.

Not distributed. ZFS is local storage. It doesn't span machines like Ceph or GlusterFS. Replication (zfs send/recv) is asynchronous — the replica is always slightly behind the source. There is no automatic failover built in. If the primary dies, something has to promote the replica. For most workloads this is fine. If you need synchronous replication or sub-second failover, you need orchestration on top of ZFS.

Encryption key management is still yours. ZFS encrypts datasets beautifully. It does not manage the encryption keys. Where the passphrases or keyfiles are stored, how they're distributed, what happens if one is lost — that's your problem. ZFS shifts the question from "how do I encrypt" to "how do I manage the keys to the encryption." The data side is solved. The key side is your responsibility.

Scrub takes hours. zpool scrub reads every allocated block to verify checksums. On a pool holding 10TB of data, that's roughly 4–8 hours, depending on disk speed and fragmentation. During a scrub or a resilver (replacing a failed disk), performance degrades. This isn't unique to ZFS, since mdraid has the same problem. But ZFS is honest about it. Use mirrors instead of RAIDZ if resilver speed matters to you.
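The hours figure is just allocated data divided by read throughput. This back-of-the-envelope helper assumes decimal units (bytes and MB/s) and a rate taken from the `zpool status` scan line of a previous scrub. It ignores fragmentation and competing I/O, and the function name is hypothetical.

```shell
# scrub_hours ALLOC_BYTES RATE_MB_PER_S
# Rough scrub/resilver duration in hours: allocated bytes / read rate.
scrub_hours() {
    awk -v bytes="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", bytes / (rate * 1000000) / 3600 }'
}

# 10 TB of data at 500 MB/s:   scrub_hours 10000000000000 500    # 5.6
# Allocated bytes come from:   zpool list -Hp -o allocated rpool
```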

ECC RAM recommendation. ZFS checksums data on disk but not in RAM. If your RAM flips a bit before ZFS writes the block, the corrupted data gets a valid checksum and the corruption is permanent. ECC RAM prevents this. ZFS doesn't require ECC — no filesystem does — but it's the one filesystem honest enough to make you think about it. Use ECC if you can. If you can't, ZFS is still better than ext4, which wouldn't have detected the corruption at all.

None of these are reasons not to use ZFS. They're reasons to use it correctly. Every filesystem has trade-offs. ZFS just tells you about them upfront.

kldload Defaults — What's Set and Why

Every kldload install applies these defaults. Every default can be overridden. Nothing is hidden. zfs get all rpool shows you everything.

Pool creation properties

ashift=12           4K sector alignment. Matches all modern drives. Permanent.
autotrim=on         SSD TRIM. Free blocks returned to the drive automatically.
compression=lz4     Always on. Ratio depends on data (~2x is common for text). Negligible CPU cost.
acltype=posixacl    Required for systemd, containers, and most Linux applications.
xattr=sa            Extended attributes stored in the dnode, not in hidden xattr directories. Faster.
dnodesize=auto      Variable dnode size. Better metadata performance.
normalization=formD Unicode normalization. Consistent filename handling.
relatime=on         Relaxed atime. Reduces write amplification vs full atime.

Dataset layout

rpool/ROOT/{hostname}    mountpoint=/         Your OS. canmount=noauto (ZFSBootMenu controls it).
rpool/root               mountpoint=/root     Root home. Separate for snapshot isolation.
rpool/home               mountpoint=/home     User homes. Per-user child datasets.
rpool/srv                mountpoint=/srv      Application data.
rpool/opt                mountpoint=/opt      Optional packages.
rpool/usr/local          mountpoint=/usr/local Local binaries.
rpool/var/cache          mountpoint=/var/cache Package cache. Safe to destroy.
rpool/var/lib            mountpoint=/var/lib   State data (databases, containers).
rpool/var/log            mountpoint=/var/log   Logs. Separate so they can't fill root.
rpool/var/spool          mountpoint=/var/spool Mail/print spools.
rpool/var/tmp            mountpoint=/var/tmp   Persistent temp.
rpool/tmp                mountpoint=/tmp       Temp. sync=disabled, setuid=off, exec=off, devices=off.

/tmp hardening

sync=disabled       /tmp doesn't need write guarantees. Huge performance win.
setuid=off          No SUID binaries in /tmp. Blocks privilege escalation.
exec=off            No execution from /tmp. Blocks most tmp-based exploits.
devices=off         No device nodes in /tmp. Blocks device spoofing.
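To confirm that a machine still carries these settings, compare live values against the expected list. This minimal sketch reads `property<TAB>value` lines, as printed by `zfs get -H -o property,value` or `zpool get -H -o property,value`, and reports anything that differs. `ashift` and `autotrim` are pool properties queried with `zpool get`, while the others are dataset properties on the pool's root dataset or on `rpool/tmp`. The function name is hypothetical.

```shell
# check_props EXPECTED...
# EXPECTED items are "property=value". Reads "property<TAB>value" lines on
# stdin, prints one line per mismatch or missing property, and returns 0
# only if everything matched.
check_props() {
    actual=$(cat)
    rc=0
    for pair in "$@"; do
        prop=${pair%%=*} want=${pair#*=}
        have=$(printf '%s\n' "$actual" | awk -F '\t' -v p="$prop" '$1 == p { print $2; exit }')
        if [ "$have" != "$want" ]; then
            echo "$prop: expected $want, found ${have:-<missing>}"
            rc=1
        fi
    done
    return $rc
}

# Usage (needs ZFS):
#   zpool get -H -o property,value ashift,autotrim rpool | check_props ashift=12 autotrim=on
#   zfs get -H -o property,value compression,xattr,dnodesize rpool |
#       check_props compression=lz4 xattr=sa dnodesize=auto
#   zfs get -H -o property,value sync,setuid,exec,devices rpool/tmp |
#       check_props sync=disabled setuid=off exec=off devices=off
```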

Snapshot automation

Factory snapshot      Taken at install time. Your known-good baseline.
Hourly auto-snapshots Enabled by default. Keep 48 (2 days). Systemd timer.
Pre-upgrade snapshots kldload tools snapshot before package operations.
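Keeping 48 hourly snapshots boils down to one rule: sort by creation and destroy everything older than the newest 48. This is a dry-run sketch of that pruning rule, not kldload's actual timer implementation, and the `auto-` name prefix is a hypothetical naming scheme. It reads snapshot names oldest-first, as `zfs list -H -t snapshot -o name -s creation -d 1 DATASET` prints them, and prints the ones that would be destroyed. Snapshots with other prefixes, such as the factory snapshot, are never candidates.

```shell
# prune_candidates KEEP PREFIX
# Reads full snapshot names (oldest first) on stdin. Considers only snapshots
# whose short name starts with PREFIX, keeps the newest KEEP, prints the rest.
prune_candidates() {
    awk -v keep="$1" -v prefix="$2" '
        { short = substr($0, index($0, "@") + 1) }
        index(short, prefix) == 1 { names[++n] = $0 }
        END { for (i = 1; i <= n - keep; i++) print names[i] }'
}

# Review the list first, then destroy:
#   zfs list -H -t snapshot -o name -s creation -d 1 rpool/home |
#       prune_candidates 48 auto- | xargs -r -n 1 zfs destroy
```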

Common Operations Cheat Sheet

These are the commands you'll use most often, grouped by task. Every one of these is covered in depth in the linked wiki pages. This section is your quick reference.

Pool health and monitoring

# The single most important ZFS command — run this daily
zpool status
#   pool: rpool
#  state: ONLINE
#   scan: scrub repaired 0B in 01:23:45 with 0 errors on Tue Mar 31 02:24:12 2026
# config:
#         NAME        STATE     READ WRITE CKSUM
#         rpool       ONLINE       0     0     0
#           mirror-0  ONLINE       0     0     0
#             sda2    ONLINE       0     0     0
#             sdb2    ONLINE       0     0     0
# errors: No known data errors     <-- This is what you want to see

# Live I/O statistics (like iostat for ZFS)
zpool iostat -v 5

# Space usage by dataset
zfs list -o name,used,avail,refer,compressratio -r rpool

# All pool events (including disk failures)
zpool events -v | tail -50
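`zpool status -x` prints only the pools that have problems, or the single line `all pools are healthy` when there are none. That makes the daily check easy to script. In this minimal sketch the alerting hook is a placeholder, and the function name is hypothetical.

```shell
# pools_healthy
# Reads `zpool status -x` output on stdin; returns 0 if it reports that all
# pools are healthy, otherwise prints the report and returns 1.
pools_healthy() {
    report=$(cat)
    if [ "$report" = "all pools are healthy" ]; then
        return 0
    fi
    printf '%s\n' "$report"
    return 1
}

# Example cron or systemd-timer job (replace the echo with real alerting):
#   zpool status -x | pools_healthy || echo "ZFS problem on $(hostname)" >&2
```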

Snapshot management

# Create a snapshot (instant, regardless of dataset size)
zfs snapshot rpool/home@2026-04-04_manual

# Create recursive snapshots (all child datasets at once)
zfs snapshot -r rpool/home@before-upgrade

# List all snapshots
zfs list -t snapshot -o name,used,creation -s creation

# Browse snapshot contents without rollback
ls /home/.zfs/snapshot/2026-04-04_manual/

# Restore a single file from a snapshot (no rollback needed)
cp /home/.zfs/snapshot/2026-04-04_manual/alice/important.doc /home/alice/

# Full rollback (discards all changes since the snapshot; fails if newer
# snapshots exist unless you add -r, which destroys those snapshots too)
zfs rollback rpool/home@2026-04-04_manual

# Destroy old snapshots
zfs destroy rpool/home@old-snapshot
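`zfs rollback` only targets the most recent snapshot unless `-r` is given, and `-r` destroys every newer snapshot. With hourly snapshots running, it helps to see what would be lost first. This dry-run sketch reads snapshot names oldest-first and prints those newer than the target. The function name is hypothetical.

```shell
# snapshots_newer_than TARGET
# Reads full snapshot names (oldest first, as from
# `zfs list -H -t snapshot -o name -s creation -d 1 DATASET`) on stdin and
# prints every snapshot created after TARGET; returns 1 if TARGET is absent.
snapshots_newer_than() {
    awk -v target="$1" '
        found { print }
        $0 == target { found = 1 }
        END { exit found ? 0 : 1 }'
}

# zfs list -H -t snapshot -o name -s creation -d 1 rpool/home |
#     snapshots_newer_than rpool/home@2026-04-04_manual
```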

Replication and backup

# Full send to a remote machine
zfs send rpool/data@snap1 | ssh backup-host zfs recv tank/backup/data

# Incremental send (only changes since last snapshot — fast)
zfs send -i rpool/data@snap1 rpool/data@snap2 | ssh backup-host zfs recv tank/backup/data

# Encrypted raw send (data never decrypted in transit)
zfs send --raw rpool/secrets@snap1 | ssh backup-host zfs recv tank/backup/secrets

# Estimate send size before starting
zfs send -nv -i rpool/data@snap1 rpool/data@snap2
# estimated size is 142M

# Save to a file instead (for portable backup)
zfs send rpool/data@snap1 | gzip > /backup/data-snap1.zfs.gz
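Incremental replication needs the newest snapshot that exists on both sides. This sketch of the lookup assumes both lists hold short snapshot names (the part after `@`), oldest first. The function name is hypothetical, and `tank/backup/data` and `backup-host` are the examples from above.

```shell
# latest_common_snapshot SRC_LIST DST_LIST
# Each argument is a newline-separated list of short snapshot names, oldest
# first. Prints the newest source snapshot that also exists on the
# destination; returns 1 if there is none (a full send is needed).
latest_common_snapshot() {
    SRC=$1 DST=$2 awk 'BEGIN {
        n = split(ENVIRON["DST"], d, "\n")
        for (i = 1; i <= n; i++) if (d[i] != "") have[d[i]] = 1
        m = split(ENVIRON["SRC"], s, "\n")
        for (i = m; i >= 1; i--) if (s[i] in have) { print s[i]; exit 0 }
        exit 1
    }'
}

# Usage (needs ZFS and SSH access to backup-host):
#   src=$(zfs list -H -t snapshot -o name -s creation -d 1 rpool/data | sed 's/.*@//')
#   dst=$(ssh backup-host zfs list -H -t snapshot -o name -s creation -d 1 tank/backup/data | sed 's/.*@//')
#   base=$(latest_common_snapshot "$src" "$dst") &&
#       zfs send -i "rpool/data@$base" rpool/data@snap2 | ssh backup-host zfs recv tank/backup/data
```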

Disk replacement

# Replace a failed disk (use the device name zpool status shows; on a boot
# pool, partition the new disk to match the old one first)
zpool replace rpool sda2 /dev/sdc2

# Monitor resilver progress
zpool status rpool
#   scan: resilver in progress since Sat Apr  4 10:15:32 2026
#     234G scanned at 456M/s, 123G resilvered at 234M/s, 52.5% done

# Bring a disk online that was temporarily removed
zpool online rpool sda2

# Clear transient errors after replacing a cable
zpool clear rpool

The zpool replace command is one of those things that makes you realize how much better ZFS is than mdraid. With mdraid, you run mdadm --fail and mdadm --remove, physically swap the disk, run mdadm --add, then wait for the rebuild while hoping your /etc/mdadm.conf is correct, the array name matches, and the partitions are right. With ZFS on a pool built from whole disks, it's zpool replace tank old-disk new-disk. One command. No config files. No partition matching. ZFS handles everything.
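Before running `zpool replace`, you need the name ZFS uses for the failed device. This minimal sketch pulls leaf devices in a bad state out of `zpool status` output. The set of states it checks and the function name are assumptions for illustration.

```shell
# failed_devices
# Reads `zpool status` output on stdin and prints "device state" for entries
# in the config section whose state is FAULTED, UNAVAIL, REMOVED, or OFFLINE.
# DEGRADED vdevs (mirror-0, raidz1-0, the pool line) are deliberately skipped.
failed_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { inconfig = 1; next }
        /^errors:/ { inconfig = 0 }
        inconfig && $2 ~ /^(FAULTED|UNAVAIL|REMOVED|OFFLINE)$/ { print $1, $2 }'
}

# zpool status tank | failed_devices
# then, with NEW-DISK as a placeholder for the replacement's by-id name:
#   zpool replace tank sda /dev/disk/by-id/NEW-DISK
```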

Wiki Roadmap — Where to Go From Here

This overview covers the foundations. Every topic links deeper into the wiki. Here's the recommended reading order:

Pool Design

Read this first. VDEV layout is the one decision you can't undo. Mirrors vs RAIDZ vs dRAID. ashift. Special vdevs. SLOG. L2ARC. Recipes by disk count.

Snapshots & Replication

How snapshots work, how send/receive works, incremental replication, backup strategies, and sanoid/syncoid automation.

Boot Chain

How ZFS on root actually boots. EFI, ZFSBootMenu, boot environments, rollback from the bootloader.

Compression & Dedup

LZ4, ZSTD, GZIP, and when to use each. Deduplication: why it's usually a bad idea and when it's not.

Encryption

Per-dataset encryption, key management, raw send for encrypted replication, and the LUKS vs ZFS encryption comparison.

Memory & ARC

How ARC works, why your RAM looks full, how to tune zfs_arc_max, L2ARC sizing, and monitoring ARC hit ratios.

Hardware Selection

Disk selection, HBA vs RAID controller, ECC RAM, NVMe for SLOG/L2ARC, and hardware pitfalls to avoid.

Tuning for Workloads

Record size per workload, sync vs async writes, prefetch tuning, TXG timeout, and sysctl settings.

Platforms

ZFS on Linux, FreeBSD, illumos, and macOS. Platform-specific quirks, installation methods, and compatibility notes.

Proxmox Tuning

ZFS-specific Proxmox optimizations: zvol vs dataset, ARC tuning for VM hosts, snapshot integration.

ZFS vs Everything Else

Detailed comparison: ZFS vs ext4, XFS, btrfs, mdraid+LVM. Feature matrix, performance characteristics, and honest recommendations.

Common Myths

Debunking: "ZFS needs tons of RAM," "ZFS is slow," "ZFS is only for FreeBSD," "dedup is great," and more.

Resources

Books, talks, blog posts, mailing lists, and community channels for ZFS.

This isn't a future roadmap. This is what you get today. Every kldload install has all of this out of the box. Per-dataset encryption. Instant clones. Snapshot rollback. Self-healing. No fsck. No partitions. ZFS on root across eight distributions. The only thing that changes is how you think about your data.

If you've read this far, you understand why I built kldload on ZFS. It's not because it's the newest or the trendiest. It's because after 20 years and billions of hours of production runtime, it remains the only storage system that actually guarantees your data comes back the way you wrote it. Everything else is hoping the disk didn't lie. ZFS is the one system that checks. Every block. Every read. Every time. That's the foundation everything else should be built on.
