
Proxmox Performance Tuning — stop blaming ZFS, start tuning it.

Proxmox ships with ZFS support out of the box, but the defaults are not optimized for VM workloads. People install Proxmox, create VMs on ZFS, get terrible performance, and blame ZFS. The problem isn't ZFS. The problem is that nobody tuned it.

The 8K amplification problem

Why Proxmox VMs feel slow on default ZFS

ZFS datasets default to a 128K recordsize, so a VM disk image stored as a file on a dataset gets 128K records. But VM disk I/O operates in 4K-8K blocks. When a VM writes 8K, ZFS has to:

  1. Read the full 128K record that contains the 8K block
  2. Decompress it (if compression is on)
  3. Modify the 8K portion
  4. Recompress the full 128K
  5. Write the new 128K record to a new location (CoW)

That's 16x write amplification for every VM I/O operation. Your VMs aren't slow because ZFS is slow. They're slow because ZFS is reading and writing 128K to change 8K.

Imagine rewriting an entire chapter of a book to fix one typo. That's what 128K recordsize does to 8K VM I/O.
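The arithmetic is simple enough to check yourself. A pure-shell sketch of the amplification factor for 8K guest writes (numbers only, no ZFS involved):

```shell
# Write amplification ~= record size / guest I/O size (illustration only)
guest_io_kb=8
for record_kb in 128 16 8; do
  amp=$(( record_kb / guest_io_kb ))
  echo "recordsize ${record_kb}K -> ${amp}x amplification for ${guest_io_kb}K writes"
done
# prints:
# recordsize 128K -> 16x amplification for 8K writes
# recordsize 16K -> 2x amplification for 8K writes
# recordsize 8K -> 1x amplification for 8K writes
```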

The fixes

Fix 1: Use zvols with correct volblocksize

# DON'T do this (VM disk as a file on a plain dataset: 128K records)
# zfs create rpool/data/vm-100-disk-0

# DO this — zvol with 16K block size
zfs create -V 40G -s \
    -o volblocksize=16K \
    -o compression=lz4 \
    rpool/data/vm-100-disk-0

# For database VMs, use 8K
zfs create -V 40G -s \
    -o volblocksize=8K \
    rpool/data/vm-100-disk-0

16K volblocksize = 2x amplification instead of 16x. That's an 8x improvement from changing one number.
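You don't have to create every zvol by hand. For zfspool storages, Proxmox creates the zvols itself, and their block size comes from the storage's blocksize option. A sketch of the relevant storage.cfg stanza (the storage name local-zfs and pool path are assumptions; match them to your setup):

```
# /etc/pve/storage.cfg: zvols Proxmox creates on this storage use this block size
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1
        blocksize 16k
```

The same should be settable from the CLI with pvesm set local-zfs --blocksize 16k. Note that volblocksize is fixed at creation time, so existing disks only pick it up when recreated or moved.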

Fix 2: Tune ARC for VM workloads

# Recent Proxmox installers cap ARC aggressively (around 10% of RAM, max 16GB)
# Set ARC to use up to 50% of RAM (e.g., 16GB on a 32GB host)
echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf

# Set minimum ARC (don't let the kernel starve ZFS)
echo "options zfs zfs_arc_min=4294967296" >> /etc/modprobe.d/zfs.conf

# Rebuild the initramfs so the settings survive a reboot (root on ZFS)
update-initramfs -u

# Apply immediately, without a reboot
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_min
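The magic numbers above are just bytes. A quick shell sketch for deriving them from total RAM (the 50% figure is this section's rule of thumb, not a universal constant):

```shell
# Compute zfs_arc_max as a fraction of RAM, in bytes
ram_gib=32
arc_max=$(( ram_gib * 1024 * 1024 * 1024 / 2 ))   # 50% of RAM
arc_min=$(( 4 * 1024 * 1024 * 1024 ))             # 4 GiB floor
echo "zfs_arc_max=${arc_max}"   # 17179869184 for a 32 GiB host
echo "zfs_arc_min=${arc_min}"   # 4294967296
```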

Fix 3: Add a SLOG for sync writes

# VMs use sync writes for data integrity
# Without SLOG, every sync write waits for spinning rust

# Add an enterprise NVMe as SLOG (MUST have power loss protection)
zpool add rpool log /dev/nvme1n1

# Better: mirror the SLOG so a single device failure can't lose in-flight syncs
# zpool add rpool log mirror /dev/nvme1n1 /dev/nvme2n1

# Verify
zpool status rpool | grep log

Without SLOG, every VM commit waits for the slowest disk. With SLOG, commits land on the NVMe in microseconds and flush to the pool in the background.
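The orders of magnitude involved are worth spelling out. The latency figures below are ballpark assumptions, not measurements:

```shell
# Ballpark sync-write commit latency: 7200rpm HDD vs enterprise NVMe with PLP
hdd_us=8000   # ~8 ms per commit on spinning rust (assumed)
slog_us=30    # ~30 microseconds on NVMe with power loss protection (assumed)
echo "per-commit speedup: ~$(( hdd_us / slog_us ))x"
```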

Fix 4: Add a special vdev for metadata

# Metadata operations (directory listings, file lookups) are slow on HDDs
# A mirrored SSD special vdev stores metadata on fast storage
# WARNING: losing the special vdev loses the pool, so always mirror it

zpool add rpool special mirror /dev/sda /dev/sdb

# Route small blocks to the SSDs too (blocks at or below this size)
zfs set special_small_blocks=64K rpool
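How big does the special vdev need to be? A common rule of thumb (an assumption here; verify against your own pool with zdb) is that metadata runs around 0.3-1% of pool data, more once small blocks are routed there:

```shell
# Rough special vdev sizing from pool capacity (rule-of-thumb percentages)
pool_gb=20000                       # 20 TB of data
low_gb=$(( pool_gb * 3 / 1000 ))    # 0.3%: metadata-only floor
high_gb=$(( pool_gb / 100 ))        # 1%: comfortable ceiling
echo "plan roughly ${low_gb}-${high_gb} GB, before special_small_blocks"
```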

Fix 5: Use mirrors, not RAIDZ, for VMs

This is the most common mistake on Proxmox. RAIDZ has terrible random write performance: a RAIDZ vdev delivers roughly the random IOPS of a single disk, no matter how many disks it contains. VMs generate random I/O. Mirrors scale: each mirror vdev serves requests independently, so IOPS grow with the number of pairs.

# BAD for VMs:
# zpool create rpool raidz2 /dev/sd{a,b,c,d,e,f}

# GOOD for VMs:
zpool create rpool \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd \
    mirror /dev/sde /dev/sdf
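Why the layout matters, in numbers. The per-disk IOPS figure is an assumed ballpark for 7200rpm drives:

```shell
# Random-write IOPS: one RAIDZ vdev performs like one disk;
# mirror vdevs add up because each pair serves I/O independently
disk_iops=150; disks=6
raidz_iops=$disk_iops
mirror_iops=$(( disk_iops * disks / 2 ))
echo "raidz2 over ${disks} disks: ~${raidz_iops} IOPS"
echo "3x mirrors:             ~${mirror_iops} IOPS"
```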

Quick reference: Proxmox ZFS tuning

Setting       | Default                         | Recommended                 | Why
------------- | ------------------------------- | --------------------------- | ---
volblocksize  | 8K                              | 16K (VMs) / 8K (DBs)        | Match guest I/O pattern, reduce amplification
recordsize    | 128K                            | Don't use datasets for VMs  | Use zvols instead
compression   | on (lz4)                        | lz4                         | Keep it — nearly free and saves I/O
zfs_arc_max   | 50% RAM (recent PVE caps lower) | 50-75% RAM                  | Let ARC cache VM hot blocks
sync          | standard                        | standard + SLOG             | Never disable sync — add SLOG instead
vdev layout   | varies                          | Mirrors                     | RAIDZ kills VM I/O performance
ashift        | 12                              | 12                          | Correct for 4K sector disks (all modern disks)
special vdev  | none                            | Mirrored SSDs               | Accelerates metadata for all VMs

Proxmox isn't bad. Untuned Proxmox is bad. The same ZFS that runs Netflix's CDN can run your Proxmox cluster — if you tune it for the workload. The defaults are conservative. Your workload isn't conservative. Tune accordingly.