Proxmox & kldload
Both run on bare metal. Both support KVM/QEMU. Both have ZFS. Both target homelabs and self-hosted infrastructure. But they're solving different problems — and they can work together.
The honest take: If you want to spin up VMs through a polished GUI with minimal expertise, Proxmox is mature and it works. Use it.
If you want a proper foundation — ZFS all the way down, kernel-level observability, direct GPU access, no vendor costs, fully auditable — kldload on bare metal with the kvm profile gives you a hypervisor with instant ZFS clones, automated snapshots, and DR replication that Proxmox can't match.
Or run both: Proxmox as the management layer, kldload VMs with disk passthrough for workloads that need real ZFS.
What they are
Proxmox VE
A hypervisor platform. Its entire identity is managing VMs and LXC containers through a web UI. The OS underneath is Debian, heavily modified and locked to Proxmox's tooling. ZFS is a storage option — not the foundation.
kldload
A base image factory and operating system. You pick your distro (8 options), root it in ZFS, and select a profile: desktop, server, kvm, storage, ai, or core. The kvm profile gives you a full hypervisor with ZFS zvols, instant clones, automated hourly snapshots, DR replication, and CLI tools (kvm-create, kvm-clone, kvm-snap, kvm-replicate). You own the entire stack.
The comparison
| | Proxmox VE | kldload (kvm profile) |
|---|---|---|
| ZFS | Storage option. Bolted on. Boot environments not native. | The substrate. Boot environments, zvols, snapshots before every change. |
| VM cloning | Linked clone (qcow2 backing file chain). | kvm-clone — ZFS zvol clone, instant, no chain, no dependency. |
| VM snapshots | qcow2 snapshot chain (degrades with depth). | kvm-snap — ZFS snapshot, atomic, unlimited, zero overhead. |
| VM replication | Proxmox replication (cluster required). | kvm-replicate — incremental zfs send to any host over WireGuard. |
| Cost | Free tier exists. Enterprise ~€110/yr/socket. Nag screen. | BSD-3-Clause. Zero. Forever. No nag screen, no phone home. |
| Distro | Debian only (modified). | 8 distros. CentOS, Debian, Ubuntu, Fedora, RHEL, Rocky, Arch, Alpine. |
| GPU | PCI passthrough. IOMMU pain, single-VM binding. | Direct access. Drivers on first boot. Container GPU sharing. |
| Observability | No native eBPF. Need external tools. | eBPF + bcc + bpftrace on first boot. Kernel-level. No SaaS. |
| WireGuard | Supported. Manual setup. | Kernel module. 4-plane mesh. Backplane architecture documented. |
| Offline | Needs internet for most operations. | Full package mirrors baked in. No internet required. |
| Web UI | Excellent. Mature. The best thing about Proxmox. | Installer UI. CLI is the centerpiece: kvm-create/clone/snap/replicate. |
zfs send to any host that speaks ZFS: it doesn't need to be running kldload, doesn't need a cluster, doesn't need any coordination. The VM data is just ZFS, so it goes wherever ZFS goes.
The recommendation
Three paths, pick one:
Path 1: kldload bare metal with the kvm profile. ZFS talks directly to disks. Best performance. Full kvm-create/clone/snap/replicate tools. This is what the KVM tutorial teaches.
Path 2: Proxmox host with PCI disk passthrough to kldload VMs. Guest ZFS talks to real hardware. No double-ZFS penalty.
Path 3: Proxmox host with kldload VMs on virtio-scsi (non-ZFS storage). No double-CoW. Guest gets ZFS features, host uses LVM-thin.
Why double-ZFS hurts
ARC cache thrashing — both pools cache the same blocks. RAM burned twice.
Write amplification — CoW twice for every write. Double I/O, double latency.
Unpredictable performance — two pools competing for disk I/O, flushing independently.
Confusing diagnostics — zpool iostat in the guest doesn't reflect reality.
What works
Option 1: kldload on bare metal with KVM. ZFS talks directly to the disks. Best performance.
Option 2: Proxmox host with PCI passthrough. Guest's ZFS talks to real hardware.
Option 3: Proxmox host with virtio-scsi on non-ZFS storage (LVM-thin, ext4). No double-CoW.
The double-ZFS problem
┌─────────────────────────────────────────────────┐
│ Proxmox host │
│ zpool: rpool (ZFS on physical NVMe) │
│ └── vm-100-disk-0.qcow2 │
│ ┌──────────────────────────────────┐ │
│ │ kldload VM │ │
│ │ zpool: rpool (ZFS on /dev/vda) │ │
│ │ ├── ROOT/default │ │
│ │ ├── home │ │
│ │ └── srv │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
Two ARC caches: Both ZFS layers maintain their own ARC (Adaptive Replacement Cache). The host caches the VM’s virtual disk blocks, and the guest caches its own filesystem blocks. This means the same data can sit in memory twice.
Double compression: If both layers use lz4, the guest compresses data, then the host tries to compress the already-compressed blocks (gaining nothing but burning CPU).
Double checksumming: Both layers verify block integrity. Redundant for data correctness, costs CPU.
Write amplification: The guest writes to its pool, which generates I/O to the virtual disk. The host’s pool then processes that I/O, potentially amplifying the number of physical writes due to CoW on both layers.
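One way to see the double caching in practice is to read the ARC size on both layers and compare the numbers. The sketch below assumes OpenZFS on Linux, which exposes ARC counters in /proc/spl/kstat/zfs/arcstats. The function names are illustrative and are not kldload or Proxmox tools.

```shell
#!/bin/sh
# Sketch: report the current ARC size of the ZFS instance on this machine.
# Assumes OpenZFS on Linux; an alternate stats file can be passed for testing.

# Print the ARC size in bytes (the "size" row of arcstats: name, type, data).
arc_size_bytes() {
    stats="${1:-/proc/spl/kstat/zfs/arcstats}"
    awk '$1 == "size" { print $3 }' "$stats"
}

# Print the ARC size in MiB (integer division).
arc_size_mib() {
    bytes=$(arc_size_bytes "$1")
    echo $(( bytes / 1048576 ))
}

# Usage: run once on the Proxmox host and once inside the kldload VM:
#   arc_size_mib
```

If the guest figure is a large fraction of the VM's RAM while the host ARC is also large, much of that memory is likely holding the same blocks twice.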
When double-ZFS is fine
- Development and testing — you want the real kldload ZFS experience in a disposable VM
- Small workloads — the overhead is negligible when VM I/O is low
- You need kldload’s ZFS features — boot environments, snapshots, ksnap, kbe, kupgrade — these only work with ZFS on root
When to avoid double-ZFS
- Production storage nodes — if the VM is an NFS/iSCSI server or runs databases with heavy I/O
- Memory-constrained hosts — two ARCs eating RAM is wasteful
- High-throughput workloads — write amplification hurts
Alternative: bare-metal kldload + KVM
For production, install kldload directly on the hardware (bare metal) and run VMs on top using KVM/libvirt. You get one ZFS layer with full performance:
┌──────────────────────────────────────────────────┐
│ kldload bare metal (kvm profile) │
│ zpool: rpool │
│ ├── ROOT/default (host OS) │
│ ├── vms/web-1 → zvol → /dev/zvol/rpool/vms/web-1 │
│ ├── vms/web-2 → zvol → /dev/zvol/rpool/vms/web-2 │
│ ├── vms/db-1 → zvol → /dev/zvol/rpool/vms/db-1 │
│ └── vms/isos → dataset (zstd compressed) │
│ │
│ kvm-create db-1 --ram 8192 --disk 200 │
│ kvm-clone db-1 db-test (instant, zero-copy) │
│ kvm-snap db-1 (atomic ZFS snapshot) │
│ kvm-replicate rpool/vms/db-1 dr-host │
└──────────────────────────────────────────────────┘
VMs use zvols — raw block devices on ZFS. No qcow2 layer. The host’s ZFS handles compression, snapshots, clones, and replication at the block level. VMs inside don’t need their own ZFS — the host provides all the ZFS benefits transparently.
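For orientation, these are the plain ZFS operations that zvol-backed cloning and replication rest on. This is a sketch under stated assumptions, not the implementation of kvm-clone or kvm-replicate: the snapshot naming, the run helper, the VM_PARENT variable, and the DRY_RUN switch are all hypothetical, the rpool/vms layout is taken from the diagram above, and the remote side is assumed to use the same dataset path.

```shell
#!/bin/sh
# Sketch only: raw ZFS equivalents of cloning and replicating a zvol-backed VM disk.
# Set DRY_RUN=1 to print the commands instead of executing them.

VM_PARENT="${VM_PARENT:-rpool/vms}"   # parent dataset, as drawn in the diagram

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone a disk: a ZFS clone always starts from a snapshot of the source.
zvol_clone() {
    src="$1"; dst="$2"
    snap="${VM_PARENT}/${src}@clone-${dst}"   # hypothetical snapshot name
    run zfs snapshot "$snap" &&
        run zfs clone "$snap" "${VM_PARENT}/${dst}"
}

# Replicate incrementally: send only the blocks changed between two snapshots.
# Assumes snapshot $2 already exists on both sides from an earlier full send.
zvol_replicate() {
    ds="${VM_PARENT}/$1"; prev="$2"; cur="$3"; host="$4"
    run zfs snapshot "${ds}@${cur}" || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "zfs send -i @${prev} ${ds}@${cur} | ssh ${host} zfs receive -F ${ds}"
    else
        zfs send -i "@${prev}" "${ds}@${cur}" | ssh "$host" zfs receive -F "$ds"
    fi
}
```

With DRY_RUN=1, `zvol_clone db-1 db-test` prints the two commands it would run. The clone shares blocks with its origin snapshot, which is why it completes immediately regardless of disk size.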
zfs send to any host, zfs rollback in seconds, zfs clone that’s instant regardless of disk size.
If you do run on Proxmox: tuning tips
Disable compression on one layer
# Option A: disable on the Proxmox side (let kldload handle it)
# On the Proxmox host:
zfs set compression=off rpool/data/vm-100-disk-0
# Option B: disable on the kldload guest side (let Proxmox handle it)
# Inside the kldload VM:
zfs set compression=off rpool
Option A is usually better — let the guest control its own compression settings per-dataset.
Cap the guest ARC
Inside the kldload VM, reduce the ARC so the host has more memory for its own ARC:
# Inside the kldload VM: limit ARC to 1 GiB (1073741824 bytes)
echo "options zfs zfs_arc_max=1073741824" > /etc/modprobe.d/zfs-arc.conf
# Rebuild the initramfs with whichever tool your distro uses, then reboot
dracut --force # CentOS/RHEL
update-initramfs -u # Debian
reboot
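If you would rather not hard-code the byte count, a small helper can derive it from a size in GiB. This sketch assumes OpenZFS on Linux, where zfs_arc_max is also writable at runtime under /sys/module/zfs/parameters. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: derive zfs_arc_max from a size in whole GiB.

# Convert whole GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Emit the modprobe options line for an ARC cap given in GiB.
arc_conf_line() {
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$1")"
}

# Apply the cap to the running system (needs root and a loaded zfs module).
arc_cap_now() {
    gib_to_bytes "$1" > /sys/module/zfs/parameters/zfs_arc_max
}

# Usage, inside the kldload VM:
#   arc_conf_line 1 > /etc/modprobe.d/zfs-arc.conf
#   arc_cap_now 1
```

The runtime write does not persist across reboots, and the ARC may take a while to shrink to the new limit, so the modprobe file and initramfs rebuild are still needed for the next boot.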
Use virtio-scsi with discard
In Proxmox VM settings:
- Bus: SCSI (virtio-scsi-single)
- Discard: On (enables TRIM passthrough)
- IO Thread: On
# Proxmox CLI
qm set 100 --scsi0 local-zfs:vm-100-disk-0,discard=on,iothread=1,ssd=1
Inside the kldload VM, enable autotrim:
zpool set autotrim=on rpool
This lets the guest’s ZFS tell the host “I’m not using these blocks anymore,” which the host’s ZFS can then free.
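Before relying on autotrim, it can help to confirm that the discard setting actually reached the guest. The sketch below assumes a Linux guest, where a block device that accepts discards reports a non-zero discard_max_bytes in sysfs. The function name and the sda device name are illustrative.

```shell
#!/bin/sh
# Sketch: check whether a guest block device advertises discard (TRIM/UNMAP).
# $1 = device name such as sda; $2 = sysfs root (override only for testing).
supports_discard() {
    dev="$1"
    sysroot="${2:-/sys}"
    f="${sysroot}/block/${dev}/queue/discard_max_bytes"
    # A missing file or a value of 0 both mean discards are not passed through.
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}

# Usage, inside the kldload VM:
#   supports_discard sda && zpool set autotrim=on rpool
```

If the check fails, autotrim in the guest has nothing to pass down, and the Discard option on the Proxmox side is the first thing to revisit.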
Allocate fixed RAM (no ballooning)
Ballooning and ZFS ARC don’t mix well — the ARC doesn’t shrink gracefully when the balloon inflates:
# Proxmox CLI — fixed 8GB, no balloon
qm set 100 --memory 8192 --balloon 0
Use raw disk format instead of qcow2
On Proxmox with ZFS storage, raw format avoids the qcow2 layer:
# Allocate a new 40 GB raw disk on ZFS storage for existing VM 100
qm set 100 --scsi0 local-zfs:40,format=raw
Proxmox’s ZFS storage backend uses zvols for raw disks — this means the guest’s ZFS writes go through a zvol, which is more efficient than going through a qcow2 file on top of a ZFS dataset.
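To find disks that still sit on qcow2 files, you can filter the VM configuration. This is a minimal sketch that assumes `qm config` prints one `bus: volume,options` line per disk and that qcow2-backed volumes carry a `.qcow2` file name. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `qm config <vmid>` output on stdin and print disk lines
# whose backing volume is a qcow2 file. Prints nothing if there are none.
qcow2_disks() {
    grep -E '^(scsi|virtio|sata|ide)[0-9]+: .*\.qcow2(,|$)'
}

# Usage, on the Proxmox host:
#   qm config 100 | qcow2_disks
```

An empty result suggests every attached disk is already a raw volume or a zvol.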
ZFS special devices (Proxmox host)
If your Proxmox host has NVMe SSDs, consider using ZFS special vdevs to accelerate metadata operations:
# On the Proxmox host — add a special vdev (metadata + small blocks)
zpool add rpool special mirror /dev/nvme0n1p4 /dev/nvme1n1p4
# Set small block threshold (blocks ≤64K go to the special vdev)
zfs set special_small_blocks=64K rpool
This accelerates ls, find, zfs list, and any metadata-heavy operation, which benefits VMs because their virtual disk metadata is served from fast storage.
Recommended Proxmox VM settings for kldload
# Create the VM
qm create 100 \
--name kldload-node \
--machine q35 \
--cpu host \
--cores 4 \
--memory 8192 \
--balloon 0 \
--bios ovmf \
--efidisk0 local-zfs:1,efitype=4m,pre-enrolled-keys=0 \
--tpmstate0 local-zfs:1,version=v2.0 \
--scsi0 local-zfs:40,discard=on,iothread=1,ssd=1 \
--scsihw virtio-scsi-single \
--ide2 local:iso/kldload-free-latest.iso,media=cdrom \
--net0 virtio,bridge=vmbr0 \
--serial0 socket \
--boot order="ide2;scsi0" \
--ostype l26
These settings match the kldload deploy.sh proxmox-deploy defaults: q35 machine, host CPU passthrough, OVMF UEFI, TPM 2.0, virtio-scsi with discard, serial console.
Migrating from Proxmox VM to bare metal
zfs send the pool to bare metal, reinstall the bootloader, reboot. Your data, your ZFS snapshots, your entire configuration: all preserved. You're not reinstalling, you're relocating.
If you outgrow the double-ZFS setup:
# Inside the Proxmox kldload VM
kexport raw
# Write the raw image to bare-metal disk
dd if=kldload-export-*.raw of=/dev/nvme0n1 bs=4M status=progress conv=sparse oflag=sync
sync
Or use zfs send/receive:
# Inside the VM: snapshot the pool recursively and stream it to the bare-metal host over SSH
zfs snapshot -r rpool@migrate
zfs send -R rpool@migrate | ssh bare-metal zfs receive -F rpool
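Once the receive finishes, it is worth confirming that every snapshot arrived before touching the bootloader. The sketch below compares two snapshot listings. The function name and the file names are hypothetical, and `bare-metal` is the same SSH host used in the command above.

```shell
#!/bin/sh
# Sketch: print snapshot names present in the first list but absent from the second.
# Each argument is a file with one snapshot name per line.
missing_snapshots() {
    src_sorted=$(mktemp)
    dst_sorted=$(mktemp)
    sort "$1" > "$src_sorted"
    sort "$2" > "$dst_sorted"
    # comm -23 keeps only the lines unique to the first (source) list.
    comm -23 "$src_sorted" "$dst_sorted"
    rm -f "$src_sorted" "$dst_sorted"
}

# Usage:
#   zfs list -H -t snapshot -o name -r rpool > vm.txt                  # inside the VM
#   ssh bare-metal zfs list -H -t snapshot -o name -r rpool > metal.txt
#   missing_snapshots vm.txt metal.txt    # empty output means nothing is missing
```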
Then reinstall the bootloader on the bare-metal disk:
# Boot from kldload ISO on bare metal
krecovery import rpool
krecovery reinstall-bootloader /dev/nvme0n1
reboot