Containers on ZFS — Docker, Podman & Firecracker
Containers on overlay2 are disposable by default. Containers on ZFS are disposable by choice.
Every layer is a dataset. Every volume is a dataset. Every dataset gets checksums, compression,
snapshots, clones, and send/recv. You can snapshot before docker pull,
rollback a bad image in seconds, clone a volume for testing at zero cost,
and replicate your entire container state to another host with syncoid.
kldloadOS ships with ZFS as the root filesystem. Docker, Podman, and Firecracker all sit on top of it. This page shows you how to configure each one, how to secure them, and how to use ZFS to do things that overlay2 cannot.
1. Docker on ZFS storage driver
By default, Docker uses overlay2. On kldloadOS, you switch it to the ZFS storage driver: each image layer becomes a ZFS dataset, and each container's writable layer becomes a ZFS clone of its image layer. Docker manages the datasets automatically; you just tell it to use ZFS.
Configure Docker to use ZFS
# Create a dedicated dataset for Docker
zfs create -o mountpoint=/var/lib/docker rpool/docker
zfs create rpool/docker/volumes
# Configure the storage driver
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "zfs",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Hard": 65536, "Soft": 65536 }
  }
}
EOF
# Restart Docker
systemctl restart docker
# Verify
docker info | grep -A5 'Storage Driver'
# Storage Driver: zfs
# Zpool: rpool
# Zpool Health: ONLINE
# Parent Dataset: rpool/docker
# Check ZFS datasets created by Docker
zfs list -r rpool/docker | head -20
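One ordering gotcha: if dockerd starts before ZFS has mounted rpool/docker, it writes into an empty /var/lib/docker on the root dataset. A systemd drop-in that orders Docker after zfs-mount.service avoids this; a sketch:

```ini
# /etc/systemd/system/docker.service.d/wait-for-zfs.conf
# Make dockerd wait until ZFS datasets are mounted
[Unit]
Requires=zfs-mount.service
After=zfs-mount.service
```

Apply it with `systemctl daemon-reload && systemctl restart docker`.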
What ZFS gives Docker
With the ZFS storage driver, every docker pull creates ZFS datasets for each image layer.
Every docker run creates a ZFS clone for the container's writable layer.
This means:
# Snapshot before pulling a new image (-r: image layers are child datasets)
zfs snapshot -r rpool/docker@before-pull-$(date +%F)
docker pull nginx:latest
# Something wrong with the new image? Stop Docker, roll back every dataset.
# (Datasets created by the pull itself have no snapshot; zfs destroy those.)
systemctl stop docker
zfs list -H -o name -r rpool/docker | \
    xargs -I{} zfs rollback -r {}@before-pull-2026-03-23
systemctl start docker
# See compression savings on container data
zfs get compressratio rpool/docker
# rpool/docker compressratio 2.83x -
# That's 2.83x compression on all image layers. Free disk space.
# Clone a volume for testing (instant, zero disk cost)
zfs snapshot rpool/docker/volumes/myapp-data@test
zfs clone rpool/docker/volumes/myapp-data@test rpool/docker/volumes/myapp-data-test
# Replicate Docker state to another host
syncoid -r rpool/docker root@node2:rpool/docker
2. Podman — rootless, daemonless
Podman runs containers without a daemon and without root. It's CLI-compatible with Docker
(alias docker=podman and most scripts work). On kldloadOS,
Podman uses ZFS for storage just like Docker does.
Podman on ZFS
# Install Podman (already included in kldloadOS desktop/server profiles)
dnf install -y podman
# Rootless setup — runs as your user, no daemon, no root
podman info | grep -A3 graphDriver
# graphDriverName: zfs
# Run a container (same syntax as Docker)
podman run -d --name web -p 8080:80 nginx:alpine
# Podman Compose (drop-in for docker compose)
dnf install -y podman-compose
podman-compose up -d
# Generate systemd unit from a running container
podman generate systemd --name web --files --new
systemctl --user enable --now container-web.service
# Rootless means: no daemon to crash, no root to exploit,
# and the container runs as your UID with user namespaces
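Note that podman generate systemd is deprecated upstream in favor of Quadlet (Podman 4.4+), where a unit-like .container file generates the service. A minimal sketch for the same nginx container; the file name web.container produces web.service:

```ini
# ~/.config/containers/systemd/web.container
[Unit]
Description=nginx web container

[Container]
Image=docker.io/library/nginx:alpine
PublishPort=8080:80

[Install]
WantedBy=default.target
```

Reload with `systemctl --user daemon-reload`, then `systemctl --user start web.service`.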
When to use Podman vs Docker
Use Podman when you want rootless containers, systemd integration, no long-running daemon, or you're running on a machine where Docker's daemon model is a liability (single-user servers, CI runners, embedded systems).
Use Docker when you need Docker Compose with full feature parity, Docker Swarm, or your team's tooling depends on the Docker API socket.
Use both — they coexist on kldloadOS. Same images, same registries, same OCI format.
3. Private registry on ZFS
A private registry stores your images locally. On ZFS, the registry data is compressed, checksummed, snapshotable, and replicable. No cloud registry fees, no egress charges, no dependency on someone else's infrastructure.
Set up a local registry
# Create a ZFS dataset for the registry
zfs create -o compression=zstd rpool/srv/registry
# Run the registry
docker run -d --restart=always --name registry \
    -p 5000:5000 \
    -v /srv/registry/data:/var/lib/registry \
    registry:2
# Tag and push an image
docker tag myapp:latest localhost:5000/myapp:latest
docker push localhost:5000/myapp:latest
# Pull from the registry (from any host on the network)
docker pull 192.168.1.10:5000/myapp:latest
# Snapshot the registry before changes
ksnap /srv/registry
# Replicate the registry to another host
syncoid rpool/srv/registry root@node2:rpool/srv/registry
# Check compression savings
zfs get compressratio rpool/srv/registry
# Container images compress well — expect 2-4x with zstd
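One operational detail: registry:2 refuses blob deletes by default, so registry storage only grows. A config sketch that enables deletion; mount it over /etc/docker/registry/config.yml in the container:

```yaml
# /srv/registry/config.yml: enable blob deletion so garbage collection works
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
  delete:
    enabled: true
http:
  addr: :5000
```

With deletion enabled, reclaim space via `docker exec registry registry garbage-collect /etc/docker/registry/config.yml` (snapshot the dataset first).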
TLS and authentication
# Generate a self-signed cert (or use Let's Encrypt)
mkdir -p /srv/registry/certs /srv/registry/auth
openssl req -x509 -nodes -days 3650 -newkey rsa:4096 \
    -keyout /srv/registry/certs/registry.key \
    -out /srv/registry/certs/registry.crt \
    -subj "/CN=registry.local"
# Create htpasswd auth
dnf install -y httpd-tools
htpasswd -Bc /srv/registry/auth/htpasswd admin
# Run registry with TLS + auth
docker run -d --restart=always --name registry \
    -p 5000:5000 \
    -v /srv/registry/data:/var/lib/registry \
    -v /srv/registry/certs:/certs \
    -v /srv/registry/auth:/auth \
    -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/registry.crt \
    -e REGISTRY_HTTP_TLS_KEY=/certs/registry.key \
    -e REGISTRY_AUTH=htpasswd \
    -e REGISTRY_AUTH_HTPASSWD_REALM="kldload Registry" \
    -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
    registry:2
# Trust the self-signed cert on each client, then log in
mkdir -p "/etc/docker/certs.d/registry.local:5000"
cp /srv/registry/certs/registry.crt "/etc/docker/certs.d/registry.local:5000/ca.crt"
docker login registry.local:5000
4. Compose patterns for common stacks
Real Compose files for real stacks. All volumes on ZFS. All images pinned. All health checks defined. Copy, adjust, deploy.
Web + Database + Cache
# docker-compose.yml — typical web application stack
services:
  web:
    image: nginx:1.27-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      app:
        condition: service_healthy
    restart: unless-stopped

  app:
    image: node:22-alpine
    working_dir: /app
    volumes:
      - ./app:/app
    command: node server.js
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
    environment:
      - DATABASE_URL=postgresql://app:secret@db:5432/myapp
      - REDIS_URL=redis://cache:6379
    restart: unless-stopped

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped

volumes:
  pgdata:
    driver: local
# Create a ZFS dataset for the postgres volume with tuned recordsize
# (Compose prefixes the volume name with the project name, e.g. myapp_pgdata)
zfs create -o recordsize=16k rpool/docker/volumes/myapp_pgdata
# Snapshot before deploying
ksnap /var/lib/docker/volumes/myapp_pgdata
# Deploy
docker compose up -d
5. Firecracker microVMs vs Docker containers
Docker containers share the host kernel. Firecracker microVMs each get their own kernel in a lightweight VM that boots in under 125 milliseconds. Different isolation levels, different use cases, same ZFS storage underneath.
Docker containers
Isolation: cgroups + namespaces (process-level)
Boot time: milliseconds
Overhead: near zero (shares host kernel)
Use for: web apps, databases, caches, microservices, CI/CD
Risk: kernel exploit in container = host compromise
Firecracker microVMs
Isolation: hardware virtualization (VM-level)
Boot time: <125ms
Overhead: ~5MB per microVM
Use for: untrusted code, CI runners, serverless functions, sandboxing
Risk: VM escape required for host compromise (much harder)
Firecracker on kldloadOS
# Download Firecracker (pin a release; the binary inside the tarball is versioned)
ARCH=$(uname -m)
VER=v1.9.1
curl -L "https://github.com/firecracker-microvm/firecracker/releases/download/${VER}/firecracker-${VER}-${ARCH}.tgz" \
    | tar xz
mv "release-${VER}-${ARCH}/firecracker-${VER}-${ARCH}" /usr/local/bin/firecracker
# Prepare rootfs on ZFS
zfs create rpool/srv/firecracker
# (copy or build your rootfs.ext4 and vmlinux kernel here)
# Snapshot the clean rootfs — rollback after each run
zfs snapshot rpool/srv/firecracker@clean
# Launch a microVM
firecracker --api-sock /tmp/fc.sock --config-file config.json
# After the workload completes, rollback to pristine state
zfs rollback rpool/srv/firecracker@clean
# Next microVM starts with a perfectly clean filesystem. Every time.
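The config.json passed to --config-file follows Firecracker's VM configuration schema. A minimal sketch wired to the dataset above (the vmlinux and rootfs.ext4 paths are assumptions from this page's layout):

```json
{
  "boot-source": {
    "kernel_image_path": "/srv/firecracker/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/srv/firecracker/rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 256
  }
}
```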
6. Resource limits — cgroups v2
kldloadOS uses cgroups v2 (unified hierarchy). Every container gets explicit resource limits. No container should be able to starve the host or other containers of CPU, memory, or I/O.
Resource limits in practice
# Memory limit: container is OOM-killed if it exceeds 512MB
docker run -d --memory=512m --memory-swap=512m myapp
# CPU limit: container gets at most 1.5 CPU cores
docker run -d --cpus=1.5 myapp
# CPU shares: relative weight (default 1024; on cgroups v2 Docker
# translates this to cpu.weight)
docker run -d --cpu-shares=512 myapp    # half priority
docker run -d --cpu-shares=2048 myapp   # double priority
# I/O limit: cap write throughput to 50MB/s
docker run -d --device-write-bps /dev/zd0:50mb myapp
# PID limit: prevent fork bombs
docker run -d --pids-limit=100 myapp
# In Compose:
services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 256M
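To confirm the flags actually landed, read the cgroups v2 controller files directly. A sketch, assuming the systemd cgroup driver and a hypothetical container named myapp:

```shell
# Resolve the container ID, then read its cgroup limit files
CID=$(docker inspect -f '{{.Id}}' myapp)
cat "/sys/fs/cgroup/system.slice/docker-${CID}.scope/memory.max"  # bytes, or "max"
cat "/sys/fs/cgroup/system.slice/docker-${CID}.scope/cpu.max"     # "<quota> <period>"
# --cpus=2.0 means quota = 2.0 * period, e.g. "200000 100000"
```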
7. Networking
Containers need to talk to each other, to the host, and to the outside world. Docker provides bridge, host, macvlan, and overlay networks. Pick the right one.
Bridge (default)
Containers get a private IP on a virtual bridge. Port mapping (-p 80:80)
exposes services to the host network. Containers on the same bridge resolve each other by name.
Good for most workloads.
docker network create mynet
docker run -d --network mynet --name web nginx
docker run -d --network mynet --name app myapp
# app can reach web via: http://web:80
Macvlan (LAN IP)
Each container gets its own IP on the physical LAN. No port mapping, no NAT. The container appears as a separate host to the rest of the network. Good for services that need to be discoverable on the LAN (NFS, DNS, DHCP).
docker network create -d macvlan \
    --subnet=192.168.1.0/24 \
    --gateway=192.168.1.1 \
    -o parent=eth0 lannet

docker run -d --network lannet \
    --ip 192.168.1.50 --name dns \
    pihole/pihole
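One macvlan caveat: the host itself cannot reach macvlan containers, because the kernel does not hairpin traffic between the parent interface and its macvlan children. A common workaround is a macvlan shim interface on the host; a sketch, assuming 192.168.1.250 is a free address reserved for the host:

```shell
# Create a host-side macvlan shim so the host can talk to lannet containers
ip link add mac0 link eth0 type macvlan mode bridge
ip addr add 192.168.1.250/32 dev mac0
ip link set mac0 up
ip route add 192.168.1.50/32 dev mac0   # route to the pihole container
```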
WireGuard overlay
Connect containers across hosts using WireGuard tunnels. Each host runs WireGuard, containers route through it. Encrypted, fast, and works across the internet. Good for multi-site container clusters without Kubernetes.
# On each host, set up WireGuard
# (see WireGuard Masterclass page)
# Then route container traffic through wg0
docker network create \
--subnet=10.10.0.0/16 \
-o com.docker.network.bridge.name=wg-br \
wg-containers
8. Container security
A container is only as secure as its configuration. Default Docker settings are more permissive than they should be. Lock them down.
Security hardening checklist
# 1. Rootless containers (Podman does this by default)
podman run --user 1000:1000 myapp
# 2. Read-only root filesystem
docker run --read-only --tmpfs /tmp --tmpfs /run myapp
# 3. Drop ALL capabilities, add only what's needed
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
# 4. No new privileges (prevent setuid binaries)
docker run --security-opt=no-new-privileges myapp
# 5. Seccomp profile (restrict syscalls)
docker run --security-opt seccomp=strict-profile.json myapp
# 6. AppArmor profile
docker run --security-opt apparmor=docker-custom myapp
# 7. Resource limits (prevent resource exhaustion)
docker run --memory=512m --cpus=1.0 --pids-limit=100 myapp
# 8. Non-root user in Dockerfile
# FROM alpine:3.19
# RUN adduser -D -u 1000 appuser
# USER appuser
# CMD ["/app/server"]
# 9. Scan images for vulnerabilities
trivy image myapp:latest
# 10. Never use --privileged unless you have a specific,
# documented reason. --privileged gives the container
# full access to the host. It defeats the purpose of
# containerization.
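The same hardening translates to Compose. A sketch; the keys map one-to-one onto the docker run flags above:

```yaml
services:
  app:
    image: myapp:latest
    user: "1000:1000"         # non-root UID:GID
    read_only: true
    tmpfs:
      - /tmp
      - /run
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    security_opt:
      - no-new-privileges:true
    pids_limit: 100
```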
9. ZFS advantages for containers
Here is what you can do with ZFS under your containers that you cannot do with overlay2, ext4, or XFS.
Snapshot before docker pull
A bad image can break your stack. Snapshot the entire Docker dataset before pulling.
If the new image causes problems, zfs rollback restores every layer,
every volume, every container to the exact state before the pull.
Clone volumes for testing
Need to test a database migration? zfs clone the production volume.
The clone is instant, shares blocks with the original, and costs zero disk space
until the test writes diverge. Delete it when done.
Compression on all layers
ZFS compresses every image layer and every volume with zstd. Container images are highly compressible (text files, binaries, libraries). Expect 2-4x compression. That is 2-4x more images on the same disk.
Checksums on all data
ZFS checksums every block of every container layer and volume. Silent data corruption (bit rot) is detected and auto-repaired from redundancy. overlay2 on ext4 does not checksum anything. A corrupt image layer is served silently.
Send/recv containers
Replicate your entire Docker state to another host with
syncoid -r rpool/docker root@node2:rpool/docker.
All images, all volumes, all layers. Incremental. Efficient. No Docker registry needed.
Rollback bad images
Pulled nginx:latest and it broke your TLS config? Stop the container,
zfs rollback to before the pull, start the container.
You are back to the exact previous image. Seconds, not a rebuild.
Containers are not magic. They are Linux namespaces, cgroups, and a layered filesystem. The filesystem matters. overlay2 gives you layers. ZFS gives you layers plus checksums, compression, snapshots, clones, and replication. Same containers, different foundation. The containers don't know the difference. You will.
Run Docker for the ecosystem. Run Podman for rootless. Run Firecracker for isolation. Run all three on ZFS and stop worrying about silent corruption, irreversible upgrades, and volumes you cannot replicate.