Serverless & MicroVMs — Firecracker on ZFS
Firecracker is the virtual machine monitor that powers AWS Lambda and Fargate. It is open source, it runs on KVM, and it boots a microVM in under 125 milliseconds. Each microVM is a real virtual machine with its own Linux kernel — not a container, not a namespace trick, not a shared-kernel sandbox. Hardware-level isolation at container speed.
The kldload KVM profile installs Firecracker, jailer, and firectl from the darksite automatically. This page shows you how to use them.
Without ZFS, every Firecracker microVM needs its own copy of the rootfs image — 100 VMs means 100 copies on disk. With ZFS clones, you snapshot the golden rootfs once and zfs clone it for each microVM. 100 VMs in 15 seconds. Zero extra disk until they diverge. This is Lambda on your hardware without the Lambda bill. When the VMs are destroyed, the clones vanish instantly — no garbage collection, no orphaned images, no disk space slowly leaking away.
The economics of ZFS clones + Firecracker:
AWS Lambda charges per millisecond of compute. Lambda runs on Firecracker. Firecracker boots a VM in 125ms. The expensive part isn't the compute — it's the storage. Every Lambda invocation needs a rootfs. AWS solves this with proprietary block storage. You solve it with zfs clone. Clone is instant. Clone is free. Clone shares all blocks with the original. 1,000 microVMs running the same base image? One copy on disk. The clones only use space for what each VM writes uniquely.
What this unlocks for your services:
Per-request isolation. An API request comes in that needs to run untrusted code? Spin up a Firecracker microVM from a ZFS clone. Execute in isolation. Return the result. Destroy the VM. The clone used zero disk. The VM is gone. No container escape risk because it was a real VM with its own kernel. Total time: ~200ms including boot, execute, destroy.
Multi-tenant compute. Each customer gets their own microVM fleet, cloned from a golden image. Customer A's VMs can't see Customer B's VMs — hardware isolation, not namespace isolation. Each customer's data lives on a separate ZFS dataset with its own encryption key. Destroy the clones when the customer's session ends. Zero residue. The next customer gets fresh clones from the same golden image.
Batch processing. You have 10,000 images to process. Clone the golden worker image 100 times. Each clone processes 100 images. Workers finish, clones are destroyed. Total extra disk during processing: whatever the workers wrote (results). The base images were shared. The clones were free. Scale horizontally at the storage layer, not the compute layer.
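The batch split described above can be sketched as a small helper that hands each worker clone a contiguous range of items. This is only an illustration: the function name shard_range and the 1-based numbering are assumptions, not part of kldload.

```shell
# shard_range TOTAL WORKERS INDEX
# Prints "START END" (1-based, inclusive) for worker INDEX (0-based).
# Any remainder is spread over the first workers, one extra item each.
shard_range() (
  total="$1"; workers="$2"; idx="$3"
  base=$((total / workers)); extra=$((total % workers))
  if [ "$idx" -lt "$extra" ]; then
    start=$((idx * (base + 1) + 1)); size=$((base + 1))
  else
    start=$((extra * (base + 1) + (idx - extra) * base + 1)); size=$base
  fi
  echo "$start $((start + size - 1))"
)

shard_range 10000 100 0    # first worker
shard_range 10000 100 99   # last worker
```

Each worker would then receive its range as function arguments and process only those items.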
1. What Firecracker is
MicroVMs, not containers
A container shares the host kernel. A microVM boots its own kernel. Firecracker strips the virtual machine down to the absolute minimum — no BIOS, no PCI bus, no USB, no GPU passthrough. Just a kernel, a rootfs, a network interface, and a block device. That is why it boots in under 125ms and uses about 5MB of memory overhead per VM.
KVM underneath
Firecracker is a KVM virtual machine monitor written in Rust. It talks to /dev/kvm
directly. No QEMU, no libvirt, no management layer. The KVM profile in kldload enables
kvm_intel / kvm_amd and sets up the necessary device permissions. If your
hardware supports VT-x or AMD-V, Firecracker works.
The jailer
Firecracker ships with jailer, a wrapper that puts each microVM inside a
chroot with its own cgroup, unprivileged UID, and namespaces; Firecracker then applies a seccomp filter to itself. Even if an attacker
escapes the guest and compromises the Firecracker process, they land inside a chroot jail with no access to the host filesystem, no
outbound network unless you explicitly route it, and no syscalls beyond a minimal allowlist.
2. Why ZFS makes it better
Every microVM needs a rootfs — a filesystem image it boots from. Without ZFS, you are copying multi-megabyte images for every VM. With ZFS, you are cloning them in milliseconds for zero space.
Snapshot before execution
Snapshot the rootfs dataset before the microVM boots. If the function mutates the filesystem,
roll back to the snapshot after execution. Every invocation starts from a clean state. No image rebuilds,
no re-downloads. One zfs rollback and you are back to byte-identical pristine.
Clone for parallel execution
zfs clone creates a writable copy of a snapshot in under a second regardless of the
dataset size. Need 50 VMs? Clone 50 datasets. They share every block with the original until a write
diverges. Disk usage is proportional to what each VM changes, not what it contains.
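As a rough sketch of the fan-out, the helper below prints the zfs clone commands for N VMs without running them, so you can review the plan first. The name plan_clones and the vm-NNN dataset naming are hypothetical; the snapshot name matches the golden@base snapshot used later on this page.

```shell
# Dry run: print the zfs commands that would fan one snapshot out into
# N clones. Nothing is executed; pipe the output to sh to apply it.
plan_clones() (
  snapshot="$1"; parent="$2"; count="$3"
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'zfs clone %s %s/vm-%03d\n' "$snapshot" "$parent" "$i"
    i=$((i + 1))
  done
)

plan_clones rpool/firecracker/golden@base rpool/firecracker 3
```

Printing the commands rather than executing them keeps the sketch safe to run on a host without the pool.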
Compress idle images
ZFS compression=zstd on the rootfs dataset means idle images take a fraction of their
raw size on disk. A 200MB Alpine rootfs compresses to about 70MB. The runtime cost is small:
zstd decompresses quickly, and the ARC keeps hot blocks in RAM (stored compressed by default in current OpenZFS), so repeated reads rarely touch disk.
Replicate golden images
zfs send / zfs recv replicates a rootfs snapshot to another node
in one command. Build the golden image once, ship it to every host. Incremental sends mean updates
transfer only the changed blocks.
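A minimal sketch of the full and incremental cases, again as a dry run that only prints the pipeline. The helper name plan_send, the host name node2, and the snapshot names v1 and v2 are placeholders, and it assumes SSH access to the receiving node.

```shell
# plan_send HOST DATASET NEW_SNAP [OLD_SNAP]
# With one snapshot it prints a full send; with two, an incremental send
# that carries only the blocks changed between OLD_SNAP and NEW_SNAP.
plan_send() (
  host="$1"; dataset="$2"; new="$3"; old="${4:-}"
  if [ -n "$old" ]; then
    printf 'zfs send -i %s@%s %s@%s | ssh %s zfs recv %s\n' \
      "$dataset" "$old" "$dataset" "$new" "$host" "$dataset"
  else
    printf 'zfs send %s@%s | ssh %s zfs recv %s\n' \
      "$dataset" "$new" "$host" "$dataset"
  fi
)

plan_send node2 rpool/firecracker/golden v1      # first replication
plan_send node2 rpool/firecracker/golden v2 v1   # later update
```

If the receiving dataset has been modified since the last receive, the incremental stream will be rejected unless the target is rolled back first.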
3. Setup
The KVM profile handles the heavy lifting. kldload-firstboot calls setup_kvm(),
which installs firecracker, jailer, and firectl from the darksite
binaries in /root/darksite/binaries/. It also creates
/var/lib/firecracker/rootfs/ and /run/firecracker/.
Verify the install
# All three should return version info
firecracker --version
jailer --version
firectl --version
# Confirm KVM is available
ls -l /dev/kvm
# crw-rw---- 1 root kvm 10, 232 ... /dev/kvm
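If you want the same checks in a script, something like the following preflight could work. The helper names missing_tools and kvm_usable are made up for this sketch; only the three binary names and /dev/kvm come from the steps above.

```shell
# missing_tools prints every named command that is not on PATH.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# kvm_usable succeeds when the current user can open the KVM device
# read-write. The device path defaults to /dev/kvm.
kvm_usable() { [ -r "${1:-/dev/kvm}" ] && [ -w "${1:-/dev/kvm}" ]; }

missing="$(missing_tools firecracker jailer firectl)"
[ -z "$missing" ] || echo "missing: $missing"
kvm_usable || echo "/dev/kvm is not accessible to $(id -un)"
```

A silent run means all three binaries are on PATH and the current user can use KVM.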
Create the rootfs dataset
# ZFS dataset for all microVM rootfs images
zfs create -o compression=zstd -o recordsize=64k \
-o mountpoint=/var/lib/firecracker/rootfs rpool/firecracker
# Create the golden Alpine rootfs
mkdir -p /tmp/alpine-rootfs
cd /tmp/alpine-rootfs
# Bootstrap a minimal Alpine root
curl -fsSL https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-minirootfs-3.20.0-x86_64.tar.gz \
| tar xzf -
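# Add OpenRC init and a serial getty (apk needs DNS inside the chroot)
cp /etc/resolv.conf /tmp/alpine-rootfs/etc/resolv.conf
chroot /tmp/alpine-rootfs /bin/sh -c '
apk add --no-cache openrc util-linux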
ln -s agetty /etc/init.d/agetty.ttyS0
echo ttyS0 > /etc/securetty
rc-update add agetty.ttyS0 default
rc-update add devfs boot
rc-update add procfs boot
rc-update add sysfs boot
echo "root:firecracker" | chpasswd
'
# Create an ext4 rootfs image from the chroot
truncate -s 200M /var/lib/firecracker/rootfs/alpine.ext4
mkfs.ext4 /var/lib/firecracker/rootfs/alpine.ext4
mount /var/lib/firecracker/rootfs/alpine.ext4 /mnt
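cp -a /tmp/alpine-rootfs/. /mnt/
umount /mnt
# Keep /tmp/alpine-rootfs for now: the golden dataset in section 4 is populated from it.
# Remove it with rm -rf once the golden snapshot exists.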
Get a kernel
Firecracker needs an uncompressed Linux kernel (vmlinux). You can extract one from the
host or download a prebuilt one.
# Option A: extract vmlinux from the host kernel
/usr/src/kernels/$(uname -r)/scripts/extract-vmlinux \
/boot/vmlinuz-$(uname -r) > /var/lib/firecracker/vmlinux
# Option B: use the Firecracker CI kernel (known-good, minimal)
curl -fsSL https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/kernels/x86_64/vmlinux-5.10.217 \
-o /var/lib/firecracker/vmlinux
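Whichever option you use, it can be worth checking that the result really is an uncompressed ELF image before booting it. The check below looks only at the four ELF magic bytes; the helper name is_elf is an assumption of this sketch, and passing the check does not prove the kernel has the drivers a microVM needs.

```shell
# is_elf succeeds when the file starts with the ELF magic bytes 7f 45 4c 46.
# A compressed vmlinuz does not start with these bytes.
is_elf() {
  [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# Example use:
#   is_elf /var/lib/firecracker/vmlinux || echo "not an uncompressed ELF kernel"
```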
Set up a tap device for networking
# Create a tap device for the microVM
ip tuntap add dev tap0 mode tap
ip addr add 172.16.0.1/24 dev tap0
ip link set tap0 up
# Enable NAT so the microVM can reach the outside
# (br0 is the host's outbound bridge; substitute your uplink interface)
iptables -t nat -A POSTROUTING -o br0 -j MASQUERADE
iptables -A FORWARD -i tap0 -o br0 -j ACCEPT
iptables -A FORWARD -i br0 -o tap0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
echo 1 > /proc/sys/net/ipv4/ip_forward
Launch a microVM
# Launch with firectl — the fastest way to get a VM running
firectl \
--kernel=/var/lib/firecracker/vmlinux \
--root-drive=/var/lib/firecracker/rootfs/alpine.ext4 \
--tap-device=tap0/aa:fc:00:00:00:01 \
--kernel-opts="console=ttyS0 reboot=k panic=1 pci=off ip=172.16.0.2::172.16.0.1:255.255.255.0::eth0:off"
# The VM boots in under 125ms. You get a serial console.
# Login: root / firecracker
4. Lambda-style function runner
This follows the model Lambda is built around: create a microVM, run a function, capture the output, destroy the VM. Each invocation is completely isolated. Every invocation gets a fresh clone of the golden snapshot, so each execution starts from identical state, and the clone is destroyed when the function finishes.
The function runner script
#!/bin/bash
# /usr/local/bin/kfc-run — kldload Firecracker function runner
# Usage: kfc-run <function-name> [args...]
set -euo pipefail
FUNC_NAME="${1:?Usage: kfc-run <function-name> [args...]}"
shift
FUNC_ARGS="$*"
FC_ROOT="/var/lib/firecracker"
ROOTFS_BASE="${FC_ROOT}/rootfs"
KERNEL="${FC_ROOT}/vmlinux"
FUNC_DIR="/srv/functions/${FUNC_NAME}"
VM_ID="fc-$(date +%s%N | sha256sum | head -c 8)"
CLONE_DS="rpool/firecracker/${VM_ID}"
# ── Validate ──────────────────────────────────────────────────────────────
[[ -d "${FUNC_DIR}" ]] || { echo "ERROR: function dir not found: ${FUNC_DIR}"; exit 1; }
[[ -f "${FUNC_DIR}/handler.sh" ]] || { echo "ERROR: no handler.sh in ${FUNC_DIR}"; exit 1; }
# ── Clone the golden rootfs snapshot ──────────────────────────────────────
zfs clone rpool/firecracker/golden@base "${CLONE_DS}"
CLONE_MOUNT=$(zfs get -H -o value mountpoint "${CLONE_DS}")
# Copy the function into the clone
mkdir -p "${CLONE_MOUNT}/srv/function"
cp -a "${FUNC_DIR}/." "${CLONE_MOUNT}/srv/function/"
# Write the invocation script
cat > "${CLONE_MOUNT}/srv/function/invoke.sh" <<INVOKE
#!/bin/sh
cd /srv/function
exec /srv/function/handler.sh ${FUNC_ARGS} 2>&1
INVOKE
chmod +x "${CLONE_MOUNT}/srv/function/invoke.sh"
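# Create the ext4 image from the clone
# (use a per-VM mount point so parallel runs do not collide on /mnt)
ROOTFS_IMG="/tmp/${VM_ID}.ext4"
MNT="$(mktemp -d "/tmp/${VM_ID}.mnt.XXXXXX")"
truncate -s 200M "${ROOTFS_IMG}"
mkfs.ext4 -q "${ROOTFS_IMG}"
mount "${ROOTFS_IMG}" "${MNT}"
cp -a "${CLONE_MOUNT}/." "${MNT}/"
umount "${MNT}"
rmdir "${MNT}"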
# ── Create tap device ─────────────────────────────────────────────────────
TAP="tap-${VM_ID:0:8}"
ip tuntap add dev "${TAP}" mode tap
ip addr add 172.16.0.1/30 dev "${TAP}"
ip link set "${TAP}" up
# ── Launch the microVM ────────────────────────────────────────────────────
SOCKET="/tmp/${VM_ID}.sock"
# Start firecracker in the background
firecracker --api-sock "${SOCKET}" &
FC_PID=$!
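# Wait up to 5 seconds for the API socket instead of assuming a fixed startup time
for _ in $(seq 1 50); do
[[ -S "${SOCKET}" ]] && break
sleep 0.1
done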
# Configure the VM via the API
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/boot-source" \
-H "Content-Type: application/json" \
-d "{
\"kernel_image_path\": \"${KERNEL}\",
\"boot_args\": \"console=ttyS0 reboot=k panic=1 pci=off init=/srv/function/invoke.sh ip=172.16.0.2::172.16.0.1:255.255.255.0::eth0:off\"
}"
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/drives/rootfs" \
-H "Content-Type: application/json" \
-d "{
\"drive_id\": \"rootfs\",
\"path_on_host\": \"${ROOTFS_IMG}\",
\"is_root_device\": true,
\"is_read_only\": false
}"
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/network-interfaces/eth0" \
-H "Content-Type: application/json" \
-d "{
\"iface_id\": \"eth0\",
\"guest_mac\": \"aa:fc:00:00:00:01\",
\"host_dev_name\": \"${TAP}\"
}"
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/machine-config" \
-H "Content-Type: application/json" \
-d "{\"vcpu_count\": 1, \"mem_size_mib\": 128}"
# Start the VM
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/actions" \
-H "Content-Type: application/json" \
-d "{\"action_type\": \"InstanceStart\"}"
# ── Wait for the function to finish (max 30 seconds) ─────────────────────
TIMEOUT=30
while kill -0 "${FC_PID}" 2>/dev/null && [[ ${TIMEOUT} -gt 0 ]]; do
sleep 1
((TIMEOUT--))
done
# ── Cleanup ───────────────────────────────────────────────────────────────
kill "${FC_PID}" 2>/dev/null || true
wait "${FC_PID}" 2>/dev/null || true
ip link del "${TAP}" 2>/dev/null || true
rm -f "${ROOTFS_IMG}" "${SOCKET}"
zfs destroy "${CLONE_DS}"
echo "--- ${FUNC_NAME} completed (VM: ${VM_ID}) ---"
Create the golden rootfs snapshot
# Create the base dataset and snapshot for cloning
zfs create -o compression=zstd -o recordsize=64k \
-o mountpoint=/var/lib/firecracker/rootfs/golden rpool/firecracker/golden
# Copy in your prepared Alpine rootfs (the chroot tree built in section 3),
# including dotfiles
cp -a /tmp/alpine-rootfs/. /var/lib/firecracker/rootfs/golden/
# Snapshot it — this is the template every function clone uses
zfs snapshot rpool/firecracker/golden@base
# Every kfc-run invocation clones from this snapshot
# The clone is instant, regardless of rootfs size
Example: image resize function
# Create the function directory
mkdir -p /srv/functions/image-resize
# The handler — this runs inside the microVM
cat > /srv/functions/image-resize/handler.sh <<'EOF'
#!/bin/sh
# Resize an image to 800x600 — runs inside a disposable microVM
INPUT="/srv/function/input.jpg"
OUTPUT="/srv/function/output.jpg"
if [ ! -f "${INPUT}" ]; then
echo "ERROR: no input.jpg found"
exit 1
fi
# ImageMagick would be in the rootfs if you need it
# For this example, use ffmpeg (smaller footprint)
ffmpeg -i "${INPUT}" -vf scale=800:600 "${OUTPUT}" 2>/dev/null
echo "Resized to 800x600: $(stat -c%s "${OUTPUT}") bytes"
EOF
chmod +x /srv/functions/image-resize/handler.sh
# Copy an image into the function directory and run it
cp photo.jpg /srv/functions/image-resize/input.jpg
kfc-run image-resize
Example: API endpoint function
# A function that responds to an HTTP request
mkdir -p /srv/functions/api-hello
cat > /srv/functions/api-hello/handler.sh <<'EOF'
#!/bin/sh
# Minimal HTTP response — runs inside a disposable microVM
TIMESTAMP=$(date -Iseconds)
HOSTNAME=$(hostname)
cat <<RESPONSE
HTTP/1.1 200 OK
Content-Type: application/json
{"status":"ok","host":"${HOSTNAME}","time":"${TIMESTAMP}","message":"Hello from a microVM that booted just for you"}
RESPONSE
EOF
chmod +x /srv/functions/api-hello/handler.sh
kfc-run api-hello
5. Scaling — 100 microVMs on one host
Each Firecracker microVM uses about 5MB of host memory overhead beyond what you allocate to the guest. A VM with 128MB of guest RAM costs 133MB total. ZFS clones mean the rootfs disk cost per VM is effectively zero until the guest starts writing. On a 64GB host, you can run hundreds of concurrent microVMs.
Parallel launcher
#!/bin/bash
# /usr/local/bin/kfc-scale — launch N microVMs in parallel
# Usage: kfc-scale <count> <function-name>
COUNT="${1:?Usage: kfc-scale <count> <function-name>}"
FUNC="${2:?Usage: kfc-scale <count> <function-name>}"
PIDS=()
echo "Launching ${COUNT} microVMs running ${FUNC}..."
for i in $(seq 1 "${COUNT}"); do
kfc-run "${FUNC}" &
PIDS+=($!)
# Stagger launches slightly to spread out clone, image, and tap setup
# (tap names are already unique per VM; staggering only reduces contention)
sleep 0.05
done
echo "All ${COUNT} VMs launched. Waiting for completion..."
FAILED=0
for pid in "${PIDS[@]}"; do
wait "${pid}" || ((FAILED++))
done
echo "Completed: $((COUNT - FAILED)) succeeded, ${FAILED} failed"
Memory budget
128MB guest + 5MB overhead = 133MB per VM. A 64GB host with 8GB reserved for the host OS and ZFS ARC leaves 56GB, or about 430 concurrent microVMs. Reduce the guest RAM to 64MB for simple functions and you nearly double that. The ARC is your friend here: it caches the shared rootfs blocks that every clone reads.
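The memory arithmetic can be written down as a one-line helper. The function name max_vms is an assumption; the 5MB default overhead and the 64GB / 8GB / 128MB inputs are the figures from the paragraph above, with 64GB taken as 65536MB.

```shell
# max_vms HOST_MB RESERVED_MB GUEST_MB [OVERHEAD_MB]
# Integer number of VMs that fit in the memory left after the reservation.
max_vms() (
  host_mb="$1"; reserved_mb="$2"; guest_mb="$3"; overhead_mb="${4:-5}"
  echo $(( (host_mb - reserved_mb) / (guest_mb + overhead_mb) ))
)

max_vms 65536 8192 128   # prints 431
max_vms 65536 8192 64    # prints 831
```

The result is an upper bound: it ignores page cache pressure and any memory the guests never touch.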
Disk budget
ZFS clones share all unchanged blocks. If your golden rootfs is 200MB and each function writes 2MB of temp data, 100 VMs cost 200MB + (100 x 2MB) = 400MB total. Without ZFS, 100 copies of a 200MB image would cost 20GB. This is not a trick. This is how copy-on-write filesystems work.
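To check the disk figure against a live pool, you can sum the space actually charged to the clones. This sketch assumes the clones live under rpool/firecracker as in the runner script; the helper name clone_usage_bytes is made up, and it expects the tab-separated output of zfs list -H -p on stdin.

```shell
# clone_usage_bytes reads "name<TAB>used<TAB>origin" lines on stdin and
# sums "used" for every dataset that has an origin, that is, every clone.
# Datasets that are not clones report "-" as their origin and are skipped.
clone_usage_bytes() {
  awk -F '\t' '$3 != "-" && $3 != "" { sum += $2 } END { printf "%d\n", sum }'
}

# Example use on a host with the pool imported:
#   zfs list -H -p -r -o name,used,origin rpool/firecracker | clone_usage_bytes
```

The total is the divergence across all running VMs; the golden dataset itself is counted separately.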
Rate limiting with jailer
The jailer binary puts each microVM in its own cgroup. You can set CPU and memory limits
per VM so a runaway function cannot starve other VMs. Combined with seccomp filters, you get
defense-in-depth: the guest kernel is isolated, the process is cgroup-limited, and the syscalls are
filtered.
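# Launch with jailer for production isolation
# (the --cgroup values are illustrative: a 256MiB memory cap and half the default CPU weight)
jailer --id "${VM_ID}" \
--exec-file /usr/local/bin/firecracker \
--uid 65534 --gid 65534 \
--chroot-base-dir /srv/jailer \
--cgroup-version 2 \
--cgroup memory.max=268435456 \
--cgroup cpu.weight=50 \
-- --api-sock /run/firecracker.sock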
Network namespace per VM
Each microVM gets its own tap device and can be placed in its own network namespace. Outbound traffic routes through the host's bridge. VMs cannot see each other's traffic unless you explicitly route between their namespaces.
# Isolated network namespace per VM
# (Firecracker must then start inside it, e.g. jailer --netns /var/run/netns/ns-${VM_ID})
ip netns add "ns-${VM_ID}"
ip tuntap add dev "tap-${VM_ID}" mode tap
ip link set "tap-${VM_ID}" netns "ns-${VM_ID}"
ip netns exec "ns-${VM_ID}" \
ip addr add 172.16.0.1/30 dev "tap-${VM_ID}"
ip netns exec "ns-${VM_ID}" \
ip link set "tap-${VM_ID}" up
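The examples on this page give every tap the same 172.16.0.1/30 address, which is fine inside separate namespaces but ambiguous when many taps share the host namespace. One possible way to avoid that is to derive a distinct /30 per VM from a numeric index. The helper names vm_addrs and guest_ip_arg and the use of 172.16.0.0/16 as the pool are assumptions of this sketch.

```shell
# vm_addrs INDEX prints "HOST_IP GUEST_IP" for the INDEX-th /30 inside
# 172.16.0.0/16. Each /30 holds 4 addresses: network, host, guest, broadcast.
vm_addrs() (
  idx="$1"
  [ "$idx" -ge 0 ] && [ "$idx" -lt 16384 ] || { echo "index out of range" >&2; exit 1; }
  base=$((idx * 4))
  third=$((base / 256)); fourth=$((base % 256))
  printf '172.16.%d.%d 172.16.%d.%d\n' \
    "$third" "$((fourth + 1))" "$third" "$((fourth + 2))"
)

# guest_ip_arg INDEX builds the kernel ip= parameter for that VM, using the
# same field order as the boot arguments shown earlier on this page.
guest_ip_arg() (
  addrs="$(vm_addrs "$1")" || exit 1
  host="${addrs% *}"; guest="${addrs#* }"
  printf 'ip=%s::%s:255.255.255.252::eth0:off\n' "$guest" "$host"
)

vm_addrs 0
guest_ip_arg 1
```

The host address goes on the tap device with a /30 prefix, and the ip= string goes into the guest boot arguments.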
6. Postinstaller integration
Bake the entire Firecracker function platform into a kldload postinstaller so it deploys automatically when you install a KVM-profile host.
Postinstaller script
#!/bin/bash
# /srv/postinstallers/firecracker-platform.sh
# Runs after kldload install — sets up the Firecracker function runtime
set -euo pipefail
LOG="/var/log/kldload-postinstall-firecracker.log"
exec >>"${LOG}" 2>&1
echo "=== Firecracker platform setup — $(date) ==="
# ── ZFS datasets ──────────────────────────────────────────────────────────
zfs create -o compression=zstd -o recordsize=64k \
-o mountpoint=/var/lib/firecracker rpool/firecracker
zfs create -o mountpoint=/var/lib/firecracker/rootfs/golden \
rpool/firecracker/golden
# -p creates the parent rpool/srv if it does not exist yet
zfs create -p -o mountpoint=/srv/functions rpool/srv/functions
# ── Build the golden Alpine rootfs ─────────────────────────────────────────
GOLDEN="/var/lib/firecracker/rootfs/golden"
curl -fsSL https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-minirootfs-3.20.0-x86_64.tar.gz \
| tar xzf - -C "${GOLDEN}"
# apk needs DNS inside the chroot
cp /etc/resolv.conf "${GOLDEN}/etc/resolv.conf"
chroot "${GOLDEN}" /bin/sh -c '
apk add --no-cache openrc util-linux
ln -s agetty /etc/init.d/agetty.ttyS0
echo ttyS0 > /etc/securetty
rc-update add agetty.ttyS0 default
rc-update add devfs boot
rc-update add procfs boot
rc-update add sysfs boot
echo "root:firecracker" | chpasswd
'
# Snapshot the golden rootfs — all clones derive from here
zfs snapshot rpool/firecracker/golden@base
# ── Download a known-good kernel ───────────────────────────────────────────
curl -fsSL https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/kernels/x86_64/vmlinux-5.10.217 \
-o /var/lib/firecracker/vmlinux
# ── Install the function runner scripts ────────────────────────────────────
install -m 0755 /srv/postinstallers/files/kfc-run /usr/local/bin/kfc-run
install -m 0755 /srv/postinstallers/files/kfc-scale /usr/local/bin/kfc-scale
# ── Enable IP forwarding (tap devices and NAT rules are created per VM) ──
cat > /etc/sysctl.d/99-firecracker.conf <<'SYSCTL'
net.ipv4.ip_forward = 1
SYSCTL
sysctl -p /etc/sysctl.d/99-firecracker.conf
# ── Snapshot the entire setup for recovery ────────────────────────────────
ksnap /var/lib/firecracker
echo "=== Firecracker platform ready — $(date) ==="
7. AI integration
The local AI assistant can manage the microVM fleet — launching functions, monitoring VM health, cleaning up stale clones, and reporting on resource usage. Give it a context script that feeds live Firecracker state into every query.
Firecracker context for the AI
#!/bin/bash
# /usr/local/bin/kai-firecracker — query the AI about the microVM fleet
build_fc_context() {
echo "=== FIRECRACKER STATE ($(date -Iseconds)) ==="
echo -e "\n--- Running microVMs ---"
ps aux | grep '[f]irecracker' | awk '{print $2, $11, $12}'
echo -e "\n--- ZFS clones (active VMs) ---"
zfs list -r rpool/firecracker -o name,used,refer,origin 2>/dev/null
echo -e "\n--- Golden rootfs snapshots ---"
zfs list -t snapshot -r rpool/firecracker/golden \
-o name,used,creation 2>/dev/null
echo -e "\n--- Function definitions ---"
ls -la /srv/functions/ 2>/dev/null
echo -e "\n--- Tap devices ---"
ip link show type tun 2>/dev/null
echo -e "\n--- Memory pressure ---"
free -h
echo ""
grep -E 'MemTotal|MemAvail|Committed_AS' /proc/meminfo
echo -e "\n--- Jailer cgroups ---"
find /sys/fs/cgroup -name "fc-*" -type d 2>/dev/null | head -20
}
QUESTION="$*"
if [ -z "$QUESTION" ]; then
echo "Usage: kai-firecracker <question>"
echo ""
echo "Examples:"
echo " kai-firecracker 'how many VMs are running?'"
echo " kai-firecracker 'clean up stale clones'"
echo " kai-firecracker 'can I launch 50 more VMs?'"
echo " kai-firecracker 'which functions ran today?'"
exit 1
fi
CONTEXT=$(build_fc_context)
printf '%s\n\n=== QUESTION ===\n%s\n' "${CONTEXT}" "${QUESTION}" | \
ollama run kldload-admin
"How many more VMs can I run?"
The AI reads free -h, counts running VMs, estimates per-VM overhead, and tells you
roughly how many 128MB VMs fit in the remaining memory, taking the ZFS ARC reservation into account. Treat the number as an estimate and check it against free -h before launching.
kai-firecracker "how many more 128MB VMs can I launch?"
"Clean up stale clones"
The AI lists ZFS clones under rpool/firecracker, cross-references with running
firecracker processes, and identifies orphaned clones from crashed VMs. It suggests
the zfs destroy commands; review them before running anything.
kai-firecracker "find and destroy any orphaned VM clones"
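The same cross-reference can also be done deterministically, which makes a useful check on whatever the assistant suggests. This sketch assumes clone datasets are named fc-<id> as in kfc-run and that the list of live VM IDs is collected separately; the helper name find_orphans is hypothetical.

```shell
# find_orphans CLONES_FILE RUNNING_FILE
#   CLONES_FILE:  one dataset name per line (e.g. rpool/firecracker/fc-1a2b3c4d)
#   RUNNING_FILE: one live VM ID per line   (e.g. fc-1a2b3c4d)
# Prints every dataset whose last path component starts with fc- and is
# not in the running list. Other datasets, such as golden, are ignored.
find_orphans() {
  awk '
    FILENAME == ARGV[1] { live[$1] = 1; next }
    {
      n = split($1, parts, "/")
      id = parts[n]
      if (id ~ /^fc-/ && !(id in live)) print $1
    }
  ' "$2" "$1"
}

# Example inputs on a live host (the ID pattern assumes kfc-run socket names):
#   zfs list -H -o name -r rpool/firecracker > clones.txt
#   ps -eo args | grep -o 'fc-[0-9a-f]\{8\}' | sort -u > running.txt
#   find_orphans clones.txt running.txt
```

The output is a list of candidates only; nothing is destroyed until you run zfs destroy yourself.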
Firecracker gives you the isolation of VMs at the speed of containers. ZFS gives you instant clones, snapshots, compression, and replication underneath. The combination is a serverless runtime that runs on your hardware, boots in milliseconds, and leaves no trace when the function is done.
No orchestrator. No control plane. No billing API. Just a script that clones a dataset, boots a kernel, runs your code, and destroys the VM. That is serverless without the server bill.