
Serverless & MicroVMs — Firecracker on ZFS

Firecracker is the virtual machine monitor that powers AWS Lambda and Fargate. It is open source, it runs on KVM, and it boots a microVM in under 125 milliseconds. Each microVM is a real virtual machine with its own Linux kernel — not a container, not a namespace trick, not a shared-kernel sandbox. Hardware-level isolation at container speed.

The kldload KVM profile installs Firecracker, jailer, and firectl from the darksite automatically. This page shows you how to use them.

1. What Firecracker is

MicroVMs, not containers

A container shares the host kernel. A microVM boots its own kernel. Firecracker strips the virtual machine down to the absolute minimum — no BIOS, no PCI bus, no USB, no GPU passthrough. Just a kernel, a rootfs, a network interface, and a block device. That is why it boots in under 125ms and uses about 5MB of memory overhead per VM.

A container is a room with a shared wall. A microVM is a separate building with a very thin foundation. Same neighborhood, very different blast radius.

KVM underneath

Firecracker is a KVM virtual machine monitor written in Rust. It talks to /dev/kvm directly. No QEMU, no libvirt, no management layer. The KVM profile in kldload enables kvm_intel / kvm_amd and sets up the necessary device permissions. If your hardware supports VT-x or AMD-V, Firecracker works.

QEMU is a Swiss Army knife. Firecracker is a scalpel. Both cut. One fits in a 3MB binary.

The jailer

Firecracker ships with jailer — a wrapper that puts each microVM inside a chroot with its own cgroup, UID, seccomp filter, and network namespace. Even if the guest kernel is compromised, the attacker lands inside a cgroup jail with no access to the host filesystem, no outbound network unless you explicitly route it, and no syscalls beyond a minimal whitelist.

The microVM is the locked room. The jailer is the locked building around the locked room.

2. Why ZFS makes it better

Every microVM needs a rootfs — a filesystem image it boots from. Without ZFS, you are copying multi-megabyte images for every VM. With ZFS, you are cloning them in milliseconds for zero space.

Snapshot before execution

Snapshot the rootfs dataset before the microVM boots. If the function mutates the filesystem, roll back to the snapshot after execution. Every invocation starts from a clean state. No image rebuilds, no re-downloads. One zfs rollback and you are back to a byte-identical pristine image.
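A minimal sketch of that cycle, assuming the rootfs lives in a dataset named rpool/firecracker/golden (the layout built later on this page):

```shell
# Freeze the pristine state before the first boot
zfs snapshot rpool/firecracker/golden@clean

# ... boot the microVM, let the function mutate the filesystem ...

# Discard everything the VM wrote; -r also drops any later snapshots
zfs rollback -r rpool/firecracker/golden@clean
```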

Clone for parallel execution

zfs clone creates a writable copy of a snapshot in under a second regardless of the dataset size. Need 50 VMs? Clone 50 datasets. They share every block with the original until a write diverges. Disk usage is proportional to what each VM changes, not what it contains.
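The fan-out pattern, sketched against the same assumed rpool/firecracker/golden@base snapshot:

```shell
# Create 50 writable rootfs copies from one snapshot — each is near-instant
for i in $(seq 1 50); do
  zfs clone rpool/firecracker/golden@base "rpool/firecracker/vm-${i}"
done

# "used" stays near zero per clone until a guest starts writing
zfs list -r rpool/firecracker -o name,used,refer,origin
```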

Compress idle images

ZFS compression=zstd on the rootfs dataset means idle images take a fraction of their raw size on disk. A 200MB Alpine rootfs compresses to about 70MB. You are not paying for that compression at runtime — the ARC caches decompressed blocks in RAM.
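You can check what compression is actually buying you with a read-only query (dataset name assumes the layout used on this page):

```shell
# compressratio is the on-disk savings; logicalused is the uncompressed size
zfs get -o property,value compression,compressratio,used,logicalused \
  rpool/firecracker
```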

Replicate golden images

zfs send / zfs recv replicates a rootfs snapshot to another node in one command. Build the golden image once, ship it to every host. Incremental sends mean updates transfer only the changed blocks.
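A replication sketch — "node2" is a placeholder for your target host, and the snapshot names assume the golden-image layout used on this page:

```shell
# First full send: creates the dataset on the target
zfs send rpool/firecracker/golden@base | \
  ssh node2 zfs recv rpool/firecracker/golden

# After updating the golden image, send only the blocks changed since @base
zfs snapshot rpool/firecracker/golden@v2
zfs send -i @base rpool/firecracker/golden@v2 | \
  ssh node2 zfs recv rpool/firecracker/golden
```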

3. Setup

The KVM profile handles the heavy lifting. kldload-firstboot calls setup_kvm(), which installs firecracker, jailer, and firectl from the darksite binaries in /root/darksite/binaries/. It also creates /var/lib/firecracker/rootfs/ and /run/firecracker/.

Verify the install

# All three should return version info
firecracker --version
jailer --version
firectl --version

# Confirm KVM is available
ls -l /dev/kvm
# crw-rw---- 1 root kvm 10, 232 ... /dev/kvm

Create the rootfs dataset

# ZFS dataset for all microVM rootfs images
zfs create -o compression=zstd -o recordsize=64k \
  -o mountpoint=/var/lib/firecracker/rootfs rpool/firecracker

# Create the golden Alpine rootfs
mkdir -p /tmp/alpine-rootfs
cd /tmp/alpine-rootfs

# Bootstrap a minimal Alpine root
curl -fsSL https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-minirootfs-3.20.0-x86_64.tar.gz \
  | tar xzf -

# Add OpenRC init and a shell
chroot /tmp/alpine-rootfs /bin/sh -c '
  apk add --no-cache openrc
  ln -s agetty /etc/init.d/agetty.ttyS0
  echo ttyS0 > /etc/securetty
  rc-update add agetty.ttyS0 default
  rc-update add devfs boot
  rc-update add procfs boot
  rc-update add sysfs boot
  echo "root:firecracker" | chpasswd
'

# Create an ext4 rootfs image from the chroot
truncate -s 200M /var/lib/firecracker/rootfs/alpine.ext4
mkfs.ext4 /var/lib/firecracker/rootfs/alpine.ext4
mount /var/lib/firecracker/rootfs/alpine.ext4 /mnt
cp -a /tmp/alpine-rootfs/* /mnt/
umount /mnt
rm -rf /tmp/alpine-rootfs
The rootfs image is the stage. The microVM is the actor. Build the stage once, let a thousand actors perform on clones of it.

Get a kernel

Firecracker needs an uncompressed Linux kernel (vmlinux). You can extract one from the host or download a prebuilt one.

# Option A: extract vmlinux from the host kernel
/usr/src/kernels/$(uname -r)/scripts/extract-vmlinux \
  /boot/vmlinuz-$(uname -r) > /var/lib/firecracker/vmlinux

# Option B: use the Firecracker CI kernel (known-good, minimal)
curl -fsSL https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/kernels/x86_64/vmlinux-5.10.217 \
  -o /var/lib/firecracker/vmlinux

Set up a tap device for networking

# Create a tap device for the microVM
ip tuntap add dev tap0 mode tap
ip addr add 172.16.0.1/24 dev tap0
ip link set tap0 up

# Enable NAT so the microVM can reach the outside.
# br0 is assumed to be the host's uplink bridge — substitute your egress
# interface (eth0, bond0, ...) if the host has no bridge.
iptables -t nat -A POSTROUTING -o br0 -j MASQUERADE
iptables -A FORWARD -i tap0 -o br0 -j ACCEPT
iptables -A FORWARD -i br0 -o tap0 -m state --state RELATED,ESTABLISHED -j ACCEPT
echo 1 > /proc/sys/net/ipv4/ip_forward

Launch a microVM

# Launch with firectl — the fastest way to get a VM running
firectl \
  --kernel=/var/lib/firecracker/vmlinux \
  --root-drive=/var/lib/firecracker/rootfs/alpine.ext4 \
  --tap-device=tap0/aa:fc:00:00:00:01 \
  --kernel-opts="console=ttyS0 reboot=k panic=1 pci=off ip=172.16.0.2::172.16.0.1:255.255.255.0::eth0:off"

# The VM boots in under 125ms. You get a serial console.
# Login: root / firecracker
From cold metal to a running Linux kernel with a shell prompt: less time than it takes your browser to render this page.

4. Lambda-style function runner

This is the pattern that Lambda uses internally: create a microVM, run a function, capture the output, destroy the VM. Each invocation is completely isolated. The rootfs rolls back to the snapshot between invocations so every execution starts from byte-identical state.

The function runner script

#!/bin/bash
# /usr/local/bin/kfc-run — kldload Firecracker function runner
# Usage: kfc-run <function-name> [args...]

set -euo pipefail

FUNC_NAME="${1:?Usage: kfc-run <function-name> [args...]}"
shift
FUNC_ARGS="$*"

FC_ROOT="/var/lib/firecracker"
ROOTFS_BASE="${FC_ROOT}/rootfs"
KERNEL="${FC_ROOT}/vmlinux"
FUNC_DIR="/srv/functions/${FUNC_NAME}"
VM_ID="fc-$(date +%s%N | sha256sum | head -c 8)"
CLONE_DS="rpool/firecracker/${VM_ID}"

# ── Validate ──────────────────────────────────────────────────────────────
[[ -d "${FUNC_DIR}" ]] || { echo "ERROR: function dir not found: ${FUNC_DIR}"; exit 1; }
[[ -f "${FUNC_DIR}/handler.sh" ]] || { echo "ERROR: no handler.sh in ${FUNC_DIR}"; exit 1; }

# ── Clone the golden rootfs snapshot ──────────────────────────────────────
zfs clone rpool/firecracker/golden@base "${CLONE_DS}"
CLONE_MOUNT=$(zfs get -H -o value mountpoint "${CLONE_DS}")

# Copy the function into the clone
mkdir -p "${CLONE_MOUNT}/srv/function"
cp -a "${FUNC_DIR}/." "${CLONE_MOUNT}/srv/function/"

# Write the invocation script
cat > "${CLONE_MOUNT}/srv/function/invoke.sh" <<INVOKE
#!/bin/sh
cd /srv/function
exec /srv/function/handler.sh ${FUNC_ARGS} 2>&1
INVOKE
chmod +x "${CLONE_MOUNT}/srv/function/invoke.sh"

# Create the ext4 image from the clone
ROOTFS_IMG="/tmp/${VM_ID}.ext4"
MNT=$(mktemp -d)   # per-VM mount point so parallel runs never collide on /mnt
truncate -s 200M "${ROOTFS_IMG}"
mkfs.ext4 -q "${ROOTFS_IMG}"
mount "${ROOTFS_IMG}" "${MNT}"
cp -a "${CLONE_MOUNT}/." "${MNT}/"
umount "${MNT}"
rmdir "${MNT}"

# ── Create tap device ─────────────────────────────────────────────────────
# NOTE: every invocation reuses 172.16.0.1/30 on the host side. Parallel
# runs need a unique subnet or a network namespace per VM (see section 5).
TAP="tap-${VM_ID:0:8}"
ip tuntap add dev "${TAP}" mode tap
ip addr add 172.16.0.1/30 dev "${TAP}"
ip link set "${TAP}" up

# ── Launch the microVM ────────────────────────────────────────────────────
SOCKET="/tmp/${VM_ID}.sock"

# Start firecracker in the background
firecracker --api-sock "${SOCKET}" &
FC_PID=$!

# Wait for the API socket to appear instead of a blind sleep
for _ in $(seq 1 20); do
  [[ -S "${SOCKET}" ]] && break
  sleep 0.05
done

# Configure the VM via the API
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/boot-source" \
  -H "Content-Type: application/json" \
  -d "{
    \"kernel_image_path\": \"${KERNEL}\",
    \"boot_args\": \"console=ttyS0 reboot=k panic=1 pci=off init=/srv/function/invoke.sh ip=172.16.0.2::172.16.0.1:255.255.255.0::eth0:off\"
  }"

curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/drives/rootfs" \
  -H "Content-Type: application/json" \
  -d "{
    \"drive_id\": \"rootfs\",
    \"path_on_host\": \"${ROOTFS_IMG}\",
    \"is_root_device\": true,
    \"is_read_only\": false
  }"

curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/network-interfaces/eth0" \
  -H "Content-Type: application/json" \
  -d "{
    \"iface_id\": \"eth0\",
    \"guest_mac\": \"aa:fc:00:00:00:01\",
    \"host_dev_name\": \"${TAP}\"
  }"

curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/machine-config" \
  -H "Content-Type: application/json" \
  -d "{\"vcpu_count\": 1, \"mem_size_mib\": 128}"

# Start the VM
curl -s --unix-socket "${SOCKET}" -X PUT "http://localhost/actions" \
  -H "Content-Type: application/json" \
  -d "{\"action_type\": \"InstanceStart\"}"

# ── Wait for the function to finish (max 30 seconds) ─────────────────────
TIMEOUT=30
while kill -0 "${FC_PID}" 2>/dev/null && [[ ${TIMEOUT} -gt 0 ]]; do
  sleep 1
  ((TIMEOUT--))
done

# ── Cleanup ───────────────────────────────────────────────────────────────
kill "${FC_PID}" 2>/dev/null || true
wait "${FC_PID}" 2>/dev/null || true

ip link del "${TAP}" 2>/dev/null || true
rm -f "${ROOTFS_IMG}" "${SOCKET}"
zfs destroy "${CLONE_DS}"

echo "--- ${FUNC_NAME} completed (VM: ${VM_ID}) ---"
A disposable coffee cup. Pour the function in, drink the output, throw the VM away. The ZFS clone means the cup costs nothing to make.

Create the golden rootfs snapshot

# Create the base dataset and snapshot for cloning
zfs create -o compression=zstd -o recordsize=64k \
  -o mountpoint=/var/lib/firecracker/rootfs/golden rpool/firecracker/golden

# Copy in the Alpine chroot built in section 3 — re-extract the minirootfs
# tarball first if you already removed /tmp/alpine-rootfs
cp -a /tmp/alpine-rootfs/* /var/lib/firecracker/rootfs/golden/

# Snapshot it — this is the template every function clone uses
zfs snapshot rpool/firecracker/golden@base

# Every kfc-run invocation clones from this snapshot
# The clone is instant, regardless of rootfs size

Example: image resize function

# Create the function directory
mkdir -p /srv/functions/image-resize

# The handler — this runs inside the microVM
cat > /srv/functions/image-resize/handler.sh <<'EOF'
#!/bin/sh
# Resize an image to 800x600 — runs inside a disposable microVM
INPUT="/srv/function/input.jpg"
OUTPUT="/srv/function/output.jpg"

if [ ! -f "${INPUT}" ]; then
  echo "ERROR: no input.jpg found"
  exit 1
fi

# Neither ffmpeg nor ImageMagick ships in the minimal Alpine rootfs —
# apk add ffmpeg in the golden image first (it has the smaller footprint)
ffmpeg -i "${INPUT}" -vf scale=800:600 "${OUTPUT}" 2>/dev/null
echo "Resized to 800x600: $(stat -c%s "${OUTPUT}") bytes"
EOF
chmod +x /srv/functions/image-resize/handler.sh

# Copy an image into the function directory and run it
cp photo.jpg /srv/functions/image-resize/input.jpg
kfc-run image-resize

Example: API endpoint function

# A function that responds to an HTTP request
mkdir -p /srv/functions/api-hello

cat > /srv/functions/api-hello/handler.sh <<'EOF'
#!/bin/sh
# Minimal HTTP response — runs inside a disposable microVM
TIMESTAMP=$(date -Iseconds)
HOSTNAME=$(hostname)

cat <<RESPONSE
HTTP/1.1 200 OK
Content-Type: application/json

{"status":"ok","host":"${HOSTNAME}","time":"${TIMESTAMP}","message":"Hello from a microVM that booted just for you"}
RESPONSE
EOF
chmod +x /srv/functions/api-hello/handler.sh

kfc-run api-hello
Every API call gets its own kernel. If the function crashes, segfaults, or gets exploited, nothing else on the host notices. The VM dies, the clone is destroyed, the snapshot remains untouched.

5. Scaling — 100 microVMs on one host

Each Firecracker microVM uses about 5MB of host memory overhead beyond what you allocate to the guest. A VM with 128MB of guest RAM costs 133MB total. ZFS clones mean the rootfs disk cost per VM is effectively zero until the guest starts writing. On a 64GB host, you can run hundreds of concurrent microVMs.

Parallel launcher

#!/bin/bash
# /usr/local/bin/kfc-scale — launch N microVMs in parallel
# Usage: kfc-scale <count> <function-name>

COUNT="${1:?Usage: kfc-scale <count> <function-name>}"
FUNC="${2:?Usage: kfc-scale <count> <function-name>}"
PIDS=()

echo "Launching ${COUNT} microVMs running ${FUNC}..."

for i in $(seq 1 "${COUNT}"); do
  kfc-run "${FUNC}" &
  PIDS+=($!)

  # Stagger launches slightly; note that kfc-run reuses one /30 subnet, so
  # heavy parallelism needs a network namespace per VM (section 5)
  sleep 0.05
done

echo "All ${COUNT} VMs launched. Waiting for completion..."

FAILED=0
for pid in "${PIDS[@]}"; do
  wait "${pid}" || ((FAILED++))
done

echo "Completed: $((COUNT - FAILED)) succeeded, ${FAILED} failed"

Memory budget

128MB guest + 5MB overhead = 133MB per VM. A 64GB host with 8GB reserved for the host OS and ZFS ARC gives you ~420 concurrent microVMs. Reduce the guest RAM to 64MB for simple functions and you double that. The ARC is your friend here — it caches the shared rootfs blocks that every clone reads.
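The arithmetic above can be sanity-checked with a tiny helper — a sketch using the ~5MB-per-VM overhead figure quoted on this page:

```shell
# Estimate how many microVMs fit in the RAM you have left.
# overhead_mb=5 is the approximate per-VM Firecracker cost cited above.
vm_capacity() {
  avail_mb=$1
  guest_mb=$2
  overhead_mb=5
  echo $(( avail_mb / (guest_mb + overhead_mb) ))
}

vm_capacity 57344 128   # 56GB left after host/ARC reserve, 128MB guests
vm_capacity 57344 64    # smaller guests roughly double the count
```

The 128MB case comes out at 431, in line with the ~420 quoted above once you leave a little headroom for page tables and tap devices.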

Disk budget

ZFS clones share all unchanged blocks. If your golden rootfs is 200MB and each function writes 2MB of temp data, 100 VMs cost 200MB + (100 x 2MB) = 400MB total. Without ZFS, 100 copies of a 200MB image would cost 20GB. This is not a trick. This is how copy-on-write filesystems work.

Rate limiting with jailer

The jailer binary puts each microVM in its own cgroup. You can set CPU and memory limits per VM so a runaway function cannot starve other VMs. Combined with seccomp filters, you get defense-in-depth: the guest kernel is isolated, the process is cgroup-limited, and the syscalls are filtered.

# Launch with jailer for production isolation
jailer --id "${VM_ID}" \
  --exec-file /usr/local/bin/firecracker \
  --uid 65534 --gid 65534 \
  --chroot-base-dir /srv/jailer \
  --cgroup-version 2 \
  -- --api-sock /run/firecracker.sock

Network namespace per VM

Each microVM gets its own tap device and can be placed in its own network namespace. Outbound traffic routes through the host's bridge. VMs cannot see each other's traffic unless you explicitly route between their namespaces.

# Isolated network namespace per VM
ip netns add "ns-${VM_ID}"
ip tuntap add dev "tap-${VM_ID}" mode tap
ip link set "tap-${VM_ID}" netns "ns-${VM_ID}"
ip netns exec "ns-${VM_ID}" \
  ip addr add 172.16.0.1/30 dev "tap-${VM_ID}"
ip netns exec "ns-${VM_ID}" \
  ip link set "tap-${VM_ID}" up

6. Postinstaller integration

Bake the entire Firecracker function platform into a kldload postinstaller so it deploys automatically when you install a KVM-profile host.

Postinstaller script

#!/bin/bash
# /srv/postinstallers/firecracker-platform.sh
# Runs after kldload install — sets up the Firecracker function runtime

set -euo pipefail
LOG="/var/log/kldload-postinstall-firecracker.log"
exec >>"${LOG}" 2>&1

echo "=== Firecracker platform setup — $(date) ==="

# ── ZFS datasets ──────────────────────────────────────────────────────────
zfs create -o compression=zstd -o recordsize=64k \
  -o mountpoint=/var/lib/firecracker rpool/firecracker

zfs create -o mountpoint=/var/lib/firecracker/rootfs/golden \
  rpool/firecracker/golden

zfs create -p -o mountpoint=/srv/functions rpool/srv/functions   # -p creates rpool/srv if it does not exist yet

# ── Build the golden Alpine rootfs ─────────────────────────────────────────
GOLDEN="/var/lib/firecracker/rootfs/golden"
curl -fsSL https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-minirootfs-3.20.0-x86_64.tar.gz \
  | tar xzf - -C "${GOLDEN}"

chroot "${GOLDEN}" /bin/sh -c '
  apk add --no-cache openrc util-linux
  ln -s agetty /etc/init.d/agetty.ttyS0
  echo ttyS0 > /etc/securetty
  rc-update add agetty.ttyS0 default
  rc-update add devfs boot
  rc-update add procfs boot
  rc-update add sysfs boot
  echo "root:firecracker" | chpasswd
'

# Snapshot the golden rootfs — all clones derive from here
zfs snapshot rpool/firecracker/golden@base

# ── Download a known-good kernel ───────────────────────────────────────────
curl -fsSL https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/kernels/x86_64/vmlinux-5.10.217 \
  -o /var/lib/firecracker/vmlinux

# ── Install the function runner scripts ────────────────────────────────────
install -m 0755 /srv/postinstallers/files/kfc-run   /usr/local/bin/kfc-run
install -m 0755 /srv/postinstallers/files/kfc-scale /usr/local/bin/kfc-scale

# ── Enable IP forwarding (per-VM NAT rules are added at launch) ───────────
cat > /etc/sysctl.d/99-firecracker.conf <<'SYSCTL'
net.ipv4.ip_forward = 1
SYSCTL
sysctl -p /etc/sysctl.d/99-firecracker.conf

# ── Snapshot the entire setup for recovery ────────────────────────────────
ksnap /var/lib/firecracker

echo "=== Firecracker platform ready — $(date) ==="
You build the factory once. The postinstaller is the blueprint. Every new host gets the same factory, the same golden rootfs, the same function runner. Install the ISO, boot, and you have a serverless platform.

7. AI integration

The local AI assistant can manage the microVM fleet — launching functions, monitoring VM health, cleaning up stale clones, and reporting on resource usage. Give it a context script that feeds live Firecracker state into every query.

Firecracker context for the AI

#!/bin/bash
# /usr/local/bin/kai-firecracker — query the AI about the microVM fleet

build_fc_context() {
    echo "=== FIRECRACKER STATE ($(date -Iseconds)) ==="

    echo -e "\n--- Running microVMs ---"
    ps aux | grep '[f]irecracker' | awk '{print $2, $11, $12}'

    echo -e "\n--- ZFS clones (active VMs) ---"
    zfs list -r rpool/firecracker -o name,used,refer,origin 2>/dev/null

    echo -e "\n--- Golden rootfs snapshots ---"
    zfs list -t snapshot -r rpool/firecracker/golden \
      -o name,used,creation 2>/dev/null

    echo -e "\n--- Function definitions ---"
    ls -la /srv/functions/ 2>/dev/null

    echo -e "\n--- Tap devices ---"
    ip link show type tun 2>/dev/null

    echo -e "\n--- Memory pressure ---"
    free -h
    echo ""
    grep -E 'MemTotal|MemAvail|Committed_AS' /proc/meminfo

    echo -e "\n--- Jailer cgroups ---"
    find /sys/fs/cgroup -name "fc-*" -type d 2>/dev/null | head -20
}

QUESTION="$*"
if [ -z "$QUESTION" ]; then
    echo "Usage: kai-firecracker <question>"
    echo ""
    echo "Examples:"
    echo "  kai-firecracker 'how many VMs are running?'"
    echo "  kai-firecracker 'clean up stale clones'"
    echo "  kai-firecracker 'can I launch 50 more VMs?'"
    echo "  kai-firecracker 'which functions ran today?'"
    exit 1
fi

CONTEXT=$(build_fc_context)

echo -e "${CONTEXT}\n\n=== QUESTION ===\n${QUESTION}" | \
  ollama run kldload-admin
The AI sees running VMs like a nurse sees monitors. Pulse, memory, network, clone lineage. Ask it a question and it answers from live data, not from documentation.

"How many more VMs can I run?"

The AI reads free -h, counts running VMs, calculates per-VM overhead, and tells you exactly how many 128MB VMs fit in the remaining memory. It factors in the ZFS ARC reservation.

kai-firecracker "how many more 128MB VMs can I launch?"

"Clean up stale clones"

The AI lists ZFS clones under rpool/firecracker, cross-references with running firecracker processes, and identifies orphaned clones from crashed VMs. It gives you the exact zfs destroy commands.

kai-firecracker "find and destroy any orphaned VM clones"

Firecracker gives you the isolation of VMs at the speed of containers. ZFS gives you instant clones, snapshots, compression, and replication underneath. The combination is a serverless runtime that runs on your hardware, boots in milliseconds, and leaves no trace when the function is done.

No orchestrator. No control plane. No billing API. Just a script that clones a dataset, boots a kernel, runs your code, and destroys the VM. That is serverless without the server bill.