Custom Postinstallers — from zero to 100 microVMs in seconds.
This is the advanced guide. We're going to build a complete deployment pipeline that starts
with a blank disk and ends with a running Kubernetes cluster — or 100 Firecracker microVMs —
or whatever you want. The secret is postinstall.sh: a hook that runs after
kldload finishes installing the base system. Everything after that is yours.
This isn't theory. This is a real production pattern used to deploy 15-node Kubernetes clusters with etcd, load balancers, control planes, workers, monitoring, and GitOps — all from sealed ISOs that work without internet. You can build the same thing.
What is a postinstaller?
The hook point
kldload installs the base system: kernel, ZFS, bootloader, tools. When it's done,
it looks for /root/darksite/postinstall.sh on the target system.
If it exists, it runs it. That's your entry point. Everything you put in that script
runs with root privileges on a freshly-installed system.
#!/bin/bash
# postinstall.sh — runs after kldload finishes the base install
# You have: root access, ZFS on root, network (if configured), all base packages
# You do: whatever you want
echo "My custom postinstaller is running!"
dnf install -y nginx
systemctl enable --now nginx
echo "<h1>Built by kldload</h1>" > /usr/share/nginx/html/index.html
# Signal completion
touch /root/.postinstall_done
poweroff
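One practical refinement: a bootstrap service can fire more than once, so it pays to guard the script with the same completion flag it writes. A minimal sketch, with the flag path taken from the example above (the `DONE_FLAG` override and `postinstall_main` name are illustrative, not part of kldload):

```shell
#!/bin/bash
# Idempotence guard for a postinstaller: do nothing if a previous
# run already dropped the completion flag.
postinstall_main() {
    local flag=${DONE_FLAG:-/root/.postinstall_done}
    if [ -e "$flag" ]; then
        echo "postinstall already completed, skipping"
        return 0
    fi
    # ... the real provisioning work goes here ...
    touch "$flag"
    echo "postinstall finished"
}
```

This also pairs naturally with a `ConditionPathExists=!/root/.postinstall_done` line on the systemd unit that invokes the script.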
Understanding chroots
What a chroot actually is
A chroot ("change root") makes a directory look like the root filesystem to a process.
When kldload installs to /target, it creates a complete Linux system there.
Then it does chroot /target to run commands inside that system
as if it were booted.
This is how kldload installs packages, builds DKMS modules, and rebuilds the initramfs without ever booting the target system. It's the same technique every Linux installer uses — from Debian's debootstrap to Arch's pacstrap to Red Hat's anaconda.
# The installer creates a complete system at /target
debootstrap trixie /target https://deb.debian.org/debian
# Mount system filesystems so chroot commands work
mount --bind /dev /target/dev
mount --bind /proc /target/proc
mount --bind /sys /target/sys
# Now run commands "inside" the target system
chroot /target apt-get install -y nginx
chroot /target systemctl enable nginx
# When done, unmount and the target is a complete, bootable system
umount /target/sys /target/proc /target/dev
The darksite pattern: baking everything in
What is a darksite?
A "darksite" is an air-gapped deployment — no internet, no upstream repos, no cloud APIs. Everything the system needs must be baked into the ISO or carried on the USB drive. This includes:
- APT/DNF packages — a complete local repository snapshot
- Container images — OCI tarballs loaded into containerd/Docker on first boot
- Ansible playbooks — the entire orchestration tree
- Helm charts — bundled for offline Kubernetes deployments
- TLS certificates — pre-generated PKI for etcd, API server, etc.
- WireGuard keys — hub keypairs for mesh networking
- Configuration files — per-node or per-role configs baked in
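For the package entries in that list, the postinstaller can point dnf at a repository carried inside the payload instead of the internet. A sketch of such a repo file (the repo id and payload path are assumptions; the repo metadata itself would be generated at ISO build time, e.g. with createrepo_c):

```ini
# /etc/yum.repos.d/darksite.repo — dropped into place by postinstall.sh
[darksite]
name=Darksite local repository
baseurl=file:///root/darksite/repo
enabled=1
gpgcheck=0
```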
Payload directory structure
payload/darksite/
├── postinstall.sh # Entry point — runs on first boot
├── apply.py # Cluster convergence orchestrator
├── cluster-seed/
│ └── peers.json # Node inventory (WG IPs, roles)
├── ansible/
│ ├── ansible.cfg # Ansible configuration
│ ├── site.yml # Main playbook (imports all roles)
│ ├── group_vars/ # Per-group configuration
│ ├── host_vars/ # Per-host configuration
│ ├── roles/ # Role implementations
│ │ ├── etcd_cluster/ # etcd setup + PKI
│ │ ├── k8s_pkgs/ # kubelet, kubeadm, kubectl
│ │ ├── kubeadm_init/ # Control plane initialization
│ │ ├── kubeadm_join_cp/ # Join additional control planes
│ │ ├── kubeadm_join_worker/ # Join workers
│ │ ├── lb_haproxy/ # Load balancer config
│ │ ├── prometheus_config/ # Monitoring
│ │ ├── helm/ # Helm 3 bootstrap
│ │ └── ingress_nginx/ # Ingress controller
│ └── artifacts/
│ ├── etcd-pki/ # Pre-generated etcd certificates
│ ├── join_cp.sh # Control plane join script
│ └── join_worker.sh # Worker join script
├── helm/
│ └── bootstrap.sh # Helm chart installation
└── systemd/
├── darksite-apply.service # Runs apply.py on first boot
└── darksite-wg-reflector.* # WireGuard peer sync
This entire tree gets embedded in the ISO. On first boot, postinstall.sh unpacks it
and the system bootstraps itself from the payload. No internet. No external dependencies.
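The schema of peers.json isn't fixed by the tool; a plausible shape for the inventory the orchestrator reads might be (all values illustrative):

```json
{
  "expected_count": 4,
  "nodes": [
    { "name": "master-01", "role": "control-plane", "wg_ip": "10.78.0.1" },
    { "name": "worker-01", "role": "worker", "wg_ip": "10.78.0.11" },
    { "name": "worker-02", "role": "worker", "wg_ip": "10.78.0.12" },
    { "name": "worker-03", "role": "worker", "wg_ip": "10.78.0.13" }
  ]
}
```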
Example: Building a Kubernetes cluster from postinstall.sh
Here's the real-world pattern. One master node, multiple workers. Each gets a customized ISO with a role-specific postinstaller. The master opens a WireGuard enrollment window, workers connect and register, then Ansible converges the cluster.
Step 1: The master postinstaller
postinstall-master.sh
#!/bin/bash
set -euo pipefail
# ── Phase 1: Base packages ──
dnf install -y python3 wireguard-tools nftables salt-master chrony
# ── Phase 2: WireGuard hub (star topology) ──
# Master is the center. Every worker connects back to us.
umask 077 # keep the WireGuard private key root-only
wg genkey | tee /etc/wireguard/wg1.key | wg pubkey > /etc/wireguard/wg1.pub
cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = 10.78.0.1/16
ListenPort = 51821
PrivateKey = $(cat /etc/wireguard/wg1.key)
# Peers added dynamically during enrollment
EOF
systemctl enable --now wg-quick@wg1
# ── Phase 3: Export hub metadata ──
# Workers fetch this to know how to reach us
cat > /srv/wg/hub.env <<EOF
HUB_LAN=$(hostname -I | awk '{print $1}')
WG1_PUB=$(cat /etc/wireguard/wg1.pub)
WG1_PORT=51821
WG1_NET=10.78.0.0/16
EOF
# ── Phase 4: Enrollment window ──
# Workers can add themselves as peers during this window
touch /srv/wg/ENROLL_ENABLED
# ── Phase 5: Salt master ──
systemctl enable --now salt-master
# ── Phase 6: Wait for workers, then converge ──
# This runs as a systemd service (darksite-apply.service)
# apply.py waits for all minions, then runs Ansible
touch /root/.postinstall_done
Step 2: The worker postinstaller
postinstall-worker.sh
#!/bin/bash
set -euo pipefail
# ── Phase 1: Base packages ──
dnf install -y wireguard-tools salt-minion prometheus-node-exporter
# ── Phase 2: Read hub metadata (baked into ISO) ──
# hub.env carries HUB_LAN, WG1_PUB, WG1_PORT, WG1_NET; MY_WG1_IP comes
# from the per-node seed written into this worker's ISO at build time
source /root/darksite/cluster-seed/hub.env
# ── Phase 3: WireGuard spoke (connect back to hub) ──
umask 077 # keep the WireGuard private key root-only
wg genkey | tee /etc/wireguard/wg1.key | wg pubkey > /etc/wireguard/wg1.pub
cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = ${MY_WG1_IP}/32
PrivateKey = $(cat /etc/wireguard/wg1.key)
[Peer]
PublicKey = ${WG1_PUB}
Endpoint = ${HUB_LAN}:${WG1_PORT}
AllowedIPs = ${WG1_NET}
PersistentKeepalive = 25
EOF
systemctl enable --now wg-quick@wg1
# ── Phase 4: Auto-enroll with hub ──
# SSH to master and register our WireGuard public key
ssh -o StrictHostKeyChecking=no -i /root/darksite/enroll_key \
root@${HUB_LAN} "wg-add-peer $(cat /etc/wireguard/wg1.pub) ${MY_WG1_IP} wg1"
# ── Phase 5: Salt minion (points to master) ──
echo "master: ${HUB_LAN}" > /etc/salt/minion.d/master.conf
systemctl enable --now salt-minion
touch /root/.postinstall_done
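The wg-add-peer helper the worker invokes over SSH isn't shown in the payload. A minimal sketch of what it might do on the master — the path overrides exist purely so the logic is testable, and a live version would also run `wg syncconf` so the peer takes effect without restarting the interface:

```shell
#!/bin/bash
# Hypothetical wg-add-peer: enroll a worker's public key as a peer,
# but only while the master's enrollment window is open.
wg_add_peer() {
    local pubkey=$1 wg_ip=$2 iface=${3:-wg1}
    local conf="${WG_CONF_DIR:-/etc/wireguard}/${iface}.conf"
    # Refuse enrollment once the master closes the window
    if [ ! -e "${ENROLL_FLAG:-/srv/wg/ENROLL_ENABLED}" ]; then
        echo "enrollment window closed" >&2
        return 1
    fi
    # Append the worker as a peer stanza
    cat >> "$conf" <<EOF

[Peer]
PublicKey = ${pubkey}
AllowedIPs = ${wg_ip}/32
EOF
    echo "enrolled ${wg_ip} on ${iface}"
}
```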
Step 3: The convergence orchestrator
apply.py — runs on the master after all workers are enrolled
# Simplified convergence flow (pseudocode):
# 1. Wait for all Salt minions to check in
while minion_count() < expected_count:
    run("salt '*' test.ping")
    sleep(3)
# 2. Push SSH keys for Ansible
run("salt '*' cmd.run 'mkdir -p /home/ansible/.ssh'")
# ... distribute ansible user's pubkey to all nodes
# 3. Run Ansible playbook
run("ansible-playbook /srv/ansible/site.yml")
# Ansible runs in order:
# 00_preflight.yml → verify connectivity
# 02_common.yml → kernel tuning, base packages
# 03_containerd.yml → container runtime
# 04_k8s_packages.yml → kubelet, kubeadm, kubectl
# 05_etcd_pki.yml → distribute pre-generated certs
# 06_etcd_cluster.yml → bootstrap 3-node etcd
# 07_loadbalancers.yml → HAProxy for API server VIP
# 08_cp_init.yml → kubeadm init (first control plane)
# 09_cp_join.yml → join remaining control planes
# 10_worker_join.yml → join workers
# 11_cilium.yml → CNI networking
# 12_monitoring.yml → Prometheus + Grafana
# 13_helm.yml → Helm 3
# 99_verify.yml → kubectl get nodes
The entire Ansible tree is baked into the ISO. No git clone. No downloading roles. The payload directory contains every playbook, role, template, and certificate. The cluster converges from local files.
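The wait loop is the only non-trivial control flow in that sketch. Pulled out as a testable shell helper — the `probe` command stands in for counting `salt '*' test.ping` responses, and all names here are illustrative:

```shell
#!/bin/bash
# wait_for_minions EXPECTED PROBE [INTERVAL] [TIMEOUT]
# Polls PROBE (a command that prints the current minion count) until it
# reports at least EXPECTED, or gives up after TIMEOUT seconds.
wait_for_minions() {
    local expected=$1 probe=$2 interval=${3:-3} timeout=${4:-600}
    local waited=0 count
    while :; do
        count=$("$probe")
        if [ "$count" -ge "$expected" ]; then
            echo "$count"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "only ${count}/${expected} minions after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```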
The two-poweroff pattern
Why the system powers off twice
Boot 1: ISO installer
├── Preseed-driven install (no prompts)
├── Late command copies darksite payload to /target
├── Enables bootstrap.service
└── POWEROFF ← first poweroff (installer done)
Boot 2: From disk (ISO ejected)
├── bootstrap.service runs postinstall.sh
├── Packages installed, WireGuard configured
├── Salt minion registered
└── POWEROFF ← second poweroff (postinstall done)
Boot 3: Production
├── All services running
├── WireGuard mesh active
├── Salt connected to master
└── Ready for Ansible convergence
This separation is deliberate. The first poweroff proves the base install worked. The second poweroff proves the postinstaller worked. The third boot is production. Each phase is independently verifiable. If any phase fails, you know exactly where.
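The glue between the first and second poweroff is the bootstrap unit the installer enables. The real unit file isn't shown here; a minimal sketch of what it might contain, using the completion flag the postinstall scripts already write:

```ini
# bootstrap.service — illustrative sketch; paths are assumptions
[Unit]
Description=Darksite first-boot postinstaller
After=network-online.target
Wants=network-online.target
ConditionPathExists=!/root/.postinstall_done

[Service]
Type=oneshot
ExecStart=/root/darksite/postinstall.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

The ConditionPathExists guard means boot 3 silently skips the unit: the flag exists, so systemd never starts it again.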
Snapshot, clone, and replicate
The golden image pattern
Once you have a working system (post-postinstall), snapshot it. That snapshot becomes your golden image. Clone it for every new node. Each clone takes milliseconds and uses zero extra space.
# After postinstall completes, snapshot the golden state
zfs snapshot rpool/ROOT/kldload-node@golden
# Clone for each new node (instant, zero space)
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-01
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-02
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-03
# Or replicate to another machine
zfs send rpool/ROOT/kldload-node@golden | ssh kvm-host zfs recv tank/golden/worker
# Or receive a local copy under the VM tree for KVM use
# (a full send of a filesystem recreates a filesystem; start from a
# zvol golden image if the VM needs a block device)
zfs send rpool/ROOT/kldload-node@golden | zfs recv rpool/vms/worker-01
# Boot as a VM — instant deployment
Firecracker microVMs: 100 instances in seconds
From ZFS clone to microVM in 125ms
Firecracker is Amazon's microVM hypervisor. It can boot a minimal Linux kernel in around 125 milliseconds with less than 5 MiB of memory overhead per VM. Combined with ZFS clones, you can spray hundreds of isolated microVMs across a machine in seconds.
#!/bin/bash
# spray-microvms.sh — deploy 100 microVMs from a golden image
GOLDEN="rpool/vms/golden-microvm"
KERNEL="/srv/firecracker/vmlinux" # uncompressed ELF kernel — Firecracker on x86_64 cannot boot a compressed vmlinuz
COUNT=100
# Snapshot the golden image once
zfs snapshot "${GOLDEN}@base"
for i in $(seq 1 $COUNT); do
VM_NAME="micro-$(printf '%03d' $i)"
# Clone the golden image (instant, zero space)
zfs clone "${GOLDEN}@base" "rpool/vms/${VM_NAME}"
# Get the zvol device path
ROOTFS="/dev/zvol/rpool/vms/${VM_NAME}"
# Firecracker takes a JSON machine config, not rich CLI flags.
# Each VM also needs its own pre-created tap device (tap setup not shown).
cat > "/tmp/${VM_NAME}.json" <<EOF
{
  "boot-source": {
    "kernel_image_path": "${KERNEL}",
    "boot_args": "console=ttyS0 root=/dev/vda ro"
  },
  "drives": [{
    "drive_id": "rootfs",
    "path_on_host": "${ROOTFS}",
    "is_root_device": true,
    "is_read_only": false
  }],
  "machine-config": { "vcpu_count": 1, "mem_size_mib": 128 },
  "network-interfaces": [{
    "iface_id": "eth0",
    "guest_mac": "AA:FC:00:00:00:$(printf '%02x' $i)",
    "host_dev_name": "tap${i}"
  }]
}
EOF
# Launch the microVM from its config file
firecracker --no-api --config-file "/tmp/${VM_NAME}.json" &
echo "Launched ${VM_NAME}"
done
echo "Deployed ${COUNT} microVMs from golden image"
# Total time: ~15 seconds for 100 VMs
# Total extra disk: ~0 until VMs diverge (CoW)
Hardware as a Service: the cron job pattern
Sell your hardware by the hour
# crontab -e
# Customer A: 6am-2pm
0 6 * * * /usr/local/bin/deploy-customer-a.sh
0 14 * * * /usr/local/bin/teardown-customer-a.sh
# Customer B: 2pm-10pm
0 14 * * * /usr/local/bin/deploy-customer-b.sh
0 22 * * * /usr/local/bin/teardown-customer-b.sh
# Nightly maintenance: 10pm-6am
0 22 * * * /usr/local/bin/scrub-and-snapshot.sh
#!/bin/bash
# deploy-customer-a.sh
# Clone from golden image (instant)
for i in $(seq 1 20); do
zfs clone rpool/golden/customer-a@latest rpool/vms/cust-a-$(printf '%02d' $i)
done
# Boot all VMs
for vm in /dev/zvol/rpool/vms/cust-a-*; do
virsh start "$(basename $vm)" &
done
echo "Customer A environment live — 20 VMs deployed"
#!/bin/bash
# teardown-customer-a.sh
TS=$(date +%Y%m%d-%H%M)
# Snapshot each clone for billing/audit (zfs send the snapshots to an
# archive dataset if they must outlive the destroy below)
for ds in $(zfs list -H -o name | grep '^rpool/vms/cust-a-'); do
zfs snapshot "${ds}@teardown-${TS}"
done
# Destroy all VMs (instant — CoW means no disk cleanup)
for vm in $(virsh list --all --name | grep cust-a); do
virsh destroy "$vm" 2>/dev/null
virsh undefine "$vm" --nvram 2>/dev/null
done
# Destroy clones and their snapshots (instant — only divergent blocks freed)
for ds in $(zfs list -H -o name | grep '^rpool/vms/cust-a-'); do
zfs destroy -r "$ds"
done
echo "Customer A environment torn down"
Infrastructure as Code — baked in, not bolted on
The payload IS the infrastructure
Traditional IaC pulls code from git at deploy time. Darksite IaC bakes it into the artifact. The ISO contains the complete Ansible tree:
- Roles — etcd, containerd, kubeadm, haproxy, prometheus, helm, ingress
- Group vars — per-role configuration (k8s versions, network CIDRs, feature flags)
- Host vars — per-node IPs, roles, WireGuard addresses
- Artifacts — pre-generated PKI certificates, join scripts, helm charts
- Templates — Jinja2 templates for haproxy.cfg, prometheus.yml, etcd.conf, kubeadm-config
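A group_vars file in such a tree might pin the knobs Ansible needs, for example (all values illustrative, not taken from the real payload):

```yaml
# group_vars/all.yml — illustrative values only
kubernetes_version: "1.29.4"
pod_network_cidr: "10.244.0.0/16"
service_cidr: "10.96.0.0/12"
apiserver_vip: "10.78.0.100"
etcd_nodes:
  - 10.78.0.1
  - 10.78.0.2
  - 10.78.0.3
```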
Nothing is downloaded at deploy time. The playbook runs against local files. The certificates are pre-generated. The container images are pre-pulled. The artifact is the deployment.
The drop-off points
A postinstaller has natural "drop-off points" where you can stop and use the system as-is, or continue adding more layers. Each point is a valid, working system.
Want a plain configured server? Stop after the dnf install in postinstall.sh. Snapshot. Done. Want a cluster? Keep layering.
None of this is magic: postinstall.sh is bash. Ansible roles are YAML. Kubernetes is kubeadm init.
Firecracker is a single binary. ZFS clones are one command.
You can audit every step. You can modify every step. You can build every step yourself.
That's the point.