Custom Postinstallers — turn a bare install into production infrastructure.
A postinstaller is a bash script that runs inside the installed system after kldload finishes the base OS install. It is the bridge between "I have a fresh OS with ZFS on root" and "I have a configured production server." Every customization — packages, services, datasets, users, firewall rules, container images, certificates — lives in the postinstaller. The base install is the platform. The postinstaller is your application layer. They ship together in one ISO.
This guide covers everything: what postinstallers are, how the lifecycle works, how to write robust scripts, 10+ complete working examples for real-world server roles, how to compose and test them, and how to debug when things go wrong. By the end, you will be able to build a single ISO that installs a fully-configured web server, database, Kubernetes node, or anything else — without touching a keyboard after plugging in the USB stick.
This is the most powerful page on the site. Everything else — the ZFS tutorials, the kernel architecture, the security model — is foundation. This is where you build on it. A postinstaller turns kldload from "an installer with nice defaults" into "a deployment platform that produces complete, sealed, offline infrastructure from a USB stick." If you read one page after the basics, read this one.
What postinstallers are
The hook point
kldload installs the base system: kernel, ZFS, bootloader, tools. When it finishes,
it copies everything under live-build/config/includes.chroot/root/darksite/
into the target system at /root/darksite/. If a file called
/root/darksite/postinstall.sh exists on the target, the
kldload-firstboot.service systemd unit runs it on the first real boot.
That script is your entry point. Everything you put in it runs with root privileges
on a freshly-installed, fully-booted system with ZFS on root.
#!/bin/bash
# /root/darksite/postinstall.sh
# Runs after kldload finishes the base install and the system boots for the first time.
# You have: root access, ZFS on root, network (if configured), all base packages.
# You do: whatever you want.
echo "My custom postinstaller is running!"
dnf install -y nginx
systemctl enable --now nginx
echo "<h1>Built by kldload</h1>" > /usr/share/nginx/html/index.html
What a postinstaller is NOT
A postinstaller is not a configuration management tool. It does not run repeatedly. It does not check convergence. It does not connect to an external orchestrator. It runs once, on first boot, from local files. When it finishes, the system is configured. There is no "day 2" step. There is no "run the playbook after deploy." The machine configures itself from its own payload.
If you need ongoing configuration management (Ansible, Salt, Puppet), the postinstaller is where you install and configure those tools. The postinstaller bootstraps the bootstrap. Everything downstream starts from the state it creates.
This is fundamentally different from Ansible, Puppet, or Chef. Those tools run after the machine is deployed, over the network, from an external orchestrator. The postinstaller runs as part of deployment, from local files, and needs no network when its packages are baked into the darksite payload. The machine configures itself. When it comes up, it is done: everything it needs is in the image.
Think about what this means for reproducibility. An Ansible playbook that runs against a machine depends on the state of every package mirror, every GPG key, every template variable at the moment of execution. A postinstaller that runs from baked-in darksite packages depends on nothing external. The image is the artifact. The artifact is the deployment. The deployment is identical every time.
How postinstallers work — the lifecycle
Understanding the postinstaller lifecycle means understanding exactly when your script runs, what state the system is in, and what resources are available. Here is the complete sequence from ISO boot to production-ready system.
The full install timeline
Phase 1: ISO boots (live CentOS Stream 9 environment)
├── Web UI or answers file selects distro, profile, disk
├── kldload-install-target partitions disk, creates ZFS pool
├── Bootstrap installs base OS into /target (dnf/debootstrap/pacstrap)
├── ZFS DKMS built, ZFSBootMenu installed, bootloader configured
├── Darksite payload copied: /root/darksite/ → /target/root/darksite/
├── kldload-firstboot.service enabled in target
└── System reboots from disk (ISO removed)
Phase 2: First boot from disk
├── systemd starts kldload-firstboot.service (runs once, Type=oneshot)
├── Firstboot reads /etc/kldload/install-manifest.env for KLDLOAD_* vars
├── WireGuard planes configured (if hub.env present)
├── /root/darksite/postinstall.sh executes HERE
├── Firstboot service marks itself complete, disables for future boots
└── System is production-ready
Phase 3: Every subsequent boot
├── Normal systemd boot — firstboot does not run again
├── All services configured by postinstaller are running
└── System is in steady state
The critical insight: your postinstaller runs on a fully booted system, not in a chroot. The ZFS pool is imported. The network is up (if configured). Systemd is running. You can start services, create ZFS datasets, download files, and do anything a logged-in root user could do.
Environment variables available inside postinstall.sh
The firstboot service sources /etc/kldload/install-manifest.env before
running your script. These variables are available:
# Core install parameters
KLDLOAD_DISTRO=debian # centos, debian, ubuntu, fedora, rhel, rocky, arch, alpine
KLDLOAD_PROFILE=server # desktop, server, core
KLDLOAD_HOSTNAME=web-prod-01 # hostname set during install
KLDLOAD_DISK=/dev/vda # disk that was installed to
KLDLOAD_TIMEZONE=America/Toronto
KLDLOAD_LOCALE=en_US.UTF-8
# Network configuration
KLDLOAD_NET_METHOD=dhcp # dhcp or static
KLDLOAD_NET_IP=10.0.0.50/24 # if static
KLDLOAD_NET_GW=10.0.0.1 # if static
KLDLOAD_NET_DNS=10.0.0.1 # if static
# Storage
KLDLOAD_STORAGE_MODE=zfs # always zfs (this is kldload)
KLDLOAD_POOL_NAME=rpool # ZFS pool name
# Infrastructure
KLDLOAD_INFRA_MODE=standalone # standalone or cluster
KLDLOAD_CLUSTER_DOMAIN=infra.local
KLDLOAD_KEEP_DARKSITE=0 # 1 = keep darksite packages on target
# Security
KLDLOAD_SECURE_BOOT=0 # 1 if Secure Boot was detected
KLDLOAD_TPM_PRESENT=0 # 1 if TPM was detected
Use these to write postinstallers that adapt to the install configuration.
A script can check KLDLOAD_DISTRO to use the right package manager,
or check KLDLOAD_PROFILE to skip desktop-only steps on a server install.
The manifest file is the contract between the installer and your postinstaller. It tells your script everything about the environment it is running in: which distro, which profile, which disk, which network config. Your script does not need to discover anything. It reads the manifest and acts accordingly. This is what makes postinstallers portable across distros — the same script can check KLDLOAD_DISTRO and branch to dnf, apt, or pacman as needed.
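One way to consume the manifest defensively is to source it when present and then fail early on any variable the script cannot run without. This is a minimal sketch, not something kldload ships: `load_manifest` and `require_vars` are hypothetical helper names, and only the file path and the `KLDLOAD_*` variable names come from the list above.

```shell
#!/bin/bash
# Hypothetical helpers (not part of kldload) for reading the install manifest.

# Source the manifest if it is readable; return 1 when it is missing.
load_manifest() {
  local manifest="${1:-/etc/kldload/install-manifest.env}"
  [[ -r "${manifest}" ]] || return 1
  # shellcheck disable=SC1090
  source "${manifest}"
}

# Return non-zero (with a message per variable) if any named variable
# is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "manifest variable ${name} is not set" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A postinstaller could then start with `load_manifest || echo "no manifest found"` followed by `require_vars KLDLOAD_DISTRO KLDLOAD_POOL_NAME || exit 1`, so a missing value stops the run before any package or dataset work begins.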
Understanding the chroot vs. firstboot distinction
Some parts of the kldload install run in a chroot (the installer reaches
into /target to install packages and configure the bootloader). Your postinstaller
runs on first boot — the system has rebooted from disk, systemd is running,
and the ZFS pool is fully imported. This distinction matters:
- Chroot (during install): No systemd. No network. No running services. Package install and file placement only.
- Firstboot (your postinstaller): Full systemd. Network up. ZFS imported. Services can be started. APIs can be called.
This is why postinstallers can do things the base install cannot: start databases, initialize clusters, generate certificates with running services, pull container images, and register with external systems.
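If a script might ever be run by hand inside a chroot, it can check for a running systemd before attempting service-level steps. The sketch below relies on the fact that systemd creates `/run/systemd/system` only when it is the running init. The function name `booted_with_systemd` and its optional root-prefix argument are illustrative assumptions, added only so the check can be exercised against a scratch directory.

```shell
#!/bin/bash
# Hypothetical guard: detect whether systemd is the running init.
# /run/systemd/system exists only on a system booted with systemd,
# so it is absent inside a plain chroot.

# $1 is an optional root prefix (for testing against a scratch directory).
booted_with_systemd() {
  local root="${1:-}"
  [[ -d "${root}/run/systemd/system" ]]
}

if booted_with_systemd; then
  echo "systemd is running: safe to start services"
else
  echo "no running systemd (chroot or container?): skip service steps"
fi
```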
Writing a postinstaller — structure and best practices
The skeleton
Every postinstaller should follow this structure. The logging, error handling, and completion signaling are not optional — they are what make debugging possible when a deploy fails at 3am.
#!/bin/bash
# postinstall.sh — [describe what this postinstaller builds]
# Runs on first boot via kldload-firstboot.service
set -euo pipefail
# ── Logging ──────────────────────────────────────────────────────────────
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall] %s\n' -1 "$*"; }
die() { log "FATAL: $*"; exit 1; }
log "Postinstaller starting"
log "Distro: ${KLDLOAD_DISTRO:-unknown} Profile: ${KLDLOAD_PROFILE:-unknown}"
# ── Detect package manager ───────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora) PKG="dnf install -y" ;;
debian|ubuntu) PKG="apt-get install -y" ;;
arch) PKG="pacman -S --noconfirm" ;;
alpine) PKG="apk add" ;;
*) die "Unknown distro: ${KLDLOAD_DISTRO}" ;;
esac
# ── Phase 1: Packages ────────────────────────────────────────────────────
log "Phase 1: Installing packages"
$PKG package1 package2 package3
# ── Phase 2: Configuration ───────────────────────────────────────────────
log "Phase 2: Configuring services"
# ... write config files, create users, set permissions ...
# ── Phase 3: ZFS datasets ────────────────────────────────────────────────
log "Phase 3: Creating ZFS datasets"
# ... create application-specific datasets with tuned properties ...
# ── Phase 4: Enable services ─────────────────────────────────────────────
log "Phase 4: Enabling services"
systemctl enable --now service1 service2
# ── Done ─────────────────────────────────────────────────────────────────
log "Postinstaller complete"
Best practices
- Always use set -euo pipefail: stop on first error, catch unset variables, catch pipe failures. A postinstaller that silently continues past errors produces machines that look installed but are broken.
- Log every phase: timestamps, descriptions, and the output of key commands. When something fails, the log is all you have. Make it good.
- Use the distro-detection pattern: check KLDLOAD_DISTRO and branch. One postinstaller should work across all supported distros if possible.
- Create ZFS datasets for application data: do not dump everything into the root dataset. Create rpool/data/postgres, rpool/data/docker, etc. with tuned recordsize and compression. This is the whole point of ZFS on root.
- Make it idempotent: if the postinstaller runs twice (because someone reboots mid-run), it should not break. Check before creating. Use install -m instead of cp. Use systemctl enable (idempotent), not systemctl start alone.
- Pin package versions: if you need PostgreSQL 16, install postgresql16-server, not postgresql-server. The darksite has what you baked in. Be explicit.
- Never hardcode IPs: read them from the manifest or from hostname -I. The same ISO might install on different networks.
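The idempotency advice can be sketched as a small wrapper that records each completed phase in a stamp file and skips it on a later run. Everything here is an assumption for illustration: `run_once` is a hypothetical helper, and the `STATE_DIR` path is not a kldload convention.

```shell
#!/bin/bash
# Hypothetical helper: run a phase only if it has not completed before.
# STATE_DIR is an assumed location, not something kldload defines.
STATE_DIR="${STATE_DIR:-/var/lib/kldload/postinstall-state}"

run_once() {
  local name="$1"; shift
  local stamp="${STATE_DIR}/${name}.done"
  if [[ -e "${stamp}" ]]; then
    echo "phase ${name}: already done, skipping"
    return 0
  fi
  mkdir -p "${STATE_DIR}"
  # Write the stamp only when the phase command succeeds, so a failed
  # phase is retried on the next run.
  "$@" && touch "${stamp}"
}
```

Usage would look like `run_once datasets create_datasets`, where `create_datasets` is a function defined by the postinstaller. A reboot halfway through then resumes at the first phase without a stamp instead of repeating completed work.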
Error handling patterns
#!/bin/bash
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall] %s\n' -1 "$*"; }
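# Trap errors and log context before exiting.
# errtrace (-E) makes the ERR trap fire for failures inside functions too;
# without it, a command failing inside a function exits without being logged.
set -E
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR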
# Retry wrapper for network-dependent operations
retry() {
local max_attempts="${1}"; shift
local delay="${1}"; shift
local attempt=1
while true; do
if "$@"; then return 0; fi
if (( attempt >= max_attempts )); then
log "Command failed after ${max_attempts} attempts: $*"
return 1
fi
log "Attempt ${attempt}/${max_attempts} failed, retrying in ${delay}s: $*"
sleep "${delay}"
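# Plain assignment: ((attempt++)) returns non-zero when the old value is 0,
# which set -e would treat as a failure if the counter ever started at 0
attempt=$((attempt + 1))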
done
}
# Wait for network to be available (DHCP might not be instant)
retry 30 2 ping -c1 -W2 1.1.1.1
# Retry package installs (mirrors might be slow)
retry 3 5 dnf install -y nginx
# Check that a service actually started
systemctl enable --now nginx
sleep 2
systemctl is-active --quiet nginx || {
log "nginx failed to start — check journalctl -u nginx"
journalctl -u nginx --no-pager -n 20 >> "${LOGFILE}"
exit 1
}
The ERR trap with LINENO is the single most important debugging technique for postinstallers. When a deploy fails, the log says "FAILED at line 47: dnf install -y nginx" instead of silence. You know exactly what command failed and where. Combined with set -euo pipefail, this covers the common classes of failure: a command exiting non-zero (set -e), a reference to an unset variable (set -u), and a failure in any stage of a pipeline, not just the last one (pipefail). The retry wrapper handles the reality that networks are unreliable and package mirrors sometimes hiccup. The service check proves the install actually worked, not just that the package manager exited 0.
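To see what each option changes, the short demo below runs tiny snippets in a child bash process and prints their exit status. It is an illustration only; `status_of` is a hypothetical helper written for this demo, and `UNSET_VAR_FOR_DEMO` is assumed not to exist in the environment.

```shell
#!/bin/bash
# Print the exit status of a snippet run in a child bash process,
# so a failing snippet cannot abort this demo.
status_of() {
  local rc=0
  bash -c "$1" >/dev/null 2>&1 || rc=$?
  echo "${rc}"
}

# Without pipefail a pipeline reports only its last command's status.
echo "without pipefail: $(status_of 'false | true')"                   # prints 0

# With pipefail a failure in any stage fails the whole pipeline.
echo "with pipefail:    $(status_of 'set -o pipefail; false | true')"  # prints 1

# set -e stops at the first failing command; "exit 0" is never reached.
echo "set -e:           $(status_of 'set -e; false; exit 0')"          # prints 1

# set -u turns a reference to an unset variable into an error (non-zero).
echo "set -u:           $(status_of 'set -u; echo "$UNSET_VAR_FOR_DEMO"')"
```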
Complete postinstaller examples
Each example below is a complete, working postinstaller that you can drop into
live-build/config/includes.chroot/root/darksite/postinstall.sh and build an ISO.
Every script follows the same structure: logging, package detection, ZFS dataset creation with
tuned properties, package installation, configuration, service enablement. Copy the one closest
to your use case and modify it.
1. Web server — nginx + Let's Encrypt + ZFS-optimized config
postinstall-webserver.sh
#!/bin/bash
# Postinstaller: Production web server with nginx, certbot, and ZFS-tuned storage
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:webserver] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
log "Starting web server postinstaller on ${KLDLOAD_DISTRO:-centos}"
# ── Phase 1: ZFS datasets ────────────────────────────────────────────────
# Web content: small files, high compression, frequent reads
zfs create -o recordsize=16K \
-o compression=zstd \
-o atime=off \
-o primarycache=all \
rpool/data/www
log "Created rpool/data/www (recordsize=16K, zstd, atime=off)"
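# Logs: sequential writes, large records, aggressive compression
# The parent dataset rpool/data/logs must exist first; -p makes this a no-op if it already does
zfs create -p rpool/data/logs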
zfs create -o recordsize=128K \
-o compression=zstd-3 \
-o atime=off \
-o logbias=throughput \
rpool/data/logs/nginx
log "Created rpool/data/logs/nginx (recordsize=128K, throughput-biased)"
# TLS certificates: tiny files, so a small recordsize and no compression
zfs create -o recordsize=4K \
-o compression=off \
rpool/data/certs
chmod 700 /data/certs
log "Created rpool/data/certs"
# ── Phase 2: Install packages ────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf install -y nginx certbot python3-certbot-nginx logrotate
;;
debian|ubuntu)
apt-get update
apt-get install -y nginx certbot python3-certbot-nginx logrotate
;;
arch)
pacman -S --noconfirm nginx certbot certbot-nginx logrotate
;;
esac
log "Packages installed"
# ── Phase 3: nginx configuration ─────────────────────────────────────────
cat > /etc/nginx/nginx.conf <<'NGINX'
user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /data/logs/nginx/error.log warn;
pid /run/nginx.pid;
events {
worker_connections 4096;
multi_accept on;
use epoll;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Logging
log_format main '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'$request_time $upstream_response_time';
access_log /data/logs/nginx/access.log main buffer=64k flush=5s;
# Performance
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 1000;
types_hash_max_size 2048;
client_max_body_size 64m;
# Compression
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 4;
gzip_types text/plain text/css application/json application/javascript
text/xml application/xml application/xml+rss text/javascript;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
# Default server
server {
listen 80 default_server;
listen [::]:80 default_server;
server_name _;
root /data/www/default;
index index.html;
location /.well-known/acme-challenge/ {
root /data/www/certbot;
}
}
include /etc/nginx/conf.d/*.conf;
}
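NGINX
# The config above assumes the RPM user name. Debian/Ubuntu run nginx as www-data, Arch as http.
case "${KLDLOAD_DISTRO:-centos}" in
debian|ubuntu) sed -i 's/^user nginx;/user www-data;/' /etc/nginx/nginx.conf ;;
arch) sed -i 's/^user nginx;/user http;/' /etc/nginx/nginx.conf ;;
esac
# Create default site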
mkdir -p /data/www/default /data/www/certbot
cat > /data/www/default/index.html <<'HTML'
<!DOCTYPE html>
<html><head><title>kldload web server</title></head>
<body><h1>kldload web server is running</h1>
<p>nginx on ZFS. Configured by postinstaller.</p></body></html>
HTML
log "nginx configured with ZFS-backed document root and logs"
# ── Phase 4: Certbot renewal timer ───────────────────────────────────────
cat > /etc/systemd/system/certbot-renew.timer <<'UNIT'
[Unit]
Description=Certbot renewal timer
[Timer]
OnCalendar=*-*-* 02:00:00
RandomizedDelaySec=3600
Persistent=true
[Install]
WantedBy=timers.target
UNIT
cat > /etc/systemd/system/certbot-renew.service <<'UNIT'
[Unit]
Description=Certbot renewal
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot renew --quiet --deploy-hook "systemctl reload nginx"
UNIT
systemctl daemon-reload
systemctl enable certbot-renew.timer
log "Certbot renewal timer configured"
# ── Phase 5: Firewall ────────────────────────────────────────────────────
cat > /etc/nftables.conf <<'NFT'
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
iif "lo" accept
ct state established,related accept
ct state invalid drop
tcp dport { 22, 80, 443 } accept
icmp type echo-request accept
icmpv6 type { echo-request, nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
NFT
systemctl enable --now nftables
log "Firewall configured (SSH, HTTP, HTTPS)"
# ── Phase 6: Snapshot schedule for web content ──────────────────────────
cat > /etc/cron.d/zfs-www-snapshots <<'CRON'
# Hourly snapshots of web content, keep 48
0 * * * * root zfs snapshot rpool/data/www@auto-$(date +\%Y\%m\%d-\%H\%M) 2>/dev/null
5 * * * * root zfs list -t snapshot -o name -H rpool/data/www | head -n -48 | xargs -r -n1 zfs destroy 2>/dev/null
CRON
log "ZFS snapshot schedule configured for /data/www"
# ── Phase 7: Start nginx ─────────────────────────────────────────────────
# die() is not defined in this script, so log and exit explicitly
nginx -t || { log "nginx config test failed"; exit 1; }
systemctl enable --now nginx
log "nginx started and enabled"
log "Web server postinstaller complete"
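The retention logic in the snapshot cron job depends on `head -n -48`, which needs GNU head and an oldest-first listing. As a hedged alternative, the sketch below does the same selection in bash. `snapshots_to_prune` is a hypothetical helper, and sorting with `zfs list -s creation` is an assumption about how the list would be produced.

```shell
#!/bin/bash
# Hypothetical helper: read snapshot names on stdin (oldest first) and
# print the ones that fall outside the newest KEEP entries.
snapshots_to_prune() {
  local keep="$1"
  local -a names
  mapfile -t names
  local excess=$(( ${#names[@]} - keep ))
  # Nothing to prune when the list is no longer than KEEP.
  (( excess > 0 )) || return 0
  printf '%s\n' "${names[@]:0:excess}"
}

# Intended use (not executed here, needs a real pool):
#   zfs list -H -t snapshot -o name -s creation rpool/data/www \
#     | snapshots_to_prune 48 | xargs -r -n1 zfs destroy
```

Keeping the selection in a function makes it easy to dry-run: pipe the listing through `snapshots_to_prune 48` on its own and inspect the names before anything is destroyed.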
2. Database server — PostgreSQL on ZFS with tuned recordsize
postinstall-postgres.sh
#!/bin/bash
# Postinstaller: PostgreSQL 16 on ZFS with production tuning
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:postgres] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
log "Starting PostgreSQL postinstaller on ${KLDLOAD_DISTRO:-centos}"
# ── Phase 1: ZFS datasets ────────────────────────────────────────────────
# PostgreSQL data: 8K recordsize matches PG page size exactly
# This is THE most important tuning for PG on ZFS — mismatched
# recordsize causes write amplification that kills performance
zfs create -o recordsize=8K \
-o compression=lz4 \
-o atime=off \
-o primarycache=all \
-o logbias=latency \
-o redundant_metadata=most \
rpool/data/postgres
log "Created rpool/data/postgres (recordsize=8K — matches PG page size)"
# WAL: sequential writes, 8K records (WAL segment = 16MB of 8K pages)
zfs create -o recordsize=8K \
-o compression=lz4 \
-o atime=off \
-o logbias=latency \
-o primarycache=metadata \
rpool/data/postgres/wal
log "Created rpool/data/postgres/wal"
# Backups: large sequential reads/writes, max compression
zfs create -o recordsize=1M \
-o compression=zstd-7 \
-o atime=off \
rpool/data/postgres/backups
log "Created rpool/data/postgres/backups"
# ── Phase 2: Install PostgreSQL ──────────────────────────────────────────
PGDATA="/data/postgres/data"
PGWAL="/data/postgres/wal"
PGUSER="postgres"
# PGBIN: directory holding initdb. Versioned packages keep it off the default PATH.
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky)
# Assumes the PGDG packages (postgresql16-*) are available, e.g. baked into the darksite
dnf install -y postgresql16-server postgresql16-contrib
PGBIN="/usr/pgsql-16/bin"
;;
fedora)
dnf install -y postgresql-server postgresql-contrib
PGBIN="/usr/bin"
;;
debian|ubuntu)
apt-get update
apt-get install -y postgresql-16 postgresql-contrib-16
PGBIN="/usr/lib/postgresql/16/bin"
;;
arch)
pacman -S --noconfirm postgresql
PGBIN="/usr/bin"
;;
*)
log "Unsupported distro: ${KLDLOAD_DISTRO:-unknown}"; exit 1
;;
esac
log "PostgreSQL installed"
# ── Phase 3: Initialize database ─────────────────────────────────────────
# PGDATA is a plain directory inside the rpool/data/postgres dataset; create it before chmod
mkdir -p "${PGDATA}"
chown -R "${PGUSER}:${PGUSER}" /data/postgres
chmod 700 /data/postgres/data /data/postgres/wal
sudo -u "${PGUSER}" "${PGBIN}/initdb" \
--pgdata="${PGDATA}" \
--waldir="${PGWAL}" \
--encoding=UTF8 \
--locale="${KLDLOAD_LOCALE:-en_US.UTF-8}" \
--auth-local=peer \
--auth-host=scram-sha-256
log "Database initialized at ${PGDATA}, WAL at ${PGWAL}"
# ── Phase 4: Production tuning ───────────────────────────────────────────
# Calculate shared_buffers as 25% of RAM (standard PG recommendation)
# PostgreSQL unit suffixes are case-sensitive: kilobytes must be written "kB", not "KB"
TOTAL_RAM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
SHARED_BUFFERS=$((TOTAL_RAM_KB / 4))kB
EFFECTIVE_CACHE=$((TOTAL_RAM_KB * 3 / 4))kB
WORK_MEM=$((TOTAL_RAM_KB / 256))kB
cat >> "${PGDATA}/postgresql.conf" <<PGCONF
# ── kldload postinstaller tuning ─────────────────────────────────────
# Memory (auto-calculated from ${TOTAL_RAM_KB} kB total RAM)
shared_buffers = ${SHARED_BUFFERS}
effective_cache_size = ${EFFECTIVE_CACHE}
work_mem = ${WORK_MEM}
maintenance_work_mem = $((TOTAL_RAM_KB / 16))kB
# ZFS-specific: PG data checksums stay off (initdb was run without --data-checksums) because ZFS checksums every block
# ZFS-specific: disable full_page_writes (ZFS is CoW, so no torn pages)
full_page_writes = off
# WAL
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
archive_mode = off
# Connections
listen_addresses = '*'
max_connections = 200
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = '/data/postgres/data/log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_min_duration_statement = 1000
log_checkpoints = on
log_lock_waits = on
# Performance
random_page_cost = 1.1
effective_io_concurrency = 200
PGCONF
# Allow remote connections
echo "host all all 0.0.0.0/0 scram-sha-256" >> "${PGDATA}/pg_hba.conf"
log "PostgreSQL tuned for ZFS (full_page_writes=off, recordsize=8K)"
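# ── Phase 5: Override systemd unit to use our PGDATA ────────────────────
# PGDG packages on CentOS/RHEL/Rocky name the unit postgresql-16; Fedora uses postgresql.
# Debian/Ubuntu (postgresql@16-main) and Arch (PGROOT) units typically do not read PGDATA,
# so they need a distro-specific override instead of this drop-in.
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky) PGSERVICE="postgresql-16" ;;
*) PGSERVICE="postgresql" ;;
esac
mkdir -p "/etc/systemd/system/${PGSERVICE}.service.d"
cat > "/etc/systemd/system/${PGSERVICE}.service.d/override.conf" <<OVERRIDE
[Service]
Environment=PGDATA=${PGDATA}
OVERRIDE
systemctl daemon-reload
systemctl enable --now "${PGSERVICE}"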
log "PostgreSQL started on ZFS-backed storage"
# ── Phase 6: Backup cron ─────────────────────────────────────────────────
cat > /etc/cron.d/pg-backup <<'CRON'
# Nightly ZFS snapshot + pg_dump
0 2 * * * postgres pg_dumpall | gzip > /data/postgres/backups/full-$(date +\%Y\%m\%d).sql.gz 2>/dev/null
5 2 * * * root zfs snapshot rpool/data/postgres@nightly-$(date +\%Y\%m\%d) 2>/dev/null
# Keep 30 days of snapshots
10 2 * * * root zfs list -t snapshot -o name -H rpool/data/postgres | grep nightly | head -n -30 | xargs -r -n1 zfs destroy 2>/dev/null
CRON
log "Backup schedule configured (nightly pg_dump + ZFS snapshot)"
log "PostgreSQL postinstaller complete"
The recordsize=8K setting is not a suggestion. PostgreSQL writes 8KB pages. If ZFS uses its default 128KB recordsize, changing one 8KB page forces ZFS to read, modify, and rewrite the whole 128KB record: up to 16x write amplification (128 / 8). On SSDs this burns write endurance. On spinning disks this kills IOPS. With recordsize=8K, one page write is one record write. The full_page_writes=off setting is the other critical ZFS optimization: PostgreSQL writes full pages to the WAL after each checkpoint to protect against torn pages on crash. ZFS is copy-on-write and checksummed, so a record is either fully written or not written at all, and a torn page cannot occur. Disabling full_page_writes can save on the order of 30% of write volume, depending on the workload.
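The memory arithmetic from the tuning phase can be checked in isolation. The sketch below applies the same ratios (1/4, 3/4, 1/256, and 1/16 of total RAM) to a value in kB, the unit /proc/meminfo reports. `pg_memory_settings` is a hypothetical helper name. Note that PostgreSQL unit suffixes are case-sensitive, so the kilobyte suffix is written `kB`.

```shell
#!/bin/bash
# Hypothetical helper: print postgresql.conf memory lines for a given
# amount of RAM in kB, using the ratios from the tuning phase.
pg_memory_settings() {
  local total_kb="$1"
  printf 'shared_buffers = %dkB\n'       $(( total_kb / 4 ))
  printf 'effective_cache_size = %dkB\n' $(( total_kb * 3 / 4 ))
  printf 'work_mem = %dkB\n'             $(( total_kb / 256 ))
  printf 'maintenance_work_mem = %dkB\n' $(( total_kb / 16 ))
}

# Example: a host with 16 GiB of RAM (16777216 kB).
pg_memory_settings 16777216
# shared_buffers = 4194304kB
# effective_cache_size = 12582912kB
# work_mem = 65536kB
# maintenance_work_mem = 1048576kB
```

On that 16 GiB host, work_mem comes out at 64 MiB per sort or hash operation. With max_connections = 200 that is worth reviewing against the expected query mix before shipping the image.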
3. Docker host — Docker + Podman with ZFS storage driver
postinstall-docker.sh
#!/bin/bash
# Postinstaller: Docker + Podman with ZFS storage driver
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:docker] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
log "Starting Docker/Podman postinstaller on ${KLDLOAD_DISTRO:-centos}"
# ── Phase 1: ZFS datasets for container storage ─────────────────────────
# Docker uses ZFS snapshots for image layers — each layer is a snapshot
# This is the most space-efficient storage driver for Docker
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/docker
log "Created rpool/data/docker"
# Separate dataset for container volumes (user data)
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/docker-volumes
log "Created rpool/data/docker-volumes"
# Podman system storage (the zfs driver needs root, so this does not cover rootless containers)
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/containers
log "Created rpool/data/containers"
# ── Phase 2: Install Docker + Podman ─────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo 2>/dev/null || true
dnf install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin \
podman podman-compose
;;
debian|ubuntu)
apt-get update
apt-get install -y docker.io docker-compose podman podman-compose
;;
arch)
pacman -S --noconfirm docker docker-compose podman podman-compose
;;
esac
log "Docker and Podman installed"
# ── Phase 3: Configure Docker to use ZFS storage driver ──────────────────
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'JSON'
{
"storage-driver": "zfs",
"data-root": "/data/docker",
"log-driver": "json-file",
"log-opts": {
"max-size": "50m",
"max-file": "5"
},
"default-ulimits": {
"nofile": { "Name": "nofile", "Hard": 65535, "Soft": 65535 }
},
"live-restore": true,
"userland-proxy": false,
"default-address-pools": [
{ "base": "172.16.0.0/12", "size": 24 }
]
}
JSON
log "Docker configured with ZFS storage driver at /data/docker"
# ── Phase 4: Configure Podman for ZFS ────────────────────────────────────
mkdir -p /etc/containers
cat > /etc/containers/storage.conf <<'CONF'
[storage]
driver = "zfs"
graphroot = "/data/containers"
[storage.options.zfs]
fsname = "rpool/data/containers"
CONF
log "Podman configured with ZFS storage driver"
# ── Phase 5: Enable Docker, add deploy user ──────────────────────────────
systemctl enable --now docker
# Create a non-root deploy user with docker access
useradd -m -s /bin/bash -G docker deploy 2>/dev/null || true
log "Docker started, deploy user created"
# ── Phase 6: Prune timer ─────────────────────────────────────────────────
cat > /etc/systemd/system/docker-prune.timer <<'UNIT'
[Unit]
Description=Weekly Docker prune
[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target
UNIT
cat > /etc/systemd/system/docker-prune.service <<'UNIT'
[Unit]
Description=Docker system prune
[Service]
Type=oneshot
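# --volumes is omitted: docker rejects it in combination with the "until" filter,
# and it would also delete unused data volumes
ExecStart=/usr/bin/docker system prune -af --filter "until=168h"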
UNIT
systemctl daemon-reload
systemctl enable docker-prune.timer
log "Weekly Docker prune timer enabled"
log "Docker/Podman postinstaller complete"
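Docker's zfs storage driver expects its data root to sit on a ZFS filesystem, so a preflight check before starting the daemon can turn a confusing startup failure into a clear log line. This is a sketch under assumptions: `fs_type_of` and `require_fs_type` are hypothetical helpers, and they rely on GNU coreutils `stat -f -c %T`, which prints the filesystem type name.

```shell
#!/bin/bash
# Hypothetical preflight check: confirm a directory sits on the expected
# filesystem before pointing a storage driver at it.

# Print the filesystem type of the given path (GNU stat).
fs_type_of() {
  stat -f -c %T "$1"
}

# Return non-zero with a message if PATH is not on the EXPECTED filesystem.
require_fs_type() {
  local path="$1" expected="$2" actual
  actual="$(fs_type_of "${path}")" || return 1
  if [[ "${actual}" != "${expected}" ]]; then
    echo "${path} is on ${actual}, expected ${expected}" >&2
    return 1
  fi
}
```

In the Docker postinstaller this would be called as `require_fs_type /data/docker zfs || exit 1` just before `systemctl enable --now docker`.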
4. Kubernetes node — kubeadm + containerd + ZFS
postinstall-k8s.sh
#!/bin/bash
# Postinstaller: Kubernetes node (control plane or worker)
# Role is determined by K8S_ROLE variable (defaults to worker)
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:k8s] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
K8S_ROLE="${K8S_ROLE:-worker}"
K8S_VERSION="${K8S_VERSION:-1.30}"
log "Starting Kubernetes ${K8S_ROLE} postinstaller"
# ── Phase 1: Kernel prerequisites ────────────────────────────────────────
cat > /etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat > /etc/sysctl.d/99-k8s.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
EOF
sysctl --system >/dev/null
log "Kernel modules and sysctl configured"
# ── Phase 2: ZFS datasets ────────────────────────────────────────────────
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/containerd
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/kubelet
log "ZFS datasets created for containerd and kubelet"
# ── Phase 3: Install containerd ──────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf install -y containerd.io
;;
debian|ubuntu)
apt-get update
apt-get install -y containerd
;;
esac
# Generate default config and enable SystemdCgroup
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Point containerd at ZFS-backed storage
# Anchor on the top-level key so nested keys ending in "root = " are left alone
sed -i 's|^root = .*|root = "/data/containerd"|' /etc/containerd/config.toml
systemctl enable --now containerd
log "containerd installed and configured with SystemdCgroup"
# ── Phase 4: Install kubeadm, kubelet, kubectl ──────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
cat > /etc/yum.repos.d/kubernetes.repo <<REPO
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/rpm/repodata/repomd.xml.key
REPO
dnf install -y kubelet kubeadm kubectl
;;
debian|ubuntu)
apt-get install -y apt-transport-https ca-certificates curl gpg
# Older releases do not ship /etc/apt/keyrings
mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key" \
| gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" \
> /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
;;
esac
# Point kubelet at ZFS-backed directory
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/override.conf <<'OVERRIDE'
[Service]
Environment="KUBELET_EXTRA_ARGS=--root-dir=/data/kubelet"
OVERRIDE
systemctl daemon-reload
systemctl enable kubelet
log "kubeadm, kubelet, kubectl installed (v${K8S_VERSION})"
# ── Phase 5: Disable swap (Kubernetes requirement) ──────────────────────
swapoff -a 2>/dev/null || true
sed -i '/swap/d' /etc/fstab 2>/dev/null || true
log "Swap disabled"
# ── Phase 6: Initialize or join ──────────────────────────────────────────
if [[ "${K8S_ROLE}" == "control-plane" ]]; then
log "Control plane node — run 'kubeadm init' manually or via automation"
log "Suggested: kubeadm init --pod-network-cidr=10.244.0.0/16"
else
# The script is run with bash, so it only needs to exist, not be executable
if [[ -f /root/darksite/k8s-join.sh ]]; then
log "Worker node: found k8s-join.sh, executing"
bash /root/darksite/k8s-join.sh
log "Joined cluster"
else
log "Worker node: no join command found"
log "Place the join command in /root/darksite/k8s-join.sh (it must be present when this script runs) or run kubeadm join manually"
fi
fi
log "Kubernetes ${K8S_ROLE} postinstaller complete"
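The script above treats any K8S_ROLE value other than control-plane as a worker, so a typo silently produces a worker node. A small validation step, sketched here with the hypothetical helper `normalize_k8s_role`, would reject unknown values instead. The two accepted role names are the ones the script already uses.

```shell
#!/bin/bash
# Hypothetical helper: accept only the two roles the postinstaller knows,
# defaulting to "worker" when no role is given.
normalize_k8s_role() {
  local role="${1:-worker}"
  case "${role}" in
    control-plane|worker)
      echo "${role}"
      ;;
    *)
      echo "unsupported K8S_ROLE: ${role}" >&2
      return 1
      ;;
  esac
}
```

Near the top of the script this could replace the plain default with `K8S_ROLE="$(normalize_k8s_role "${K8S_ROLE:-}")" || exit 1`.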
5. Monitoring stack — Prometheus + Grafana + node_exporter
postinstall-monitoring.sh
#!/bin/bash
# Postinstaller: Prometheus + Grafana + node_exporter monitoring stack
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:monitoring] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
PROM_VERSION="2.51.0"
GRAFANA_VERSION="10.4.1"
log "Starting monitoring stack postinstaller"
# ── Phase 1: ZFS datasets ────────────────────────────────────────────────
# Prometheus TSDB: 128K recordsize for chunk files, high compression
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
-o primarycache=all \
rpool/data/prometheus
log "Created rpool/data/prometheus"
# Grafana: small SQLite database files, so use a small recordsize
zfs create -o recordsize=16K \
-o compression=zstd \
-o atime=off \
rpool/data/grafana
log "Created rpool/data/grafana"
# ── Phase 2: Create service users ────────────────────────────────────────
useradd -r -s /sbin/nologin -d /data/prometheus prometheus 2>/dev/null || true
useradd -r -s /sbin/nologin -d /data/grafana grafana 2>/dev/null || true
# ── Phase 3: Install node_exporter ───────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf install -y golang-github-prometheus-node-exporter
;;
debian|ubuntu)
apt-get update
apt-get install -y prometheus-node-exporter
;;
*)
# Manual install for other distros
curl -fsSL "https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz" \
| tar xz -C /usr/local/bin --strip-components=1 --wildcards '*/node_exporter'
;;
esac
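# Distro packages may name the unit differently (Debian/Ubuntu: prometheus-node-exporter)
systemctl enable --now node_exporter 2>/dev/null \
|| systemctl enable --now prometheus-node-exporter 2>/dev/null || {
# No packaged unit found: create one for the manually installed binary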
cat > /etc/systemd/system/node_exporter.service <<'UNIT'
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
[Service]
Type=simple
User=nobody
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
UNIT
systemctl daemon-reload
systemctl enable --now node_exporter
}
log "node_exporter installed and running on :9100"
# ── Phase 4: Install Prometheus ──────────────────────────────────────────
cd /tmp
curl -fsSL "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz" \
| tar xz
install -m 0755 "prometheus-${PROM_VERSION}.linux-amd64/prometheus" /usr/local/bin/
install -m 0755 "prometheus-${PROM_VERSION}.linux-amd64/promtool" /usr/local/bin/
rm -rf "prometheus-${PROM_VERSION}.linux-amd64"
mkdir -p /etc/prometheus
cat > /etc/prometheus/prometheus.yml <<'YAML'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
  # Add more targets here:
  # - job_name: 'postgres'
  #   static_configs:
  #     - targets: ['db-01:9187']
  #
  # - job_name: 'nginx'
  #   static_configs:
  #     - targets: ['web-01:9113']
YAML
chown -R prometheus:prometheus /data/prometheus /etc/prometheus
cat > /etc/systemd/system/prometheus.service <<'UNIT'
[Unit]
Description=Prometheus
After=network-online.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/data/prometheus \
--storage.tsdb.retention.time=90d \
--web.enable-lifecycle
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
UNIT
systemctl daemon-reload
systemctl enable --now prometheus
log "Prometheus installed (90d retention on ZFS, port 9090)"
# ── Phase 5: Install Grafana ─────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
cat > /etc/yum.repos.d/grafana.repo <<'REPO'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
REPO
dnf install -y grafana
;;
debian|ubuntu)
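apt-get install -y apt-transport-https software-properties-common curl gpg
# /etc/apt/keyrings does not exist on older Debian/Ubuntu releases
mkdir -p -m 0755 /etc/apt/keyrings
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor -o /etc/apt/keyrings/grafana.gpg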
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" \
> /etc/apt/sources.list.d/grafana.list
apt-get update
apt-get install -y grafana
;;
esac
# Point Grafana data at ZFS dataset
sed -i "s|;data = .*|data = /data/grafana|" /etc/grafana/grafana.ini 2>/dev/null || true
chown -R grafana:grafana /data/grafana
# Auto-provision Prometheus as a data source
mkdir -p /etc/grafana/provisioning/datasources
cat > /etc/grafana/provisioning/datasources/prometheus.yaml <<'YAML'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
YAML
systemctl enable --now grafana-server
log "Grafana installed and running on :3000 (default: admin/admin)"
# ── Phase 6: ZFS snapshot schedule for metrics data ─────────────────────
cat > /etc/cron.d/zfs-monitoring-snapshots <<'CRON'
# Daily snapshots of Prometheus and Grafana data
0 4 * * * root zfs snapshot rpool/data/prometheus@daily-$(date +\%Y\%m\%d) 2>/dev/null
0 4 * * * root zfs snapshot rpool/data/grafana@daily-$(date +\%Y\%m\%d) 2>/dev/null
# Keep 30 days
5 4 * * * root for ds in prometheus grafana; do zfs list -t snapshot -o name -H rpool/data/$ds | grep daily | head -n -30 | xargs -r -n1 zfs destroy; done 2>/dev/null
CRON
log "ZFS snapshot schedule configured for monitoring data"
log "Monitoring stack postinstaller complete"
log " Prometheus: http://$(hostname -I | awk '{print $1}'):9090"
log " Grafana: http://$(hostname -I | awk '{print $1}'):3000"
log " Node Exporter: http://$(hostname -I | awk '{print $1}'):9100/metrics"
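The retention line in Phase 6 depends on `zfs list` printing snapshots oldest first and on GNU `head -n -30`. A small sketch of the same "keep the newest N" rule as a testable function follows. The name `prune_candidates` is hypothetical, and the wiring shown in the comment is an untested assumption about how you might connect it to `zfs`.

```bash
#!/bin/bash
set -euo pipefail

# prune_candidates KEEP: read snapshot names (oldest first) on stdin and
# print all but the newest KEEP of them, i.e. the ones a cleanup job may destroy.
prune_candidates() {
    local keep="$1" total i
    local -a snaps=()
    mapfile -t snaps
    total="${#snaps[@]}"
    for (( i = 0; i < total - keep; i++ )); do
        printf '%s\n' "${snaps[i]}"
    done
}

# Possible wiring (not executed here; requires ZFS). -s creation makes the
# oldest-first ordering explicit instead of relying on the default:
#   zfs list -H -t snapshot -o name -s creation rpool/data/prometheus \
#     | grep '@daily-' | prune_candidates 30 | xargs -r -n1 zfs destroy

# Demo with made-up snapshot names: keep the newest two.
printf '%s\n' pool/ds@daily-20240101 pool/ds@daily-20240102 pool/ds@daily-20240103 \
    | prune_candidates 2
```

Keeping the selection logic separate from `zfs destroy` lets you dry-run a retention change by printing the candidates before anything is deleted.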
6. NFS server — ZFS sharenfs + tuned for NFS workloads
postinstall-nfs.sh
#!/bin/bash
# Postinstaller: NFS file server with ZFS sharenfs
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:nfs] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
NFS_SUBNET="${NFS_SUBNET:-10.0.0.0/24}"
log "Starting NFS server postinstaller (allowed subnet: ${NFS_SUBNET})"
# ── Phase 1: Install NFS server ──────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora) dnf install -y nfs-utils ;;
debian|ubuntu) apt-get update && apt-get install -y nfs-kernel-server ;;
arch) pacman -S --noconfirm nfs-utils ;;
esac
log "NFS packages installed"
# ── Phase 2: Create ZFS datasets with sharenfs ──────────────────────────
# General file share: large records for bulk transfers
zfs create -o recordsize=1M \
-o compression=zstd \
-o atime=off \
-o sharenfs="rw=@${NFS_SUBNET},sync,no_subtree_check,no_root_squash" \
rpool/data/share
log "Created rpool/data/share (sharenfs enabled for ${NFS_SUBNET})"
# Home directories: mixed I/O pattern
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=on \
-o sharenfs="rw=@${NFS_SUBNET},sync,no_subtree_check" \
rpool/data/home
log "Created rpool/data/home (sharenfs, with atime for maildir)"
# ISO/image storage: large sequential reads, read-mostly
zfs create -o recordsize=1M \
-o compression=off \
-o atime=off \
-o sharenfs="ro=@${NFS_SUBNET},async,no_subtree_check" \
rpool/data/images
log "Created rpool/data/images (read-only NFS share)"
# ── Phase 3: NFS performance tuning ──────────────────────────────────────
# Increase NFS threads to match ZFS's I/O parallelism
mkdir -p /etc/nfs.conf.d
cat > /etc/nfs.conf.d/local.conf <<'CONF'
[nfsd]
threads = 32
udp = n
tcp = y
vers3 = n
vers4 = y
vers4.1 = y
vers4.2 = y
CONF
log "NFS tuned: 32 threads, TCP only, NFSv3 disabled (NFSv4.0 to 4.2 enabled)"
# ── Phase 4: Enable and start ────────────────────────────────────────────
systemctl enable --now nfs-server
# ZFS exports sharenfs datasets automatically; list the active exports to verify
log "NFS server running, exports:"
exportfs -v
zfs get sharenfs rpool/data/share rpool/data/home rpool/data/images
# ── Phase 5: Snapshot schedule ───────────────────────────────────────────
cat > /etc/cron.d/zfs-nfs-snapshots <<'CRON'
# Hourly snapshots of the writable NFS shares (images is exported read-only)
0 * * * * root for ds in share home; do zfs snapshot rpool/data/$ds@hourly-$(date +\%Y\%m\%d-\%H00) 2>/dev/null; done
# Daily cleanup: keep 48 hourly snapshots
5 0 * * * root for ds in share home; do zfs list -t snapshot -o name -H rpool/data/$ds | grep hourly | head -n -48 | xargs -r -n1 zfs destroy; done 2>/dev/null
CRON
log "Snapshot schedule: hourly, keep 48"
log "NFS server postinstaller complete"
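Every script in this guide branches on `${KLDLOAD_DISTRO:-centos}`, so a missing install manifest silently selects the CentOS branch. A hedged sketch of a fallback follows, assuming the target has a standard os-release file whose `ID` field matches the case labels used here (`debian`, `ubuntu`, `fedora`, `rocky`, `arch`). The function name `detect_distro` is illustrative.

```bash
#!/bin/bash
set -euo pipefail

# detect_distro [OS_RELEASE_FILE]: print KLDLOAD_DISTRO if it is set,
# otherwise the ID field from os-release. Returns 1 if neither is available.
detect_distro() {
    local os_release="${1:-/etc/os-release}"
    if [[ -n "${KLDLOAD_DISTRO:-}" ]]; then
        printf '%s\n' "${KLDLOAD_DISTRO}"
        return 0
    fi
    [[ -r "${os_release}" ]] || return 1
    # Source in a subshell so os-release variables do not leak into the script.
    ( . "${os_release}" && printf '%s\n' "${ID:?}" )
}

# Usage sketch: fail loudly instead of guessing a package manager.
distro="$(detect_distro || echo unknown)"
echo "detected distro: ${distro}"
```

In a postinstaller you could then write `case "${distro}" in ... *) log "unsupported distro"; exit 1 ;; esac` so an unknown platform stops the run instead of skipping the package install.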
7. CI runner — GitLab Runner + Docker-in-Docker on ZFS
postinstall-ci-runner.sh
#!/bin/bash
# Postinstaller: GitLab CI Runner with Docker executor on ZFS
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:ci-runner] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
GITLAB_URL="${GITLAB_URL:-https://gitlab.com}"
RUNNER_TOKEN="${RUNNER_TOKEN:-REPLACE_ME}"
RUNNER_TAGS="${RUNNER_TAGS:-kldload,zfs,docker}"
CONCURRENT_JOBS="${CONCURRENT_JOBS:-4}"
log "Starting CI runner postinstaller"
# ── Phase 1: ZFS datasets ────────────────────────────────────────────────
# Docker storage for build images
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/docker
# Build cache: large layer tarballs, higher compression level than the default
zfs create -o recordsize=1M \
-o compression=zstd-5 \
-o atime=off \
rpool/data/ci-cache
# Build workspace: scratch space, destroyed after each job
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/ci-builds
log "ZFS datasets created for CI workloads"
# ── Phase 2: Install Docker ──────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo 2>/dev/null || true
dnf install -y docker-ce docker-ce-cli containerd.io
;;
debian|ubuntu)
apt-get update
apt-get install -y docker.io
;;
esac
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'JSON'
{
"storage-driver": "zfs",
"data-root": "/data/docker",
"log-driver": "json-file",
"log-opts": { "max-size": "20m", "max-file": "3" }
}
JSON
systemctl enable --now docker
log "Docker installed with ZFS storage driver"
# ── Phase 3: Install GitLab Runner ───────────────────────────────────────
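# The repository setup script differs per package format (rpm vs deb)
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
curl -fsSL "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.rpm.sh" | bash
dnf install -y gitlab-runner
;;
debian|ubuntu)
curl -fsSL "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | bash
apt-get install -y gitlab-runner
;;
esac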
log "GitLab Runner installed"
# ── Phase 4: Register runner ─────────────────────────────────────────────
if [[ "${RUNNER_TOKEN}" != "REPLACE_ME" ]]; then
gitlab-runner register \
--non-interactive \
--url "${GITLAB_URL}" \
--token "${RUNNER_TOKEN}" \
--executor "docker" \
--docker-image "alpine:latest" \
--docker-privileged \
--docker-volumes "/data/ci-cache:/cache" \
--docker-volumes "/data/ci-builds:/builds" \
--tag-list "${RUNNER_TAGS}" \
--run-untagged="true" \
--locked="false"
log "Runner registered with ${GITLAB_URL} (tags: ${RUNNER_TAGS})"
else
log "RUNNER_TOKEN not set — register manually:"
log " gitlab-runner register --url ${GITLAB_URL} --token YOUR_TOKEN"
fi
# ── Phase 5: Configure concurrency ───────────────────────────────────────
sed -i "s/concurrent = .*/concurrent = ${CONCURRENT_JOBS}/" /etc/gitlab-runner/config.toml
systemctl enable --now gitlab-runner
log "Runner configured for ${CONCURRENT_JOBS} concurrent jobs"
# ── Phase 6: Cleanup cron (ZFS makes this cheap) ────────────────────────
cat > /etc/cron.d/ci-cleanup <<'CRON'
# Prune Docker images older than 24h every 6 hours
0 */6 * * * root docker system prune -af --filter "until=24h" >/dev/null 2>&1
# Snapshot CI cache daily
0 3 * * * root zfs snapshot rpool/data/ci-cache@daily-$(date +\%Y\%m\%d) 2>/dev/null
CRON
log "Cleanup schedule configured"
log "CI runner postinstaller complete"
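Phase 5 edits `/etc/gitlab-runner/config.toml` with a bare `sed`, which aborts the script if the file is missing and changes nothing if the file has no `concurrent` line. A minimal sketch of a set-or-insert helper follows. The name `set_concurrent` is hypothetical, and it assumes GNU `sed -i` as found on the distros this guide targets.

```bash
#!/bin/bash
set -euo pipefail

# set_concurrent FILE JOBS: set the global "concurrent" key in a runner
# config, adding it at the top of the file if it is not present yet.
set_concurrent() {
    local file="$1" jobs="$2" tmp
    if ! [[ "${jobs}" =~ ^[1-9][0-9]*$ ]]; then
        echo "invalid job count: ${jobs}" >&2
        return 1
    fi
    if grep -q '^concurrent[[:space:]]*=' "${file}" 2>/dev/null; then
        sed -i "s/^concurrent[[:space:]]*=.*/concurrent = ${jobs}/" "${file}"
    else
        # TOML top-level keys must come before the first table header,
        # so the key is prepended rather than appended.
        tmp="$(mktemp)"
        { printf 'concurrent = %s\n' "${jobs}"; cat "${file}" 2>/dev/null || true; } > "${tmp}"
        cat "${tmp}" > "${file}"
        rm -f "${tmp}"
    fi
}

# Demo on a throwaway file.
demo="$(mktemp)"
printf 'concurrent = 1\ncheck_interval = 0\n' > "${demo}"
set_concurrent "${demo}" 4
cat "${demo}"
rm -f "${demo}"
```

In the postinstaller this would be called as `set_concurrent /etc/gitlab-runner/config.toml "${CONCURRENT_JOBS}"`.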
8. VPN gateway — WireGuard hub + nftables + routing
postinstall-vpn-gateway.sh
#!/bin/bash
# Postinstaller: WireGuard VPN gateway with nftables NAT and routing
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:vpn-gw] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
WG_PORT="${WG_PORT:-51820}"
WG_NET="${WG_NET:-10.200.0.0/24}"
WG_ADDR="${WG_ADDR:-10.200.0.1/24}"
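# Take the interface that follows "dev" in the first default route
WAN_IFACE="$(ip route show default | awk '{for (i = 1; i < NF; i++) if ($i == "dev") {print $(i+1); exit}}')"
[[ -n "${WAN_IFACE}" ]] || { log "No default route found, cannot determine WAN interface"; exit 1; }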
log "Starting VPN gateway postinstaller (WAN: ${WAN_IFACE}, WG net: ${WG_NET})"
# ── Phase 1: Install packages ────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora) dnf install -y wireguard-tools nftables qrencode ;;
debian|ubuntu) apt-get update && apt-get install -y wireguard nftables qrencode ;;
arch) pacman -S --noconfirm wireguard-tools nftables qrencode ;;
esac
log "Packages installed"
# ── Phase 2: Enable IP forwarding ────────────────────────────────────────
cat > /etc/sysctl.d/99-vpn-gateway.conf <<'SYSCTL'
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.proxy_arp = 0
SYSCTL
sysctl --system >/dev/null
log "IP forwarding enabled"
# ── Phase 3: Generate WireGuard keys ─────────────────────────────────────
mkdir -p /etc/wireguard
chmod 700 /etc/wireguard
# umask 077 creates the private key unreadable to others from the start
(umask 077; wg genkey | tee /etc/wireguard/server.key | wg pubkey > /etc/wireguard/server.pub)
chmod 600 /etc/wireguard/server.key
cat > /etc/wireguard/wg0.conf <<EOF
[Interface]
Address = ${WG_ADDR}
ListenPort = ${WG_PORT}
PrivateKey = $(cat /etc/wireguard/server.key)
SaveConfig = false
# Add peers below or use the add-vpn-client script
EOF
chmod 600 /etc/wireguard/wg0.conf
log "WireGuard server key generated, listening on :${WG_PORT}"
# ── Phase 4: nftables firewall + NAT ─────────────────────────────────────
cat > /etc/nftables.conf <<NFT
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
iif "lo" accept
ct state established,related accept
ct state invalid drop
tcp dport 22 accept
udp dport ${WG_PORT} accept
iifname "wg0" accept
icmp type echo-request accept
icmpv6 type { echo-request, nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
}
chain forward {
type filter hook forward priority 0; policy drop;
iifname "wg0" accept
oifname "wg0" ct state established,related accept
}
chain output {
type filter hook output priority 0; policy accept;
}
}
table inet nat {
chain postrouting {
type nat hook postrouting priority 100;
ip saddr ${WG_NET} oifname "${WAN_IFACE}" masquerade
}
}
NFT
systemctl enable --now nftables
log "nftables configured: NAT masquerade on ${WAN_IFACE}, WG traffic forwarded"
# ── Phase 5: Client provisioning script ──────────────────────────────────
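cat > /usr/local/bin/add-vpn-client <<'SCRIPT'
#!/bin/bash
# Usage: add-vpn-client <name>
set -euo pipefail
CLIENT="${1:?Usage: add-vpn-client <name>}"
# Assumed variable: public host:port of this gateway, as seen by clients
WG_ENDPOINT="${WG_ENDPOINT:?set WG_ENDPOINT to the public host:port of this gateway}"
WG_DIR="/etc/wireguard"
CLIENT_DIR="${WG_DIR}/clients/${CLIENT}"
# Find next available IP (grep exits non-zero when there are no peers yet)
LAST_IP=$(grep -oP 'AllowedIPs = 10\.200\.0\.\K[0-9]+' "${WG_DIR}/wg0.conf" 2>/dev/null | sort -n | tail -1 || true)
NEXT_IP=$((${LAST_IP:-1} + 1))
mkdir -p "${CLIENT_DIR}"
(umask 077; wg genkey | tee "${CLIENT_DIR}/private.key" | wg pubkey > "${CLIENT_DIR}/public.key")
# Add peer to server config
# The two heredoc bodies below are a minimal reconstruction, consistent with
# the AllowedIPs pattern matched above; adjust them to your setup.
cat >> "${WG_DIR}/wg0.conf" <<EOF

[Peer]
# ${CLIENT}
PublicKey = $(cat "${CLIENT_DIR}/public.key")
AllowedIPs = 10.200.0.${NEXT_IP}/32
EOF
# Generate client config
cat > "${CLIENT_DIR}/${CLIENT}.conf" <<EOF
[Interface]
PrivateKey = $(cat "${CLIENT_DIR}/private.key")
Address = 10.200.0.${NEXT_IP}/24

[Peer]
PublicKey = $(cat "${WG_DIR}/server.pub")
Endpoint = ${WG_ENDPOINT}
AllowedIPs = 0.0.0.0/0
EOF
SCRIPT
chmod +x /usr/local/bin/add-vpn-client
# ── Phase 6: Start WireGuard ─────────────────────────────────────────────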
systemctl enable --now wg-quick@wg0
log "WireGuard VPN gateway running"
log "Add clients with: add-vpn-client <name>"
log "VPN gateway postinstaller complete"
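One detail of `add-vpn-client` worth testing in isolation is the address allocation. Under `set -o pipefail`, a `grep` that finds no peers exits non-zero, so the lookup has to tolerate an empty config. The sketch below uses `sed -n`, which exits 0 when nothing matches. The function name `next_client_octet` is hypothetical, and the 10.200.0.0/24 range is the guide's default `WG_NET`.

```bash
#!/bin/bash
set -euo pipefail

# next_client_octet CONF: print the next free last octet in 10.200.0.0/24,
# based on the highest "AllowedIPs = 10.200.0.N" found in CONF.
# The server itself holds .1, so the first client gets .2.
next_client_octet() {
    local conf="$1" last next
    # sed prints nothing (and still exits 0) when there are no peers yet.
    last="$(sed -n 's/^AllowedIPs = 10\.200\.0\.\([0-9]\{1,3\}\).*/\1/p' "${conf}" \
            | sort -n | tail -n 1)"
    next=$(( ${last:-1} + 1 ))
    if (( next > 254 )); then
        echo "address pool exhausted" >&2
        return 1
    fi
    printf '%s\n' "${next}"
}

# Demo on a throwaway config: first with no peers, then with one peer at .7.
demo="$(mktemp)"
printf '%s\n' '[Interface]' 'Address = 10.200.0.1/24' > "${demo}"
next_client_octet "${demo}"
printf '%s\n' '[Peer]' 'AllowedIPs = 10.200.0.7/32' >> "${demo}"
next_client_octet "${demo}"
rm -f "${demo}"
```

The allocation is "highest plus one", so addresses freed by removed peers are not reused. That keeps the logic simple at the cost of eventually exhausting the /24.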
9. Mail server — Postfix + Dovecot on ZFS
postinstall-mail.sh
#!/bin/bash
# Postinstaller: Mail server with Postfix (SMTP) + Dovecot (IMAP) on ZFS
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:mail] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
MAIL_DOMAIN="${MAIL_DOMAIN:-$(hostname -d)}"
log "Starting mail server postinstaller (domain: ${MAIL_DOMAIN})"
# ── Phase 1: ZFS datasets ────────────────────────────────────────────────
# Mailboxes: many small files (individual emails), enable atime for IMAP
zfs create -o recordsize=16K \
-o compression=zstd \
-o atime=on \
-o relatime=on \
rpool/data/mail
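# Mail queue: temporary spool, fast writes
# sync=disabled trades durability for speed: mail accepted in the last few
# seconds before a crash or power loss can be lost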
zfs create -o recordsize=64K \
-o compression=lz4 \
-o atime=off \
-o sync=disabled \
rpool/data/mail/queue
# Mail logs
zfs create -o recordsize=128K \
-o compression=zstd-3 \
-o atime=off \
rpool/data/logs/mail
log "ZFS datasets created for mail storage"
# ── Phase 2: Install packages ────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf install -y postfix dovecot dovecot-pigeonhole opendkim opendkim-tools
;;
debian|ubuntu)
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y \
postfix dovecot-imapd dovecot-lmtpd dovecot-sieve \
opendkim opendkim-tools
;;
esac
log "Mail packages installed"
# ── Phase 3: Postfix configuration ───────────────────────────────────────
cat > /etc/postfix/main.cf <<POSTFIX
# Basic settings
myhostname = mail.${MAIL_DOMAIN}
mydomain = ${MAIL_DOMAIN}
myorigin = \$mydomain
mydestination = \$myhostname, localhost.\$mydomain, localhost, \$mydomain
inet_interfaces = all
inet_protocols = all
# TLS (generate certs with certbot after DNS is pointed)
smtpd_tls_cert_file = /etc/pki/tls/certs/localhost.crt
smtpd_tls_key_file = /etc/pki/tls/private/localhost.key
smtpd_use_tls = yes
smtpd_tls_auth_only = yes
smtpd_tls_security_level = may
smtp_tls_security_level = may
# SASL authentication via Dovecot
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes
smtpd_recipient_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination
# Delivery via Dovecot LMTP
virtual_transport = lmtp:unix:private/dovecot-lmtp
virtual_mailbox_domains = ${MAIL_DOMAIN}
# Queue on ZFS
queue_directory = /data/mail/queue
# Limits
message_size_limit = 52428800
mailbox_size_limit = 0
POSTFIX
log "Postfix configured for ${MAIL_DOMAIN}"
# ── Phase 4: Dovecot configuration ───────────────────────────────────────
cat > /etc/dovecot/local.conf <<'DOVECOT'
protocols = imap lmtp sieve
mail_location = maildir:/data/mail/users/%u/Maildir
mail_privileged_group = mail
# Authentication
auth_mechanisms = plain login
passdb {
driver = pam
}
userdb {
driver = passwd
}
# LMTP socket for Postfix
# Postfix resolves private/... relative to queue_directory (/data/mail/queue above)
service lmtp {
unix_listener /data/mail/queue/private/dovecot-lmtp {
mode = 0600
user = postfix
group = postfix
}
}
# SASL socket for Postfix
service auth {
unix_listener /data/mail/queue/private/auth {
mode = 0660
user = postfix
group = postfix
}
}
# Sieve filtering
plugin {
sieve = /data/mail/users/%u/sieve/active.sieve
sieve_dir = /data/mail/users/%u/sieve
}
# TLS
ssl = required
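# Same placeholder certificate and key as in the Postfix config
ssl_cert = </etc/pki/tls/certs/localhost.crt
ssl_key = </etc/pki/tls/private/localhost.key
DOVECOT
# ── Phase 5: Enable services ─────────────────────────────────────────────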
systemctl enable --now postfix dovecot
log "Postfix and Dovecot started"
# ── Phase 6: Mailbox snapshots (whole mail dataset) ──────────────────────
cat > /etc/cron.d/zfs-mail-snapshots <<'CRON'
# Hourly snapshots of all mail, keep 72 hours
0 * * * * root zfs snapshot rpool/data/mail@hourly-$(date +\%Y\%m\%d-\%H00) 2>/dev/null
5 0 * * * root zfs list -t snapshot -o name -H rpool/data/mail | grep hourly | head -n -72 | xargs -r -n1 zfs destroy 2>/dev/null
CRON
log "Mail snapshot schedule: hourly, keep 72"
log "Mail server postinstaller complete"
log " SMTP: port 25/587 IMAP: port 993"
log " Replace TLS certs with real ones via certbot"
10. Desktop workstation — dev tools, VSCode, Docker, GPU drivers
postinstall-workstation.sh
#!/bin/bash
# Postinstaller: Developer workstation with dev tools, Docker, VSCode, GPU drivers
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:workstation] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
DEV_USER="${DEV_USER:-dev}"
log "Starting workstation postinstaller (user: ${DEV_USER})"
# ── Phase 1: ZFS datasets for dev work ───────────────────────────────────
zfs create -o recordsize=128K \
-o compression=zstd \
-o atime=off \
rpool/data/docker
# -p creates the parent rpool/home/${DEV_USER} dataset if it does not exist yet
zfs create -p -o recordsize=128K \
-o compression=zstd \
-o atime=off \
"rpool/home/${DEV_USER}/projects"
zfs create -p -o recordsize=1M \
-o compression=zstd-5 \
-o atime=off \
"rpool/home/${DEV_USER}/vms"
log "ZFS datasets created for development"
# ── Phase 2: Development tools ───────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
# VSCode comes from Flatpak in Phase 5; the stock repos do not ship a "code" package
dnf install -y \
gcc gcc-c++ make cmake git curl wget jq htop tmux vim \
python3 python3-pip python3-devel \
nodejs npm \
golang \
openssl-devel zlib-devel readline-devel \
libvirt virt-install qemu-kvm \
flatpak
;;
debian|ubuntu)
apt-get update
apt-get install -y \
build-essential cmake git curl wget jq htop tmux vim \
python3 python3-pip python3-venv python3-dev \
nodejs npm \
golang-go \
libssl-dev zlib1g-dev libreadline-dev \
libvirt-daemon-system virtinst qemu-kvm \
flatpak
;;
esac
log "Development tools installed"
# ── Phase 3: Docker ──────────────────────────────────────────────────────
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora)
dnf config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo 2>/dev/null || true
dnf install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin
;;
debian|ubuntu)
apt-get install -y docker.io docker-compose
;;
esac
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'JSON'
{
"storage-driver": "zfs",
"data-root": "/data/docker",
"log-driver": "json-file",
"log-opts": { "max-size": "50m", "max-file": "3" }
}
JSON
systemctl enable --now docker
log "Docker configured with ZFS storage driver"
# ── Phase 4: Create developer user ───────────────────────────────────────
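# The admin group differs by distro family: wheel (RHEL/Fedora) vs sudo (Debian/Ubuntu)
case "${KLDLOAD_DISTRO:-centos}" in
debian|ubuntu) ADMIN_GROUP="sudo" ;;
*) ADMIN_GROUP="wheel" ;;
esac
useradd -m -s /bin/bash -G "docker,libvirt,${ADMIN_GROUP}" "${DEV_USER}" 2>/dev/null || true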
chown -R "${DEV_USER}:${DEV_USER}" "/home/${DEV_USER}"
# Git config
sudo -u "${DEV_USER}" git config --global init.defaultBranch main
sudo -u "${DEV_USER}" git config --global pull.rebase true
# Shell enhancements
cat >> "/home/${DEV_USER}/.bashrc" <<'BASHRC'
# Dev environment
export EDITOR=vim
export PATH="${HOME}/.local/bin:${HOME}/go/bin:${PATH}"
# Aliases
alias ll='ls -alF'
alias gs='git status'
alias gd='git diff'
alias gl='git log --oneline --graph -20'
alias dc='docker compose'
alias k='kubectl'
BASHRC
log "Developer user ${DEV_USER} configured"
# ── Phase 5: VSCode (Flatpak for distro independence) ────────────────────
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo 2>/dev/null || true
if flatpak install -y flathub com.visualstudio.code 2>/dev/null; then
log "VSCode installed via Flatpak"
else
log "Flatpak VSCode install failed: install manually"
fi
# ── Phase 6: NVIDIA drivers (if GPU detected) ───────────────────────────
if lspci | grep -qi nvidia; then
log "NVIDIA GPU detected — installing drivers"
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky)
dnf install -y epel-release
dnf install -y nvidia-driver nvidia-driver-cuda
;;
fedora)
dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda
;;
debian|ubuntu)
apt-get install -y nvidia-driver firmware-misc-nonfree 2>/dev/null || \
apt-get install -y nvidia-driver-535 2>/dev/null || \
log "NVIDIA driver install failed — install manually"
;;
esac
log "NVIDIA drivers installed"
else
log "No NVIDIA GPU detected — skipping driver install"
fi
# ── Phase 7: Snapshot the dev environment ────────────────────────────────
zfs snapshot "rpool/home/${DEV_USER}@fresh-install"
log "Snapshot taken: rpool/home/${DEV_USER}@fresh-install"
log "Workstation postinstaller complete"
The workstation postinstaller demonstrates something important: the same technique that deploys production servers also sets up developer laptops. The fresh-install snapshot at the end is the developer equivalent of a golden image. Two months from now, after installing 47 Python packages and three conflicting versions of Node, you can zfs rollback rpool/home/dev@fresh-install and be back to day one in seconds (add -r if you have taken newer snapshots since; rollback discards everything written to that dataset after the snapshot). No reinstall. No "nuke and pave." Just rewind the filesystem. This is what ZFS on root gives you that a plain ext4 install does not: fearless experimentation with instant undo.
11. Hardened bastion — SSH jump host with audit logging
postinstall-bastion.sh
#!/bin/bash
# Postinstaller: Hardened SSH bastion / jump host
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall:bastion] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
source /etc/kldload/install-manifest.env 2>/dev/null || true
ALLOWED_USERS="${ALLOWED_USERS:-}"
log "Starting bastion postinstaller"
# ── Phase 1: ZFS dataset for audit logs ──────────────────────────────────
zfs create -o recordsize=16K \
-o compression=zstd \
-o atime=off \
rpool/data/audit
chmod 700 /data/audit
log "Created rpool/data/audit"
# ── Phase 2: Harden SSH ──────────────────────────────────────────────────
cat > /etc/ssh/sshd_config.d/99-bastion.conf <<'SSHD'
# Bastion hardening
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AuthenticationMethods publickey
MaxAuthTries 3
MaxSessions 10
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowTcpForwarding yes
AllowAgentForwarding yes
PermitTunnel no
# Logging
LogLevel VERBOSE
SyslogFacility AUTH
# Crypto
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
HostKeyAlgorithms ssh-ed25519
SSHD
# Validate the configuration first so a typo cannot take SSH down
sshd -t
systemctl restart sshd
log "SSH hardened: key-only, ed25519, verbose logging"
# ── Phase 3: Session recording ───────────────────────────────────────────
# Record all SSH sessions to ZFS-backed audit log
case "${KLDLOAD_DISTRO:-centos}" in
centos|rhel|rocky|fedora) dnf install -y tlog ;;
debian|ubuntu) apt-get update && apt-get install -y tlog ;;
*) log "tlog not available for this distro — skipping session recording" ;;
esac
if command -v tlog-rec >/dev/null 2>&1; then
cat > /etc/tlog/tlog-rec-session.conf <<'TLOG'
{
"shell": "/bin/bash",
"writer": "file",
"file": {
"path": "/data/audit/tlog-sessions.log"
}
}
TLOG
log "tlog configured (sessions are recorded for users whose login shell is tlog-rec-session)"
fi
# ── Phase 4: Strict firewall ─────────────────────────────────────────────
cat > /etc/nftables.conf <<'NFT'
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
iif "lo" accept
ct state established,related accept
ct state invalid drop
tcp dport 22 accept
icmp type echo-request limit rate 5/second accept
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
NFT
systemctl enable --now nftables
log "Firewall: SSH only, ICMP rate-limited"
# ── Phase 5: Audit log snapshots (immutable history) ────────────────────
cat > /etc/cron.d/zfs-audit-snapshots <<'CRON'
# Hourly snapshots of audit logs — NEVER auto-delete these
0 * * * * root zfs snapshot rpool/data/audit@hourly-$(date +\%Y\%m\%d-\%H00) 2>/dev/null
CRON
log "Audit log snapshots: hourly, never deleted"
log "Bastion postinstaller complete — SSH-only jump host ready"
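The bastion script reads `ALLOWED_USERS` from the manifest but never applies it. One way to use it, sketched under the assumption that the variable holds a space-separated list of login names, is to render an `AllowUsers` directive and append it to the drop-in before validating with `sshd -t`. The function name `render_allow_users` is hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# render_allow_users "user1 user2 ...": print an sshd AllowUsers line for the
# valid names in the list, or nothing if the list is empty.
render_allow_users() {
    local u
    local -a candidates=() valid=()
    read -r -a candidates <<< "$1"
    for u in "${candidates[@]}"; do
        if [[ "${u}" =~ ^[a-z_][a-z0-9_-]*$ ]]; then
            valid+=("${u}")
        else
            echo "skipping invalid user name: ${u}" >&2
        fi
    done
    (( ${#valid[@]} )) || return 0
    printf 'AllowUsers %s\n' "${valid[*]}"
}

# Possible wiring in Phase 2 (not executed here):
#   render_allow_users "${ALLOWED_USERS}" >> /etc/ssh/sshd_config.d/99-bastion.conf
#   sshd -t && systemctl restart sshd

# Demo with made-up names.
render_allow_users "alice bob"
```

An empty list produces no directive, which leaves the default behaviour (all users with valid keys may log in) unchanged.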
Composing postinstallers — combining multiple scripts
Real infrastructure is rarely a single role. A monitoring server might also need Docker for running Grafana. A web server might need monitoring agents. Instead of writing one monolithic script, compose multiple focused postinstallers into a pipeline.
The dispatcher pattern
Instead of putting everything in postinstall.sh, use it as a dispatcher
that sources role-specific scripts in order. Each script is self-contained and testable
independently.
#!/bin/bash
# postinstall.sh — dispatcher
# Sources role-specific scripts from /root/darksite/roles/
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall] %s\n' -1 "$*"; }
trap 'log "FAILED at line ${LINENO}: ${BASH_COMMAND}"; exit 1' ERR
ROLE_DIR="/root/darksite/roles"
source /etc/kldload/install-manifest.env 2>/dev/null || true
# Define the role pipeline — order matters
ROLES=(
"00-base-hardening.sh" # Always: sysctl, firewall, SSH hardening
"10-zfs-datasets.sh" # Always: create application ZFS datasets
"20-docker.sh" # If needed: Docker + ZFS storage driver
"30-monitoring-agent.sh" # Always: node_exporter + promtail
"40-application.sh" # Role-specific: web server, database, etc.
"90-verification.sh" # Always: verify all services are running
)
for role in "${ROLES[@]}"; do
script="${ROLE_DIR}/${role}"
if [[ -f "${script}" ]]; then
log "Running role: ${role}"
source "${script}"
log "Completed role: ${role}"
else
log "Skipping role: ${role} (not found)"
fi
done
log "All roles complete"
Directory layout for composed postinstallers
live-build/config/includes.chroot/root/darksite/
├── postinstall.sh # Dispatcher (sources roles in order)
├── roles/
│ ├── 00-base-hardening.sh # sysctl, SSH, nftables
│ ├── 10-zfs-datasets.sh # Create rpool/data/* datasets
│ ├── 20-docker.sh # Docker + ZFS storage driver
│ ├── 30-monitoring-agent.sh # node_exporter, promtail
│ ├── 40-web-server.sh # nginx + certbot (web role)
│ ├── 40-database.sh # PostgreSQL (db role)
│ ├── 40-k8s-node.sh # kubeadm + containerd (k8s role)
│ └── 90-verification.sh # Verify services, log summary
└── config/
├── nginx.conf # Pre-built config files
├── postgresql.conf # (rather than heredocs in scripts)
├── prometheus.yml
└── nftables.conf
The numbering convention controls execution order. The 40-* prefix indicates
application-layer scripts — only include the one matching your target role. The
00-* through 30-* scripts are shared across all roles.
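Because the `40-*` file name differs per role (`40-web-server.sh`, `40-database.sh`, and so on), a dispatcher with a hardcoded role list has to be edited for each image. A minimal sketch of discovering roles by their numeric prefix instead is shown below. The function name `list_roles` is hypothetical, and the demo file names mirror the layout above.

```bash
#!/bin/bash
set -euo pipefail

# list_roles DIR: print every NN-*.sh file in DIR, one per line, in prefix order.
list_roles() {
    local dir="$1" f
    local -a found=()
    shopt -s nullglob
    for f in "${dir}"/[0-9][0-9]-*.sh; do
        found+=("${f}")
    done
    shopt -u nullglob
    # Sort under LC_ALL=C so the order does not depend on the system locale.
    if (( ${#found[@]} )); then
        printf '%s\n' "${found[@]}" | LC_ALL=C sort
    fi
}

# Demo against a throwaway directory.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/40-web-server.sh" "${demo_dir}/00-base-hardening.sh" \
      "${demo_dir}/90-verification.sh" "${demo_dir}/README.txt"
list_roles "${demo_dir}"
rm -rf "${demo_dir}"
```

The dispatcher loop could then read `while IFS= read -r script; do source "${script}"; done < <(list_roles "${ROLE_DIR}")`, so whichever `40-*` script is on the image runs in its numbered slot.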
Dependency management between roles
# 40-application.sh — check that dependencies ran first
# Require Docker (role 20)
if ! command -v docker >/dev/null 2>&1; then
log "ERROR: Docker not found — did 20-docker.sh run?"
exit 1
fi
# Require ZFS datasets (role 10)
if ! zfs list rpool/data/app >/dev/null 2>&1; then
log "ERROR: rpool/data/app not found — did 10-zfs-datasets.sh run?"
exit 1
fi
# Require monitoring agent (role 30)
if ! systemctl is-active --quiet node_exporter; then
log "WARNING: node_exporter not running — monitoring will be incomplete"
fi
# All dependencies satisfied — proceed
log "Dependencies verified, configuring application"
Each role checks its prerequisites before proceeding. This catches misconfigurations early — you know immediately if a role was missing from the pipeline instead of finding out an hour later when the application fails to start.
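The repeated if/log/exit blocks can be folded into one helper. The sketch below is one possible shape, not part of kldload. `require` is a hypothetical name, and the `log` function is copied from the dispatcher so the block runs on its own.

```bash
#!/bin/bash
set -euo pipefail

log() { printf '[%(%F %T)T] [postinstall] %s\n' -1 "$*"; }

# require DESCRIPTION COMMAND...: run COMMAND quietly; if it fails, log which
# prerequisite is missing and return 1 so the caller can decide what to do.
require() {
    local what="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    log "ERROR: missing prerequisite: ${what}"
    return 1
}

# Usage in a role script (names taken from the example above):
#   require "Docker (role 20)"         command -v docker         || exit 1
#   require "rpool/data/app (role 10)" zfs list rpool/data/app   || exit 1
#   require "node_exporter (role 30)"  systemctl is-active --quiet node_exporter || true

# Demo: a check that is expected to pass on any system running this script.
require "bash on PATH" command -v bash
log "Dependencies verified"
```

Returning a status instead of exiting inside the helper keeps the hard versus soft distinction at the call site: `|| exit 1` for required roles, `|| true` for optional ones such as the monitoring agent.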
The composed postinstaller pattern is the infrastructure equivalent of Unix philosophy: each script does one thing well, and they compose through a simple interface (source in order, check dependencies). The numbering scheme is borrowed from SysV init scripts and run-parts style drop-in directories, where a numeric prefix fixes the execution order. It works because the contract is simple: each script expects certain state to exist (packages installed, datasets created) and produces certain state (services running, configs written). The dispatcher is dumb. The roles are smart. The result is modular infrastructure that you can mix and match.
Testing postinstallers — KVM before hardware
Never deploy a postinstaller to hardware without testing it in a VM first. kldload's KVM deployment path gives you a fast feedback loop: build the ISO, boot it in a VM, watch the postinstaller run, verify the result, iterate.
The test loop
#!/bin/bash
# test-postinstaller.sh — build ISO and test in KVM
# Run from the kldload-free repo root
# 1. Place your postinstaller
cp my-postinstall.sh \
live-build/config/includes.chroot/root/darksite/postinstall.sh
chmod +x live-build/config/includes.chroot/root/darksite/postinstall.sh
# 2. Build the ISO (incremental — only rebuilds what changed)
PROFILE=server ./deploy.sh build
# 3. Deploy to KVM
./deploy.sh kvm-deploy
# 4. Watch the install (connects to VM serial console)
virsh console kldload-test
# 5. After install completes, SSH in and verify
ssh root@<vm-ip>
# 6. Check postinstall log
cat /var/log/kldload/postinstall.log
# 7. Verify services
systemctl status nginx postgresql docker
# 8. When done testing, destroy and try again
virsh destroy kldload-test
virsh undefine kldload-test --nvram
Using ZFS snapshots for rapid iteration
# After the base install finishes but BEFORE the postinstaller runs,
# snapshot the base state. Then you can revert and re-run the
# postinstaller without reinstalling from scratch.
# On the test VM, after first boot:
zfs snapshot rpool/ROOT/kldload@pre-postinstall
# Run your postinstaller manually
bash /root/darksite/postinstall.sh
# Something broke? Rollback and try again
zfs rollback rpool/ROOT/kldload@pre-postinstall
# Edit the script, re-run. No reinstall needed.
# Works? Snapshot the result
zfs snapshot rpool/ROOT/kldload@post-install
# This is your golden image. Test it for a few days.
# If anything is wrong, rollback to @pre-postinstall and fix the script.
This is the fastest way to iterate on postinstallers. The ZFS rollback takes less than a second regardless of how much the postinstaller changed. No reinstall, no rebuild, no waiting. Just rewind and try again.
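Since the firstboot service may start the postinstaller before you can log in, one way to make sure the baseline snapshot exists is to take it from the first lines of the script itself. The sketch below assumes that approach. ensure_snapshot and ZFS_CMD are hypothetical names, and the override exists only so the logic can be exercised without a pool.

```bash
#!/bin/bash
# Hypothetical guard for the top of postinstall.sh.
# ZFS_CMD can be overridden, which lets the logic be tested without ZFS.
ZFS_CMD="${ZFS_CMD:-zfs}"

ensure_snapshot() {
    local snap="$1"
    if "${ZFS_CMD}" list -t snapshot "${snap}" >/dev/null 2>&1; then
        echo "Snapshot ${snap} already exists, leaving it alone"
    else
        "${ZFS_CMD}" snapshot "${snap}"
        echo "Created ${snap}"
    fi
}

# Usage (dataset name taken from the examples above):
# ensure_snapshot rpool/ROOT/kldload@pre-postinstall
```

The existence check makes the call safe on re-runs: a second execution keeps the original baseline instead of failing on a duplicate snapshot name.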
Verifying the postinstaller programmatically
#!/bin/bash
# 90-verification.sh — run after all roles to verify the system
set -euo pipefail
log() { printf '[%(%F %T)T] [verify] %s\n' -1 "$*"; }
ERRORS=0
# Check required services
for svc in nginx postgresql docker node_exporter; do
if systemctl is-active --quiet "${svc}" 2>/dev/null; then
log "OK: ${svc} is running"
else
log "FAIL: ${svc} is NOT running"
ERRORS=$((ERRORS + 1))
fi
done
# Check required ZFS datasets
for ds in rpool/data/www rpool/data/postgres rpool/data/docker; do
if zfs list "${ds}" >/dev/null 2>&1; then
log "OK: ${ds} exists"
else
log "FAIL: ${ds} does not exist"
ERRORS=$((ERRORS + 1))
fi
done
# Check required ports are listening
for port in 22 80 443 5432 9090 9100; do
if [[ -n "$(ss -Htln "sport = :${port}")" ]]; then
log "OK: port ${port} is listening"
else
log "FAIL: port ${port} is NOT listening"
ERRORS=$((ERRORS + 1))
fi
done
# Check ZFS pool health
POOL_STATE=$(zpool status -x 2>&1)
if [[ "${POOL_STATE}" == "all pools are healthy" ]]; then
log "OK: ZFS pool healthy"
else
log "FAIL: ZFS pool issue: ${POOL_STATE}"
ERRORS=$((ERRORS + 1))
fi
# Summary
if (( ERRORS == 0 )); then
log "ALL CHECKS PASSED"
else
log "FAILED: ${ERRORS} check(s) failed"
exit 1
fi
Put this in your role pipeline as the last step. If any check fails, the postinstaller exits with an error code and the log tells you exactly what is wrong. Automate the verification so you never deploy a broken system.
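One detail matters when counting failures in strict mode: an arithmetic command such as (( n++ )) returns a non-zero status when the expression evaluates to 0, and set -e treats that as a failure. A sketch of a counting helper that sidesteps this with a plain assignment follows. The check function is a hypothetical name, and the commands passed to it are harmless stand-ins.

```bash
#!/bin/bash
set -euo pipefail

ERRORS=0

# check <description> <command...>: run a command and record the result.
# The counter is bumped with an assignment, which always succeeds,
# so a failed check never aborts the script under `set -e`.
check() {
    local desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK: ${desc}"
    else
        echo "FAIL: ${desc}"
        ERRORS=$((ERRORS + 1))
    fi
}

# Usage with stand-in commands:
check "true succeeds" true
check "false fails" false
echo "${ERRORS} check(s) failed"
```

In a real verification role the stand-ins would be commands like systemctl is-active --quiet nginx or zfs list rpool/data/www.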
The ZFS rollback test loop is the single biggest productivity gain for postinstaller development. Without it, every failed test means a full reinstall — partition the disk, install the base, wait for DKMS, wait for packages. With ZFS snapshots, a failed postinstaller costs you one second of rollback time. You can iterate 50 times in the time a single reinstall takes. This is why ZFS on root matters even during development — the development workflow itself is faster.
Distributing postinstallers
Once you have a working postinstaller, you need to get it into the ISO. There are several approaches depending on your workflow and how many variants you maintain.
Method 1: Bake into the ISO (recommended)
The simplest and most reliable approach. Drop your script into the ISO build tree and it ships with every ISO you build.
# Place the postinstaller in the ISO build tree
cp postinstall.sh \
live-build/config/includes.chroot/root/darksite/postinstall.sh
chmod +x live-build/config/includes.chroot/root/darksite/postinstall.sh
# Include supporting files (configs, scripts, data)
mkdir -p live-build/config/includes.chroot/root/darksite/roles
cp roles/*.sh live-build/config/includes.chroot/root/darksite/roles/
cp -r config/ live-build/config/includes.chroot/root/darksite/config/
# Build the ISO — the postinstaller and all supporting files
# are embedded in the squashfs and end up at /root/darksite/ on the target
PROFILE=server ./deploy.sh build
Everything under live-build/config/includes.chroot/ is mirrored into the ISO's
root filesystem. When kldload installs the base system, it copies
/root/darksite/ from the live environment to the target. Your postinstaller
and all its supporting files arrive on the target system automatically.
Method 2: Git repository (for teams)
# Maintain postinstallers in a separate git repo
# Clone into the build tree at build time
# In your CI pipeline or build script:
git clone https://gitlab.internal/infra/postinstallers.git \
live-build/config/includes.chroot/root/darksite/postinstallers
# The main postinstall.sh dispatches to the cloned repo
cat > live-build/config/includes.chroot/root/darksite/postinstall.sh <<'DISPATCH'
#!/bin/bash
set -euo pipefail
ROLE="${KLDLOAD_HOSTNAME%%-*}" # web-01 → web, db-01 → db
SCRIPT="/root/darksite/postinstallers/${ROLE}/postinstall.sh"
if [[ -x "${SCRIPT}" ]]; then
exec "${SCRIPT}"
else
echo "No postinstaller found for role: ${ROLE}" >&2
exit 1
fi
DISPATCH
PROFILE=server ./deploy.sh build
This pattern lets your infrastructure team maintain postinstallers in version control
while the ISO build pulls the latest version at build time. The hostname convention
(web-01, db-01, k8s-cp-01) determines which postinstaller runs: the dispatcher keeps
only the text before the first hyphen, so k8s-cp-01 selects the k8s role, not k8s-cp.
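The role lookup rests on a single parameter expansion, ${KLDLOAD_HOSTNAME%%-*}, which strips the longest suffix beginning with a hyphen. A small sketch that isolates it, with role_from_hostname as a hypothetical wrapper name:

```bash
#!/bin/bash
# Same expansion as the dispatcher: remove the longest suffix that starts
# with "-", which leaves only the text before the first hyphen.
role_from_hostname() {
    local host="$1"
    printf '%s\n' "${host%%-*}"
}

role_from_hostname web-01      # web
role_from_hostname db-01       # db
role_from_hostname k8s-cp-01   # k8s (not k8s-cp)
```

If you need multi-part roles such as k8s-cp, ${host%-*} (single %) strips only the trailing -NN instead.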
Method 3: Answers file embedding
# For unattended installs, the answers file can specify a postinstaller URL
# The installer downloads it before the first boot
# answers.env
KLDLOAD_DISTRO=debian
KLDLOAD_PROFILE=server
KLDLOAD_HOSTNAME=web-prod-01
KLDLOAD_DISK=/dev/vda
# Postinstaller — downloaded during install and placed at /root/darksite/
KLDLOAD_POSTINSTALL_URL=http://10.0.0.1:8080/postinstallers/web-server.sh
# Or embed the script directly (base64-encoded)
KLDLOAD_POSTINSTALL_B64="$(base64 -w0 < postinstall.sh)"
Useful for environments where you want one ISO image but different postinstallers per machine. The answers file is per-machine; the ISO is shared.
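A round-trip sketch of the base64 embedding. How kldload itself consumes KLDLOAD_POSTINSTALL_B64 is not shown in this guide, so the decode step below is an assumption, and encode_script and decode_script are hypothetical names.

```bash
#!/bin/bash
# Round-trip sketch. The decode side is an assumption about how an
# installer could unpack the value; it is not taken from kldload.
encode_script() { base64 -w0 < "$1"; }                 # GNU coreutils base64
decode_script() { printf '%s' "$1" | base64 -d > "$2"; }

# Example: encode, decode, and confirm the bytes survived.
src="$(mktemp)"; dst="$(mktemp)"
printf '#!/bin/bash\necho "hello from postinstall"\n' > "${src}"
b64="$(encode_script "${src}")"
decode_script "${b64}" "${dst}"
cmp -s "${src}" "${dst}" && echo "round trip OK"
```

If the answers file is parsed as plain KEY=VALUE text rather than sourced by a shell, a $(...) expression inside it would not be expanded. In that case, write the literal value when generating the file, for example with printf 'KLDLOAD_POSTINSTALL_B64=%s\n' "${b64}" >> answers.env.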
Method 4: Darksite HTTP server
# Host postinstallers on the darksite HTTP server (port 3142/3143)
# Place scripts in the darksite directory before building
mkdir -p live-build/config/includes.chroot/root/darksite/postinstallers/
cp web-server.sh live-build/config/includes.chroot/root/darksite/postinstallers/
cp database.sh live-build/config/includes.chroot/root/darksite/postinstallers/
cp k8s-node.sh live-build/config/includes.chroot/root/darksite/postinstallers/
cp monitoring.sh live-build/config/includes.chroot/root/darksite/postinstallers/
# During install, the live ISO serves these via HTTP
# Target systems can curl from the live ISO at install time:
# curl http://10.0.0.1:3142/postinstallers/web-server.sh
# Useful for multi-node deploys where each node picks its own role
Method 1 (bake into ISO) is the right choice for 90% of deployments. It is self-contained, offline-capable, and deterministic. Methods 2-4 add flexibility but also add dependencies (git server, HTTP server, network connectivity). The darksite philosophy is: if it is not in the image, it does not exist. Every external dependency is a failure mode. That said, for teams managing 50+ machine types, the git repo approach (method 2) scales better than maintaining 50 separate ISO builds. Pick the right tool for your scale.
Debugging postinstallers
Postinstallers fail. Networks are unreliable, packages change names, configs have typos, services refuse to start. The difference between a 3am debugging nightmare and a 5-minute fix is the quality of your logging and the knowledge of where to look.
Log locations
# Primary postinstaller log (your script's output)
/var/log/kldload/postinstall.log
# kldload firstboot service log (the wrapper that runs your script)
/var/log/kldload/firstboot.log
# kldload installer log (base install — before postinstaller)
/var/log/installer/kldload-installer.log
/var/log/installer/bootstrap.log
# systemd journal for the firstboot service
journalctl -u kldload-firstboot.service --no-pager
# Individual service logs (when a service fails to start)
journalctl -u nginx --no-pager -n 50
journalctl -u postgresql --no-pager -n 50
journalctl -u docker --no-pager -n 50
# Install manifest (the environment your postinstaller ran in)
cat /etc/kldload/install-manifest.env
# ZFS pool status (if storage-related issues)
zpool status
zfs list
Common failures and fixes
- Package not found: The darksite does not have the package you requested. Either add it to the package set files in build/darksite/config/package-sets/ and rebuild, or ensure network access for online installs.
- Service fails to start: Check journalctl -u servicename. Usually a config syntax error. Run the service's own config test (for example nginx -t) before enabling it.
- Permission denied: ZFS datasets mount at /data/datasetname but are owned by root. chown to the service user after creation.
- Network not available: The postinstaller runs on first boot. DHCP might not have completed yet. Use the retry wrapper and wait for network.
- Script fails silently: You forgot set -euo pipefail or the ERR trap. A command failed but the script continued. Always use strict mode.
- Wrong distro commands: Used dnf on a Debian install. Always check KLDLOAD_DISTRO and branch to the right package manager.
- ZFS dataset already exists: The postinstaller ran twice (reboot during execution). Use zfs create with a guard: zfs list rpool/data/app 2>/dev/null || zfs create rpool/data/app.
- Disk space: The darksite payload uses space on the root dataset. If your ISO is large, the root dataset might be tight. Create separate datasets early and put data there.
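For the network case, a retry helper along these lines is one option. This is a sketch: the retry wrapper referred to in this guide may differ in name and behaviour, and the attempt and delay values in the usage line are arbitrary.

```bash
#!/bin/bash
# Hypothetical retry helper; the guide's own retry wrapper may differ.
# retry <attempts> <delay-seconds> <command...>
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    until "$@"; do
        if (( n >= attempts )); then
            echo "giving up after ${n} attempt(s): $*" >&2
            return 1
        fi
        echo "attempt ${n} failed, retrying in ${delay}s: $*" >&2
        sleep "${delay}"
        n=$((n + 1))
    done
}

# Usage: wait for the package mirror before installing anything.
# retry 5 10 apt-get update
```

The command runs inside an until condition, so its failure does not trip set -e; only the final return 1 does, once every attempt has been used.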
Interactive debugging on a failed install
# The system booted but the postinstaller failed partway through.
# SSH in (or use the console) and debug interactively.
# 1. Check what failed
tail -n 30 /var/log/kldload/postinstall.log
# Look for "FAILED at line XX" — that is the exact failure point
# 2. Check the manifest to understand the environment
cat /etc/kldload/install-manifest.env
# 3. Check ZFS state
zfs list
zpool status
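# 4. Snapshot the current (broken) state so you can inspect it while you debug
zfs snapshot rpool/ROOT/kldload@debug-$(date +%H%M)
# 5. Fix the postinstaller script and keep a copy outside the root dataset.
# If /root lives on the root dataset, the rollback reverts your edit too.
vim /root/darksite/postinstall.sh
cp /root/darksite/postinstall.sh /dev/shm/postinstall.fixed.sh
# 6. Rollback to pre-postinstall state (if you have the snapshot).
# -r is required because @debug-* is newer, and it destroys that snapshot,
# so copy out any logs you still need first.
zfs rollback -r rpool/ROOT/kldload@pre-postinstall
# 7. Restore the fixed script and re-run it
cp /dev/shm/postinstall.fixed.sh /root/darksite/postinstall.sh
bash /root/darksite/postinstall.sh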
# 8. If it works, update the ISO build tree with the fixed script
# and rebuild the ISO for future deploys
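Finding the failure marker in step 1 can be scripted. The sketch below assumes the script's ERR trap writes lines containing "FAILED at line", as described above; last_failure is a hypothetical helper name.

```bash
#!/bin/bash
# Print the most recent failure marker from a postinstall log, prefixed
# with its line number in the log file.
# Assumes the ERR trap writes lines containing "FAILED at line".
last_failure() {
    local logfile="$1"
    grep -n 'FAILED at line' "${logfile}" | tail -n 1
}

# Usage:
# last_failure /var/log/kldload/postinstall.log
```

The log line number in the output tells you where to start reading for the commands that ran just before the failure.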
The nuclear option: manual chroot debugging
# If the system will not boot at all after install, boot the ISO again
# and manually import the ZFS pool to inspect the target system.
# 1. Boot the kldload ISO (live environment)
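# 2. Import the target pool under /target without mounting anything yet
zpool import -f -N -R /target rpool
# 3. Mount the root dataset (it appears at /target because of the altroot)
zfs mount rpool/ROOT/kldload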
# 4. Check logs from the failed install
cat /target/var/log/kldload/postinstall.log
cat /target/var/log/kldload/firstboot.log
# 5. Chroot in to fix things manually
mount --bind /dev /target/dev
mount --bind /proc /target/proc
mount --bind /sys /target/sys
chroot /target /bin/bash
# Now you are "inside" the broken system
# Fix configs, reinstall packages, whatever is needed
# 6. Exit chroot and clean up
exit
umount /target/sys /target/proc /target/dev
zpool export rpool
# 7. Remove the ISO and reboot from disk
This is the last resort. It works because ZFS pools are self-describing — you can import them on any system with ZFS, inspect the filesystem, and fix things. The live ISO always has ZFS, so you always have a recovery environment.
The debugging story is why ZFS on root matters for postinstallers. On ext4, a failed postinstaller means reinstalling from scratch. On ZFS, you snapshot before the postinstaller, rollback if it fails, fix the script, and re-run. The entire debug cycle takes seconds instead of minutes. The nuclear option (booting the ISO and importing the pool) means you are never locked out of a broken system. The pool is portable. The data is always accessible. This is the safety net that makes it sane to ship postinstallers to production hardware.
The darksite pattern — baking everything in
What is a darksite?
A "darksite" is an air-gapped deployment — no internet, no upstream repos, no cloud APIs. Everything the system needs must be baked into the ISO or carried on the USB drive. This includes:
- APT/DNF packages — a complete local repository snapshot
- Container images — OCI tarballs loaded into containerd/Docker on first boot
- Ansible playbooks — the entire orchestration tree
- Helm charts — bundled for offline Kubernetes deployments
- TLS certificates — pre-generated PKI for etcd, API server, etc.
- WireGuard keys — hub keypairs for mesh networking
- Configuration files — per-node or per-role configs baked in
The darksite pattern comes from classified environments where network access is physically impossible. Ships, air-gapped facilities, SCIF rooms, factory floors. The concept is old — people have been building self-contained install media since the BBS era. What is new is doing it with modern infrastructure: Kubernetes, containers, WireGuard mesh, PKI certificates, Helm charts — all offline, all baked in, all verified at build time. kldload makes the same technique accessible to anyone who wants a deployment that does not depend on the internet being up.
Baking container images into the ISO
# At build time: pull and save container images as tarballs
mkdir -p live-build/config/includes.chroot/root/darksite/images
# Save each image as an OCI tarball
docker pull nginx:1.25-alpine
docker save nginx:1.25-alpine -o \
live-build/config/includes.chroot/root/darksite/images/nginx-1.25-alpine.tar
docker pull postgres:16-alpine
docker save postgres:16-alpine -o \
live-build/config/includes.chroot/root/darksite/images/postgres-16-alpine.tar
docker pull grafana/grafana:10.4.1
docker save grafana/grafana:10.4.1 -o \
live-build/config/includes.chroot/root/darksite/images/grafana-10.4.1.tar
# In the postinstaller: load the pre-saved images
for img in /root/darksite/images/*.tar; do
docker load -i "${img}"
log "Loaded container image: $(basename "${img}")"
done
# Now docker run nginx:1.25-alpine works — no pull needed
This is how you run containers offline. Every image is pre-pulled at build time, saved as a tarball, embedded in the ISO, and loaded on first boot. The container runtime never contacts a registry. The images are verified at build time and identical at deploy time.
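To back the "identical at deploy time" claim with a check, a checksum manifest can be written at build time and verified before docker load. This is a sketch under that assumption: write_manifest, verify_manifest, and the SHA256SUMS file name are hypothetical and not part of kldload.

```bash
#!/bin/bash
# Hypothetical integrity check for baked-in image tarballs.
# write_manifest runs at build time, verify_manifest on first boot.
write_manifest() {
    local dir="$1"
    ( cd "${dir}" && sha256sum -- *.tar > SHA256SUMS )
}

verify_manifest() {
    local dir="$1"
    # --quiet prints only files that fail; exit status is non-zero on mismatch.
    ( cd "${dir}" && sha256sum --check --quiet SHA256SUMS )
}

# Build time:  write_manifest live-build/config/includes.chroot/root/darksite/images
# First boot:  verify_manifest /root/darksite/images || exit 1
```

Running the verification before the load loop turns a corrupted USB stick into a clear early error instead of a confusing docker load failure.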
Payload directory structure
live-build/config/includes.chroot/root/darksite/
├── postinstall.sh # Entry point
├── roles/ # Composed role scripts
│ ├── 00-base-hardening.sh
│ ├── 10-zfs-datasets.sh
│ ├── 20-docker.sh
│ └── 40-application.sh
├── config/ # Pre-built configuration files
│ ├── nginx.conf
│ ├── postgresql.conf
│ └── nftables.conf
├── images/ # Pre-pulled container images (OCI tarballs)
│ ├── nginx-1.25-alpine.tar
│ └── postgres-16-alpine.tar
├── certs/ # Pre-generated TLS certificates
│ ├── ca.crt
│ ├── server.crt
│ └── server.key
├── keys/ # WireGuard keys, SSH keys
│ ├── wg-hub.key
│ └── deploy.pub
└── helm/ # Helm charts for K8s deployments
├── ingress-nginx-4.10.0.tgz
└── prometheus-25.11.0.tgz
Advanced patterns
The two-poweroff pattern
Why the system powers off twice
Boot 1: ISO installer
+-- kldload installs base OS to disk
+-- Darksite payload copied to target
+-- kldload-firstboot.service enabled
+-- REBOOT (installer done, boots from disk)
Boot 2: First boot from disk
+-- kldload-firstboot.service runs
+-- Reads install manifest
+-- Runs /root/darksite/postinstall.sh
+-- System configured, services started
+-- (Optional) POWEROFF for golden image snapshot
Boot 3+: Production
+-- Normal boot, all services running
+-- firstboot does not run again
This separation is deliberate. The first boot proves the base install worked. The postinstaller proves the customization worked. Each phase is independently verifiable. If any phase fails, you know exactly where.
The two-poweroff pattern is a debugging strategy disguised as a deployment pattern. If the machine does not come up after the first boot, the base install is broken — check the installer logs. If the postinstaller fails, check the postinstall logs. If it comes up on the second boot with everything running, it worked. Each phase is independently verifiable because each phase has its own logs and its own failure modes. No ambiguity about where a failure occurred.
This also means you can snapshot between phases. zfs snapshot rpool/ROOT/kldload@post-base after the first boot. zfs snapshot rpool/ROOT/kldload@post-install after the postinstaller. Snapshot the root dataset itself: a snapshot of rpool alone covers only the pool's top-level dataset, not its children. If the postinstaller breaks something, roll back to @post-base and try again. You do not rebuild from scratch. You rewind to the last good state. This is why ZFS on root matters for deployment: the deployment itself is recoverable.
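The lifecycle above notes that firstboot does not run again after Boot 2. How kldload enforces that is not described here. A stamp file is one common way to get run-once behaviour, sketched below with the hypothetical name run_once.

```bash
#!/bin/bash
# Hypothetical run-once guard. kldload's real firstboot mechanism is not
# described in this guide; a stamp file is one common approach.
# run_once <stamp-file> <command...>
run_once() {
    local stamp="$1"; shift
    if [[ -e "${stamp}" ]]; then
        echo "already done: ${stamp}"
        return 0
    fi
    # Only mark the step as done if the command succeeded.
    "$@" && touch "${stamp}"
}

# Usage (the stamp path is illustrative):
# run_once /var/lib/postinstall.done bash /root/darksite/postinstall.sh
```

Because the stamp is written only on success, a failed run is retried on the next boot rather than silently skipped.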
The golden image pattern
Snapshot, clone, and replicate
Once you have a working system (post-postinstall), snapshot it. That snapshot becomes your golden image. Clone it for every new node. Each clone takes milliseconds and uses almost no extra space until it diverges from the snapshot.
# After postinstall completes, snapshot the golden state
zfs snapshot rpool/ROOT/kldload-node@golden
# Clone for each new node (instant, zero space)
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-01
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-02
zfs clone rpool/ROOT/kldload-node@golden rpool/ROOT/worker-03
# Or replicate to another machine
zfs send rpool/ROOT/kldload-node@golden | ssh kvm-host zfs recv tank/golden/worker
# Or receive an independent full copy into a dataset set aside for VMs.
# Note: receiving a filesystem snapshot yields a filesystem, not a ZVOL
zfs send rpool/ROOT/kldload-node@golden | zfs recv rpool/vms/worker-01
# Boot as a VM — instant deployment
This is the golden image pattern that every cloud provider uses internally. AWS does not install EC2 instances from an ISO. They snapshot a golden AMI and stamp out copies. Google, Azure, Oracle — same thing. The difference: they do it on proprietary storage with proprietary tooling. You are doing it on ZFS with zfs clone. Same pattern. Open source. On your hardware. Each clone is instant and uses zero space until it diverges. You can clone 1,000 nodes from one snapshot and the pool barely notices.
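For more than a handful of nodes, the clone commands can be generated instead of typed. The sketch below only prints them, so nothing touches the pool until you choose to run the output. clone_commands is a hypothetical name, and the dataset layout follows the examples above.

```bash
#!/bin/bash
# Print (not run) the clone commands for N workers from one golden snapshot.
# clone_commands <golden-snapshot> <count>
clone_commands() {
    local golden="$1" count="$2" i
    for (( i = 1; i <= count; i++ )); do
        printf 'zfs clone %s rpool/ROOT/worker-%02d\n' "${golden}" "${i}"
    done
}

clone_commands rpool/ROOT/kldload-node@golden 3
```

Review the printed commands first, then pipe the same call into bash to apply them.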
Multi-node cluster deployment
Role-based postinstallers for cluster nodes
For multi-node deployments, build a separate ISO per role or use hostname-based dispatch. Each node runs its role-specific postinstaller.
#!/bin/bash
# postinstall.sh — hostname-based role dispatch
set -euo pipefail
LOGFILE="/var/log/kldload/postinstall.log"
mkdir -p "$(dirname "${LOGFILE}")"
exec > >(tee -a "${LOGFILE}") 2>&1
log() { printf '[%(%F %T)T] [postinstall] %s\n' -1 "$*"; }
HOSTNAME="$(hostname -s)"
ROLE_DIR="/root/darksite/roles"
# Dispatch based on hostname prefix
case "${HOSTNAME}" in
cp-*) ROLE="control-plane" ;;
worker-*) ROLE="worker" ;;
lb-*) ROLE="loadbalancer" ;;
mon-*) ROLE="monitoring" ;;
db-*) ROLE="database" ;;
web-*) ROLE="webserver" ;;
*) ROLE="base" ;;
esac
log "Hostname: ${HOSTNAME} → Role: ${ROLE}"
# Run shared roles first, then role-specific
for script in \
"${ROLE_DIR}/00-base-hardening.sh" \
"${ROLE_DIR}/10-zfs-datasets-${ROLE}.sh" \
"${ROLE_DIR}/30-monitoring-agent.sh" \
"${ROLE_DIR}/40-${ROLE}.sh" \
"${ROLE_DIR}/90-verification.sh"; do
if [[ -f "${script}" ]]; then
log "Running: $(basename "${script}")"
source "${script}"
fi
done
log "Role ${ROLE} deployment complete"
One ISO, many roles. The hostname determines the role. Each node picks up the right set of scripts at boot time. Use this with unattended install to deploy entire clusters from a single ISO image.
The drop-off points
A postinstaller has natural "drop-off points" where you can stop and use the system as-is, or continue adding more layers. Each point is a valid, working system.
- dnf install in postinstall.sh. Snapshot. Done.
- zfs send to replicate across sites. Multi-site, multi-cloud, from one snapshot.
postinstall.sh is bash. The configs are text files. The services are systemd units.
ZFS datasets are one command. You can audit every step. You can modify every step.
You can build every step yourself. That is the point.