AI Admin Assistant — already installed, ready to use.
If you selected the AI profile during install, everything below is already running on your machine. No setup required. No cloud. No API key. No data leaves your hardware.
zpool status, wg show, virsh list output), it gives you answers that are specific to YOUR system, not generic advice from the internet. The "AI for X" pages below each teach a domain-specific workflow: AI for ZFS, eBPF, WireGuard, Docker, KVM, Kubernetes.What’s already installed
Everything was set up on first boot
When you checked the AI Assistant checkbox during install, kldload automatically:
- Installs Ollama on a dedicated ZFS dataset (
/srv/ollama, compressed, snapshotable) - Pulls llama3.1:8b (4.9 GB)
- Creates kldload-ai — a custom model trained on 1,000+ pages of documentation and 126 recipes crafted from real-world builds — not theory, not abstractions, actual infrastructure someone built and documented
- Builds whisper.cpp from source for voice-to-text
- Installs 4 commands:
kai,kai-voice,kai-do,kai-remote
ollama pull <model>.What you get
# Ask questions about your system — kai injects live context (kst, zpool, ARC stats)
kai "why is my ARC hit ratio low?"
# Voice control — press Enter, speak, press Enter
kai-voice
# Generate and execute infrastructure commands with confirmation
kai-do "create a new ZFS dataset for postgres with 128k recordsize and compression"
# Query and manage remote hosts over SSH
kai-remote db-server "check disk health and ZFS pool status"
How the model knows your infrastructure
kldload-ai is not a generic chatbot. On first boot, the system:
- Scrapes kldload.com documentation (ZFS, WireGuard, eBPF, tools, tutorials)
- Embeds it as a system prompt in the model
- Knows every kldload tool by name with exact usage
- Knows ZFS deeply — pools, snapshots, encryption, ARC tuning, boot environments
- Always recommends a snapshot before changes
- Gives exact commands, not abstract advice
GPU acceleration
If you checked the NVIDIA checkbox during install, Ollama uses the GPU automatically. No configuration needed.
# CPU inference: ~10 tokens/sec (usable but slow)
# GPU inference: ~80+ tokens/sec (conversational)
# Check GPU utilization while kai is running
nvidia-smi
# Multiple AI workloads can share the GPU simultaneously
Upgrade your model
The default is llama3.1:8b (~5 GB). Swap in anything bigger.
# See what you have
ollama list
# Pull a larger or specialized model
ollama pull llama3.1:70b # much smarter, needs ~40GB RAM
ollama pull codellama # optimized for code
ollama pull mistral # fast and capable
ollama pull deepseek-coder # code generation
# kai automatically uses kldload-ai, but you can test any model directly
ollama run mistral
Reference — manual setup on any Linux system
Don’t have kldload? You can set this up manually on any Linux system. Ollama runs open-source LLMs locally. ZFS gives you the storage backend.
Quick start — Ollama in 5 minutes
Just want to chat with an LLM locally?
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model and start chatting
ollama pull llama3.1:8b
ollama run llama3.1:8b
# That's it. Local AI. No cloud. No API key. No data leaves your machine.
With NVIDIA GPU
# If NVIDIA drivers are installed (see NVIDIA tutorial), Ollama uses the GPU automatically
ollama run llama3.1:8b
# Watch GPU utilization — inference runs on CUDA cores
nvidia-smi
# Multiple models can share the GPU simultaneously
# Run Ollama API + Stable Diffusion + Whisper — all on one GPU
Ollama as an API server
# Ollama exposes an OpenAI-compatible API on port 11434
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Explain ZFS snapshots in one sentence"
}'
# Use it from any app that supports the OpenAI API
# Just point OPENAI_BASE_URL to http://your-server:11434
# Run as a Docker container with GPU sharing (see NVIDIA tutorial)
docker run -d --name ollama --gpus all \
-p 11434:11434 \
-v /srv/ollama:/root/.ollama \
ollama/ollama
8B models
8GB RAM
General chat, coding, analysis
Fast on CPU, instant on GPU
13B–34B models
16–32GB RAM
Complex reasoning, long context
GPU recommended
70B+ models
64GB+ RAM or 24GB+ VRAM
Near-GPT-4 quality
GPU required
The full recipe — AI infrastructure admin
Step 1: Install Ollama
#!/bin/bash
# postinstall-ai-admin.sh
# Create a ZFS dataset for models (compressed, snapshotable)
kdir -o compression=zstd -o recordsize=1M /srv/ollama
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Configure Ollama to use ZFS-backed storage
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<EOF
[Service]
Environment="OLLAMA_MODELS=/srv/ollama/models"
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
systemctl daemon-reload
systemctl enable --now ollama
Step 2: Pull a model
# Pull a capable model — llama3.1 8B is a good starting point
ollama pull llama3.1:8b
# For more capability (needs 16GB+ RAM)
ollama pull llama3.1:70b
# For code-focused tasks
ollama pull codellama:13b
# Snapshot the clean model state
ksnap /srv/ollama
Step 3: Create your infrastructure Modelfile
This is where it gets powerful. You create a custom model persona that knows your infrastructure:
# /srv/ollama/Modelfile.infra-admin
FROM llama3.1:8b
SYSTEM """
You are an infrastructure administration assistant for a kldload-based
ZFS-on-root Linux environment. You help diagnose issues, suggest tuning,
and automate common tasks.
You know:
- ZFS: pools, datasets, snapshots, replication, ARC tuning, scrubs
- systemd: services, timers, journal, unit files
- Networking: WireGuard, nftables, NetworkManager
- Storage: RAID-Z, mirrors, special vdevs, SLOG, L2ARC
- kldload tools: kst, ksnap, kbe, kdf, kdir, kpkg, kupgrade, krecovery
When diagnosing issues:
1. Ask for relevant output (zpool status, kst, journalctl)
2. Identify the root cause
3. Suggest the fix with exact commands
4. Warn about risks before destructive operations
5. Always recommend a snapshot before changes
Current environment:
- Distro: CentOS Stream 9
- Pool: rpool (ZFS on root)
- Boot: ZFSBootMenu
- Tools: kldload CLI suite
"""
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
# Build the custom model
ollama create infra-admin -f /srv/ollama/Modelfile.infra-admin
# Test it
ollama run infra-admin "my zpool status shows a DEGRADED vdev, what should I do?"
Step 4: Feed it live system data
The real power: pipe your actual system state into the LLM and let it analyze:
#!/bin/bash
# ai-diagnose.sh — pipe system state to the AI assistant
CONTEXT=$(cat <<EOF
=== SYSTEM STATUS ===
$(kst 2>/dev/null)
=== POOL STATUS ===
$(zpool status 2>/dev/null)
=== RECENT ERRORS ===
$(journalctl -p err --since "1 hour ago" --no-pager 2>/dev/null | tail -20)
=== DISK HEALTH ===
$(smartctl -H /dev/vda 2>/dev/null)
=== MEMORY ===
$(free -h)
=== ARC STATS ===
$(cat /proc/spl/kstat/zfs/arcstats 2>/dev/null | grep -E "^size|^c |^hits|^misses")
EOF
)
echo "${CONTEXT}
Based on the above, are there any issues I should address? What optimizations would you recommend?" | \
ollama run infra-admin
Step 5: Automate with cron
# Daily health check — AI reviews your infrastructure every morning
# crontab -e
0 6 * * * /usr/local/bin/ai-diagnose.sh > /var/log/ai-health-report.txt 2>&1
# Weekly deep analysis
0 8 * * 1 /usr/local/bin/ai-deep-analysis.sh | mail -s "Weekly AI Infra Report" admin@example.com
Step 6: Interactive terminal assistant
# Add to .bashrc — type 'ai' to get help anytime
ai() {
local question="$*"
local context="$(kst 2>/dev/null; echo '---'; zpool status 2>/dev/null)"
echo -e "Current system state:\n${context}\n\nQuestion: ${question}" | \
ollama run infra-admin
}
# Usage:
ai "how do I add a mirror to my pool?"
ai "what recordsize should I use for postgres?"
ai "my ARC hit rate is low, what should I tune?"
ai "create a snapshot schedule for /srv/database"
Why ZFS makes this better
Snapshot before fine-tuning
Training a custom model? ksnap /srv/ollama first.
If the fine-tuned model is worse, ksnap rollback /srv/ollama. Instant.
Try that with ext4.
Clone models for testing
kclone /srv/ollama /srv/ollama-experiment.
Test a different system prompt. Compare outputs.
Zero extra disk space until the models diverge.
Replicate to other nodes
zfs send rpool/srv/ollama@trained | ssh node-2 zfs recv rpool/srv/ollama.
Your trained AI admin assistant, deployed to every node in your fleet.
Block-level replication. Only changed data transferred.
Compressed model storage
LLM model files are large but compress well.
compression=zstd on the dataset typically saves 15-25%.
A 7GB model takes 5.5GB on disk. Free performance.
Advanced: self-healing infrastructure
The AI that fixes things while you sleep
#!/bin/bash
# ai-auto-heal.sh — AI reviews and acts on critical issues
# Run via cron or systemd timer — WITH CAUTION
STATUS=$(zpool status -x 2>/dev/null)
if [[ "$STATUS" != "all pools are healthy" ]]; then
# Ask the AI what to do
RESPONSE=$(echo "zpool status output: ${STATUS}
Is this critical? What's the safest remediation?
Respond with ONLY a bash command if safe to run, or ALERT if human needed." | \
ollama run infra-admin)
if echo "$RESPONSE" | grep -q "^ALERT"; then
# AI says human needed — send notification
echo "$RESPONSE" | mail -s "ALERT: ZFS pool issue" admin@example.com
else
# AI suggests a safe command — log and execute
echo "$(date): AI auto-heal: $RESPONSE" >> /var/log/ai-actions.log
# Uncomment below to actually execute (use with extreme caution)
# eval "$RESPONSE"
fi
fi
kldload-ai gives exact commands, not abstract advice. It will confirm before running anything destructive. But it’s an 8b model running on your hardware — read the command before you hit yes. We lost a Proxmox node in testing. BSD-3-Clause means you own everything, including the consequences. “Don’t blame the tools. Read before you confirm. We warned you.”