AI Admin Assistant

Build Your Own

AI Admin Assistant — already installed, ready to use.

If you selected the AI profile during install, everything below is already running on your machine. No setup required. No cloud. No API key. No data leaves your hardware.

This is local AI for infrastructure management. Not a chatbot. Not a coding assistant. An AI that understands your specific system — your ZFS pools, your WireGuard mesh, your container fleet — because it reads live system state before every query. It runs on your hardware, on your network, with your data. Nothing leaves the machine. The kldload-ai model is trained on 1,000+ pages of infrastructure documentation and 126 real-world recipes. Combined with live context injection (your actual zpool status, wg show, virsh list output), it gives you answers that are specific to YOUR system, not generic advice from the internet. The "AI for X" pages below each teach a domain-specific workflow: AI for ZFS, eBPF, WireGuard, Docker, KVM, Kubernetes.

What’s already installed

Everything was set up on first boot

When you checked the AI Assistant checkbox during install, kldload automatically:

Installs Ollama on a dedicated ZFS dataset (/srv/ollama, compressed, snapshotable)
Pulls llama3.1:8b (4.9 GB)
Creates kldload-ai — a custom model trained on 1,000+ pages of documentation and 126 recipes crafted from real-world builds — not theory, not abstractions, actual infrastructure someone built and documented
Builds whisper.cpp from source for voice-to-text
Installs 4 commands: kai, kai-voice, kai-do, kai-remote

Requirements: internet on first boot (model download) + 16 GB RAM. The default model is ~5 GB — first boot will take a while depending on your connection. After that, it works fully offline. Ollama supports hundreds of models — you’re free to pull any LLM you want with ollama pull <model>.

What you get

# Ask questions about your system — kai injects live context (kst, zpool, ARC stats)
kai "why is my ARC hit ratio low?"

# Voice control — press Enter, speak, press Enter
kai-voice

# Generate and execute infrastructure commands with confirmation
kai-do "create a new ZFS dataset for postgres with 128k recordsize and compression"

# Query and manage remote hosts over SSH
kai-remote db-server "check disk health and ZFS pool status"

How the model knows your infrastructure

kldload-ai is not a generic chatbot. On first boot, the system:

Scrapes kldload.com documentation (ZFS, WireGuard, eBPF, tools, tutorials)
Embeds it as a system prompt in the model
Knows every kldload tool by name with exact usage
Knows ZFS deeply — pools, snapshots, encryption, ARC tuning, boot environments
Always recommends a snapshot before changes
Gives exact commands, not abstract advice

It's an SRE that read all the docs and never forgets them.

GPU acceleration

If you checked the NVIDIA checkbox during install, Ollama uses the GPU automatically. No configuration needed.

# CPU inference: ~10 tokens/sec (usable but slow)
# GPU inference: ~80+ tokens/sec (conversational)

# Check GPU utilization while kai is running
nvidia-smi

# Multiple AI workloads can share the GPU simultaneously

Upgrade your model

The default is llama3.1:8b (~5 GB). Swap in anything bigger.

# See what you have
ollama list

# Pull a larger or specialized model
ollama pull llama3.1:70b      # much smarter, needs ~40GB RAM
ollama pull codellama          # optimized for code
ollama pull mistral            # fast and capable
ollama pull deepseek-coder     # code generation

# kai automatically uses kldload-ai, but you can test any model directly
ollama run mistral

Reference — manual setup on any Linux system

Don’t have kldload? You can set this up manually on any Linux system. Ollama runs open-source LLMs locally. ZFS gives you the storage backend.

Quick start — Ollama in 5 minutes

Just want to chat with an LLM locally?

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model and start chatting
ollama pull llama3.1:8b
ollama run llama3.1:8b

# That's it. Local AI. No cloud. No API key. No data leaves your machine.

With NVIDIA GPU

# If NVIDIA drivers are installed (see NVIDIA tutorial), Ollama uses the GPU automatically
ollama run llama3.1:8b
# Watch GPU utilization — inference runs on CUDA cores
nvidia-smi

# Multiple models can share the GPU simultaneously
# Run Ollama API + Stable Diffusion + Whisper — all on one GPU

CPU inference: ~10 tokens/sec. GPU inference: ~80+ tokens/sec. The difference between waiting and conversing.

Ollama as an API server

# Ollama exposes an OpenAI-compatible API on port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain ZFS snapshots in one sentence"
}'

# Use it from any app that supports the OpenAI API
# Just point OPENAI_BASE_URL to http://your-server:11434

# Run as a Docker container with GPU sharing (see NVIDIA tutorial)
docker run -d --name ollama --gpus all \
  -p 11434:11434 \
  -v /srv/ollama:/root/.ollama \
  ollama/ollama

8B models

8GB RAM
General chat, coding, analysis
Fast on CPU, instant on GPU

13B–34B models

16–32GB RAM
Complex reasoning, long context
GPU recommended

70B+ models

64GB+ RAM or 24GB+ VRAM
Near-GPT-4 quality
GPU required

The full recipe — AI infrastructure admin

Step 1: Install Ollama

#!/bin/bash
# postinstall-ai-admin.sh

# Create a ZFS dataset for models (compressed, snapshotable)
kdir -o compression=zstd -o recordsize=1M /srv/ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Configure Ollama to use ZFS-backed storage
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<EOF
[Service]
Environment="OLLAMA_MODELS=/srv/ollama/models"
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF

systemctl daemon-reload
systemctl enable --now ollama

Models stored on ZFS = snapshotable, compressible, replicable. Snapshot before fine-tuning. Roll back if the model degrades. Clone to test variants.

Step 2: Pull a model

# Pull a capable model — llama3.1 8B is a good starting point
ollama pull llama3.1:8b

# For more capability (needs 16GB+ RAM)
ollama pull llama3.1:70b

# For code-focused tasks
ollama pull codellama:13b

# Snapshot the clean model state
ksnap /srv/ollama

Step 3: Create your infrastructure Modelfile

This is where it gets powerful. You create a custom model persona that knows your infrastructure:

# /srv/ollama/Modelfile.infra-admin
FROM llama3.1:8b

SYSTEM """
You are an infrastructure administration assistant for a kldload-based
ZFS-on-root Linux environment. You help diagnose issues, suggest tuning,
and automate common tasks.

You know:
- ZFS: pools, datasets, snapshots, replication, ARC tuning, scrubs
- systemd: services, timers, journal, unit files
- Networking: WireGuard, nftables, NetworkManager
- Storage: RAID-Z, mirrors, special vdevs, SLOG, L2ARC
- kldload tools: kst, ksnap, kbe, kdf, kdir, kpkg, kupgrade, krecovery

When diagnosing issues:
1. Ask for relevant output (zpool status, kst, journalctl)
2. Identify the root cause
3. Suggest the fix with exact commands
4. Warn about risks before destructive operations
5. Always recommend a snapshot before changes

Current environment:
- Distro: CentOS Stream 9
- Pool: rpool (ZFS on root)
- Boot: ZFSBootMenu
- Tools: kldload CLI suite
"""

PARAMETER temperature 0.3
PARAMETER num_ctx 8192

# Build the custom model
ollama create infra-admin -f /srv/ollama/Modelfile.infra-admin

# Test it
ollama run infra-admin "my zpool status shows a DEGRADED vdev, what should I do?"

Step 4: Feed it live system data

The real power: pipe your actual system state into the LLM and let it analyze:

#!/bin/bash
# ai-diagnose.sh — pipe system state to the AI assistant

CONTEXT=$(cat <<EOF
=== SYSTEM STATUS ===
$(kst 2>/dev/null)

=== POOL STATUS ===
$(zpool status 2>/dev/null)

=== RECENT ERRORS ===
$(journalctl -p err --since "1 hour ago" --no-pager 2>/dev/null | tail -20)

=== DISK HEALTH ===
$(smartctl -H /dev/vda 2>/dev/null)

=== MEMORY ===
$(free -h)

=== ARC STATS ===
$(cat /proc/spl/kstat/zfs/arcstats 2>/dev/null | grep -E "^size|^c |^hits|^misses")
EOF
)

echo "${CONTEXT}

Based on the above, are there any issues I should address? What optimizations would you recommend?" | \
    ollama run infra-admin

Instead of reading 50 lines of output yourself, the AI reads it and tells you what matters. "Your ARC hit rate is 72% — you should increase zfs_arc_max. Your last scrub was 45 days ago — schedule one. Disk vda has 3 reallocated sectors — monitor it."

Step 5: Automate with cron

# Daily health check — AI reviews your infrastructure every morning
# crontab -e
0 6 * * * /usr/local/bin/ai-diagnose.sh > /var/log/ai-health-report.txt 2>&1

# Weekly deep analysis
0 8 * * 1 /usr/local/bin/ai-deep-analysis.sh | mail -s "Weekly AI Infra Report" admin@example.com

Step 6: Interactive terminal assistant

# Add to .bashrc — type 'ai' to get help anytime
ai() {
    local question="$*"
    local context="$(kst 2>/dev/null; echo '---'; zpool status 2>/dev/null)"
    echo -e "Current system state:\n${context}\n\nQuestion: ${question}" | \
        ollama run infra-admin
}

# Usage:
ai "how do I add a mirror to my pool?"
ai "what recordsize should I use for postgres?"
ai "my ARC hit rate is low, what should I tune?"
ai "create a snapshot schedule for /srv/database"

Type 'ai' followed by any question. It sees your actual system state and gives answers specific to YOUR infrastructure. Not generic docs. YOUR pool. YOUR datasets. YOUR memory.

Why ZFS makes this better

Snapshot before fine-tuning

Training a custom model? ksnap /srv/ollama first. If the fine-tuned model is worse, ksnap rollback /srv/ollama. Instant. Try that with ext4.

Clone models for testing

kclone /srv/ollama /srv/ollama-experiment. Test a different system prompt. Compare outputs. Zero extra disk space until the models diverge.

Replicate to other nodes

zfs send rpool/srv/ollama@trained | ssh node-2 zfs recv rpool/srv/ollama. Your trained AI admin assistant, deployed to every node in your fleet. Block-level replication. Only changed data transferred.

Compressed model storage

LLM model files are large but compress well. compression=zstd on the dataset typically saves 15-25%. A 7GB model takes 5.5GB on disk. Free performance.

Advanced: self-healing infrastructure

The AI that fixes things while you sleep

#!/bin/bash
# ai-auto-heal.sh — AI reviews and acts on critical issues
# Run via cron or systemd timer — WITH CAUTION

STATUS=$(zpool status -x 2>/dev/null)

if [[ "$STATUS" != "all pools are healthy" ]]; then
    # Ask the AI what to do
    RESPONSE=$(echo "zpool status output: ${STATUS}

    Is this critical? What's the safest remediation?
    Respond with ONLY a bash command if safe to run, or ALERT if human needed." | \
        ollama run infra-admin)

    if echo "$RESPONSE" | grep -q "^ALERT"; then
        # AI says human needed — send notification
        echo "$RESPONSE" | mail -s "ALERT: ZFS pool issue" admin@example.com
    else
        # AI suggests a safe command — log and execute
        echo "$(date): AI auto-heal: $RESPONSE" >> /var/log/ai-actions.log
        # Uncomment below to actually execute (use with extreme caution)
        # eval "$RESPONSE"
    fi
fi

Start with alerts only. Read the AI's suggestions for a few weeks. When you trust it, let it run the safe ones. Never let it run destructive commands without human approval. The AI is an assistant, not an operator — yet.

All of this runs locally. No cloud. No API keys. No data leaves your machine. Ollama runs the model on your hardware. Your logs, your configs, your infrastructure data — none of it touches the internet. The AI assistant is as air-gapped as your kldload install.

⚠ A note about responsibility
kldload-ai gives exact commands, not abstract advice. It will confirm before running anything destructive. But it’s an 8b model running on your hardware — read the command before you hit yes. We lost a Proxmox node in testing. BSD-3-Clause means you own everything, including the consequences. “Don’t blame the tools. Read before you confirm. We warned you.”

← Quick health check with kst Train AI on Your Infrastructure — turn a generic LLM into YOUR sysadmin. →