
AI for Docker & Container Management — a local model that knows your containers.

Generic LLMs know Docker syntax. This model knows your running containers, your Compose stacks, your network topology, your volume mounts, and your resource consumption. It reads docker ps and docker stats before every answer. It sees which containers are restarting. It knows which images are stale. It generates Compose files from a description and audits running containers for security issues.

On kldloadOS, Docker volumes live on ZFS datasets. That means every volume is snapshotable, cloneable, and replicable. The AI knows this and uses it.

1. The Docker Modelfile

This system prompt encodes Docker, Compose, networking, volumes, registries, security best practices, and the ZFS-backed storage layer that kldloadOS provides underneath.

Complete Docker expert Modelfile

# /srv/ollama/Modelfile.docker-expert
FROM llama3.1:8b

SYSTEM """
You are a Docker and container management expert for this kldload-based infrastructure.
You give precise commands, reference actual container names from context, and always
recommend snapshots (ksnap) before destructive operations on volumes.

=== CONTAINER LIFECYCLE ===
List running:           docker ps
List all:               docker ps -a
Start/stop/restart:     docker start|stop|restart CONTAINER
Remove container:       docker rm CONTAINER (add -f for running)
Remove all stopped:     docker container prune
Logs:                   docker logs -f --tail=100 CONTAINER
Exec into container:    docker exec -it CONTAINER /bin/sh
Inspect:                docker inspect CONTAINER
Stats (live):           docker stats --no-stream
Top (processes):        docker top CONTAINER

=== IMAGES ===
Pull image:             docker pull IMAGE:TAG
List images:            docker images
Remove image:           docker rmi IMAGE
Remove dangling:        docker image prune
Remove all unused:      docker image prune -a
Build from Dockerfile:  docker build -t NAME:TAG .
Tag for registry:       docker tag NAME:TAG registry.local:5000/NAME:TAG
Push to registry:       docker push registry.local:5000/NAME:TAG

=== COMPOSE ===
Start stack:            docker compose up -d
Stop stack:             docker compose down
Rebuild and start:      docker compose up -d --build
View logs:              docker compose logs -f SERVICE
Scale service:          docker compose up -d --scale web=3
Override file:          docker compose -f compose.yml -f compose.prod.yml up -d
Environment:            Use .env file or environment: section in compose.yml
Depends-on:             Use depends_on with condition: service_healthy for startup order
Health checks:          healthcheck: test, interval, timeout, retries in compose.yml

=== NETWORKING ===
List networks:          docker network ls
Inspect network:        docker network inspect NETWORK
Create network:         docker network create --driver bridge mynet
Connect container:      docker network connect mynet CONTAINER
Macvlan (LAN access):   docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=eth0 lannet
DNS:                    Containers on same user-defined network resolve by name
Port mapping:           -p HOST:CONTAINER or ports: in compose
No host networking:     --network=host bypasses isolation. Avoid in production.

=== VOLUMES (ZFS-BACKED) ===
Docker on kldloadOS uses the ZFS storage driver. Each container layer is a ZFS dataset.
List volumes:           docker volume ls
Create volume:          docker volume create mydata
Inspect volume:         docker volume inspect mydata
Remove volume:          docker volume rm mydata (ksnap the dataset first!)
ZFS advantage:          Volumes live at rpool/docker/volumes/VOLUME_NAME
Snapshot volume:        zfs snapshot rpool/docker/volumes/mydata@before-upgrade
Clone volume:           zfs clone rpool/docker/volumes/mydata@snap rpool/docker/volumes/mydata-test
Rollback volume:        zfs rollback rpool/docker/volumes/mydata@before-upgrade
Replicate volume:       syncoid rpool/docker/volumes/mydata root@backup:tank/docker/volumes/mydata

=== PRIVATE REGISTRY ===
Run local registry:     docker run -d -p 5000:5000 -v registry:/var/lib/registry --name registry registry:2
Tag for local:          docker tag myapp:latest localhost:5000/myapp:latest
Push to local:          docker push localhost:5000/myapp:latest
Pull from local:        docker pull localhost:5000/myapp:latest
Registry on ZFS:        Volume is a dataset — snapshot the entire registry, clone for testing
TLS for registry:       Mount certs, set REGISTRY_HTTP_TLS_CERTIFICATE and KEY

=== SECURITY ===
Never run as root:      USER nonroot in Dockerfile, or user: "1000:1000" in compose
No privileged:          --privileged gives full host access. Almost never needed.
Read-only rootfs:       --read-only with tmpfs for /tmp
Drop capabilities:      --cap-drop=ALL --cap-add=NET_BIND_SERVICE (add only what's needed)
No new privileges:      --security-opt=no-new-privileges
Seccomp profile:        --security-opt seccomp=profile.json
Scan images:            trivy image myapp:latest (find vulnerabilities)
Limit resources:        --memory=512m --cpus=1.0 (prevent resource exhaustion)

=== TROUBLESHOOTING ===
Container crash loop:   docker logs CONTAINER | tail -50 (read the error)
                        docker inspect CONTAINER | grep -A5 State (check exit code)
OOM killed:             docker inspect CONTAINER | grep OOMKilled
                        Increase --memory or optimize the application
Network issues:         docker exec CONTAINER ping OTHER_CONTAINER
                        docker network inspect NETWORK (check subnet, gateway)
Disk full:              docker system df (show disk usage by component)
                        docker system prune (remove unused data)
                        kdf (check ZFS dataset usage)
Slow container:         docker stats CONTAINER (check CPU, memory, I/O)
                        docker top CONTAINER (check processes)

=== PHILOSOPHY ===
Snapshot before destructive operations. ksnap is instant. Rebuilding from scratch is not.
Use Compose for everything. Single docker run commands are not reproducible.
Pin image tags. :latest is not a deployment strategy.
One process per container. If you need cron + app, use two containers.
Logs to stdout. Let Docker handle log rotation and collection.
ZFS under Docker means every layer, volume, and image is checksummed and compressible.
"""

PARAMETER temperature 0.3
PARAMETER num_ctx 16384
# Build the Docker expert model
ollama create docker-expert -f /srv/ollama/Modelfile.docker-expert

# Verify it
ollama run docker-expert "How do I set up a private registry with TLS on ZFS?"
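The Compose conventions the prompt encodes — health checks, health-gated startup, pinned tags — look like this in practice. A minimal sketch with hypothetical service names and versions:

```shell
# Minimal sketch of the Compose conventions above: pinned tags, a health
# check on the database, and health-gated startup for the web tier.
# Service names and versions are hypothetical.
cat > compose.yml <<'EOF'
services:
  db:
    image: postgres:16.4          # pinned tag
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  web:
    image: nginx:1.27.1           # pinned tag
    ports:
      - "8080:80"
    depends_on:
      db:
        condition: service_healthy   # start web only after db is healthy
volumes:
  pgdata:
EOF
```

With this in place, docker compose up -d starts web only after db's health check passes.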
A sysadmin who memorized every Docker man page is useful. A sysadmin who also reads docker ps before answering is indispensable. This model does both.

2. Live context script

The Modelfile gives the AI Docker expertise. The context script gives it your Docker state. Every query includes running containers, resource usage, networks, volumes, and recent events.

kai-docker — query with live container state

#!/bin/bash
# /usr/local/bin/kai-docker — query the Docker AI with live container context

build_docker_context() {
    echo "=== LIVE DOCKER STATE ($(date -Iseconds)) ==="

    echo -e "\n--- Running containers ---"
    docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null

    echo -e "\n--- All containers (including stopped) ---"
    docker ps -a --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Size}}" 2>/dev/null

    echo -e "\n--- Resource usage ---"
    docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}" 2>/dev/null

    echo -e "\n--- Images ---"
    docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedSince}}" 2>/dev/null

    echo -e "\n--- Networks ---"
    docker network ls --format "table {{.Name}}\t{{.Driver}}\t{{.Scope}}" 2>/dev/null

    echo -e "\n--- Volumes ---"
    docker volume ls --format "table {{.Name}}\t{{.Driver}}" 2>/dev/null

    echo -e "\n--- Docker disk usage ---"
    docker system df 2>/dev/null

    echo -e "\n--- Recent events (last 30 min) ---"
    docker events --since 30m --until 0s --format '{{.Time}} {{.Action}} {{.Actor.Attributes.name}}' 2>/dev/null | tail -20

    echo -e "\n--- ZFS Docker datasets ---"
    zfs list -r rpool/docker -o name,used,avail,compressratio 2>/dev/null | head -30

    echo -e "\n--- Restart counts (crash detection) ---"
    docker inspect $(docker ps -aq 2>/dev/null) --format '{{.Name}} restarts={{.RestartCount}} exitCode={{.State.ExitCode}}' 2>/dev/null | \
        grep -v 'restarts=0 exitCode=0'
}

QUESTION="$*"
if [ -z "$QUESTION" ]; then
    echo "Usage: kai-docker <question>"
    echo ""
    echo "Examples:"
    echo "  kai-docker 'why is my nginx container restarting?'"
    echo "  kai-docker 'which container is using the most memory?'"
    echo "  kai-docker 'write a compose file for wordpress + mariadb + redis'"
    echo "  kai-docker 'audit my running containers for security issues'"
    echo "  kai-docker 'set up a private registry on ZFS'"
    exit 1
fi

CONTEXT=$(build_docker_context)

echo -e "${CONTEXT}\n\n=== QUESTION ===\n${QUESTION}" | ollama run docker-expert
Asking "why is my container crashing?" without docker ps is like asking "why does my car make noise?" without opening the hood. This script opens the hood first, every time.

3. ZFS-backed Docker

On kldloadOS, Docker uses the ZFS storage driver. Every container layer is a ZFS dataset. Every volume is a dataset. That means you get snapshots, clones, compression, checksums, and send/recv on every piece of container data. For free.
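Selecting that driver is a one-line daemon setting. A sketch of the stanza, written to a local file here for illustration — on a real host it belongs at /etc/docker/daemon.json, with /var/lib/docker on a ZFS dataset, followed by a daemon restart:

```shell
# Sketch: the daemon.json stanza that selects the ZFS storage driver.
# Written to a local file for illustration; the real file lives at
# /etc/docker/daemon.json and requires /var/lib/docker on a ZFS dataset.
cat > daemon.json <<'EOF'
{
  "storage-driver": "zfs"
}
EOF
```

After restarting the daemon, `docker info --format '{{.Driver}}'` should report zfs.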

ZFS + Docker workflow

# Snapshot a volume before upgrading the database
ksnap /var/lib/docker/volumes/postgres_data
docker compose pull db
docker compose up -d db

# Something went wrong? Rollback in seconds
docker compose stop db
# (plain rollback targets the most recent snapshot; add -r to discard newer ones)
zfs rollback rpool/docker/volumes/postgres_data@auto-2026-03-23-1400
docker compose start db

# Clone a production volume for testing (instant, zero disk cost)
zfs snapshot rpool/docker/volumes/postgres_data@test-clone
zfs clone rpool/docker/volumes/postgres_data@test-clone rpool/docker/volumes/postgres_data_test
# Now mount postgres_data_test as a volume in a test container

# Replicate container volumes to another node
syncoid rpool/docker/volumes/postgres_data root@node2:rpool/docker/volumes/postgres_data

# Check compression savings
zfs get compressratio rpool/docker
# NAME           PROPERTY       VALUE  SOURCE
# rpool/docker   compressratio  2.41x  -
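The volume-name-to-dataset mapping above is mechanical, so it's easy to wrap. A tiny dry-run helper, assuming the rpool/docker/volumes/NAME layout shown — it prints the command rather than running it:

```shell
# Dry-run helper: map a Docker volume name to its assumed ZFS dataset
# (rpool/docker/volumes/NAME) and print the snapshot command.
vol_snap() {
    local vol="$1"
    local tag="${2:-manual-$(date +%F-%H%M)}"
    echo "zfs snapshot rpool/docker/volumes/${vol}@${tag}"
}

vol_snap postgres_data before-upgrade
# -> zfs snapshot rpool/docker/volumes/postgres_data@before-upgrade
```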
overlay2 is a filing cabinet. ZFS under Docker is a filing cabinet with a time machine, a photocopier, and a courier service built in. Same files, completely different capabilities.

4. Auto-generate Compose files

Describe what you want in plain English. The AI writes the docker-compose.yml. It knows your network topology, your ZFS volumes, your existing containers, and it generates files that actually work on your machine.

Compose generation examples

# "Give me a web app stack with nginx, Node.js, PostgreSQL, and Redis"
kai-docker "write a docker-compose.yml for nginx reverse proxy, \
node.js app on port 3000, postgresql with ZFS volume, and redis for caching. \
Use health checks. Pin all image versions."

# "Set up Gitea with PostgreSQL"
kai-docker "compose file for Gitea git server with PostgreSQL backend. \
Store repos on a ZFS volume. Expose on port 3000 and SSH on 2222."

# "Private registry with authentication"
kai-docker "compose file for a Docker registry with htpasswd auth and TLS. \
Store images on a ZFS volume. Include the htpasswd generation commands."
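The "pin all image versions" instruction is also easy to verify mechanically before deploying whatever gets generated. A sketch that flags unpinned images in a Compose file (the sample file is hypothetical):

```shell
# Flag unpinned images in a Compose file: anything tagged :latest or with
# no tag at all. A quick mechanical check for generated files.
check_pins() {
    local file="$1" bad=0 image
    while read -r image; do
        case "$image" in
            *:latest) echo "UNPINNED (:latest): $image"; bad=1 ;;
            *:*)      ;;                    # has an explicit tag
            *)        echo "UNPINNED (no tag): $image"; bad=1 ;;
        esac
    done < <(grep -E '^[[:space:]]*image:' "$file" | awk '{print $2}')
    return "$bad"
}

# Hypothetical sample: redis is untagged and should be flagged
cat > sample-compose.yml <<'EOF'
services:
  web:
    image: nginx:1.27.1
  cache:
    image: redis
EOF
check_pins sample-compose.yml || echo "Pin the images above before deploying."
```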
You don't hire a contractor and hand them a blank blueprint. You describe what you want and they draw it. The AI draws Compose files the same way — from your description, for your infrastructure.

5. Container security audit

The AI inspects every running container and flags security issues: privileged mode, containers running as root, exposed ports, missing resource limits, host network access, writable rootfs. It gives you the exact fix for each finding.

Security audit script

#!/bin/bash
# /usr/local/bin/kai-docker-audit — AI security audit of running containers

AUDIT=$(cat <<AUDIT_DATA
=== DOCKER SECURITY AUDIT — $(hostname) — $(date) ===

--- Container Security Details ---
$(for c in $(docker ps -q 2>/dev/null); do
    echo "=== $(docker inspect "$c" --format '{{.Name}}') ==="
    docker inspect "$c" --format '
  Image:        {{.Config.Image}}
  User:         {{.Config.User}}
  Privileged:   {{.HostConfig.Privileged}}
  ReadonlyRoot: {{.HostConfig.ReadonlyRootfs}}
  NetworkMode:  {{.HostConfig.NetworkMode}}
  PidMode:      {{.HostConfig.PidMode}}
  CapAdd:       {{.HostConfig.CapAdd}}
  CapDrop:      {{.HostConfig.CapDrop}}
  SecurityOpt:  {{.HostConfig.SecurityOpt}}
  Memory:       {{.HostConfig.Memory}}
  CPUs:         {{.HostConfig.NanoCpus}}
  Ports:        {{range $p, $b := .NetworkSettings.Ports}}{{$p}}->{{$b}} {{end}}
  Mounts:       {{range .Mounts}}{{.Type}}:{{.Source}}->{{.Destination}}({{.Mode}}) {{end}}
  RestartPolicy:{{.HostConfig.RestartPolicy.Name}}'
    echo ""
done)
AUDIT_DATA
)

echo "$AUDIT" | ollama run docker-expert \
    "Analyze this security audit. For each container, report:
1. CRITICAL — privileged mode, host network/PID, running as root
2. WARNING — no memory limit, no CPU limit, writable rootfs, excessive capabilities
3. INFO — good practices already in place
4. FIX — exact docker run or compose changes to remediate each finding
Be specific. Reference container names."
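The worst findings don't need a model at all. A deterministic pre-filter over the same inspect output catches privileged mode and host networking instantly — the sample below mirrors the field layout the audit script emits:

```shell
# Deterministic pre-filter: grep the audit text for instant CRITICALs
# (privileged mode, host networking) before spending model time on it.
triage() {
    if grep -E 'Privileged:   true|NetworkMode:  host' "$1"; then
        echo "CRITICAL findings above -- fix these first"
    else
        echo "No instant CRITICALs; hand the full audit to the model"
    fi
}

# Hypothetical sample in the same layout as the audit script's output
cat > audit-sample.txt <<'EOF'
=== /web ===
  Privileged:   false
  NetworkMode:  bridge
=== /metrics ===
  Privileged:   true
  NetworkMode:  host
EOF
triage audit-sample.txt
```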
A security audit by hand means running docker inspect on every container and reading JSON. This script does that, then hands the JSON to an expert that reads faster than you do.

6. Container log analysis

Feed container logs to the AI for pattern analysis. It finds errors you missed, correlates timestamps across containers, and identifies the root cause of cascading failures.

Log analysis

# Analyze a crashing container's logs
docker logs --tail 200 my-app 2>&1 | \
    ollama run docker-expert "Analyze these container logs. \
    Find errors, warnings, and patterns. Suggest fixes."

# Correlate logs across a Compose stack
docker compose logs --tail=100 --no-color 2>&1 | \
    ollama run docker-expert "These are logs from multiple containers \
    in a Compose stack. Correlate timestamps. Find the root cause \
    of any errors. Which container failed first?"

# Spot-check recent lines (don't pipe `docker logs -f` into tail:
# the follow stream never ends, so tail -50 would block forever)
docker logs --tail 50 my-app 2>&1 | \
    ollama run docker-expert "Any critical errors in these last 50 lines?"
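Long logs burn context window (num_ctx is 16384 above), so pre-filtering to error-ish lines keeps the prompt small. A sketch with a hypothetical sample log:

```shell
# Shrink a log before it reaches the model: keep error-ish lines plus one
# line of context on each side, so the prompt stays within num_ctx.
errors_only() {
    grep -iE -B1 -A1 'error|fatal|panic|exception|refused|denied' "$@"
}

# Hypothetical sample log
cat > app.log <<'EOF'
2026-03-23T14:00:01 info  listening on :3000
2026-03-23T14:00:07 error connect ECONNREFUSED db:5432
2026-03-23T14:00:08 info  retrying in 5s
EOF
errors_only app.log

# In practice:
#   docker logs --tail 500 my-app 2>&1 | errors_only - | ollama run docker-expert "Find the root cause."
```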

7. Fleet replication via syncoid

Train the Docker expert on one node. Replicate the model — and your entire Docker volume state — to every node in the fleet. Same knowledge everywhere. Same container data replicated via ZFS send/recv.

Replicate Docker AI and volumes across fleet

#!/bin/bash
# replicate-docker-ai.sh — push Docker model + volumes to all nodes

NODES="node-2 node-3 node-4"

# Replicate the Ollama model
zfs snapshot rpool/srv/ollama@docker-expert-$(date +%F)
for node in $NODES; do
    syncoid --no-sync-snap rpool/srv/ollama "root@${node}:rpool/srv/ollama"
done

# Replicate Docker volumes (e.g., shared registry)
zfs snapshot rpool/docker/volumes/registry@sync-$(date +%F)
for node in $NODES; do
    syncoid --no-sync-snap rpool/docker/volumes/registry "root@${node}:rpool/docker/volumes/registry"
done

# Deploy the kai-docker script everywhere
for node in $NODES; do
    scp /usr/local/bin/kai-docker "root@${node}:/usr/local/bin/kai-docker"
    ssh "root@${node}" "chmod +x /usr/local/bin/kai-docker && systemctl restart ollama"
done
One brain, many bodies. The Docker expert on node-2 knows exactly as much as the one on node-4. But each one reads its own docker ps, its own stats, its own logs.

Containers are supposed to make things simpler. Then you have 47 of them, 3 Compose files, a private registry, 2 custom networks, and a volume that someone mounted from the host 6 months ago and nobody remembers why. The AI does not judge. It reads docker inspect, finds the answer, and tells you.

ZFS underneath means every container mistake is reversible. Snapshot before you pull. Clone before you test. Send/recv your volumes to backup. The containers are ephemeral. Your data doesn't have to be.