AI for eBPF Observability — ask questions, get traces.
eBPF lets you instrument the kernel without rebooting, without modules, without risk.
But the syntax is dense. The probes are many. The one-liners are hard to remember.
This model knows them all. Ask it "what processes are doing the most disk I/O" and it gives you
a bpftrace one-liner that answers the question in seconds.
The Modelfile encodes deep knowledge of BCC tools, bpftrace syntax, tracepoints, kprobes, and uprobes. The context script runs key traces and feeds their output into every query. The AI doesn't just tell you how to observe — it observes, then tells you what it sees.
1. The eBPF Modelfile
This system prompt encodes the BCC tool inventory, bpftrace syntax, common tracepoints, and the patterns that turn raw trace output into actionable answers.
Complete eBPF expert Modelfile
# /srv/ollama/Modelfile.ebpf-expert
FROM llama3.1:8b
SYSTEM """
You are an eBPF observability expert for this Linux-based infrastructure.
You write bpftrace one-liners, interpret BCC tool output, and identify
performance bottlenecks and security anomalies from trace data.
=== BCC TOOLS REFERENCE ===
Process tracing:
execsnoop Trace new process execution (shows every exec() call)
exitsnoop Trace process exit and lifespan
pidpersec Count new processes created per second
runqlat Scheduler run queue latency histogram
runqlen Scheduler run queue length histogram
cpudist On-CPU time histogram per process
offcputime Off-CPU time (blocked/sleeping) stacks
Disk I/O:
biolatency Block I/O latency histogram
biosnoop Trace every block I/O with latency
biotop Top-like display for block I/O by process
bitesize Block I/O size histogram
ext4slower Trace slow ext4 operations (>thresh)
zfsslower Trace slow ZFS operations (>thresh)
filetop Top files by I/O
Filesystem:
opensnoop Trace every open() syscall (file opens)
statsnoop Trace stat() calls
filelife Trace file creation and deletion with age
cachestat Page cache hit/miss statistics
cachetop Per-process page cache stats
writeback Trace writeback events
vfscount Count VFS function calls
vfsstat VFS operation rates
Network:
tcplife Trace TCP sessions with duration and bytes
tcpconnect Trace active TCP connections (connect())
tcpaccept Trace passive TCP connections (accept())
tcpretrans Trace TCP retransmissions
tcpdrop Trace TCP drops with stack traces
tcprtt TCP round-trip time histogram
tcptop Top-like display of TCP throughput by process
Memory:
memleak Detect memory leaks with stack traces
oomkill Trace OOM killer invocations
shmsnoop Trace shared memory calls
drsnoop Trace direct reclaim events
System:
funccount Count kernel/user function calls
funclatency Function latency histogram
syscount Count syscalls by type
argdist Argument/return value distributions
trace Multi-purpose kernel/user tracing
hardirqs Trace hard interrupt handling time
softirqs Trace soft interrupt handling time
=== BPFTRACE SYNTAX ===
Probe types:
tracepoint:CATEGORY:NAME Stable kernel tracepoints
kprobe:FUNCTION Kernel function entry
kretprobe:FUNCTION Kernel function return
uprobe:BINARY:FUNCTION User-space function entry
uretprobe:BINARY:FUNCTION User-space function return
software:EVENT:COUNT Software events (cpu-clock, page-faults)
hardware:EVENT:COUNT Hardware PMC events
interval:s:N Timer (every N seconds)
BEGIN / END Script start/end
Built-in variables:
pid, tid, uid, comm Process ID, thread ID, user ID, command name
nsecs, elapsed Nanosecond timestamp, nanoseconds since bpftrace start
cpu Current CPU number
curtask Current task_struct pointer
retval Return value (in kretprobe/uretprobe)
args Tracepoint arguments struct
@map[key] = value BPF maps (aggregation)
Aggregation functions:
count() Count events
sum(x) Sum values
avg(x) Average
min(x), max(x) Min/max
hist(x) Power-of-2 histogram
lhist(x, min, max, step) Linear histogram
stats(x) Count, avg, total
Output functions:
printf(fmt, ...) Formatted output
print(@map) Print map
clear(@map) Clear map
time(fmt) Print timestamp
=== COMMON ONE-LINERS ===
# Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Trace file opens by process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s PID:%d %s\n", comm, pid, str(args->filename)); }'
# Block I/O latency histogram
bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
@usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
delete(@start[args->dev, args->sector]); }'
# TCP connections with destination
bpftrace -e 'kprobe:tcp_connect { @[comm, ntop(((struct sock*)arg0)->__sk_common.skc_daddr)] = count(); }'
# Process execution trace
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'
# Page faults by process
bpftrace -e 'software:page-faults:1 { @[comm] = count(); }'
# Signal delivery
bpftrace -e 'tracepoint:signal:signal_deliver { printf("PID %d (%s) received signal %d\n", pid, comm, args->sig); }'
# Read/write bytes by process
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @reads[comm] = sum(args->ret); }
tracepoint:syscalls:sys_exit_write /args->ret > 0/ { @writes[comm] = sum(args->ret); }'
=== SECURITY PATTERNS ===
Detect suspicious exec:
execsnoop — watch for unexpected binaries, /tmp execution, base64/curl/wget chains
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%d %d %s %s\n", uid, pid, comm, str(args->filename)); }'
Detect unauthorized network:
tcpconnect — outbound connections from unexpected processes
tcpaccept — inbound connections on unexpected ports
File integrity monitoring:
opensnoop | grep shadow — who is reading sensitive files (opensnoop filters by PID and flags, not by path, so post-filter with grep)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /str(args->filename) == "/etc/passwd"/ { printf("PID %d (%s) uid=%d\n", pid, comm, uid); }'
Privilege escalation:
bpftrace -e 'tracepoint:syscalls:sys_enter_setuid { printf("PID %d (%s) setuid(%d)\n", pid, comm, args->uid); }'
=== INTERPRETING OUTPUT ===
When reading trace output:
- High biolatency (>10ms) on SSDs indicates I/O contention or driver issues
- High runqlat (>10ms) means CPU oversubscription
- tcpretrans spikes indicate network congestion or packet loss
- execsnoop showing /tmp or /dev/shm execution is a security red flag
- opensnoop showing config file reads by unexpected processes is suspicious
- Page fault storms suggest memory pressure or poor locality
=== PHILOSOPHY ===
eBPF is observation, not modification. You are looking, not touching.
Always start with the question, then pick the probe.
BCC tools for quick answers. bpftrace for custom questions.
If you can measure it, you can fix it. If you can't measure it, you're guessing.
"""
PARAMETER temperature 0.3
PARAMETER num_ctx 16384
# Build the eBPF expert model
ollama create ebpf-expert -f /srv/ollama/Modelfile.ebpf-expert
# Verify it
ollama run ebpf-expert "Write a bpftrace one-liner to find which process is doing the most disk writes"
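One practical wrinkle before wiring this up: BCC tool names vary by distro. Debian and Ubuntu ship them with a -bpfcc suffix (biolatency-bpfcc), while other distros install the bare names under /usr/share/bcc/tools. A small hedged helper (the bcc_tool name is my own) resolves whichever form is present:

```shell
# bcc_tool — resolve a BCC tool name to a runnable path, whatever the distro calls it.
# Checks the bare name, the Debian/Ubuntu -bpfcc suffix, then the BCC tools directory.
bcc_tool() {
    local t="$1"
    if command -v "$t" >/dev/null 2>&1; then
        command -v "$t"
    elif command -v "${t}-bpfcc" >/dev/null 2>&1; then
        command -v "${t}-bpfcc"
    elif [ -x "/usr/share/bcc/tools/$t" ]; then
        echo "/usr/share/bcc/tools/$t"
    else
        return 1    # not installed under any known name
    fi
}

# usage: "$(bcc_tool biolatency)" 10 1
```

The scripts below call tools by their bare names; on suffix-based distros, route the calls through this helper instead.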
2. Live context script
The context script runs key eBPF traces for a few seconds and feeds the output to the AI. The model doesn't just know eBPF theory — it reads your actual trace data and tells you what it means.
The eBPF context builder
#!/bin/bash
# /usr/local/bin/kai-ebpf — query the eBPF AI with live trace context
build_ebpf_context() {
echo "=== LIVE SYSTEM STATE ($(date -Iseconds)) ==="
echo -e "\n--- CPU and load ---"
uptime 2>/dev/null
nproc 2>/dev/null
echo -e "\n--- Top processes by CPU ---"
ps aux --sort=-%cpu 2>/dev/null | head -12
echo -e "\n--- Top processes by memory ---"
ps aux --sort=-%mem 2>/dev/null | head -12
echo -e "\n--- Block I/O snapshot (3 seconds) ---"
timeout 3 biotop -C 1 1 2>/dev/null || \
timeout 3 iostat -xh 1 1 2>/dev/null | tail -20
echo -e "\n--- Recent process executions (3 seconds) ---"
command -v execsnoop >/dev/null && timeout 3 execsnoop 2>/dev/null | head -20 || echo "(execsnoop not available)"
echo -e "\n--- Active TCP connections ---"
ss -tnp 2>/dev/null | head -20
echo -e "\n--- TCP retransmissions (3 seconds) ---"
command -v tcpretrans >/dev/null && timeout 3 tcpretrans 2>/dev/null | head -10 || echo "(tcpretrans not available)"
echo -e "\n--- Page cache stats ---"
timeout 3 cachestat 1 1 2>/dev/null || echo "(cachestat not available)"
echo -e "\n--- Listening services ---"
ss -tlnp 2>/dev/null
echo -e "\n--- Kernel version ---"
uname -r 2>/dev/null
echo -e "\n--- Available BCC tools ---"
ls /usr/share/bcc/tools/ 2>/dev/null | tr '\n' ' ' || echo "(BCC not installed)"
echo -e "\n--- Available tracepoints (count) ---"
find /sys/kernel/debug/tracing/events -maxdepth 2 -name 'enable' 2>/dev/null | wc -l
}
QUESTION="$*"
if [ -z "$QUESTION" ]; then
echo "Usage: kai-ebpf <question>"
echo ""
echo "Examples:"
echo " kai-ebpf 'what processes are doing the most I/O'"
echo " kai-ebpf 'show me TCP connections with latency over 100ms'"
echo " kai-ebpf 'trace all file opens in /etc'"
echo " kai-ebpf 'detect suspicious process execution'"
echo " kai-ebpf 'why is the system slow right now'"
echo " kai-ebpf 'write me a bpftrace script to monitor DNS lookups'"
exit 1
fi
CONTEXT=$(build_ebpf_context)
# printf (not echo -e) so backslashes in the trace data pass through literally
printf '%s\n\n=== QUESTION ===\n%s\n' "$CONTEXT" "$QUESTION" | ollama run ebpf-expert
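One caveat on this pipeline: the Modelfile sets num_ctx 16384, and a busy box can emit more trace output than fits, silently pushing the question out of the window. A hedged guard (trim_context is my own helper; the roughly-4-characters-per-token figure is a heuristic, not a tokenizer) caps the context size:

```shell
# trim_context — cap the context at roughly max_chars characters so the
# question appended at the end of the prompt always fits in the window.
trim_context() {
    local max_chars="${1:-48000}"   # ~12k tokens at ~4 chars/token (rough heuristic)
    head -c "$max_chars"
}

# usage inside kai-ebpf: CONTEXT=$(build_ebpf_context | trim_context)
```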
3. Example queries
The model reads your live trace data and generates precise answers. Ask a question in English, get a bpftrace one-liner or a BCC command that answers it immediately.
"What processes are doing the most I/O?"
The AI reads the biotop output from the live context, identifies the top I/O consumers,
and if you need deeper analysis, gives you:
bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'
to count block I/O requests by process name.
kai-ebpf "what processes are doing the most disk I/O right now?"
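When the count alone isn't enough, the raw biosnoop stream can be post-filtered in plain shell. A hedged helper (slow_io is my own name; biosnoop prints latency in milliseconds in its last column, though the exact layout varies by BCC version) that surfaces only the >10 ms outliers:

```shell
# slow_io — keep only biosnoop lines whose last column (LAT(ms)) exceeds 10 ms.
# NR > 1 skips the header line; $NF+0 coerces the last field to a number.
slow_io() { awk 'NR > 1 && $NF+0 > 10 { print }'; }

# usage: biosnoop | slow_io
```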
"Show me TCP connections with latency over 100ms"
The AI explains that tcplife has no duration filter flag (-L selects local
ports), so it post-filters the MS column: tcplife | awk 'NR == 1 || $NF+0 > 100'.
For a custom threshold, it writes the bpftrace equivalent with a filter on the
duration calculation.
kai-ebpf "find all TCP connections taking longer than 100ms"
"Trace all file opens in /etc"
The AI gives you: opensnoop | grep '/etc/' (opensnoop's -p flag filters by PID, not path) or the bpftrace one-liner:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /strncmp(str(args->filename), "/etc/", 5) == 0/ { printf("%s %d %s\n", comm, pid, str(args->filename)); }'.
It explains that this catches any process reading configs, credentials, or hostname files.
kai-ebpf "trace all file opens under /etc and show me who is reading what"
"Detect suspicious process execution"
The AI generates an execsnoop filter that flags processes launched from
/tmp, /dev/shm, or /var/tmp. It adds a bpftrace
one-liner that logs UID, PID, parent PID, and the full command line for forensic analysis.
kai-ebpf "detect suspicious process execution — /tmp, encoded commands, unusual parents"
"Find unauthorized network connections"
The AI reads the live ss output and TCP connection data, cross-references with
listening services, and flags any outbound connections from processes that shouldn't be talking to the network.
It generates a tcpconnect filter for ongoing monitoring.
kai-ebpf "find any processes making network connections that shouldn't be"
"Generate a bpftrace one-liner"
Describe what you want to trace in plain English. The AI writes the bpftrace script, explains which probes it uses and why, and tells you what the output means. No manual required.
kai-ebpf "write a bpftrace script to show which files each process is writing to, sorted by bytes"
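A representative generated answer, as a sketch: the script below sums write() bytes per process using the probes and map functions from the Modelfile's syntax reference. True per-file attribution would need fd-to-path correlation via openat() on top, as noted in the comments:

```shell
# Write a small bpftrace script that answers "who is writing the most bytes".
cat > /tmp/write-bytes.bt <<'BT'
// Sum write() bytes per process; a starting point for per-file analysis.
// Mapping bytes to filenames needs openat() fd-to-path correlation on top.
BEGIN { printf("Summing write bytes per process... hit Ctrl-C to end.\n"); }

tracepoint:syscalls:sys_exit_write
/args->ret > 0/                        // successful writes only
{
    @bytes[comm] = sum(args->ret);     // aggregate per command name
}

END { print(@bytes); clear(@bytes); } // maps print sorted by value
BT

# run it (needs root): bpftrace /tmp/write-bytes.bt
```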
4. Automated observability via cron
Schedule the AI to run traces periodically, analyze the results, and flag anomalies. This is continuous observability without dashboards, agents, or external services.
eBPF observability monitor
#!/bin/bash
# /usr/local/bin/kai-ebpf-monitor — AI-driven eBPF observability report
REPORT_DIR="/var/log/kai-ebpf"
mkdir -p "$REPORT_DIR"
REPORT="$REPORT_DIR/$(date +%F-%H%M).txt"
# Run traces for 10 seconds each, collect results
TRACES=$(cat <<TRACEDATA
=== eBPF OBSERVABILITY REPORT — $(hostname) — $(date) ===
--- Process Execution (10s sample) ---
$(command -v execsnoop >/dev/null && timeout 10 execsnoop 2>/dev/null | head -40 || echo "(execsnoop not available)")
--- Block I/O Latency Histogram (10s) ---
$(timeout 10 biolatency 2>/dev/null || echo "(biolatency not available)")
--- Top Block I/O by Process (10s) ---
$(command -v biotop >/dev/null && timeout 10 biotop -C 1 1 2>/dev/null | head -20 || echo "(biotop not available)")
--- TCP Sessions (10s) ---
$(command -v tcplife >/dev/null && timeout 10 tcplife 2>/dev/null | head -30 || echo "(tcplife not available)")
--- TCP Retransmissions (10s) ---
$(command -v tcpretrans >/dev/null && timeout 10 tcpretrans 2>/dev/null | head -20 || echo "(tcpretrans not available)")
--- Page Cache Stats (10s) ---
$(timeout 10 cachestat 1 5 2>/dev/null || echo "(cachestat not available)")
--- Open File Activity (10s sample) ---
$(command -v opensnoop >/dev/null && timeout 10 opensnoop 2>/dev/null | head -30 || echo "(opensnoop not available)")
--- Scheduler Run Queue Latency (10s) ---
$(timeout 10 runqlat 2>/dev/null || echo "(runqlat not available)")
--- Current Process List ---
$(ps aux --sort=-%cpu 2>/dev/null | head -15)
--- Memory ---
$(free -h 2>/dev/null)
--- Load ---
$(uptime 2>/dev/null)
TRACEDATA
)
# AI analysis
ANALYSIS=$(echo "${TRACES}
Analyze this eBPF observability data. Report:
1. I/O HOTSPOTS — processes doing the most block I/O, any latency outliers
2. NETWORK — unusual TCP connections, retransmissions, unexpected listeners
3. SECURITY — suspicious executions (/tmp, /dev/shm, encoded commands, unexpected UIDs)
4. PERFORMANCE — scheduler latency issues, page cache efficiency, CPU bottlenecks
5. RECOMMENDATIONS — specific bpftrace one-liners for deeper investigation
If you find a genuine security concern, include the literal token SECURITY-ALERT on its own line; do not use that token otherwise.
Be specific. Reference actual process names, PIDs, and values from the trace data." | \
ollama run ebpf-expert)
{
echo "=== AI eBPF OBSERVABILITY REPORT ==="
echo "=== $(hostname) — $(date) ==="
echo ""
echo "$ANALYSIS"
echo ""
echo "=== RAW TRACE DATA ==="
echo "$TRACES"
} > "$REPORT"
# Alert on security findings: grep for the explicit marker, since the word
# "security" appears in the report's own section headings on every run
if echo "$ANALYSIS" | grep -q 'SECURITY-ALERT'; then
echo "$ANALYSIS" | head -30 | logger -t kai-ebpf -p daemon.warning
fi
echo "eBPF report saved: $REPORT"
Schedule it
# Run every 6 hours for continuous observability
cat > /etc/cron.d/kai-ebpf-monitor <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
0 */6 * * * root /usr/local/bin/kai-ebpf-monitor
EOF
# Security-focused: run execsnoop analysis every hour
# (cron does not support backslash line continuation, so keep the job on one line)
cat > /etc/cron.d/kai-ebpf-security <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
0 * * * * root timeout 60 execsnoop 2>/dev/null | grep -E '/tmp/|/dev/shm|/var/tmp|base64|curl.*http|wget' | logger -t kai-ebpf-alert -p daemon.warning
EOF
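The grep pattern in the security job is a blunt instrument. The same red flags can live in a reusable shell function (classify_exec is my own name; the patterns mirror the cron job and should be tuned per fleet), which makes the rules testable and easier to extend:

```shell
# classify_exec — label one execsnoop output line as suspicious or ok.
# World-writable exec dirs and download/decode chains match the patterns above.
classify_exec() {
    case "$1" in
        */tmp/*|*/dev/shm/*|*/var/tmp/*) echo "SUSPICIOUS: exec from world-writable dir" ;;
        *base64*|*curl*http*|*wget*)     echo "SUSPICIOUS: download/decode chain" ;;
        *)                               echo "ok" ;;
    esac
}

# usage: timeout 60 execsnoop 2>/dev/null | while read -r line; do classify_exec "$line"; done
```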
# Check reports
ls -la /var/log/kai-ebpf/
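The monitor writes four reports a day, so add retention. A hedged sketch (prune_reports is my own helper; the 30-day window is arbitrary):

```shell
# prune_reports — delete reports older than 30 days from the report directory.
# -mtime +30 selects files last modified more than 30 days ago.
prune_reports() {
    local dir="${1:-/var/log/kai-ebpf}"
    find "$dir" -name '*.txt' -mtime +30 -delete 2>/dev/null
}
```

Call it from the end of kai-ebpf-monitor, or give it its own line in the cron file.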
5. Replicate to fleet via syncoid
Train the eBPF expert on one node. Push it to every server. Each node traces its own processes and network connections but uses the same analytical brain.
Fleet deployment
#!/bin/bash
# replicate-ebpf-expert.sh — push the eBPF model to all nodes
NODES="node-2 node-3 node-4 node-5"
# Snapshot the trained model
zfs snapshot rpool/srv/ollama@ebpf-expert-$(date +%F)
# Replicate to every node
for node in $NODES; do
echo "--- Syncing eBPF expert to $node ---"
syncoid --no-sync-snap rpool/srv/ollama "root@${node}:rpool/srv/ollama"
ssh "root@${node}" "systemctl restart ollama"
echo "$node: done"
done
# Deploy the kai-ebpf script and cron jobs to every node
for node in $NODES; do
scp /usr/local/bin/kai-ebpf "root@${node}:/usr/local/bin/kai-ebpf"
scp /usr/local/bin/kai-ebpf-monitor "root@${node}:/usr/local/bin/kai-ebpf-monitor"
scp /etc/cron.d/kai-ebpf-monitor "root@${node}:/etc/cron.d/kai-ebpf-monitor"
scp /etc/cron.d/kai-ebpf-security "root@${node}:/etc/cron.d/kai-ebpf-security"
ssh "root@${node}" "chmod +x /usr/local/bin/kai-ebpf /usr/local/bin/kai-ebpf-monitor"
done
echo "Fleet updated at $(date)"
eBPF turns questions into answers. "What is slow?" becomes a biolatency histogram. "Who is connecting?" becomes a tcplife table. "Is anything suspicious?" becomes an execsnoop filter. The kernel already knows everything that is happening. eBPF is how you ask it. The AI is how you ask in English instead of probe syntax.
If you can measure it, you can fix it. If you can't measure it, you are guessing.