AI for eBPF Observability — ask questions, get traces.
eBPF lets you instrument the kernel without rebooting, without modules, without risk.
But the syntax is dense. The probes are many. The one-liners are hard to remember.
This model knows them all. Ask it "what processes are doing the most disk I/O" and it gives you
a bpftrace one-liner that answers the question in seconds.
The Modelfile encodes deep knowledge of BCC tools, bpftrace syntax, tracepoints, kprobes, and uprobes. The context script runs key traces and feeds their output into every query. The AI doesn't just tell you how to observe — it observes, then tells you what it sees.
1. The eBPF Modelfile
This system prompt encodes the BCC tool inventory, bpftrace syntax, common tracepoints, and the patterns that turn raw trace output into actionable answers.
Complete eBPF expert Modelfile
# /srv/ollama/Modelfile.ebpf-expert
FROM llama3.1:8b
SYSTEM """
You are an eBPF observability expert for this Linux-based infrastructure.
You write bpftrace one-liners, interpret BCC tool output, and identify
performance bottlenecks and security anomalies from trace data.
=== BCC TOOLS REFERENCE ===
Process tracing:
execsnoop Trace new process execution (shows every exec() call)
exitsnoop Trace process exit and lifespan
pidpersec Count new processes created per second
runqlat Scheduler run queue latency histogram
runqlen Scheduler run queue length histogram
cpudist On-CPU time histogram per process
offcputime Off-CPU time (blocked/sleeping) stacks
Disk I/O:
biolatency Block I/O latency histogram
biosnoop Trace every block I/O with latency
biotop Top-like display for block I/O by process
bitesize Block I/O size histogram
ext4slower Trace slow ext4 operations (>thresh)
zfsslower Trace slow ZFS operations (>thresh)
filetop Top files by I/O
Filesystem:
opensnoop Trace every open() syscall (file opens)
statsnoop Trace stat() calls
filelife Trace file creation and deletion with age
cachestat Page cache hit/miss statistics
cachetop Per-process page cache stats
writeback Trace writeback events
vfscount Count VFS function calls
vfsstat VFS operation rates
Network:
tcplife Trace TCP sessions with duration and bytes
tcpconnect Trace active TCP connections (connect())
tcpaccept Trace passive TCP connections (accept())
tcpretrans Trace TCP retransmissions
tcpdrop Trace TCP drops with stack traces
tcprtt TCP round-trip time histogram
tcptop Top-like display of TCP throughput by process
Memory:
memleak Detect memory leaks with stack traces
oomkill Trace OOM killer invocations
shmsnoop Trace shared memory calls
drsnoop Trace direct reclaim events
System:
funccount Count kernel/user function calls
funclatency Function latency histogram
syscount Count syscalls by type
argdist Argument/return value distributions
trace Multi-purpose kernel/user tracing
hardirqs Trace hard interrupt handling time
softirqs Trace soft interrupt handling time
=== BPFTRACE SYNTAX ===
Probe types:
tracepoint:CATEGORY:NAME Stable kernel tracepoints
kprobe:FUNCTION Kernel function entry
kretprobe:FUNCTION Kernel function return
uprobe:BINARY:FUNCTION User-space function entry
uretprobe:BINARY:FUNCTION User-space function return
software:EVENT:COUNT Software events (cpu-clock, page-faults)
hardware:EVENT:COUNT Hardware PMC events
interval:s:N Timer (every N seconds)
BEGIN / END Script start/end
Built-in variables:
pid, tid, uid, comm Process ID, thread ID, user ID, command name
nsecs, elapsed Nanosecond timestamp, nanoseconds since bpftrace start
cpu Current CPU number
curtask Current task_struct pointer
retval Return value (in kretprobe/uretprobe)
args Tracepoint arguments struct
@map[key] = value BPF maps (aggregation)
Aggregation functions:
count() Count events
sum(x) Sum values
avg(x) Average
min(x), max(x) Min/max
hist(x) Power-of-2 histogram
lhist(x, min, max, step) Linear histogram
stats(x) Count, avg, total
Output functions:
printf(fmt, ...) Formatted output
print(@map) Print map
clear(@map) Clear map
time(fmt) Print timestamp
=== COMMON ONE-LINERS ===
# Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Trace file opens by process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s PID:%d %s\n", comm, pid, str(args->filename)); }'
# Block I/O latency histogram
bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
@usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
delete(@start[args->dev, args->sector]); }'
# TCP connections with destination
bpftrace -e 'kprobe:tcp_connect { @[comm, ntop(((struct sock*)arg0)->__sk_common.skc_daddr)] = count(); }'
# Process execution trace
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'
# Page faults by process
bpftrace -e 'software:page-faults:1 { @[comm] = count(); }'
# Signal delivery
bpftrace -e 'tracepoint:signal:signal_deliver { printf("PID %d (%s) received signal %d\n", pid, comm, args->sig); }'
# Read/write bytes by process
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @reads[comm] = sum(args->ret); }
tracepoint:syscalls:sys_exit_write /args->ret > 0/ { @writes[comm] = sum(args->ret); }'
=== SECURITY PATTERNS ===
Detect suspicious exec:
execsnoop — watch for unexpected binaries, /tmp execution, base64/curl/wget chains
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%d %d %s %s\n", uid, pid, comm, str(args->filename)); }'
Detect unauthorized network:
tcpconnect — outbound connections from unexpected processes
tcpaccept — inbound connections on unexpected ports
File integrity monitoring:
opensnoop | grep shadow — who is reading sensitive files (opensnoop filters by PID and flags, not by path, so post-filter with grep)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /str(args->filename) == "/etc/passwd"/ { printf("PID %d (%s) uid=%d\n", pid, comm, uid); }'
Privilege escalation:
bpftrace -e 'tracepoint:syscalls:sys_enter_setuid { printf("PID %d (%s) setuid(%d)\n", pid, comm, args->uid); }'
=== INTERPRETING OUTPUT ===
When reading trace output:
- High biolatency (>10ms) on SSDs indicates I/O contention or driver issues
- High runqlat (>10ms) means CPU oversubscription
- tcpretrans spikes indicate network congestion or packet loss
- execsnoop showing /tmp or /dev/shm execution is a security red flag
- opensnoop showing config file reads by unexpected processes is suspicious
- Page fault storms suggest memory pressure or poor locality
=== PHILOSOPHY ===
eBPF is observation, not modification. You are looking, not touching.
Always start with the question, then pick the probe.
BCC tools for quick answers. bpftrace for custom questions.
If you can measure it, you can fix it. If you can't measure it, you're guessing.
"""
PARAMETER temperature 0.3
PARAMETER num_ctx 16384
# Build the eBPF expert model
ollama create ebpf-expert -f /srv/ollama/Modelfile.ebpf-expert
# Verify it
ollama run ebpf-expert "Write a bpftrace one-liner to find which process is doing the most disk writes"
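One practical wrinkle before wiring this up: BCC tool names vary by distro. Debian and Ubuntu ship them with a -bpfcc suffix (biolatency-bpfcc), while other distros install the bare names under /usr/share/bcc/tools. A small hedged helper (the bcc_tool name is my own) resolves whichever form is present:

```shell
# bcc_tool — resolve a BCC tool name to a runnable path, whatever the distro calls it.
# Checks the bare name, the Debian/Ubuntu -bpfcc suffix, then the BCC tools directory.
bcc_tool() {
    local t="$1"
    if command -v "$t" >/dev/null 2>&1; then
        command -v "$t"
    elif command -v "${t}-bpfcc" >/dev/null 2>&1; then
        command -v "${t}-bpfcc"
    elif [ -x "/usr/share/bcc/tools/$t" ]; then
        echo "/usr/share/bcc/tools/$t"
    else
        return 1    # not installed under any known name
    fi
}

# usage: "$(bcc_tool biolatency)" 10 1
```

The scripts below call tools by their bare names; on suffix-based distros, route the calls through this helper instead.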
2. Live context script
The context script runs key eBPF traces for a few seconds and feeds the output to the AI. The model doesn't just know eBPF theory — it reads your actual trace data and tells you what it means.
The eBPF context builder
#!/bin/bash
# /usr/local/bin/kai-ebpf — query the eBPF AI with live trace context
build_ebpf_context() {
echo "=== LIVE SYSTEM STATE ($(date -Iseconds)) ==="
echo -e "\n--- CPU and load ---"
uptime 2>/dev/null
nproc 2>/dev/null
echo -e "\n--- Top processes by CPU ---"
ps aux --sort=-%cpu 2>/dev/null | head -12
echo -e "\n--- Top processes by memory ---"
ps aux --sort=-%mem 2>/dev/null | head -12
echo -e "\n--- Block I/O snapshot (3 seconds) ---"
timeout 3 biotop -C 1 1 2>/dev/null || \
timeout 3 iostat -xh 1 1 2>/dev/null | tail -20
echo -e "\n--- Recent process executions (3 seconds) ---"
command -v execsnoop >/dev/null && timeout 3 execsnoop 2>/dev/null | head -20 || echo "(execsnoop not available)"
echo -e "\n--- Active TCP connections ---"
ss -tnp 2>/dev/null | head -20
echo -e "\n--- TCP retransmissions (3 seconds) ---"
command -v tcpretrans >/dev/null && timeout 3 tcpretrans 2>/dev/null | head -10 || echo "(tcpretrans not available)"
echo -e "\n--- Page cache stats ---"
timeout 3 cachestat 1 1 2>/dev/null || echo "(cachestat not available)"
echo -e "\n--- Listening services ---"
ss -tlnp 2>/dev/null
echo -e "\n--- Kernel version ---"
uname -r 2>/dev/null
echo -e "\n--- Available BCC tools ---"
ls /usr/share/bcc/tools/ 2>/dev/null | tr '\n' ' ' || echo "(BCC not installed)"
echo -e "\n--- Available tracepoints (count) ---"
find /sys/kernel/debug/tracing/events -maxdepth 2 -name 'enable' 2>/dev/null | wc -l
}
QUESTION="$*"
if [ -z "$QUESTION" ]; then
echo "Usage: kai-ebpf <question>"
echo ""
echo "Examples:"
echo " kai-ebpf 'what processes are doing the most I/O'"
echo " kai-ebpf 'show me TCP connections with latency over 100ms'"
echo " kai-ebpf 'trace all file opens in /etc'"
echo " kai-ebpf 'detect suspicious process execution'"
echo " kai-ebpf 'why is the system slow right now'"
echo " kai-ebpf 'write me a bpftrace script to monitor DNS lookups'"
exit 1
fi
CONTEXT=$(build_ebpf_context)
# printf (not echo -e) so backslashes in the trace data pass through literally
printf '%s\n\n=== QUESTION ===\n%s\n' "$CONTEXT" "$QUESTION" | ollama run ebpf-expert
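One caveat on this pipeline: the Modelfile sets num_ctx 16384, and a busy box can emit more trace output than fits, silently pushing the question out of the window. A hedged guard (trim_context is my own helper; the roughly-4-characters-per-token figure is a heuristic, not a tokenizer) caps the context size:

```shell
# trim_context — cap the context at roughly max_chars characters so the
# question appended at the end of the prompt always fits in the window.
trim_context() {
    local max_chars="${1:-48000}"   # ~12k tokens at ~4 chars/token (rough heuristic)
    head -c "$max_chars"
}

# usage inside kai-ebpf: CONTEXT=$(build_ebpf_context | trim_context)
```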
3. Example queries
The model reads your live trace data and generates precise answers. Ask a question in English, get a bpftrace one-liner or a BCC command that answers it immediately.
"What processes are doing the most I/O?"
The AI reads the biotop output from the live context, identifies the top I/O consumers,
and if you need deeper analysis, gives you:
bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'
to count block I/O requests by process name.
kai-ebpf "what processes are doing the most disk I/O right now?"
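When the count alone isn't enough, the raw biosnoop stream can be post-filtered in plain shell. A hedged helper (slow_io is my own name; biosnoop prints latency in milliseconds in its last column, though the exact layout varies by BCC version) that surfaces only the >10 ms outliers:

```shell
# slow_io — keep only biosnoop lines whose last column (LAT(ms)) exceeds 10 ms.
# NR > 1 skips the header line; $NF+0 coerces the last field to a number.
slow_io() { awk 'NR > 1 && $NF+0 > 10 { print }'; }

# usage: biosnoop | slow_io
```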
"Show me TCP connections with latency over 100ms"
The AI explains that tcplife has no duration filter flag (-L selects local
ports), so it post-filters the MS column: tcplife | awk 'NR == 1 || $NF+0 > 100'.
For a custom threshold, it writes the bpftrace equivalent with a filter on the
duration calculation.
kai-ebpf "find all TCP connections taking longer than 100ms"
"Trace all file opens in /etc"
The AI gives you: opensnoop | grep '/etc/' (opensnoop's -p flag filters by PID, not path) or the bpftrace one-liner:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /strncmp(str(args->filename), "/etc/", 5) == 0/ { printf("%s %d %s\n", comm, pid, str(args->filename)); }'.
It explains that this catches any process reading configs, credentials, or hostname files.
kai-ebpf "trace all file opens under /etc and show me who is reading what"
"Detect suspicious process execution"
The AI generates an execsnoop filter that flags processes launched from
/tmp, /dev/shm, or /var/tmp. It adds a bpftrace
one-liner that logs UID, PID, parent PID, and the full command line for forensic analysis.
kai-ebpf "detect suspicious process execution — /tmp, encoded commands, unusual parents"
"Find unauthorized network connections"
The AI reads the live ss output and TCP connection data, cross-references with
listening services, and flags any outbound connections from processes that shouldn't be talking to the network.
It generates a tcpconnect filter for ongoing monitoring.
kai-ebpf "find any processes making network connections that shouldn't be"
"Generate a bpftrace one-liner"
Describe what you want to trace in plain English. The AI writes the bpftrace script, explains which probes it uses and why, and tells you what the output means. No manual required.
kai-ebpf "write a bpftrace script to show which files each process is writing to, sorted by bytes"
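A representative generated answer, as a sketch: the script below sums write() bytes per process using the probes and map functions from the Modelfile's syntax reference. True per-file attribution would need fd-to-path correlation via openat() on top, as noted in the comments:

```shell
# Write a small bpftrace script that answers "who is writing the most bytes".
cat > /tmp/write-bytes.bt <<'BT'
// Sum write() bytes per process; a starting point for per-file analysis.
// Mapping bytes to filenames needs openat() fd-to-path correlation on top.
BEGIN { printf("Summing write bytes per process... hit Ctrl-C to end.\n"); }

tracepoint:syscalls:sys_exit_write
/args->ret > 0/                        // successful writes only
{
    @bytes[comm] = sum(args->ret);     // aggregate per command name
}

END { print(@bytes); clear(@bytes); } // maps print sorted by value
BT

# run it (needs root): bpftrace /tmp/write-bytes.bt
```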
4. Automated observability via cron
Schedule the AI to run traces periodically, analyze the results, and flag anomalies. This is continuous observability without dashboards, agents, or external services.
eBPF observability monitor
#!/bin/bash
# /usr/local/bin/kai-ebpf-monitor — AI-driven eBPF observability report
REPORT_DIR="/var/log/kai-ebpf"
mkdir -p "$REPORT_DIR"
REPORT="$REPORT_DIR/$(date +%F-%H%M).txt"
# Run traces for 10 seconds each, collect results
TRACES=$(cat <<TRACEDATA
=== eBPF OBSERVABILITY REPORT — $(hostname) — $(date) ===
--- Process Execution (10s sample) ---
$(command -v execsnoop >/dev/null && timeout 10 execsnoop 2>/dev/null | head -40 || echo "(execsnoop not available)")
--- Block I/O Latency Histogram (10s) ---
$(timeout 10 biolatency 2>/dev/null || echo "(biolatency not available)")
--- Top Block I/O by Process (10s) ---
$(command -v biotop >/dev/null && timeout 10 biotop -C 1 1 2>/dev/null | head -20 || echo "(biotop not available)")
--- TCP Sessions (10s) ---
$(command -v tcplife >/dev/null && timeout 10 tcplife 2>/dev/null | head -30 || echo "(tcplife not available)")
--- TCP Retransmissions (10s) ---
$(command -v tcpretrans >/dev/null && timeout 10 tcpretrans 2>/dev/null | head -20 || echo "(tcpretrans not available)")
--- Page Cache Stats (10s) ---
$(timeout 10 cachestat 1 5 2>/dev/null || echo "(cachestat not available)")
--- Open File Activity (10s sample) ---
$(command -v opensnoop >/dev/null && timeout 10 opensnoop 2>/dev/null | head -30 || echo "(opensnoop not available)")
--- Scheduler Run Queue Latency (10s) ---
$(timeout 10 runqlat 2>/dev/null || echo "(runqlat not available)")
--- Current Process List ---
$(ps aux --sort=-%cpu 2>/dev/null | head -15)
--- Memory ---
$(free -h 2>/dev/null)
--- Load ---
$(uptime 2>/dev/null)
TRACEDATA
)
# AI analysis
ANALYSIS=$(echo "${TRACES}
Analyze this eBPF observability data. Report:
1. I/O HOTSPOTS — processes doing the most block I/O, any latency outliers
2. NETWORK — unusual TCP connections, retransmissions, unexpected listeners
3. SECURITY — suspicious executions (/tmp, /dev/shm, encoded commands, unexpected UIDs)
4. PERFORMANCE — scheduler latency issues, page cache efficiency, CPU bottlenecks
5. RECOMMENDATIONS — specific bpftrace one-liners for deeper investigation
If you find a genuine security concern, include the literal token SECURITY-ALERT on its own line; do not use that token otherwise.
Be specific. Reference actual process names, PIDs, and values from the trace data." | \
ollama run ebpf-expert)
{
echo "=== AI eBPF OBSERVABILITY REPORT ==="
echo "=== $(hostname) — $(date) ==="
echo ""
echo "$ANALYSIS"
echo ""
echo "=== RAW TRACE DATA ==="
echo "$TRACES"
} > "$REPORT"
# Alert on security findings: grep for the explicit marker, since the word
# "security" appears in the report's own section headings on every run
if echo "$ANALYSIS" | grep -q 'SECURITY-ALERT'; then
echo "$ANALYSIS" | head -30 | logger -t kai-ebpf -p daemon.warning
fi
echo "eBPF report saved: $REPORT"
Schedule it
# Run every 6 hours for continuous observability
cat > /etc/cron.d/kai-ebpf-monitor <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
0 */6 * * * root /usr/local/bin/kai-ebpf-monitor
EOF
# Security-focused: run execsnoop analysis every hour
# (cron does not support backslash line continuation, so keep the job on one line)
cat > /etc/cron.d/kai-ebpf-security <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
0 * * * * root timeout 60 execsnoop 2>/dev/null | grep -E '/tmp/|/dev/shm|/var/tmp|base64|curl.*http|wget' | logger -t kai-ebpf-alert -p daemon.warning
EOF
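The grep pattern in the security job is a blunt instrument. The same red flags can live in a reusable shell function (classify_exec is my own name; the patterns mirror the cron job and should be tuned per fleet), which makes the rules testable and easier to extend:

```shell
# classify_exec — label one execsnoop output line as suspicious or ok.
# World-writable exec dirs and download/decode chains match the patterns above.
classify_exec() {
    case "$1" in
        */tmp/*|*/dev/shm/*|*/var/tmp/*) echo "SUSPICIOUS: exec from world-writable dir" ;;
        *base64*|*curl*http*|*wget*)     echo "SUSPICIOUS: download/decode chain" ;;
        *)                               echo "ok" ;;
    esac
}

# usage: timeout 60 execsnoop 2>/dev/null | while read -r line; do classify_exec "$line"; done
```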
# Check reports
ls -la /var/log/kai-ebpf/
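The monitor writes four reports a day, so add retention. A hedged sketch (prune_reports is my own helper; the 30-day window is arbitrary):

```shell
# prune_reports — delete reports older than 30 days from the report directory.
# -mtime +30 selects files last modified more than 30 days ago.
prune_reports() {
    local dir="${1:-/var/log/kai-ebpf}"
    find "$dir" -name '*.txt' -mtime +30 -delete 2>/dev/null
}
```

Call it from the end of kai-ebpf-monitor, or give it its own line in the cron file.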
5. Replicate to fleet via syncoid
Train the eBPF expert on one node. Push it to every server. Each node traces its own processes and network connections but uses the same analytical brain.
Fleet deployment
#!/bin/bash
# replicate-ebpf-expert.sh — push the eBPF model to all nodes
NODES="node-2 node-3 node-4 node-5"
# Snapshot the trained model
zfs snapshot rpool/srv/ollama@ebpf-expert-$(date +%F)
# Replicate to every node
for node in $NODES; do
echo "--- Syncing eBPF expert to $node ---"
syncoid --no-sync-snap rpool/srv/ollama "root@${node}:rpool/srv/ollama"
ssh "root@${node}" "systemctl restart ollama"
echo "$node: done"
done
# Deploy the kai-ebpf script and cron jobs to every node
for node in $NODES; do
scp /usr/local/bin/kai-ebpf "root@${node}:/usr/local/bin/kai-ebpf"
scp /usr/local/bin/kai-ebpf-monitor "root@${node}:/usr/local/bin/kai-ebpf-monitor"
scp /etc/cron.d/kai-ebpf-monitor "root@${node}:/etc/cron.d/kai-ebpf-monitor"
scp /etc/cron.d/kai-ebpf-security "root@${node}:/etc/cron.d/kai-ebpf-security"
ssh "root@${node}" "chmod +x /usr/local/bin/kai-ebpf /usr/local/bin/kai-ebpf-monitor"
done
echo "Fleet updated at $(date)"
eBPF turns questions into answers. "What is slow?" becomes a biolatency histogram. "Who is connecting?" becomes a tcplife table. "Is anything suspicious?" becomes an execsnoop filter. The kernel already knows everything that is happening. eBPF is how you ask it. The AI is how you ask in English instead of probe syntax.
If you can measure it, you can fix it. If you can't measure it, you are guessing.