
Writing Custom eBPF Programs

The BCC tools and bpftrace one-liners cover 80% of use cases. The other 20% requires writing your own programs. This page teaches you the bpftrace language from scratch, walks through three real custom scripts, and shows you how to graduate to BCC Python when bpftrace is not enough.

The power: bpftrace is a domain-specific language designed for kernel tracing. It has the terseness of awk, the probing power of DTrace, and the safety of eBPF's in-kernel verifier. You write a few lines, the kernel executes them at wire speed, and you get answers that no other tool can give you. The overhead is measured in nanoseconds, not milliseconds.


The bpftrace language

Every bpftrace program has the same structure: one or more probe blocks, each with an optional filter and action.

probe /filter/ { action }

That is it. The probe fires when an event occurs. The filter decides whether to process it. The action does the work.

Your first program

bpftrace -e 'BEGIN { printf("tracing started\n"); }'

BEGIN fires once when the program starts. printf works like C. Press Ctrl+C to exit.

A real probe

bpftrace -e '
tracepoint:syscalls:sys_enter_openat {
  printf("%s opened %s\n", comm, str(args.filename));
}'

This fires every time any process calls openat(). comm is the process name. args.filename is the syscall argument. str() converts the kernel pointer to a string.


Probe types

The table below shows the eight most common probe types; each attaches to a different kind of kernel or userspace event. bpftrace also provides special probes (BEGIN and END for setup and teardown, interval and profile for timers), which the scripts later on this page use.

Probe type What it traces Example
tracepoint Static kernel tracepoints tracepoint:syscalls:sys_enter_read
kprobe Any kernel function entry kprobe:vfs_read
kretprobe Any kernel function return kretprobe:vfs_read
uprobe Userspace function entry uprobe:/usr/bin/bash:readline
uretprobe Userspace function return uretprobe:/usr/bin/bash:readline
usdt User-level static tracing usdt:/usr/sbin/mysqld:query__start
software Software events (page faults, etc.) software:page-faults:100
hardware Hardware PMC events hardware:cache-misses:1000000

tracepoint vs. kprobe

Tracepoints are stable, documented hook points in the kernel. They survive kernel upgrades. Use them when available.
Kprobes can attach to any kernel function, but function names change between kernel versions. Use them when no tracepoint exists for what you need.

Analogy: tracepoints are labeled doors with signs. Kprobes are grappling hooks — you can attach anywhere, but the building layout might change.
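Because kprobe targets can vanish between kernel versions, it pays to check that a symbol exists before a script depends on it. A hedged sketch of such a pre-flight check, using vfs_read as a stand-in for whatever function you plan to probe:

```shell
# Before relying on a kprobe, confirm the target symbol exists on this kernel.
# vfs_read is just an example; substitute your own function name.
if grep -qw vfs_read /proc/kallsyms; then
    echo "vfs_read present: safe to attach kprobe:vfs_read"
else
    echo "vfs_read missing: look for a replacement with: bpftrace -l 'kprobe:vfs_*'"
fi
```

The same check via bpftrace itself: `bpftrace -l 'kprobe:vfs_read'` prints the probe name if it is attachable and nothing otherwise.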

List available probes

# List all tracepoints
bpftrace -l 'tracepoint:*' | head -50

# List syscall tracepoints
bpftrace -l 'tracepoint:syscalls:*'

# List ZFS-related kprobes
bpftrace -l 'kprobe:zfs_*'
bpftrace -l 'kprobe:arc_*'
bpftrace -l 'kprobe:zio_*'

# List probes in a userspace binary
bpftrace -l 'uprobe:/usr/sbin/postgres:*' | head -20

Built-in variables

Variable Type Description
pid uint64 Process ID
tid uint64 Thread ID
uid uint64 User ID
gid uint64 Group ID
comm string Process name (truncated to 15 characters plus NUL)
nsecs uint64 Nanosecond timestamp
kstack string Kernel stack trace
ustack string Userspace stack trace
arg0-argN uint64 Probe arguments (kprobe/uprobe)
retval uint64 Return value (kretprobe/uretprobe)
args struct Tracepoint arguments (named fields)
curtask struct Current task_struct pointer
cpu uint32 Current CPU number

Maps and aggregations

Maps are bpftrace's data structures. They are hash tables stored in kernel memory, keyed by anything you choose. The @ prefix declares a map.

# Count syscalls by process name
bpftrace -e '
tracepoint:raw_syscalls:sys_enter {
  @[comm] = count();
}'

Press Ctrl+C and bpftrace prints the map sorted by value:

@[systemd]: 234
@[sshd]: 567
@[postgres]: 12847
@[nginx]: 34521

Aggregation functions

Function Description Example
count() Count occurrences @[comm] = count();
sum(val) Sum values @bytes[comm] = sum(args.ret);
avg(val) Average value @lat[comm] = avg($dur);
min(val) Minimum value @min_lat = min($dur);
max(val) Maximum value @max_lat = max($dur);
hist(val) Power-of-2 histogram @latency = hist($dur);
lhist(val, min, max, step) Linear histogram @size = lhist(args.ret, 0, 1024, 64);
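hist() groups values into power-of-2 buckets, which is why latency histograms show rows like [2, 4) and [8, 16). The bucketing idea can be sketched in a few lines of Python (a simplification, not bpftrace's exact implementation, which handles negative values and the [0] and [1] rows specially):

```python
def pow2_bucket(v: int) -> tuple:
    """Return the [low, high) power-of-2 bucket a non-negative value
    falls into, mirroring how hist() groups values in its output."""
    if v < 1:
        return (0, 1)  # bpftrace prints this bucket as [0]
    low = 1 << (v.bit_length() - 1)  # largest power of 2 <= v
    return (low, low * 2)

# A 3 us latency lands in the [2, 4) row; 9 us lands in [8, 16).
assert pow2_bucket(3) == (2, 4)
assert pow2_bucket(9) == (8, 16)
assert pow2_bucket(0) == (0, 1)
```

Power-of-2 buckets trade precision for compactness: a histogram spanning nanoseconds to seconds fits in about 30 rows, which is why it is the default for latency work.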

Timing pattern

The most common bpftrace pattern: save a timestamp on entry, compute duration on exit.

bpftrace -e '
kprobe:vfs_read {
  @start[tid] = nsecs;
}

kretprobe:vfs_read /@start[tid]/ {
  $duration_us = (nsecs - @start[tid]) / 1000;
  @read_latency = hist($duration_us);
  delete(@start[tid]);
}'

$duration_us is a scratch variable (local to this probe). @start is a map (persists across probes). delete() frees the entry to prevent memory leaks.

Maps are kernel memory

Every @map entry consumes kernel memory. Always delete() entries when done, especially in timing patterns. If you trace a function that fires millions of times per second and store an entry per tid, you need to clean up or the map grows unbounded. bpftrace will warn you if a map gets too large.

Analogy: maps are sticky notes on a whiteboard. Start timing = stick a note. End timing = read the note and throw it away. If you forget to throw them away, the whiteboard fills up.

Custom script: monitor all DNS queries leaving this machine

This script traces UDP packets to port 53 and extracts the DNS query name from the packet payload. Save it as dnswatch.bt.

#!/usr/bin/env bpftrace

/*
 * dnswatch.bt - Monitor all DNS queries leaving this machine
 *
 * Traces UDP sendmsg/sendto to port 53 and logs the query.
 * Works with any DNS resolver (systemd-resolved, unbound, direct queries).
 *
 * Run: bpftrace dnswatch.bt
 */

BEGIN
{
  printf("%-8s %-6s %-16s %s\n", "TIME", "PID", "COMM", "DNS QUERY");
}

/* Trace UDP sendto - catches most DNS queries */
tracepoint:syscalls:sys_enter_sendto
/args.addr != 0/
{
  /* Read the sockaddr_in structure to check for port 53 */
  $sa = (struct sockaddr_in *)args.addr;
  $port = ($sa->sin_port >> 8) | (($sa->sin_port & 0xff) << 8);  /* ntohs */

  if ($port == 53) {
    time("%H:%M:%S ");
    printf("%-6d %-16s [DNS query to %s]\n",
      pid, comm,
      ntop(AF_INET, $sa->sin_addr.s_addr));
  }
}

/* Also trace connect() to port 53 for DNS-over-TCP and DoT */
tracepoint:syscalls:sys_enter_connect
/args.uservaddr != 0/
{
  $sa = (struct sockaddr_in *)args.uservaddr;
  $port = ($sa->sin_port >> 8) | (($sa->sin_port & 0xff) << 8);

  if ($port == 53 || $port == 853) {
    time("%H:%M:%S ");
    printf("%-6d %-16s [DNS-%s connect to %s]\n",
      pid, comm,
      $port == 53 ? "TCP" : "TLS",
      ntop(AF_INET, $sa->sin_addr.s_addr));
  }
}

A coarser bpftrace alternative counts queued packets per process. Note that net:net_dev_queue fires for every outbound packet, not just DNS, so treat this as a rough traffic census rather than a DNS monitor:

# Quick one-liner: count outbound packets by process
bpftrace -e '
tracepoint:net:net_dev_queue
{
  @pkts_by_process[comm] = count();
}

interval:s:10 {
  printf("\n--- outbound packets by process (10s) ---\n");
  print(@pkts_by_process, 10);
  clear(@pkts_by_process);
}'

For a quick-and-dirty approach, BCC includes gethostlatency which traces the glibc resolver:

# Trace DNS resolution latency via glibc
gethostlatency

Output:

TIME      PID    COMM          LATms HOST
14:02:01  3245   curl           12.4 api.example.com
14:02:01  3245   curl            0.1 api.example.com   # cached
14:02:03  8821   wget           45.2 packages.debian.org
14:02:05  1234   python3         8.7 evil-c2-server.ru

Custom script: track ZFS snapshot creation and deletion

This script traces ZFS snapshot operations with timestamps, giving you an audit trail of who is creating and destroying snapshots. Save as zfs-snap-audit.bt.

#!/usr/bin/env bpftrace

/*
 * zfs-snap-audit.bt - Track ZFS snapshot create/destroy operations
 *
 * Traces the kernel-side ZFS snapshot path to catch all snapshot
 * operations regardless of which tool initiated them (zfs CLI,
 * sanoid, ksnap, or direct ioctl).
 *
 * Run: bpftrace zfs-snap-audit.bt
 */

BEGIN
{
  printf("ZFS Snapshot Auditor started. Ctrl+C to stop.\n");
  printf("%-10s %-8s %-6s %-16s %s\n",
    "TIME", "ACTION", "PID", "COMM", "DETAILS");
}

/* Trace snapshot creation */
kprobe:dsl_dataset_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot create initiated\n",
    "CREATE", pid, comm);
  @create_count = count();
}

/* Trace snapshot destruction */
kprobe:dsl_destroy_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot destroy initiated\n",
    "DESTROY", pid, comm);
  @destroy_count = count();
}

/* Trace zfs_ioc_snapshot (ioctl path - catches zfs CLI commands) */
kprobe:zfs_ioc_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot via ioctl\n",
    "IOCTL", pid, comm);
}

/* Trace zfs_ioc_destroy_snaps */
kprobe:zfs_ioc_destroy_snaps
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s destroy via ioctl\n",
    "IOCTL", pid, comm);
}

interval:s:60
{
  printf("\n--- 60s summary ---\n");
  printf("creates:  "); print(@create_count);
  printf("destroys: "); print(@destroy_count);
  clear(@create_count);
  clear(@destroy_count);
}

END
{
  printf("\nZFS Snapshot Auditor stopped.\n");
  clear(@create_count);
  clear(@destroy_count);
}

Run it and then create a snapshot in another terminal:

# Terminal 1: start the auditor
bpftrace zfs-snap-audit.bt

# Terminal 2: create and destroy a snapshot
zfs snapshot rpool/ROOT@test-audit
zfs destroy rpool/ROOT@test-audit

Output in Terminal 1:

ZFS Snapshot Auditor started. Ctrl+C to stop.
TIME       ACTION   PID    COMM             DETAILS
15:23:01   CREATE   9421   zfs              snapshot create initiated
15:23:01   IOCTL    9421   zfs              snapshot via ioctl
15:23:15   DESTROY  9434   zfs              snapshot destroy initiated
15:23:15   IOCTL    9434   zfs              destroy via ioctl

Custom script: measure WireGuard tunnel processing latency

This script traces WireGuard's encrypt, decrypt, and transmit paths to measure how long the kernel spends processing packets through the tunnel. Save as wg-latency.bt.

#!/usr/bin/env bpftrace

/*
 * wg-latency.bt - Measure WireGuard tunnel processing latency
 *
 * Traces the WireGuard send/receive paths to measure how long
 * the kernel spends encrypting, encapsulating, and transmitting
 * packets through WireGuard tunnels.
 *
 * Run: bpftrace wg-latency.bt
 */

BEGIN
{
  printf("WireGuard latency tracer started. Ctrl+C for results.\n");
}

/* Trace packet encryption (send path).
 * Note: wg_packet_encrypt_worker processes a batch of queued packets
 * per invocation, so these histograms show per-batch latency.
 * Verify the symbol names on your kernel first:
 *   bpftrace -l 'kprobe:wg_packet_*'
 */
kprobe:wg_packet_encrypt_worker
{
  @encrypt_start[tid] = nsecs;
}

kretprobe:wg_packet_encrypt_worker /@encrypt_start[tid]/
{
  $dur_us = (nsecs - @encrypt_start[tid]) / 1000;
  @encrypt_latency = hist($dur_us);
  @encrypt_count = count();
  delete(@encrypt_start[tid]);
}

/* Trace packet decryption (receive path) */
kprobe:wg_packet_decrypt_worker
{
  @decrypt_start[tid] = nsecs;
}

kretprobe:wg_packet_decrypt_worker /@decrypt_start[tid]/
{
  $dur_us = (nsecs - @decrypt_start[tid]) / 1000;
  @decrypt_latency = hist($dur_us);
  @decrypt_count = count();
  delete(@decrypt_start[tid]);
}

/* Trace the send path (queuing + transmission) */
kprobe:wg_xmit
{
  @xmit_start[tid] = nsecs;
}

kretprobe:wg_xmit /@xmit_start[tid]/
{
  $dur_us = (nsecs - @xmit_start[tid]) / 1000;
  @xmit_latency = hist($dur_us);
  delete(@xmit_start[tid]);
}

/* Trace peer handshake events */
kprobe:wg_packet_handshake_send_worker
{
  time("%H:%M:%S ");
  printf("WG handshake initiated by %s (pid=%d)\n", comm, pid);
  @handshakes = count();
}

interval:s:30
{
  printf("\n=== WireGuard tunnel stats (30s window) ===\n");
  printf("Encrypt operations: "); print(@encrypt_count);
  printf("Decrypt operations: "); print(@decrypt_count);
  printf("Handshakes: "); print(@handshakes);
  printf("\nEncrypt latency (us):\n"); print(@encrypt_latency);
  printf("\nDecrypt latency (us):\n"); print(@decrypt_latency);
  printf("\nXmit latency (us):\n"); print(@xmit_latency);

  clear(@encrypt_count);
  clear(@decrypt_count);
  clear(@handshakes);
  clear(@encrypt_latency);
  clear(@decrypt_latency);
  clear(@xmit_latency);
}

END
{
  clear(@encrypt_start);
  clear(@decrypt_start);
  clear(@xmit_start);
  clear(@encrypt_count);
  clear(@decrypt_count);
  clear(@handshakes);
}

Expected output after 30 seconds of WireGuard traffic:

=== WireGuard tunnel stats (30s window) ===
Encrypt operations: 4521
Decrypt operations: 4498
Handshakes: 0

Encrypt latency (us):
[0]                   12     |*                               |
[1]                  3891    |****************************************|
[2, 4)                567    |*****                           |
[4, 8)                 48    |                                |
[8, 16)                 3    |                                |

Decrypt latency (us):
[0]                    8     |                                |
[1]                  3712    |****************************************|
[2, 4)                689    |*******                         |
[4, 8)                 82    |                                |
[8, 16)                 7    |                                |

WireGuard encryption and decryption should complete in 1-4 microseconds on modern hardware. If you see a tail extending into milliseconds, check for CPU contention (runqlat) or interrupt coalescing issues.


BCC Python programs: when bpftrace is not enough

bpftrace is fast to write but limited in its control flow and data processing. When you need complex logic, persistent state, or integration with other Python libraries, use BCC's Python API.

bpftrace vs. BCC Python

Use bpftrace when: you need a quick answer, the logic is simple (trace + count/histogram), or you are exploring interactively.
Use BCC Python when: you need complex filtering logic, want to correlate events across multiple probes with state machines, need to write data to files/databases, or want to integrate with existing Python tools.

Analogy: bpftrace is a calculator — fast, focused, does one thing well. BCC Python is a spreadsheet — more setup, but handles complex analysis and automation.

BCC Python structure

cat > /usr/local/bin/ebpf-connection-monitor.py << 'PYEOF'
#!/usr/bin/env python3
"""
eBPF connection monitor - tracks all TCP connections and alerts on suspicious ones.
Uses BCC to attach to tcp_v4_connect and log connection details.
"""

from bcc import BPF
from time import strftime
import sys

# BPF program (C code that runs in the kernel)
bpf_program = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

struct conn_event {
    u32 pid;
    u32 uid;
    u32 daddr;
    u16 dport;
    char comm[16];
};

BPF_PERF_OUTPUT(events);

int trace_connect(struct pt_regs *ctx, struct sock *sk)
{
    struct conn_event evt = {};

    evt.pid = bpf_get_current_pid_tgid() >> 32;
    evt.uid = bpf_get_current_uid_gid() & 0xffffffff;
    bpf_get_current_comm(&evt.comm, sizeof(evt.comm));

    evt.daddr = sk->__sk_common.skc_daddr;
    evt.dport = sk->__sk_common.skc_dport;

    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

# Load and attach
b = BPF(text=bpf_program)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")

# Suspicious ports (common C2, backdoor, and mining ports)
SUSPICIOUS_PORTS = {4444, 5555, 6666, 8888, 9999, 1337, 31337, 12345}

def handle_event(cpu, data, size):
    event = b["events"].event(data)
    dport = ((event.dport & 0xff) << 8) | (event.dport >> 8)  # ntohs

    # Convert IP
    daddr = "%d.%d.%d.%d" % (
        event.daddr & 0xff,
        (event.daddr >> 8) & 0xff,
        (event.daddr >> 16) & 0xff,
        (event.daddr >> 24) & 0xff,
    )

    prefix = "ALERT" if dport in SUSPICIOUS_PORTS else "     "
    print("%s %s pid=%-6d uid=%-5d %-16s -> %s:%d" % (
        prefix,
        strftime("%H:%M:%S"),
        event.pid,
        event.uid,
        event.comm.decode('utf-8', errors='replace'),
        daddr,
        dport,
    ))

b["events"].open_perf_buffer(handle_event)

print("Tracing TCP connections. Ctrl+C to stop.")
print("%-5s %-8s %-10s %-7s %-16s    %s" % (
    "FLAG", "TIME", "PID", "UID", "COMM", "DESTINATION"))

try:
    while True:
        b.perf_buffer_poll()
except KeyboardInterrupt:
    print("\nDone.")
    sys.exit(0)
PYEOF
chmod +x /usr/local/bin/ebpf-connection-monitor.py
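The manual byte-swapping in handle_event works, but Python's standard library already provides these conversions. A sketch assuming a little-endian host (where the raw u16/u32 values read from the kernel struct appear byte-reversed):

```python
import socket
import struct

# Port 80 (0x0050 in network byte order) reads back as 0x5000 on x86.
be_port = 0x5000
assert ((be_port & 0xff) << 8) | (be_port >> 8) == 80  # the script's manual ntohs
assert socket.ntohs(be_port) == 80                      # stdlib equivalent

# 127.0.0.1, read from skc_daddr as a native u32 on a little-endian host.
be_addr = 0x0100007F
assert socket.inet_ntoa(struct.pack("<I", be_addr)) == "127.0.0.1"
```

Using socket.ntohs and socket.inet_ntoa keeps the userspace half of a BCC program free of hand-rolled bit arithmetic.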

Run it:

python3 /usr/local/bin/ebpf-connection-monitor.py

Output:

Tracing TCP connections. Ctrl+C to stop.
FLAG  TIME     PID        UID     COMM                DESTINATION
      14:30:01 pid=3245   uid=1000  curl             -> 93.184.216.34:443
      14:30:02 pid=8821   uid=0     sshd             -> 10.0.0.12:22
ALERT 14:30:03 pid=3315   uid=33    python3          -> 45.33.32.156:4444
      14:30:05 pid=1234   uid=0     dnf              -> 192.168.1.100:443

Running BCC tools

# BCC tools are Python scripts — no compilation needed
# On Debian (the bpfcc-tools package installs these):
dpkg -L bpfcc-tools | head -20

# On CentOS/RHEL:
rpm -ql bcc-tools | head -20

# Run any BCC tool directly:
python3 /usr/share/bcc/tools/execsnoop

# Or if installed to PATH:
execsnoop-bpfcc   # Debian
/usr/share/bcc/tools/execsnoop  # CentOS

Creating systemd services for persistent eBPF monitors

Any bpftrace script or BCC Python program can be wrapped in a systemd service for continuous monitoring.

bpftrace script as a service

# Save your bpftrace script
mkdir -p /usr/local/share/bpftrace
cp zfs-snap-audit.bt /usr/local/share/bpftrace/

# Create the service
cat > /etc/systemd/system/zfs-snap-audit.service << 'EOF'
[Unit]
Description=eBPF ZFS snapshot auditor
After=zfs.target

[Service]
Type=simple
ExecStart=/usr/bin/bpftrace /usr/local/share/bpftrace/zfs-snap-audit.bt
StandardOutput=append:/var/log/zfs-snap-audit.log
StandardError=append:/var/log/zfs-snap-audit.err
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now zfs-snap-audit

BCC Python program as a service

cat > /etc/systemd/system/ebpf-connmonitor.service << 'EOF'
[Unit]
Description=eBPF TCP connection monitor
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/bin/ebpf-connection-monitor.py
StandardOutput=append:/var/log/ebpf-connections.log
StandardError=append:/var/log/ebpf-connections.err
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now ebpf-connmonitor

Log rotation for all eBPF services

cat > /etc/logrotate.d/ebpf << 'EOF'
/var/log/ebpf-*.log /var/log/zfs-snap-audit.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF

Performance impact: nanoseconds, not milliseconds

eBPF programs run inside the kernel's BPF virtual machine. The verifier guarantees they terminate, do not access invalid memory, and complete in bounded time. The overhead per probe hit is typically 50-200 nanoseconds.

Operation Overhead per event Context
Tracepoint hit (no action) ~50 ns Just counting events
Tracepoint + map update ~100-150 ns Incrementing a counter
Kprobe + printf ~200-500 ns Logging each event
Kprobe + stack trace ~1-5 us Walking the call stack
BCC perf_submit ~500 ns-1 us Sending event to userspace

For comparison: a single disk I/O takes 50-10,000 microseconds. A network round-trip takes 200-100,000 microseconds. For most real workloads, eBPF overhead is lost in the noise; the main exception is probing functions that fire millions of times per second.
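To see why, turn the per-event overhead into a fraction of a CPU core with back-of-envelope arithmetic (the numbers here are illustrative, taken from the table above, not measurements):

```python
def probe_overhead_fraction(events_per_sec: float, ns_per_event: float) -> float:
    """Fraction of one CPU core consumed by a probe firing at the given rate."""
    return events_per_sec * ns_per_event / 1e9

# 100k map updates/sec at ~150 ns each: 1.5% of one core.
assert probe_overhead_fraction(100_000, 150) == 0.015

# Even 1M events/sec through a ~500 ns perf_submit costs half a core.
assert probe_overhead_fraction(1_000_000, 500) == 0.5
```

The second case is the cautionary one: per-event output to userspace (printf, perf_submit) is the expensive path, which is why high-frequency probes should aggregate in-kernel with maps and histograms instead.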

Safe by design

The BPF verifier in the kernel checks every eBPF program before it runs. It proves the program terminates (no infinite loops), accesses only valid memory, and uses bounded stack space. You cannot crash the kernel with a buggy bpftrace script. The worst that happens is the verifier rejects your program with an error message. This is a fundamental difference from kernel modules, where a bug means a kernel panic.

Analogy: writing a kernel module is like performing open-heart surgery — one mistake and the patient dies. Writing an eBPF program is like using an exercise machine with safety stops — it physically cannot let you hurt yourself.