
Writing Custom eBPF Programs

The BCC tools and bpftrace one-liners cover 80% of use cases. The other 20% requires writing your own programs. This page teaches you the bpftrace language from scratch, walks through three real custom scripts, and shows you how to graduate to BCC Python when bpftrace is not enough.

The power: bpftrace is a domain-specific language designed for kernel tracing. It has the terseness of awk, the probing power of DTrace, and the safety of eBPF's in-kernel verifier. You write a few lines, the kernel executes them at wire speed, and you get answers that no other tool can give you. The overhead is measured in nanoseconds, not milliseconds.


The bpftrace language

Every bpftrace program has the same structure: one or more probe blocks, each with an optional filter and action.

probe /filter/ { action }

That is it. The probe fires when an event occurs. The filter decides whether to process it. The action does the work.

Your first program

bpftrace -e 'BEGIN { printf("tracing started\n"); }'

BEGIN fires once when the program starts. printf works like C. Press Ctrl+C to exit.

A real probe

bpftrace -e '
tracepoint:syscalls:sys_enter_openat {
  printf("%s opened %s\n", comm, str(args.filename));
}'

This fires every time any process calls openat(). comm is the process name. args.filename is the syscall argument. str() converts the kernel pointer to a string.


Probe types

The table below shows the eight most common probe types; each attaches to a different kind of kernel or userspace event. bpftrace also provides special probes (BEGIN and END for setup and teardown, interval and profile for timers), which the scripts later on this page use.

Probe type What it traces Example
tracepoint Static kernel tracepoints tracepoint:syscalls:sys_enter_read
kprobe Any kernel function entry kprobe:vfs_read
kretprobe Any kernel function return kretprobe:vfs_read
uprobe Userspace function entry uprobe:/usr/bin/bash:readline
uretprobe Userspace function return uretprobe:/usr/bin/bash:readline
usdt User-level static tracing usdt:/usr/sbin/mysqld:query__start
software Software events (page faults, etc.) software:page-faults:100
hardware Hardware PMC events hardware:cache-misses:1000000

tracepoint vs. kprobe

Tracepoints are stable, documented hook points in the kernel. They survive kernel upgrades. Use them when available.
Kprobes can attach to any kernel function, but function names change between kernel versions. Use them when no tracepoint exists for what you need.

Analogy: tracepoints are labeled doors with signs. Kprobes are grappling hooks — you can attach anywhere, but the building layout might change.
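Because kprobe targets can vanish between kernel versions, it pays to check that a symbol exists before a script depends on it. A hedged sketch of such a pre-flight check, using vfs_read as a stand-in for whatever function you plan to probe:

```shell
# Before relying on a kprobe, confirm the target symbol exists on this kernel.
# vfs_read is just an example; substitute your own function name.
if grep -qw vfs_read /proc/kallsyms; then
    echo "vfs_read present: safe to attach kprobe:vfs_read"
else
    echo "vfs_read missing: look for a replacement with: bpftrace -l 'kprobe:vfs_*'"
fi
```

The same check via bpftrace itself: `bpftrace -l 'kprobe:vfs_read'` prints the probe name if it is attachable and nothing otherwise.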

List available probes

# List all tracepoints
bpftrace -l 'tracepoint:*' | head -50

# List syscall tracepoints
bpftrace -l 'tracepoint:syscalls:*'

# List ZFS-related kprobes
bpftrace -l 'kprobe:zfs_*'
bpftrace -l 'kprobe:arc_*'
bpftrace -l 'kprobe:zio_*'

# List probes in a userspace binary
bpftrace -l 'uprobe:/usr/sbin/postgres:*' | head -20

Built-in variables

Variable Type Description
pid uint64 Process ID
tid uint64 Thread ID
uid uint64 User ID
gid uint64 Group ID
comm string Process name (truncated to 15 characters plus NUL)
nsecs uint64 Nanosecond timestamp
kstack string Kernel stack trace
ustack string Userspace stack trace
arg0-argN uint64 Probe arguments (kprobe/uprobe)
retval uint64 Return value (kretprobe/uretprobe)
args struct Tracepoint arguments (named fields)
curtask struct Current task_struct pointer
cpu uint32 Current CPU number

Maps and aggregations

Maps are bpftrace's data structures. They are hash tables stored in kernel memory, keyed by anything you choose. The @ prefix declares a map.

# Count syscalls by process name
bpftrace -e '
tracepoint:raw_syscalls:sys_enter {
  @[comm] = count();
}'

Press Ctrl+C and bpftrace prints the map sorted by value:

@[systemd]: 234
@[sshd]: 567
@[postgres]: 12847
@[nginx]: 34521

Aggregation functions

Function Description Example
count() Count occurrences @[comm] = count();
sum(val) Sum values @bytes[comm] = sum(args.ret);
avg(val) Average value @lat[comm] = avg($dur);
min(val) Minimum value @min_lat = min($dur);
max(val) Maximum value @max_lat = max($dur);
hist(val) Power-of-2 histogram @latency = hist($dur);
lhist(val, min, max, step) Linear histogram @size = lhist(args.ret, 0, 1024, 64);
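hist() groups values into power-of-2 buckets, which is why latency histograms show rows like [2, 4) and [8, 16). The bucketing idea can be sketched in a few lines of Python (a simplification, not bpftrace's exact implementation, which handles negative values and the [0] and [1] rows specially):

```python
def pow2_bucket(v: int) -> tuple:
    """Return the [low, high) power-of-2 bucket a non-negative value
    falls into, mirroring how hist() groups values in its output."""
    if v < 1:
        return (0, 1)  # bpftrace prints this bucket as [0]
    low = 1 << (v.bit_length() - 1)  # largest power of 2 <= v
    return (low, low * 2)

# A 3 us latency lands in the [2, 4) row; 9 us lands in [8, 16).
assert pow2_bucket(3) == (2, 4)
assert pow2_bucket(9) == (8, 16)
assert pow2_bucket(0) == (0, 1)
```

Power-of-2 buckets trade precision for compactness: a histogram spanning nanoseconds to seconds fits in about 30 rows, which is why it is the default for latency work.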

Timing pattern

The most common bpftrace pattern: save a timestamp on entry, compute duration on exit.

bpftrace -e '
kprobe:vfs_read {
  @start[tid] = nsecs;
}

kretprobe:vfs_read /@start[tid]/ {
  $duration_us = (nsecs - @start[tid]) / 1000;
  @read_latency = hist($duration_us);
  delete(@start[tid]);
}'

$duration_us is a scratch variable (local to this probe). @start is a map (persists across probes). delete() frees the entry to prevent memory leaks.

Maps are kernel memory

Every @map entry consumes kernel memory. Always delete() entries when done, especially in timing patterns. If you trace a function that fires millions of times per second and store an entry per tid, you need to clean up or the map grows unbounded. bpftrace will warn you if a map gets too large.

Analogy: maps are sticky notes on a whiteboard. Start timing = stick a note. End timing = read the note and throw it away. If you forget to throw them away, the whiteboard fills up.

Custom script: monitor all DNS queries leaving this machine

This script traces UDP packets to port 53 and extracts the DNS query name from the packet payload. Save it as dnswatch.bt.

#!/usr/bin/env bpftrace

/*
 * dnswatch.bt - Monitor all DNS queries leaving this machine
 *
 * Traces UDP sendmsg/sendto to port 53 and logs the query.
 * Works with any DNS resolver (systemd-resolved, unbound, direct queries).
 *
 * Run: bpftrace dnswatch.bt
 */

BEGIN
{
  printf("%-8s %-6s %-16s %s\n", "TIME", "PID", "COMM", "DNS QUERY");
}

/* Trace UDP sendto - catches most DNS queries */
tracepoint:syscalls:sys_enter_sendto
/args.addr != 0/
{
  /* Read the sockaddr_in structure to check for port 53 */
  $sa = (struct sockaddr_in *)args.addr;
  $port = ($sa->sin_port >> 8) | (($sa->sin_port & 0xff) << 8);  /* ntohs */

  if ($port == 53) {
    time("%H:%M:%S ");
    printf("%-6d %-16s [DNS query to %s]\n",
      pid, comm,
      ntop(AF_INET, $sa->sin_addr.s_addr));
  }
}

/* Also trace connect() to port 53 for DNS-over-TCP and DoT */
tracepoint:syscalls:sys_enter_connect
/args.uservaddr != 0/
{
  $sa = (struct sockaddr_in *)args.uservaddr;
  $port = ($sa->sin_port >> 8) | (($sa->sin_port & 0xff) << 8);

  if ($port == 53 || $port == 853) {
    time("%H:%M:%S ");
    printf("%-6d %-16s [DNS-%s connect to %s]\n",
      pid, comm,
      $port == 53 ? "TCP" : "TLS",
      ntop(AF_INET, $sa->sin_addr.s_addr));
  }
}

A coarser bpftrace alternative counts queued packets per process. Note that net:net_dev_queue fires for every outbound packet, not just DNS, so treat this as a rough traffic census rather than a DNS monitor:

# Quick one-liner: count outbound packets by process
bpftrace -e '
tracepoint:net:net_dev_queue
{
  @pkts_by_process[comm] = count();
}

interval:s:10 {
  printf("\n--- outbound packets by process (10s) ---\n");
  print(@pkts_by_process, 10);
  clear(@pkts_by_process);
}'

For a quick-and-dirty approach, BCC includes gethostlatency which traces the glibc resolver:

# Trace DNS resolution latency via glibc
gethostlatency

Output:

TIME      PID    COMM          LATms HOST
14:02:01  3245   curl           12.4 api.example.com
14:02:01  3245   curl            0.1 api.example.com   # cached
14:02:03  8821   wget           45.2 packages.debian.org
14:02:05  1234   python3         8.7 evil-c2-server.ru

Custom script: track ZFS snapshot creation and deletion

This script traces ZFS snapshot operations with timestamps, giving you an audit trail of who is creating and destroying snapshots. Save as zfs-snap-audit.bt.

#!/usr/bin/env bpftrace

/*
 * zfs-snap-audit.bt - Track ZFS snapshot create/destroy operations
 *
 * Traces the kernel-side ZFS snapshot path to catch all snapshot
 * operations regardless of which tool initiated them (zfs CLI,
 * sanoid, ksnap, or direct ioctl).
 *
 * Run: bpftrace zfs-snap-audit.bt
 */

BEGIN
{
  printf("ZFS Snapshot Auditor started. Ctrl+C to stop.\n");
  printf("%-10s %-8s %-6s %-16s %s\n",
    "TIME", "ACTION", "PID", "COMM", "DETAILS");
}

/* Trace snapshot creation */
kprobe:dsl_dataset_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot create initiated\n",
    "CREATE", pid, comm);
  @create_count = count();
}

/* Trace snapshot destruction */
kprobe:dsl_destroy_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot destroy initiated\n",
    "DESTROY", pid, comm);
  @destroy_count = count();
}

/* Trace zfs_ioc_snapshot (ioctl path - catches zfs CLI commands) */
kprobe:zfs_ioc_snapshot
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s snapshot via ioctl\n",
    "IOCTL", pid, comm);
}

/* Trace zfs_ioc_destroy_snaps */
kprobe:zfs_ioc_destroy_snaps
{
  time("%H:%M:%S  ");
  printf("%-8s %-6d %-16s destroy via ioctl\n",
    "IOCTL", pid, comm);
}

interval:s:60
{
  printf("\n--- 60s summary ---\n");
  printf("creates:  "); print(@create_count);
  printf("destroys: "); print(@destroy_count);
  clear(@create_count);
  clear(@destroy_count);
}

END
{
  printf("\nZFS Snapshot Auditor stopped.\n");
  clear(@create_count);
  clear(@destroy_count);
}

Run it and then create a snapshot in another terminal:

# Terminal 1: start the auditor
bpftrace zfs-snap-audit.bt

# Terminal 2: create and destroy a snapshot
zfs snapshot rpool/ROOT@test-audit
zfs destroy rpool/ROOT@test-audit

Output in Terminal 1:

ZFS Snapshot Auditor started. Ctrl+C to stop.
TIME       ACTION   PID    COMM             DETAILS
15:23:01   CREATE   9421   zfs              snapshot create initiated
15:23:01   IOCTL    9421   zfs              snapshot via ioctl
15:23:15   DESTROY  9434   zfs              snapshot destroy initiated
15:23:15   IOCTL    9434   zfs              destroy via ioctl

Custom script: measure WireGuard tunnel processing latency

This script traces WireGuard's encrypt, decrypt, and transmit paths to measure how long the kernel spends processing packets through the tunnel. Save as wg-latency.bt.

#!/usr/bin/env bpftrace

/*
 * wg-latency.bt - Measure WireGuard tunnel processing latency
 *
 * Traces the WireGuard send/receive paths to measure how long
 * the kernel spends encrypting, encapsulating, and transmitting
 * packets through WireGuard tunnels.
 *
 * Run: bpftrace wg-latency.bt
 */

BEGIN
{
  printf("WireGuard latency tracer started. Ctrl+C for results.\n");
}

/* Trace packet encryption (send path).
 * Note: wg_packet_encrypt_worker processes a batch of queued packets
 * per invocation, so these histograms show per-batch latency.
 * Verify the symbol names on your kernel first:
 *   bpftrace -l 'kprobe:wg_packet_*'
 */
kprobe:wg_packet_encrypt_worker
{
  @encrypt_start[tid] = nsecs;
}

kretprobe:wg_packet_encrypt_worker /@encrypt_start[tid]/
{
  $dur_us = (nsecs - @encrypt_start[tid]) / 1000;
  @encrypt_latency = hist($dur_us);
  @encrypt_count = count();
  delete(@encrypt_start[tid]);
}

/* Trace packet decryption (receive path) */
kprobe:wg_packet_decrypt_worker
{
  @decrypt_start[tid] = nsecs;
}

kretprobe:wg_packet_decrypt_worker /@decrypt_start[tid]/
{
  $dur_us = (nsecs - @decrypt_start[tid]) / 1000;
  @decrypt_latency = hist($dur_us);
  @decrypt_count = count();
  delete(@decrypt_start[tid]);
}

/* Trace the send path (queuing + transmission) */
kprobe:wg_xmit
{
  @xmit_start[tid] = nsecs;
}

kretprobe:wg_xmit /@xmit_start[tid]/
{
  $dur_us = (nsecs - @xmit_start[tid]) / 1000;
  @xmit_latency = hist($dur_us);
  delete(@xmit_start[tid]);
}

/* Trace peer handshake events */
kprobe:wg_packet_handshake_send_worker
{
  time("%H:%M:%S ");
  printf("WG handshake initiated by %s (pid=%d)\n", comm, pid);
  @handshakes = count();
}

interval:s:30
{
  printf("\n=== WireGuard tunnel stats (30s window) ===\n");
  printf("Encrypt operations: "); print(@encrypt_count);
  printf("Decrypt operations: "); print(@decrypt_count);
  printf("Handshakes: "); print(@handshakes);
  printf("\nEncrypt latency (us):\n"); print(@encrypt_latency);
  printf("\nDecrypt latency (us):\n"); print(@decrypt_latency);
  printf("\nXmit latency (us):\n"); print(@xmit_latency);

  clear(@encrypt_count);
  clear(@decrypt_count);
  clear(@handshakes);
  clear(@encrypt_latency);
  clear(@decrypt_latency);
  clear(@xmit_latency);
}

END
{
  clear(@encrypt_start);
  clear(@decrypt_start);
  clear(@xmit_start);
  clear(@encrypt_count);
  clear(@decrypt_count);
  clear(@handshakes);
}

Expected output after 30 seconds of WireGuard traffic:

=== WireGuard tunnel stats (30s window) ===
Encrypt operations: 4521
Decrypt operations: 4498
Handshakes: 0

Encrypt latency (us):
[0]                   12     |*                               |
[1]                  3891    |****************************************|
[2, 4)                567    |*****                           |
[4, 8)                 48    |                                |
[8, 16)                 3    |                                |

Decrypt latency (us):
[0]                    8     |                                |
[1]                  3712    |****************************************|
[2, 4)                689    |*******                         |
[4, 8)                 82    |                                |
[8, 16)                 7    |                                |

WireGuard encryption and decryption should complete in 1-4 microseconds on modern hardware. If you see a tail extending into milliseconds, check for CPU contention (runqlat) or interrupt coalescing issues.


BCC Python programs: when bpftrace is not enough

bpftrace is fast to write but limited in its control flow and data processing. When you need complex logic, persistent state, or integration with other Python libraries, use BCC's Python API.

bpftrace vs. BCC Python

Use bpftrace when: you need a quick answer, the logic is simple (trace + count/histogram), or you are exploring interactively.
Use BCC Python when: you need complex filtering logic, want to correlate events across multiple probes with state machines, need to write data to files/databases, or want to integrate with existing Python tools.

Analogy: bpftrace is a calculator — fast, focused, does one thing well. BCC Python is a spreadsheet — more setup, but handles complex analysis and automation.

BCC Python structure

cat > /usr/local/bin/ebpf-connection-monitor.py << 'PYEOF'
#!/usr/bin/env python3
"""
eBPF connection monitor - tracks all TCP connections and alerts on suspicious ones.
Uses BCC to attach to tcp_v4_connect and log connection details.
"""

from bcc import BPF
from time import strftime
import sys

# BPF program (C code that runs in the kernel)
bpf_program = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

struct conn_event {
    u32 pid;
    u32 uid;
    u32 daddr;
    u16 dport;
    char comm[16];
};

BPF_PERF_OUTPUT(events);

int trace_connect(struct pt_regs *ctx, struct sock *sk)
{
    struct conn_event evt = {};

    evt.pid = bpf_get_current_pid_tgid() >> 32;
    evt.uid = bpf_get_current_uid_gid() & 0xffffffff;
    bpf_get_current_comm(&evt.comm, sizeof(evt.comm));

    evt.daddr = sk->__sk_common.skc_daddr;
    evt.dport = sk->__sk_common.skc_dport;

    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

# Load and attach
b = BPF(text=bpf_program)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")

# Suspicious ports (common C2, backdoor, and mining ports)
SUSPICIOUS_PORTS = {4444, 5555, 6666, 8888, 9999, 1337, 31337, 12345}

def handle_event(cpu, data, size):
    event = b["events"].event(data)
    dport = ((event.dport & 0xff) << 8) | (event.dport >> 8)  # ntohs

    # Convert IP
    daddr = "%d.%d.%d.%d" % (
        event.daddr & 0xff,
        (event.daddr >> 8) & 0xff,
        (event.daddr >> 16) & 0xff,
        (event.daddr >> 24) & 0xff,
    )

    prefix = "ALERT" if dport in SUSPICIOUS_PORTS else "     "
    print("%s %s pid=%-6d uid=%-5d %-16s -> %s:%d" % (
        prefix,
        strftime("%H:%M:%S"),
        event.pid,
        event.uid,
        event.comm.decode('utf-8', errors='replace'),
        daddr,
        dport,
    ))

b["events"].open_perf_buffer(handle_event)

print("Tracing TCP connections. Ctrl+C to stop.")
print("%-5s %-8s %-10s %-7s %-16s    %s" % (
    "FLAG", "TIME", "PID", "UID", "COMM", "DESTINATION"))

try:
    while True:
        b.perf_buffer_poll()
except KeyboardInterrupt:
    print("\nDone.")
    sys.exit(0)
PYEOF
chmod +x /usr/local/bin/ebpf-connection-monitor.py
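The manual byte-swapping in handle_event works, but Python's standard library already provides these conversions. A sketch assuming a little-endian host (where the raw u16/u32 values read from the kernel struct appear byte-reversed):

```python
import socket
import struct

# Port 80 (0x0050 in network byte order) reads back as 0x5000 on x86.
be_port = 0x5000
assert ((be_port & 0xff) << 8) | (be_port >> 8) == 80  # the script's manual ntohs
assert socket.ntohs(be_port) == 80                      # stdlib equivalent

# 127.0.0.1, read from skc_daddr as a native u32 on a little-endian host.
be_addr = 0x0100007F
assert socket.inet_ntoa(struct.pack("<I", be_addr)) == "127.0.0.1"
```

Using socket.ntohs and socket.inet_ntoa keeps the userspace half of a BCC program free of hand-rolled bit arithmetic.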

Run it:

python3 /usr/local/bin/ebpf-connection-monitor.py

Output:

Tracing TCP connections. Ctrl+C to stop.
FLAG  TIME     PID        UID     COMM                DESTINATION
      14:30:01 pid=3245   uid=1000  curl             -> 93.184.216.34:443
      14:30:02 pid=8821   uid=0     sshd             -> 10.0.0.12:22
ALERT 14:30:03 pid=3315   uid=33    python3          -> 45.33.32.156:4444
      14:30:05 pid=1234   uid=0     dnf              -> 192.168.1.100:443

Running BCC tools

# BCC tools are Python scripts — no compilation needed
# On Debian (the bpfcc-tools package installs these):
dpkg -L bpfcc-tools | head -20

# On CentOS/RHEL:
rpm -ql bcc-tools | head -20

# Run any BCC tool directly:
python3 /usr/share/bcc/tools/execsnoop

# Or if installed to PATH:
execsnoop-bpfcc   # Debian
/usr/share/bcc/tools/execsnoop  # CentOS

Creating systemd services for persistent eBPF monitors

Any bpftrace script or BCC Python program can be wrapped in a systemd service for continuous monitoring.

bpftrace script as a service

# Save your bpftrace script
mkdir -p /usr/local/share/bpftrace
cp zfs-snap-audit.bt /usr/local/share/bpftrace/

# Create the service
cat > /etc/systemd/system/zfs-snap-audit.service << 'EOF'
[Unit]
Description=eBPF ZFS snapshot auditor
After=zfs.target

[Service]
Type=simple
ExecStart=/usr/bin/bpftrace /usr/local/share/bpftrace/zfs-snap-audit.bt
StandardOutput=append:/var/log/zfs-snap-audit.log
StandardError=append:/var/log/zfs-snap-audit.err
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now zfs-snap-audit

BCC Python program as a service

cat > /etc/systemd/system/ebpf-connmonitor.service << 'EOF'
[Unit]
Description=eBPF TCP connection monitor
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/bin/ebpf-connection-monitor.py
StandardOutput=append:/var/log/ebpf-connections.log
StandardError=append:/var/log/ebpf-connections.err
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now ebpf-connmonitor

Log rotation for all eBPF services

cat > /etc/logrotate.d/ebpf << 'EOF'
/var/log/ebpf-*.log /var/log/zfs-snap-audit.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF

Performance impact: nanoseconds, not milliseconds

eBPF programs run inside the kernel's BPF virtual machine. The verifier guarantees they terminate, do not access invalid memory, and complete in bounded time. The overhead per probe hit is typically 50-200 nanoseconds.

Operation Overhead per event Context
Tracepoint hit (no action) ~50 ns Just counting events
Tracepoint + map update ~100-150 ns Incrementing a counter
Kprobe + printf ~200-500 ns Logging each event
Kprobe + stack trace ~1-5 us Walking the call stack
BCC perf_submit ~500 ns-1 us Sending event to userspace

For comparison: a single disk I/O takes 50-10,000 microseconds. A network round-trip takes 200-100,000 microseconds. For most real workloads, eBPF overhead is lost in the noise; the main exception is probing functions that fire millions of times per second.
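To see why, turn the per-event overhead into a fraction of a CPU core with back-of-envelope arithmetic (the numbers here are illustrative, taken from the table above, not measurements):

```python
def probe_overhead_fraction(events_per_sec: float, ns_per_event: float) -> float:
    """Fraction of one CPU core consumed by a probe firing at the given rate."""
    return events_per_sec * ns_per_event / 1e9

# 100k map updates/sec at ~150 ns each: 1.5% of one core.
assert probe_overhead_fraction(100_000, 150) == 0.015

# Even 1M events/sec through a ~500 ns perf_submit costs half a core.
assert probe_overhead_fraction(1_000_000, 500) == 0.5
```

The second case is the cautionary one: per-event output to userspace (printf, perf_submit) is the expensive path, which is why high-frequency probes should aggregate in-kernel with maps and histograms instead.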

Safe by design

The BPF verifier in the kernel checks every eBPF program before it runs. It proves the program terminates (no infinite loops), accesses only valid memory, and uses bounded stack space. You cannot crash the kernel with a buggy bpftrace script. The worst that happens is the verifier rejects your program with an error message. This is a fundamental difference from kernel modules, where a bug means a kernel panic.

Analogy: writing a kernel module is like performing open-heart surgery — one mistake and the patient dies. Writing an eBPF program is like using an exercise machine with safety stops — it physically cannot let you hurt yourself.