
eBPF Tracepoints & Probes — Every Observable Event in Your Stack

Linux exposes tens of thousands of instrumentation points — scheduler decisions, disk I/O, network packets, memory allocations, syscalls, even individual function calls inside the kernel and your applications. eBPF lets you attach programs to any of them without modifying source code, recompiling, or rebooting. This page is the complete field guide to finding those instrumentation points, understanding the tradeoffs between probe types, and writing probes that survive kernel upgrades.

The mental model: the Linux kernel is a building with tens of thousands of wired sensors — light switches, motion detectors, door contacts, temperature gauges. Static tracepoints are the ones the architect put in on purpose, with documented wiring diagrams. Kprobes are tap wires you splice into any random wire in the wall. Uprobes are tap wires you splice into the wiring of the appliances plugged into the building. All three feed into the same eBPF alarm panel.

I spent years staring at strace output and printk debugging before I understood that the kernel already has structured instrumentation built in. Tracepoints are not a debugging hack. They are a first-class observability API that the kernel developers maintain and version. The moment you internalize that, you stop grep-ing log files and start asking the kernel directly.

The three probe types

Every eBPF tracing program attaches to one of three probe types. The choice determines stability, performance overhead, and what you can see. Get this wrong and your production tracing breaks on the next kernel update.

Static tracepoints

Instrumentation points placed deliberately by kernel developers in the source code using the TRACE_EVENT macro. Their names and argument layouts are documented and change rarely, so probes built on them usually survive kernel upgrades. A typical kernel has roughly 2,000 of them. Examples: sched:sched_switch, block:block_rq_issue, net:netif_receive_skb.

Think of these as the labeled test points on a circuit board. The engineer put them there for you. They have names printed on the silk screen.

Kprobes (dynamic kernel probes)

Attach to any kernel function by name, at entry or return. No pre-placed instrumentation required — you pick the function, the kernel inserts a breakpoint. Incredibly powerful but fragile: kernel functions get renamed, inlined, or removed between versions. There are ~60,000 probeable functions in a typical kernel.

Think of these as alligator clips. You can clip onto any wire in the circuit board, but if the next board revision moves that wire, your clip falls off.

Uprobes (userspace probes)

Attach to functions inside userspace binaries — your applications, libraries, databases. The kernel patches the binary in memory (not on disk) to insert a breakpoint. You can trace PostgreSQL query execution, nginx request handling, or any function in any ELF binary without modifying the application.

Think of these as tapping the phone line between two offices. You can listen to any conversation without the offices knowing, but you need to know which wire to tap (symbol name + binary path).

Comparison table

| Property | Static tracepoint | Kprobe | Uprobe |
| --- | --- | --- | --- |
| Attaches to | Pre-defined kernel instrumentation point | Any kernel function | Any userspace function |
| Stable ABI | Largely: names and fields rarely change across kernel versions | No: functions are renamed or inlined freely | Depends on the app; stable if it ships USDT probes |
| Overhead when not attached | Near zero (a NOP at the call site) | Zero (no code modification until attached) | Zero (no code modification until attached) |
| Overhead when attached | Low (~100ns per hit) | Medium (~200-500ns, breakpoint trap) | High (~1-5us, trap from userspace into the kernel) |
| Available count (typical) | ~2,000 | ~60,000 | Unlimited (every ELF symbol) |
| Argument access | Structured, typed, documented | Raw registers, need BTF for types | Raw registers, need debug symbols |
| Production safe | Yes | Careful: avoid hot-path functions | Careful: high overhead on hot functions |
| Modern replacement | n/a | fentry/fexit (faster, BTF-aware) | USDT (stable, lower overhead) |

The single most important thing on this page: use static tracepoints for production monitoring. Use kprobes for debugging and investigation. Use uprobes when you need to see inside applications. If you deploy kprobes in production and they break after a kernel update, that is on you.

The tracing filesystem

Before eBPF, before bpftrace, before bcc — there was ftrace. The kernel exposes its entire tracing infrastructure through a virtual filesystem. Understanding this filesystem is how you understand what eBPF tools are doing under the hood.

Where it lives

# Kernels 4.1 and later expose tracefs here:
ls /sys/kernel/tracing/

# Older kernels (or if tracefs isn't auto-mounted):
ls /sys/kernel/debug/tracing/

# Check which one your system uses:
mount | grep tracefs

Output:

tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)

Key files in the tracing filesystem

available_events

Every static tracepoint in the kernel, one per line, in subsystem:event format. This is the master catalog of stable instrumentation points.

available_filter_functions

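Every kernel function that ftrace can hook, one per line. bpftrace -l 'kprobe:*' builds its list from this file, so treat it as the practical catalog of kprobe targets (~60,000 entries on a typical kernel). A few functions still reject kprobes; those are listed in /sys/kernel/debug/kprobes/blacklist.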

tracing_on

Write 1 to enable tracing, 0 to disable. A global kill switch for the ftrace ring buffer. Does not affect eBPF programs (they have their own output path).

trace_pipe

Streaming output of active ftrace traces. Like tail -f for the kernel. Reading it consumes the events (they don't replay). eBPF programs rarely use this — they write to perf buffers or maps instead.

# How many static tracepoints does your kernel have?
wc -l /sys/kernel/tracing/available_events
2147 /sys/kernel/tracing/available_events
# How many kprobe-able functions?
wc -l /sys/kernel/tracing/available_filter_functions
63842 /sys/kernel/tracing/available_filter_functions
# List all tracepoint categories (subsystems):
cut -d: -f1 /sys/kernel/tracing/available_events | sort -u
alarmtimer
block
bpf_test_run
bpf_trace
bridge
cgroup
clk
compaction
cpuhp
devfreq
devlink
dma_fence
drm
exceptions
ext4
fib
fib6
filelock
filemap
fs_dax
gpio
huge_memory
hwmon
i2c
initcall
intel_iommu
io_uring
iocost
iomap
ipi
irq
irq_matrix
irq_vectors
jbd2
kmem
kvm
libata
lock
mce
mdio
migrate
mmap
mmap_lock
module
mptcp
napi
neigh
net
netfs
netlink
nmi
oom
page_isolation
page_pool
pagemap
percpu
power
printk
pwm
qdisc
random
ras
raw_syscalls
rcu
regmap
regulator
resctrl
rpm
rseq
rtc
sched
scsi
signal
skb
smbus
sock
spi
sunrpc
swap
syscalls
task
tcp
thermal
timer
tlb
udp
vmscan
vsyscall
wbt
workqueue
writeback
x86_fpu
xdp
xhci-hcd

That is every subsystem with static tracepoints on this particular kernel; the exact list depends on the kernel configuration and on which modules are loaded. Every one of those categories contains multiple tracepoints, each with typed arguments you can read from eBPF.
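One way to turn that catalog into a quick overview is to count tracepoints per subsystem. The sketch below is a hypothetical helper (the name count_events_by_subsystem is mine, not a kernel or bpftrace tool). It reads subsystem:event lines on stdin, so it works on the live available_events file or on a saved copy.

```shell
# Summarise a tracepoint catalog (one "subsystem:event" per line) as
# "COUNT SUBSYSTEM", largest subsystem first.
count_events_by_subsystem() {
  awk -F: 'NF >= 2 { n[$1]++ } END { for (s in n) printf "%d %s\n", n[s], s }' |
    sort -k1,1nr -k2,2
}

# Usage on a live system (assumes tracefs is mounted at /sys/kernel/tracing):
#   count_events_by_subsystem < /sys/kernel/tracing/available_events | head
```

Reading from stdin rather than hard-coding the path keeps the helper usable on systems that still mount tracefs under /sys/kernel/debug/tracing.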

Raw ftrace before eBPF

You can use the tracing filesystem directly, without any eBPF tooling. This is useful when bpftrace is not installed or you need the absolute simplest trace possible.

# Enable the sched_switch tracepoint via raw ftrace:
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable

# Watch it:
cat /sys/kernel/tracing/trace_pipe
 kworker/0:1-28    [000] d..2  1234.567890: sched_switch: prev_comm=kworker/0:1 prev_pid=28 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=1842 next_prio=120
           bash-1842  [000] d..2  1234.567950: sched_switch: prev_comm=bash prev_pid=1842 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
# Clean up:
echo 0 > /sys/kernel/tracing/events/sched/sched_switch/enable

Why this matters

Every eBPF tracing tool is ultimately using these same kernel interfaces. When bpftrace attaches to tracepoint:sched:sched_switch, it is registering an eBPF program with the same kernel subsystem that powers the files above. Understanding the raw interface means you can debug your eBPF tools when they misbehave — and you can trace on minimal systems where bpftrace is not available.
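The enable and disable steps shown above can be wrapped in a small function. This is a minimal sketch under stated assumptions: the function name set_event_enabled and its argument order are my own, and the tracefs root is passed in explicitly so the same code works with /sys/kernel/tracing, /sys/kernel/debug/tracing, or a scratch directory.

```shell
# Toggle one ftrace event by writing 0 or 1 to its "enable" file.
# Arguments: TRACEFS_ROOT SUBSYSTEM EVENT VALUE
set_event_enabled() {
  root=$1 subsystem=$2 event=$3 value=$4
  file="$root/events/$subsystem/$event/enable"
  case $value in
    0|1) ;;
    *) echo "value must be 0 or 1" >&2; return 2 ;;
  esac
  # Refuse to create the file: a missing path means the event does not exist.
  [ -w "$file" ] || { echo "no such event or not writable: $file" >&2; return 1; }
  printf '%s\n' "$value" > "$file"
}

# Usage (as root, on a live system):
#   set_event_enabled /sys/kernel/tracing sched sched_switch 1
#   cat /sys/kernel/tracing/trace_pipe
#   set_event_enabled /sys/kernel/tracing sched sched_switch 0
```

Checking that the enable file already exists avoids silently creating a stray file when the subsystem or event name is misspelled.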


Finding tracepoints — the discovery workflow

The hardest part of eBPF tracing is not writing the program. It is finding the right thing to attach to. Here is the systematic workflow for discovering what you can trace on any system.

Step 1: List available probes by type

# All static tracepoints:
bpftrace -l 'tracepoint:*' | head -20
tracepoint:alarmtimer:alarmtimer_cancel
tracepoint:alarmtimer:alarmtimer_fired
tracepoint:alarmtimer:alarmtimer_start
tracepoint:alarmtimer:alarmtimer_suspend
tracepoint:block:block_bio_backmerge
tracepoint:block:block_bio_bounce
tracepoint:block:block_bio_complete
tracepoint:block:block_bio_frontmerge
tracepoint:block:block_bio_queue
tracepoint:block:block_bio_remap
tracepoint:block:block_dirty_buffer
tracepoint:block:block_getrq
tracepoint:block:block_io_done
tracepoint:block:block_io_start
tracepoint:block:block_plug
tracepoint:block:block_rq_complete
tracepoint:block:block_rq_error
tracepoint:block:block_rq_insert
tracepoint:block:block_rq_issue
tracepoint:block:block_rq_merge
# All kprobe-able kernel functions:
bpftrace -l 'kprobe:*' | wc -l
63842
# Search for specific functionality:
bpftrace -l 'tracepoint:*' | grep -i tcp
tracepoint:tcp:tcp_bad_csum
tracepoint:tcp:tcp_cong_state_set
tracepoint:tcp:tcp_destroy_sock
tracepoint:tcp:tcp_probe
tracepoint:tcp:tcp_rcv_space_adjust
tracepoint:tcp:tcp_receive_reset
tracepoint:tcp:tcp_retransmit_skb
tracepoint:tcp:tcp_retransmit_synack
tracepoint:tcp:tcp_send_reset
tracepoint:sock:inet_sock_set_state
# Search kprobes for a specific area:
bpftrace -l 'kprobe:*' | grep -i wireguard
kprobe:wg_allowedips_insert_v4
kprobe:wg_allowedips_insert_v6
kprobe:wg_allowedips_lookup_dst
kprobe:wg_allowedips_lookup_src
kprobe:wg_cookie_message_consume
kprobe:wg_cookie_message_create
kprobe:wg_index_hashtable_insert
kprobe:wg_index_hashtable_lookup
kprobe:wg_noise_handshake_begin_session
kprobe:wg_noise_handshake_consume_initiation
kprobe:wg_noise_handshake_consume_response
kprobe:wg_noise_handshake_create_initiation
kprobe:wg_noise_handshake_create_response
kprobe:wg_packet_decrypt_worker
kprobe:wg_packet_encrypt_worker
kprobe:wg_packet_receive
kprobe:wg_packet_send_keepalive
kprobe:wg_packet_tx_worker
kprobe:wg_socket_send_buffer_to_peer
kprobe:wg_xmit

Step 2: Get the tracepoint arguments

Once you find a tracepoint, you need to know what data it provides. Static tracepoints have structured arguments. Kprobes give you function arguments via registers.

# View the format of a static tracepoint:
cat /sys/kernel/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 316
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[16];       offset:8;       size:16;        signed:0;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[16];       offset:40;      size:16;        signed:0;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, ...

Every field is documented: name, offset, size, signedness. In bpftrace you access them as args->prev_comm, args->prev_pid, etc.

# View the format of a TCP tracepoint:
cat /sys/kernel/tracing/events/tcp/tcp_retransmit_skb/format
name: tcp_retransmit_skb
ID: 1628
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:const void * skbaddr;     offset:8;       size:8; signed:0;
        field:const void * skaddr;      offset:16;      size:8; signed:0;
        field:int state;        offset:24;      size:4; signed:1;
        field:__u16 sport;      offset:28;      size:2; signed:0;
        field:__u16 dport;      offset:30;      size:2; signed:0;
        field:__u16 family;     offset:32;      size:2; signed:0;
        field:__u8 saddr[4];    offset:34;      size:4; signed:0;
        field:__u8 daddr[4];    offset:38;      size:4; signed:0;
        field:__u8 saddr_v6[16];        offset:42;      size:16;        signed:0;
        field:__u8 daddr_v6[16];        offset:58;      size:16;        signed:0;

print fmt: "sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s", ...
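Format files like the two above are regular enough to parse with awk. The following sketch is a hypothetical helper (format_fields is my name for it) that prints the event-specific fields as tab-separated name, type, offset, and size, skipping the common_ header fields. It assumes the field:/offset:/size: layout shown above.

```shell
# Read a tracepoint "format" file on stdin and print one line per
# event-specific field: NAME <tab> TYPE <tab> OFFSET <tab> SIZE.
format_fields() {
  awk -F';' '
    /^[ \t]*field:/ {
      decl = $1
      sub(/^[ \t]*field:/, "", decl)
      # The last whitespace-separated token is the field name;
      # everything before it is the C type.
      n = split(decl, parts, " ")
      name = parts[n]
      if (name ~ /^common_/) next
      ctype = parts[1]
      for (i = 2; i < n; i++) ctype = ctype " " parts[i]
      off = $2; sub(/^[ \t]*offset:/, "", off)
      size = $3; sub(/^[ \t]*size:/, "", size)
      printf "%s\t%s\t%s\t%s\n", name, ctype, off, size
    }'
}

# Usage on a live system:
#   format_fields < /sys/kernel/tracing/events/sched/sched_switch/format
```

The names in the first column are the ones you would write after args-> in a bpftrace program.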

Step 3: Use BTF for kernel struct definitions

When tracing with kprobes, you often need to read struct fields from pointer arguments. BTF (BPF Type Format) embeds full type information in the kernel, so bpftrace and bcc can resolve struct layouts automatically.

# Check if BTF is available:
ls -la /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 5765432 Jan  1 00:00 /sys/kernel/btf/vmlinux
# Dump a specific struct definition:
bpftool btf dump file /sys/kernel/btf/vmlinux format c | grep -A 20 'struct task_struct {'
struct task_struct {
        struct thread_info thread_info;
        unsigned int __state;
        unsigned int saved_state;
        void *stack;
        refcount_t usage;
        unsigned int flags;
        unsigned int ptrace;
        int on_cpu;
        struct __call_single_node wake_entry;
        unsigned int wakee_flips;
        unsigned long wakee_flip_decay_ts;
        struct task_struct *last_wakee;
        int recent_used_cpu;
        int wake_cpu;
        int on_rq;
        int prio;
        int static_prio;
        int normal_prio;
        unsigned int rt_priority;
        ...
# Dump the sock struct (for network tracing):
bpftool btf dump file /sys/kernel/btf/vmlinux format c | grep -A 15 'struct sock {'
struct sock {
        struct sock_common __sk_common;
        struct dst_entry *sk_rx_dst;
        int sk_rx_dst_ifindex;
        u32 sk_rx_dst_cookie;
        socket_lock_t sk_lock;
        atomic_t sk_drops;
        int sk_rcvlowat;
        struct sk_buff_head sk_error_queue;
        struct sk_buff_head sk_receive_queue;
        struct sk_buff_head sk_write_queue;
        union { ... };
        unsigned long sk_flags;
        ...

The BTF revolution

Before BTF, kprobe programs needed to include kernel headers and be recompiled for each kernel version. BTF embeds the type information in the running kernel itself. Combined with CO-RE (Compile Once, Run Everywhere), a single eBPF binary can run across kernel versions, as long as each kernel ships BTF and still has the fields the program reads. bpftrace uses BTF automatically when available: you access struct fields by name and it handles the rest.
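The grep -A N approach used for the struct dumps cuts the definition off at a fixed line count. A small awk filter can print a whole top-level struct instead. This is a sketch with a hypothetical helper name (struct_members); it assumes the C dump puts each top-level definition at column 0 and closes it with a line starting with "};", which is how the output above is laid out.

```shell
# Print the body of one top-level struct from a C-format BTF dump on stdin.
# Nested anonymous structs and unions are indented, so only the closing
# "};" at column 0 ends the definition.
struct_members() {
  awk -v name="$1" '
    $0 == ("struct " name " {") { inside = 1; next }
    inside && /^};/ { exit }
    inside { print }
  '
}

# Usage on a live system (needs bpftool and a BTF-enabled kernel):
#   bpftool btf dump file /sys/kernel/btf/vmlinux format c | struct_members sock
```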


Tracepoint categories — the major subsystems

The kernel organizes tracepoints into subsystem categories. Each category covers a specific area of kernel functionality. Here are the categories you will use most, with concrete examples from each.

sched — scheduler events

The scheduler decides which process runs on which CPU, when context switches happen, and how processes are created and destroyed. These tracepoints are the foundation of performance analysis.

# List all scheduler tracepoints:
bpftrace -l 'tracepoint:sched:*'
tracepoint:sched:sched_kthread_stop
tracepoint:sched:sched_kthread_stop_ret
tracepoint:sched:sched_kthread_work_execute_end
tracepoint:sched:sched_kthread_work_execute_start
tracepoint:sched:sched_kthread_work_queue_work
tracepoint:sched:sched_migrate_task
tracepoint:sched:sched_move_numa
tracepoint:sched:sched_pi_setprio
tracepoint:sched:sched_process_exec
tracepoint:sched:sched_process_exit
tracepoint:sched:sched_process_fork
tracepoint:sched:sched_process_free
tracepoint:sched:sched_process_wait
tracepoint:sched:sched_stat_blocked
tracepoint:sched:sched_stat_iowait
tracepoint:sched:sched_stat_runtime
tracepoint:sched:sched_stat_sleep
tracepoint:sched:sched_stat_wait
tracepoint:sched:sched_switch
tracepoint:sched:sched_wait_task
tracepoint:sched:sched_wake_idle_without_ipi
tracepoint:sched:sched_wakeup
tracepoint:sched:sched_wakeup_new
tracepoint:sched:sched_waking
# Count context switches by process over 5 seconds:
bpftrace -e 'tracepoint:sched:sched_switch { @[args->next_comm] = count(); } interval:s:5 { exit(); }'
Attaching 2 probes...

@[swapper/0]: 8432
@[swapper/1]: 7891
@[kworker/0:1]: 1234
@[bash]: 456
@[postgres]: 312
@[nginx]: 287
@[containerd]: 198
@[sshd]: 87
# Track process creation chain (fork + exec):
bpftrace -e '
tracepoint:sched:sched_process_fork {
  printf("FORK: %s[%d] -> child[%d]\n", args->parent_comm, args->parent_pid, args->child_pid);
}
tracepoint:sched:sched_process_exec {
  printf("EXEC: %s[%d] -> %s\n", comm, pid, str(args->filename));
}'
Attaching 2 probes...
FORK: bash[1842] -> child[4521]
EXEC: bash[4521] -> /usr/bin/ls
FORK: sshd[1200] -> child[4522]
EXEC: sshd[4522] -> /usr/sbin/sshd
FORK: sshd[4522] -> child[4523]
EXEC: sshd[4523] -> /bin/bash
# Measure CPU time per process (scheduler runtime accounting):
bpftrace -e '
tracepoint:sched:sched_stat_runtime {
  @cpu_ns[args->comm, args->pid] = sum(args->runtime);
}
interval:s:5 { exit(); }'
Attaching 2 probes...

@cpu_ns[postgres, 2341]: 487234567
@cpu_ns[nginx, 1567]: 234567890
@cpu_ns[python3, 3456]: 198765432
@cpu_ns[kworker/0:1, 28]: 45678901
@cpu_ns[bash, 1842]: 12345678

block — I/O events

Every disk I/O request passes through the block layer. These tracepoints let you see every read and write, measure latency, and identify which processes are hammering your disks.

# List block tracepoints:
bpftrace -l 'tracepoint:block:*'
tracepoint:block:block_bio_backmerge
tracepoint:block:block_bio_bounce
tracepoint:block:block_bio_complete
tracepoint:block:block_bio_frontmerge
tracepoint:block:block_bio_queue
tracepoint:block:block_bio_remap
tracepoint:block:block_dirty_buffer
tracepoint:block:block_getrq
tracepoint:block:block_io_done
tracepoint:block:block_io_start
tracepoint:block:block_plug
tracepoint:block:block_rq_complete
tracepoint:block:block_rq_error
tracepoint:block:block_rq_insert
tracepoint:block:block_rq_issue
tracepoint:block:block_rq_merge
tracepoint:block:block_rq_remap
tracepoint:block:block_rq_requeue
tracepoint:block:block_split
tracepoint:block:block_touch_buffer
tracepoint:block:block_unplug
# I/O latency histogram for all block devices:
bpftrace -e '
tracepoint:block:block_rq_issue {
  @start[args->dev, args->sector] = nsecs;
}
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
  @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
  delete(@start[args->dev, args->sector]);
}
interval:s:10 { exit(); }'
Attaching 3 probes...

@usecs:
[0]                    2 |                                                    |
[1]                   14 |@@                                                  |
[2, 4)                87 |@@@@@@@@@@@@@                                       |
[4, 8)               312 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
[8, 16)              341 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)             198 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[32, 64)              67 |@@@@@@@@@@                                          |
[64, 128)             23 |@@@                                                 |
[128, 256)             8 |@                                                   |
[256, 512)             2 |                                                    |
# Top I/O consumers by process and operation type:
bpftrace -e '
tracepoint:block:block_rq_issue {
  @io_bytes[comm, args->rwbs] = sum(args->bytes);
  @io_count[comm, args->rwbs] = count();
}
interval:s:10 { exit(); }'
Attaching 2 probes...

@io_bytes[postgres, R]: 145678336
@io_bytes[z_wr_iss, W]: 67891234
@io_bytes[txg_sync, W]: 34567890
@io_bytes[nginx, R]: 12345678
@io_bytes[systemd-journal, W]: 2345678

@io_count[postgres, R]: 4521
@io_count[z_wr_iss, W]: 1234
@io_count[txg_sync, W]: 567
@io_count[nginx, R]: 312
@io_count[systemd-journal, W]: 89
Notice those z_wr_iss and txg_sync processes in the I/O output? Those are ZFS kernel threads. z_wr_iss issues write I/O for ZFS, and txg_sync flushes transaction groups to disk. On a ZFS system, most of your disk writes come from these threads, not from the application that generated the data. Block tracepoints show you the real I/O path.
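The latency histogram earlier in this section groups values into power-of-two buckets. To make the bucket labels concrete, here is a small sketch that computes the label a given value would fall under. The function name hist_bucket is hypothetical, it handles non-negative integers only, and it prints plain numbers where bpftrace abbreviates large bounds (for example 1K).

```shell
# Print the power-of-two bucket label for a non-negative integer,
# using the same labels as the histogram output: [0], [1], [2, 4), [4, 8), ...
hist_bucket() {
  v=$1
  if [ "$v" -lt 2 ]; then
    printf '[%d]\n' "$v"
    return
  fi
  lo=2
  # Double the lower bound until the next bound would exceed the value.
  while [ $((lo * 2)) -le "$v" ]; do
    lo=$((lo * 2))
  done
  printf '[%d, %d)\n' "$lo" $((lo * 2))
}

# Example: a 13 microsecond request lands in the [8, 16) row.
#   hist_bucket 13
```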

net — networking events

Network tracepoints cover packet transmission, reception, and protocol-level events. Combined with the TCP and UDP subsystem tracepoints, you can trace a packet from wire to application.

# Key networking tracepoints:
bpftrace -l 'tracepoint:net:*'
bpftrace -l 'tracepoint:tcp:*'
bpftrace -l 'tracepoint:udp:*'
bpftrace -l 'tracepoint:sock:*'
tracepoint:net:napi_gro_frags_entry
tracepoint:net:napi_gro_frags_exit
tracepoint:net:napi_gro_receive_entry
tracepoint:net:napi_gro_receive_exit
tracepoint:net:net_dev_queue
tracepoint:net:net_dev_start_xmit
tracepoint:net:net_dev_xmit
tracepoint:net:net_dev_xmit_timeout
tracepoint:net:netif_receive_skb
tracepoint:net:netif_receive_skb_entry
tracepoint:net:netif_receive_skb_exit
tracepoint:net:netif_receive_skb_list_entry
tracepoint:net:netif_receive_skb_list_exit
tracepoint:net:netif_rx
tracepoint:net:netif_rx_entry
tracepoint:net:netif_rx_exit
tracepoint:tcp:tcp_bad_csum
tracepoint:tcp:tcp_cong_state_set
tracepoint:tcp:tcp_destroy_sock
tracepoint:tcp:tcp_probe
tracepoint:tcp:tcp_rcv_space_adjust
tracepoint:tcp:tcp_receive_reset
tracepoint:tcp:tcp_retransmit_skb
tracepoint:tcp:tcp_retransmit_synack
tracepoint:tcp:tcp_send_reset
tracepoint:udp:udp_fail_queue_rcv_skb
tracepoint:sock:inet_sock_set_state
# Track TCP retransmissions with source/dest:
bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb {
  printf("RETRANSMIT: %s:%d -> %s:%d state=%d\n",
    ntop(args->saddr), args->sport,
    ntop(args->daddr), args->dport,
    args->state);
}'
Attaching 1 probe...
RETRANSMIT: 10.0.1.5:45678 -> 10.0.2.10:443 state=1
RETRANSMIT: 10.0.1.5:45678 -> 10.0.2.10:443 state=1
RETRANSMIT: 10.0.1.5:52341 -> 172.16.0.3:5432 state=1
# Track TCP connection state changes (full lifecycle):
bpftrace -e '
tracepoint:sock:inet_sock_set_state {
  if (args->protocol == IPPROTO_TCP) {
    printf("TCP %s:%d -> %s:%d  %d -> %d  (%s)\n",
      ntop(args->saddr), args->sport,
      ntop(args->daddr), args->dport,
      args->oldstate, args->newstate,
      comm);
  }
}'
Attaching 1 probe...
TCP 10.0.1.5:0 -> 10.0.2.10:443  7 -> 2  (curl)
TCP 10.0.1.5:45892 -> 10.0.2.10:443  2 -> 1  (curl)
TCP 10.0.1.5:45892 -> 10.0.2.10:443  1 -> 4  (curl)
TCP 10.0.1.5:45892 -> 10.0.2.10:443  4 -> 5  (curl)
TCP 10.0.1.5:45892 -> 10.0.2.10:443  5 -> 7  (curl)

State numbers: 7=CLOSE, 2=SYN_SENT, 1=ESTABLISHED, 4=FIN_WAIT1, 5=FIN_WAIT2. That output is the lifecycle of a TCP connection that curl opened and then closed first (an active close): CLOSE, SYN_SENT, ESTABLISHED, FIN_WAIT1, FIN_WAIT2, CLOSE. State 8 (CLOSE_WAIT) shows up instead on the side that receives the peer's FIN first.
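The numeric states printed by inet_sock_set_state can be decoded with a small lookup. The sketch below is a hypothetical helper (tcp_state_name is my name for it); the numbering follows the kernel's TCP state enum, where 1 is ESTABLISHED and 12 is NEW_SYN_RECV.

```shell
# Map a kernel TCP state number to its name.
tcp_state_name() {
  case $1 in
    1) echo ESTABLISHED ;;
    2) echo SYN_SENT ;;
    3) echo SYN_RECV ;;
    4) echo FIN_WAIT1 ;;
    5) echo FIN_WAIT2 ;;
    6) echo TIME_WAIT ;;
    7) echo CLOSE ;;
    8) echo CLOSE_WAIT ;;
    9) echo LAST_ACK ;;
    10) echo LISTEN ;;
    11) echo CLOSING ;;
    12) echo NEW_SYN_RECV ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

# Example: decode the first transition in the sample output.
#   echo "$(tcp_state_name 7) -> $(tcp_state_name 2)"
```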

# Bytes and packets sent, grouped by network interface:
bpftrace -e '
tracepoint:net:net_dev_xmit {
  @bytes_out[str(args->name)] = sum(args->len);
  @pkts_out[str(args->name)] = count();
}
interval:s:10 { exit(); }'
Attaching 2 probes...

@bytes_out[eth0]: 45678234
@bytes_out[wg0]: 12345678
@bytes_out[lo]: 2345678

@pkts_out[eth0]: 34521
@pkts_out[wg0]: 9876
@pkts_out[lo]: 1234

syscalls — every system call is traceable

The syscalls subsystem provides enter and exit tracepoints for every system call. This is the most comprehensive tracing surface in the kernel — every interaction between userspace and the kernel passes through a syscall.

# How many syscall tracepoints?
bpftrace -l 'tracepoint:syscalls:*' | wc -l
734
# That's 367 syscalls x 2 (enter + exit):
bpftrace -l 'tracepoint:syscalls:sys_enter_*' | head -20
tracepoint:syscalls:sys_enter_accept
tracepoint:syscalls:sys_enter_accept4
tracepoint:syscalls:sys_enter_access
tracepoint:syscalls:sys_enter_acct
tracepoint:syscalls:sys_enter_add_key
tracepoint:syscalls:sys_enter_adjtimex
tracepoint:syscalls:sys_enter_alarm
tracepoint:syscalls:sys_enter_arch_prctl
tracepoint:syscalls:sys_enter_bind
tracepoint:syscalls:sys_enter_bpf
tracepoint:syscalls:sys_enter_brk
tracepoint:syscalls:sys_enter_capget
tracepoint:syscalls:sys_enter_capset
tracepoint:syscalls:sys_enter_chdir
tracepoint:syscalls:sys_enter_chmod
tracepoint:syscalls:sys_enter_chown
tracepoint:syscalls:sys_enter_chroot
tracepoint:syscalls:sys_enter_clock_adjtime
tracepoint:syscalls:sys_enter_clock_getres
tracepoint:syscalls:sys_enter_clock_gettime
# Count syscalls by name for a specific process (attaches one probe per syscall, so startup is slow):
bpftrace -e '
tracepoint:syscalls:sys_enter_* /pid == 1842/ {
  @syscalls[probe] = count();
}
interval:s:5 { exit(); }'
# Cheaper on busy systems: a single raw tracepoint, keyed by syscall number (x86_64 numbers shown):
bpftrace -e '
tracepoint:raw_syscalls:sys_enter /comm == "nginx"/ {
  @[args->id] = count();
}
interval:s:10 { exit(); }'
Attaching 2 probes...

@[0]: 12456    // read
@[1]: 11234    // write
@[7]: 8901     // poll
@[232]: 4567   // epoll_wait
@[45]: 2345    // recvfrom
@[44]: 2341    // sendto
@[3]: 1234     // close
@[257]: 567    // openat
@[9]: 234      // mmap
@[47]: 123     // recvmsg
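The raw tracepoint reports syscall numbers, not names, so the name annotations have to be added afterwards. One possible post-processing step is sketched below. The helper name annotate_syscalls is hypothetical, and its table deliberately covers only the x86_64 numbers that appear in the sample output; anything else is marked "?" rather than guessed.

```shell
# Append a syscall name to bpftrace map lines of the form "@[N]: COUNT".
# Lines that do not look like map entries pass through unchanged.
annotate_syscalls() {
  awk '
    BEGIN {
      name[0] = "read";      name[1] = "write";    name[3] = "close"
      name[7] = "poll";      name[9] = "mmap";     name[44] = "sendto"
      name[45] = "recvfrom"; name[47] = "recvmsg"; name[232] = "epoll_wait"
      name[257] = "openat"
    }
    /^@\[[0-9]+\]:/ {
      id = $1
      gsub(/[^0-9]/, "", id)   # keep only the digits of "@[N]:"
      printf "%s %s  %s\n", $1, $2, (id in name) ? name[id] : "?"
      next
    }
    { print }
  '
}

# Usage: pipe saved bpftrace output through the filter.
#   annotate_syscalls < syscall_counts.txt
```

Syscall numbers differ between architectures, so a table like this one is only valid for the architecture it was written for.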

Other important categories

writeback — page cache flushing

Traces dirty page writeback to disk. Critical for understanding I/O spikes caused by the page cache flushing. Key events: writeback_dirty_page, writeback_pages_written, writeback_start, writeback_written. When your system freezes for 30 seconds during heavy writes, these tracepoints tell you exactly what the writeback thread is doing.

compaction — memory compaction

Memory compaction moves pages around to create contiguous blocks for huge pages. On high-memory systems this can cause latency spikes. Key events: mm_compaction_begin, mm_compaction_end, mm_compaction_migratepages. If your database has random latency spikes, check compaction before blaming the disk.

kmem — kernel memory allocation

Traces kmalloc, kfree, page allocations, and slab cache operations. Useful for finding kernel memory leaks and understanding memory pressure. Key events: kmalloc, kfree, mm_page_alloc, mm_page_free.

signal — process signals

Traces signal delivery: which process sent which signal to which target. Key events: signal_generate, signal_deliver. Catches OOM kills (SIGKILL from the OOM killer), segfaults (SIGSEGV), and processes being terminated by other processes.

timer — kernel timers

Traces timer creation, expiration, and cancellation. Key events: timer_start, timer_expire_entry, timer_expire_exit, hrtimer_start, hrtimer_expire_entry. Useful for debugging timer storms or understanding why a process wakes up on a specific schedule.

irq — interrupt handling

Traces hardware and software interrupt entry/exit. Key events: irq_handler_entry, irq_handler_exit, softirq_entry, softirq_exit. When you see high %si (soft interrupt) in top, these tracepoints tell you exactly which soft IRQ is consuming CPU.

workqueue — deferred work

Traces kernel work items queued for deferred execution. Key events: workqueue_queue_work, workqueue_execute_start, workqueue_execute_end. The kernel uses workqueues extensively — deferred network processing, filesystem operations, device driver work. If kworker threads are consuming CPU, these tracepoints show you exactly what work they are running.

vmscan — memory reclaim

Traces page reclaim (when the kernel needs to free memory). Key events: mm_vmscan_direct_reclaim_begin, mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive. Direct reclaim means a process had to stop and wait for the kernel to free memory before its allocation could succeed — a major latency source.

ZFS internals via kprobes

ZFS is an out-of-tree module, and typical distribution builds expose no static kernel tracepoints for it. But since it loads as a kernel module, every non-inlined function in it is kprobe-able, not only the exported ones. This gives you deep visibility into ZFS internals.

# Find all ZFS functions available for kprobes:
bpftrace -l 'kprobe:*' | grep -c '^kprobe:zfs_'
487
# Find ZIO (ZFS I/O pipeline) functions:
bpftrace -l 'kprobe:*' | grep '^kprobe:zio_'
kprobe:zio_alloc_zil
kprobe:zio_assess
kprobe:zio_buf_alloc
kprobe:zio_buf_free
kprobe:zio_checksum_generate
kprobe:zio_checksum_verify
kprobe:zio_child_stage
kprobe:zio_claim
kprobe:zio_close
kprobe:zio_compress_data
kprobe:zio_create
kprobe:zio_data_buf_alloc
kprobe:zio_data_buf_free
kprobe:zio_decompress_data
kprobe:zio_done
kprobe:zio_execute
kprobe:zio_flush
kprobe:zio_free
kprobe:zio_gang_tree_assemble
kprobe:zio_interrupt
kprobe:zio_nowait
kprobe:zio_read
kprobe:zio_read_phys
kprobe:zio_resume
kprobe:zio_rewrite
kprobe:zio_root
kprobe:zio_shrink
kprobe:zio_suspend
kprobe:zio_taskq_dispatch
kprobe:zio_trim
kprobe:zio_unique_parent
kprobe:zio_vdev_child_io
kprobe:zio_vdev_io_assess
kprobe:zio_vdev_io_done
kprobe:zio_vdev_io_start
kprobe:zio_wait
kprobe:zio_write
kprobe:zio_write_phys
# Find ARC (Adaptive Replacement Cache) functions:
bpftrace -l 'kprobe:*' | grep '^kprobe:arc_'
kprobe:arc_access
kprobe:arc_adapt
kprobe:arc_buf_access
kprobe:arc_buf_add_ref
kprobe:arc_buf_alloc
kprobe:arc_buf_destroy
kprobe:arc_buf_fill
kprobe:arc_buf_free
kprobe:arc_buf_info
kprobe:arc_buf_size
kprobe:arc_change_state
kprobe:arc_evict
kprobe:arc_evict_hdr
kprobe:arc_getbuf_func
kprobe:arc_hdr_alloc
kprobe:arc_hdr_destroy
kprobe:arc_hdr_realloc
kprobe:arc_init
kprobe:arc_loan_buf
kprobe:arc_loan_raw_buf
kprobe:arc_read
kprobe:arc_read_done
kprobe:arc_release
kprobe:arc_return_buf
kprobe:arc_write
kprobe:arc_write_done
# Count ARC reads, read completions, and evictions:
bpftrace -e '
kprobe:arc_read {
  @arc_reads = count();
}
kprobe:arc_read_done {
  @arc_read_completions = count();
}
kprobe:arc_evict {
  @arc_evictions = count();
}
interval:s:30 { exit(); }'
Attaching 4 probes...

@arc_reads: 45678
@arc_read_completions: 45678
@arc_evictions: 234
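# Trace TXG (transaction group) sync timing. spa_sync runs once per TXG sync,
# so it provides both the count and the duration (txg_sync_thread is entered
# only once per pool, when the thread starts):
bpftrace -e '
kprobe:spa_sync {
  @txg_syncs = count();
  @start[tid] = nsecs;
}
kretprobe:spa_sync /@start[tid]/ {
  @spa_sync_ms = hist((nsecs - @start[tid]) / 1000000);
  delete(@start[tid]);
}
interval:s:60 { exit(); }'
Attaching 3 probes...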

@txg_syncs: 12
@spa_sync_ms:
[0]                    1 |@@@@@@@@                                            |
[1]                    3 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
[2, 4)                 5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
[4, 8)                 2 |@@@@@@@@@@@@@@@@                                    |
[8, 16)                1 |@@@@@@@@                                            |
ZFS kprobes are the most powerful ZFS debugging tool nobody talks about. You can trace the entire I/O pipeline, from zio_create through compression, checksum, and vdev dispatch. You can watch the ARC evict buffers under memory pressure. You can measure exactly how long each TXG sync takes and correlate it with application latency. The ZFS SPL (Solaris Porting Layer) functions are kprobe-able too: taskq_dispatch, spl_kmem_cache_alloc, all of it.

Writing stable probes for production

Production tracing has different requirements than debugging. You need probes that survive kernel updates, have predictable overhead, and do not break at 3am.

Rule 1: Tracepoints over kprobes

Static tracepoints are the closest thing the kernel has to a stable tracing interface. Kernel developers avoid changing their names and argument formats because tools depend on them, although this is a strong convention rather than a formal guarantee. Kprobes attach to internal function names that can change without notice. If a tracepoint exists for what you need, always prefer it.

Rule 2: fentry/fexit over kprobe/kretprobe

When you must probe kernel functions (no tracepoint available), use fentry/fexit instead of kprobe/kretprobe. fentry/fexit attach through a BPF trampoline at the function's ftrace patch site instead of a breakpoint trap. They are 2-5x faster, support BTF-typed arguments natively, and avoid the stack depth issues that plague kretprobes. Older bpftrace releases spell these probe types kfunc and kretfunc.

# kprobe (old way — breakpoint trap, untyped args):
bpftrace -e 'kprobe:tcp_sendmsg { @bytes = sum(arg2); }'

# fentry (modern way — ftrace hook, BTF-typed args):
bpftrace -e 'fentry:tcp_sendmsg { @bytes = sum(args->size); }'

Both do the same thing — count bytes sent through TCP. But the fentry version is faster (no breakpoint trap), the argument is named (args->size vs arg2), and the type is known at attach time.

| Property | kprobe/kretprobe | fentry/fexit |
|---|---|---|
| Mechanism | INT3 breakpoint trap | ftrace function hook (NOP patching) |
| Overhead | ~200-500ns per hit | ~50-100ns per hit |
| Argument access | Raw registers (arg0, arg1, ...) | BTF-typed structs (args->field) |
| Return value access | kretprobe + retval | fexit has both args and retval |
| Stack depth limit | Yes: kretprobe uses a limited stack | No: uses the function's own stack frame |
| Kernel requirement | Any kernel with eBPF | 5.5+ with BTF |
| Nesting safe | No: recursion can deadlock | Yes: handled by ftrace framework |
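The choice between the two probe families can be made mechanically before a script is deployed. The helper below is a minimal sketch, not a standard tool: the name pick_probe and its optional path arguments are hypothetical, and it assumes a bpftrace new enough to accept the fentry: prefix (older releases call it kfunc:).

```shell
#!/bin/sh
# pick_probe FUNC [BTF_FILE] [FUNCS_FILE]   (hypothetical helper)
# Prints "fentry:FUNC" when BTF is present, "kprobe:FUNC" otherwise,
# or "none" (exit status 1) when the function is not traceable at all.
# The optional paths default to the usual sysfs/tracefs locations and exist
# only so the helper can be exercised against fixture files.
pick_probe() {
  func=$1
  btf=${2:-/sys/kernel/btf/vmlinux}
  funcs=${3:-/sys/kernel/tracing/available_filter_functions}

  # Lines look like "tcp_sendmsg" or "zio_read [zfs]"; compare the first field.
  if ! awk -v f="$func" '$1 == f { found = 1 } END { exit !found }' "$funcs" 2>/dev/null; then
    echo "none"
    return 1
  fi

  if [ -r "$btf" ]; then
    echo "fentry:$func"
  else
    echo "kprobe:$func"
  fi
}
```

A possible use is bpftrace -e "$(pick_probe tcp_sendmsg) { @calls = count(); }", which keeps one script working on hosts with and without BTF as long as the probe body avoids typed arguments.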

Why kprobes break

Here are the most common ways kprobes fail after a kernel update, and what to do about it.

# Illustrative scenario. This kprobe worked on kernel 5.15:
bpftrace -e 'kprobe:__tcp_transmit_skb { @sends = count(); }'

# After a kernel update it can fail, for example because the function was renamed:
# ERROR: kprobe:__tcp_transmit_skb not found

# Check what the function is called now:
bpftrace -l 'kprobe:*' | grep tcp_transmit
# kprobe:tcp_transmit_skb    (no double underscore)

# Or it might have been inlined entirely:
bpftrace -l 'kprobe:*' | grep tcp_transmit
# (nothing: the compiler inlined it into the caller)

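This is why static tracepoints such as tracepoint:tcp:tcp_retransmit_skb exist. That one covers retransmissions rather than every transmit, but it exposes the socket and address details under a name that does not depend on how the TCP code happens to be organized internally.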
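A cheap guard against this failure mode is to check every probe a script uses against the target host's probe list before deploying. The function below is a sketch under stated assumptions: missing_probes is a hypothetical name, and the inventory file is assumed to hold the saved output of bpftrace -l from the target machine.

```shell
#!/bin/sh
# missing_probes INVENTORY PROBE...   (hypothetical helper)
# INVENTORY is a file holding the output of `bpftrace -l` from the target host.
# Prints each requested probe that is absent from the inventory.
# Exit status is 1 if any probe is missing, 0 otherwise.
missing_probes() {
  inventory=$1
  shift
  status=0
  for probe in "$@"; do
    # -x: whole-line match, -F: fixed string, so "." and "*" are not patterns.
    if ! grep -qxF -- "$probe" "$inventory"; then
      echo "$probe"
      status=1
    fi
  done
  return "$status"
}
```

Run it in CI or in a deploy hook with the probe names your scripts attach to; a non-zero exit status means at least one of them will fail to attach on that kernel.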


Tracepoint arguments and struct access

Static tracepoint arguments are accessed via the args pointer in bpftrace. Each field is named and typed. Kprobe arguments require knowing the function signature and accessing registers or BTF-typed parameters.

Static tracepoint argument access

# The sched_switch tracepoint provides these fields:
# args->prev_comm, args->prev_pid, args->prev_prio, args->prev_state
# args->next_comm, args->next_pid, args->next_prio

bpftrace -e '
tracepoint:sched:sched_switch {
  printf("CPU%-2d  %s[%d] prio=%d -> %s[%d] prio=%d\n",
    cpu,
    args->prev_comm, args->prev_pid, args->prev_prio,
    args->next_comm, args->next_pid, args->next_prio);
}'
Attaching 1 probe...
CPU0   nginx[1567] prio=120 -> postgres[2341] prio=120
CPU0   postgres[2341] prio=120 -> kworker/0:1[28] prio=120
CPU1   bash[1842] prio=120 -> swapper/1[0] prio=120
CPU0   kworker/0:1[28] prio=120 -> nginx[1567] prio=120
CPU1   swapper/1[0] prio=120 -> sshd[4523] prio=120

Kprobe argument access with BTF

# Without BTF (raw registers):
bpftrace -e 'kprobe:tcp_sendmsg {
  printf("pid=%d bytes=%d\n", pid, arg2);
}'

# With BTF (typed args via fentry; requires kernel 5.5+ with BTF):
bpftrace -e 'fentry:tcp_sendmsg {
  printf("pid=%d bytes=%d\n", pid, args->size);
}'

Reading kernel memory safely

When a tracepoint or kprobe hands you a raw pointer, an eBPF program generally cannot dereference it the way ordinary C code would. Reads go through helper functions that fail safely on a bad address, and bpftrace inserts those helpers for you.

# Reading a string from a pointer (filename in openat is a user-space pointer):
bpftrace -e '
tracepoint:syscalls:sys_enter_openat {
  printf("%s opened %s\n", comm, str(args->filename));
}'
Attaching 1 probe...
nginx opened /var/log/nginx/access.log
postgres opened /var/lib/pgsql/data/base/16384/2619
bash opened /etc/profile
systemd opened /proc/1/cgroup
# Reading struct fields from a pointer (task_struct):
bpftrace -e '
kprobe:wake_up_new_task {
  $task = (struct task_struct *)arg0;
  printf("New task: comm=%s pid=%d tgid=%d ppid=%d\n",
    $task->comm, $task->pid, $task->tgid, $task->real_parent->pid);
}'
Attaching 1 probe...
New task: comm=ls pid=4592 tgid=4592 ppid=1842
New task: comm=grep pid=4593 tgid=4593 ppid=1842
New task: comm=worker pid=4594 tgid=4594 ppid=1567

bpf_probe_read vs direct access

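In raw C eBPF programs attached to kprobes or classic tracepoints, you must call bpf_probe_read_kernel() to read kernel memory. bpftrace handles this automatically: when you access $task->comm, it generates the bpf_probe_read_kernel call for you. bcc (Python+C) rewrites most pointer dereferences into probe-read calls as well, though complex expressions sometimes need an explicit call. In libbpf (C), CO-RE with BTF handles it via BPF_CORE_READ() macros. For these program types the verifier rejects a direct dereference of an arbitrary kernel pointer; BTF-typed programs such as fentry/fexit are the exception and may read struct fields directly.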


Uprobe deep dive — tracing userspace applications

Uprobes attach to functions in userspace binaries. The kernel patches the first instruction of the target function with a breakpoint (in memory only, not on disk). When execution hits that address, the kernel fires your eBPF program, then resumes the original function. You can trace any ELF binary without modifying or restarting it.

PostgreSQL query tracing

# Find the PostgreSQL binary:
which postgres
/usr/bin/postgres
# List available symbols in the PostgreSQL binary:
nm -D /usr/bin/postgres | grep -i exec | head -20
00000000004a2b40 T ExecutorEnd
00000000004a2a80 T ExecutorFinish
00000000004a28c0 T ExecutorRun
00000000004a2710 T ExecutorStart
0000000000572340 T exec_bind_message
0000000000571e80 T exec_describe_portal_message
0000000000571b40 T exec_describe_statement_message
0000000000570820 T exec_execute_message
00000000005713a0 T exec_parse_message
0000000000570240 T exec_simple_query
# Trace every SQL query execution:
bpftrace -e '
uprobe:/usr/bin/postgres:exec_simple_query {
  printf("QUERY [pid=%d]: %s\n", pid, str(arg0));
}'
Attaching 1 probe...
QUERY [pid=2341]: SELECT * FROM users WHERE id = 42
QUERY [pid=2342]: INSERT INTO events (type, ts) VALUES ('login', now())
QUERY [pid=2341]: UPDATE sessions SET last_seen = now() WHERE user_id = 42
QUERY [pid=2343]: SELECT count(*) FROM events WHERE ts > now() - interval '1 hour'
# Measure query execution time:
bpftrace -e '
uprobe:/usr/bin/postgres:exec_simple_query {
  @start[tid] = nsecs;
  @query[tid] = str(arg0);
}
uretprobe:/usr/bin/postgres:exec_simple_query /@start[tid]/ {
  $dur_ms = (nsecs - @start[tid]) / 1000000;
  printf("QUERY [%dms]: %s\n", $dur_ms, @query[tid]);
  @latency_ms = hist($dur_ms);
  delete(@start[tid]);
  delete(@query[tid]);
}'
Attaching 2 probes...
QUERY [0ms]: SELECT 1
QUERY [2ms]: SELECT * FROM users WHERE id = 42
QUERY [145ms]: SELECT count(*) FROM events WHERE ts > now() - interval '1 hour'
QUERY [1203ms]: VACUUM ANALYZE users

@latency_ms:
[0]                   45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                   23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
[2, 4)                12 |@@@@@@@@@@@@@@                                      |
[4, 8)                 8 |@@@@@@@@@                                           |
[8, 16)                5 |@@@@@@                                              |
[16, 32)               3 |@@@                                                 |
[32, 64)               2 |@@                                                  |
[64, 128)              1 |@                                                   |
[128, 256)             1 |@                                                   |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               1 |@                                                   |

nginx request tracing

# Find probeable nginx functions:
nm -D /usr/sbin/nginx | grep -i 'http.*request' | head -10
0000000000468a20 T ngx_http_close_request
0000000000467b40 T ngx_http_create_request
0000000000468120 T ngx_http_finalize_request
00000000004688c0 T ngx_http_free_request
0000000000467540 T ngx_http_process_request
0000000000467960 T ngx_http_process_request_headers
0000000000467340 T ngx_http_process_request_line
# Count HTTP requests per second:
bpftrace -e '
uprobe:/usr/sbin/nginx:ngx_http_process_request {
  @requests = count();
}
interval:s:1 {
  printf("requests/sec: %d\n", @requests);
  clear(@requests);
}'
Attaching 2 probes...
requests/sec: 1234
requests/sec: 1456
requests/sec: 1389
requests/sec: 1401

Go program tracing

Go binaries are statically linked with full symbol tables by default. This makes them excellent uprobe targets — every function is visible. But Go's ABI is unusual: arguments are passed on the stack (not in registers) in older Go versions, and the function names include the full package path.

# List Go symbols in a binary:
nm /usr/local/bin/myapp | grep 'main\.' | head -10
000000000048a220 T main.handleRequest
000000000048a440 T main.processJob
000000000048a680 T main.connectDB
000000000048a8c0 T main.main
000000000048ab00 T main.init
# Trace Go function calls:
bpftrace -e '
uprobe:/usr/local/bin/myapp:main.handleRequest {
  @requests = count();
}
uprobe:/usr/local/bin/myapp:main.connectDB {
  @db_connects = count();
}'

Go uprobe gotchas

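Go 1.17+ uses a register-based calling convention on amd64, but not the System V one that bpftrace's arg0, arg1 assume. Integer and pointer arguments arrive in RAX, RBX, RCX, RDI, RSI, R8 and so on, so read them with reg("ax"), reg("bx"), reg("cx") rather than arg0, arg1, arg2. Older Go (pre-1.17) passes arguments on the stack, so you need sarg0, sarg1 instead. Avoid uretprobes on Go binaries: the runtime moves and resizes goroutine stacks, and the return-address patching that uretprobes rely on can crash the process. Go's goroutine scheduler also moves goroutines between OS threads, so tid is unreliable for tracking request duration. Use goroutine IDs from the runtime.g struct if you need per-request tracking.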
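One detail worth making concrete: Go's register ABI on amd64 assigns integer and pointer arguments to RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11 in that order, which is not the order bpftrace's argN builtins read. The sketch below maps an argument index to the matching bpftrace reg() expression. The go_arg name is hypothetical, and it only covers word-sized arguments; a Go string occupies two consecutive slots (pointer, then length).

```shell
#!/bin/sh
# go_arg N   (hypothetical helper)
# Prints the bpftrace expression for the Nth (0-based) integer or pointer
# argument of a Go function built with the amd64 register ABI (Go 1.17+).
# Arguments beyond the ninth register are passed on the stack.
go_arg() {
  case "$1" in
    0) reg=ax ;;
    1) reg=bx ;;
    2) reg=cx ;;
    3) reg=di ;;
    4) reg=si ;;
    5) reg=r8 ;;
    6) reg=r9 ;;
    7) reg=r10 ;;
    8) reg=r11 ;;
    *) echo "go_arg: argument $1 is not passed in a register" >&2
       return 1 ;;
  esac
  printf 'reg("%s")\n' "$reg"
}
```

For example, a probe body that prints the first argument of main.handleRequest could be built as printf("%d\n", $(go_arg 0)) inside a double-quoted bpftrace program string.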

Python tracing

# Python has USDT probes (if CPython was built with --with-dtrace):
bpftrace -l 'usdt:/usr/bin/python3:*'
usdt:/usr/bin/python3:python:audit
usdt:/usr/bin/python3:python:function__entry
usdt:/usr/bin/python3:python:function__return
usdt:/usr/bin/python3:python:gc__done
usdt:/usr/bin/python3:python:gc__start
usdt:/usr/bin/python3:python:import__find__load__done
usdt:/usr/bin/python3:python:import__find__load__start
usdt:/usr/bin/python3:python:line
# Trace Python function calls:
bpftrace -e '
usdt:/usr/bin/python3:python:function__entry {
  printf("CALL %s:%s:%d\n", str(arg0), str(arg1), arg2);
}'
Attaching 1 probe...
CALL /app/server.py:handle_request:45
CALL /app/server.py:validate_input:67
CALL /app/db.py:execute_query:23
CALL /app/db.py:fetch_results:89
CALL /app/server.py:format_response:102

USDT — User Statically Defined Tracing

USDT probes are instrumentation points that application developers embed in their source code, similar to how kernel developers create static tracepoints. They provide a stable tracing API that survives application updates. Many major applications and runtimes define USDT probes, though often only when built with the relevant configure option: PostgreSQL, MySQL (through 5.7), Node.js, Python, Ruby, Java (via the JVM), and many more.

USDT vs raw uprobes

A raw uprobe attaches to a function name. If the developer renames the function or the compiler inlines it, your probe breaks. USDT probes are explicitly placed by the developer with stable names and documented arguments. They are the userspace equivalent of static tracepoints. An inactive USDT probe is a single NOP instruction, so the cost is effectively zero until you attach.

Uprobes are like tapping a phone wire by physical location. USDT is like the phone having a dedicated monitoring port that the manufacturer installed for you.
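USDT probes are recorded in the binary as ELF notes, so they can be listed even on a machine without bpftrace. The filter below is a small sketch that turns the output of readelf -n into provider:name pairs. The usdt_probes name is hypothetical, and the parsing assumes the Provider:/Name: layout that GNU readelf prints for stapsdt notes.

```shell
#!/bin/sh
# usdt_probes   (hypothetical helper)
# Reads `readelf -n BINARY` output on stdin and prints one "provider:name"
# line per USDT probe found in the stapsdt notes.
usdt_probes() {
  awk '
    $1 == "Provider:" { provider = $2 }
    $1 == "Name:"     { print provider ":" $2 }
  '
}
```

Typical use would be readelf -n /usr/bin/postgres | usdt_probes | sort -u; an empty result suggests the binary was built without probe support.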

Listing USDT probes

# List all USDT probes in a binary:
bpftrace -l 'usdt:/usr/bin/postgres:*'
usdt:/usr/bin/postgres:postgresql:buffer__checkpoint__done
usdt:/usr/bin/postgres:postgresql:buffer__checkpoint__start
usdt:/usr/bin/postgres:postgresql:buffer__checkpoint__sync__start
usdt:/usr/bin/postgres:postgresql:buffer__flush__done
usdt:/usr/bin/postgres:postgresql:buffer__flush__start
usdt:/usr/bin/postgres:postgresql:buffer__read__done
usdt:/usr/bin/postgres:postgresql:buffer__read__start
usdt:/usr/bin/postgres:postgresql:buffer__sync__done
usdt:/usr/bin/postgres:postgresql:buffer__sync__start
usdt:/usr/bin/postgres:postgresql:buffer__sync__written
usdt:/usr/bin/postgres:postgresql:checkpoint__done
usdt:/usr/bin/postgres:postgresql:checkpoint__start
usdt:/usr/bin/postgres:postgresql:clog__checkpoint__done
usdt:/usr/bin/postgres:postgresql:clog__checkpoint__start
usdt:/usr/bin/postgres:postgresql:deadlock__found
usdt:/usr/bin/postgres:postgresql:lock__wait__done
usdt:/usr/bin/postgres:postgresql:lock__wait__start
usdt:/usr/bin/postgres:postgresql:lwlock__acquire
usdt:/usr/bin/postgres:postgresql:lwlock__condacquire
usdt:/usr/bin/postgres:postgresql:lwlock__condacquire__fail
usdt:/usr/bin/postgres:postgresql:lwlock__release
usdt:/usr/bin/postgres:postgresql:lwlock__wait__done
usdt:/usr/bin/postgres:postgresql:lwlock__wait__start
usdt:/usr/bin/postgres:postgresql:query__done
usdt:/usr/bin/postgres:postgresql:query__execute__done
usdt:/usr/bin/postgres:postgresql:query__execute__start
usdt:/usr/bin/postgres:postgresql:query__parse__done
usdt:/usr/bin/postgres:postgresql:query__parse__start
usdt:/usr/bin/postgres:postgresql:query__plan__done
usdt:/usr/bin/postgres:postgresql:query__plan__start
usdt:/usr/bin/postgres:postgresql:query__rewrite__done
usdt:/usr/bin/postgres:postgresql:query__rewrite__start
usdt:/usr/bin/postgres:postgresql:query__start
usdt:/usr/bin/postgres:postgresql:sort__done
usdt:/usr/bin/postgres:postgresql:sort__start
usdt:/usr/bin/postgres:postgresql:statement__status
usdt:/usr/bin/postgres:postgresql:transaction__abort
usdt:/usr/bin/postgres:postgresql:transaction__commit
usdt:/usr/bin/postgres:postgresql:transaction__start
usdt:/usr/bin/postgres:postgresql:wal__buffer__write__dirty__done
usdt:/usr/bin/postgres:postgresql:wal__buffer__write__dirty__start
usdt:/usr/bin/postgres:postgresql:wal__insert
usdt:/usr/bin/postgres:postgresql:wal__switch

That is 40+ stable instrumentation points in PostgreSQL alone. You can trace query parsing, planning, execution, buffer I/O, WAL operations, checkpoint progress, lock contention, and transaction lifecycle — all without modifying PostgreSQL or attaching a debugger.

MySQL USDT probes

# List MySQL USDT probes (MySQL 5.7 and earlier built with DTrace support; the probes were removed in 8.0):
bpftrace -l 'usdt:/usr/sbin/mysqld:*' | head -20
usdt:/usr/sbin/mysqld:mysql:command__done
usdt:/usr/sbin/mysqld:mysql:command__start
usdt:/usr/sbin/mysqld:mysql:connection__done
usdt:/usr/sbin/mysqld:mysql:connection__start
usdt:/usr/sbin/mysqld:mysql:filesort__done
usdt:/usr/sbin/mysqld:mysql:filesort__start
usdt:/usr/sbin/mysqld:mysql:handler__rdlock__done
usdt:/usr/sbin/mysqld:mysql:handler__rdlock__start
usdt:/usr/sbin/mysqld:mysql:handler__wrlock__done
usdt:/usr/sbin/mysqld:mysql:handler__wrlock__start
usdt:/usr/sbin/mysqld:mysql:net__read__done
usdt:/usr/sbin/mysqld:mysql:net__read__start
usdt:/usr/sbin/mysqld:mysql:net__write__done
usdt:/usr/sbin/mysqld:mysql:net__write__start
usdt:/usr/sbin/mysqld:mysql:query__done
usdt:/usr/sbin/mysqld:mysql:query__exec__done
usdt:/usr/sbin/mysqld:mysql:query__exec__start
usdt:/usr/sbin/mysqld:mysql:query__parse__done
usdt:/usr/sbin/mysqld:mysql:query__parse__start
usdt:/usr/sbin/mysqld:mysql:query__start
# Trace MySQL query execution with timing:
bpftrace -e '
usdt:/usr/sbin/mysqld:mysql:query__start {
  @start[tid] = nsecs;
  @query[tid] = str(arg0);
}
usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ {
  $dur_ms = (nsecs - @start[tid]) / 1000000;
  printf("[%dms] %s\n", $dur_ms, @query[tid]);
  @latency = hist($dur_ms);
  delete(@start[tid]);
  delete(@query[tid]);
}'
Attaching 2 probes...
[0ms] SELECT 1
[1ms] SELECT * FROM products WHERE id = 7823
[3ms] INSERT INTO orders (user_id, product_id, qty) VALUES (42, 7823, 1)
[87ms] SELECT o.*, p.name FROM orders o JOIN products p ON o.product_id = p.id WHERE o.user_id = 42 ORDER BY o.created_at DESC LIMIT 50
[2341ms] ALTER TABLE events ADD INDEX idx_created_at (created_at)

Node.js USDT probes

# Node.js built with --with-dtrace:
bpftrace -l 'usdt:/usr/bin/node:*'
usdt:/usr/bin/node:node:gc__done
usdt:/usr/bin/node:node:gc__start
usdt:/usr/bin/node:node:http__client__request
usdt:/usr/bin/node:node:http__client__response
usdt:/usr/bin/node:node:http__server__request
usdt:/usr/bin/node:node:http__server__response
usdt:/usr/bin/node:node:net__server__connection
usdt:/usr/bin/node:node:net__stream__end
# Trace Node.js HTTP server requests:
bpftrace -e '
usdt:/usr/bin/node:node:http__server__request {
  printf("HTTP %s %s from %s:%d\n", str(arg4), str(arg5), str(arg2), arg3);
}
usdt:/usr/bin/node:node:http__server__response {
  printf("HTTP response sent\n");
}'

Java JVM USDT probes

# The JVM ships with extensive USDT probes:
bpftrace -l 'usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:*' | head -20
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:class__loaded
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:class__unloaded
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:compiled__method__load
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:compiled__method__unload
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:gc__begin
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:gc__end
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:mem__pool__gc__begin
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:mem__pool__gc__end
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:method__compile__begin
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:method__compile__end
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:method__entry
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:method__return
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:monitor__contended__enter
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:monitor__contended__entered
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:monitor__contended__exit
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:monitor__wait
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:object__alloc
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:thread__start
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:thread__stop
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:vm__init__begin
# Trace JVM garbage collection:
bpftrace -e '
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:gc__begin {
  @gc_start[tid] = nsecs;
  @gc_count = count();
}
usdt:/usr/lib/jvm/java-17/lib/server/libjvm.so:hotspot:gc__end /@gc_start[tid]/ {
  $dur_ms = (nsecs - @gc_start[tid]) / 1000000;
  printf("GC pause: %dms\n", $dur_ms);
  @gc_latency_ms = hist($dur_ms);
  delete(@gc_start[tid]);
}'
Attaching 2 probes...
GC pause: 12ms
GC pause: 8ms
GC pause: 234ms
GC pause: 11ms
GC pause: 9ms
GC pause: 1456ms

@gc_latency_ms:
[0]                    0 |                                                    |
[1]                    0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)                4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)               0 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |@@@@@@@@@@@@@                                       |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               1 |@@@@@@@@@@@@@                                       |
USDT is criminally underused. Many major databases, web servers, and runtimes ship with dozens of probes already compiled in. Most people reach for application-level logging or metrics libraries when the instrumentation is already there, baked into the binary, waiting for someone to attach. Near-zero overhead until you connect. Zero code changes required. Zero restarts. Check what your application ships with before you add another logging framework.

Real examples with full output

Trace every syscall from a specific process with timing

# Trace all syscalls from PID 2341 (postgres) with nanosecond timing:
bpftrace -e '
tracepoint:raw_syscalls:sys_enter /pid == 2341/ {
  @start[tid] = nsecs;
  @sc[tid] = args->id;
}
tracepoint:raw_syscalls:sys_exit /pid == 2341 && @start[tid]/ {
  $dur = nsecs - @start[tid];
  @syscall_ns[@sc[tid]] = hist($dur);
  @total_ns[@sc[tid]] = sum($dur);
  @count[@sc[tid]] = count();
  delete(@start[tid]);
  delete(@sc[tid]);
}
interval:s:10 { exit(); }'
Attaching 3 probes...

@count[0]: 8945       // read
@count[1]: 7823       // write
@count[232]: 4521     // epoll_wait
@count[17]: 2341      // pread64
@count[18]: 1234      // pwrite64
@count[3]: 567        // close
@count[257]: 234      // openat

@total_ns[0]: 234567890     // read: 234ms total
@total_ns[232]: 189012345   // epoll_wait: 189ms total (mostly sleeping)
@total_ns[17]: 45678901     // pread64: 45ms total
@total_ns[1]: 12345678      // write: 12ms total

@syscall_ns[0]:  // read latency distribution
[256, 512)        123 |@@@@                                                  |
[512, 1K)         456 |@@@@@@@@@@@@@@@@                                      |
[1K, 2K)         1234 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[2K, 4K)         1456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |
[4K, 8K)          987 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
[8K, 16K)         345 |@@@@@@@@@@@@                                          |
[16K, 32K)         89 |@@@                                                   |
[32K, 64K)         23 |                                                      |
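The maps above are keyed by raw syscall number, so the names have to be looked up separately. The lookup below is a sketch that reads the __NR_ definitions from a kernel UAPI header. The syscall_name function is hypothetical, and the default header path is an assumption that fits x86_64 Fedora-style layouts; Debian-family systems usually keep the file under /usr/include/x86_64-linux-gnu/asm/.

```shell
#!/bin/sh
# syscall_name NR [HEADER]   (hypothetical helper)
# Prints the syscall name for number NR by scanning "#define __NR_name NR"
# lines in HEADER, or "unknown(NR)" when no definition matches.
syscall_name() {
  nr=$1
  header=${2:-/usr/include/asm/unistd_64.h}
  awk -v nr="$nr" '
    $1 == "#define" && $2 ~ /^__NR_/ && $3 == nr {
      sub(/^__NR_/, "", $2)
      print $2
      found = 1
      exit
    }
    END { if (!found) print "unknown(" nr ")" }
  ' "$header"
}
```

Piping the map keys from a bpftrace run through this function turns @count[257] into a readable openat label without hard-coding a table that differs per architecture.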

Trace ZFS ARC hits/misses via kprobe

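# Track ARC read outcomes in real time.
# Caveat: arc_read() returns 0 on success and an errno on failure, so the
# retval split below separates successful reads from failed ones. It is not a
# true cache hit/miss signal; the authoritative hit and miss counters live in
# /proc/spl/kstat/zfs/arcstats.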
bpftrace -e '
kprobe:arc_read {
  @arc_total = count();
}
kretprobe:arc_read /retval == 0/ {
  @arc_hits = count();
}
kretprobe:arc_read /retval != 0/ {
  @arc_misses = count();
}
interval:s:5 {
  printf("ARC: total=%d hits=%d misses=%d\n",
    @arc_total, @arc_hits, @arc_misses);
}'
Attaching 4 probes...
ARC: total=12456 hits=11987 misses=469
ARC: total=24891 hits=24012 misses=879
ARC: total=37234 hits=35987 misses=1247
ARC: total=49567 hits=47901 misses=1666
# Count ARC eviction activity (eviction passes, state transitions, header teardown):
bpftrace -e '
kprobe:arc_evict {
  @evict_calls = count();
}
kprobe:arc_change_state {
  @state_changes = count();
}
kprobe:arc_hdr_destroy {
  @headers_destroyed = count();
}
interval:s:30 { exit(); }'
Attaching 4 probes...

@evict_calls: 45
@state_changes: 23456
@headers_destroyed: 234
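For an actual hit ratio, the ARC's own counters are the simplest source. The function below is a sketch that reads the hits and misses rows from the arcstats kstat file and prints the ratio as a percentage. The arc_hit_ratio name is hypothetical, and the optional file argument exists only so it can be tested against a saved copy.

```shell
#!/bin/sh
# arc_hit_ratio [ARCSTATS_FILE]   (hypothetical helper)
# Prints the ARC hit ratio as a percentage with one decimal place,
# or "n/a" when no reads have been recorded yet.
# arcstats rows have the form: name  type  value
arc_hit_ratio() {
  awk '
    $1 == "hits"   { hits = $3 }
    $1 == "misses" { misses = $3 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", hits * 100 / total
    }
  ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

The counters are cumulative since module load, so sample the file twice and subtract if you want the ratio for a specific window.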

Trace TCP state changes for connection lifecycle analysis

# Full TCP connection lifecycle with timing:
bpftrace -e '
tracepoint:sock:inet_sock_set_state {
  if (args->protocol == IPPROTO_TCP) {
    $now = nsecs;

    if (args->newstate == 1) {  // ESTABLISHED
      @established[args->sport, args->dport] = $now;
    }

    if (args->oldstate == 1 && @established[args->sport, args->dport]) {
      $dur_sec = ($now - @established[args->sport, args->dport]) / 1000000000;
      printf("CONNECTION CLOSED: %s:%d -> %s:%d  duration=%ds  (%s)\n",
        ntop(args->saddr), args->sport,
        ntop(args->daddr), args->dport,
        $dur_sec, comm);
      @conn_duration_sec = hist($dur_sec);
      delete(@established[args->sport, args->dport]);
    }
  }
}'
Attaching 1 probe...
CONNECTION CLOSED: 10.0.1.5:45892 -> 10.0.2.10:443 duration=2s (curl)
CONNECTION CLOSED: 10.0.1.5:45893 -> 10.0.2.10:443 duration=0s (curl)
CONNECTION CLOSED: 10.0.1.5:52341 -> 172.16.0.3:5432 duration=3600s (postgres)
CONNECTION CLOSED: 10.0.1.5:38901 -> 10.0.2.10:80 duration=45s (nginx)

@conn_duration_sec:
[0]                   23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                   12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
[2, 4)                 8 |@@@@@@@@@@@@@@@@@@                                  |
[4, 8)                 3 |@@@@@@@                                             |
[8, 16)                2 |@@@@                                                |
[16, 32)               1 |@@                                                  |
[32, 64)               2 |@@@@                                                |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               1 |@@                                                  |
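The script above compares states against the bare number 1. When you log oldstate and newstate directly, a name lookup makes the output much easier to read. The table below follows the TCP state numbering in the kernel's tcp_states.h; the tcp_state_name function itself is a hypothetical helper for post-processing.

```shell
#!/bin/sh
# tcp_state_name N   (hypothetical helper)
# Maps a kernel TCP state number, as reported in the oldstate/newstate
# fields of sock:inet_sock_set_state, to its symbolic name.
tcp_state_name() {
  case "$1" in
    1)  echo ESTABLISHED ;;
    2)  echo SYN_SENT ;;
    3)  echo SYN_RECV ;;
    4)  echo FIN_WAIT1 ;;
    5)  echo FIN_WAIT2 ;;
    6)  echo TIME_WAIT ;;
    7)  echo CLOSE ;;
    8)  echo CLOSE_WAIT ;;
    9)  echo LAST_ACK ;;
    10) echo LISTEN ;;
    11) echo CLOSING ;;
    12) echo NEW_SYN_RECV ;;
    *)  echo "UNKNOWN($1)" ;;
  esac
}
```

An active close shows up as 1 then 4 (ESTABLISHED to FIN_WAIT1), while a passive close shows up as 1 then 8 (ESTABLISHED to CLOSE_WAIT), which tells you which side hung up first.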

Trace disk I/O per ZFS dataset

# Pool-wide ZFS read/write totals. zio_read and zio_write carry the I/O size as
# an argument (positions below match recent OpenZFS; check your version's zio.h):
bpftrace -e '
kprobe:zio_read {
  @zfs_reads = count();
  @zfs_read_bytes = sum(arg4);   // uint64_t size
}
kprobe:zio_write {
  @zfs_writes = count();
  @zfs_write_bytes = sum(arg5);  // uint64_t lsize
}
interval:s:10 {
  printf("--- 10 second summary ---\n");
  printf("reads:  %d ops, %d bytes\n", @zfs_reads, @zfs_read_bytes);
  printf("writes: %d ops, %d bytes\n", @zfs_writes, @zfs_write_bytes);
  clear(@zfs_reads); clear(@zfs_read_bytes);
  clear(@zfs_writes); clear(@zfs_write_bytes);
}'
Attaching 3 probes...
--- 10 second summary ---
reads:  4521 ops, 589234176 bytes
writes: 1234 ops, 167890432 bytes
--- 10 second summary ---
reads:  3987 ops, 512345088 bytes
writes: 2345 ops, 345678912 bytes
# For a per-dataset breakdown, group DMU calls by their objset pointer.
# Each mounted dataset has its own objset_t, passed as the first argument:
bpftrace -e '
kprobe:dmu_read {
  @dmu_reads[arg0] = count();   // arg0 is the objset_t pointer (arg1 is the object number)
}
kprobe:dmu_write {
  @dmu_writes[arg0] = count();
}
interval:s:30 { exit(); }'

Trace WireGuard packet processing

# Trace WireGuard encryption/decryption operations:
bpftrace -e '
kprobe:wg_packet_encrypt_worker {
  @encrypt_ops = count();
  @encrypt_start[tid] = nsecs;
}
kretprobe:wg_packet_encrypt_worker /@encrypt_start[tid]/ {
  @encrypt_ns = hist(nsecs - @encrypt_start[tid]);
  delete(@encrypt_start[tid]);
}
kprobe:wg_packet_decrypt_worker {
  @decrypt_ops = count();
  @decrypt_start[tid] = nsecs;
}
kretprobe:wg_packet_decrypt_worker /@decrypt_start[tid]/ {
  @decrypt_ns = hist(nsecs - @decrypt_start[tid]);
  delete(@decrypt_start[tid]);
}
interval:s:30 { exit(); }'
Attaching 5 probes...

@encrypt_ops: 45678
@decrypt_ops: 43210

@encrypt_ns:
[512, 1K)         234 |@@@@                                                  |
[1K, 2K)         4567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |
[2K, 4K)         3456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              |
[4K, 8K)         1234 |@@@@@@@@@@@@@@                                        |
[8K, 16K)         345 |@@@@                                                  |

@decrypt_ns:
[512, 1K)         198 |@@@                                                   |
[1K, 2K)         4321 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |
[2K, 4K)         3210 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
[4K, 8K)         1098 |@@@@@@@@@@@@@                                         |
[8K, 16K)         287 |@@@                                                   |
# Trace WireGuard handshakes (key exchanges):
bpftrace -e '
kprobe:wg_noise_handshake_create_initiation {
  printf("WG handshake INITIATION created [%s pid=%d]\n", comm, pid);
  @initiations = count();
}
kprobe:wg_noise_handshake_consume_initiation {
  printf("WG handshake INITIATION consumed [%s pid=%d]\n", comm, pid);
}
kprobe:wg_noise_handshake_create_response {
  printf("WG handshake RESPONSE created [%s pid=%d]\n", comm, pid);
}
kprobe:wg_noise_handshake_consume_response {
  printf("WG handshake RESPONSE consumed [%s pid=%d]\n", comm, pid);
  @completed_handshakes = count();
}
kprobe:wg_noise_handshake_begin_session {
  printf("WG SESSION established [%s pid=%d]\n", comm, pid);
}'
Attaching 5 probes...
WG handshake INITIATION created [wg-crypt-wg0 pid=0]
WG handshake RESPONSE consumed [wg-crypt-wg0 pid=0]
WG SESSION established [wg-crypt-wg0 pid=0]
WG handshake INITIATION consumed [wg-crypt-wg0 pid=0]
WG handshake RESPONSE created [wg-crypt-wg0 pid=0]
WG SESSION established [wg-crypt-wg0 pid=0]

Trace container syscalls by cgroup

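# Trace syscalls per container using cgroup ID.
# The cgroup builtin is a numeric ID; the paths below are shown resolved for
# readability (recent bpftrace versions can resolve them with cgroup_path()):
bpftrace -e '
tracepoint:raw_syscalls:sys_enter {
  @syscalls_by_cgroup[cgroup] = count();
}
interval:s:10 { exit(); }'
Attaching 2 probes...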

@syscalls_by_cgroup[/sys/fs/cgroup/system.slice/sshd.service]: 4567
@syscalls_by_cgroup[/sys/fs/cgroup/system.slice/nginx.service]: 34521
@syscalls_by_cgroup[/sys/fs/cgroup/system.slice/docker-abc123.scope]: 89012
@syscalls_by_cgroup[/sys/fs/cgroup/system.slice/docker-def456.scope]: 12345
@syscalls_by_cgroup[/sys/fs/cgroup/user.slice/user-1000.slice]: 2345
# Filter to a specific container and get syscall breakdown:
bpftrace -e '
tracepoint:raw_syscalls:sys_enter
/cgroup == cgroupid("/sys/fs/cgroup/system.slice/docker-abc123.scope")/ {
  @container_syscalls[args->id] = count();
}
interval:s:10 { exit(); }'
Attaching 2 probes...

@container_syscalls[0]: 12456      // read
@container_syscalls[1]: 11234      // write
@container_syscalls[232]: 8901     // epoll_wait
@container_syscalls[202]: 4567     // futex
@container_syscalls[230]: 2341     // clock_nanosleep
@container_syscalls[257]: 1234     // openat
@container_syscalls[3]: 567        // close
@container_syscalls[9]: 234        // mmap
Container tracing by cgroup is the production use case that justifies the entire eBPF investment. Traditional monitoring gives you per-process metrics. eBPF gives you per-container kernel-level visibility without any agent inside the container. You can trace every syscall, every network packet, every disk I/O from outside the container, at the kernel level, where a process inside the container cannot easily evade or tamper with it. This is why Cilium and Falco exist.
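To turn cgroup paths like the ones above into container IDs, a little string handling is enough. The sketch below assumes the systemd cgroup driver naming shown in the sample output (docker-ID.scope); hosts using the cgroupfs driver lay paths out differently. The container_from_cgroup name is hypothetical.

```shell
#!/bin/sh
# container_from_cgroup PATH   (hypothetical helper)
# Prints the container ID embedded in a systemd-style Docker cgroup path
# (".../docker-<id>.scope"), or "-" when the path is not a Docker scope.
container_from_cgroup() {
  name=${1##*/}            # keep only the last path component
  case "$name" in
    docker-*.scope)
      id=${name#docker-}   # strip the "docker-" prefix
      echo "${id%.scope}"  # strip the ".scope" suffix
      ;;
    *)
      echo "-"
      ;;
  esac
}
```

The resulting ID can be passed to docker inspect to attach an image name or labels to each row of the per-cgroup syscall counts.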

Building a tracepoint inventory

Before you start tracing, build a catalog of what is available on your specific system. Different kernels, different configs, different loaded modules all change what tracepoints exist.

#!/bin/bash
# Dump a complete tracepoint inventory with counts.
TRACING=/sys/kernel/tracing

echo "=== System Tracepoint Inventory ==="
echo "Kernel: $(uname -r)"
echo "Date:   $(date -Iseconds)"
echo ""

echo "=== Static Tracepoints ==="
total=$(wc -l < "$TRACING/available_events")
echo "Total: $total"
echo ""
echo "By category:"
cut -d: -f1 "$TRACING/available_events" | sort | uniq -c | sort -rn | head -30

echo ""
echo "=== Kprobe Functions ==="
total=$(wc -l < "$TRACING/available_filter_functions")
echo "Total: $total"
echo ""
echo "Top prefixes (by module/subsystem):"
# Drop the trailing "[module]" tag, then keep the text before the first
# underscore (symbols that start with "__" are grouped under "__").
awk '{print $1}' "$TRACING/available_filter_functions" | \
  sed -E 's/^(__|[^_]+).*/\1/' | sort | uniq -c | sort -rn | head -20

echo ""
echo "=== BTF Status ==="
if [ -f /sys/kernel/btf/vmlinux ]; then
  size=$(stat -c%s /sys/kernel/btf/vmlinux)
  echo "BTF available: /sys/kernel/btf/vmlinux ($size bytes)"
  # Each type in the raw dump starts with "[id]"; member lines are indented.
  types=$(bpftool btf dump file /sys/kernel/btf/vmlinux format raw 2>/dev/null | grep -c '^\[')
  echo "Type definitions: $types"
else
  echo "BTF not available (kernel compiled without CONFIG_DEBUG_INFO_BTF)"
fi

echo ""
echo "=== Loaded Module Tracepoints ==="
# Traceable functions are tagged with their owning module, e.g. "zio_read [zfs]".
for mod in $(lsmod | tail -n +2 | awk '{print $1}'); do
  count=$(grep -c "\[${mod}\]\$" "$TRACING/available_filter_functions")
  if [ "$count" -gt 0 ]; then
    echo "  $mod: $count functions"
  fi
done
=== System Tracepoint Inventory ===
Kernel: 6.1.0-26-amd64
Date:   2026-04-04T14:23:01-04:00

=== Static Tracepoints ===
Total: 2147

By category:
    367 syscalls
    156 ext4
     98 kmem
     89 block
     78 sched
     67 net
     56 writeback
     52 tcp
     48 irq
     45 timer
     42 signal
     38 workqueue
     35 vmscan
     32 compaction
     28 filemap
     25 sock
     24 pagemap
     22 skb
     21 random
     20 power
     18 module
     17 rcu
     16 cgroup
     15 mmap
     14 huge_memory
     13 io_uring
     12 xdp
     11 fib
     10 qdisc

=== Kprobe Functions ===
Total: 63842

Top prefixes (by module/subsystem):
   4521 __
   2341 nf
   1987 tcp
   1876 ext4
   1654 ip
   1432 sk
   1321 net
   1234 page
   1198 sched
   1087 blk
    987 dm
    876 kvm
    765 usb
    654 pci
    567 drm
    456 crypto
    345 xfs
    234 nfs
    198 scsi
    167 zfs

=== BTF Status ===
BTF available: /sys/kernel/btf/vmlinux (5765432 bytes)
Type definitions: 142567

=== Loaded Module Tracepoints ===
  zfs: 487 functions
  spl: 123 functions
  wireguard: 42 functions
  nf_tables: 234 functions
  kvm: 876 functions
# Quick one-liner to find tracepoints related to a topic:
bpftrace -l '*' | grep -i "zfs\|arc\|zio\|txg\|spa\|vdev" | head -30
# Export full inventory to a file for diffing between systems:
bpftrace -l '*' > /tmp/tracepoint-inventory-$(hostname)-$(date +%Y%m%d).txt
wc -l /tmp/tracepoint-inventory-*.txt
67234 /tmp/tracepoint-inventory-prod-web-01-20260404.txt

Why inventory matters

Different kernel configs expose different tracepoints. A distro kernel might have 2,000 tracepoints while a custom kernel has 1,500 (or 3,000). Loaded modules add kprobe targets dynamically — if ZFS is not loaded, those 487 functions are not available. If you write a tracing script on one machine and deploy it to another, the inventory diff tells you what will break. Keep inventories for each machine class and diff them when scripts fail.
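A minimal sketch of that inventory diff, assuming both files were produced with bpftrace -l '*' as shown above. The helper name missing_probes and the file names in the usage comment are hypothetical.

```shell
# Sketch: list probes present in a reference inventory but absent from a target.
# missing_probes is a hypothetical helper, not part of bpftrace.
# $1 = reference inventory, $2 = target inventory (one probe name per line).
missing_probes() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
  # grep exits 1 when it prints nothing; treat that as success, keep real errors.
  grep -Fxv -f "$2" "$1" || [ $? -eq 1 ]
}

# Usage (placeholder file names):
#   missing_probes inventory-dev.txt inventory-prod.txt
```

Anything this prints is a probe that a script written on the reference machine can attach to but the target machine cannot provide.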


Common pitfalls

Kprobe on inlined function

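The compiler inlines small functions for performance. When a function is fully inlined, it no longer exists as a separate symbol; its code is merged into each caller, so there is nothing for a kprobe to attach to and the attach fails. A partially inlined function is harder to spot: the symbol exists and the probe attaches, but calls that go through an inlined copy never fire it. Check with bpftrace -l 'kprobe:function_name' first. If it does not appear, the function is most likely inlined (it may also be marked notrace, emitted under a compiler suffix such as .isra.0, or live in a module that is not loaded). Probe the caller instead, or find a tracepoint that covers the same event.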
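The same check can be scripted against the ftrace function list used earlier on this page. This is a sketch under assumptions: kprobe_target_exists and kprobe_target_variants are hypothetical helper names, and the second helper only looks for compiler-suffixed names of the form name.suffix (for example name.isra.0).

```shell
# Sketch: check whether a kernel function can be probed before attaching.
# Both helpers are hypothetical. $1 = function name, $2 = optional list file
# (defaults to the ftrace list of traceable functions).
kprobe_target_exists() {
  func="$1"
  list="${2:-/sys/kernel/tracing/available_filter_functions}"
  # Lines look like "name" or "name [module]"; compare the first field exactly.
  awk -v f="$func" '$1 == f { found = 1; exit } END { exit !found }' "$list"
}

# Print compiler-suffixed variants such as "name.isra.0" or "name.constprop.0".
kprobe_target_variants() {
  func="$1"
  list="${2:-/sys/kernel/tracing/available_filter_functions}"
  awk -v f="$func" 'index($1, f ".") == 1 { print $1 }' "$list"
}

# Usage:
#   kprobe_target_exists tcp_connect && echo "probe-able"
#   kprobe_target_variants some_function
```

Reading the list file usually requires root. A function that shows up only as a suffixed variant has to be probed under that exact name.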

Missing BTF

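BTF requires CONFIG_DEBUG_INFO_BTF=y at kernel compile time. Without it, bpftrace cannot resolve kernel struct types on its own: kprobe arguments are still available positionally (arg0, arg1), but reading a field means including kernel headers or hard-coding offsets. Most distro kernels from 5.4 onward ship with BTF. Check: ls /sys/kernel/btf/vmlinux. If it is missing, bpftrace falls back to kernel headers, so install the headers that match the running kernel; otherwise you need the kernel-debuginfo package or a custom kernel build.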
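When /sys/kernel/btf/vmlinux is absent, the kernel config usually confirms why. A small sketch, assuming the distro installs the config as /boot/config-$(uname -r) (common on Debian and RHEL families, but not guaranteed); btf_config_enabled is a hypothetical helper name.

```shell
# Sketch: check whether the kernel was built with BTF.
# btf_config_enabled is a hypothetical helper; $1 = optional config file path.
btf_config_enabled() {
  cfg="${1:-/boot/config-$(uname -r)}"
  # -x matches the whole line, so "# CONFIG_DEBUG_INFO_BTF is not set" does not count.
  grep -qx 'CONFIG_DEBUG_INFO_BTF=y' "$cfg"
}

# Usage:
#   btf_config_enabled && echo "BTF compiled in" || echo "no BTF in this kernel config"
```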

Struct layout changes

Kernel structs change between versions. A field at offset 48 in kernel 5.15 might be at offset 56 in kernel 6.1. Raw offset access (*(uint32 *)(arg0 + 48)) breaks silently — you read the wrong field, get garbage data, and never get an error. BTF + CO-RE solves this by looking up field offsets at load time. Always use named field access (args->field) instead of manual offsets.
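To see how far a field has moved between two kernels, the offset can be read straight from BTF. The sketch below parses the raw dump format of bpftool btf dump; btf_member_offset is a hypothetical helper, and it assumes the member is not a bitfield (a bitfield would give a fractional byte offset).

```shell
# Sketch: print the byte offset of a struct member from a raw BTF dump on stdin.
# btf_member_offset is a hypothetical helper. It assumes the raw dump layout:
#   [id] STRUCT 'name' size=... vlen=...
#       'member' type_id=... bits_offset=...
# $1 = struct name, $2 = member name.
btf_member_offset() {
  awk -v s="'$1'" -v m="'$2'" '
    # A line starting with "[" opens a new type; track whether it is our struct.
    /^\[/ { in_struct = ($2 == "STRUCT" && $3 == s); next }
    in_struct && $1 == m {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^bits_offset=/) {
          sub(/^bits_offset=/, "", $i)
          print ($i / 8)   # BTF stores offsets in bits
          exit
        }
    }
  '
}

# Usage (needs bpftool and a BTF-enabled kernel):
#   bpftool btf dump file /sys/kernel/btf/vmlinux format raw | btf_member_offset task_struct pid
```

Running this on two machines and comparing the numbers shows whether a script that hard-codes an offset would read the wrong field on one of them.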

Overhead on high-frequency probes

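Some tracepoints fire millions of times per second on a busy system. kmem:kmalloc fires on every kernel memory allocation. sched:sched_switch fires on every context switch. Attaching a probe that calls printf or does heavy map updates on these events can consume significant CPU, because printf copies every event to user space. Prefer count()/hist() aggregations, which accumulate in kernel maps and are printed once at exit. If you need per-event output, sample it with a plain counter (@n++; if (@n % 1000 == 0) { ... }); a count() map is a per-CPU aggregate, and not every bpftrace version lets you read it inside an expression.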

Missing debug symbols for uprobes

Uprobes attach by function name (symbol). A stripped binary has no static symbol table, so nm reports no symbols and a uprobe on an internal function cannot be resolved; only exported dynamic symbols (nm -D) remain. Install debug packages: dnf debuginfo-install postgresql-server (RHEL/CentOS) or apt install postgresql-15-dbgsym (Debian/Ubuntu). For Go binaries, do not strip with -s -w ldflags. For Rust, keep debug = true in the release profile and leave strip disabled.

USDT not compiled in

USDT probes must be compiled into the binary at build time. Not all distro packages include them. PostgreSQL needs --enable-dtrace at configure time. Python needs --with-dtrace. Node.js needs --with-dtrace. Check with bpftrace -l 'usdt:/path/to/binary:*'. If it returns nothing, the binary was built without USDT support. You may need to rebuild from source or find a package that includes DTrace probes.

Recursion in kprobes

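If your kprobe handler runs code that hits the same kprobe, the probe re-enters itself. The kernel has protection against this: a kprobe hit while another kprobe handler is running on the same CPU is skipped and counted as missed, so the usual symptom is silently missing events. The protection is not foolproof, though, and probing functions on the handler's own path (printk, locking, and allocation helpers are the classic examples) can still end in a kernel deadlock or crash. Avoid probing low-level functions that your probe handler depends on. fentry/fexit handle this better: the BPF trampoline tracks recursion per program and skips the nested run.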

Stack depth limits

eBPF programs have a 512-byte stack limit. This is enforced by the verifier at load time. If your probe declares too many local variables or builds large strings, it will fail to load with the error "Looks like the BPF stack limit of 512 bytes is exceeded". Solutions: use fewer local variables, use maps instead of stack variables, split complex logic across multiple probes with map-based communication, or use tail calls to chain programs.

The stack depth limit is the one that bites everyone eventually. You write a beautiful 30-line bpftrace script, it works great, you add one more variable, and suddenly the verifier rejects it with a cryptic error about stack usage. The fix is almost always to move intermediate values into maps. Maps live in kernel memory, not on the stack. The downside is maps are slower to access and you have to manage cleanup yourself.

Quick reference

Task                               Command
List all static tracepoints        bpftrace -l 'tracepoint:*'
List all kprobe targets            bpftrace -l 'kprobe:*'
Search for tracepoints by keyword  bpftrace -l 'tracepoint:*' | grep tcp
View tracepoint format/arguments   cat /sys/kernel/tracing/events/sched/sched_switch/format
List USDT probes in a binary       bpftrace -l 'usdt:/path/to/binary:*'
Check BTF availability             ls /sys/kernel/btf/vmlinux
Dump kernel struct via BTF         bpftool btf dump file /sys/kernel/btf/vmlinux format c | grep -A 20 'struct name {'
Count events from a tracepoint     bpftrace -e 'tracepoint:sched:sched_switch { @count = count(); }'
Histogram of kprobe latency        bpftrace -e 'kprobe:func { @start[tid]=nsecs; } kretprobe:func /@start[tid]/ { @ns=hist(nsecs-@start[tid]); delete(@start[tid]); }'
List available subsystems          ls /sys/kernel/tracing/events/
Count kprobe-able functions        wc -l /sys/kernel/tracing/available_filter_functions
Find ZFS kprobe targets            bpftrace -l 'kprobe:*' | grep '^kprobe:z'