Observability — Beginner
Start here if you’ve never traced a system call or looked at a flame
graph. The eBPF tools below ship in the kldloadOS base image on Debian
and need a single kpkg install on CentOS/RHEL. LogHog is a separate
install on both; see Installing the tools.
Level 0: What am I looking at?
kst — your first command
kst
This is the one-command health check. It shows:
- Is the ZFS pool healthy?
- How much disk space is left?
- Are snapshots running?
- How many boot environments do I have?
- Is anything using too much memory?
- Are my services running?
If kst looks green, your system is fine. If something is
yellow or red, keep reading.
System diagnostics — the full picture
kldloadOS includes comprehensive diagnostic scripts that collect everything about your system into a single Markdown report:
# Debian
sudo diagnostics.sh
# CentOS/RHEL
sudo rhel-diag.sh
This generates diagnostic.md, a complete snapshot of:
- Network interfaces, routes, DNS
- Failed services, disk usage
- Firewall rules, listening ports
- Package status, security updates
- ZFS pool health, disk SMART status
- CPU/memory/I/O pressure
When to use it: Something is wrong and you don’t know where to start. Run the diagnostics, read the report, search for the red flags.
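Searching the report for red flags can be scripted. The sketch below is a hypothetical helper, not part of kldloadOS: the function name scan_report and the keyword list are assumptions, so adjust the pattern to match what your report actually prints.

```shell
# Hypothetical helper: list report lines that contain common trouble words.
# The keyword list is an assumption; edit it to suit your own report.
scan_report() {
    report="${1:-diagnostic.md}"
    # -n prints line numbers, -i ignores case, -E enables alternation
    grep -n -i -E 'fail|error|degraded|faulted|critical' "$report"
}
```

Run it as scan_report diagnostic.md. Each hit is printed with its line number so you can jump to that part of the report.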
Level 1: What’s happening right now?
Watch processes
# What's using CPU?
top
# Better top (if installed)
htop
# What processes just launched?
# (This is your first eBPF command)
execsnoop
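execsnoop uses eBPF under the hood: it hooks into the
kernel’s execve syscall and prints every new process as it
starts. Overhead is low because it only does work when a process
starts, though it is not zero on hosts that spawn processes at a very
high rate. No log parsing, just live data.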
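On a busy host the live stream scrolls quickly, so it can help to capture it and count launches per command afterwards. This is a minimal sketch that assumes the usual BCC column layout (PCOMM PID PPID RET ARGS, with one header line). The function name count_execs and the file name execs.txt are illustrative.

```shell
# Count process launches per command name from saved execsnoop output.
# Assumes the first line is the header and column 1 is the command name.
count_execs() {
    awk 'NR > 1 { n[$1]++ } END { for (c in n) print n[c], c }' |
        sort -k1,1nr -k2,2
}

# Usage (needs root for the capture step):
#   sudo execsnoop > execs.txt    # press Ctrl+C after a while
#   count_execs < execs.txt
```

Capturing to a file first is deliberate. If you pipe execsnoop straight into the function, Ctrl+C interrupts awk as well, and the summary is never printed.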
Watch files being opened
opensnoop
Shows every file open in real time — which process, which file, whether it succeeded or failed. Useful for finding “file not found” errors, permission denials, or figuring out what config files an application reads at startup.
Watch network connections
# New TCP connections
tcpconnect
# TCP sessions with duration and bytes
tcplife
tcpconnect shows every outbound TCP connection with the
process that made it. tcplife is like
tcpconnect but waits for connections to close and shows how
long they lasted and how much data moved.
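Because tcplife prints one line per closed session, its output is easy to total per process. The sketch below assumes the default BCC columns (PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS) and command names without spaces. tcp_kb_by_comm and sessions.txt are hypothetical names.

```shell
# Sum transmitted and received kilobytes per command from saved tcplife output.
# Assumes column 2 is COMM, column 7 is TX_KB, and column 8 is RX_KB.
tcp_kb_by_comm() {
    awk 'NR > 1 { tx[$2] += $7; rx[$2] += $8 }
         END { for (c in tx) printf "%s tx_kb=%d rx_kb=%d\n", c, tx[c], rx[c] }' |
        sort
}

# Usage (needs root for the capture step):
#   sudo tcplife > sessions.txt   # press Ctrl+C after a while
#   tcp_kb_by_comm < sessions.txt
```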
Watch disk I/O
# I/O latency histogram
biolatency
# Slow filesystem operations (>10ms)
fileslower 10
# Per-process I/O stats
biotop
Level 2: Your first bpftrace one-liner
bpftrace is like awk for the kernel. You
write tiny programs that attach to kernel events.
Who’s calling open()?
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args.filename)); }'
Output:
nginx /etc/nginx/nginx.conf
sshd /etc/ssh/sshd_config
bash /etc/profile
What this means:
- tracepoint:syscalls:sys_enter_openat fires every time any process calls openat(), the syscall that libc’s open() uses on modern Linux
- comm is the process name
- args.filename is the file being opened
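The one-liner above traces every process, which gets noisy. A filter between slashes narrows it to one command name. The sketch below only builds the program text, so you can inspect it before running it. openat_prog_for is a hypothetical helper name, and it assumes the process name contains no double quotes. The kernel truncates comm to 15 characters, so long names must be shortened to match. Older bpftrace releases spell the field args->filename.

```shell
# Print a bpftrace program that traces openat() calls for one process name.
# %%s and \\n are escaped so that printf emits a literal %s and \n
# for bpftrace to interpret.
openat_prog_for() {
    printf 'tracepoint:syscalls:sys_enter_openat /comm == "%s"/ { printf("%%s %%s\\n", comm, str(args.filename)); }\n' "$1"
}

# Usage (needs root):
#   sudo bpftrace -e "$(openat_prog_for nginx)"
```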
How big are read() calls?
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args.ret > 0/ { @bytes = hist(args.ret); }'
Press Ctrl+C to see a histogram of read sizes. If most reads are tiny (1–16 bytes), something might be doing inefficient I/O.
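The histogram groups sizes into power-of-two buckets, so a 10-byte read is counted in the 8 to 16 range. The sketch below reproduces that bucketing for a single value, as a reading aid. hist_bucket is a hypothetical name. bpftrace itself prints the first bucket as [1] and abbreviates large bounds (for example 1K), which this sketch does not imitate.

```shell
# Print the power-of-two bucket [lo, hi) that a positive size falls into.
hist_bucket() {
    n=$1
    [ "$n" -ge 1 ] || { echo "size must be >= 1" >&2; return 1; }
    lo=1
    # Double lo until the next doubling would overshoot n
    while [ $((lo * 2)) -le "$n" ]; do
        lo=$((lo * 2))
    done
    echo "[$lo, $((lo * 2)))"
}

# Usage:
#   hist_bucket 10
```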
Count syscalls by process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Ctrl+C after a few seconds. Shows which processes are making the most syscalls — a quick way to find noisy applications.
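bpftrace prints the map smallest-first, so the busiest processes end up at the bottom of the screen. A small filter can flip that and keep only the top few. This is a sketch: top_syscallers is a hypothetical name, and it assumes the default text output of one "@[name]: count" entry per line.

```shell
# Show the N busiest processes (default 5) from bpftrace count() output.
# Lines that are not "@[name]: count" (such as "Attaching 1 probe...") are skipped.
top_syscallers() {
    sed -n 's/^@\[\(.*\)]: \([0-9][0-9]*\)$/\2 \1/p' |
        sort -k1,1nr |
        head -n "${1:-5}"
}

# Usage (needs root). The interval probe makes bpftrace exit by itself after
# five seconds, so no Ctrl+C is needed and the pipe is safe:
#   sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); } interval:s:5 { exit(); }' | top_syscallers 5
```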
Level 3: Log forensics with LogHog
LogHog (lh) stitches logs from multiple sources by
timestamp and gives you an interactive forensics interface:
lh
Interactive menu:
- (A)uth: authentication failures (brute force, bad passwords, key rejections)
- (E)rrors: system errors across all logs
- (L)ive All: tail all logs stitched by timestamp
- (N)etwork: network protocol activity (HTTP, SSH, DNS, etc.)
- (R)egEX: search by pattern
- (I)P Search: extract and locate IP addresses
- (J)SON Export: structured output for further analysis
Example: find authentication failures
Launch lh, press A. It pulls auth failures
from syslog, auth.log, journald, and sshd logs, stitched together
chronologically:
Mar 21 14:23:01 sshd[1234]: Failed password for root from 203.0.113.5 port 43210
Mar 21 14:23:02 sshd[1234]: Failed password for root from 203.0.113.5 port 43210
Mar 21 14:23:03 sshd[1234]: Connection closed by 203.0.113.5 port 43210
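Once you have lines like these saved to a file, a short filter can rank source addresses by failure count. The sketch below is plain shell and not a LogHog feature. failed_by_ip and auth-failures.txt are hypothetical names, and it assumes the sshd message format shown above, where the address follows the word "from".

```shell
# Count "Failed password" lines per source address, most frequent first.
failed_by_ip() {
    awk '/Failed password/ {
             # The address is the field right after the word "from"
             for (i = 1; i < NF; i++) if ($i == "from") ip[$(i + 1)]++
         }
         END { for (a in ip) print ip[a], a }' |
        sort -k1,1nr -k2,2
}

# Usage:
#   failed_by_ip < auth-failures.txt
```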
Example: follow everything live
Launch lh, press L. All logs from all
sources, merged by timestamp, streaming in real time. Ctrl+C to
stop.
Installing the tools
Debian (most tools pre-installed)
eBPF tools (bpftrace, bpfcc-tools,
bpftool, linux-perf) are in the kldloadOS base
image.
For LogHog:
kpkg install libjson-c-dev libreadline-dev
cd /opt
git clone https://github.com/unixbox-net/linux-tools.git
cd linux-tools/debian/utils/lh
./install.sh
CentOS/RHEL
# eBPF tools
kpkg install bcc-tools bpftrace perf
# LogHog
kpkg install json-c-devel readline-devel
cd /opt
git clone https://github.com/unixbox-net/linux-tools.git
cd linux-tools/debian/utils/lh
./install.sh
The diagnostics scripts work out of the box:
# Use the right one for your distro
sudo /opt/linux-tools/debian/diagnostics/diagnostics.sh # Debian
sudo /opt/linux-tools/rhel/rhel-diag.sh # CentOS/RHEL
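After installing, it is worth confirming that the commands used on this page are actually on your PATH. The sketch below is a hypothetical helper (missing_tools is not a kldloadOS command). On stock Debian the bpfcc-tools package installs the BCC tools with a -bpfcc suffix (for example execsnoop-bpfcc). Whether kldloadOS also provides the unsuffixed names is an assumption this check lets you verify.

```shell
# Print each named command that cannot be found on PATH.
# Prints nothing when every tool is available.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# Usage:
#   missing_tools kst lh bpftrace execsnoop opensnoop tcpconnect tcplife biolatency fileslower biotop
```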
Cheat sheet
| I want to… | Run this |
|---|---|
| Quick health check | kst |
| Full system diagnostic | diagnostics.sh or rhel-diag.sh |
| See new processes | execsnoop |
| See file opens | opensnoop |
| See TCP connections | tcpconnect |
| See TCP session details | tcplife |
| See disk I/O latency | biolatency |
| See slow file ops | fileslower 10 |
| Watch all logs live | lh → L |
| Find auth failures | lh → A |
| Count syscalls by process | bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' |
Next level
Once you’re comfortable with the BCC tools and basic bpftrace, move on to Observability — Intermediate to build real-time dashboards with socket_snoop, latency_snoop, and Prometheus.