kldload — your platform, your way, free

Labeling & Asset Management Masterclass

This guide covers everything about properly naming, labeling, tagging, and managing infrastructure assets on OpenZFS — from the sticker on a physical drive to automated fleet inventory pulled directly from ZFS properties. It starts at the physical layer and ends with a fully automated CMDB built from filesystem metadata.

By the end, every disk in your fleet has a scannable QR code linking to its RMA URL, every pool name tells you its environment, region, purpose, SLA, and media class, every dataset carries machine-readable tags that drive backup, replication, and alerting policy automatically — and a new operator can walk up to any rack in any datacenter and know exactly what they are looking at without opening a single wiki page.

Why labeling is the foundation of operations: At 3 AM when a disk fails, you need to know: which pool, which vdev, which slot, which rack, which vendor, and where to order the replacement. Without labels, you are guessing. With labels, you are executing. The difference between a homelab and production is not the hardware — it is the labeling. A homelab has tank and /dev/sda. Production has prd-caw1-db-gold-nvme and UB-DSK-CAW1-88322. The second tells you everything: production, CA-West-1, database tier, gold SLA, NVMe class. The first tells you nothing.

What this masterclass builds: A complete labeling system — physical disk labels, pool naming conventions, ZFS custom properties as infrastructure tags, dataset hierarchies, and automated inventory — everything needed to run a production environment where any operator can walk up to any rack and know exactly what they are looking at. This masterclass teaches you to name things so the name IS the documentation.

1. Physical Disk Labels — What Goes on the Sticker

A physical disk label is not a bureaucratic nicety. It is the first line of incident response. When a drive fails, you walk to the rack, pull the right drive, and start the replacement. The label on that drive contains everything you need to do the next ten steps without stopping to look anything up.

The complete label template used in production:

PHYSICAL LOCATION
  Region:     CA-WEST-1
  Datacenter: YVR01
  Building:   A
  Row:        R12
  Rack:       R12-08
  Chassis:    CH01
  Slot:       SLOT07

ZFS INFORMATION
  ZFS Pool:   prd-caw1-db-gold-nvme
  VDEV:       slot07
  Role:       DATA VDEV
  Layout:     draid2:10d:12c:0s

HARDWARE DETAILS
  Vendor:     Samsung
  Model:      PM9A3
  Interface:  NVMe PCIe 4.0 x4
  Capacity:   3.84 TB
  Serial:     S6ZUNX0R123456A
  Firmware:   EDA92Q5Q
  SMART Base: 2025-01-01

LIFECYCLE / INVENTORY
  Asset ID:   UB-DSK-CAW1-88322
  Installed:  2025-02-12
  Warranty:   2028-02-12
  Supplier:   CDW Canada
  RMA URL:    https://cdw.ca/rma/S6ZUNX0R123456A
  Reorder URL: https://cdw.ca/p/samsung-pm9a3-3.84tb/PM9A3-3840

What each field means and why it matters

Physical Location — the drill-down hierarchy that gets a tech to the right chassis slot. Region is the geographic zone (aligned with your WireGuard mesh topology so the naming is consistent across infrastructure layers). Datacenter is the facility code. Building, Row, Rack, Chassis, and Slot are the physical path. A tech who has never been to this datacenter reads YVR01 / A / R12 / R12-08 / CH01 / SLOT07 and walks directly to the right drive without asking anyone for directions.

ZFS Information — what OpenZFS knows about this drive. The pool name encodes environment, location, role, tier, and media class (covered in section 2). The VDEV name matches the physical slot identifier so you can correlate zpool status output directly to the label. The Role field tells you whether this is a data vdev, a spare, a SLOG, or an L2ARC, which matters when you are deciding whether to pull the drive immediately or let resilver finish first. The Layout field records the exact dRAID or RAIDZ geometry at the time of installation.

Hardware Details — the vendor, model, interface, and firmware needed for procurement and warranty claims. The Serial is the primary key for warranty and RMA lookups. SMART Base records the date when baseline SMART data was captured so you can compute drive age and track normalized attribute degradation over time.

Lifecycle / Inventory — the operational fields. Asset ID follows a structured scheme (covered below). Installed and Warranty dates let you compute time-to-warranty-expiry without a spreadsheet. The Supplier field tells you who to call. RMA URL and Reorder URL are the killer features: a tech scans the QR code, taps the RMA URL, and the return authorization process starts immediately from their phone.
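
The Installed and Warranty dates can be turned into a days-remaining figure with a few lines of shell. This is a minimal sketch that assumes GNU date (the -d flag); the function names days_between and warranty_days_left are illustrative, not part of any kldload tooling.

```shell
#!/bin/bash
# days_between START END: whole days from START to END (YYYY-MM-DD).
# Assumes GNU date. Both dates are read as UTC midnight so DST never skews the result.
days_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 86400 ))
}

# warranty_days_left EXPIRY [TODAY]: goes negative once the warranty has lapsed.
# TODAY defaults to the current UTC date; pass it explicitly for repeatable reports.
warranty_days_left() {
  local today="${2:-$(date -u +%F)}"
  days_between "$today" "$1"
}

# Example with the dates from the label above
days_between 2025-02-12 2028-02-12          # length of the warranty term in days
warranty_days_left 2028-02-12 2027-12-01    # days left as of a fixed date
```

A negative result is a cheap alerting signal: any drive that reports a value below zero is running out of warranty.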

Asset ID structure

The Asset ID format UB-DSK-CAW1-88322 encodes: organization prefix (UB), asset class (DSK for disk, SRV for server, NET for network gear, PDU for power), region code (CAW1), and a sequential 5-digit number within that region and class. The sequential number is assigned at procurement — not at installation — so an ordered drive already has an asset ID before it arrives, and the label can be printed before the drive ships.

# Asset ID prefix table
UB-SRV-{REGION}-{SEQ}  — server (compute node)
UB-DSK-{REGION}-{SEQ}  — disk (any storage medium)
UB-NET-{REGION}-{SEQ}  — network device (switch, router, ToR)
UB-PDU-{REGION}-{SEQ}  — PDU or UPS
UB-CAB-{REGION}-{SEQ}  — cable or patch panel
UB-JBD-{REGION}-{SEQ}  — JBOD enclosure
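
A structured ID is only useful if malformed ones are rejected before they reach a label printer. The sketch below validates and splits an asset ID with a bash regex. The class list mirrors the prefix table above; the two-letter organization prefix and the region pattern (three letters plus one digit, as in CAW1) are assumptions drawn from the examples, and parse_asset_id is a hypothetical helper name.

```shell
#!/bin/bash
# parse_asset_id ID: print "org class region seq", or return 1 if ID is malformed.
# Assumed shape: 2-letter org, class from the prefix table, region like CAW1, 5-digit sequence.
parse_asset_id() {
  local re='^([A-Z]{2})-(SRV|DSK|NET|PDU|CAB|JBD)-([A-Z]{3}[0-9])-([0-9]{5})$'
  [[ "$1" =~ $re ]] || return 1
  echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]}"
}

parse_asset_id UB-DSK-CAW1-88322
parse_asset_id UB-DISK-CAW1-88322 || echo "rejected: UB-DISK-CAW1-88322"
```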

QR code: JSON metadata on the label

The QR code on the label encodes the full drive record as JSON. Any phone can scan it. The JSON is the same structure used by the inventory database, so a scan is also an inventory lookup:

{
  "asset_id": "UB-DSK-CAW1-88322",
  "serial": "S6ZUNX0R123456A",
  "zfs": {
    "pool": "prd-caw1-db-gold-nvme",
    "vdev": "slot07",
    "role": "data",
    "layout": "draid2:10d:12c:0s"
  },
  "location": {
    "region": "CA-WEST-1",
    "datacenter": "YVR01",
    "building": "A",
    "row": "R12",
    "rack": "R12-08",
    "chassis": "CH01",
    "slot": "SLOT07"
  },
  "hardware": {
    "vendor": "Samsung",
    "model": "PM9A3",
    "capacity_tb": 3.84,
    "interface": "NVMe",
    "firmware": "EDA92Q5Q"
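  },
  "lifecycle": {
    "installed": "2025-02-12",
    "warranty_expiry": "2028-02-12",
    "supplier": "CDW Canada",
    "rma_url": "https://cdw.ca/rma/S6ZUNX0R123456A",
    "reorder_url": "https://cdw.ca/p/samsung-pm9a3-3.84tb/PM9A3-3840"
  }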
}
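
One way to turn that record into a printable code is sketched below. It assumes jq and the qrencode command-line tool are installed; the function names and the 1000-byte budget are illustrative choices. A QR code can hold more than that, but dense codes on a small sticker become hard to scan, so a budget check before printing is a reasonable guard.

```shell
#!/bin/bash
# qr_payload_fits PAYLOAD [LIMIT]: succeed if PAYLOAD is at most LIMIT bytes.
# The 1000-byte default is an assumed budget for a small label, not a QR standard.
qr_payload_fits() {
  local payload="$1" limit="${2:-1000}" bytes
  bytes=$(printf '%s' "$payload" | wc -c)
  (( bytes <= limit ))
}

# make_qr JSON_FILE PNG_FILE: compact the JSON, check the budget, render the code.
# Assumes jq and qrencode are on PATH; -l M selects medium error correction.
make_qr() {
  local payload
  payload=$(jq -c . "$1") || return 1
  qr_payload_fits "$payload" || { echo "payload too large for label" >&2; return 1; }
  printf '%s' "$payload" | qrencode -l M -o "$2"
}

# Typical use: make_qr UB-DSK-CAW1-88322.json UB-DSK-CAW1-88322.png
```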

Generating labels from live system data

Labels should be generated from actual hardware data, not typed by hand. This script pulls SMART data, correlates it with ZFS pool membership, and outputs the label text ready to print:

#!/bin/bash
# gen-disk-label.sh — generate a disk label from live system data
# Usage: gen-disk-label.sh /dev/disk/by-id/nvme-Samsung_PM9A3_S6ZUNX0R123456A
#
# Requires: smartmontools, jq, zpool

DISK="$1"
if [[ -z "$DISK" ]]; then
  echo "Usage: $0 /dev/disk/by-id/..." >&2
  exit 1
fi

# Resolve to real device
REALDEV=$(realpath "$DISK")

# Pull SMART data
SMART=$(smartctl -j -a "$REALDEV" 2>/dev/null)
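# .device.type is the transport (nvme, sat, scsi), not the vendor. smartctl has no
# universal vendor field, so try the SCSI vendor keys and fall back to the first
# word of the model string (a heuristic)
VENDOR=$(echo "$SMART" | jq -r '.scsi_vendor // .vendor // ((.model_name // "unknown") | split(" ")[0])')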
MODEL=$(echo "$SMART" | jq -r '.model_name // "unknown"')
SERIAL=$(echo "$SMART" | jq -r '.serial_number // "unknown"')
FIRMWARE=$(echo "$SMART" | jq -r '.firmware_version // "unknown"')
CAPACITY=$(echo "$SMART" | jq -r '(.user_capacity.bytes // 0) / 1e12 | . * 100 | round / 100 | tostring + " TB"')

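# Find which pool this disk belongs to. zpool status -L prints kernel device
# names (nvme0n1p1), which is what realpath returned above; without -L a pool
# built on by-id paths would never match.
DEVNAME=$(basename "$REALDEV")
POOL=$(zpool status -L | awk -v dev="$DEVNAME" '
  $1 == "pool:"                     { pool=$2 }
  $1 ~ ("^" dev "(p?[0-9]+)?$")     { print pool; exit }
')
# The vdev label is the name zpool status shows for that member (e.g. slot07)
VDEV=$(zpool status -P "$POOL" 2>/dev/null | awk '$1 ~ "^/dev/" { print $1 }' | \
  while read -r path; do
    if [[ "$(realpath "$path")" =~ ^"$REALDEV"(p?[0-9]+)?$ ]]; then
      basename "${path%-part[0-9]*}"; break
    fi
  done)
VDEV="${VDEV:-unknown}"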

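# Output label text. The heredoc body was lost from this listing and is rebuilt
# from the label template above; SLOT is an assumed variable name alongside
# REGION, DC, RACK, and ASSET_ID.
cat <<EOF
PHYSICAL LOCATION
  Region:     ${REGION:-unknown}
  Datacenter: ${DC:-unknown}
  Rack:       ${RACK:-unknown}
  Slot:       ${SLOT:-unknown}

ZFS INFORMATION
  ZFS Pool:   ${POOL:-none}
  VDEV:       ${VDEV}

HARDWARE DETAILS
  Vendor:     ${VENDOR}
  Model:      ${MODEL}
  Capacity:   ${CAPACITY}
  Serial:     ${SERIAL}
  Firmware:   ${FIRMWARE}

LIFECYCLE / INVENTORY
  Asset ID:   ${ASSET_ID:-unassigned}
EOF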

Set the environment variables (REGION, DC, RACK, ASSET_ID, etc.) from your provisioning system or a per-rack config file before running. The script fills in everything it can from live hardware data automatically.

The QR code is the killer feature. A tech walks up to a failed drive, scans the QR code with their phone, and sees: the pool it belongs to, the RMA URL to start the replacement, the reorder URL to buy a new one, the warranty expiry. No spreadsheet. No wiki. No "call Todd, he knows the setup." The label IS the documentation. The QR code IS the inventory lookup. Every field on the label was chosen because it answers a specific question someone will have at exactly the worst possible moment.

2. Pool Naming Conventions — The Name IS the Documentation

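A ZFS pool name is effectively permanent. There is no in-place rename: the only way to change it is to export the pool and import it under a new name (zpool export old, then zpool import old new), which takes every dataset offline and breaks each script, replication target, and printed label that references the old name. Get the naming convention right before creating the first pool, because you are living with it.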

The convention: {env}-{region}-{role}-{tier}-{media}

env — environment

prd production — stg staging — dev development — tst test — dr disaster recovery

The first token tells you the blast radius. Never confuse prd with dev.

region — geographic zone

caw1 CA-West-1 — use1 US-East-1 — euw1 EU-West-1 — aps1 AP-South-1

Matches your WireGuard mesh region codes. Consistent naming across layers.

role — workload purpose

db database — web web servers — stor object/file storage — vm virtual machines — k8s Kubernetes — mon monitoring — bak backup

The purpose of the data, not the technology. Databases go on db pools regardless of engine.

tier — SLA class

gold — highest durability, mirrored or dRAID2+, replicated to DR
silver — standard production, RAIDZ2, replicated daily
bronze — dev/test/backup, RAIDZ1 or single disk, no DR target

The tier drives the backup frequency, replication target, and monitoring sensitivity.

media — storage class

nvme NVMe SSD — ssd SATA/SAS SSD — hdd spinning disk — mix heterogeneous (SLOG on NVMe, data on HDD)

Media class tells you the expected I/O characteristics without running benchmarks.

Examples

# Production examples
prd-caw1-db-gold-nvme    # production, CA-West-1, database, gold SLA, NVMe
prd-caw1-vm-gold-nvme    # production, CA-West-1, VMs, gold SLA, NVMe
prd-caw1-stor-silver-hdd # production, CA-West-1, storage, silver SLA, HDD
prd-use1-db-gold-nvme    # production, US-East-1, database, gold SLA, NVMe

# Staging examples
stg-caw1-db-silver-ssd   # staging, CA-West-1, database, silver SLA, SSD
stg-use1-web-bronze-ssd  # staging, US-East-1, web, bronze SLA, SSD

# Development examples
dev-use1-web-bronze-ssd  # development, US-East-1, web, bronze SLA, SSD
dev-caw1-db-bronze-ssd   # development, CA-West-1, database, bronze SLA, SSD

# DR site examples
dr-euw1-bak-silver-hdd   # DR, EU-West-1, backup, silver SLA, HDD
dr-aps1-db-gold-nvme     # DR, AP-South-1, database, gold SLA, NVMe
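
Because the name is a fixed five-token format, it can be validated before zpool create ever runs. The sketch below splits a name on hyphens and checks each token against the vocabulary listed above; parse_pool_name is a hypothetical helper, and the region pattern (three letters plus a digit) is inferred from the examples.

```shell
#!/bin/bash
# parse_pool_name NAME: print "env=... region=... role=... tier=... media=...",
# or return 1 if any token falls outside the vocabulary in the convention.
parse_pool_name() {
  local env region role tier media extra
  IFS=- read -r env region role tier media extra <<< "$1"
  [[ -z "$extra" ]]                                   || return 1
  [[ "$env"    =~ ^(prd|stg|dev|tst|dr)$ ]]           || return 1
  [[ "$region" =~ ^[a-z]{3}[0-9]$ ]]                  || return 1
  [[ "$role"   =~ ^(db|web|stor|vm|k8s|mon|bak)$ ]]   || return 1
  [[ "$tier"   =~ ^(gold|silver|bronze)$ ]]           || return 1
  [[ "$media"  =~ ^(nvme|ssd|hdd|mix)$ ]]             || return 1
  echo "env=$env region=$region role=$role tier=$tier media=$media"
}

parse_pool_name prd-caw1-db-gold-nvme
parse_pool_name tank || echo "tank: does not follow the convention"
```

Running a check like this in the provisioning pipeline keeps a hurried typo from becoming a pool name you have to live with.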

After adopting this convention, zpool list becomes an infrastructure overview:

$ zpool list
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH
dr-euw1-bak-silver-hdd    120T  67.2T  52.8T        -         -    12%    56%   1.00  ONLINE
prd-caw1-db-gold-nvme      46T  38.1T  7.90T        -         -     3%    82%   1.00  ONLINE
prd-caw1-stor-silver-hdd  240T   189T  51.0T        -         -    18%    78%   1.00  ONLINE
prd-caw1-vm-gold-nvme      92T  71.3T  20.7T        -         -     2%    77%   1.00  ONLINE
prd-use1-db-gold-nvme      46T  21.4T  24.6T        -         -     1%    46%   1.00  ONLINE
stg-caw1-db-silver-ssd    7.68T  3.1T  4.58T        -         -     1%    40%   1.00  ONLINE

Six pools, six lines. You see every environment, every region, every role, every SLA tier, every media class — without opening any documentation. The database team's gold NVMe pool in CA-West-1 is at 82% capacity. You know to order disks. No Grafana required to see that.
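
The same check can be scripted. This sketch filters the tab-separated output of zpool list -H -o name,cap for pools at or above a threshold; pools_over is a hypothetical helper name and the 80% threshold is only an example.

```shell
#!/bin/bash
# pools_over PCT: read "name<TAB>cap%" lines on stdin, print pools at or above PCT.
# Feed it from: zpool list -H -o name,cap
pools_over() {
  awk -F'\t' -v pct="$1" '{
    cap = $2; sub(/%$/, "", cap)          # strip the trailing percent sign
    if (cap + 0 >= pct + 0) print $1 "\t" $2
  }'
}

# Sample input taken from the listing above
printf '%s\t%s\n' \
  dr-euw1-bak-silver-hdd 56% \
  prd-caw1-db-gold-nvme 82% \
  prd-caw1-stor-silver-hdd 78% \
  prd-use1-db-gold-nvme 46% | pools_over 80
```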

Pool names are effectively permanent: renaming one means exporting the pool and importing it under the new name, with downtime and a hunt for every reference to the old one. Get the naming right the first time. The convention above encodes environment, location, purpose, SLA, and storage class into the name. A new operator reads prd-caw1-db-gold-nvme and knows exactly what it is without any documentation. A pool named tank or data or storage1 tells you nothing. When that pool is at 82% capacity and you need to explain urgency to a manager at 11 PM, prd-caw1-db-gold-nvme carries the urgency itself. tank does not.

3. VDEV Naming and Disk Identification

The most dangerous thing you can do with ZFS is create a pool using raw device names like /dev/sda, /dev/sdb, /dev/nvme0n1. Device names are ephemeral. They are assigned at boot time by the kernel based on discovery order. Add a USB drive to a server and everything shifts. Replace a failing drive and the replacement gets a different device name. The pool continues to function, but your mental model of which physical disk is which is now wrong.

Use /dev/disk/by-id/ for persistent identification

The /dev/disk/by-id/ path is stable across reboots. It is derived from the device's serial number, which is burned into the hardware at manufacture:

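# Never do this: sda, sdb, ... can change between reboots
zpool create tank sda sdb sdc sdd

# Always do this: by-id names follow the drive across reboots and re-cabling
# (a replacement drive has a new serial and therefore a new by-id name).
# dRAID spec is draid<parity>:<data>d:<children>c:<spares>s, where children is
# the total drive count: 10 data + 2 parity across 12 drives, no distributed spare.
zpool create prd-caw1-db-gold-nvme \
  draid2:10d:12c:0s \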
  /dev/disk/by-id/nvme-Samsung_PM9A3_S6ZUNX0R123456A \
  /dev/disk/by-id/nvme-Samsung_PM9A3_S6ZUNX0R789012B \
  /dev/disk/by-id/nvme-Samsung_PM9A3_S6ZUNX0R345678C \
  ... (all 12 drives)

# Verify what's in the pool after creation
zpool status -v prd-caw1-db-gold-nvme
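
When collecting by-id names for a zpool create command, the directory listing also contains partition links and alias names for the same device. The filter below is a sketch that keeps one readable name per disk; whole_disks is a hypothetical helper, and the alias prefixes it drops (wwn-, nvme-eui., nvme-nvme.) are the ones commonly created by udev, so check them against your own /dev/disk/by-id/ before relying on it.

```shell
#!/bin/bash
# whole_disks: read /dev/disk/by-id names on stdin, drop partition links
# ("-partN" suffix) and the WWN/EUI aliases, keep the model_serial style names.
whole_disks() {
  grep -Ev -- '-part[0-9]+$' | grep -Ev '^(wwn-|nvme-eui\.|nvme-nvme\.)'
}

# Typical use: ls /dev/disk/by-id/ | whole_disks
printf '%s\n' \
  nvme-Samsung_PM9A3_S6ZUNX0R123456A \
  nvme-Samsung_PM9A3_S6ZUNX0R123456A-part1 \
  nvme-eui.0025384a00000001 | whole_disks
```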

VDEV labels: use slot identifiers

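zpool status shows each vdev under the basename of the device path it was created with, so a udev symlink is enough to give a vdev a friendly name. Map physical enclosure slots to names that match your label format, so that zpool status output reads in terms of slots, the same identifiers that are on the physical labels. (OpenZFS also ships a vdev_id helper, configured in /etc/zfs/vdev_id.conf, that generates such aliases under /dev/disk/by-vdev/.)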

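# /etc/udev/rules.d/99-zfs-slots.rules
# Maps NVMe PCI paths (ID_PATH) to slot names
# Find the ID_PATH values by running: ls -la /dev/disk/by-path/ | grep nvme
# Each slot needs two rules, one for the whole disk and one for its partitions,
# because ZFS opens slotNN-part1 after it partitions a whole-disk vdev

KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
  ENV{ID_PATH}=="pci-0000:01:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot01"
KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
  ENV{ID_PATH}=="pci-0000:01:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot01-part%n"

KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
  ENV{ID_PATH}=="pci-0000:02:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot02"
KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
  ENV{ID_PATH}=="pci-0000:02:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot02-part%n"

KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
  ENV{ID_PATH}=="pci-0000:03:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot03"
KERNEL=="nvme*", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
  ENV{ID_PATH}=="pci-0000:03:00.0-nvme-1", \
  SYMLINK+="disk/by-slot/slot03-part%n"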

# ... continue for all slots

# After creating rules, reload udev
udevadm control --reload-rules
udevadm trigger

# Create pool using slot names — now zpool status shows slot01, slot02
zpool create prd-caw1-db-gold-nvme \
  draid2:10d:12c:0s \
  /dev/disk/by-slot/slot01 \
  /dev/disk/by-slot/slot02 \
  /dev/disk/by-slot/slot03 \
  ...

With slot-based udev rules in place, zpool status reports:

$ zpool status prd-caw1-db-gold-nvme
  pool: prd-caw1-db-gold-nvme
 state: DEGRADED
config:
        NAME                    STATE     READ WRITE CKSUM
        prd-caw1-db-gold-nvme   DEGRADED     0     0     0
          draid2:10d:12c:0s-0   DEGRADED     0     0     0
            slot01              ONLINE       0     0     0
            slot02              ONLINE       0     0     0
            slot03              FAULTED      5     0     0   too many errors
            slot04              ONLINE       0     0     0
            ...

slot03 is the failed drive. The physical label on slot 3 in chassis CH01 tells you everything you need to know: the RMA URL, the reorder URL, the pool it belongs to, the warranty status. You do not need to open any tool other than zpool status.
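
Pulling the unhealthy rows out of zpool status can be automated for paging or ticket creation. The sketch below parses the text output with awk; unhealthy_vdevs is a hypothetical helper, and because it parses human-readable output it should be treated as a convenience, not a stable interface.

```shell
#!/bin/bash
# unhealthy_vdevs: read `zpool status` text on stdin and print "pool vdev state"
# for every row of the config table whose state is not healthy.
unhealthy_vdevs() {
  awk '
    $1 == "pool:"                  { pool = $2; in_cfg = 0; next }
    $1 == "NAME" && $2 == "STATE"  { in_cfg = 1; next }   # config table header
    $1 == "errors:"                { in_cfg = 0 }         # end of the table
    in_cfg && NF >= 2 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
      print pool, $1, $2
    }
  '
}

# Typical use: zpool status | unhealthy_vdevs
```

The pool and its top-level vdev are reported alongside the failed leaf, so the last line of output for a pool is usually the slot to walk to.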

Mapping physical slots: 24-disk JBOD example

#!/bin/bash
# map-jbod-slots.sh — print slot to device mapping for a JBOD enclosure
# Requires sg3-utils for enclosure management

# List all SES (SCSI Enclosure Services) devices
for enc in /sys/class/enclosure/*/; do
  encdev=$(basename "$enc")
  echo "Enclosure: $encdev"

  # List slots and their current disk
  for slot in "$enc"*/; do
    slotnum=$(basename "$slot")
    # Get the disk in this slot
    if [[ -L "$slot/device" ]]; then
      diskdev=$(ls "$slot/device/block/" 2>/dev/null | head -1)
      if [[ -n "$diskdev" ]]; then
        serial=$(cat /sys/block/"$diskdev"/device/serial 2>/dev/null || \
                 smartctl -i /dev/"$diskdev" | awk '/Serial/{print $NF}')
        echo "  slot$slotnum -> /dev/$diskdev  serial: $serial"
      else
        echo "  slot$slotnum -> empty"
      fi
    fi
  done
done

A pool created with /dev/sda through /dev/sdx is a ticking time bomb. Plug in a USB drive during a maintenance window and /dev/sda may shift. Replace a drive under pressure at 3 AM and the mapping changes. The pool continues to work (OpenZFS tracks members by its own internal vdev GUIDs), but the moment you need to correlate a fault to a physical drive, you are doing detective work instead of executing a procedure. Use /dev/disk/by-id/ always. Add udev slot rules for large JBODs. The 20 minutes spent writing udev rules saves hours of correlation work the first time a drive fails.

4. ZFS Custom Properties — Infrastructure Tags

OpenZFS allows you to set arbitrary key-value metadata on any pool, dataset, or volume. These are called user properties or custom properties. They are stored inside ZFS itself: no separate database, no API, no sync service. They survive snapshots and clones, and they travel with replication streams sent with zfs send -p or -R. A dataset tagged in Vancouver arrives at your DR site in Frankfurt with every tag intact.

The namespace convention

Custom properties must be namespaced with a colon. The convention used throughout kldload is com.kldload:{key}. This prevents collisions with OpenZFS built-in properties and with other tools that set custom properties:

# Setting a custom property
zfs set com.kldload:region=CA-WEST-1 prd-caw1-db-gold-nvme

# Setting multiple properties at once
zfs set \
  com.kldload:region=CA-WEST-1 \
  com.kldload:tier=production \
  com.kldload:app=postgres \
  com.kldload:owner=team-database \
  com.kldload:sla=gold \
  com.kldload:backup-policy=hourly \
  com.kldload:dr-target=dr-euw1-bak-silver-hdd \
  com.kldload:cost-center=eng-001 \
  prd-caw1-db-gold-nvme/postgres/main

# Reading a single property
zfs get com.kldload:sla prd-caw1-db-gold-nvme/postgres/main

# Reading all custom properties on a dataset
zfs get -H -o property,value,source all prd-caw1-db-gold-nvme/postgres/main | grep '^com\.kldload:'

# Reading a specific property recursively across a pool
zfs get -r -s local com.kldload:tier prd-caw1-db-gold-nvme
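
A tag name that breaks the naming rules is rejected by zfs set, so it is worth checking names in whatever tool generates them. The sketch below encodes the rules as summarized from zfsprops(7): the name must contain a colon, may use lowercase letters, digits, colon, dash, period, and underscore, and is limited to 256 characters. Treat the exact character set as an assumption to confirm against your OpenZFS version; valid_user_prop is a hypothetical helper.

```shell
#!/bin/bash
# valid_user_prop NAME: succeed if NAME looks usable as a ZFS user property name.
# Character set and length limit are taken from zfsprops(7) as summarized above.
valid_user_prop() {
  local LC_ALL=C            # make [a-z] mean ASCII lowercase only
  local name="$1"
  (( ${#name} <= 256 ))            || return 1
  [[ "$name" == *:* ]]             || return 1   # the namespace colon is mandatory
  [[ "$name" =~ ^[a-z0-9:._-]+$ ]] || return 1
}

valid_user_prop com.kldload:backup-policy && echo "ok: com.kldload:backup-policy"
valid_user_prop backup-policy || echo "rejected: backup-policy (no namespace)"
```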

Standard tag library

com.kldload:region

Geographic zone: CA-WEST-1, US-EAST-1, EU-WEST-1. Matches WireGuard mesh region codes and pool naming convention. Drives replication topology.

com.kldload:tier

Environment tier: production, staging, development, testing, dr. Drives alert sensitivity and change control gates.

com.kldload:app

Application name: postgres, nginx, redis, kafka, prometheus. Drives workload-specific tuning and monitoring dashboards.

com.kldload:owner

Team responsible: team-database, team-platform, team-security. Drives quota allocation and alert routing — incidents page the owner.

com.kldload:sla

SLA class: gold, silver, bronze. Drives scrub frequency, snapshot retention, and on-call urgency. Gold pages primary and secondary simultaneously.

com.kldload:backup-policy

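Snapshot schedule: 15min, hourly, daily, weekly, none. This tag IS the backup configuration. Sanoid does not read user properties on its own, so a wrapper script selects datasets by this tag (section 5 shows the pattern). There is no per-dataset config file to maintain.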

com.kldload:dr-target

Replication destination: dr-euw1-bak-silver-hdd, dr-aps1-db-gold-nvme, none. The replication script in section 5 reads this tag and passes source and destination to syncoid. The tag IS the replication topology.

com.kldload:cost-center

Cost center code: eng-001, ops-002, fin-003. Drives capacity chargebacks. Total used space per cost center from a single query.
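
A tag library only works if every dataset actually carries the mandatory tags. The sketch below reports the gaps. It relies on zfs get printing "-" in the value column for an unset property; missing_tags is a hypothetical helper, and which tags count as required is your own policy decision.

```shell
#!/bin/bash
# missing_tags: read "name<TAB>property<TAB>value" lines (the output of
# zfs get -H -o name,property,value) and print "dataset property" for each unset tag.
missing_tags() {
  awk -F'\t' '$3 == "-" { print $1, $2 }'
}

# Typical use (REQUIRED is a hypothetical list of mandatory tags):
#   REQUIRED=com.kldload:owner,com.kldload:sla,com.kldload:backup-policy
#   zfs get -r -H -t filesystem,volume -o name,property,value "$REQUIRED" POOL | missing_tags
printf 'tank/a\tcom.kldload:owner\tteam-database\ntank/b\tcom.kldload:owner\t-\n' | missing_tags
```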

Properties survive replication

This is what makes ZFS tagging more powerful than an external tag system. When you replicate a dataset with zfs send -p (or -R) into zfs receive, its custom properties travel in the stream; a plain zfs send without those flags leaves them behind:

# Replicate with all properties intact (syncoid takes zfs send flags without
# the leading dash, so "p" becomes zfs send -p)
syncoid --sendoptions="p" \
  prd-caw1-db-gold-nvme/postgres/main \
  dr-euw1-bak-silver-hdd/postgres/main

# On the DR host, verify tags arrived
zfs get com.kldload:dr-target dr-euw1-bak-silver-hdd/postgres/main
NAME                                       PROPERTY               VALUE                     SOURCE
dr-euw1-bak-silver-hdd/postgres/main  com.kldload:dr-target  dr-euw1-bak-silver-hdd    received

The SOURCE column shows received — the tag was set on the source and transmitted via zfs send. No sync job, no separate tagging step, no tag drift between production and DR.

ZFS custom properties are the equivalent of AWS resource tags, Azure tags, or GCP labels, except that they live inside the filesystem rather than behind a separate API. They survive snapshots and clones, and they replicate with the data when properties are included in the send stream. You tag a dataset, replicate it to DR, and the DR copy has the same tags. No sync service, no tag drift, no API to maintain. ZFS gives you tag queries with a grep. The model is also a better fit for disaster recovery: when you restore from backup, the tags come back too. You do not have to re-tag a restored dataset and hope you remembered everything.

5. Tag-Based Operations — The Real Power

Tags without automation are bureaucracy. Tags with automation are policy. Every tag applied to a dataset becomes a selector for every automated operation in your infrastructure. Adding a tag to a new dataset automatically enrolls it in backup, replication, monitoring, and quota enforcement — with no configuration file to edit.

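Replicate by tag: everything with a dr-target to DR

#!/bin/bash
# replicate-by-tag.sh — replicate all datasets tagged for a given dr-target
# Usage: replicate-by-tag.sh prd-caw1-db-gold-nvme dr-host.euw1.internal

POOL="$1"
DR_HOST="$2"

# Find all datasets with a dr-target set. An unset property prints "-" in the
# value column, so filter on that column rather than on the start of the line.
# -t keeps snapshots, which inherit the tag, out of the listing.
zfs get -r -H -t filesystem,volume -o name,value com.kldload:dr-target "$POOL" | \
  awk -F'\t' '$2 != "-" && $2 != "none"' | \
  while IFS=$'\t' read -r dataset target; do
    # Children inherit the parent's dr-target, so build the destination from
    # the target pool plus the dataset's own path to keep them from colliding
    DR_POOL="${target%%/*}"
    DR_PATH="${dataset#*/}"
    echo "Replicating $dataset -> $DR_HOST:$DR_POOL/$DR_PATH"
    syncoid \
      --sendoptions="p" \
      --no-sync-snap \
      "$dataset" \
      "${DR_HOST}:${DR_POOL}/${DR_PATH}"
  done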

Capacity planning by tag: storage usage per team

#!/bin/bash
# capacity-by-owner.sh — total used space per cost-center tag
# Output: cost-center, used space (TiB), dataset count

echo "COST-CENTER       USED     DATASETS"
echo "----------------  -------  --------"

for pool in $(zpool list -H -o name); do
  # -p prints exact byte counts. usedbydataset excludes children and snapshots,
  # so nested datasets are not counted twice. Untagged datasets print "-".
  zfs list -r -H -p -t filesystem,volume \
    -o com.kldload:cost-center,usedbydataset "$pool"
done | awk -F'\t' '
$1 != "-" {
  count[$1] += 1
  used[$1]  += $2
}
END {
  for (c in count) printf "%-16s  %6.2fT  %d\n", c, used[c] / 2^40, count[c]
}
' | sort

Snapshot by tag: gold SLA datasets every 15 minutes

#!/bin/bash
# snapshot-by-sla.sh — snapshot all datasets matching a given SLA tag
# Run from cron: */15 * * * * /usr/local/bin/snapshot-by-sla.sh gold

SLA="${1:-gold}"
SNAP_NAME="$(date +%Y%m%d-%H%M)"

for pool in $(zpool list -H -o name); do
  # -t keeps snapshots out of the listing; they inherit the tag from their dataset
  zfs get -r -H -t filesystem,volume -o name,value com.kldload:sla "$pool" | \
    awk -F'\t' -v sla="$SLA" '$2 == sla {print $1}' | \
    while read -r dataset; do
      zfs snapshot "${dataset}@auto-${SNAP_NAME}"
      echo "Snapped: ${dataset}@auto-${SNAP_NAME}"
    done
done

Quota enforcement by tag

#!/bin/bash
# apply-quotas.sh — apply quotas from a policy file keyed by owner tag
# Policy file format: owner  quota
# Example: team-database  2T

POLICY_FILE="/etc/kldload/quota-policy.conf"

for pool in $(zpool list -H -o name); do
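  # quota applies to filesystems only; an unset owner prints "-" in the value column
  zfs get -r -H -t filesystem -o name,value com.kldload:owner "$pool" | \
    awk -F'\t' '$2 != "-"' | \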
    while IFS=$'\t' read -r dataset owner; do
      quota=$(awk -v o="$owner" '$1 == o {print $2}' "$POLICY_FILE")
      if [[ -n "$quota" ]]; then
        zfs set quota="$quota" "$dataset"
        echo "Set quota $quota on $dataset (owner: $owner)"
      fi
    done
done

The inventory report: entire infrastructure from ZFS properties

#!/bin/bash
# inventory-report.sh — JSON inventory of all tagged datasets across all pools
# Output: a JSON array with one object per tagged dataset, suitable for jq/CMDB import
# Requires: jq
# NOTE: the tail of this listing was lost; the object layout below
# (name, used, avail, refer, tags) is a reconstruction.

for pool in $(zpool list -H -o name); do
  zfs list -r -H -t filesystem,volume -o name,used,avail,refer "$pool" | \
  while IFS=$'\t' read -r name used avail refer; do
    # Gather all com.kldload: properties for this dataset as one JSON object.
    # jq does the quoting, so values with quotes or backslashes stay valid JSON.
    tags=$(zfs get -H -o property,value all "$name" 2>/dev/null | \
           awk -F'\t' '$1 ~ /^com\.kldload:/' | \
           jq -Rnc '[inputs | split("\t")
                     | {(.[0] | sub("^com\\.kldload:"; "")): .[1]}] | add // {}')
    [[ "$tags" == "{}" ]] && continue  # skip untagged datasets

    jq -nc --arg name "$name" --arg used "$used" --arg avail "$avail" \
      --arg refer "$refer" --argjson tags "$tags" \
      '{name: $name, used: $used, avail: $avail, refer: $refer, tags: $tags}'
  done
done | jq -s '.'   # collect the per-dataset objects into a single array

This is where labeling stops being bureaucracy and starts being automation. A cron job that runs zfs get -r com.kldload:backup-policy rpool | grep hourly | awk '{print $1}' | xargs -I{} syncoid {} dr-host:{} replicates every dataset tagged for hourly backup. No configuration file. No list to maintain. Add the tag to a new dataset, it is automatically included in replication. Remove the tag, it is excluded. The tags ARE the policy. Every operational procedure in this section is triggered by a tag value. The tag is the single source of truth. There is no other place where "this dataset gets hourly backups" is recorded.

6. Dataset Hierarchy Conventions

OpenZFS datasets are cheap to create — there is no pre-allocation, no minimum size, no formatting step. Create as many as you need. The discipline is not minimizing the number of datasets — it is designing a hierarchy where properties, quotas, and snapshots are set at the right level so children inherit correctly.

The canonical hierarchy

# General pattern: pool / category / application / instance
prd-caw1-db-gold-nvme/
  postgres/           # category: postgres databases
    main/             # instance: primary database cluster
    replica/          # instance: replica cluster
    analytics/        # instance: analytics replica (read-heavy tuning)
  redis/              # category: redis instances
    cache/            # instance: application cache
    session/          # instance: session store
  mysql/              # category: mysql databases
    legacy/           # instance: legacy application

prd-caw1-vm-gold-nvme/
  vms/                # category: virtual machine disks
    web-1/            # instance: web server VM
    web-2/            # instance: web server VM
    app-1/            # instance: application server VM
  images/             # category: base OS images
    centos-9/         # instance: CentOS 9 base image
    debian-13/        # instance: Debian 13 base image

prd-caw1-stor-silver-hdd/
  media/              # category: media files
    raw/              # instance: raw ingest
    processed/        # instance: processed output
  backups/            # category: backup data
    postgres/         # instance: database backups
    config/           # instance: configuration backups
  logs/               # category: log archives
    nginx/            # instance: nginx logs
    app/              # instance: application logs

# Home NAS hierarchy
tank/
  home/               # category: home directories
    alice/            # instance: user alice
    bob/              # instance: user bob
  media/              # category: media library
    movies/           # instance: movies
    tv/               # instance: TV series
    music/            # instance: music
  downloads/          # category: download staging

Inheritance: set once, inherit everywhere

Set properties at the highest appropriate level. Children inherit and you can override at any lower level. This is how you avoid setting the same property 50 times:

# Set region and tier on the pool — all datasets inherit
zfs set com.kldload:region=CA-WEST-1 prd-caw1-db-gold-nvme
zfs set com.kldload:tier=production prd-caw1-db-gold-nvme
zfs set com.kldload:sla=gold prd-caw1-db-gold-nvme

# Set database-specific tags on the postgres subtree — all postgres datasets inherit
zfs set com.kldload:app=postgres prd-caw1-db-gold-nvme/postgres
zfs set com.kldload:owner=team-database prd-caw1-db-gold-nvme/postgres
zfs set com.kldload:backup-policy=hourly prd-caw1-db-gold-nvme/postgres
zfs set com.kldload:dr-target=dr-euw1-bak-silver-hdd/postgres prd-caw1-db-gold-nvme/postgres

# Override for a specific instance that needs different settings
zfs set com.kldload:backup-policy=daily prd-caw1-db-gold-nvme/postgres/analytics
zfs set com.kldload:dr-target=none prd-caw1-db-gold-nvme/postgres/analytics

# The analytics dataset has all inherited tags (region, tier, sla, owner)
# but its own backup-policy and dr-target
zfs get -r -s local,inherited com.kldload:backup-policy prd-caw1-db-gold-nvme/postgres
NAME                                      PROPERTY                   VALUE   SOURCE
prd-caw1-db-gold-nvme/postgres            com.kldload:backup-policy  hourly  local
prd-caw1-db-gold-nvme/postgres/main       com.kldload:backup-policy  hourly  inherited from prd-caw1-db-gold-nvme/postgres
prd-caw1-db-gold-nvme/postgres/replica    com.kldload:backup-policy  hourly  inherited from prd-caw1-db-gold-nvme/postgres
prd-caw1-db-gold-nvme/postgres/analytics  com.kldload:backup-policy  daily   local
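
Overrides are easy to set and easy to forget, so it helps to list them. The sketch below reads the four-column output of zfs get -H -o name,property,value,source and prints every tag that is set locally on a descendant of a given root, which is exactly where inheritance has been overridden. local_overrides is a hypothetical helper name.

```shell
#!/bin/bash
# local_overrides ROOT: read "name<TAB>property<TAB>value<TAB>source" lines and print
# the rows set locally on a descendant of ROOT (ROOT itself is not an override).
local_overrides() {
  awk -F'\t' -v root="$1" '
    $4 == "local" && index($1, root "/") == 1 { print $1, $2 "=" $3 }
  '
}

# Typical use:
#   zfs get -r -H -t filesystem,volume -o name,property,value,source \
#     com.kldload:backup-policy,com.kldload:dr-target POOL/postgres | local_overrides POOL/postgres
```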

When to use datasets vs directories

The answer is almost always datasets. The cost of a dataset is a few kilobytes of metadata. The benefit is independent snapshots, independent quotas, independent compression settings, independent replication, and independent tagging. A directory inside a dataset cannot be snapshotted independently. A dataset can. If in doubt, create a dataset.

The exceptions: files that change together and must be captured at the same instant (a database's WAL and data directories, for example) belong either in one dataset or in sibling datasets that are always snapshotted together with zfs snapshot -r on their parent, which is atomic across descendants; and temporary data that should explicitly not be snapshotted (give it its own dataset with com.sun:auto-snapshot=false set).

Workload-specific hierarchies

# KVM host: one dataset per VM, one volume per virtual disk
prd-caw1-vm-gold-nvme/vms/web-1/          # dataset: VM config, logs
prd-caw1-vm-gold-nvme/vms/web-1/disk0     # zvol: primary virtual disk (20G)
prd-caw1-vm-gold-nvme/vms/web-1/disk1     # zvol: data virtual disk (100G)

# Kubernetes cluster: one dataset per namespace
prd-caw1-vm-gold-nvme/k8s/
  default/      # default namespace PVCs
  monitoring/   # Prometheus/Grafana PVCs
  databases/    # Database PVCs

# PostgreSQL: data and WAL in separate datasets (different recordsize);
# snapshot the parent with zfs snapshot -r so both are captured atomically
prd-caw1-db-gold-nvme/postgres/main/
  data/         # recordsize=8k (matches PostgreSQL page size)
  wal/          # recordsize=32k (sequential WAL writes; segments default to 16 MB)
  temp/         # no snapshots, no backup

# NAS: shares at the dataset level, not the directory level
tank/shares/
  engineering/  # share: Engineering team files
  finance/      # share: Finance team files (separate quota, separate encryption key)
  public/       # share: Public read-only content
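
A hierarchy like the ones above is easier to review as a plan than as a series of hand-typed commands. The sketch below prints one zfs create command per line of a small spec and executes nothing; plan_datasets and the spec format (path followed by optional property=value pairs) are illustrative assumptions. The -p flag creates missing parents, and the -o properties are intended for the leaf dataset only.

```shell
#!/bin/bash
# plan_datasets POOL: read "path [prop=value ...]" lines on stdin and print the
# zfs create command for each. Dry run only: review the output, then pipe it to sh.
plan_datasets() {
  local pool="$1" path props cmd p
  while read -r path props; do
    [[ -z "$path" || "$path" == \#* ]] && continue   # skip blanks and comments
    cmd="zfs create -p"
    for p in $props; do cmd+=" -o $p"; done          # intentional word splitting
    echo "$cmd $pool/$path"
  done
}

plan_datasets prd-caw1-db-gold-nvme <<'EOF'
postgres/main/data recordsize=8k
postgres/main/wal  recordsize=32k
postgres/main/temp com.sun:auto-snapshot=false
EOF
```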

7. Fleet Inventory Automation

The inventory is not a spreadsheet. It is not a wiki. It is a query against ZFS properties, SMART data, and pool status — run on demand, always current, always accurate. The following scripts build a complete fleet inventory from live system data.

Complete fleet inventory script

#!/bin/bash
# kldload-inventory — complete fleet inventory from ZFS and SMART data
# Output: JSON to stdout, suitable for CMDB import, Prometheus push, or HTML report

HOSTNAME=$(hostname -f)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# --- Pool inventory ---
pool_inventory() {
  for pool in $(zpool list -H -o name); do
    health=$(zpool list -H -o health "$pool")
    size=$(zpool list -H -o size "$pool")
    alloc=$(zpool list -H -o alloc "$pool")
    free=$(zpool list -H -o free "$pool")
    cap=$(zpool list -H -o cap "$pool")
    frag=$(zpool list -H -o frag "$pool")

    # Gather all custom tags for this pool
    tags=$(zfs get -H -o property,value all "$pool" | \
           grep "^com.kldload:" | \
           awk -F'\t' '{
             key=$1; val=$2
             sub(/^com.kldload:/, "", key)
             printf "      \"%s\": \"%s\",\n", key, val
           }')

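    # Emit one JSON object per pool (field names here are assumptions)
    cat <<EOF
{
  "type": "pool",
  "host": "$HOSTNAME",
  "timestamp": "$TIMESTAMP",
  "name": "$pool",
  "health": "$health",
  "size": "$size",
  "alloc": "$alloc",
  "free": "$free",
  "capacity": "$cap",
  "fragmentation": "$frag",
  "tags": {
${tags%,}
  }
}
EOF
  done
}

# --- Disk inventory ---
disk_inventory() {
  for dev in /dev/disk/by-id/*; do
    [[ "$dev" == *-part* ]] && continue    # skip partition links
    realdev=$(realpath "$dev")
    smart=$(smartctl -j -a "$realdev" 2>/dev/null)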
    [[ -z "$smart" ]] && continue

    model=$(echo "$smart" | jq -r '.model_name // "unknown"')
    serial=$(echo "$smart" | jq -r '.serial_number // "unknown"')
    firmware=$(echo "$smart" | jq -r '.firmware_version // "unknown"')
    cap_bytes=$(echo "$smart" | jq -r '.user_capacity.bytes // 0')
    cap_tb=$(echo "$cap_bytes" | awk '{printf "%.2f", $1/1e12}')
    hours=$(echo "$smart" | jq -r '.power_on_time.hours // 0')
    temp=$(echo "$smart" | jq -r '.temperature.current // 0')
    health=$(echo "$smart" | jq -r '.smart_status.passed // false')

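    # Which pool is this disk in? (-L makes zpool status print resolved device names)
    devname=$(basename "$realdev")
    pool=$(zpool status -L 2>/dev/null | awk -v d="$devname" '
      /^  pool:/ { pool=$2 }
      $0 ~ d { print pool; exit }
    ')

    # Emit one JSON object per disk (field names here are assumptions)
    cat <<EOF
{
  "type": "disk",
  "host": "$HOSTNAME",
  "device": "$realdev",
  "model": "$model",
  "serial": "$serial",
  "firmware": "$firmware",
  "capacity_tb": $cap_tb,
  "power_on_hours": $hours,
  "temperature_c": $temp,
  "smart_passed": $health,
  "pool": "${pool:-none}"
}
EOF
  done
}

pool_inventory
disk_inventory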

Prometheus metrics from ZFS properties

#!/bin/bash
# zfs-property-exporter — expose ZFS custom properties as Prometheus metrics
# Run from a systemd timer, output to node_exporter textfile directory

OUTFILE="/var/lib/node_exporter/textfile_collector/zfs_properties.prom"
TMPFILE="${OUTFILE}.tmp"

{
  echo "# HELP zfs_dataset_used_bytes ZFS dataset used bytes"
  echo "# TYPE zfs_dataset_used_bytes gauge"

  echo "# HELP zfs_dataset_available_bytes ZFS dataset available bytes"
  echo "# TYPE zfs_dataset_available_bytes gauge"

  echo "# HELP zfs_pool_capacity_percent ZFS pool capacity percentage"
  echo "# TYPE zfs_pool_capacity_percent gauge"

  for pool in $(zpool list -H -o name); do
    cap=$(zpool list -H -o cap "$pool" | tr -d '%')
    health=$(zpool list -H -o health "$pool")

    # Gather tags for labels (zfs get prints "-" for an unset property; map that to "unknown")
    region=$(zfs get -H -o value com.kldload:region "$pool" 2>/dev/null | sed 's/^-$/unknown/')
    tier=$(zfs get -H -o value com.kldload:tier "$pool" 2>/dev/null | sed 's/^-$/unknown/')
    sla=$(zfs get -H -o value com.kldload:sla "$pool" 2>/dev/null | sed 's/^-$/unknown/')

    echo "zfs_pool_capacity_percent{pool=\"$pool\",region=\"$region\",tier=\"$tier\",sla=\"$sla\",health=\"$health\"} $cap"

    # Per-dataset metrics (-p prints exact byte counts, so no suffix conversion is needed)
    zfs list -r -H -p -o name,used,avail "$pool" | \
    while IFS=$'\t' read -r name used_bytes avail_bytes; do
      # zfs get prints "-" for an unset property; map that to "unknown"
      owner=$(zfs get -H -o value com.kldload:owner "$name" 2>/dev/null | sed 's/^-$/unknown/')
      app=$(zfs get -H -o value com.kldload:app "$name" 2>/dev/null | sed 's/^-$/unknown/')
      cc=$(zfs get -H -o value com.kldload:cost-center "$name" 2>/dev/null | sed 's/^-$/unknown/')

      labels="dataset=\"$name\",region=\"$region\",tier=\"$tier\",owner=\"$owner\",app=\"$app\",cost_center=\"$cc\""
      echo "zfs_dataset_used_bytes{$labels} $used_bytes"
      echo "zfs_dataset_available_bytes{$labels} $avail_bytes"
    done
  done
} > "$TMPFILE" && mv "$TMPFILE" "$OUTFILE"
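
ZFS user property values are free-form text, while the Prometheus text format requires backslashes and double quotes inside a label value to be escaped. The sketch below shows one way to do that before a value is placed in a label. The helper names `prom_escape` and `prom_label` are hypothetical, and the code assumes single-line property values.

```shell
# prom_escape VALUE: escape backslashes and double quotes for a label value.
# Assumes VALUE contains no newlines.
prom_escape() {
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# prom_label KEY VALUE: print KEY="VALUE" with VALUE escaped.
prom_label() {
  printf '%s="%s"\n' "$1" "$(prom_escape "$2")"
}
```

In the exporter, a label list could then be assembled from calls such as `prom_label owner "$owner"` instead of interpolating the raw value.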

Warranty and lifecycle alerts

#!/bin/bash
# warranty-check.sh — alert on drives approaching warranty expiry
# Read asset metadata from /etc/kldload/assets/*.json
# Run weekly from cron

WARN_DAYS=90   # alert 90 days before expiry
TODAY=$(date +%s)

for asset_file in /etc/kldload/assets/*.json; do
  [[ -f "$asset_file" ]] || continue

  asset_id=$(jq -r '.asset_id' "$asset_file")
  serial=$(jq -r '.serial' "$asset_file")
  warranty=$(jq -r '.lifecycle.warranty_expiry' "$asset_file")
  supplier=$(jq -r '.lifecycle.supplier' "$asset_file")
  rma_url=$(jq -r '.lifecycle.rma_url // "N/A"' "$asset_file")

  [[ "$warranty" == "null" || "$warranty" == "" ]] && continue

  warranty_epoch=$(date -d "$warranty" +%s 2>/dev/null) || continue
  days_left=$(( (warranty_epoch - TODAY) / 86400 ))

  if [[ "$days_left" -lt 0 ]]; then
    echo "EXPIRED  $asset_id  serial=$serial  expired=$((-days_left))d ago  supplier=$supplier  rma=$rma_url"
  elif [[ "$days_left" -lt "$WARN_DAYS" ]]; then
    echo "WARNING  $asset_id  serial=$serial  expires=${days_left}d  supplier=$supplier  rma=$rma_url"
  fi
done

The goal is zero-documentation infrastructure. The labels ARE the documentation. The ZFS properties ARE the inventory. The naming convention IS the architecture diagram. If you need to open a wiki to understand your infrastructure, the labeling is incomplete.

The "walk up to any rack" test: can a new operator, someone who has never seen this infrastructure, identify every disk in 60 seconds? If yes, your labeling is correct. If they have to log into a server, open a spreadsheet, or call someone, something is missing from the label.

8. Disk Lifecycle Management

Disks follow a predictable lifecycle: procurement, installation, monitoring, replacement, and decommission. Every phase has a labeling and inventory step. Missing any step means the next person to touch the drive is missing information.

Phase 1: Procurement

#!/bin/bash
# new-asset.sh — register a new disk asset at procurement time
# Usage: new-asset.sh --region CAW1 --model "Samsung PM9A3" --capacity 3.84T \
#                     --serial S6ZUNX0R123456A --supplier "CDW Canada" \
#                     --warranty 2028-02-12 --rma https://cdw.ca/rma/...

# Parse arguments
while [[ "$#" -gt 0 ]]; do
  case $1 in
    --region)    REGION="$2";    shift ;;
    --model)     MODEL="$2";     shift ;;
    --capacity)  CAPACITY="$2";  shift ;;
    --serial)    SERIAL="$2";    shift ;;
    --supplier)  SUPPLIER="$2";  shift ;;
    --warranty)  WARRANTY="$2";  shift ;;
    --rma)       RMA_URL="$2";   shift ;;
    --reorder)   REORDER="$2";   shift ;;
  esac
  shift
done

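# Generate asset ID: next sequential number for this region+class.
# Archived records are included so a decommissioned ID is never reused.
SEQ=$(ls /etc/kldload/assets/UB-DSK-${REGION}-*.json \
         /etc/kldload/assets/archive/UB-DSK-${REGION}-*.json 2>/dev/null | \
      grep -oP '\d+(?=\.json)' | sort -n | tail -1)
# 10# forces base 10; a zero-padded value such as 00008 would otherwise be parsed as octal
SEQ=$(( 10#${SEQ:-0} + 1 ))
ASSET_ID="UB-DSK-${REGION}-$(printf '%05d' "$SEQ")"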

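# Write asset record. Keys match the fields read by the other lifecycle scripts;
# hardware.model, hardware.capacity, and the "procured" status are assumed names.
mkdir -p /etc/kldload/assets
cat > "/etc/kldload/assets/${ASSET_ID}.json" <<EOF
{
  "asset_id": "${ASSET_ID}",
  "serial": "${SERIAL}",
  "status": "procured",
  "hardware": {
    "model": "${MODEL}",
    "capacity": "${CAPACITY}"
  },
  "lifecycle": {
    "supplier": "${SUPPLIER}",
    "warranty_expiry": "${WARRANTY}",
    "rma_url": "${RMA_URL}",
    "reorder_url": "${REORDER}"
  }
}
EOF

echo "Registered ${ASSET_ID}"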

Phase 2: Installation

#!/bin/bash
# install-asset.sh — record disk installation into a slot
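# Usage: install-asset.sh UB-DSK-CAW1-88322 /dev/disk/by-id/nvme-Samsung_PM9A3_...
# Location is read from the environment: DC, BUILDING, ROW, RACK, CHASSIS, SLOT

ASSET_ID="$1"
DISK="$2"
ASSET_FILE="/etc/kldload/assets/${ASSET_ID}.json"

[[ -f "$ASSET_FILE" ]] || { echo "Asset not found: $ASSET_ID"; exit 1; }
[[ -e "$DISK" ]]       || { echo "Disk not found: $DISK"; exit 1; }
for var in DC BUILDING ROW RACK CHASSIS SLOT; do
  [[ -n "${!var}" ]] || { echo "Location variable not set: $var"; exit 1; }
done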

# Pull SMART data
SMART=$(smartctl -j -a "$(realpath "$DISK")")
SERIAL=$(echo "$SMART" | jq -r '.serial_number')
FIRMWARE=$(echo "$SMART" | jq -r '.firmware_version')

# Baseline SMART attributes (reuse the output captured above)
mkdir -p /etc/kldload/smart-baseline
printf '%s\n' "$SMART" > "/etc/kldload/smart-baseline/${ASSET_ID}.json"

# Update asset record with installation details
jq --arg installed "$(date +%Y-%m-%d)" \
   --arg firmware "$FIRMWARE" \
   --arg serial "$SERIAL" \
   --arg dc "${DC}" \
   --arg building "${BUILDING}" \
   --arg row "${ROW}" \
   --arg rack "${RACK}" \
   --arg chassis "${CHASSIS}" \
   --arg slot "${SLOT}" \
   '.lifecycle.installed = $installed |
    .hardware.firmware = $firmware |
    .serial = $serial |
    .status = "installed" |
    .location.datacenter = $dc |
    .location.building = $building |
    .location.row = $row |
    .location.rack = $rack |
    .location.chassis = $chassis |
    .location.slot = $slot' \
   "$ASSET_FILE" > "${ASSET_FILE}.tmp" && mv "${ASSET_FILE}.tmp" "$ASSET_FILE"

echo "Asset $ASSET_ID installed at ${DC}/${BUILDING}/${ROW}/${RACK}/${CHASSIS}/${SLOT}"
echo "SMART baseline saved to /etc/kldload/smart-baseline/${ASSET_ID}.json"

Phase 3: Monitoring

#!/bin/bash
# smart-check.sh — compare current SMART data against baseline
# Run daily from cron

BASELINE_DIR="/etc/kldload/smart-baseline"
ALERT_THRESHOLD=10   # alert if normalized value drops more than 10 points

for baseline_file in "$BASELINE_DIR"/*.json; do
  [[ -f "$baseline_file" ]] || continue
  asset_id=$(basename "$baseline_file" .json)

  # Find the disk by asset ID
  asset_file="/etc/kldload/assets/${asset_id}.json"
  [[ -f "$asset_file" ]] || continue
  serial=$(jq -r '.serial' "$asset_file")

  # Find current device by serial
  realdev=$(smartctl --scan-open | while read -r dev opts; do
    s=$(smartctl -i "$dev" 2>/dev/null | awk '/Serial/{print $NF}')
    [[ "$s" == "$serial" ]] && echo "$dev" && break
  done)
  [[ -z "$realdev" ]] && continue

  # Check SMART health
  passed=$(smartctl -j -H "$realdev" | jq -r '.smart_status.passed')
  if [[ "$passed" != "true" ]]; then
    echo "SMART FAIL: $asset_id  serial=$serial  device=$realdev"
  fi

  # Check critical ATA attributes: 5 (Reallocated Sectors), 197 (Current Pending
  # Sectors), 198 (Offline Uncorrectable). NVMe drives do not report this table.
  smartctl -j -A "$realdev" | jq -r '
    .ata_smart_attributes.table[]? |
    select(.id == 5 or .id == 197 or .id == 198) |
    "\(.name) raw=\(.raw.value) normalized=\(.value)"
  ' | while read -r line; do
    raw=$(echo "$line" | grep -oP 'raw=\K\d+')
    [[ "${raw:-0}" -gt 0 ]] && echo "WARNING: $asset_id $line"
  done
done
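
smart-check.sh defines a baseline directory and an ALERT_THRESHOLD, but the checks shown only inspect current values. A minimal sketch of the baseline comparison itself follows. It assumes the normalized values have already been extracted into plain "attribute_id normalized_value" lines, and the function name `normalized_drops` is hypothetical.

```shell
# normalized_drops BASELINE CURRENT THRESHOLD
# BASELINE and CURRENT are files of "attribute_id normalized_value" lines.
# Prints one line per attribute whose value fell by more than THRESHOLD points.
normalized_drops() {
  awk -v limit="$3" '
    FILENAME == ARGV[1] { base[$1] = $2; next }   # first file: remember the baseline
    ($1 in base) && (base[$1] - $2 > limit) {
      printf "id=%s baseline=%d current=%d drop=%d\n", $1, base[$1], $2, base[$1] - $2
    }
  ' "$1" "$2"
}
```

For ATA drives, the input lines could be produced from either the baseline file or live smartctl -j -A output with the jq filter `.ata_smart_attributes.table[]? | "\(.id) \(.value)"`.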

Phase 4: Replacement

#!/bin/bash
# replace-disk.sh — guided disk replacement procedure
# Usage: replace-disk.sh UB-DSK-CAW1-88322

ASSET_ID="$1"
ASSET_FILE="/etc/kldload/assets/${ASSET_ID}.json"

[[ -f "$ASSET_FILE" ]] || { echo "Asset not found: $ASSET_ID"; exit 1; }

POOL=$(jq -r '.zfs.pool' "$ASSET_FILE")
VDEV=$(jq -r '.zfs.vdev' "$ASSET_FILE")
SLOT=$(jq -r '.location.slot' "$ASSET_FILE")
RMA_URL=$(jq -r '.lifecycle.rma_url' "$ASSET_FILE")
REORDER=$(jq -r '.lifecycle.reorder_url // "N/A"' "$ASSET_FILE")

echo "=== Disk Replacement Procedure ==="
echo "Asset:    $ASSET_ID"
echo "Pool:     $POOL"
echo "VDEV:     $VDEV"
echo "Slot:     $SLOT"
echo ""
echo "Step 1: Start RMA and order replacement"
echo "  RMA URL:     $RMA_URL"
echo "  Reorder URL: $REORDER"
echo ""
echo "Step 2: Wait for replacement to arrive and get its asset ID"
echo ""
echo "Step 3: Offline the failed vdev (if not already FAULTED)"
echo "  zpool offline $POOL $VDEV"
echo ""
echo "Step 4: Physically remove drive from $SLOT"
echo ""
echo "Step 5: Install replacement in $SLOT"
echo ""
echo "Step 6: Find new disk device path"
echo "  ls -la /dev/disk/by-slot/$SLOT"
echo ""
echo "Step 7: Replace in ZFS"
echo "  zpool replace $POOL /dev/disk/by-slot/$SLOT /dev/disk/by-slot/$SLOT"
echo ""
echo "Step 8: Monitor resilver"
echo "  watch zpool status $POOL"
echo ""
echo "Step 9: Update asset record"
echo "  install-asset.sh NEW-ASSET-ID /dev/disk/by-slot/$SLOT"

Phase 5: Decommission

#!/bin/bash
# decommission-asset.sh — remove a disk from service
# Usage: decommission-asset.sh UB-DSK-CAW1-88322

ASSET_ID="$1"
ASSET_FILE="/etc/kldload/assets/${ASSET_ID}.json"

[[ -f "$ASSET_FILE" ]] || { echo "Asset not found: $ASSET_ID"; exit 1; }

POOL=$(jq -r '.zfs.pool' "$ASSET_FILE")
VDEV=$(jq -r '.zfs.vdev' "$ASSET_FILE")

echo "Decommissioning $ASSET_ID from pool $POOL vdev $VDEV"
echo ""
echo "Manual steps required before this script proceeds:"
echo "  1. Remove from ZFS pool: zpool detach $POOL $VDEV (mirror member) or zpool remove $POOL $VDEV (spare, cache, or log device)"
echo "  2. Physical removal from rack"
echo "  3. Secure erase: nvme format --ses=1 /dev/..."
echo ""
read -p "Confirm decommission of $ASSET_ID? Type 'yes' to proceed: " CONFIRM
[[ "$CONFIRM" != "yes" ]] && { echo "Aborted."; exit 0; }

# Update status in asset record
jq --arg date "$(date +%Y-%m-%d)" \
   '.status = "decommissioned" | .lifecycle.decommissioned = $date' \
   "$ASSET_FILE" > "${ASSET_FILE}.tmp" && mv "${ASSET_FILE}.tmp" "$ASSET_FILE"

# Archive the asset record
mkdir -p /etc/kldload/assets/archive
mv "$ASSET_FILE" "/etc/kldload/assets/archive/${ASSET_ID}.json"

echo "Asset $ASSET_ID archived to /etc/kldload/assets/archive/"

9. Multi-Site Labeling

Multi-site deployments require labeling that is consistent across sites. The same conventions must apply everywhere, and the region codes in labels, pool names, and ZFS properties must all match. If your WireGuard mesh uses ca-west-1, your pool names use caw1, and your ZFS properties use CA-WEST-1, every form derives mechanically from the same region name, so you can correlate across all three layers without a lookup table.

Region code mapping

# Region codes — consistent across all labeling layers
#
# Full name       Pool code   Property value  WireGuard zone
# --------------- ----------- --------------- ---------------
# CA-West-1       caw1        CA-WEST-1       ca-west-1
# US-East-1       use1        US-EAST-1       us-east-1
# US-West-2       usw2        US-WEST-2       us-west-2
# EU-West-1       euw1        EU-WEST-1       eu-west-1
# EU-Central-1    euc1        EU-CENTRAL-1    eu-central-1
# AP-South-1      aps1        AP-SOUTH-1      ap-south-1
# AP-East-1       ape1        AP-EAST-1       ap-east-1
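
Every row in the table follows the same pattern: the pool code is the country code, the first letter of the area, and the site number, all lowercased. The sketch below derives all three forms from the full name. The function name `region_codes` is hypothetical, and the rule is inferred from the rows above rather than stated by the guide, so two areas sharing an initial in the same country would collide.

```shell
# region_codes FULL_NAME  ->  prints "pool_code property_value wireguard_zone"
region_codes() {
  full=$1
  country=${full%%-*}      # CA
  rest=${full#*-}          # West-1
  area=${rest%%-*}         # West
  num=${rest##*-}          # 1
  initial=$(printf '%s\n' "$area" | cut -c1)
  pool=$(printf '%s%s%s\n' "$country" "$initial" "$num" | tr '[:upper:]' '[:lower:]')
  prop=$(printf '%s\n' "$full" | tr '[:lower:]' '[:upper:]')
  zone=$(printf '%s\n' "$full" | tr '[:upper:]' '[:lower:]')
  echo "$pool $prop $zone"
}

region_codes CA-West-1   # prints: caw1 CA-WEST-1 ca-west-1
```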

Three-site example: production + DR + dev

# CA-West-1 — primary production site
prd-caw1-db-gold-nvme          # production database pool
prd-caw1-vm-gold-nvme          # production VM pool
prd-caw1-stor-silver-hdd       # production object storage pool

# EU-West-1 — DR site (replication destination)
dr-euw1-db-gold-nvme           # DR database pool (receives from prd-caw1-db-gold-nvme)
dr-euw1-vm-silver-ssd          # DR VM pool

# US-East-1 — development site
dev-use1-db-bronze-ssd         # dev database pool
dev-use1-vm-bronze-ssd         # dev VM pool

# Replication topology is encoded in ZFS properties — no separate config
# prd-caw1-db-gold-nvme/postgres:  com.kldload:dr-target = dr-euw1-db-gold-nvme/postgres
# prd-caw1-vm-gold-nvme/vms:       com.kldload:dr-target = dr-euw1-vm-silver-ssd/vms

Global inventory across all sites

#!/bin/bash
# global-inventory.sh — collect inventory from all sites via SSH

SITES=(
  "caw1-stor-01.prd.caw1.internal"
  "euw1-stor-01.dr.euw1.internal"
  "use1-stor-01.dev.use1.internal"
)

for host in "${SITES[@]}"; do
  echo "=== $host ==="
  ssh "$host" '
    zpool list -H -o name,health,cap,alloc,free | \
    awk "{printf \"%-30s %-8s %5s %10s %10s\n\", \$1, \$2, \$3, \$4, \$5}"
  '
  echo ""
done

Replication topology from tags

#!/bin/bash
# show-replication-topology.sh — visualize DR targets from ZFS properties

echo "DATASET -> DR TARGET"
echo "--------------------------------------"

for pool in $(zpool list -H -o name); do
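  # Unset properties print "-" in the value column, so filter on field 2;
  # -t skips snapshots, which inherit user properties from their dataset
  zfs get -r -H -t filesystem,volume -o name,value com.kldload:dr-target "$pool" | \
    awk -F'\t' '$2 != "-" && $2 != "none" {printf "%-50s -> %s\n", $1, $2}'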
done

10. Capacity Planning from Labels

Cloud providers give you cost allocation by tag. OpenZFS gives you capacity allocation by tag — same concept, filesystem-native. No billing API, no cost explorer. One query against ZFS properties gives you capacity per team, per region, per application, per SLA tier.

Aggregate capacity by tag

#!/bin/bash
# capacity-by-tag.sh — aggregate used/available space by any ZFS property
# Usage: capacity-by-tag.sh com.kldload:cost-center
#        capacity-by-tag.sh com.kldload:app
#        capacity-by-tag.sh com.kldload:owner

TAG="${1:-com.kldload:cost-center}"

echo "=== Capacity by $TAG ==="
printf "%-20s %10s %10s %10s\n" "VALUE" "USED" "AVAIL" "DATASETS"
echo "------------------------------------------------------------"

for pool in $(zpool list -H -o name); do
  # -t skips snapshots, which inherit user properties from their dataset
  zfs get -r -H -t filesystem,volume -o name,value "$TAG" "$pool" | \
  while IFS=$'\t' read -r dataset tagval; do
    [[ "$tagval" == "-" || "$tagval" == "" ]] && continue
    # -p prints exact byte counts so the values can be summed
    used=$(zfs get -H -p -o value used "$dataset")
    avail=$(zfs get -H -p -o value avail "$dataset")
    printf '%s\t%s\t%s\n' "$tagval" "$used" "$avail"
  done
done | awk -F'\t' '
{
  tag=$1
  count[tag]++
  # "used" includes descendants, so tagging both a parent and its children counts data twice
  used[tag] += $2
  # Datasets in one pool share free space, so report the largest value instead of a sum
  if ($3 + 0 > avail[tag] + 0) avail[tag] = $3 + 0
}
END {
  for (t in count)
    printf "%-20s %9.1fG %9.1fG %10d\n", t, used[t]/2^30, avail[t]/2^30, count[t]
}
' | sort

Growth tracking for procurement planning

#!/bin/bash
# track-growth.sh — record daily used space per tag for growth forecasting
# Run from cron daily: 0 6 * * * /usr/local/bin/track-growth.sh >> /var/log/zfs-growth.log

DATE=$(date +%Y-%m-%d)
TAG="${1:-com.kldload:cost-center}"

for pool in $(zpool list -H -o name); do
  zfs get -r -H -t filesystem,volume -o name,value "$TAG" "$pool" | \
  while IFS=$'\t' read -r dataset tagval; do
    [[ "$tagval" == "-" ]] && continue
    # -p records exact byte counts so later deltas can be computed numerically
    used=$(zfs get -H -p -o value used "$dataset")
    echo "$DATE $pool $dataset $tagval $used"
  done
done

# Analyze growth rate (requires at least 30 days of history)
# awk '/2026-03/ {used[$4]=$5} /2026-02/ {prev[$4]=$5} END {
#   for (t in used) printf "%s: now=%s prev=%s delta=%s\n", t, used[t], prev[t], used[t]-prev[t]
# }' /var/log/zfs-growth.log
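
The commented analysis above keeps only the last dataset seen per tag. A sketch that sums every dataset per tag for two chosen dates and prints the difference is shown below. It assumes the fifth column of the log holds plain byte counts (as `zfs get -p` prints them) and that tag values contain no spaces; the function name `growth_by_tag` is hypothetical.

```shell
# growth_by_tag LOGFILE FROM_DATE TO_DATE
# Log lines: "DATE POOL DATASET TAG USED_BYTES" (USED_BYTES must be a plain integer).
# Prints "TAG DELTA_BYTES" for every tag present on TO_DATE.
growth_by_tag() {
  awk -v from="$2" -v to="$3" '
    $1 == from { old[$4] += $5 }
    $1 == to   { cur[$4] += $5 }
    END {
      for (t in cur)
        printf "%s %.0f\n", t, cur[t] - old[t]
    }
  ' "$1" | sort
}
```

A call such as `growth_by_tag /var/log/zfs-growth.log 2026-02-01 2026-03-01` would then give one month of growth per cost center.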

Grafana dashboard from ZFS property metrics

With the Prometheus exporter from section 7 running, Grafana can display capacity by any tag combination. Key panels:

  • Used bytes by com.kldload:cost-center — chargeback view, one bar per team
  • Pool capacity % by com.kldload:sla — shows gold vs silver vs bronze pools
  • Used bytes by com.kldload:app — which applications consume the most storage
  • Dataset count by com.kldload:owner — how many datasets each team owns
  • Growth rate by com.kldload:region — which sites are growing fastest
  • Days to full by pool (calculated from current capacity % and 30-day growth rate)

# Example Prometheus query: used bytes by cost-center (for Grafana bar chart)
sum by (cost_center) (zfs_dataset_used_bytes{tier="production"})

# Example: capacity % by SLA tier
avg by (sla) (zfs_pool_capacity_percent)

# Example: alert when any gold pool exceeds 85%
zfs_pool_capacity_percent{sla="gold"} > 85

Capacity allocation by tag needs no billing API and no cost explorer: the output of zfs get -r com.kldload:cost-center rpool, fed into an aggregation step, is the whole query. The compression ratio is particularly useful for capacity planning: if your pool has 2.3x compression and you are adding a new workload, you know from the first 24 hours of data how much real space it will consume. The combination of ZFS properties and Prometheus means your capacity dashboard is as current as the last exporter run. It is not a spreadsheet updated quarterly, but a view of every dataset's usage attributed to the right team, region, and application.
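
The "days to full" panel listed above can be approximated with a straight-line extrapolation. The sketch below assumes growth over the window is linear and takes the two capacity percentages as arguments. The function name `days_to_full` and the "never" output for flat or shrinking pools are illustrative choices.

```shell
# days_to_full CAP_NOW CAP_BEFORE [WINDOW_DAYS]
# CAP_NOW and CAP_BEFORE are pool capacity percentages; WINDOW_DAYS defaults to 30.
days_to_full() {
  awk -v now="$1" -v before="$2" -v days="${3:-30}" 'BEGIN {
    delta = now - before                  # percentage points gained over the window
    if (delta <= 0) { print "never"; exit }
    printf "%d\n", int((100 - now) * days / delta)
  }'
}

days_to_full 70 64   # prints: 150
```

A pool that went from 64% to 70% in 30 days gains 0.2 points per day, so the remaining 30 points last about 150 days.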

11. The Labeling Checklist

Use this checklist for every infrastructure change. Every new asset, every new pool, every new dataset, every new site must pass every applicable item before it is considered complete.

New disk

  • Asset ID assigned at procurement, recorded in /etc/kldload/assets/
  • Physical label printed with all fields complete (location, ZFS, hardware, lifecycle)
  • QR code with JSON metadata on the label
  • Installed using /dev/disk/by-id/ path, not raw device name
  • udev slot rule created if applicable
  • Asset record updated with install date, slot, rack, chassis, datacenter
  • SMART baseline captured to /etc/kldload/smart-baseline/
  • ZFS vdev name matches physical slot label
  • ZFS custom properties set on pool and datasets
  • Drive appears in weekly warranty-check output

New pool

  • Name follows {env}-{region}-{role}-{tier}-{media} convention
  • Name reviewed before creation: renaming later requires zpool export followed by zpool import under the new name, which takes the pool offline
  • All standard tags set: region, tier, sla, owner, cost-center
  • Tags set at pool level so datasets inherit
  • Monitoring alerts configured: capacity > 80%, health != ONLINE, scrub errors
  • Scrub schedule configured in systemd timers or cron
  • Appears in zpool list with expected name and health
  • Prometheus metrics visible in Grafana within 5 minutes of creation
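
The first item in this list can be checked mechanically before a pool is created. The sketch below is one possible validator. The allowed values for env, tier, and media are only those that appear in this guide's examples (prd, dr, dev; gold, silver, bronze; nvme, ssd, hdd), so treat the lists as assumptions to extend. The function name `valid_pool_name` is hypothetical.

```shell
# valid_pool_name NAME: returns 0 when NAME matches {env}-{region}-{role}-{tier}-{media}.
# The value lists are assumptions drawn from the examples in this guide.
valid_pool_name() {
  printf '%s\n' "$1" | grep -Eq \
    '^(prd|dr|dev)-[a-z]{3}[0-9]-[a-z0-9]+-(gold|silver|bronze)-(nvme|ssd|hdd)$'
}

valid_pool_name prd-caw1-db-gold-nvme && echo "ok"   # prints: ok
```

Running the check in the provisioning script ahead of zpool create keeps an expedient name like tank from reaching production.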

New dataset

  • Created in the correct hierarchy level (pool/category/application/instance)
  • Inherits appropriate tags from parent, or has explicit tags set
  • com.kldload:backup-policy set (explicitly or inherited)
  • com.kldload:dr-target set if replication is required
  • com.kldload:owner set for quota and alert routing
  • Workload-specific properties set: recordsize, compression, sync, atime
  • Quota set if owner has a capacity allocation
  • Appears in inventory script output with correct tags
  • Replication test: verify dataset replicates to DR target correctly

New site

  • Region code assigned, documented in region code mapping table
  • Region code consistent with WireGuard mesh zone name
  • Pool names follow convention with new region code
  • ZFS properties use consistent region value (US-EAST-1 not us-east-1 or USE1)
  • DR target tags on production datasets point to the new site's pools
  • Global inventory script includes new site's hosts
  • Warranty check script has access to new site's asset records
  • Prometheus scrapes include new site's node_exporter endpoints

Monthly audit

  • Every disk in every rack has a readable physical label
  • Every pool has all required custom tags set
  • Every dataset with data has com.kldload:backup-policy and com.kldload:owner
  • No dataset with a dr-target other than none is failing replication
  • No drives within 90 days of warranty expiry without replacement ordered
  • No drives with SMART warnings without investigation open
  • Inventory script output matches physical rack counts
  • Pool capacity % for gold pools all below 80%
  • New operators can pass the "walk up to any rack" test in 60 seconds

The checklist is not optional overhead. It is the mechanism that keeps the labeling system alive over time. Infrastructure accretes. Disks get added under pressure. Datasets get created for quick tests that become permanent. Pools get created in dev environments that get promoted to production. Without a monthly audit, the labeling system degrades: orphaned datasets with no tags, pools named something expedient at 2 AM that violates the convention, physical labels that fell off in a drive swap. The audit catches these before they become operational problems. A labeling system that requires 30 minutes monthly to audit is not bureaucracy; it is the price of always knowing exactly what you have.
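
Part of the monthly audit can be scripted. The sketch below flags datasets whose required tags are unset, relying on zfs get printing "-" for a user property that has no value. The filter reads tab-separated name, property, value lines on stdin, and the function name `missing_tags` is hypothetical.

```shell
# missing_tags: reads "name<TAB>property<TAB>value" lines on stdin and
# prints "name missing property" for every unset ("-") or empty value.
missing_tags() {
  awk -F'\t' '$3 == "-" || $3 == "" { printf "%s missing %s\n", $1, $2 }'
}
```

It could be fed with `zfs get -r -H -t filesystem,volume -o name,property,value com.kldload:owner,com.kldload:backup-policy "$pool" | missing_tags`; empty output means every dataset carries both tags.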