TLS & PKI Masterclass

This guide covers the full lifecycle of certificates in a self-managed infrastructure: public HTTPS via Let's Encrypt, internal PKI with step-ca, mutual TLS between services, database client certificates, Kubernetes cert-manager, and certificate rotation. By the end you will have every connection in your stack encrypted and every certificate renewed automatically — without paying a CA or touching a certificate manually ever again.

The premise: TLS is the encryption layer for everything that is not WireGuard. HTTPS, database connections, API calls, SMTP, gRPC, metrics scrapes — they all need certificates. Public services need public certificates. Internal services need internal certificates. This masterclass teaches you to run your own Certificate Authority and never pay for or manually manage a certificate again.

What this page covers: TLS fundamentals, Let's Encrypt for public services, step-ca for internal PKI, ACME automation, mutual TLS, database certificate configs, Kubernetes cert-manager, certificate rotation strategy, ZFS-backed CA key storage, and a troubleshooting reference — all grounded in the kldload stack.

Prerequisites: a running kldload system. The Kubernetes sections assume a cluster from the Kubernetes on KVM guide. Everything else works on any kldload node.

Most people use Let's Encrypt for public services and skip encryption for internal services. That means your database traffic, your API calls between microservices, your monitoring scrapes — all unencrypted on the LAN. An attacker who gets on your network sees everything. Internal PKI fixes this. The reason people skip it is that self-signed certificates are painful: browsers complain, curl fails, every service needs special config. A proper internal CA — one that your machines trust — makes internal TLS exactly as seamless as public HTTPS. step-ca is that CA. This guide shows you how to build it.

1. TLS Fundamentals

Before running a CA you need to understand what a certificate actually is and what TLS is actually doing. This section covers the mechanics clearly, without the math.

How the TLS handshake works

When your browser connects to a server over HTTPS, it runs a handshake before any application data flows. The handshake has three jobs: agree on cipher suites, authenticate the server (and optionally the client), and establish a shared symmetric key for the session. In TLS 1.3 — the version everything should be using — this handshake takes one round trip.

Step 1 — ClientHello

The client sends the TLS version it supports, a random nonce, and a list of cipher suites it can use. In TLS 1.3 it also sends key share material for the algorithms it expects the server to choose.

// "Here are the ciphers I speak. Here's my half of the key."

Step 2 — ServerHello + Certificate

The server picks a cipher suite, sends its half of the key exchange, and presents its certificate chain. The client validates the chain against its trust store — the set of CA root certificates baked into the OS or browser.

// "Here's my half of the key. Here's my certificate. Trust me."

Step 3 — Key derivation

Both sides derive the same symmetric session key from the key exchange material. In TLS 1.3 this uses ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) — every session gets a fresh key, so past sessions stay private even if the server's private key later leaks.

// Forward secrecy: compromising tomorrow's key doesn't decrypt today's traffic.

Step 4 — Finished + application data

Both sides send a Finished message (a MAC over the entire handshake transcript), proving they derived the same key and that nothing was tampered with in transit. After that, all application data is encrypted with AES-GCM or ChaCha20-Poly1305.

// Everything from here is encrypted. The handshake took one round trip.
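
The four steps above can be sketched in a few lines of Python. This is a toy model, not TLS: it uses classic finite-field Diffie-Hellman over a deliberately small prime instead of ECDHE, and a single HMAC in place of the TLS 1.3 key schedule. The names `keypair`, `session_key`, and `handshake` are made up for illustration. What it does show is the core idea: each side combines its own ephemeral private value with the peer's public value and arrives at the same session key.

```python
import hashlib
import hmac
import secrets

# Toy finite-field Diffie-Hellman parameters. Illustration only:
# real TLS 1.3 uses vetted groups such as X25519 or P-256.
P = 2**127 - 1   # a Mersenne prime, far too small for real use
G = 3


def keypair():
    """Return an ephemeral (private, public) pair for one handshake."""
    private = secrets.randbelow(P - 3) + 2   # value in [2, P-2]
    return private, pow(G, private, P)


def session_key(my_private, peer_public, client_nonce, server_nonce):
    """Derive a 32-byte key from the shared secret and both nonces."""
    shared = pow(peer_public, my_private, P)
    secret_bytes = shared.to_bytes(16, "big")
    return hmac.new(client_nonce + server_nonce, secret_bytes, hashlib.sha256).digest()


def handshake():
    """Run both sides of the exchange and return (client_key, server_key)."""
    client_nonce = secrets.token_bytes(32)   # sent in ClientHello
    server_nonce = secrets.token_bytes(32)   # sent in ServerHello
    c_priv, c_pub = keypair()                # client key share
    s_priv, s_pub = keypair()                # server key share
    client_key = session_key(c_priv, s_pub, client_nonce, server_nonce)
    server_key = session_key(s_priv, c_pub, client_nonce, server_nonce)
    return client_key, server_key
```

Because `keypair()` is called fresh for every handshake, two sessions never share a key, which is the property forward secrecy depends on.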

Certificate anatomy

A certificate is a signed data structure. It binds a public key to an identity and is signed by a CA that vouches for that binding. The CA's signature is what makes the certificate trusted — any entity with the corresponding CA root in its trust store can verify the signature and accept the binding.

Subject

The entity the certificate identifies — usually a Common Name (CN) like postgres.internal. The CN is largely legacy; modern TLS validation uses the Subject Alternative Name (SAN) extension instead.

// Subject = the name on the ID card

Subject Alternative Name (SAN)

The list of DNS names, IP addresses, and URIs the certificate is valid for. This is the field clients actually check. A wildcard SAN such as *.example.com covers exactly one label: it is valid for api.example.com, but not for example.com itself or for a.b.example.com.

// SAN = the "also known as" list. This is what browsers check.

Issuer and signature

The CA that signed the certificate. The signature is computed with the CA's private key over a cryptographic hash of the certificate body. Anyone with the CA's public key (from the CA's own certificate) can verify it.

// Issuer = the authority that stamped the ID. The stamp is cryptographically unforgeable.

Validity window

Not Before and Not After timestamps. A certificate is only valid within this window. TLS clients reject certificates outside the window — even by one second. Clock skew between systems is a real operational hazard.

// Like a passport: valid from issue date to expiry. Expired = rejected.

Key type and usage

The public key algorithm (RSA 2048/4096, ECDSA P-256/P-384) and what the key is allowed to do (Key Usage: digital signature, key encipherment; Extended Key Usage: server authentication, client authentication). A CA cert has the CA:TRUE basic constraint.

// Not every key does every job. Server certs authenticate servers. Client certs authenticate clients. CA certs sign other certs.

Serial number and SKI/AKI

Each certificate has a unique serial number within its CA. Subject Key Identifier (SKI) and Authority Key Identifier (AKI) link the certificate to its issuer's public key — these are how chain building finds the right intermediate.

// Serial = the ID card number. AKI = "issued by this office." Chain building follows AKI upward to the root.
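
Two of the fields above drive most day-to-day validation failures: the SAN list and the validity window. The sketch below shows roughly how a client applies them. It is a simplified illustration (DNS names only, no IP or URI SANs, no signature check), and the function names are hypothetical. The wildcard rule it implements is the standard one: a leading `*` stands for exactly one label.

```python
from datetime import datetime, timedelta, timezone


def san_matches(pattern, hostname):
    """Match a hostname against one DNS SAN entry.

    "*.example.com" matches "api.example.com" but not "example.com"
    or "a.b.example.com", because "*" covers a single label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def in_validity_window(not_before, not_after, now=None):
    """True if 'now' falls inside the Not Before / Not After window."""
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after


def cert_accepts(sans, not_before, not_after, hostname, now=None):
    """Combine both checks the way a TLS client does before trusting a name."""
    return in_validity_window(not_before, not_after, now) and any(
        san_matches(san, hostname) for san in sans
    )
```

Passing `now` explicitly makes the clock-skew hazard easy to reproduce: shift it a second past `not_after` and the same certificate is rejected.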

The certificate chain

Trust is hierarchical. A root CA is self-signed — it vouches for itself. Browsers and operating systems ship a curated set of root CA certificates they trust unconditionally. Everything below a root is trusted transitively: if the root is trusted, and the root signed an intermediate, and the intermediate signed a leaf, then the leaf is trusted. This chain structure lets CAs operate without exposing their root key.

Root CA

Self-signed. Ships in OS/browser trust stores. For a public CA like Let's Encrypt, this is the ISRG Root X1 certificate. For your internal CA, this is the root you generate with step-ca. Kept offline when possible.

// Root = the vault. Never used for day-to-day signing. Its compromise is catastrophic.

Intermediate CA

Signed by the root. Used for day-to-day certificate issuance. If an intermediate is compromised it can be revoked without replacing the root — all trust stores just need to distrust that intermediate. step-ca generates one automatically.

// Intermediate = the branch office. Signs certs day-to-day. Replaceable without replacing the root.

Leaf certificate

The actual certificate presented by a server or client. Signed by the intermediate. Has a short validity period (90 days for Let's Encrypt, 24 hours for step-ca defaults). Cannot sign other certificates — CA:FALSE basic constraint.

// Leaf = your passport. Has your name, your photo, signed by an authority. Valid for a limited time.

You do not need to understand the math. You need to understand: a certificate proves identity, a CA is the entity that vouches for that identity, and trust chains mean your browser trusts Let's Encrypt because your OS ships their root cert. For internal PKI, you add your internal CA's root cert to your machines' trust stores, and then every internal certificate that your CA signs is trusted automatically — no exceptions, no browser warnings, no curl -k.
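
Chain building can be pictured as a walk from the leaf upward until a trusted root is reached. The sketch below models only the naming side of that walk (subject, issuer, CA flag); a real validator also verifies every signature, validity window, and key usage along the way. The `Cert` tuple and `build_chain` function are illustrative names, not part of any library.

```python
from collections import namedtuple

# Minimal stand-in for a certificate: who it names, who signed it,
# and whether it is allowed to sign other certificates (CA:TRUE).
Cert = namedtuple("Cert", ["subject", "issuer", "is_ca"])


def build_chain(leaf, intermediates, trust_store, max_depth=5):
    """Return [leaf, ..., root] if a path to a trusted root exists, else None.

    trust_store maps a root's subject name to its Cert.
    """
    by_subject = {c.subject: c for c in intermediates}
    chain = [leaf]
    current = leaf
    for _ in range(max_depth):
        root = trust_store.get(current.issuer)
        if root is not None:
            chain.append(root)
            return chain
        parent = by_subject.get(current.issuer)
        # A parent that is missing or lacks CA:TRUE breaks the chain.
        if parent is None or not parent.is_ca:
            return None
        chain.append(parent)
        current = parent
    return None
```

With an empty `trust_store` the same leaf and intermediate yield `None`, which is the state of a machine that has not yet installed your internal root.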

2. Let's Encrypt for Public Services

Let's Encrypt is a free, automated, publicly trusted CA. It issues 90-day certificates via the ACME protocol (Automated Certificate Management Environment). Every major web framework and server has ACME support. For any service with a public DNS record, Let's Encrypt is the correct answer.

Install certbot on kldload

# CentOS / RHEL / Rocky (certbot and its plugins come from EPEL)
dnf install -y epel-release
dnf install -y certbot python3-certbot-nginx python3-certbot-dns-cloudflare

# Debian / Ubuntu
apt install -y certbot python3-certbot-nginx python3-certbot-dns-cloudflare

HTTP-01 challenge — public web server

The HTTP-01 challenge proves domain ownership by serving a token file at http://yourdomain.com/.well-known/acme-challenge/TOKEN. Let's Encrypt fetches it over HTTP and verifies the token. This requires port 80 to be open and reachable.

# Issue a certificate for a domain with nginx running
certbot --nginx -d example.com -d www.example.com \
  --email admin@example.com --agree-tos --non-interactive

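# Certbot edits the nginx server block that matches these names
# (for example /etc/nginx/sites-enabled/example.conf on Debian-style layouts,
# or a file under /etc/nginx/conf.d/ on RHEL-style layouts) to add the TLS
# configuration, and records the nginx plugin so renewals reuse it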

DNS-01 challenge — wildcard and internal names

The DNS-01 challenge proves domain ownership by adding a TXT record to your DNS zone. It does not require port 80. It is the only challenge type that can issue wildcard certificates (*.example.com). It works for services behind firewalls as long as the DNS provider has an API.

# Cloudflare DNS-01 — create a credentials file first
cat > /etc/letsencrypt/cloudflare.ini <<'EOF'
dns_cloudflare_api_token = YOUR_CLOUDFLARE_API_TOKEN
EOF
chmod 600 /etc/letsencrypt/cloudflare.ini

# Issue wildcard certificate
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  -d "*.example.com" -d "example.com" \
  --email admin@example.com --agree-tos --non-interactive

# Certificate lands at:
# /etc/letsencrypt/live/example.com/fullchain.pem
# /etc/letsencrypt/live/example.com/privkey.pem
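
Both challenge types prove control with the same value, the key authorization defined by the ACME specification (RFC 8555): the challenge token joined to a thumbprint of your ACME account key. HTTP-01 serves that string as the file body; DNS-01 publishes a SHA-256 digest of it in a TXT record under `_acme-challenge`. The sketch below computes both. The function names are illustrative, and certbot does all of this for you; the point is only to show what the CA is checking.

```python
import base64
import hashlib
import json


def b64url(data):
    """Base64url without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk):
    """RFC 7638 thumbprint: SHA-256 over the key's required members."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token, account_jwk):
    """The string the CA expects to find: token + "." + account key thumbprint."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01_path(token):
    """URL path where the key authorization must be served over port 80."""
    return f"/.well-known/acme-challenge/{token}"


def dns01_txt_value(token, account_jwk):
    """Value of the _acme-challenge TXT record for the DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization(token, account_jwk).encode("ascii")).digest()
    return b64url(digest)
```

Because the thumbprint is tied to the account key, a token leaked to a third party is useless to anyone who does not also hold that key.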

Automatic renewal with systemd

certbot installs a systemd timer on most distributions. Check it and verify it fires:

# Check the timer is active
systemctl status certbot.timer
systemctl list-timers certbot.timer

# Test renewal without actually renewing
certbot renew --dry-run

# Force renewal of certificates not yet near expiry (useful for testing;
# against Let's Encrypt production this counts toward rate limits)
certbot renew --force-renewal

# View existing certificates and expiry
certbot certificates

The timer runs twice daily and renews any certificate within 30 days of expiry. After renewal certbot runs the configured deploy hook — for nginx this is nginx -s reload. You can add your own deploy hooks in /etc/letsencrypt/renewal-hooks/deploy/.
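
The renewal decision itself is simple arithmetic on the certificate's expiry date. A minimal sketch of the rule described above, with a hypothetical function name and the 30-day window as the default:

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now=None, renew_before=timedelta(days=30)):
    """True once the certificate has 'renew_before' or less left to live."""
    now = now or datetime.now(timezone.utc)
    return not_after - now <= renew_before


# A 90-day certificate issued on 1 January:
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
day_59 = needs_renewal(expires, now=issued + timedelta(days=59))   # 31 days left
day_60 = needs_renewal(expires, now=issued + timedelta(days=60))   # 30 days left
```

For a 90-day certificate this leaves roughly a month of retries before anything expires, which is why a failed renewal run is rarely an emergency.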

# Example: reload multiple services after renewal
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-services.sh <<'EOF'
#!/bin/bash
systemctl reload nginx
systemctl reload postfix
systemctl reload dovecot
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-services.sh

Nginx HTTPS configuration example

server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name example.com www.example.com;

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    # Modern TLS settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;

    # HSTS — tell browsers to only use HTTPS for 1 year
    add_header Strict-Transport-Security "max-age=31536000" always;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}

Let's Encrypt changed the internet — free, automated TLS for everyone. Before it, a certificate cost $100+/year and required manual renewal. Now every kldload web service gets HTTPS for free, automatically. But it only works for services with public DNS records. Your PostgreSQL server on 10.0.0.5 does not have a public DNS record. Your internal API gateway does not have a public DNS record. Those need an internal CA — which is what the next section builds.

3. step-ca — Your Own Certificate Authority

step-ca is an open-source online CA from Smallstep. It implements the same ACME protocol that Let's Encrypt uses, so any tool that speaks ACME — certbot, acme.sh, Caddy, Traefik — can use your internal CA without modification. It also has its own CLI (step) for issuing certificates directly and a provisioner model that supports ACME, JWK, OAuth, OIDC, x5c, and more.

Install step and step-ca on kldload

# Download the step CLI
curl -Lo /tmp/step.tar.gz \
  https://github.com/smallstep/cli/releases/latest/download/step_linux_amd64.tar.gz
tar -xzf /tmp/step.tar.gz -C /tmp
install -m 0755 /tmp/step_*/bin/step /usr/local/bin/step

# Download step-ca
curl -Lo /tmp/step-ca.tar.gz \
  https://github.com/smallstep/certificates/releases/latest/download/step-ca_linux_amd64.tar.gz
tar -xzf /tmp/step-ca.tar.gz -C /tmp
install -m 0755 /tmp/step-ca_*/bin/step-ca /usr/local/bin/step-ca

# Verify
step version
step-ca version

Initialize the CA

Initialization creates the root CA key and certificate, an intermediate key and certificate signed by the root, and an initial provisioner. Run this once on the machine that will host your CA. Store the root key offline after initialization.

# Create a dedicated user for the CA
useradd --system --create-home --shell /bin/false step

# Initialize the CA as the step user (-H sets HOME so files land in /home/step/.step)
# /etc/kldload/ca-password must already exist and be readable by the step user,
# because the systemd unit that runs the CA reads the same file
sudo -u step -H step ca init \
  --deployment-type standalone \
  --name "kldload Internal CA" \
  --dns ca.internal \
  --dns ca.kldload.local \
  --dns "$(hostname -I | awk '{print $1}')" \
  --address ":9000" \
  --provisioner "admin@example.com" \
  --password-file /etc/kldload/ca-password

# The init creates (~ is /home/step):
# ~/.step/certs/root_ca.crt    — root certificate (distribute to trust stores)
# ~/.step/certs/intermediate_ca.crt
# ~/.step/secrets/root_ca_key  — root private key (move offline)
# ~/.step/secrets/intermediate_ca_key
# ~/.step/config/ca.json       — CA configuration

Add an ACME provisioner

The ACME provisioner lets certbot, acme.sh, and any other ACME client issue certificates from your internal CA without any Smallstep-specific client code.

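# Add an ACME provisioner named "acme" (this edits ca.json directly)
sudo -u step -H step ca provisioner add acme --type ACME

# The --admin-provisioner and --admin-subject flags apply only when remote
# provisioner management is enabled on the CA; a default init does not need them.
# Restart step-ca after changing provisioners if it is already running.

# The ACME directory will be available at:
# https://ca.internal:9000/acme/acme/directory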
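
The directory URL follows the pattern `/acme/<provisioner name>/directory`, so a provisioner named `acme` produces the doubled path shown above. A small sketch, assuming the root certificate has been copied to the machine running it; `fetch_directory` needs a reachable CA, and both function names are illustrative:

```python
import json
import ssl
import urllib.request


def acme_directory_url(ca_url, provisioner):
    """Build the ACME directory URL for a step-ca ACME provisioner."""
    return f"{ca_url.rstrip('/')}/acme/{provisioner}/directory"


def fetch_directory(url, root_ca_path):
    """Fetch the ACME directory, trusting only the given root certificate."""
    ctx = ssl.create_default_context(cafile=root_ca_path)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return json.load(resp)


url = acme_directory_url("https://ca.internal:9000", "acme")
```

An ACME directory is a JSON object listing endpoints such as `newNonce`, `newAccount`, and `newOrder`; if `fetch_directory` returns those keys, ACME clients should be able to talk to the CA.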

Run step-ca as a systemd service

cat > /etc/systemd/system/step-ca.service <<'EOF'
[Unit]
Description=step-ca Certificate Authority
After=network.target
ConditionFileNotEmpty=/home/step/.step/config/ca.json

[Service]
User=step
Group=step
ExecStart=/usr/local/bin/step-ca \
  /home/step/.step/config/ca.json \
  --password-file /etc/kldload/ca-password
Restart=on-failure
RestartSec=5
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/home/step/.step

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now step-ca
systemctl status step-ca

Bootstrap trust on your machines

Bootstrapping installs your CA's root certificate into the system trust store. Run this once on every machine that needs to trust your internal CA.

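# On the CA host: print the root certificate fingerprint
step certificate fingerprint /home/step/.step/certs/root_ca.crt

# On each machine that should trust your internal CA
# (replace ROOT_CA_FINGERPRINT with the value printed above)
step ca bootstrap --ca-url https://ca.internal:9000 \
  --fingerprint ROOT_CA_FINGERPRINT \
  --install

# --install adds the root cert to the system trust store; without it, only the
# step CLI on that machine trusts the CA
# After this, curl https://internal-service.internal just works with no -k flag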

# Verify
step ca health --ca-url https://ca.internal:9000
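
The fingerprint is what stops a machine from bootstrapping against an impostor CA: it is a digest of the root certificate, so it can be compared out of band. As far as I know, `step certificate fingerprint` prints the SHA-256 digest of the DER-encoded certificate in hex by default; the sketch below computes that value with the standard library so you can cross-check a root file without the step CLI. The function name is illustrative.

```python
import hashlib
import ssl


def root_fingerprint(pem_text):
    """SHA-256 of the DER encoding of a single PEM certificate, as lowercase hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


# Usage (path from the init step above):
# with open("/home/step/.step/certs/root_ca.crt") as f:
#     print(root_fingerprint(f.read()))
```

If the output does not match what the CA host printed, stop: the file you were given is not your root.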

Issue certificates from the CLI

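# Issue a certificate for a service
# Note: step-ca caps certificate lifetime at 24h by default. A request for a
# longer lifetime is rejected until you raise maxTLSCertDuration in the
# provisioner's claims (ca.json).
step ca certificate postgres.internal postgres.crt postgres.key \
  --ca-url https://ca.internal:9000 \
  --san postgres.internal \
  --san 10.0.0.20 \
  --not-after 8760h   # 1 year; default is 24h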

# Issue a short-lived cert (recommended)
step ca certificate api.internal api.crt api.key \
  --not-after 24h

# Renew before expiry
step ca renew api.crt api.key --force

# Inspect a certificate
step certificate inspect api.crt

step-ca is what Let's Encrypt would be if it ran on your server. Same ACME protocol, same automation, but for your internal network. Your PostgreSQL server gets a cert. Your WireGuard management API gets a cert. Your Prometheus scrape targets get certs. Your internal nginx reverse proxies get certs. Everything gets a cert, every certificate renews automatically, and none of it costs anything or involves a third party. The CA's root certificate lives on your infrastructure, signed by your key. You are the trust anchor.

4. ACME for Internal Services

Once step-ca is running with an ACME provisioner, any ACME-capable tool can issue internal certificates automatically. This means you can use the same tooling for internal services as you do for public services — certbot, acme.sh, Caddy, Traefik, and cert-manager all speak ACME natively.

certbot against your internal CA

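# Tell certbot to use your internal CA's ACME directory
# Note: --server points at the ACME directory URL
# Note: certbot verifies the CA's own TLS certificate through Python requests,
# so point REQUESTS_CA_BUNDLE at your root certificate. Avoid --no-verify-ssl,
# which turns that verification off.

REQUESTS_CA_BUNDLE=/home/step/.step/certs/root_ca.crt \
certbot certonly \
  --standalone \
  --server https://ca.internal:9000/acme/acme/directory \
  -d api.internal \
  --email admin@example.com \
  --agree-tos \
  --non-interactive

# If certbot's Python environment already trusts your root (for example after
# step ca bootstrap --install, on distributions where requests uses the system
# trust store), no extra CA setting is needed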
certbot certonly \
  --standalone \
  --server https://ca.internal:9000/acme/acme/directory \
  -d api.internal \
  --email admin@example.com \
  --agree-tos \
  --non-interactive

acme.sh against your internal CA

# Install acme.sh
curl https://get.acme.sh | sh

# Register with your internal CA
acme.sh --register-account \
  --server https://ca.internal:9000/acme/acme/directory \
  --email admin@example.com

# Issue certificate
acme.sh --issue \
  --server https://ca.internal:9000/acme/acme/directory \
  -d api.internal \
  --standalone

# Install to /etc/ssl/api.internal/ (create the directory first)
mkdir -p /etc/ssl/api.internal
acme.sh --install-cert -d api.internal \
  --cert-file      /etc/ssl/api.internal/cert.pem \
  --key-file       /etc/ssl/api.internal/key.pem \
  --fullchain-file /etc/ssl/api.internal/fullchain.pem \
  --reloadcmd      "systemctl reload nginx"

Nginx with auto-renewed internal TLS

# /etc/nginx/sites-available/api-internal.conf
server {
    listen 443 ssl;
    server_name api.internal;

    ssl_certificate     /etc/letsencrypt/live/api.internal/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.internal/privkey.pem;

    ssl_protocols TLSv1.3;
    ssl_prefer_server_ciphers off;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header X-Forwarded-Proto https;
    }
}

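# Renewal is handled automatically by certbot.timer
# step-ca's default cert lifetime is 24h, but certbot's default renewal window is
# 30 days before expiry, so a 24h cert counts as due on every timer run.
# For short-lived internal certs, set the window in /etc/letsencrypt/renewal/api.internal.conf:
# renew_before_expiry = 8 hours
# and make sure the timer fires more often than that window; a twice-daily
# timer can miss an 8-hour window entirely.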
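
With 24-hour certificates the relationship between the renewal window and the timer interval matters. In the worst case a check runs just before the window opens, and the next one only fires a full interval later. A rough sketch of that arithmetic, ignoring any randomized timer delay (which only makes things worse); the function names are made up for illustration:

```python
from datetime import timedelta


def worst_case_margin(renew_before, check_interval):
    """Lifetime left at the first check inside the renewal window, worst case."""
    return renew_before - check_interval


def schedule_is_safe(lifetime, renew_before, check_interval):
    """True if some check always lands inside the window before the cert expires."""
    return (
        renew_before < lifetime
        and worst_case_margin(renew_before, check_interval) > timedelta(0)
    )


# 24h cert, 8h window, twice-daily timer: the cert can expire unrenewed.
twice_daily = schedule_is_safe(timedelta(hours=24), timedelta(hours=8), timedelta(hours=12))

# Same cert and window with an hourly timer: at least 7h of margin remains.
hourly = schedule_is_safe(timedelta(hours=24), timedelta(hours=8), timedelta(hours=1))
```

The practical rule is to keep the check interval comfortably smaller than the renewal window, so that one failed run still leaves time for a retry.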

Caddy — zero-config internal HTTPS

Caddy has native ACME support and manages its own certificate store. Point it at your internal CA and it handles everything — no certbot, no cron jobs, no renewal scripts.

# /etc/caddy/Caddyfile
{
    acme_ca https://ca.internal:9000/acme/acme/directory
    acme_ca_root /home/step/.step/certs/root_ca.crt
}

api.internal {
    reverse_proxy localhost:8080
}

metrics.internal {
    reverse_proxy localhost:9090
}

grafana.internal {
    reverse_proxy localhost:3000
}

5. Mutual TLS

Standard TLS authenticates only the server — the client verifies the server's certificate but presents nothing itself. Mutual TLS (mTLS) requires both sides to present a certificate. The server verifies the client's certificate, and the client verifies the server's. This makes mTLS the foundation of zero-trust networking: instead of trusting everything on the LAN, every connection proves identity.

What mTLS adds to the handshake

Alongside its own certificate, the server sends a CertificateRequest message (in TLS 1.3 it precedes the server's Certificate; in TLS 1.2 it follows it). The client responds with its own certificate and a signature proving it holds the matching private key. The server validates the client's certificate against its CA trust store. If validation fails, the connection is rejected before any application-level auth runs.

// Standard TLS: you show me your ID, I'll let you in.
// mTLS: we both show IDs. No ID = no connection.

Use cases

Service-to-service authentication in microservices, zero-trust access to internal APIs, database client authentication, preventing unauthorized clients from connecting even if they know the server address, and replacing password-based authentication entirely.

// mTLS = the client certificate IS the password. Unforgeable, rotatable, auditable.

Cilium mTLS

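Cilium implements mutual authentication transparently for Kubernetes pods. Pods get a SPIFFE identity. The Cilium agents perform the mutual TLS handshake on the pods' behalf, out of band from the application traffic, and the eBPF datapath only forwards flows whose peers have authenticated. No application changes, no sidecar proxies. Any pod-to-pod connection can be covered by an authentication policy. Encryption of the traffic itself comes from Cilium's transparent encryption (WireGuard or IPsec), which is enabled separately.

// The application sees a plain TCP connection. Authentication happens in the Cilium layer below it.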

SPIFFE / SPIRE

SPIFFE (Secure Production Identity Framework For Everyone) is the standard for workload identity in zero-trust architectures. Each workload gets a SPIFFE ID (a URI like spiffe://example.com/service/api). SPIRE is the reference implementation — it issues SVID certificates backed by that identity.

// SPIFFE ID = the workload's passport number. SPIRE = the passport office. mTLS = showing the passport at the door.
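
On the server side, "require a client certificate" comes down to one setting on the TLS context. A minimal sketch with Python's `ssl` module, assuming the certificate paths are supplied by the caller; the function names are illustrative. The second helper shows how a service can read the verified client identity from the dictionary returned by `SSLSocket.getpeercert()`.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side context that rejects clients without a valid cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what turns plain TLS into mutual TLS: the handshake
    # fails unless the client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


def peer_common_name(peercert):
    """Extract the subject CN from the dict returned by SSLSocket.getpeercert()."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None
```

A real server would call `mtls_server_context("server.crt", "server.key", "root_ca.crt")` and wrap its listening socket with the result; the paths here are placeholders.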

Configure mTLS between two services with nginx

# Step 1: Issue client and server certs from your internal CA
step ca certificate server.internal server.crt server.key \
  --san server.internal --not-after 8760h

step ca certificate client-api client.crt client.key \
  --san client-api --not-after 8760h

# Step 2: Configure nginx server to require client cert
server {
    listen 443 ssl;
    server_name server.internal;

    ssl_certificate     /etc/ssl/server.crt;
    ssl_certificate_key /etc/ssl/server.key;

    # Require client certificate signed by your internal CA
    ssl_client_certificate /home/step/.step/certs/root_ca.crt;
    ssl_verify_client      on;
    ssl_verify_depth       2;

    location / {
        # Pass the verified client identity to the upstream
        # ($ssl_client_s_dn is the full subject DN; nginx has no CN-only variable)
        proxy_set_header X-Client-DN $ssl_client_s_dn;
        proxy_pass http://127.0.0.1:8080;
    }
}

# Step 3: Configure the client to present its certificate
# curl
curl --cert client.crt --key client.key \
     https://server.internal/api/v1/health

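# Python requests
import requests
resp = requests.get(
    'https://server.internal/api/v1/health',
    cert=('/etc/ssl/client.crt', '/etc/ssl/client.key'),
    # requests ships its own CA bundle, so name your internal root explicitly
    # (here assumed to be copied to /etc/ssl/ca.crt on the client)
    verify='/etc/ssl/ca.crt',
)

# Go http.Client (needs crypto/tls, log, net/http)
cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
if err != nil {
    log.Fatal(err)
}
tlsConfig := &tls.Config{Certificates: []tls.Certificate{cert}}
transport := &http.Transport{TLSClientConfig: tlsConfig}
client := &http.Client{Transport: transport}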

Cilium mTLS policy

# Enable Cilium mutual authentication (requires Cilium 1.14+)
# In your Helm values:
# authentication:
#   mutual:
#     spire:
#       enabled: true

# Apply an mTLS policy between pods
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: require-mtls-frontend-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    authentication:
      mode: "required"   # enforces mTLS for this flow

mTLS is the foundation of zero-trust networking. Instead of "trust everything on the LAN," every connection proves identity with a certificate. Cilium does this automatically for Kubernetes pods, with no application changes. The Cilium agents run the handshake on the pods' behalf, below the application, and the eBPF datapath drops flows that have not authenticated. Your application just opens a connection. The network layer proves it's talking to the right pod, signed by your CA, before the first byte of application data flows. This is what "zero trust" actually means in practice: cryptographic proof, not network perimeter.

6. Certificate Management for Databases

Database connections are the most commonly unencrypted internal traffic in the average infrastructure. The data is sensitive — credentials, personally identifiable information, business records — and the connection is plaintext on the LAN. Adding TLS is three config lines and a certificate. There is no reason not to do it.

PostgreSQL with TLS

PostgreSQL supports TLS natively. You need a server certificate, a server key, and optionally a CA certificate for verifying client certificates.

# Issue a certificate for PostgreSQL
step ca certificate postgres.internal \
  /var/lib/postgresql/server.crt \
  /var/lib/postgresql/server.key \
  --san postgres.internal \
  --san 10.0.0.20 \
  --not-after 8760h

chown postgres:postgres /var/lib/postgresql/server.crt /var/lib/postgresql/server.key
chmod 600 /var/lib/postgresql/server.key

# Copy the CA root cert
cp /home/step/.step/certs/root_ca.crt /var/lib/postgresql/root.crt
chown postgres:postgres /var/lib/postgresql/root.crt

# postgresql.conf — enable TLS
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file  = '/var/lib/postgresql/server.key'
ssl_ca_file   = '/var/lib/postgresql/root.crt'   # for client cert verification

# Enforce TLS for all connections (optional but recommended)
# In pg_hba.conf, change 'host' lines to 'hostssl':
# hostssl  all  all  0.0.0.0/0  scram-sha-256

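# Require client certificates for superuser (mTLS for DBAs)
# The cert method matches the certificate CN against the database role, so the
# client certificate must carry CN "postgres" (or be mapped in pg_ident.conf)
# hostssl  all  postgres  0.0.0.0/0  cert  clientcert=verify-full

# Reload PostgreSQL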
systemctl reload postgresql

# Test TLS connection
psql "postgresql://user:pass@postgres.internal/db?sslmode=verify-full&sslrootcert=/etc/ssl/ca.crt"

# psql with client certificate
psql "postgresql://postgres@postgres.internal/db?sslmode=verify-full&sslcert=/etc/ssl/client.crt&sslkey=/etc/ssl/client.key&sslrootcert=/etc/ssl/ca.crt"
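
Applications usually build this connection string in code rather than by hand. A small sketch that assembles the same libpq URI from its parts; the function name and argument names are illustrative, and the paths are the ones used in the psql examples above.

```python
from urllib.parse import quote, urlencode


def pg_uri(host, dbname, user, root_cert, client_cert=None, client_key=None):
    """Build a libpq connection URI that enforces full certificate verification."""
    params = {"sslmode": "verify-full", "sslrootcert": root_cert}
    # Add the client certificate pair only when both halves are supplied.
    if client_cert and client_key:
        params["sslcert"] = client_cert
        params["sslkey"] = client_key
    query = urlencode(params, safe="/")
    return f"postgresql://{quote(user)}@{host}/{quote(dbname)}?{query}"


uri = pg_uri(
    "postgres.internal", "db", "postgres",
    root_cert="/etc/ssl/ca.crt",
    client_cert="/etc/ssl/client.crt",
    client_key="/etc/ssl/client.key",
)
```

Hard-coding `sslmode=verify-full` in the helper is deliberate: weaker modes such as `require` encrypt the connection but do not check that the server is the one named in the certificate.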

MySQL / MariaDB with TLS

# Issue certificate
step ca certificate mysql.internal /etc/mysql/server.crt /etc/mysql/server.key \
  --san mysql.internal --san 10.0.0.21 --not-after 8760h
cp /home/step/.step/certs/root_ca.crt /etc/mysql/ca.crt
chown mysql:mysql /etc/mysql/server.crt /etc/mysql/server.key /etc/mysql/ca.crt

# /etc/mysql/conf.d/tls.cnf
[mysqld]
ssl-ca   = /etc/mysql/ca.crt
ssl-cert = /etc/mysql/server.crt
ssl-key  = /etc/mysql/server.key
# Require TLS for all connections:
# require_secure_transport = ON

# Test
mysql --ssl-ca=/etc/ssl/ca.crt \
      --ssl-cert=/etc/ssl/client.crt \
      --ssl-key=/etc/ssl/client.key \
      -h mysql.internal -u user -p

# Verify TLS is in use (a non-empty Ssl_cipher value means the session is encrypted)
mysql> SHOW SESSION STATUS LIKE 'Ssl_cipher';

Redis with TLS

# Redis 6+ has native TLS support
step ca certificate redis.internal /etc/redis/server.crt /etc/redis/server.key \
  --san redis.internal --san 10.0.0.22 --not-after 8760h
cp /home/step/.step/certs/root_ca.crt /etc/redis/ca.crt

# redis.conf — TLS listener
# Disable the plaintext listener (redis.conf does not allow trailing comments,
# so each comment sits on its own line)
port 0
tls-port 6380
tls-cert-file  /etc/redis/server.crt
tls-key-file   /etc/redis/server.key
tls-ca-cert-file /etc/redis/ca.crt
# Require client certificates (mTLS)
tls-auth-clients yes

# Test
redis-cli --tls \
  --cacert /etc/ssl/ca.crt \
  --cert   /etc/ssl/client.crt \
  --key    /etc/ssl/client.key \
  -h redis.internal -p 6380 PING
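
The same check can be made from code without a Redis client library, since the Redis wire protocol (RESP) is plain text inside the TLS session. A minimal sketch using only the standard library; the function names are illustrative, and `ping_over_tls` needs a reachable server and real certificate files to run.

```python
import socket
import ssl


def resp_encode(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def ping_over_tls(host, port, ca_file, cert_file, key_file):
    """Send PING over mutual TLS and return the raw reply bytes."""
    ctx = ssl.create_default_context(cafile=ca_file)   # verify the server
    ctx.load_cert_chain(cert_file, key_file)           # present the client cert
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(resp_encode("PING"))
            return tls.recv(64)
```

If the client certificate is missing or signed by the wrong CA, the failure shows up as an `ssl.SSLError` during the handshake or on the first read, before Redis processes any command.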

Database connections are the most common unencrypted internal traffic. Adding TLS to PostgreSQL is three lines in postgresql.conf and a certificate. Adding client certificate authentication eliminates the risk of credential theft — even if an attacker has the username and password, they cannot connect without the client certificate. Every kldload system should have TLS on every database. It is not optional.

7. TLS for WireGuard Management

WireGuard itself uses Curve25519 for key exchange and ChaCha20-Poly1305 for encryption — it does not use TLS. But everything around WireGuard does: the web UIs that manage it, the API endpoints that update peer configurations, the Prometheus exporters that scrape metrics. Those need TLS, and with your internal CA they are easy to secure.

Secure the kldload web UI

# The kldload web UI (Python websockets server) runs on port 8443 by default
# Issue a certificate for the management host
step ca certificate mgmt.internal \
  /etc/kldload/tls/server.crt \
  /etc/kldload/tls/server.key \
  --san mgmt.internal \
  --san $(hostname -I | awk '{print $1}') \
  --not-after 8760h

# The kldload web UI picks up TLS cert/key from environment:
# TLS_CERT=/etc/kldload/tls/server.crt
# TLS_KEY=/etc/kldload/tls/server.key
# Set in /etc/kldload/kldload.env and restart the webui service

Secure Prometheus and Grafana

# Issue certificates for monitoring services
step ca certificate prometheus.internal \
  /etc/prometheus/tls/server.crt \
  /etc/prometheus/tls/server.key \
  --san prometheus.internal --san 10.0.0.10 --not-after 8760h

step ca certificate grafana.internal \
  /etc/grafana/tls/server.crt \
  /etc/grafana/tls/server.key \
  --san grafana.internal --san 10.0.0.10 --not-after 8760h

# /etc/prometheus/web.yml: TLS for the Prometheus web server
# This is a separate web config file, not part of prometheus.yml.
# Start Prometheus with --web.config.file=/etc/prometheus/web.yml
tls_server_config:
  cert_file: /etc/prometheus/tls/server.crt
  key_file:  /etc/prometheus/tls/server.key

# /etc/prometheus/prometheus.yml: scrape targets over TLS
scrape_configs:
  - job_name: node
    scheme: https
    tls_config:
      ca_file: /home/step/.step/certs/root_ca.crt
    static_configs:
      - targets: ['node1.internal:9100', 'node2.internal:9100']

# /etc/grafana/grafana.ini — TLS for Grafana
[server]
protocol   = https
cert_file  = /etc/grafana/tls/server.crt
cert_key   = /etc/grafana/tls/server.key

TLS for WireGuard exporters (node_exporter)

# node_exporter has supported TLS through a web config file since v1.0
# (recent releases take --web.config.file; older ones used --web.config)
cat > /etc/node_exporter/tls.yml <<'EOF'
tls_server_config:
  cert_file: /etc/node_exporter/server.crt
  key_file:  /etc/node_exporter/server.key
  # Optionally require client certificates for scrape auth:
  # client_ca_file: /home/step/.step/certs/root_ca.crt
  # client_auth_type: RequireAndVerifyClientCert
EOF

# Start with TLS config
node_exporter --web.config.file=/etc/node_exporter/tls.yml

8. Kubernetes Certificate Management

Kubernetes has its own internal PKI (for the API server, etcd, and kubelet), but cert-manager handles everything application-facing: TLS for Ingress resources, certificates for pods, and integration with both internal CAs and Let's Encrypt.

Install cert-manager

# Install cert-manager via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true \
  --set global.leaderElection.namespace=cert-manager

# Verify
kubectl get pods -n cert-manager
kubectl get crds | grep cert-manager.io

ClusterIssuer with step-ca (internal)

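# Optional: store the step-ca provisioner password as a secret.
# The ACME ClusterIssuer below does not reference it; keep it out of the cluster
# unless another issuer (one that authenticates with a JWK provisioner) needs it.
kubectl create secret generic step-ca-provisioner-password \
  --namespace cert-manager \
  --from-literal=password="$(cat /etc/kldload/ca-password)"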

# ClusterIssuer using step-ca's ACME endpoint
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: step-ca-internal
spec:
  acme:
    server: https://ca.internal:9000/acme/acme/directory
    email: admin@example.com
    # caBundle takes the root CA PEM, base64-encoded on a single line:
    #   base64 -w0 /home/step/.step/certs/root_ca.crt
    caBundle: BASE64_ENCODED_ROOT_CA_PEM
    privateKeySecretRef:
      name: step-ca-internal-acme-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
# ClusterIssuer using Let's Encrypt (public services)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - dns01:
        cloudflare:
          email: admin@example.com
          apiTokenSecretRef:
            name: cloudflare-token
            key: api-token
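
The caBundle field of the step-ca ClusterIssuer expects the root certificate as base64-encoded PEM on a single line. A minimal sketch of producing that line, assuming GNU coreutils base64; the function name ca_bundle_line and the four-space indent are illustrative choices, not part of cert-manager or step:

```shell
# ca_bundle_line PEM_FILE
# Prints a "caBundle:" YAML line holding the file as single-line base64.
# The 4-space indent matches the position of caBundle under spec.acme.
ca_bundle_line() {
  local pem_file
  pem_file=$1
  [ -r "$pem_file" ] || { echo "cannot read $pem_file" >&2; return 1; }
  printf '    caBundle: %s\n' "$(base64 -w0 "$pem_file")"
}
```

Running ca_bundle_line /home/step/.step/certs/root_ca.crt prints a line that can be pasted into the manifest in place of the placeholder.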

Automatic TLS for Ingress

# Ingress with automatic TLS from internal CA
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: "step-ca-internal"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - api.internal
    secretName: api-tls
  rules:
  - host: api.internal
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

# cert-manager watches for this Ingress, creates a Certificate (and from it a
# CertificateRequest), fulfills the ACME challenge, and stores the cert in the
# 'api-tls' secret. The Nginx Ingress controller picks up the secret and serves
# TLS automatically.

Certificate resource for pod-level TLS

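# Direct Certificate resource, not tied to an Ingress.
# Note: with an ACME issuer, cert-manager only passes spec.duration to the CA
# when the issuer sets enableDurationFeature: true. Otherwise the lifetime is
# whatever the step-ca provisioner issues (24h by default).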
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgres-client-cert
  namespace: app
spec:
  secretName: postgres-client-tls
  duration: 24h
  renewBefore: 8h
  subject:
    organizations:
      - kldload
  commonName: app-service
  dnsNames:
    - app-service.app.svc.cluster.local
  usages:
    - client auth
  issuerRef:
    name: step-ca-internal
    kind: ClusterIssuer

# Mount the secret in your pod:
# volumes:
# - name: postgres-tls
#   secret:
#     secretName: postgres-client-tls
# volumeMounts:
# - name: postgres-tls
#   mountPath: /etc/ssl/postgres
#   readOnly: true

Certificate rotation in Kubernetes

# cert-manager handles rotation automatically based on duration/renewBefore
# Check certificate status
kubectl get certificates -A
kubectl describe certificate api-tls -n default

# Force renewal (preferred, if the cmctl CLI is installed)
cmctl renew api-tls -n default

# Alternative: delete the secret and let cert-manager re-issue it
kubectl delete secret api-tls -n default

# View certificate expiry
kubectl get certificates -A -o custom-columns=\
'NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[0].status,EXPIRY:.status.notAfter'

9. Certificate Rotation and Lifecycle

The number one cause of TLS outages is expired certificates. The fix is not "remember to renew" — it is short-lived certificates with automated renewal. If renewal breaks you notice in hours, not when production falls over.

Short-lived certificates

step-ca defaults to 24-hour certificate lifetimes. This is intentional. A stolen certificate and key are usable for at most 24 hours, so you can rely on expiry instead of maintaining revocation lists. If a service is compromised, stop renewing its certificate and it becomes useless within a day.

// 24-hour cert = today's newspaper. Even if someone steals it, it's worthless tomorrow.

Renewal before expiry

Renew at 2/3 of the certificate's lifetime. For a 24-hour cert, renew at 16 hours. For a 90-day cert, renew at 60 days. This gives a large buffer for renewal failures — you have 8 hours to fix the renewal system before the cert expires.

// Don't wait for the last minute. Renew with time to spare. Failures should be boring.
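
The 2/3 rule is plain arithmetic on the certificate's validity window. A minimal sketch, assuming the Not Before and Not After timestamps are already available as epoch seconds; the function names renew_at and buffer_seconds are illustrative, not part of step:

```shell
# renew_at NOT_BEFORE NOT_AFTER
# Prints the epoch second at which 2/3 of the validity window has elapsed.
renew_at() {
  local not_before not_after lifetime
  not_before=$1
  not_after=$2
  lifetime=$(( not_after - not_before ))
  echo $(( not_before + lifetime * 2 / 3 ))
}

# buffer_seconds NOT_BEFORE NOT_AFTER
# Prints how much time remains between the renewal point and expiry.
buffer_seconds() {
  local renew
  renew=$(renew_at "$1" "$2")
  echo $(( $2 - renew ))
}

# A 24-hour certificate issued at t=0
renew_at 0 86400          # 57600 seconds = 16 hours
buffer_seconds 0 86400    # 28800 seconds = 8 hours
```

The same functions give 60 days for a 90-day certificate (renew_at 0 7776000 prints 5184000).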

Automatic renewal with step

step ca renew --daemon runs in the background and renews the certificate automatically when it reaches roughly 2/3 of its lifetime. It authenticates to the CA with the current certificate and key over mutual TLS (no ACME challenge is involved), writes the new certificate over the old one, and runs a configured reload command.

// The renewal daemon is the automation layer. Set it and forget it.

What happens when a cert expires

Every TLS client rejects the connection immediately — including your own services. curl fails with "certificate has expired." psql fails. gRPC fails. The service is effectively down. Automatic renewal makes this a non-event. Manual renewal makes it a 2am incident.

// Expired cert = service outage. No negotiation. No grace period. Automate or suffer.

Renewal daemon for a service

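# systemd service to run step renewal daemon for postgres
cat > /etc/systemd/system/step-renew-postgres.service <<'EOF'
[Unit]
Description=step-ca certificate renewal for PostgreSQL
# The step-ca.service references only apply when the CA runs on this host.
# Remove them from After= and Requires= if ca.internal is a remote machine.
After=step-ca.service network.target
Requires=step-ca.service

[Service]
Type=simple
# The postgres user must be able to read the root CA file and must be
# permitted (for example by a polkit or sudo rule) to reload postgresql.
User=postgres
ExecStart=/usr/local/bin/step ca renew \
  /var/lib/postgresql/server.crt \
  /var/lib/postgresql/server.key \
  --daemon \
  --exec "systemctl reload postgresql" \
  --ca-url https://ca.internal:9000 \
  --root /home/step/.step/certs/root_ca.crt
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF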

systemctl enable --now step-renew-postgres

Monitor certificate expiry with Prometheus

x509-certificate-exporter scrapes all certificates in your Kubernetes cluster and exposes their expiry as Prometheus metrics. Alert before they expire.

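# Install x509-certificate-exporter
# The value names below depend on the chart version; confirm them with
#   helm show values enix/x509-certificate-exporter
helm repo add enix https://charts.enix.io
helm repo update
helm install x509-certificate-exporter enix/x509-certificate-exporter \
  --namespace monitoring \
  --set 'watchDirectories[0]=/etc/ssl/certs' \
  --set 'watchFiles[0]=/etc/letsencrypt/live/example.com/fullchain.pem'

# Prometheus alert rules: fire when a cert expires in less than 14 days.
# That threshold suits 90-day certificates. For 24-hour step-ca certificates
# use a threshold in hours (for example 6 * 3600), or the warning never clears.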
groups:
- name: tls-certificates
  rules:
  - alert: CertificateExpiringSoon
    expr: |
      x509_cert_not_after - time() < 14 * 24 * 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Certificate expiring soon: {{ $labels.subject_CN }}"
      description: "Certificate {{ $labels.subject_CN }} expires in {{ $value | humanizeDuration }}"

  - alert: CertificateExpired
    expr: |
      x509_cert_not_after - time() < 0
    labels:
      severity: critical
    annotations:
      summary: "Certificate EXPIRED: {{ $labels.subject_CN }}"

# For non-Kubernetes services: blackbox exporter checks TLS expiry
# (alert on the probe_ssl_earliest_cert_expiry metric it exposes)
# prometheus.yml
scrape_configs:
  - job_name: tls-probe
    metrics_path: /probe
    params:
      module: [https_2xx]
    static_configs:
      - targets:
        - https://api.internal
        - https://grafana.internal:3000
        # PostgreSQL negotiates TLS inside its own protocol, so an HTTP
        # probe of postgres.internal:5432 cannot complete a handshake.
        # Check that certificate from the file on disk instead.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

The biggest TLS outage cause is expired certificates. Short-lived certs plus automatic renewal turn expiry into a routine event. step-ca defaults to 24-hour certs. The renewal daemon renews at 16 hours. If renewal breaks for any reason (the CA is down, the network is partitioned, the daemon crashed), you have 8 hours to notice and fix it before the service goes down. With a 1-year cert and manual renewal, you have one calendar entry standing between you and a production outage. Automate certificate lifecycle from day one.

10. ZFS and Certificate Storage

Your CA's private key is the most sensitive data in your infrastructure. It signs every certificate. If it leaks, an attacker can impersonate any service in your network. Store it correctly from the start.

Encrypted ZFS dataset for CA keys

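# Create an encrypted dataset for CA state
zfs create -o encryption=aes-256-gcm \
           -o keylocation=prompt \
           -o keyformat=passphrase \
           rpool/ca

# Move step-ca's home to the encrypted dataset (stop the CA first)
# Paths assume rpool/ca mounts at /rpool/ca; check: zfs get mountpoint rpool/ca
systemctl stop step-ca
zfs create rpool/ca/step-ca
rsync -av /home/step/.step/ /rpool/ca/step-ca/
rm -rf /home/step/.step
ln -s /rpool/ca/step-ca /home/step/.step

# The dataset is locked at rest and unlocked only while the CA is running
# To start the CA: load the key, mount the datasets, then start the service
zfs load-key rpool/ca
zfs mount rpool/ca
zfs mount rpool/ca/step-ca
systemctl start step-ca

# To stop and lock (datasets must be unmounted before the key can be unloaded):
systemctl stop step-ca
zfs unmount rpool/ca/step-ca
zfs unmount rpool/ca
zfs unload-key rpool/ca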

Snapshot before key rotation

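# Before any CA operation that modifies keys (-r includes rpool/ca/step-ca):
zfs snapshot -r rpool/ca@pre-rotation-$(date +%Y%m%d-%H%M%S)

# Rotate the intermediate CA key.
# step ca rekey only rekeys leaf certificates, so a new intermediate is signed
# with the root key instead. This sketch assumes the offline root key is
# temporarily mounted at /mnt/ca-offline, that step-ca uses its default file
# layout, and an illustrative subject name; adjust all three to match your CA.
sudo -u step step certificate create "kldload Intermediate CA" \
  /home/step/.step/certs/intermediate_ca.crt \
  /home/step/.step/secrets/intermediate_ca_key \
  --profile intermediate-ca \
  --ca /home/step/.step/certs/root_ca.crt \
  --ca-key /mnt/ca-offline/root_ca_key \
  --password-file /etc/kldload/ca-password \
  --force
systemctl restart step-ca

# Verify the CA is healthy after rotation
step ca health --ca-url https://ca.internal:9000

# If something went wrong, roll back the dataset that holds the CA state
zfs rollback rpool/ca/step-ca@pre-rotation-20260402-140000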

Replicate CA state to DR site

# Send the encrypted CA dataset to the DR site.
# -w (raw) sends the blocks still encrypted; without it, zfs send transmits
# decrypted data. The receiving site cannot read the data without the key.
snap=replication-$(date +%Y%m%d)
zfs snapshot -r "rpool/ca@${snap}"
zfs send -Rw "rpool/ca@${snap}" | \
  ssh dr-site.internal zfs receive backup/ca

# Automated replication with syncoid (shipped with sanoid).
# syncoid takes its options on the command line rather than from a config file.

# Run from cron
0 */6 * * * /usr/sbin/syncoid --recursive --sendoptions=w --no-privilege-elevation rpool/ca dr-site.internal:backup/ca

Offline root key procedure

# After CA initialization, move the root key offline
# The CA only needs the intermediate key for day-to-day operation

# Backup root key to encrypted offline storage (USB drive with LUKS)
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 ca-offline
mkfs.ext4 /dev/mapper/ca-offline
mkdir -p /mnt/ca-offline
mount /dev/mapper/ca-offline /mnt/ca-offline

cp /home/step/.step/secrets/root_ca_key /mnt/ca-offline/
cp /home/step/.step/certs/root_ca.crt   /mnt/ca-offline/

umount /mnt/ca-offline
cryptsetup close ca-offline

# Remove root key from online storage
# (keep only intermediate key on the CA server)
# Note: ZFS is copy-on-write, so shred cannot guarantee the old blocks are
# overwritten, and earlier snapshots or replicas still contain the key.
# Destroy those snapshots as well once the offline copy is verified.
shred -u /home/step/.step/secrets/root_ca_key

# The CA can still issue certificates using the intermediate key
# The root key is only needed to sign a new intermediate (rare)

Your CA's private key is the most sensitive data in your infrastructure. It signs every certificate. If it leaks, an attacker can impersonate any service. An encrypted ZFS dataset protects it at rest: once the CA state lives there, the key is only ever stored on encrypted disk. ZFS snapshots give you a rollback point before any key operation. Raw ZFS replication sends encrypted blocks to your DR site, so the DR site cannot read the data without the passphrase, but it can restore your CA from backup if your primary goes down. This is the correct approach: encrypted at rest, backed up, tested recovery procedure, root key offline.

11. Troubleshooting

When TLS fails it fails loudly — connections are rejected before any application data flows. These tools let you inspect exactly what is happening at the certificate layer.

openssl s_client — inspect any TLS connection

# Connect to a server and show the full certificate chain
openssl s_client -connect api.internal:443 -showcerts

# Connect and verify against a specific CA
openssl s_client -connect api.internal:443 \
  -CAfile /home/step/.step/certs/root_ca.crt

# Test with SNI (required for virtual hosting)
openssl s_client -connect api.internal:443 -servername api.internal

# Test a specific TLS version
openssl s_client -connect api.internal:443 -tls1_3

# Show certificate expiry
# (</dev/null closes stdin so s_client exits after the handshake)
openssl s_client -connect api.internal:443 </dev/null 2>/dev/null | \
  openssl x509 -noout -dates

# Test mTLS — present a client certificate
openssl s_client -connect api.internal:443 \
  -cert /etc/ssl/client.crt \
  -key  /etc/ssl/client.key \
  -CAfile /home/step/.step/certs/root_ca.crt

Certificate inspection

# Inspect a certificate file
openssl x509 -in server.crt -noout -text

# Show just subject and SAN
openssl x509 -in server.crt -noout -subject -ext subjectAltName

# Show expiry
openssl x509 -in server.crt -noout -dates

# Verify a certificate against a CA
openssl verify -CAfile /home/step/.step/certs/root_ca.crt server.crt

# Verify the full chain
openssl verify -CAfile root_ca.crt -untrusted intermediate_ca.crt leaf.crt

# step certificate inspect (more readable output)
step certificate inspect server.crt
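
To turn the expiry date into a number you can compare against the renewal buffer, the notAfter line can be converted to seconds remaining. A minimal sketch, assuming GNU date (for -d parsing); seconds_left is an illustrative helper name and the optional second argument exists only to make the result reproducible:

```shell
# seconds_left "notAfter=<date>" [NOW_EPOCH]
# Prints the number of seconds until the given notAfter date.
# NOW_EPOCH defaults to the current time; a negative result means expired.
seconds_left() {
  local line now expiry
  line=$1
  now=${2:-$(date -u +%s)}
  # Strip the "notAfter=" prefix and let GNU date parse the rest.
  expiry=$(date -u -d "${line#notAfter=}" +%s) || return 1
  echo $(( expiry - now ))
}

# Example with fixed inputs: expiry at 14:00 UTC, evaluated at 06:00 UTC.
seconds_left "notAfter=Apr  2 14:00:00 2026 GMT" \
  "$(date -u -d '2026-04-02 06:00:00 UTC' +%s)"
```

Against a real file, the input would come from the command shown earlier: seconds_left "$(openssl x509 -in server.crt -noout -enddate)".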

curl against internal CA

# If you haven't bootstrapped trust, provide the CA cert explicitly
curl --cacert /home/step/.step/certs/root_ca.crt https://api.internal/health

# Test with client certificate (mTLS)
curl --cacert /home/step/.step/certs/root_ca.crt \
     --cert   /etc/ssl/client.crt \
     --key    /etc/ssl/client.key \
     https://api.internal/health

# Verbose — shows handshake details, certificate chain, cipher suite
curl -v --cacert /home/step/.step/certs/root_ca.crt https://api.internal/health 2>&1 | \
  grep -E "(TLS|SSL|certificate|subject|issuer|expire|Verify)"

Common failure modes

Error message | Root cause | Fix
certificate has expired | Not After timestamp is in the past | Renew the certificate. Fix the renewal automation so it doesn't happen again.
certificate is not yet valid | Clock skew: system clock is behind the cert's Not Before | Sync NTP: chronyc makestep. Ensure all nodes run chronyd.
certificate signed by unknown authority | CA root not in trust store | Run step ca bootstrap on the client. Or pass --cacert to curl.
x509: certificate is valid for api.internal, not db.internal | Wrong SAN: connecting to a name not in the certificate | Reissue the certificate with the needed name in its SAN list, or connect using a name the certificate already covers.
tls: certificate required | Server requires client certificate (mTLS), client presented none | Issue a client certificate with EKU=clientAuth, configure the client to present it.
remote error: tls: bad certificate | Server rejected the client's certificate: wrong CA or missing clientAuth EKU | Ensure client cert is signed by the CA the server trusts. Check EKU includes clientAuth.
handshake failure | No cipher suite in common, or TLS version mismatch | Check ssl_protocols config. Ensure both sides support TLS 1.2 or 1.3.
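
The table can be turned into a first-pass triage helper for log lines. A minimal sketch; the function name tls_diagnose and the short category labels are illustrative, and the matching is plain substring matching on the messages listed above:

```shell
# tls_diagnose "ERROR MESSAGE"
# Prints "<category>: <suggested fix>" for the TLS errors in the table above.
tls_diagnose() {
  case "$1" in
    *"certificate has expired"*)
      echo "expired: renew the certificate and repair the renewal automation" ;;
    *"not yet valid"*)
      echo "clock-skew: sync the clock, e.g. chronyc makestep" ;;
    *"unknown authority"*)
      echo "untrusted-ca: run step ca bootstrap or pass --cacert" ;;
    *"is valid for"*)
      echo "san-mismatch: reissue with the name you are connecting to" ;;
    *"certificate required"*)
      echo "no-client-cert: present a client certificate with clientAuth EKU" ;;
    *"bad certificate"*)
      echo "client-cert-rejected: check the signing CA and the clientAuth EKU" ;;
    *"handshake failure"*)
      echo "protocol-mismatch: check TLS versions and cipher suites on both sides" ;;
    *)
      echo "unknown: inspect the connection with openssl s_client -showcerts" ;;
  esac
}

tls_diagnose "x509: certificate is valid for api.internal, not db.internal"
```

Substring matching is deliberately loose, so the helper still works when the message is embedded in a longer log line.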

step-ca diagnostics

# Check CA health
step ca health --ca-url https://ca.internal:9000

# List provisioners
step ca provisioner list --ca-url https://ca.internal:9000

# View CA logs
journalctl -u step-ca -f

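# List CA admins (requires remote provisioner management to be enabled).
# Note: this command lists admins, not issued certificates.
step ca admin list --ca-url https://ca.internal:9000

# Inspect the CA's own certificate
# (/roots.pem returns PEM; /roots returns JSON, which inspect cannot parse)
step certificate inspect \
  <(curl -sk https://ca.internal:9000/roots.pem) \
  --format json | jq '.validity'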

Related pages