TLS & PKI Masterclass
This guide covers the full lifecycle of certificates in a self-managed infrastructure: public HTTPS via Let's Encrypt, internal PKI with step-ca, mutual TLS between services, database client certificates, Kubernetes cert-manager, and certificate rotation. By the end you will have every connection in your stack encrypted and every certificate renewed automatically — without paying a CA or touching a certificate manually ever again.
The premise: TLS is the encryption layer for everything that is not WireGuard. HTTPS, database connections, API calls, SMTP, gRPC, metrics scrapes — they all need certificates. Public services need public certificates. Internal services need internal certificates. This masterclass teaches you to run your own Certificate Authority and never pay for or manually manage a certificate again.
What this page covers: TLS fundamentals, Let's Encrypt for public services, step-ca for internal PKI, ACME automation, mutual TLS, database certificate configs, Kubernetes cert-manager, certificate rotation strategy, ZFS-backed CA key storage, and a troubleshooting reference — all grounded in the kldload stack.
Prerequisites: a running kldload system. The Kubernetes sections assume a cluster from the Kubernetes on KVM guide. Everything else works on any kldload node.
1. TLS Fundamentals
Before running a CA you need to understand what a certificate actually is and what TLS is actually doing. This section covers the mechanics clearly, without the math.
How the TLS handshake works
When your browser connects to a server over HTTPS, it runs a handshake before any application data flows. The handshake has three jobs: agree on cipher suites, authenticate the server (and optionally the client), and establish a shared symmetric key for the session. In TLS 1.3 — the version everything should be using — this handshake takes one round trip.
Step 1 — ClientHello
The client sends the TLS version it supports, a random nonce, and a list of cipher suites it can use. In TLS 1.3 it also sends key share material for the algorithms it expects the server to choose.
Step 2 — ServerHello + Certificate
The server picks a cipher suite, sends its half of the key exchange, and presents its certificate chain. The client validates the chain against its trust store — the set of CA root certificates baked into the OS or browser.
Step 3 — Key derivation
Both sides derive the same symmetric session key from the key exchange material. In TLS 1.3 this uses ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) — every session gets a fresh key, so past sessions stay private even if the server's private key later leaks.
Step 4 — Finished + application data
Both sides send a Finished message (a MAC over the entire handshake transcript), proving they derived the same key and that nothing was tampered with in transit. After that, all application data is encrypted with AES-GCM or ChaCha20-Poly1305.
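The key-derivation and Finished steps above can be sketched in a few lines of Python. This is a toy model, not TLS: it uses classic finite-field Diffie-Hellman with a deliberately small prime instead of ECDHE, and the names `keypair`, `session_key`, and `finished_mac` are hypothetical helpers chosen for illustration.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters (assumption for illustration only).
# Real TLS 1.3 uses elliptic-curve groups such as X25519.
P = 2**127 - 1  # a Mersenne prime, far too small for real use
G = 3


def keypair():
    """Return a fresh (private, public) pair, generated per session."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def session_key(my_private, peer_public):
    """Derive a symmetric key from the shared Diffie-Hellman secret."""
    shared = pow(peer_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def finished_mac(key, transcript):
    """MAC over the handshake transcript, in the spirit of Finished."""
    transcript_hash = hashlib.sha256(transcript).digest()
    return hmac.new(key, transcript_hash, hashlib.sha256).digest()
```

Each side calls `keypair()`, sends only the public half, and passes the peer's public value to `session_key()`. Both arrive at the same key without it ever crossing the wire, and a MAC over a modified transcript no longer matches.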
Certificate anatomy
A certificate is a signed data structure. It binds a public key to an identity and is signed by a CA that vouches for that binding. The CA's signature is what makes the certificate trusted — any entity with the corresponding CA root in its trust store can verify the signature and accept the binding.
Subject
The entity the certificate identifies — usually a Common Name (CN) like postgres.internal. The CN is largely legacy; modern TLS validation uses the Subject Alternative Name (SAN) extension instead.
Subject Alternative Name (SAN)
The list of DNS names, IP addresses, and URIs the certificate is valid for. This is the field clients actually check. A cert with a SAN of *.example.com is valid for any single-level subdomain such as api.example.com, but not for example.com itself or for nested names like a.b.example.com.
Issuer and signature
The CA that signed the certificate. The CA hashes the certificate body and signs that hash with its private key. Anyone with the CA's public key (from the root cert) can verify the signature.
Validity window
Not Before and Not After timestamps. A certificate is only valid within this window. TLS clients reject certificates outside the window — even by one second. Clock skew between systems is a real operational hazard.
Key type and usage
The public key algorithm (RSA 2048/4096, ECDSA P-256/P-384) and what the key is allowed to do (Key Usage: digital signature, key encipherment; Extended Key Usage: server authentication, client authentication). A CA cert has the CA:TRUE basic constraint.
Serial number and SKI/AKI
Each certificate has a unique serial number within its CA. Subject Key Identifier (SKI) and Authority Key Identifier (AKI) link the certificate to its issuer's public key — these are how chain building finds the right intermediate.
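As a rough illustration of how a client checks a hostname against DNS SAN entries, here is a minimal sketch. The function names `san_matches` and `cert_covers` are hypothetical, and real validators apply additional rules (IP SANs, internationalized names, public-suffix restrictions) that are omitted here.

```python
def san_matches(hostname, pattern):
    """Return True if hostname is covered by one DNS SAN entry.

    A leading "*" label stands in for exactly one label, so the
    hostname and the pattern must have the same number of labels.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # Wildcard covers the leftmost label only; the rest must match.
        return host_labels[0] != "" and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname, dns_sans):
    """Return True if any SAN entry in the list covers the hostname."""
    return any(san_matches(hostname, san) for san in dns_sans)
```

With a SAN list of `["*.example.com"]`, `cert_covers` accepts `api.example.com` and rejects both `example.com` and `a.b.example.com`, which is why wildcard certificates are usually issued with the bare domain as a second SAN.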
The certificate chain
Trust is hierarchical. A root CA is self-signed — it vouches for itself. Browsers and operating systems ship a curated set of root CA certificates they trust unconditionally. Everything below a root is trusted transitively: if the root is trusted, and the root signed an intermediate, and the intermediate signed a leaf, then the leaf is trusted. This chain structure lets CAs operate without exposing their root key.
Root CA
Self-signed. Ships in OS/browser trust stores. For a public CA like Let's Encrypt, this is the ISRG Root X1 certificate. For your internal CA, this is the root you generate with step-ca. Kept offline when possible.
Intermediate CA
Signed by the root. Used for day-to-day certificate issuance. If an intermediate is compromised it can be revoked without replacing the root — all trust stores just need to distrust that intermediate. step-ca generates one automatically.
Leaf certificate
The actual certificate presented by a server or client. Signed by the intermediate. Has a short validity period (90 days for Let's Encrypt, 24 hours for step-ca defaults). Cannot sign other certificates — CA:FALSE basic constraint.
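The chain walk from leaf to root can be modeled with plain data. This sketch is an assumption-laden simplification: the `Cert` class and the `build_chain` and `is_trusted` helpers are hypothetical, certificates are linked only by their key identifiers, and no signature, validity, or revocation checks are performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    ski: str      # Subject Key Identifier of this certificate
    aki: str      # Authority Key Identifier, i.e. the issuer's SKI
    is_ca: bool   # basic constraint CA:TRUE or CA:FALSE


def build_chain(leaf, pool, trusted_root_skis):
    """Walk from the leaf to a trusted root by matching AKI to SKI.

    Returns the chain as a list (leaf first) or raises ValueError.
    """
    by_ski = {cert.ski: cert for cert in pool}
    chain = [leaf]
    current = leaf
    while current.ski not in trusted_root_skis:
        issuer = by_ski.get(current.aki)
        # Stop on a missing issuer, a non-CA issuer, or a loop.
        if issuer is None or not issuer.is_ca or issuer in chain:
            raise ValueError(f"no trusted path from {leaf.subject}")
        chain.append(issuer)
        current = issuer
    return chain


def is_trusted(leaf, pool, trusted_root_skis):
    """Convenience wrapper that returns a boolean instead of raising."""
    try:
        build_chain(leaf, pool, trusted_root_skis)
        return True
    except ValueError:
        return False
```

The same leaf is accepted or rejected depending only on whether the root's identifier is in the trust store, which is the effect that bootstrapping trust has on a real machine.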
2. Let's Encrypt for Public Services
Let's Encrypt is a free, automated, publicly trusted CA. It issues 90-day certificates via the ACME protocol (Automated Certificate Management Environment). Every major web framework and server has ACME support. For any service with a public DNS record, Let's Encrypt is the correct answer.
Install certbot on kldload
# CentOS / RHEL / Rocky
dnf install -y certbot python3-certbot-nginx python3-certbot-dns-cloudflare
# Debian / Ubuntu
apt install -y certbot python3-certbot-nginx python3-certbot-dns-cloudflare
HTTP-01 challenge — public web server
The HTTP-01 challenge proves domain ownership by serving a token file at
http://yourdomain.com/.well-known/acme-challenge/TOKEN. Let's Encrypt fetches it
over HTTP and verifies the token. This requires port 80 to be open and reachable.
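To make the HTTP-01 exchange concrete, the sketch below models what the web server has to answer. Per RFC 8555 the response body is the key authorization: the token, a dot, and the thumbprint of the ACME account key. The helper names and the `pending` mapping are hypothetical; certbot and other clients do all of this for you.

```python
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def key_authorization(token, account_thumbprint):
    """Build the body served for a challenge (token + "." + thumbprint)."""
    return f"{token}.{account_thumbprint}"


def respond(path, pending):
    """Return (status, body) for a request made by the CA's validator.

    pending maps each outstanding token to its key authorization.
    """
    if not path.startswith(CHALLENGE_PREFIX):
        return 404, ""
    token = path[len(CHALLENGE_PREFIX):]
    body = pending.get(token)
    if body is None:
        return 404, ""
    return 200, body
```

The validator only accepts the challenge if the body it fetches over port 80 equals the key authorization it computed itself, which ties the domain to the requesting ACME account.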
# Issue a certificate for a domain with nginx running
certbot --nginx -d example.com -d www.example.com \
--email admin@example.com --agree-tos --non-interactive
# Certbot automatically edits /etc/nginx/sites-enabled/example.conf
# to add TLS configuration and renewal hooks
DNS-01 challenge — wildcard and internal names
The DNS-01 challenge proves domain ownership by adding a TXT record to your DNS
zone. It does not require port 80. It is the only challenge type that can issue
wildcard certificates (*.example.com). It works for services behind firewalls as
long as the DNS provider has an API.
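For DNS-01, the TXT record lives at `_acme-challenge.<domain>` and its value is the base64url-encoded SHA-256 digest of the key authorization (RFC 8555). A minimal sketch, with hypothetical function names, of how a client derives both:

```python
import base64
import hashlib


def dns01_record_name(domain):
    """Return the TXT record name used to validate a domain.

    A wildcard name is validated at the base domain, so "*.example.com"
    and "example.com" share the same record name.
    """
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_record_value(key_authorization):
    """Return the TXT value: base64url(SHA-256(key authorization))."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Because the wildcard and the bare domain share one record name, issuing both in a single request means the DNS plugin publishes two TXT values under the same name.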
# Cloudflare DNS-01 — create a credentials file first
cat > /etc/letsencrypt/cloudflare.ini <<'EOF'
dns_cloudflare_api_token = YOUR_CLOUDFLARE_API_TOKEN
EOF
chmod 600 /etc/letsencrypt/cloudflare.ini
# Issue wildcard certificate
certbot certonly \
--dns-cloudflare \
--dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
-d "*.example.com" -d "example.com" \
--email admin@example.com --agree-tos --non-interactive
# Certificate lands at:
# /etc/letsencrypt/live/example.com/fullchain.pem
# /etc/letsencrypt/live/example.com/privkey.pem
Automatic renewal with systemd
certbot installs a systemd timer on most distributions. Check it and verify it fires:
# Check the timer is active
systemctl status certbot.timer
systemctl list-timers certbot.timer
# Test renewal without actually renewing
certbot renew --dry-run
# Force renewal even when certificates are not near expiry (useful for testing;
# against Let's Encrypt production this counts toward rate limits)
certbot renew --force-renewal
# View existing certificates and expiry
certbot certificates
The timer runs twice daily and renews any certificate within 30 days of expiry.
After renewal certbot runs the configured deploy hook — for nginx this is
nginx -s reload. You can add your own deploy hooks in
/etc/letsencrypt/renewal-hooks/deploy/.
# Example: reload multiple services after renewal
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-services.sh <<'EOF'
#!/bin/bash
systemctl reload nginx
systemctl reload postfix
systemctl reload dovecot
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-services.sh
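The 30-day renewal window described above is a simple comparison against the certificate's Not After time. The sketch below shows the arithmetic using the standard library's `ssl.cert_time_to_seconds`; the function names and the `now` parameter (epoch seconds) are assumptions made for illustration, not certbot internals.

```python
import ssl

RENEWAL_WINDOW_DAYS = 30  # window stated in the text above


def days_until_expiry(not_after, now):
    """Days left before expiry.

    not_after uses the textual form found in ssl.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". now is epoch seconds.
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def due_for_renewal(not_after, now, window_days=RENEWAL_WINDOW_DAYS):
    """True once the certificate is inside the renewal window."""
    return days_until_expiry(not_after, now) < window_days
```

Passing `time.time()` as `now` gives a quick manual check that mirrors what `certbot certificates` reports.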
Nginx HTTPS configuration example
server {
listen 80;
server_name example.com www.example.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl;
server_name example.com www.example.com;
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
# Modern TLS settings
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1d;
# HSTS — tell browsers to only use HTTPS for 1 year
add_header Strict-Transport-Security "max-age=31536000" always;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Proto https;
}
}
3. step-ca — Your Own Certificate Authority
step-ca is an open-source online CA from Smallstep. It implements the same ACME
protocol that Let's Encrypt uses, so any tool that speaks ACME — certbot, acme.sh,
Caddy, Traefik — can use your internal CA without modification. It also has its own
CLI (step) for issuing certificates directly and a provisioner model that supports
ACME, JWK, OIDC, X5C, and more.
Install step and step-ca on kldload
# Download the step CLI
curl -Lo /tmp/step.tar.gz \
https://github.com/smallstep/cli/releases/latest/download/step_linux_amd64.tar.gz
tar -xzf /tmp/step.tar.gz -C /tmp
install -m 0755 /tmp/step_*/bin/step /usr/local/bin/step
# Download step-ca
curl -Lo /tmp/step-ca.tar.gz \
https://github.com/smallstep/certificates/releases/latest/download/step-ca_linux_amd64.tar.gz
tar -xzf /tmp/step-ca.tar.gz -C /tmp
install -m 0755 /tmp/step-ca_*/bin/step-ca /usr/local/bin/step-ca
# Verify
step version
step-ca version
Initialize the CA
Initialization creates the root CA key and certificate, an intermediate key and certificate signed by the root, and an initial provisioner. Run this once on the machine that will host your CA. Store the root key offline after initialization.
# Create a dedicated user for the CA
useradd --system --create-home --shell /bin/false step
# Initialize the CA as the step user
sudo -u step step ca init \
--name "kldload Internal CA" \
--dns "ca.internal,ca.kldload.local,$(hostname -I | awk '{print $1}')" \
--address ":9000" \
--provisioner "admin@example.com" \
--password-file /dev/stdin <<<"$(cat /etc/kldload/ca-password)"
# The init creates:
# ~/.step/certs/root_ca.crt — root certificate (distribute to trust stores)
# ~/.step/certs/intermediate_ca.crt
# ~/.step/secrets/root_ca_key — root private key (move offline)
# ~/.step/secrets/intermediate_ca_key
# ~/.step/config/ca.json — CA configuration
Add an ACME provisioner
The ACME provisioner lets certbot, acme.sh, and any other ACME client issue certificates from your internal CA without any Smallstep-specific client code.
# Add ACME provisioner
sudo -u step step ca provisioner add acme --type ACME \
--admin-provisioner admin@example.com \
--admin-subject admin@example.com
# The ACME directory will be available at:
# https://ca.internal:9000/acme/acme/directory
Run step-ca as a systemd service
cat > /etc/systemd/system/step-ca.service <<'EOF'
[Unit]
Description=step-ca Certificate Authority
After=network.target
ConditionFileNotEmpty=/home/step/.step/config/ca.json
[Service]
User=step
Group=step
ExecStart=/usr/local/bin/step-ca \
/home/step/.step/config/ca.json \
--password-file /etc/kldload/ca-password
Restart=on-failure
RestartSec=5
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/home/step/.step
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now step-ca
systemctl status step-ca
Bootstrap trust on your machines
Bootstrapping installs your CA's root certificate into the system trust store. Run this once on every machine that needs to trust your internal CA.
# On the CA host: print the root fingerprint
step certificate fingerprint /home/step/.step/certs/root_ca.crt
# On each machine that should trust your internal CA (paste the fingerprint)
step ca bootstrap --ca-url https://ca.internal:9000 \
--fingerprint ROOT_CA_FINGERPRINT --install
# --install adds the root cert to the system trust store
# After this: curl https://internal-service.internal just works, no -k flag needed
# Verify
step ca health --ca-url https://ca.internal:9000
Issue certificates from the CLI
# Issue a certificate for a service
step ca certificate postgres.internal postgres.crt postgres.key \
--ca-url https://ca.internal:9000 \
--san postgres.internal \
--san 10.0.0.20 \
--not-after 8760h # 1 year; default is 24h, and the provisioner's maxTLSCertDuration claim (24h by default) must be raised to allow this
# Issue a short-lived cert (recommended)
step ca certificate api.internal api.crt api.key \
--not-after 24h
# Renew before expiry
step ca renew api.crt api.key --force
# Inspect a certificate
step certificate inspect api.crt
4. ACME for Internal Services
Once step-ca is running with an ACME provisioner, any ACME-capable tool can issue internal certificates automatically. This means you can use the same tooling for internal services as you do for public services — certbot, acme.sh, Caddy, Traefik, and cert-manager all speak ACME natively.
certbot against your internal CA
# Tell certbot to use your internal CA's ACME directory
# Note: --server points at the ACME directory URL
# Note: REQUESTS_CA_BUNDLE tells certbot which root to trust; it is needed only
# if this host has not installed the internal root in its trust store
REQUESTS_CA_BUNDLE=/home/step/.step/certs/root_ca.crt certbot certonly \
--standalone \
--server https://ca.internal:9000/acme/acme/directory \
-d api.internal \
--email admin@example.com \
--agree-tos \
--non-interactive
# If you bootstrapped trust with step ca bootstrap, omit REQUESTS_CA_BUNDLE
certbot certonly \
--standalone \
--server https://ca.internal:9000/acme/acme/directory \
-d api.internal \
--email admin@example.com \
--agree-tos \
--non-interactive
acme.sh against your internal CA
# Install acme.sh
curl https://get.acme.sh | sh
# Register with your internal CA
acme.sh --register-account \
--server https://ca.internal:9000/acme/acme/directory \
--email admin@example.com
# Issue certificate
acme.sh --issue \
--server https://ca.internal:9000/acme/acme/directory \
-d api.internal \
--standalone
# Install to /etc/ssl/api.internal/
acme.sh --install-cert -d api.internal \
--cert-file /etc/ssl/api.internal/cert.pem \
--key-file /etc/ssl/api.internal/key.pem \
--fullchain-file /etc/ssl/api.internal/fullchain.pem \
--reloadcmd "systemctl reload nginx"
Nginx with auto-renewed internal TLS
# /etc/nginx/sites-available/api-internal.conf
server {
listen 443 ssl;
server_name api.internal;
ssl_certificate /etc/letsencrypt/live/api.internal/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.internal/privkey.pem;
ssl_protocols TLSv1.3;
ssl_prefer_server_ciphers off;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_set_header X-Forwarded-Proto https;
}
}
# Renewal is handled automatically by certbot.timer
# step-ca default cert lifetime is 24h; certbot by default renews 30 days before expiry
# For short-lived internal certs, set renew_before_expiry in /etc/letsencrypt/renewal/api.internal.conf:
# renew_before_expiry = 8 hours
Caddy — zero-config internal HTTPS
Caddy has native ACME support and manages its own certificate store. Point it at your internal CA and it handles everything — no certbot, no cron jobs, no renewal scripts.
# /etc/caddy/Caddyfile
{
acme_ca https://ca.internal:9000/acme/acme/directory
acme_ca_root /home/step/.step/certs/root_ca.crt
}
api.internal {
reverse_proxy localhost:8080
}
metrics.internal {
reverse_proxy localhost:9090
}
grafana.internal {
reverse_proxy localhost:3000
}
5. Mutual TLS
Standard TLS authenticates only the server — the client verifies the server's certificate but presents nothing itself. Mutual TLS (mTLS) requires both sides to present a certificate. The server verifies the client's certificate, and the client verifies the server's. This makes mTLS the foundation of zero-trust networking: instead of trusting everything on the LAN, every connection proves identity.
What mTLS adds to the handshake
In addition to its own certificate, the server sends a CertificateRequest message. The client responds with its own certificate. The server validates the client's certificate against its CA trust store. If validation fails, the connection is rejected, with no application-level auth needed.
Use cases
Service-to-service authentication in microservices, zero-trust access to internal APIs, database client authentication, preventing unauthorized clients from connecting even if they know the server address, and replacing password-based authentication entirely.
Cilium mTLS
Cilium implements mutual authentication transparently for Kubernetes pods. Pods get a SPIFFE identity. The handshake runs out-of-band between Cilium agents, and the eBPF datapath only passes traffic between identities that have authenticated. No application changes, no sidecar proxies. Any pod-to-pod connection can be enforced with mTLS policy.
SPIFFE / SPIRE
SPIFFE (Secure Production Identity Framework For Everyone) is the standard for workload identity in zero-trust architectures. Each workload gets a SPIFFE ID (a URI like spiffe://example.com/service/api). SPIRE is the reference implementation — it issues SVID certificates backed by that identity.
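On the server side, requiring a client certificate comes down to one setting on the TLS context. The sketch below uses Python's standard `ssl` module; the function names are hypothetical, and the file paths you pass to `load_identity` are whatever your CA issued.

```python
import ssl


def mtls_server_context():
    """Server-side context that refuses clients without a valid cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the server request a client certificate and
    # abort the handshake if none is sent or it fails verification.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def load_identity(ctx, cert_file, key_file, ca_file):
    """Attach the server's own cert and key, plus the CA for clients."""
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

A server would wrap its listening socket with `ctx.wrap_socket(sock, server_side=True)` after calling both functions; the verified client subject is then available from `getpeercert()` on the accepted connection.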
Configure mTLS between two services with nginx
# Step 1: Issue client and server certs from your internal CA
step ca certificate server.internal server.crt server.key \
--san server.internal --not-after 8760h
step ca certificate client-api client.crt client.key \
--san client-api --not-after 8760h
# Step 2: Configure nginx server to require client cert
server {
listen 443 ssl;
server_name server.internal;
ssl_certificate /etc/ssl/server.crt;
ssl_certificate_key /etc/ssl/server.key;
# Require client certificate signed by your internal CA
ssl_client_certificate /home/step/.step/certs/root_ca.crt;
ssl_verify_client on;
ssl_verify_depth 2;
location / {
# Pass the verified client identity (full subject DN) to the upstream
proxy_set_header X-Client-DN $ssl_client_s_dn;
proxy_pass http://127.0.0.1:8080;
}
}
# Step 3: Configure the client to present its certificate
# curl
curl --cert client.crt --key client.key \
https://server.internal/api/v1/health
# Python requests
import requests
resp = requests.get(
'https://server.internal/api/v1/health',
cert=('/etc/ssl/client.crt', '/etc/ssl/client.key')
)
# Go http.Client
cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
if err != nil {
	log.Fatal(err) // do not continue without a client certificate
}
tlsConfig := &tls.Config{Certificates: []tls.Certificate{cert}}
transport := &http.Transport{TLSClientConfig: tlsConfig}
client := &http.Client{Transport: transport}
Cilium mTLS policy
# Enable Cilium mTLS (requires Cilium 1.14+)
# In your Helm values:
# authentication:
#   mutual:
#     spire:
#       enabled: true
# Apply an mTLS policy between pods
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: require-mtls-frontend-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    authentication:
      mode: "required" # enforces mTLS for this flow
6. Certificate Management for Databases
Database connections are the most commonly unencrypted internal traffic in the average infrastructure. The data is sensitive — credentials, personally identifiable information, business records — and the connection is plaintext on the LAN. Adding TLS is three config lines and a certificate. There is no reason not to do it.
PostgreSQL with TLS
PostgreSQL supports TLS natively. You need a server certificate, a server key, and optionally a CA certificate for verifying client certificates.
# Issue a certificate for PostgreSQL
step ca certificate postgres.internal \
/var/lib/postgresql/server.crt \
/var/lib/postgresql/server.key \
--san postgres.internal \
--san 10.0.0.20 \
--not-after 8760h
chown postgres:postgres /var/lib/postgresql/server.crt /var/lib/postgresql/server.key
chmod 600 /var/lib/postgresql/server.key
# Copy the CA root cert
cp /home/step/.step/certs/root_ca.crt /var/lib/postgresql/root.crt
chown postgres:postgres /var/lib/postgresql/root.crt
# postgresql.conf — enable TLS
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file = '/var/lib/postgresql/server.key'
ssl_ca_file = '/var/lib/postgresql/root.crt' # for client cert verification
# Enforce TLS for all connections (optional but recommended)
# In pg_hba.conf, change 'host' lines to 'hostssl':
# hostssl all all 0.0.0.0/0 scram-sha-256
# Require client certificates for superuser (mTLS for DBAs)
# hostssl all postgres 0.0.0.0/0 cert clientcert=verify-full
# Reload PostgreSQL
systemctl reload postgresql
# Test TLS connection
psql "postgresql://user:pass@postgres.internal/db?sslmode=verify-full&sslrootcert=/etc/ssl/ca.crt"
# psql with client certificate
psql "postgresql://postgres@postgres.internal/db?sslmode=verify-full&sslcert=/etc/ssl/client.crt&sslkey=/etc/ssl/client.key&sslrootcert=/etc/ssl/ca.crt"
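Applications usually build the same connection URI in code. The helper below is a hypothetical sketch that always sets `sslmode=verify-full` and adds the client certificate parameters only when both are supplied; the parameter names are the libpq ones used in the psql examples above.

```python
from urllib.parse import quote, urlencode


def pg_tls_uri(host, db, user, root_cert, client_cert=None, client_key=None):
    """Build a libpq URI that verifies the server certificate and name."""
    params = {"sslmode": "verify-full", "sslrootcert": root_cert}
    if client_cert and client_key:
        # Present a client certificate as well (mTLS).
        params["sslcert"] = client_cert
        params["sslkey"] = client_key
    query = urlencode(params, safe="/")  # keep file paths readable
    return f"postgresql://{quote(user)}@{host}/{quote(db)}?{query}"
```

Keeping `verify-full` hard-coded is a deliberate choice in this sketch: weaker modes such as `require` encrypt the connection but do not check that the server is the one named in the certificate.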
MySQL / MariaDB with TLS
# Issue certificate
step ca certificate mysql.internal /etc/mysql/server.crt /etc/mysql/server.key \
--san mysql.internal --san 10.0.0.21 --not-after 8760h
cp /home/step/.step/certs/root_ca.crt /etc/mysql/ca.crt
chown mysql:mysql /etc/mysql/server.crt /etc/mysql/server.key /etc/mysql/ca.crt
# /etc/mysql/conf.d/tls.cnf
[mysqld]
ssl-ca = /etc/mysql/ca.crt
ssl-cert = /etc/mysql/server.crt
ssl-key = /etc/mysql/server.key
# Require TLS for all connections:
# require_secure_transport = ON
# Test
mysql --ssl-ca=/etc/ssl/ca.crt \
--ssl-cert=/etc/ssl/client.crt \
--ssl-key=/etc/ssl/client.key \
-h mysql.internal -u user -p
# Verify TLS is in use
mysql> SHOW SESSION STATUS LIKE 'Ssl_cipher';
Redis with TLS
# Redis 6+ has native TLS support
step ca certificate redis.internal /etc/redis/server.crt /etc/redis/server.key \
--san redis.internal --san 10.0.0.22 --not-after 8760h
cp /home/step/.step/certs/root_ca.crt /etc/redis/ca.crt
# redis.conf: TLS listener (comments in redis.conf must be on their own line)
# disable plaintext
port 0
tls-port 6380
tls-cert-file /etc/redis/server.crt
tls-key-file /etc/redis/server.key
tls-ca-cert-file /etc/redis/ca.crt
# require client certificates (mTLS)
tls-auth-clients yes
# Test
redis-cli --tls \
--cacert /etc/ssl/ca.crt \
--cert /etc/ssl/client.crt \
--key /etc/ssl/client.key \
-h redis.internal -p 6380 PING
7. TLS for WireGuard Management
WireGuard itself uses Curve25519 for key exchange and ChaCha20-Poly1305 for encryption — it does not use TLS. But everything around WireGuard does: the web UIs that manage it, the API endpoints that update peer configurations, the Prometheus exporters that scrape metrics. Those need TLS, and with your internal CA they are easy to secure.
Secure the kldload web UI
# The kldload web UI (Python websockets server) runs on port 8443 by default
# Issue a certificate for the management host
step ca certificate mgmt.internal \
/etc/kldload/tls/server.crt \
/etc/kldload/tls/server.key \
--san mgmt.internal \
--san $(hostname -I | awk '{print $1}') \
--not-after 8760h
# The kldload web UI picks up TLS cert/key from environment:
# TLS_CERT=/etc/kldload/tls/server.crt
# TLS_KEY=/etc/kldload/tls/server.key
# Set in /etc/kldload/kldload.env and restart the webui service
Secure Prometheus and Grafana
# Issue certificates for monitoring services
step ca certificate prometheus.internal \
/etc/prometheus/tls/server.crt \
/etc/prometheus/tls/server.key \
--san prometheus.internal --san 10.0.0.10 --not-after 8760h
step ca certificate grafana.internal \
/etc/grafana/tls/server.crt \
/etc/grafana/tls/server.key \
--san grafana.internal --san 10.0.0.10 --not-after 8760h
# /etc/prometheus/web.yml: TLS for the Prometheus web server
# (pass it with --web.config.file=/etc/prometheus/web.yml; it does not go in prometheus.yml)
tls_server_config:
  cert_file: /etc/prometheus/tls/server.crt
  key_file: /etc/prometheus/tls/server.key
# /etc/prometheus/prometheus.yml: scrape targets over TLS
scrape_configs:
  - job_name: node
    scheme: https
    tls_config:
      ca_file: /home/step/.step/certs/root_ca.crt
    static_configs:
      - targets: ['node1.internal:9100', 'node2.internal:9100']
# /etc/grafana/grafana.ini — TLS for Grafana
[server]
protocol = https
cert_file = /etc/grafana/tls/server.crt
cert_key = /etc/grafana/tls/server.key
TLS for WireGuard exporters (node_exporter)
# node_exporter supports TLS natively since v1.1
cat > /etc/node_exporter/tls.yml <<'EOF'
tls_server_config:
  cert_file: /etc/node_exporter/server.crt
  key_file: /etc/node_exporter/server.key
  # Optionally require client certificates for scrape auth:
  # client_ca_file: /home/step/.step/certs/root_ca.crt
  # client_auth_type: RequireAndVerifyClientCert
EOF
# Start with TLS config
node_exporter --web.config.file=/etc/node_exporter/tls.yml
8. Kubernetes Certificate Management
Kubernetes has its own internal PKI (for the API server, etcd, and kubelet), but cert-manager handles everything application-facing: TLS for Ingress resources, certificates for pods, and integration with both internal CAs and Let's Encrypt.
Install cert-manager
# Install cert-manager via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set installCRDs=true \
--set global.leaderElection.namespace=cert-manager
# Verify
kubectl get pods -n cert-manager
kubectl get crds | grep cert-manager.io
ClusterIssuer with step-ca (internal)
# Store the step-ca provisioner password as a secret
kubectl create secret generic step-ca-provisioner-password \
--namespace cert-manager \
--from-literal=password="$(cat /etc/kldload/ca-password)"
# ClusterIssuer using step-ca's ACME endpoint
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: step-ca-internal
spec:
  acme:
    server: https://ca.internal:9000/acme/acme/directory
    email: admin@example.com
    # caBundle is the base64-encoded root CA cert, for example the output of:
    # base64 -w0 /home/step/.step/certs/root_ca.crt
    caBundle: BASE64_ENCODED_ROOT_CA_CERT
    privateKeySecretRef:
      name: step-ca-internal-acme-key
    solvers:
    - http01:
        ingress:
          class: nginx
# ClusterIssuer using Let's Encrypt (public services)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - dns01:
        cloudflare:
          email: admin@example.com
          apiTokenSecretRef:
            name: cloudflare-token
            key: api-token
Automatic TLS for Ingress
# Ingress with automatic TLS from internal CA
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: "step-ca-internal"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - api.internal
    secretName: api-tls
  rules:
  - host: api.internal
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
# cert-manager watches for this Ingress, creates a CertificateRequest,
# fulfills the ACME challenge, and stores the cert in the 'api-tls' secret.
# Nginx Ingress controller picks up the secret and serves TLS automatically.
Certificate resource for pod-level TLS
# Direct Certificate resource — not tied to an Ingress
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgres-client-cert
  namespace: app
spec:
  secretName: postgres-client-tls
  duration: 24h
  renewBefore: 8h
  subject:
    organizations:
    - kldload
  commonName: app-service
  dnsNames:
  - app-service.app.svc.cluster.local
  usages:
  - client auth
  issuerRef:
    name: step-ca-internal
    kind: ClusterIssuer
# Mount the secret in your pod:
# volumes:
# - name: postgres-tls
#   secret:
#     secretName: postgres-client-tls
# volumeMounts:
# - name: postgres-tls
#   mountPath: /etc/ssl/postgres
#   readOnly: true
Certificate rotation in Kubernetes
# cert-manager handles rotation automatically based on duration/renewBefore
# Check certificate status
kubectl get certificates -A
kubectl describe certificate api-tls -n default
# Force renewal
kubectl delete secret api-tls
# cert-manager re-issues automatically within seconds
# View certificate expiry
kubectl get certificates -A -o custom-columns=\
'NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[0].status,EXPIRY:.status.notAfter'
9. Certificate Rotation and Lifecycle
The number one cause of TLS outages is expired certificates. The fix is not "remember to renew" — it is short-lived certificates with automated renewal. If renewal breaks you notice in hours, not when production falls over.
Short-lived certificates
step-ca defaults to 24-hour certificate lifetimes. This is intentional. A compromised certificate is only valid for hours. There is no certificate revocation to manage. If a service is compromised, its cert expires before an attacker can reuse it elsewhere.
Renewal before expiry
Renew at 2/3 of the certificate's lifetime. For a 24-hour cert, renew at 16 hours. For a 90-day cert, renew at 60 days. This gives a large buffer for renewal failures — you have 8 hours to fix the renewal system before the cert expires.
Automatic renewal with step
step ca renew --daemon runs in the background and renews the certificate automatically when it reaches 2/3 of its lifetime. It authenticates to the CA with the current certificate and key rather than through ACME, writes the new cert atomically, and runs a configured reload command.
What happens when a cert expires
Every TLS client rejects the connection immediately — including your own services. curl fails with "certificate has expired." psql fails. gRPC fails. The service is effectively down. Automatic renewal makes this a non-event. Manual renewal makes it a 2am incident.
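The two-thirds rule above is easy to compute for any lifetime. This is a minimal sketch with hypothetical function names; it only does the date arithmetic and does not talk to a CA.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Moment at which 2/3 of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


def failure_buffer(not_before: datetime, not_after: datetime) -> timedelta:
    """Time left to repair a broken renewal before the cert expires."""
    return not_after - renewal_time(not_before, not_after)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=24)
    print("renew at:", renewal_time(issued, expires))
    print("buffer:  ", failure_buffer(issued, expires))
```

For a 24-hour certificate this gives a renewal point 16 hours after issuance and an 8-hour buffer; for a 90-day certificate, renewal at day 60 with 30 days to spare.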
Renewal daemon for a service
# systemd service to run step renewal daemon for postgres
cat > /etc/systemd/system/step-renew-postgres.service <<'EOF'
[Unit]
Description=step-ca certificate renewal for PostgreSQL
After=step-ca.service network.target
Requires=step-ca.service
[Service]
Type=simple
# The postgres user needs permission to run the reload command below
# (for example through a polkit or sudoers rule)
User=postgres
ExecStart=/usr/local/bin/step ca renew \
/var/lib/postgresql/server.crt \
/var/lib/postgresql/server.key \
--daemon \
--exec "systemctl reload postgresql" \
--ca-url https://ca.internal:9000 \
--root /var/lib/postgresql/root.crt
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now step-renew-postgres
Monitor certificate expiry with Prometheus
x509-certificate-exporter scrapes all certificates in your Kubernetes cluster and exposes their expiry as Prometheus metrics. Alert before they expire.
# Install x509-certificate-exporter
helm repo add enix https://charts.enix.io
helm install x509-certificate-exporter enix/x509-certificate-exporter \
--namespace monitoring \
--set watchDirectories[0]=/etc/ssl/certs \
--set watchFiles[0]=/etc/letsencrypt/live/example.com/fullchain.pem
# Prometheus alert rules: fire when cert expires in less than 14 days
groups:
  - name: tls-certificates
    rules:
      - alert: CertificateExpiringSoon
        expr: |
          x509_cert_not_after - time() < 14 * 24 * 3600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring soon: {{ $labels.subject_CN }}"
          description: "Certificate {{ $labels.subject_CN }} expires in {{ $value | humanizeDuration }}"
      - alert: CertificateExpired
        expr: |
          x509_cert_not_after - time() < 0
        labels:
          severity: critical
        annotations:
          summary: "Certificate EXPIRED: {{ $labels.subject_CN }}"
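# For non-Kubernetes services: blackbox exporter checks TLS expiry
# (exposed as probe_ssl_earliest_cert_expiry). The https_2xx module must trust
# the internal CA via tls_config.ca_file in blackbox.yml.
# prometheus.yml
scrape_configs:
  - job_name: tls-probe
    metrics_path: /probe
    params:
      module: [https_2xx]
    static_configs:
      - targets:
          - https://api.internal
          - https://grafana.internal:3000
          # PostgreSQL negotiates TLS inside its own wire protocol, so an
          # https:// probe of postgres.internal:5432 always fails; monitor
          # its certificate file instead.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label so targets stay distinct
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115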
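To debug a target before adding it to the scrape config, the probe endpoint can be queried by hand. A sketch, assuming the exporter is reachable at `blackbox-exporter:9115` as in the relabel rule above. `probe_cert_expiry` and `BLACKBOX_ADDR` are illustrative names. `probe_ssl_earliest_cert_expiry` is the blackbox exporter metric that holds the earliest expiry in the served chain as a Unix timestamp.

```shell
# Address of the blackbox exporter; override via the environment if needed.
BLACKBOX_ADDR=${BLACKBOX_ADDR:-blackbox-exporter:9115}

# probe_cert_expiry TARGET [MODULE]
# Asks the exporter to probe TARGET and prints the expiry timestamp it reports.
# Prints nothing if the probe never completed a TLS handshake.
probe_cert_expiry() {
    curl -fsS -G "http://${BLACKBOX_ADDR}/probe" \
        --data-urlencode "target=$1" \
        --data-urlencode "module=${2:-https_2xx}" |
        awk '$1 == "probe_ssl_earliest_cert_expiry" { print $2 }'
}

# Example (commented out):
# probe_cert_expiry https://api.internal
```

An empty result usually means the probe itself failed; appending `&debug=true` to the probe URL in a browser or curl shows the exporter's own log for that attempt.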
10. ZFS and Certificate Storage
Your CA's private key is the most sensitive data in your infrastructure. It signs every certificate. If it leaks, an attacker can impersonate any service in your network. Store it correctly from the start.
Encrypted ZFS dataset for CA keys
# Create an encrypted dataset for CA state
zfs create -o encryption=aes-256-gcm \
  -o keylocation=prompt \
  -o keyformat=passphrase \
  rpool/ca
# Move step-ca's home to the encrypted dataset (stop the CA first so files are not changing)
systemctl stop step-ca
zfs create rpool/ca/step-ca
rsync -av /home/step/.step/ /rpool/ca/step-ca/
rm -rf /home/step/.step
ln -s /rpool/ca/step-ca /home/step/.step
# The dataset is locked at rest and unlocked only while the CA is running
# To start the CA: load the key, mount the datasets, then start the service
zfs load-key rpool/ca
zfs mount rpool/ca
zfs mount rpool/ca/step-ca
systemctl start step-ca
# To stop and lock: unmount first, because unload-key fails on a mounted dataset
systemctl stop step-ca
zfs unmount rpool/ca/step-ca
zfs unmount rpool/ca
zfs unload-key rpool/ca
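`zfs load-key` makes the key available but does not by itself mount anything, so a CA started too early would see an empty directory behind the symlink. A small guard, sketched under the assumption that the dataset names match the ones used above; `ca_dataset_ready` is an illustrative name.

```shell
# ca_dataset_ready DATASET
# Succeeds only if the dataset's encryption key is loaded and the dataset
# is mounted. Uses the standard `keystatus` and `mounted` ZFS properties.
ca_dataset_ready() {
    [ "$(zfs get -H -o value keystatus "$1")" = "available" ] &&
        [ "$(zfs get -H -o value mounted "$1")" = "yes" ]
}

# Example (commented out): refuse to start the CA on a locked dataset.
# ca_dataset_ready rpool/ca/step-ca && systemctl start step-ca
```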
Snapshot before key rotation
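# Before any CA operation that modifies keys (-r includes rpool/ca/step-ca):
zfs snapshot -r rpool/ca@pre-rotation-$(date +%Y%m%d-%H%M%S)
# Rotate the intermediate CA by signing a new one with the root key.
# (step ca rekey only rekeys leaf certificates.) This assumes the offline root
# key volume is mounted at /mnt/ca-offline; the subject name is an example.
systemctl stop step-ca
sudo -u step step certificate create "Internal Intermediate CA" \
  /home/step/.step/certs/intermediate_ca.crt \
  /home/step/.step/secrets/intermediate_ca_key \
  --profile intermediate-ca \
  --ca /home/step/.step/certs/root_ca.crt \
  --ca-key /mnt/ca-offline/root_ca_key \
  --password-file /etc/kldload/ca-password \
  --force
systemctl start step-ca
# Verify the CA is healthy after rotation
step ca health --ca-url https://ca.internal:9000
# If something went wrong, stop the CA and roll back the dataset that holds its state
zfs rollback rpool/ca/step-ca@pre-rotation-20260402-140000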
Replicate CA state to DR site
# Send the CA dataset to the DR site as a raw stream (-w) so the blocks stay encrypted
# Without -w, zfs send decrypts the data; with it, the receiving site cannot read the data without the key
snap=replication-$(date +%Y%m%d)
zfs snapshot -r rpool/ca@"$snap"
zfs send -R -w rpool/ca@"$snap" | \
  ssh dr-site.internal zfs receive backup/ca
# Automated replication with syncoid (ships with sanoid). syncoid takes its settings
# on the command line rather than from a config file; --sendoptions=w requests raw sends.
# Run from cron
0 */6 * * * /usr/sbin/syncoid --recursive --sendoptions=w --no-privilege-elevation rpool/ca dr-site.internal:backup/ca
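The claim that the DR site cannot read the data only holds if the replica arrived as a raw encrypted stream and its key was never loaded there. A quick check from the primary, sketched with the host and dataset names used above; `dr_copy_locked` is an illustrative name.

```shell
# dr_copy_locked HOST DATASET
# Succeeds if the replica on HOST is an encrypted dataset whose key is not
# loaded, which is the expected state for a raw (zfs send -w) replica.
dr_copy_locked() {
    enc=$(ssh "$1" zfs get -H -o value encryption "$2") || return 1
    key=$(ssh "$1" zfs get -H -o value keystatus "$2") || return 1
    [ "$enc" != "off" ] && [ "$key" = "unavailable" ]
}

# Example (commented out):
# dr_copy_locked dr-site.internal backup/ca || echo "DR replica is readable!"
```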
Offline root key procedure
# After CA initialization, move the root key offline
# The CA only needs the intermediate key for day-to-day operation
# Back up the root key to encrypted offline storage (USB drive with LUKS)
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 ca-offline
mkfs.ext4 /dev/mapper/ca-offline
mkdir -p /mnt/ca-offline
mount /dev/mapper/ca-offline /mnt/ca-offline
cp /home/step/.step/secrets/root_ca_key /mnt/ca-offline/
cp /home/step/.step/certs/root_ca.crt /mnt/ca-offline/
sync
umount /mnt/ca-offline
cryptsetup close ca-offline
# Remove the root key from online storage
# (keep only the intermediate key on the CA server)
# Note: ZFS is copy-on-write, so shred cannot overwrite the old blocks in place,
# and any earlier snapshot or replica still contains the key. Destroy those as
# well, or move the root key offline before the first snapshot is taken.
shred -u /home/step/.step/secrets/root_ca_key
# The CA can still issue certificates using the intermediate key
# The root key is only needed to sign a new intermediate (rare)
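Because the online copy is destroyed in the last step, it is worth confirming that the offline copy is byte-identical while the USB volume is still mounted, before running `shred`. A minimal sketch; `same_content` is an illustrative name and `sha256sum` is assumed to be available (GNU coreutils).

```shell
# same_content FILE_A FILE_B
# Succeeds only if both files exist and have identical SHA-256 digests.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Example (commented out): only remove the online key if the backup matches.
# same_content /home/step/.step/secrets/root_ca_key /mnt/ca-offline/root_ca_key \
#     && shred -u /home/step/.step/secrets/root_ca_key
```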
11. Troubleshooting
When TLS fails it fails loudly — connections are rejected before any application data flows. These tools let you inspect exactly what is happening at the certificate layer.
openssl s_client — inspect any TLS connection
# Connect to a server and show the full certificate chain
openssl s_client -connect api.internal:443 -showcerts
# Connect and verify against a specific CA
openssl s_client -connect api.internal:443 \
-CAfile /home/step/.step/certs/root_ca.crt
# Test with SNI (required for virtual hosting)
openssl s_client -connect api.internal:443 -servername api.internal
# Test a specific TLS version
openssl s_client -connect api.internal:443 -tls1_3
# Show certificate expiry
openssl s_client -connect api.internal:443 </dev/null 2>/dev/null | \
  openssl x509 -noout -dates
# Test mTLS — present a client certificate
openssl s_client -connect api.internal:443 \
-cert /etc/ssl/client.crt \
-key /etc/ssl/client.key \
-CAfile /home/step/.step/certs/root_ca.crt
Certificate inspection
# Inspect a certificate file
openssl x509 -in server.crt -noout -text
# Show just subject and SAN
openssl x509 -in server.crt -noout -subject -ext subjectAltName
# Show expiry
openssl x509 -in server.crt -noout -dates
# Verify a certificate against a CA
openssl verify -CAfile /home/step/.step/certs/root_ca.crt server.crt
# Verify the full chain
openssl verify -CAfile root_ca.crt -untrusted intermediate_ca.crt leaf.crt
# step certificate inspect (more readable output)
step certificate inspect server.crt
curl against internal CA
# If you haven't bootstrapped trust, provide the CA cert explicitly
curl --cacert /home/step/.step/certs/root_ca.crt https://api.internal/health
# Test with client certificate (mTLS)
curl --cacert /home/step/.step/certs/root_ca.crt \
--cert /etc/ssl/client.crt \
--key /etc/ssl/client.key \
https://api.internal/health
# Verbose — shows handshake details, certificate chain, cipher suite
curl -v --cacert /home/step/.step/certs/root_ca.crt https://api.internal/health 2>&1 | \
grep -E "(TLS|SSL|certificate|subject|issuer|expire|Verify)"
Common failure modes
| Error message | Root cause | Fix |
|---|---|---|
| certificate has expired | Not After timestamp is in the past | Renew the certificate. Fix the renewal automation so it doesn't happen again. |
| certificate is not yet valid | Clock skew: system clock is behind the cert's Not Before | Sync NTP: chronyc makestep. Ensure all nodes run chronyd. |
| certificate signed by unknown authority | CA root not in trust store | Run step ca bootstrap on the client. Or pass --cacert to curl. |
| x509: certificate is valid for api.internal, not db.internal | Wrong SAN: connecting to a name not in the certificate | Reissue with the correct SAN, or add the name to the existing SAN list. |
| tls: certificate required | Server requires client certificate (mTLS), client presented none | Issue a client certificate with EKU=clientAuth, configure the client to present it. |
| remote error: tls: bad certificate | Server rejected the client's certificate: wrong CA or missing clientAuth EKU | Ensure client cert is signed by the CA the server trusts. Check EKU includes clientAuth. |
| handshake failure | No cipher suite in common, or TLS version mismatch | Check ssl_protocols config. Ensure both sides support TLS 1.2 or 1.3. |
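The wrong-SAN failure can be checked offline, before a certificate is deployed. A sketch using the `-checkhost` option of `openssl x509`, which prints whether a hostname matches the certificate; `cert_matches_host` is an illustrative name.

```shell
# cert_matches_host CERT_FILE HOSTNAME
# Succeeds if HOSTNAME is covered by the certificate's names. OpenSSL prints
# "Hostname <name> does match certificate" or "... does NOT match certificate",
# so the check greps for the positive form.
cert_matches_host() {
    openssl x509 -in "$1" -noout -checkhost "$2" 2>/dev/null |
        grep -q "does match certificate"
}

# Example (commented out):
# cert_matches_host server.crt db.internal || echo "db.internal missing from SAN"
```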
step-ca diagnostics
# Check CA health
step ca health --ca-url https://ca.internal:9000
# List provisioners
step ca provisioner list --ca-url https://ca.internal:9000
# View CA logs
journalctl -u step-ca -f
# List CA admins (requires remote provisioner management to be enabled)
step ca admin list --ca-url https://ca.internal:9000
# Inspect the CA's root certificate (/roots.pem serves PEM; /roots returns JSON)
step certificate inspect \
  <(curl -sk https://ca.internal:9000/roots.pem) \
  --format json | jq '.validity'
Related pages
- WireGuard Masterclass — Curve25519 transport encryption for host tunnels
- Cilium Masterclass — eBPF-accelerated mTLS for Kubernetes pods
- nftables Masterclass — firewall rules that restrict CA port access
- ZFS Encryption — protecting CA key material at rest
- Kubernetes on KVM — the cluster that cert-manager runs on
- Monitoring Stack — Prometheus alerting on certificate expiry
- Security — kldload security posture overview