NVIDIA on kldload

NVIDIA GPU support works on every distro kldload supports — CentOS, RHEL, Rocky, Fedora, Debian, Ubuntu, Alpine, and Arch. This page covers driver installation, CUDA toolkit, container GPU sharing, AI inference with Ollama, and the ZFS memory balancing act. The ai profile bakes NVIDIA + Ollama in at install time; this page is for everyone else.

GPUs on Linux are simultaneously the most powerful hardware you can add and the most annoying to set up. The driver situation is fragmented: NVIDIA's proprietary driver, the open-source nvidia-open kernel module (Turing+), and the community nouveau driver all exist for different use cases. DKMS rebuilds break on kernel updates. Secure Boot requires module signing. Every distro has a different installation path. This page gives you the exact commands for each distro, the common failure modes, and the container GPU sharing trick that makes one GPU serve your entire stack — transcoding, AI inference, and monitoring simultaneously.

The easy path: Select the ai profile during kldload install. It installs NVIDIA drivers, Ollama, and the container toolkit automatically on first boot. You boot into a working GPU + AI system. This page covers the manual path for other profiles (desktop, server, kvm) and post-install setup.

During install (CentOS/RHEL only)

Set KLDLOAD_NVIDIA_DRIVERS=1 in the answers file before starting the install. The installer will:

Add the NVIDIA CUDA repository for your RHEL version
Install nvidia-driver, nvidia-driver-libs, and nvidia-driver-cuda

Web UI

Select the NVIDIA option in the hardware section of the web UI before clicking Install.

Unattended install

cat > /tmp/answers.env << 'EOF'
KLDLOAD_DISTRO=centos
KLDLOAD_DISK=/dev/vda
KLDLOAD_HOSTNAME=gpu-node
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=changeme
KLDLOAD_PROFILE=desktop
KLDLOAD_NVIDIA_DRIVERS=1
EOF

kldload-install-target --config /tmp/answers.env
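
Before launching an unattended install, it can help to confirm the answers file sets the keys you expect. The sketch below is a hypothetical helper, not part of kldload. The function name check_answers and the choice of which keys count as required are assumptions taken from the example above.

```shell
# Hypothetical pre-flight check (not part of kldload): confirm that an
# answers file sets every key used in the example above.
# The list of "required" keys is an assumption taken from that example.
check_answers() {
    file=$1
    missing=0
    for key in KLDLOAD_DISTRO KLDLOAD_DISK KLDLOAD_HOSTNAME \
               KLDLOAD_USERNAME KLDLOAD_PASSWORD KLDLOAD_PROFILE; do
        # A key counts as set only if it has a non-empty value.
        if ! grep -q "^${key}=." "$file"; then
            echo "missing: ${key}" >&2
            missing=1
        fi
    done
    # Note (but do not fail) when the NVIDIA switch is absent.
    if ! grep -q '^KLDLOAD_NVIDIA_DRIVERS=1$' "$file"; then
        echo "note: KLDLOAD_NVIDIA_DRIVERS=1 is not set" >&2
    fi
    return "$missing"
}
```

Run it as check_answers /tmp/answers.env && kldload-install-target --config /tmp/answers.env so the install only starts when the file looks complete.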

Post-install: CentOS / RHEL

If you didn’t enable NVIDIA during install, add it afterward. The CUDA repo URL is detected from your OS version automatically:

# Detect RHEL major version
RHEL_VER=$(. /etc/os-release && echo $VERSION_ID | cut -d. -f1)

# Add the CUDA repo
dnf install -y \
  https://developer.download.nvidia.com/compute/cuda/repos/rhel${RHEL_VER}/x86_64/cuda-repo-rhel${RHEL_VER}-12.9.0-1.x86_64.rpm

# Modern GPUs (Turing / RTX 20 series and newer) — prefer open module
dnf install -y nvidia-open nvidia-driver-libs nvidia-driver-cuda

# Legacy GPUs (pre-Turing) — proprietary driver
# dnf install -y nvidia-driver nvidia-driver-libs nvidia-driver-cuda

# Reboot to load the kernel module
reboot

Post-install: Debian

Debian installs need the non-free repo and the nvidia-driver package. The correct repo codename is detected automatically:

# Detect Debian codename (bookworm, trixie, etc.)
DEBIAN_CODENAME=$(. /etc/os-release && echo $VERSION_CODENAME)

# Add non-free to sources
cat > /etc/apt/sources.list.d/nvidia.list << EOF
deb http://deb.debian.org/debian ${DEBIAN_CODENAME} main contrib non-free non-free-firmware
EOF

apt update
apt install -y nvidia-driver firmware-misc-nonfree

reboot

This requires internet access — the NVIDIA driver is not included in the offline darksite.

Post-install: Ubuntu

Ubuntu is the simplest platform — the ubuntu-drivers tool detects and installs the correct driver automatically:

# Install the driver detection tool
apt install -y ubuntu-drivers-common

# Auto-install the recommended driver for your GPU
ubuntu-drivers autoinstall

reboot

To install a specific version instead:

# List available drivers
ubuntu-drivers devices

# Install a specific version
apt install -y nvidia-driver-570

reboot

Post-install: Proxmox

Proxmox uses a custom kernel (pve-kernel) which requires matching pve-headers — not standard linux-headers. This is the most common failure point.

# Install the correct headers for the running Proxmox kernel
apt install -y pve-headers-$(uname -r) build-essential dkms

# Download and install the NVIDIA driver with DKMS support
NVIDIA_VERSION=570.144
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run

chmod +x NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run
./NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run --dkms

reboot

After reboot, verify the driver loaded and /dev/nvidia* devices exist:

nvidia-smi
ls -al /dev/nvidia*

With drivers installed on the Proxmox host, every LXC container can share the GPU directly via CUDA time-slicing — no PCIe passthrough required. See GPU sharing below.

Verify

nvidia-smi

Expected output shows your GPU model, driver version, CUDA version, temperature, and memory usage.
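
For scripts and health checks, the query mode of nvidia-smi is easier to parse than the full table. The sketch below formats one line of query output. format_gpu_line is a hypothetical helper name, and the choice of fields is only an illustration.

```shell
# Sketch: turn one line of
#   nvidia-smi --query-gpu=name,driver_version,memory.used,memory.total \
#              --format=csv,noheader,nounits
# into a readable summary. format_gpu_line is a hypothetical helper name.
format_gpu_line() {
    # Fields are comma-separated; with nounits, memory values are in MiB.
    awk -F', *' '{ printf "%s (driver %s): %d/%d MiB used\n", $1, $2, $3, $4 }'
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.used,memory.total \
               --format=csv,noheader,nounits | format_gpu_line
fi
```

Keeping the parsing in a separate function means it can be exercised with a saved sample line on a machine that has no GPU.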

CUDA toolkit

For GPU computing (machine learning, rendering, etc.), install the full CUDA toolkit after the driver is working.

CentOS / RHEL

dnf install -y cuda-toolkit

Debian

# Detect Debian version for the correct repo
DEBIAN_VERSION_ID=$(. /etc/os-release && echo $VERSION_ID)

curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/debian${DEBIAN_VERSION_ID}/x86_64/cuda-keyring_1.1-1_all.deb \
  -o /tmp/cuda-keyring.deb
dpkg -i /tmp/cuda-keyring.deb
apt update
apt install -y cuda-toolkit

Ubuntu

# Ubuntu uses the same NVIDIA CUDA repo
UBUNTU_VERSION=$(. /etc/os-release && echo $VERSION_ID | tr -d '.')

curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb \
  -o /tmp/cuda-keyring.deb
dpkg -i /tmp/cuda-keyring.deb
apt update
apt install -y cuda-toolkit

Verify CUDA

nvcc --version

# Compile and run a sample
cat > /tmp/hello.cu << 'EOF'
#include <stdio.h>
__global__ void hello() { printf("Hello from GPU thread %d\n", threadIdx.x); }
int main() { hello<<<1, 8>>>(); cudaDeviceSynchronize(); }
EOF

nvcc /tmp/hello.cu -o /tmp/hello_cuda && /tmp/hello_cuda

GPU sharing

This is the part nobody tells you about. With NVIDIA drivers installed on the host and containers running on top, every container can share the same GPU simultaneously. No passthrough. No SR-IOV. The NVIDIA driver handles time-slicing natively.

This changes the economics of GPU usage completely. Traditional thinking: one GPU per VM via PCIe passthrough. That means a $1500 GPU serves one workload. With containers on bare metal (Docker or Podman), that same GPU serves every container simultaneously: Jellyfin transcoding, Ollama running inference, a monitoring container scraping GPU metrics, and a training job, all at the same time. The NVIDIA driver time-slices the GPU between processes automatically. No vGPU license is needed, because vGPU licensing covers sharing a GPU across VMs, not across containers on one host. No scheduling configuration either: pass --gpus all and every container sees the GPU. The one shared limit is VRAM, which is not partitioned, so the combined working sets of all containers must fit in it. On kldload with ZFS, each container's data lives on its own dataset: snapshotable, compressed, replicated. The GPU is shared; the storage is isolated.

What this means in practice

You can run Jellyfin transcoding a 4K stream, an AI inference container running Ollama, and a monitoring container scraping GPU metrics — all at the same time, on one GPU, on one machine.

With PCIe passthrough, one VM locks the GPU. Nobody else can touch it. With containers on bare metal, every container gets a share. That’s the difference.

Install the NVIDIA Container Toolkit

# Add the NVIDIA container toolkit repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' \
  > /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update && apt install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

Run GPU-accelerated containers (Docker)

# Jellyfin with GPU transcoding
docker run -d --name jellyfin --gpus all \
  -p 8096:8096 \
  -v /srv/media:/media \
  -v /srv/jellyfin/config:/config \
  jellyfin/jellyfin

# Ollama for local AI inference
docker run -d --name ollama --gpus all \
  -p 11434:11434 \
  -v /srv/ollama:/root/.ollama \
  ollama/ollama

# Pull a model and run it
docker exec ollama ollama pull llama3
docker exec ollama ollama run llama3 "Explain ZFS in one sentence"

Run GPU-accelerated containers (Podman)

# Podman uses CDI (Container Device Interface) for GPU access
# Generate the CDI spec
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Podman uses --device instead of --gpus
podman run -d --name ollama \
  --device nvidia.com/gpu=all \
  -p 11434:11434 \
  -v /srv/ollama:/root/.ollama \
  docker.io/ollama/ollama

podman exec ollama ollama pull llama3
podman exec ollama ollama run llama3 "What is ZFS?"

Verify GPU sharing

# Watch GPU utilization across all containers
watch -n 1 nvidia-smi

# You’ll see multiple processes sharing the GPU:
#   jellyfin  — video transcode
#   ollama    — model inference
#   Each gets a slice of GPU time automatically
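
To record sharing in a script rather than watching the table, the compute-apps query of nvidia-smi lists each process with the GPU memory it holds. The sketch below totals that output. summarize_gpu_apps is a hypothetical helper name, and depending on the workload some processes may appear only in the plain nvidia-smi table rather than in this query.

```shell
# Sketch: summarize which processes hold GPU memory, from
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader,nounits
# summarize_gpu_apps is a hypothetical helper name.
summarize_gpu_apps() {
    awk -F', *' '
        { count++; total += $3; printf "%s (pid %s): %d MiB\n", $2, $1, $3 }
        END { printf "%d process(es), %d MiB total\n", count, total }
    '
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory \
               --format=csv,noheader,nounits | summarize_gpu_apps
fi
```

The total is a quick way to see how close the combined containers are to the card's VRAM.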

Why this works

NVIDIA’s driver handles time-slicing at the kernel level. When multiple processes request GPU compute, the driver schedules them onto the GPU's SMs (streaming multiprocessors) in turn. On bare metal with containers this sharing is native: no additional configuration or licensing is required.

Nouveau vs NVIDIA

kldload’s CentOS kernel ships with the open-source nouveau driver loaded by default. Installing the proprietary NVIDIA driver blacklists nouveau automatically. If you need to revert:

# CentOS / RHEL
dnf remove -y 'nvidia-driver*'
rm -f /etc/modprobe.d/nvidia.conf
dracut --force -k $(uname -r)
reboot

# Debian / Ubuntu / Proxmox
apt purge -y 'nvidia-driver*' 'nvidia-open*'
rm -f /etc/modprobe.d/nvidia.conf /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u -k $(uname -r)
reboot
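
After switching drivers in either direction, it is worth confirming which kernel module actually loaded. The sketch below classifies lsmod output. active_gpu_driver is a hypothetical helper name, and it only looks at the two module names discussed in this section.

```shell
# Sketch: report which GPU kernel driver is loaded, given `lsmod` output
# on stdin. active_gpu_driver is a hypothetical helper name.
# Usage on a live system: lsmod | active_gpu_driver
active_gpu_driver() {
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia && nouveau) print "both"
            else if (nvidia)       print "nvidia"
            else if (nouveau)      print "nouveau"
            else                   print "none"
        }
    '
}
```

A result of none after a reboot usually means the module failed to build or load, which the Troubleshooting commands can narrow down.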

ZFS and NVIDIA memory

This is the memory balancing act that every GPU + ZFS system needs to get right. ZFS's ARC wants all your RAM for caching. NVIDIA's driver pins GPU management memory in system RAM (separate from VRAM). Your AI model's inference context lives in system RAM too. If ARC takes 80% of a 32GB system, the GPU driver and your model fight over the remaining 6.4GB. The fix: cap ARC explicitly. The rule of thumb below works for most configurations, but monitor with arc_summary and nvidia-smi after setup.

To cap ARC and leave room for the driver and your workloads:

# Check current ARC max
grep -w c_max /proc/spl/kstat/zfs/arcstats

# Limit ARC to 4GB (persistent across reboots)
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs-arc.conf

# Apply on RHEL/CentOS
dracut --force -k $(uname -r)

# Apply on Debian / Ubuntu / Proxmox
update-initramfs -u -k $(uname -r)

A reasonable rule of thumb: total RAM minus GPU VRAM minus 2GB for the OS, then give half of what remains to ARC.
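
The rule of thumb can be turned into a number for zfs_arc_max with a little shell arithmetic. This is a minimal sketch of that rule only. arc_cap_bytes is a hypothetical helper name, and it assumes RAM and VRAM are given in whole GiB.

```shell
# Sketch of the rule of thumb above: (RAM - VRAM - 2 GiB for the OS) / 2.
# arc_cap_bytes is a hypothetical helper; arguments are whole GiB.
arc_cap_bytes() {
    ram_gib=$1
    vram_gib=$2
    spare_gib=$(( $ram_gib - $vram_gib - 2 ))
    if [ "$spare_gib" -le 0 ]; then
        echo "not enough RAM left for an ARC cap" >&2
        return 1
    fi
    # Half of the spare memory, expressed in bytes for zfs_arc_max.
    echo $(( $spare_gib * 1024 * 1024 * 1024 / 2 ))
}

# Example: 32 GiB of RAM and a GPU with 8 GiB of VRAM.
arc_cap_bytes 32 8    # prints 11811160064 (11 GiB)
```

The result can go into /etc/modprobe.d/zfs-arc.conf as shown earlier. To try a value without rebooting, write it to /sys/module/zfs/parameters/zfs_arc_max as root.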

Secure Boot

The proprietary NVIDIA kernel module is not signed for Secure Boot. If Secure Boot is enabled, you need to either:

Option 1 — Sign the module with your MOK key (kldload sets up MOK infrastructure during install):

# Find the MOK key kldload created
ls /var/lib/kldload/mok/

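# Sign the NVIDIA module
# (RHEL-family path shown; on Debian-family systems sign-file is under
#  /usr/src/linux-headers-$(uname -r)/scripts/)
/usr/src/kernels/$(uname -r)/scripts/sign-file sha256 \
  /var/lib/kldload/mok/MOK.priv \
  /var/lib/kldload/mok/MOK.der \
  "$(modinfo -n nvidia)"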

reboot

Option 2 — Disable Secure Boot in UEFI firmware settings.
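
Before choosing between the two options, check whether Secure Boot is actually enabled and whether the installed module already carries a signature. The sketch below does both with mokutil and modinfo. needs_signing is a hypothetical helper name, and the match assumes mokutil prints "SecureBoot enabled" when it is on.

```shell
# Sketch: decide whether module signing is needed, given the output of
# `mokutil --sb-state` on stdin. needs_signing is a hypothetical helper.
needs_signing() {
    # Assumes mokutil prints "SecureBoot enabled" or "SecureBoot disabled".
    grep -q '^SecureBoot enabled'
}

# Only run the live check when mokutil is installed.
if command -v mokutil >/dev/null 2>&1; then
    if mokutil --sb-state 2>/dev/null | needs_signing; then
        echo "Secure Boot is on: the nvidia module must be signed"
        # An unsigned module prints an empty signer field.
        modinfo -F signer nvidia 2>/dev/null
    else
        echo "Secure Boot is off: no signing required"
    fi
fi
```

If the signer line comes back empty while Secure Boot is on, the module will be rejected at load time until it is signed.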

The most common NVIDIA + ZFS failure mode: a kernel update triggers DKMS to rebuild both the ZFS module and the NVIDIA module, and one of them fails (usually NVIDIA, whose DKMS build is fragile). You reboot and either ZFS can't import the pool (no boot) or NVIDIA doesn't load (no GPU). kldload's boot environment snapshots protect you from this because the pre-update snapshot has both modules working. Run kbe activate <pre-update-snapshot> and reboot, and you're back to the last working state while you figure out the DKMS failure. This is why ZFS on root matters for GPU systems: it's the safety net for the DKMS house of cards.
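
You can also catch a failed rebuild before rebooting by checking that DKMS reports both modules as installed for the new kernel. The sketch below parses dkms status output. dkms_ready is a hypothetical helper name. The module names zfs and nvidia and the line format it matches are assumptions, since both vary between DKMS releases and driver packages.

```shell
# Sketch: before rebooting into a new kernel, confirm DKMS built both
# modules for it. Reads `dkms status` output on stdin.
# dkms_ready is a hypothetical helper. The line formats it matches
# ("name/version, kernel, arch: installed" or "name, version, kernel, ...")
# and the module names "zfs" and "nvidia" are assumptions.
dkms_ready() {
    kernel=$1
    status=$(cat)
    for module in zfs nvidia; do
        if ! printf '%s\n' "$status" \
            | grep -E "^${module}[/,]" \
            | grep -F "$kernel" \
            | grep -q 'installed'; then
            echo "DKMS: ${module} not installed for ${kernel}" >&2
            return 1
        fi
    done
    return 0
}
```

Call it as dkms status | dkms_ready <new-kernel-version> after the update. A non-zero exit is the signal to fix the build, or at least note the pre-update snapshot name, before rebooting.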

Troubleshooting

# Check if the module loaded
lsmod | grep nvidia

# If not, check for errors
dmesg | grep -i nvidia

# Kernel updated but DKMS didn’t rebuild the module
dkms status
dkms autoinstall -k $(uname -r)

# Check current desktop session type
echo $XDG_SESSION_TYPE

# Proxmox: verify /dev/nvidia* devices exist for LXC sharing
ls -al /dev/nvidia*
