kldload — your platform, your way, free

VDI Desktop — your desktop, streamed anywhere.

Run a full Linux desktop on a server and access it from any browser, any device, anywhere. Three streaming protocols: pick the one that fits your latency and compatibility needs. Add an NVIDIA GPU and you get hardware-accelerated encoding. Apart from the optional NVIDIA driver, the entire stack is open source.

What this replaces: Commercial VDI solutions that require per-seat licensing, dedicated connection brokers, and proprietary clients. This uses native Linux display capture, open streaming protocols, and a browser.

Architecture

How it works

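A headless Wayland compositor (mutter --headless) renders a virtual desktop at any resolution. wf-recorder captures the framebuffer and encodes it to H.264 (CPU or NVIDIA NVENC). The encoded stream is published to mediamtx over SRT, and mediamtx re-publishes it as HLS, WebRTC, or SRT, whichever the viewer picks. Note that wf-recorder captures through the wlroots screencopy protocol, so the compositor must support it; mutter has historically exposed screen capture through its PipeWire ScreenCast API instead, so verify the capture path with the compositor you actually run.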

nginx sits in front as a reverse proxy, serving HLS segments and proxying WebRTC signaling (the WebRTC media itself flows over UDP between the browser and mediamtx). Users connect with a browser. No client software needed.

Server captures desktop → encodes video → mediamtx distributes → browser plays it. That’s it.

HLS (port 8888): HTTP-based, 2–5s latency, works everywhere

WebRTC (port 8889): UDP between browser and mediamtx, <200ms latency, browser-native

SRT (port 8890): UDP with retransmission, <500ms latency, professional broadcast

Step 1: Install with VDI profile

Select the VDI profile in the web UI, or use an answers file:

Unattended install

# VDI server with NVIDIA GPU encoding
cat > /tmp/answers.env << 'EOF'
KLDLOAD_DISTRO=debian
KLDLOAD_DISK=/dev/sda
KLDLOAD_HOSTNAME=vdi-server
KLDLOAD_USERNAME=admin
KLDLOAD_PASSWORD=changeme
KLDLOAD_PROFILE=vdi
KLDLOAD_NVIDIA_DRIVERS=1
EOF

kldload-install-target --config /tmp/answers.env

The VDI profile installs: mutter, wf-recorder, ffmpeg, pipewire, nginx, and all Wayland dependencies. mediamtx is fetched from GitHub on first boot.
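Before relying on the profile, you can confirm that the tools it is meant to install are actually on PATH. A minimal sketch, assuming the binary names match the package list above; the function name is hypothetical. Run it as root: on Debian, nginx lives in /usr/sbin, which is not on a normal user's PATH.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which VDI tools are missing from PATH.
# The tool list is an assumption based on the VDI profile's package list.

VDI_TOOLS=(mutter wf-recorder ffmpeg pipewire nginx)

# Print each missing command on its own line; return 1 if any are missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (mediamtx only appears after first boot):
#   missing_tools "${VDI_TOOLS[@]}" mediamtx || echo "install the tools listed above"
```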

Step 2: First boot (automatic)

kldload-firstboot detects the VDI profile and configures everything:

✓ mediamtx installed

Latest stable from GitHub. Configured for SRT ingest on :8890, HLS on :8888, WebRTC on :8889.

✓ nginx reverse proxy

HLS segments served with CORS headers. WebRTC proxied with WebSocket upgrade. Health endpoint at /health.

✓ Session launcher

kldload-vdi-session script starts a headless Wayland desktop and streams it via SRT to mediamtx.

✓ Systemd services

mediamtx and nginx enabled at boot. Sessions managed individually.
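For reference, the ports listed above map onto a handful of mediamtx.yml keys. This is a sketch of a minimal config, not the file kldload-firstboot actually writes; the target path and the choice to bind the API to localhost are assumptions. Recent mediamtx releases name the catch-all path entry `all_others`; older ones used `all`.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal mediamtx.yml matching the ports listed above.
# Only listener/API keys are set; everything else keeps mediamtx defaults.

write_mediamtx_conf() {
  local conf=$1
  cat > "$conf" <<'EOF'
# Control API (used by the troubleshooting commands), localhost only
api: yes
apiAddress: 127.0.0.1:9997

# SRT ingest from wf-recorder, and SRT playback
srt: yes
srtAddress: :8890

# HLS for browsers and devices without WebRTC
hls: yes
hlsAddress: :8888

# WebRTC player and signaling over HTTP; media uses a separate UDP port
webrtc: yes
webrtcAddress: :8889

paths:
  # Accept any path name (session1, session2, ...) from a publisher
  all_others:
EOF
}

# Usage (as root): write_mediamtx_conf /etc/mediamtx.yml && systemctl restart mediamtx
```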

Step 3: Launch a desktop session

Start a session

# Launch session 1 (each session gets its own virtual desktop)
kldload-vdi-session 1 &

# Launch session 2 for a second user
kldload-vdi-session 2 &

# Each session is an independent Wayland desktop streaming to mediamtx

Connect from a browser

# HLS (works on any device, any browser)
http://vdi-server/hls/session1

# WebRTC (lowest latency, Chrome/Firefox/Edge)
http://vdi-server/webrtc/session1

# SRT (professional, use VLC or OBS)
srt://vdi-server:8890?streamid=read:session1

For HLS and WebRTC there is no client to install and no plugin. Open a browser and you have a desktop.
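All three viewer URLs follow one pattern, so a small helper can print them for any session number. A sketch; the function name is hypothetical and the paths simply mirror the examples above. When you pass the SRT URL to a command, quote it, because `?` is a shell glob character.

```shell
#!/usr/bin/env bash
# Sketch: print the HLS, WebRTC and SRT viewer URLs for one session,
# following the nginx paths (/hls/, /webrtc/) and SRT port used above.

session_urls() {
  local host=$1 n=$2
  printf 'HLS:    http://%s/hls/session%s\n' "$host" "$n"
  printf 'WebRTC: http://%s/webrtc/session%s\n' "$host" "$n"
  printf 'SRT:    srt://%s:8890?streamid=read:session%s\n' "$host" "$n"
}

# Example: session_urls vdi-server 2
```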

GPU-accelerated encoding

With NVIDIA GPU

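If NVIDIA drivers are installed, the session encodes with NVENC automatically (wf-recorder --codec h264_nvenc). The GPU encodes video while the CPU stays free for user applications. One GPU can encode 10+ sessions simultaneously, although GeForce drivers cap the number of concurrent NVENC sessions; professional and data center cards do not.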

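# Manual session with NVENC (this is what kldload-vdi-session does internally)
mutter --wayland --headless --virtual-monitor 1920x1080 &
sleep 2   # give the compositor time to create its Wayland socket
# An srt:// URL has no file extension, so the muxer must be named explicitly
wf-recorder --audio --codec h264_nvenc --muxer mpegts \
  --file "srt://127.0.0.1:8890?streamid=publish:session1&pkt_size=1316"

# Check GPU utilization
nvidia-smi             # wf-recorder should appear in the process list
nvidia-smi dmon -s u   # the "enc" column shows NVENC utilization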

Without GPU (CPU encoding)

No GPU? No problem. libx264 with the ultrafast preset handles 1080p on a modern multi-core CPU (budget roughly two cores per session). Quality is slightly lower and CPU usage is higher, but it works.

# wf-recorder takes one codec parameter per -p (--codec-param) flag
wf-recorder --audio --codec libx264 --muxer mpegts \
  -p preset=ultrafast -p tune=zerolatency \
  --file "srt://127.0.0.1:8890?streamid=publish:session1&pkt_size=1316"
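The GPU and CPU commands above differ only in the codec and its parameters, so a launcher can pick between them at start-up. A sketch under stated assumptions: the function names and the WF_RECORDER dry-run override are hypothetical, and this is not the code of kldload-vdi-session. Checking `ffmpeg -encoders` is a proxy, since wf-recorder links libavcodec directly, usually the same build the ffmpeg CLI uses.

```shell
#!/usr/bin/env bash
# Sketch of encoder selection for a session launcher (hypothetical helper names).

WF_RECORDER=${WF_RECORDER:-wf-recorder}   # set WF_RECORDER=echo for a dry run

# Prefer NVENC only if an NVIDIA GPU is visible AND libavcodec has h264_nvenc.
pick_codec() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1 &&
     ffmpeg -hide_banner -encoders 2>/dev/null | grep -q h264_nvenc; then
    echo h264_nvenc
  else
    echo libx264
  fi
}

# Stream the current Wayland session to mediamtx as path "session<N>".
start_stream() {
  local n=$1 codec
  codec=$(pick_codec)
  local args=(--audio --muxer mpegts --codec "$codec")
  if [[ $codec == libx264 ]]; then
    # Software path: favor speed and minimal buffering over quality
    args+=(-p preset=ultrafast -p tune=zerolatency)
  fi
  $WF_RECORDER "${args[@]}" \
    --file "srt://127.0.0.1:8890?streamid=publish:session${n}&pkt_size=1316"
}
```

Keeping the check in one function means a host that loses its GPU (or driver) falls back to libx264 instead of failing to start the session.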

Scaling — multiple users, one server

Session management

# Launch sessions for 10 users
for i in $(seq 1 10); do
  kldload-vdi-session "$i" &
  echo "Session $i started — http://vdi-server/webrtc/session${i}"
done

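# List active sessions (pgrep, unlike ps | grep, does not match itself)
pgrep -af kldload-vdi-session

# Kill a specific session ("$" keeps session 3 from also matching 30, 31, ...)
pkill -f "kldload-vdi-session 3$"
# If mutter or wf-recorder outlive the launcher, stop them too (check with pgrep -a)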

Resource planning

CPU encoding: ~2 cores per 1080p session
NVENC encoding: ~10–20 sessions per GPU (NVENC is dedicated silicon; GeForce drivers cap concurrent encode sessions, professional and data center cards do not)
RAM: ~500MB per session (Wayland compositor + apps)
Network: ~3–8 Mbps per session at 1080p

A single server with 32GB RAM and one NVIDIA GPU without an NVENC session cap can serve 15+ concurrent desktop sessions.
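The planning figures combine into a simple minimum: a server supports as many sessions as its tightest resource allows. A back-of-the-envelope sketch using the per-session numbers above; HOST_RESERVE_GB (RAM held back for the OS, ZFS ARC, nginx and mediamtx) is an assumption, and real numbers will be lower because the CPU also runs user applications.

```shell
#!/usr/bin/env bash
# Capacity estimate from the planning figures above (integer arithmetic).

HOST_RESERVE_GB=${HOST_RESERVE_GB:-4}   # assumed RAM kept for the host itself

# usage: max_sessions RAM_GB CPU_CORES UPLINK_MBPS [NVENC_SESSIONS]
# NVENC_SESSIONS=0 (default) means CPU encoding at ~2 cores per session.
max_sessions() {
  local ram_gb=$1 cores=$2 uplink_mbps=$3 nvenc=${4:-0}
  local by_ram by_net by_enc n
  by_ram=$(( (ram_gb - HOST_RESERVE_GB) * 1024 / 500 ))  # ~500 MB per session
  by_net=$(( uplink_mbps / 8 ))                          # worst case ~8 Mbps per session
  if (( nvenc > 0 )); then
    by_enc=$nvenc                                        # GPU encoder limit
  else
    by_enc=$(( cores / 2 ))                              # ~2 cores per 1080p session
  fi
  n=$by_ram
  if (( by_net < n )); then n=$by_net; fi
  if (( by_enc < n )); then n=$by_enc; fi
  echo "$n"
}
```

For example, `max_sessions 32 16 1000 15` gives 15 (GPU-bound), the same box on a 100 Mbps uplink gives 12 (network-bound), and `max_sessions 32 16 1000` without NVENC gives 8 (CPU-bound).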

ZFS integration

Per-user datasets

# Each VDI user gets their own ZFS dataset
# (adduser.local hook creates this automatically on user creation)
zfs list -r rpool/home
# NAME                  USED  AVAIL  REFER  MOUNTPOINT
# rpool/home            1.2G  60G    96K    /home
# rpool/home/alice      400M  60G    400M   /home/alice
# rpool/home/bob        350M  60G    350M   /home/bob

# Set per-user quotas
zfs set quota=10G rpool/home/alice
zfs set quota=10G rpool/home/bob

# Snapshot all user data before maintenance
ksnap /home

# User broke their desktop? Roll back their home only
ksnap rollback /home/alice

Each user’s home directory is an independent ZFS dataset. Snapshot one, roll back one, set a quota on one, without affecting anyone else.
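ksnap is kldload's own tool. If you want to script the equivalent steps with stock OpenZFS commands, here is a minimal sketch; it makes no claim about how ksnap works internally. It assumes the rpool/home/&lt;user&gt; layout shown above, and the "maint-&lt;timestamp&gt;" snapshot naming is a hypothetical convention.

```shell
#!/usr/bin/env bash
# Sketch: per-user snapshot and rollback with plain zfs commands.
# Set ZFS="echo zfs" for a dry run that only prints the commands.

ZFS=${ZFS:-zfs}

# Snapshot rpool/home and every user dataset under it in one atomic
# operation; print the tag so it can be used for a later rollback.
snapshot_homes() {
  local tag
  tag="maint-$(date +%Y%m%d-%H%M%S)"
  $ZFS snapshot -r "rpool/home@${tag}" || return
  echo "$tag"
}

# Roll one user's home back to a tagged snapshot (stop their session first).
# -r destroys any newer snapshots of that dataset, which zfs requires when
# the target is not the most recent snapshot.
rollback_home() {
  local user=$1 tag=$2
  $ZFS rollback -r "rpool/home/${user}@${tag}"
}
```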

Input forwarding

Keyboard and mouse

WebRTC can carry input alongside the video: keyboard and mouse events travel over a data channel on the same peer connection, provided the WebRTC endpoint relays data channel messages to the session. For HLS/SRT (video-only protocols), you need a separate input channel:

# evemu-tools for input injection (installed by VDI profile)
# xdotool for keyboard/mouse simulation (acts on X11/XWayland windows only)
# xclip for clipboard sync (X11/XWayland only; Wayland apps need wl-clipboard)

# The kldload-webui provides a thin JavaScript layer that captures
# keyboard/mouse events and sends them to the server via WebSocket.
# The server injects them into the Wayland session via evemu.

WebRTC is the best experience: video and input in one connection. HLS/SRT give you view-only unless you add the input layer.
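On the server side, the input layer boils down to translating small event messages into evemu calls. A sketch under loud assumptions: the message format ("key KEY_A 1", "btn BTN_LEFT 0", "rel REL_X -5") and the INPUT_DEV default are hypothetical, not the kldload-webui wire protocol, and the target must be an evdev device (for example one created with evemu-device) that the compositor actually reads.

```shell
#!/usr/bin/env bash
# Sketch: turn text input messages into evemu-event calls.
# Set EVEMU=echo for a dry run that only prints the arguments.

INPUT_DEV=${INPUT_DEV:-/dev/input/event0}   # assumed virtual input device
EVEMU=${EVEMU:-evemu-event}

inject_line() {
  local kind code value ev_type
  read -r kind code value <<<"$1"
  case "$kind" in
    key|btn) ev_type=EV_KEY ;;   # keys and mouse buttons are both EV_KEY
    rel)     ev_type=EV_REL ;;   # relative pointer motion and wheel
    *) echo "ignoring unknown message: $1" >&2; return 1 ;;
  esac
  # Accept only evdev code names and integer values (1 = press, 0 = release)
  if [[ ! "$code" =~ ^(KEY|BTN|REL)_[A-Z0-9_]+$ || ! "$value" =~ ^-?[0-9]+$ ]]; then
    echo "rejecting malformed message: $1" >&2
    return 1
  fi
  $EVEMU "$INPUT_DEV" --type "$ev_type" --code "$code" --value "$value" --sync
}

# Typical loop, fed by whatever bridges the browser's WebSocket to stdin:
#   while IFS= read -r msg; do inject_line "$msg" || true; done
```

Validating each field before it reaches evemu-event keeps a malformed or hostile message from injecting arbitrary arguments.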

Remote access over WireGuard

Secure VDI over the internet

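# On the VDI server: WireGuard is already installed (VDI profile)
# Generate the server key pair first; a quoted heredoc would write
# "$(wg genkey)" into the file literally instead of running it.
umask 077
wg genkey | tee /etc/wireguard/server.key | wg pubkey > /etc/wireguard/server.pub

cat > /etc/wireguard/wg0.conf << WG
[Interface]
Address = 10.99.0.1/24
ListenPort = 51820
PrivateKey = $(cat /etc/wireguard/server.key)

[Peer]
PublicKey = CLIENT_PUBKEY
AllowedIPs = 10.99.0.2/32
WG

wg-quick up wg0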

# Client connects via WireGuard, then opens browser to:
# http://10.99.0.1/webrtc/session1
#
# All traffic is encrypted. No VPN client beyond WireGuard.
# Only UDP 51820 needs to be reachable (forward it if the server is behind NAT);
# firewall the HTTP, WebRTC and SRT ports on the public interface so they
# are reachable only through the tunnel.

WireGuard encrypts the tunnel. WebRTC encrypts the stream. Two layers of encryption, zero configuration complexity.
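Each client needs a matching config. A sketch that renders one to match the server above (server 10.99.0.1, client 10.99.0.2, port 51820); the function name, the endpoint host name and the keepalive value are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: render a WireGuard client config for the VDI tunnel.

render_client_conf() {
  local client_priv=$1 server_pub=$2 endpoint=$3 client_ip=${4:-10.99.0.2}
  cat <<EOF
[Interface]
Address = ${client_ip}/32
PrivateKey = ${client_priv}

[Peer]
PublicKey = ${server_pub}
Endpoint = ${endpoint}
# Route only the VDI subnet through the tunnel
AllowedIPs = 10.99.0.0/24
# Keep NAT mappings open so the server can reach the client (e.g. clipboard relay)
PersistentKeepalive = 25
EOF
}

# Usage on the client (wg from wireguard-tools):
#   umask 077
#   priv=$(wg genkey); pub=$(echo "$priv" | wg pubkey)
#   render_client_conf "$priv" "$(cat server.pub)" vdi.example.com:51820 > wg0.conf
#   # then replace CLIENT_PUBKEY in the server's wg0.conf with "$pub"
```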

Audio, microphone & clipboard

Video streams over WebRTC/SRT. Everything else — audio output, microphone input, clipboard sync, USB forwarding — rides the WireGuard back plane. Encrypted, low latency, always on.

PipeWire audio over WireGuard

# Server side: PipeWire is already installed (VDI profile)
# It captures audio from the Wayland session natively

# wf-recorder captures audio alongside video when --audio is set.
# Plain --audio records the default source; to capture what the desktop
# plays, name the output sink's monitor instead: --audio=<sink>.monitor
wf-recorder --audio --codec h264_nvenc --muxer mpegts -C libopus \
  --file "srt://127.0.0.1:8890?streamid=publish:session1&pkt_size=1316"

# For WebRTC: browsers accept Opus but not AAC, hence -C libopus above.
# mediamtx does not transcode, so the path is PipeWire → wf-recorder → mediamtx → browser

Microphone forwarding (client → server)

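# Client sends mic audio over WireGuard to the server's PipeWire (pulse protocol)
# On the VDI server: accept pulse-protocol clients from the WireGuard subnet
pactl load-module module-native-protocol-tcp auth-ip-acl=10.99.0.0/24
# ...and create a virtual mic: a null sink whose monitor is exposed as a source
pactl load-module module-null-sink sink_name=vdi_mic
pactl load-module module-remap-source master=vdi_mic.monitor source_name=vdi_mic_in

# On the client: record the local mic and play it into the server's vdi_mic sink
parec --format=s16le --rate=48000 --channels=1 | \
  PULSE_SERVER=tcp:10.99.0.1 pacat --format=s16le --rate=48000 --channels=1 --device=vdi_mic
# Applications in the session then select vdi_mic_in as their microphone

# Or use PipeWire's network modules (e.g. module-rtp-sink/module-rtp-source or
# module-pulse-tunnel) pointed at the peer's WireGuard address. WireGuard does not
# carry multicast, so mDNS discovery will not work; configure addresses explicitly.

Video goes out via WebRTC. Mic comes back via WireGuard. Two paths, both encrypted, both low latency.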

Clipboard sync

# xclip is installed by the VDI profile (X11/XWayland apps); the relay below
# uses wl-paste from wl-clipboard for the Wayland clipboard
# Clipboard data travels over the WebRTC data channel (WebRTC mode)
# or via a small WebSocket service over WireGuard (HLS/SRT mode)

# Simple clipboard relay over WireGuard:
# Server polls the clipboard and sends changes to a listener on the client
LAST=""
while true; do
  NEW=$(wl-paste 2>/dev/null)
  if [[ "$NEW" != "$LAST" ]]; then
    printf '%s' "$NEW" | socat -u - TCP:10.99.0.2:9999
    LAST="$NEW"
  fi
  sleep 0.5
done

The WireGuard back plane

All non-video traffic rides the WireGuard tunnel between client and server:

Video: WebRTC / SRT / HLS (direct or via nginx)
Audio out: embedded in the video stream (wf-recorder --audio)
Mic in: PipeWire network stream over WireGuard
Clipboard: WebSocket or data channel over WireGuard
Keyboard/mouse: WebRTC data channel or evemu over WireGuard
USB devices: usbredir over WireGuard (optional)
File transfer: SFTP/SCP over WireGuard
WireGuard is the nervous system. Video is the only thing that takes a different path — because it needs the bandwidth.

Streaming protocols — when to use which

Protocol  Latency  Transport  Input     Best for
WebRTC    <200ms   UDP        Native    Interactive desktop use
SRT       <500ms   UDP        Separate  Reliable streaming over bad networks
HLS       2–5s     HTTP       Separate  View-only, maximum compatibility

Encoding: H.264 vs H.265

H.264 (AVC)

Universal support. Every browser, every device. Lower compression efficiency but faster encoding. Use this for VDI — latency matters more than file size.

H.265 (HEVC)

Up to roughly 50% lower bitrate at the same quality. But browser support is incomplete (no Firefox on Linux). Better for recording/archiving than live streaming.

For VDI: always H.264. It’s fast, it’s everywhere, and at VDI bitrates the bandwidth H.265 would save rarely outweighs its compatibility and encoding-latency cost.

Troubleshooting

# Check mediamtx is running
systemctl status mediamtx

# Check active streams (requires "api: yes" in mediamtx.yml)
curl -s http://localhost:9997/v3/paths/list | jq .

# Check nginx proxy
curl -s http://localhost/health

# Check if wf-recorder is capturing
pgrep -a wf-recorder

# Check NVENC availability (ffmpeg lists what it was built with, not whether a GPU is present)
ffmpeg -encoders 2>/dev/null | grep nvenc
nvidia-smi -L

# Test with VLC (SRT direct); quote the URL so the shell leaves "?" alone
# vlc "srt://vdi-server:8890?streamid=read:session1"