gekro
GitHub LinkedIn

The Linux Edge: Why AI Breathes Better in the Kernel

8 min read
Intermediate February 22, 2026

TL;DR

Why I stopped fighting Windows registry errors and moved my entire AI engineering lab to WSL2 and Ubuntu Server.
Windows is a great OS for humans, but it's a claustrophobic environment for agents. I moved Gekro to a WSL2/Ubuntu core because AI libraries expect a POSIX-compliant heart. This post breaks down the CUDA-passthrough setup and why Docker on Linux is the only way to manage a growing fleet of autonomous services.

If you’re building an AI lab on a consumer operating system, you’re fighting a war on two fronts: your code and your OS. In the AI era, Linux isn’t just an option; it’s the native tongue of intelligence. I spent six months trying to get complex Python dependencies to play nice with Windows paths and DLLs before I finally accepted the truth: if the model weights were trained on Linux clusters, the inference should happen on Linux kernels. The moment I committed to a headless Ubuntu workflow in DFW, my debugging time dropped by 80%.

The Architecture

The “Edge” in my lab refers to the direct, unmediated access to hardware. Windows adds a layer of “Desktop Window Manager” (DWM) noise between your code and your GPU. Linux allows for Hardware Passthrough that feels like bare metal.

FeatureWindows (Native)WSL2 (Ubuntu 22.04+)Ubuntu Server (Bare Metal)
GPU AccessDirectX / CUDA (Heavy)CUDA Passthrough (Near-Native)Direct CUDA (Fastest)
I/O PerformanceFast (NTFS)Fast (inside VHDX)Extreme (ext4/zfs)
Docker EngineHyper-V VM (Slow)WSL2 Backend (Efficient)Native Cgroups (Instant)
StabilityAutomatic Updates (Risk)Managed by User99.9% Uptime

In my lab, I use WSL2 as the “Operator Console” and Ubuntu Server on my Raspberry Pi cluster for the “Production Edge.”

The Build

Setting up a high-velocity AI environment requires more than just apt install. You need to bridge the Windows hardware with the Linux logic.

1. Enabling CUDA in WSL2

The “Secret Sauce” of a Windows-based lab is the NVIDIA WSL driver. It allows the Linux kernel to “see” your PC’s GPU without a full virtual machine overhead.

# Verify GPU visibility inside WSL2
nvidia-smi

# If you see your GPU here, you're ready for local LLMs
# If not, you need to install the 'NVIDIA Game Ready' or 'Studio' driver on the Windows host.

2. The Docker-AI Stack

I never install pip packages on the host. I use Docker containers for every model. This prevents “Dependency Hell” where one agent needs Python 3.10 and another needs 3.12.

docker-compose.yml for a Local Gekro Node:

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  summarizer-agent:
    build: ./agents/summarizer
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama

3. WSL2 Note: Memory Management

By default, WSL2 will try to eat all your Windows RAM. I limit it to 16GB so my Tesla telemetry scripts in Windows don’t crash. Create a .wslconfig in your User folder:

[wsl2]
memory=16GB
processors=8

The Tradeoffs

Let’s talk about the pain: Symlinks and File Performance. If you store your code on the Windows C: drive but try to run it inside WSL2, it will be 10x slower because of the 9P protocol translation. I lost a week of productivity wondering why my vector database was crawling before I realized I had to move the entire project folder into the Linux file system (/home/rohit/gkro).

Also, VRAM is a finite resource. If I have Chrome open with 50 tabs in Windows, it’s stealing VRAM from my Llama 3 instance in WSL2. You have to learn to be a ruthless scavenger of memory in a hybrid environment.

Where This Goes

I’m moving toward a Single-Pane-of-Glass management layer—a custom dashboard in Astro that monitors the CPU/GPU temperature and RAM usage of all my Linux nodes (Pis and PCs) in real-time. The goal is to treat my home network like a mini-AWS region, where the OS is just a detail and the agents are the primary citizens.

The Lab Newsletter

Get notified when I build something worth sharing.

No filler. No frequency commitment. Just the experiments, tools, and breakdowns I'd want to read myself.

Add PUBLIC_BEEHIIV_PUBLICATION_ID to .env.local to enable the real Beehiiv embed.

Discussion

Comments

Powered by GitHub Discussions — requires a GitHub account.