AI Runner
June 6, 2026
Edge AI inference engine with a web GUI: LLMs, image generation, voice chat, and agents running entirely on your hardware, at the edge.
Report Bug · Request Feature · Report Vulnerability · Wiki
What is AI Runner?
AI Runner is a privacy-first edge AI platform: all inference runs locally on your own hardware, not in the cloud. It runs a Python backend that handles model inference and exposes a REST API, paired with a React web frontend you access in your browser. Your prompts, images, and voice data never leave your machine.
Architecture at a glance:
client/ - React + Vite frontend (port 5173)
server/src/ - Python inference backend (port 8080)
Features
| Feature | Description |
|---|---|
| LLM Chat | Local LLMs via llama.cpp (GGUF), with optional OpenRouter/OpenAI backends |
| Voice Chat | Real-time speech-to-text and text-to-speech for hands-free conversations |
| Image Generation | Stable Diffusion (SD 1.5, SDXL) and FLUX with LoRA and inpainting |
| AI Agents | Configurable personalities, moods, RAG-enhanced memory, and tool use |
| Privacy First | Runs fully offline by default; no data leaves your machine |
| Web UI | React frontend, accessible from any browser on your local network |
| Optimized | GGUF quantization, attention slicing, and VRAM offloading for lower-end hardware |
System Requirements
| | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04 | Ubuntu 24.04 |
| CPU | Ryzen 2700K / i7-8700K | Ryzen 5800X / i7-11700K |
| RAM | 16 GB | 32 GB |
| GPU | NVIDIA RTX 3060 | NVIDIA RTX 4080+ |
| Storage | 22 GB SSD | 100 GB+ SSD |
| Python | 3.13.3+ | 3.13.3+ |
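Two of the requirements above (Python 3.13.3+ and the 22 GB storage minimum) are easy to check before installing. The following is a minimal sketch, not part of AI Runner itself; the function names are hypothetical, and it assumes 1 GB means 10^9 bytes and that the current directory is on the disk you plan to install to.

```python
import shutil
import sys

MIN_PYTHON = (3, 13, 3)  # minimum Python version from the requirements table
MIN_FREE_GB = 22         # minimum storage from the requirements table


def python_ok(version=None):
    """Return True if the given (or running) interpreter is 3.13.3 or newer."""
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return version >= MIN_PYTHON


def disk_ok(free_bytes, min_gb=MIN_FREE_GB):
    """Return True if free_bytes covers the minimum storage (assumes 1 GB = 10**9 bytes)."""
    return free_bytes >= min_gb * 10**9


if __name__ == "__main__":
    # Check the disk that holds the current working directory.
    free = shutil.disk_usage(".").free
    print("Python version OK:", python_ok())
    print("Disk space OK:", disk_ok(free))
```

Run it from the directory where you intend to clone the repository so the disk check looks at the right volume.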
Quick Start
Install
Clone the repo and run the install script:
git clone https://github.com/Capsize-Games/airunner.git
cd airunner
./scripts/install.sh
This installs the Python backend and all frontend dependencies.
Run
./scripts/run_web.sh
Then open your browser at http://localhost:5173.
The backend API is available at http://localhost:8080.
Logs
All server and runtime (art, TTS, STT, LLM) logs go to a single file:
tail -f build/logs/server.log
End-User Bundle (Desktop Application)
For non-developer users, AI Runner provides a self-contained desktop application
via Electron. The bundle includes an embedded Python runtime, all Python
dependencies, CUDA-accelerated llama.cpp and whisper.cpp binaries, and the
compiled React frontend, all in a single installable package.
No Python, Node.js, CMake, C++ compiler, or CUDA toolkit is required. Only an NVIDIA GPU driver (525+) is needed.
Platforms
| Platform | Installer Format | GPU |
|---|---|---|
| Linux | .AppImage, .deb | NVIDIA (CUDA, Ampere+) |
| Windows | .exe (NSIS) | NVIDIA (CUDA, Ampere+) |
Download
Pre-built installers are attached to each GitHub Release
tagged with a v* version. Look for artifacts named:
airunner-bundle-linux-*.AppImage or airunner-bundle-linux-*.deb
airunner-bundle-win32-*.exe
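If you script the download, the artifact patterns above can be matched against a release's asset names. This is a rough sketch using only the standard library; the function name and the example asset names (including the version number) are made up for illustration, and only the glob patterns come from this README.

```python
import fnmatch
import sys

# Artifact patterns from this README, keyed by sys.platform prefix.
PATTERNS = {
    "linux": ["airunner-bundle-linux-*.AppImage", "airunner-bundle-linux-*.deb"],
    "win32": ["airunner-bundle-win32-*.exe"],
}


def matching_installers(asset_names, platform=None):
    """Return the asset names that match the installer patterns for a platform."""
    platform = sys.platform if platform is None else platform
    for key, patterns in PATTERNS.items():
        if platform.startswith(key):
            return [
                name for name in asset_names
                if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
            ]
    return []  # no bundle is listed for this platform


# Hypothetical asset names, for illustration only.
example_assets = [
    "airunner-bundle-linux-0.0.0.AppImage",
    "airunner-bundle-linux-0.0.0.deb",
    "airunner-bundle-win32-0.0.0.exe",
    "checksums.txt",
]
print(matching_installers(example_assets, platform="linux"))
```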
How it works
Electron app
├── main process (Node.js)
│   ├── Spawns the embedded Python backend as a child process
│   ├── Polls GET /health until the backend is ready
│   ├── Loads the React frontend once the backend is healthy
│   └── Kills the backend on app quit
└── renderer process
    └── Loads http://localhost:8080 (served by the Python backend)
electron/resources/
├── python/ - embedded CPython 3.13 + all pip dependencies + CUDA native libs
└── web/ - compiled React frontend (client/dist/)
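The readiness check in the main process (poll GET /health until the backend answers, then load the UI) can be illustrated with a short polling loop. The real main process is Node.js, so this Python version is only a sketch of the idea. It assumes the health endpoint lives at http://localhost:8080/health and returns HTTP 200 when ready; the function names, retry count, and delay are hypothetical.

```python
import time
import urllib.request

# Assumption: port from Quick Start, path from the diagram above.
HEALTH_URL = "http://localhost:8080/health"


def backend_is_healthy(url=HEALTH_URL, timeout=2.0):
    """Return True if GET url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_for_backend(check=backend_is_healthy, attempts=30, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Short wait so the script returns quickly when no backend is running.
    print("backend ready:", wait_for_backend(attempts=3, delay=0.5))
```

Passing `check` and `sleep` as parameters keeps the loop easy to test without a running server.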
Building from source (for maintainers)
# Linux
./package/build_bundle.sh
# Windows (PowerShell)
.\package\build_bundle.ps1
Prerequisites on the build host: CUDA toolkit 12.x, CMake ≥ 3.24, Node.js ≥ 20, and a C++ compiler. These are not required on the end user's machine.
Manual Installation (Advanced)
If you need fine-grained control, three installation paths are available, each with its own script:
# Developer mode: installs from source (default for contributors)
./scripts/install.sh
# Distributed mode: for server/multi-machine deployments
./deployment/install_distributed.sh
# Single-package mode: builds a self-contained bundle (see "Building from source" above)
./package/build_bundle.sh
Python dependencies
Python 3.13.3+ is required. We recommend pyenv + venv.
Install PyTorch first:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Then install the backend package:
pip install -e "server/src/.[core,llm-native,stt-native,art-python,tts-python]"
llama-cpp-python (CUDA build)
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUDA_ARCHITECTURES=90" FORCE_CMAKE=1 \
pip install --no-binary=:all: --no-cache-dir "llama-cpp-python==0.3.21"
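The value 90 selects CUDA compute capability 9.0. RTX 4090-class GPUs report compute capability 8.9 (value 89) and RTX 5080-class GPUs report 12.0 (value 120), so set the value to match your GPU, or drop -DGGML_CUDA_ARCHITECTURES to auto-detect your GPU.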
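If you prefer to set the architecture explicitly rather than auto-detect, the value is the GPU's compute capability with the dot removed (8.9 becomes 89). The sketch below shows that conversion and one way to read the capability from `nvidia-smi`. The helper names are hypothetical, and the `compute_cap` query field needs a reasonably recent NVIDIA driver.

```python
import subprocess


def arch_value(compute_capability):
    """Convert a compute capability such as "8.9" into the architecture value "89"."""
    major, minor = compute_capability.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_arch_values():
    """Ask nvidia-smi for each GPU's compute capability and convert the results.

    Raises FileNotFoundError if nvidia-smi is not installed and
    CalledProcessError if the driver does not support the query.
    """
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return sorted({arch_value(line) for line in output.splitlines() if line.strip()})


print(arch_value("8.9"))
```

On a machine with an NVIDIA GPU, `detect_arch_values()` returns one value per distinct GPU model, which you can pass to the architectures flag in place of a hard-coded number.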
Models
Essential TTS/STT models download automatically on first run. LLM and image models must be configured manually.
| Category | Model | Size |
|---|---|---|
| LLM (default) | Ministral-8B-Instruct (GGUF) | ~4 GB |
| Image | Stable Diffusion 1.5 | ~2 GB |
| Image | SDXL 1.0 | ~6 GB |
| Image | FLUX.1 Dev/Schnell (GGUF) | 8–12 GB |
| TTS | OpenVoice | 654 MB |
| STT | Whisper Tiny | 155 MB |
Place art models in ~/.local/share/airunner/art/models/.
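To confirm that models landed in the right place, you can list what is in that directory. This is a small sketch, not an AI Runner tool: the function name is hypothetical, and the set of file extensions is an assumption about common model formats rather than a list of what AI Runner actually loads.

```python
from pathlib import Path

# Directory from this README.
ART_MODELS_DIR = Path.home() / ".local/share/airunner/art/models"
# Assumption: common model file extensions; adjust to the formats you use.
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".gguf"}


def list_art_models(root=ART_MODELS_DIR):
    """Return (relative path, size in GB) for each model-like file under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES:
            size_gb = path.stat().st_size / 10**9
            found.append((path.relative_to(root).as_posix(), size_gb))
    return found


if __name__ == "__main__":
    for name, size_gb in list_art_models():
        print(f"{size_gb:8.2f} GB  {name}")
```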
HTTPS
The local server uses HTTPS by default. Certificates are auto-generated at ~/.local/share/airunner/certs/.
For browser-trusted certificates, install mkcert:
sudo apt install libnss3-tools
mkcert -install
airunner-generate-cert
Testing
# Run the full test suite
airunner-tests
# Run daemon-safe tests directly
pytest server/src/
# With coverage
airunner-test-coverage-report
Colorado AI Act Notice
Effective February 1, 2026, the Colorado AI Act (SB 24-205) regulates high-risk AI systems. If you use AI Runner to make decisions with legal or significant effects on individuals (employment screening, loan eligibility, housing, etc.), you may be classified as a deployer of a high-risk AI system and subject to compliance obligations.
AI Runner is designed to run fully locally with no external data transmission by default. Optional features that do connect externally: model downloads (HuggingFace/CivitAI), web search (DuckDuckGo), weather prompts (Open-Meteo), and external LLM providers (OpenRouter/OpenAI) if configured. We recommend using a VPN when using these features.
Contributing
See CONTRIBUTING.md and the Development Wiki.
Documentation
License
MIT License. See LICENSE for details.
