llama-panel
A native macOS desktop app for managing and interacting with llama-server instances. Built with Tauri.
Features
- Download models from HuggingFace -- live search, automatic split-file handling, download progress with ETA
- Run multiple servers simultaneously -- each model gets its own random port, manage them all from one place
- Auto-detects llama-server -- finds the binary on your PATH or common install locations
- Downloads to the standard HuggingFace cache (~/.cache/huggingface/hub/) -- shared with huggingface-cli, LM Studio, and other tools (see the example after this list)
- Open llama-server's built-in web UI in your browser with one click
- Configure server options -- context size, GPU layers, flash attention, parallel slots, and more
- Tune parameters with interactive sliders and presets (Creative, Balanced, Precise, Deterministic)
- Playground for completions and chat with performance metrics
- Live server log -- see model loading progress, layer offloading, and errors in real time
- Slot monitor with real-time polling
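Because downloads land in the standard HuggingFace cache, they are visible to huggingface-cli and any other tool that reads the same cache. If you have the huggingface_hub CLI installed, you can inspect the shared cache from the terminal:

```
# Summarise everything in the shared HuggingFace cache (requires huggingface_hub)
huggingface-cli scan-cache

# Or look at the cache directory directly
ls ~/.cache/huggingface/hub/
```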
Model Management
Search HuggingFace for GGUF models, download them, and start serving with a few clicks. Split models (multi-file GGUFs) are detected and downloaded as a bundle automatically.
- Live search -- type to search HuggingFace for GGUF models, see download counts and likes
- Smart file picker -- shows available quantizations with file sizes, groups split models into bundles
- Download progress -- real-time progress bar with ETA, visible from any tab
- Popular model suggestions -- quick-pick chips for Gemma, Qwen, Llama, Mistral, Phi, and more
- Multiple servers -- run several models at once, each on its own port. Connect, open in browser, or stop individually
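Each entry in the running-servers list is an ordinary llama-server process, so you can also poke at it from the terminal. A quick check against one instance (the panel assigns a random port, so 8080 below is just a placeholder):

```
# Is the model loaded and ready?
curl http://127.0.0.1:8080/health

# Which model is this instance serving? (OpenAI-compatible endpoint)
curl http://127.0.0.1:8080/v1/models
```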
Server Configuration
Configure llama-server options from the Server tab. Settings apply when starting any model (see the flag sketch after the list below).
- Context & Memory -- context size, GPU layers, batch size, flash attention
- Slots & Parallelism -- parallel slots, slot monitoring, continuous batching
- Endpoints & API -- expose properties, enable metrics, listen host
- Server log -- live stderr output from the running server process
- Running servers list -- see all active servers with stop/open/connect controls
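Under the hood these options correspond to standard llama-server flags. A rough sketch of an equivalent command line -- treat it as illustrative, since flag names and syntax differ between llama.cpp versions, and the model path and port here are placeholders:

```
# Context & Memory: context size, GPU layers, batch size, flash attention
# Slots & Parallelism: parallel slots, continuous batching, /slots endpoint
# Endpoints & API: metrics, properties, listen host
llama-server \
  --model ~/models/example.gguf \
  --ctx-size 8192 --n-gpu-layers 99 --batch-size 512 --flash-attn \
  --parallel 4 --cont-batching --slots \
  --metrics --props \
  --host 127.0.0.1 --port 8080
```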
Integrations
- OpenCode support -- OpenCode can connect directly to your llama-server instance via the integration panel, so you can manage models and run inference from within OpenCode
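OpenCode, like any OpenAI-compatible client, ultimately talks to llama-server's /v1 endpoints. A minimal chat request of the kind such clients send (port and model name are placeholders; llama-server serves whichever model it was started with):

```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "example-model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```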
Install
Homebrew (recommended)
```
brew tap AlexsJones/llama-panel
brew install llama-panel
```
This installs the .app bundle to /Applications and a llama-panel command on your PATH.
Download from GitHub Releases
Grab the latest .tar.gz from Releases, extract it, and drag llama-panel.app to /Applications:
```
tar -xzf llama-panel-v*.tar.gz
mv llama-panel.app /Applications/
```
From source
Requires Rust and the Tauri CLI:
```
cargo install tauri-cli
cargo tauri build
```
The .app bundle will be in target/release/bundle/macos/.
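To make a from-source build behave like the packaged releases, copy the bundle into /Applications afterwards (assuming the bundle keeps the llama-panel.app name used by the releases):

```
cp -R target/release/bundle/macos/llama-panel.app /Applications/
```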
Usage
Launch from Spotlight, the Applications folder, or the command line:
```
llama-panel
```
Development
```
# Install Tauri CLI
cargo install tauri-cli

# Run in dev mode (hot-reload for the UI)
cargo tauri dev
```
The frontend is vanilla HTML/CSS/JS in ui/ -- no build step required.