llama-panel
A native macOS desktop app for managing and interacting with llama-server instances. Built with Tauri.
Features
- Download models from HuggingFace -- live search, automatic split-file handling, download progress with ETA
- Run multiple servers simultaneously -- each model gets its own random port, manage them all from one place
- Auto-detects llama-server -- finds the binary on your PATH or common install locations
- Downloads to the standard HuggingFace cache (~/.cache/huggingface/hub/) -- shared with huggingface-cli, LM Studio, and other tools (see the example after this list)
- Open llama-server's built-in web UI in your browser with one click
- Configure server options -- context size, GPU layers, flash attention, parallel slots, and more
- Tune parameters with interactive sliders and presets (Creative, Balanced, Precise, Deterministic)
- Playground for completions and chat with performance metrics
- Live server log -- see model loading progress, layer offloading, and errors in real time
- Slot monitor with real-time polling
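Because downloads land in the standard HuggingFace cache, they are visible to huggingface-cli and any other tool that reads the same cache. If you have the huggingface_hub CLI installed, you can inspect the shared cache from the terminal:

```
# Summarise everything in the shared HuggingFace cache (requires huggingface_hub)
huggingface-cli scan-cache

# Or look at the cache directory directly
ls ~/.cache/huggingface/hub/
```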
Model Management
Search HuggingFace for GGUF models, download them, and start serving with a few clicks. Split models (multi-file GGUFs) are detected and downloaded as a bundle automatically.
- Live search -- type to search HuggingFace for GGUF models, see download counts and likes
- Smart file picker -- shows available quantizations with file sizes, groups split models into bundles
- Download progress -- real-time progress bar with ETA, visible from any tab
- Popular model suggestions -- quick-pick chips for Gemma, Qwen, Llama, Mistral, Phi, and more
- Multiple servers -- run several models at once, each on its own port. Connect, open in browser, or stop individually
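Each entry in the running-servers list is an ordinary llama-server process, so you can also poke at it from the terminal. A quick check against one instance (the panel assigns a random port, so 8080 below is just a placeholder):

```
# Is the model loaded and ready?
curl http://127.0.0.1:8080/health

# Which model is this instance serving? (OpenAI-compatible endpoint)
curl http://127.0.0.1:8080/v1/models
```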
Server Configuration
Configure llama-server options from the Server tab. Settings apply when starting any model (see the flag sketch after the list below).
- Context & Memory -- context size, GPU layers, batch size, flash attention
- Slots & Parallelism -- parallel slots, slot monitoring, continuous batching
- Endpoints & API -- expose properties, enable metrics, listen host
- Server log -- live stderr output from the running server process
- Running servers list -- see all active servers with stop/open/connect controls
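Under the hood these options correspond to standard llama-server flags. A rough sketch of an equivalent command line -- treat it as illustrative, since flag names and syntax differ between llama.cpp versions, and the model path and port here are placeholders:

```
# Context & Memory: context size, GPU layers, batch size, flash attention
# Slots & Parallelism: parallel slots, continuous batching, /slots endpoint
# Endpoints & API: metrics, properties, listen host
llama-server \
  --model ~/models/example.gguf \
  --ctx-size 8192 --n-gpu-layers 99 --batch-size 512 --flash-attn \
  --parallel 4 --cont-batching --slots \
  --metrics --props \
  --host 127.0.0.1 --port 8080
```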
Integrations
- OpenCode support -- OpenCode can connect directly to your llama-server instance via the integration panel, so you can manage models and run inference from within OpenCode
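OpenCode, like any OpenAI-compatible client, ultimately talks to llama-server's /v1 endpoints. A minimal chat request of the kind such clients send (port and model name are placeholders; llama-server serves whichever model it was started with):

```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "example-model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```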
Install
Homebrew (recommended)
```
brew tap AlexsJones/llama-panel
brew install llama-panel
```
This installs the .app bundle to /Applications and a llama-panel command on your PATH.
Download from GitHub Releases
Grab the latest .tar.gz from Releases, extract it, and drag llama-panel.app to /Applications:
```
tar -xzf llama-panel-v*.tar.gz
mv llama-panel.app /Applications/
```
From source
Requires Rust and the Tauri CLI:
```
cargo install tauri-cli
cargo tauri build
```
The .app bundle will be in target/release/bundle/macos/.
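To make a from-source build behave like the packaged releases, copy the bundle into /Applications afterwards (assuming the bundle keeps the llama-panel.app name used by the releases):

```
cp -R target/release/bundle/macos/llama-panel.app /Applications/
```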
Usage
Launch from Spotlight, the Applications folder, or the command line:
```
llama-panel
```
Development
```
# Install Tauri CLI
cargo install tauri-cli

# Run in dev mode (hot-reload for the UI)
cargo tauri dev
```
The frontend is vanilla HTML/CSS/JS in ui/ -- no build step required.