llama-panel

A native macOS desktop app for managing and interacting with llama-server instances. Built with Tauri.

llama-panel demo

Features

  • Download models from HuggingFace -- live search, automatic split-file handling, download progress with ETA
  • Run multiple servers simultaneously -- each model gets its own random port, manage them all from one place
  • Auto-detects llama-server -- finds the binary on your PATH or common install locations
  • Downloads to the standard HuggingFace cache (~/.cache/huggingface/hub/) -- shared with huggingface-cli, LM Studio, and other tools; see the example after this list
  • Open llama-server's built-in web UI in your browser with one click
  • Configure server options -- context size, GPU layers, flash attention, parallel slots, and more
  • Tune parameters with interactive sliders and presets (Creative, Balanced, Precise, Deterministic)
  • Playground for completions and chat with performance metrics
  • Live server log -- see model loading progress, layer offloading, and errors in real time
  • Slot monitor with real-time polling
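
The shared cache means a model downloaded once is visible to every tool. As a quick illustration (a sketch -- the repo folder shown is an example; the names depend on what you've downloaded):

# llama-panel, huggingface-cli, LM Studio, and other tools all read and
# write per-repo folders under the same hub cache:
ls ~/.cache/huggingface/hub/
# models--bartowski--Qwen2.5-7B-Instruct-GGUF    (example entry)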

Model Management

Search HuggingFace for GGUF models, download them, and start serving with a few clicks. Split models (multi-file GGUFs) are detected and downloaded as a bundle automatically.

Model download and serve

  • Live search -- type to search HuggingFace for GGUF models, see download counts and likes
  • Smart file picker -- shows available quantizations with file sizes, groups split models into bundles
  • Download progress -- real-time progress bar with ETA, visible from any tab
  • Popular model suggestions -- quick-pick chips for Gemma, Qwen, Llama, Mistral, Phi, and more
  • Multiple servers -- run several models at once, each on its own port. Connect, open in browser, or stop individually; a command-line equivalent is sketched below
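
llama-panel drives all of this from the UI, but a rough command-line equivalent shows what's happening underneath. This is a sketch: the repo, file name, and port are examples, and llama-panel picks a random free port for you.

# Download one quantization into the shared HuggingFace cache
# (huggingface-cli prints the local path of the downloaded file)
huggingface-cli download bartowski/Qwen2.5-7B-Instruct-GGUF Qwen2.5-7B-Instruct-Q4_K_M.gguf

# Serve it on its own port
llama-server -m /path/printed/above.gguf --port 8081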

Server Configuration

Configure llama-server options from the Server tab. Settings apply when starting any model and map onto standard llama-server flags (see the sketch after this list).

Server configuration

  • Context & Memory -- context size, GPU layers, batch size, flash attention
  • Slots & Parallelism -- parallel slots, slot monitoring, continuous batching
  • Endpoints & API -- expose properties, enable metrics, listen host
  • Server log -- live stderr output from the running server process
  • Running servers list -- see all active servers with stop/open/connect controls
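
The sketch below shows the kind of llama-server invocation these settings correspond to. Values are illustrative, and exact flag spellings vary a little across llama.cpp versions (e.g. newer builds take --flash-attn on|off|auto).

# Context & Memory: --ctx-size, --n-gpu-layers, --batch-size, --flash-attn
# Slots & Parallelism: --parallel, --cont-batching
# Endpoints & API: --metrics, --slots, --host
llama-server -m model.gguf --ctx-size 8192 --n-gpu-layers 99 --flash-attn \
  --parallel 4 --cont-batching --metrics --slots --host 127.0.0.1 --port 8081

# The slot monitor polls the /slots endpoint enabled by --slots:
curl http://127.0.0.1:8081/slots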

Integrations

  • OpenCode support -- OpenCode can connect directly to your llama-server instance via the integration panel, enabling seamless model management and inference from within the OpenCode IDE.

Integration panel
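
The integration works because llama-server exposes an OpenAI-compatible API under /v1, which is what OpenCode (or any OpenAI-compatible client) talks to. A quick sanity check against a running server (the port is an example; llama-panel assigns one per model):

curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'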

Install

brew tap AlexsJones/llama-panel
brew install llama-panel

This installs the .app bundle to /Applications and a llama-panel command on your PATH.

Download from GitHub Releases

Grab the latest .tar.gz from Releases, extract it, and drag llama-panel.app to /Applications:

tar -xzf llama-panel-v*.tar.gz
mv llama-panel.app /Applications/

From source

Requires Rust and the Tauri CLI:

cargo install tauri-cli
cargo tauri build

The .app bundle will be in target/release/bundle/macos/.

Usage

Launch from Spotlight, the Applications folder, or the command line:

llama-panel

Development

# Install Tauri CLI
cargo install tauri-cli

# Run in dev mode (hot-reload for the UI)
cargo tauri dev

The frontend is vanilla HTML/CSS/JS in ui/ -- no build step required.

License

MIT