llmserve

March 22, 2026

Any model. Any backend. One TUI to serve them all.

If you're like me, you've got dozens of GGUF and MLX models scattered across LM Studio, HuggingFace cache, and random directories — and you want to quickly spin one up with whichever inference engine happens to be installed. llmserve is the front door for that. It finds your models, finds your backends, and gets out of the way.

It auto-detects locally installed inference engines (llama-server, KoboldCpp, LocalAI, MLX, and more), discovers model files across multiple locations, and lets you launch servers with live log output — all from a single interactive TUI. No config files to write, no CLI flags to remember.

Sister project: Use llmfit to figure out which models fit your hardware, then use llmserve to actually run them.


Install

Quick install (macOS / Linux)

curl -fsSL https://llmserve.axjns.dev/install.sh | sh

Homebrew

brew tap AlexsJones/llmserve
brew install llmserve

Cargo

cargo install llmserve

From source

cargo install --path .

Usage

llmserve

The TUI has three panels:

| Panel | Position | Toggle | Description |
|---|---|---|---|
| Sources | Left | 1 | File tree of model locations with counts and serving indicators |
| Models | Center | Always on | Searchable, sortable model table with serve status |
| Serve/Logs | Right | 3 | Running server cards + live backend output logs |

Focus cycles between visible panels with Tab. Resize the focused panel with Shift+Left/Shift+Right.

Keybindings

| Key | Action |
|---|---|
| Tab | Cycle focus: Sources -> Models -> Logs |
| 1 / 3 | Toggle sources / logs panel |
| j/k | Navigate (works in focused panel) |
| g/G | Jump to top / bottom |
| Ctrl-d/Ctrl-u | Half page down / up |
| Shift+Left/Right | Resize focused panel |
| Enter | Models: open serve dialog / Sources: filter by source |
| Space | Sources: expand/collapse node |
| a | Add model directory (with tab-completion) |
| x | Remove custom directory (sources panel) |
| / | Search / filter models by name |
| b | Pick default backend |
| f | Cycle format filter (All / GGUF / MLX) |
| o | Cycle sort (Name / Size / Source) |
| s | Stop a server |
| S | Stop all servers |
| w | Toggle log word wrap |
| C | Clear dead server logs |
| r | Refresh models and backends |
| t | Cycle theme |
| q | Quit |

Serve dialog

When you press Enter on a model, a confirmation dialog opens:

| Key | Action |
|---|---|
| h/l or Left/Right | Cycle through backends (shows availability + already-serving status) |
| p or Tab | Edit port number |
| Enter/y | Launch server |
| Esc/n | Cancel |

The dialog shows the resolved preset for the selected backend (context size, flash attention, batch size, GPU layers, extra args).


Features

  • Auto-detects inference backends — llama-server, KoboldCpp, LocalAI, MLX (Apple Silicon), Ollama, vLLM, LM Studio
  • Source tree — collapsible file tree showing all model locations with model counts and green dots for serving models
  • Add directories live — press a, type a path with tab-completion, and the directory is scanned immediately and persisted to config
  • Filter by source — click a source in the tree to show only its models
  • Per-backend presets — context size, batch size, GPU layers, threads, and extra CLI args per backend
  • Serve multiple models — run different models on different backends simultaneously, each on its own auto-assigned port
  • Live log output — stdout/stderr from inference backends streams into the logs panel in real-time with color-coded error/warning highlighting
  • Crash diagnostics — when a server exits, its logs are preserved so you can see exactly what went wrong
  • Word wrap — press w to wrap long log lines in the logs panel
  • Resizable panels — Shift+arrows to grow/shrink any focused panel
  • Toggleable panels: 1 hides/shows sources, 3 hides/shows logs
  • 7 themes — Default, Dracula, Solarized, Nord, Monokai, Gruvbox, Catppuccin Mocha
  • Vision model support — auto-detects mmproj projector files and passes --mmproj to llama-server
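
The feature list says each server gets its own auto-assigned port but does not say how. One plausible approach, sketched below in Python rather than the project's Rust, is to start at preferred_port and walk upward until a bind succeeds. The function name next_free_port and its parameters are hypothetical and are not part of llmserve.

```python
import socket


def next_free_port(start: int = 8080, host: str = "127.0.0.1", max_tries: int = 100) -> int:
    """Return the first port >= start that can be bound on host.

    Hypothetical helper: llmserve's real port selection may differ.
    """
    # Ports above 65535 are invalid, so cap the search range.
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is taken or not permitted; try the next one
            return port
    raise RuntimeError(f"no free port found starting at {start}")


if __name__ == "__main__":
    print(next_free_port(8080))
```

A check like this is inherently racy: another process can grab the port between the probe and the backend's own bind, so a launcher would still need to handle a bind failure reported in the backend's logs.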

Configuration

Config lives at ~/.config/llmserve/config.toml. It is created automatically on first run.

# Extra directories to scan for model files
extra_model_dirs = [
    "/path/to/more/models",
]

# Global defaults
preferred_port = 8080
preferred_host = "0.0.0.0"
default_ctx_size = 8192
flash_attn = true

# Preferred backend on startup (auto-detected if not set)
# default_backend = "llama-server"

# theme = "Dracula"

Backend presets

Each backend has its own preset that overrides global defaults. Missing fields fall back to the global value.

[presets.llama-server]
ctx_size = 8192
flash_attn = true
batch_size = 2048
gpu_layers = -1          # -1 = all layers to GPU
threads = 8
extra_args = ["--mlock", "--cont-batching"]

[presets.koboldcpp]
ctx_size = 8192
gpu_layers = -1
port = 5001

[presets.localai]
ctx_size = 8192
port = 8080

[presets.mlx]
ctx_size = 4096
port = 8081

| Field | Type | Description |
|---|---|---|
| ctx_size | integer | Context window size |
| host | string | Bind address |
| port | integer | Bind port |
| flash_attn | boolean | Enable flash attention (llama-server) |
| batch_size | integer | Batch size for prompt processing |
| gpu_layers | integer | GPU layers to offload (-1 = all) |
| threads | integer | CPU threads for inference |
| extra_args | string[] | Extra CLI arguments passed to the backend |
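
To make the fallback rule concrete, here is a minimal Python sketch of how a preset could be resolved against the global defaults. It is an illustration, not llmserve's implementation: the CONFIG dict mirrors what a TOML parser would return for a config like the one above, and both resolve_preset and the GLOBAL_FALLBACKS mapping (which global key backs which preset field) are assumptions.

```python
# Parsed form of a config.toml similar to the examples above.
CONFIG = {
    "preferred_port": 8080,
    "preferred_host": "0.0.0.0",
    "default_ctx_size": 8192,
    "flash_attn": True,
    "presets": {
        "koboldcpp": {"gpu_layers": -1, "port": 5001},
        "mlx": {"ctx_size": 4096, "port": 8081},
    },
}

# Assumed mapping: preset field -> global key it falls back to.
GLOBAL_FALLBACKS = {
    "ctx_size": "default_ctx_size",
    "flash_attn": "flash_attn",
    "port": "preferred_port",
    "host": "preferred_host",
}


def resolve_preset(config: dict, backend: str) -> dict:
    """Start from the global defaults, then let the backend's preset override them."""
    resolved = {
        field: config[key] for field, key in GLOBAL_FALLBACKS.items() if key in config
    }
    resolved.update(config.get("presets", {}).get(backend, {}))
    return resolved


if __name__ == "__main__":
    print(resolve_preset(CONFIG, "mlx"))        # ctx_size and port come from the preset
    print(resolve_preset(CONFIG, "koboldcpp"))  # ctx_size falls back to default_ctx_size
```

In this sketch a backend with no preset at all simply receives the global values.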

Backend detection

llmserve detects 7 backends at startup. Backends that can serve local model files are marked "Yes" in the relevant column:

| Backend | Local GGUF | Local MLX | Detection | Env override |
|---|---|---|---|---|
| llama-server | Yes | - | which llama-server | - |
| KoboldCpp | Yes | - | binary + API :5001 | KOBOLDCPP_HOST |
| LocalAI | Yes | - | binary + API :8080 + Docker | LOCALAI_HOST |
| MLX | - | Yes | python3 -c "import mlx_lm" (macOS) | - |
| Ollama | - | - | GET :11434/api/tags | OLLAMA_HOST |
| vLLM | - | - | binary + API :8000 | VLLM_HOST |
| LM Studio | - | - | GET :1234/v1/models | LMSTUDIO_HOST |

Backends that can't serve local files (Ollama, vLLM, LM Studio) are still detected, but the serve dialog explains why they are unavailable: they use their own model registries or manage their own servers.
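
The two kinds of probe in the detection table (a binary on PATH, and an HTTP GET against a local API) can be approximated with the Python standard library. This is a sketch only: the helper names are hypothetical, the short timeout is an arbitrary choice, and the assumption that an env override may be given without a scheme (for example "localhost:11434") is a guess rather than documented llmserve behavior.

```python
import http.client
import os
import shutil
import urllib.error
import urllib.request


def binary_on_path(name: str) -> bool:
    """Rough equivalent of `which NAME`: True if an executable is found on PATH."""
    return shutil.which(name) is not None


def api_base(env_var: str, default: str) -> str:
    """Use the env override if it is set, otherwise the default local address."""
    value = os.environ.get(env_var, "").strip() or default
    # Assumption: overrides may omit the scheme, e.g. "localhost:11434".
    if "://" not in value:
        value = f"http://{value}"
    return value.rstrip("/")


def api_responds(url: str, timeout: float = 0.5) -> bool:
    """True if a GET to url returns a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False  # refused, timed out, bad URL, or non-2xx response


if __name__ == "__main__":
    print("llama-server:", binary_on_path("llama-server"))
    ollama = api_base("OLLAMA_HOST", "http://localhost:11434")
    print("Ollama:", api_responds(f"{ollama}/api/tags"))
    lmstudio = api_base("LMSTUDIO_HOST", "http://localhost:1234")
    print("LM Studio:", api_responds(f"{lmstudio}/v1/models"))
```

Keeping the timeout short matters for a TUI: a probe against a port where nothing is listening should fail quickly rather than delay startup.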

Model discovery

| Source | Default path |
|---|---|
| LM Studio | ~/.lmstudio/models/ |
| llama.cpp | ~/.cache/llm-models/ |
| HuggingFace/MLX | ~/.cache/huggingface/hub/ (mlx-community repos) |
| Ollama | Via API |
| Custom | extra_model_dirs in config |
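
For the on-disk sources, discovery amounts to walking a directory tree and collecting model files. The Python sketch below covers only GGUF files and also pairs each model with an mmproj projector file when one sits in the same directory. The matching rule (a .gguf file whose name contains "mmproj") and the function name find_gguf_models are assumptions for illustration; llmserve's actual heuristics are not described here, and MLX models, which are repository directories rather than single files, are not handled.

```python
from pathlib import Path


def find_gguf_models(root: Path) -> list[dict]:
    """Recursively list .gguf models under root, with a projector file if one is present."""
    if not root.is_dir():
        return []
    models = []
    for path in sorted(root.rglob("*.gguf")):
        if "mmproj" in path.name.lower():
            continue  # projector files accompany a model; they are not models themselves
        # Assumed rule: any mmproj .gguf in the same directory belongs to this model.
        projectors = sorted(
            p for p in path.parent.glob("*.gguf") if "mmproj" in p.name.lower()
        )
        models.append(
            {
                "name": path.stem,
                "path": path,
                "size_bytes": path.stat().st_size,
                "mmproj": projectors[0] if projectors else None,
            }
        )
    return models


if __name__ == "__main__":
    for model in find_gguf_models(Path.home() / ".lmstudio" / "models"):
        print(model["name"], model["size_bytes"], model["mmproj"])
```

When a model has a non-empty mmproj entry, its path is what would be passed to llama-server via --mmproj.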

Development

make build       # Debug build
make test        # Unit + integration tests (CI-safe)
make test-local  # All tests including local model serve rotation
make clippy      # Lint
make fmt         # Format
make install     # Install to ~/.cargo/bin

Project structure

src/
  main.rs       — Terminal init/restore, main loop
  lib.rs        — Module exports for integration tests
  app.rs        — App state, input modes, navigation, filtering, serve lifecycle
  backends.rs   — Backend detection (7 backends: llama-server, KoboldCpp, LocalAI, MLX, Ollama, vLLM, LM Studio)
  config.rs     — Config + per-backend presets, load/save TOML
  events.rs     — Crossterm event handling, vim-style keybindings
  models.rs     — Model discovery from disk + APIs
  server.rs     — Server launch/monitor/stop, non-blocking log capture
  theme.rs      — 7 color themes
  ui.rs         — Ratatui rendering (3-panel layout, popups, log viewer)
tests/
  serve_integration.rs — Integration tests (serve, verify HTTP, rotate backends)

Companion to llmfit

llmserve is designed as a companion to llmfit:

  • llmfit answers: "Which models fit my hardware?" — scores models across quality, speed, fit, and context
  • llmserve answers: "Let me serve this one right now" — picks a model, picks a backend, launches it

Both share the same TUI patterns (vim keys, ratatui, crossterm) and theme system.

License

MIT