Chapter 2: Architecture and Runtime Components
April 13, 2026
Welcome to Chapter 2: Architecture and Runtime Components. In this part of Tabby Tutorial: Self-Hosted AI Coding Assistant Architecture and Operations, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Tabby is more than a single completion endpoint. It is a layered runtime that combines server services, context processing, and editor-facing agent bridges.
Learning Goals
- map major runtime components and boundaries
- understand request flow from editor to model backend
- identify where to place custom integrations and controls
Core Component Map
| Component | Responsibility |
|---|---|
| Tabby server | serves completion/chat APIs and admin web UI |
| tabby-agent | LSP bridge between editors and Tabby APIs |
| editor extension | user interaction layer for completion/chat |
| model backends | completion/chat/embedding inference providers |
| indexing subsystem | builds repository and document context |
Runtime Flow
    sequenceDiagram
        participant Dev as Developer
        participant Ext as Editor Extension
        participant Agent as tabby-agent
        participant Srv as Tabby Server
        participant Model as Model Backend
        Dev->>Ext: trigger completion/chat
        Ext->>Agent: send LSP request
        Agent->>Srv: call Tabby API
        Srv->>Model: inference request with context
        Model-->>Srv: generated output
        Srv-->>Agent: completion/chat payload
        Agent-->>Ext: LSP response
        Ext-->>Dev: inline result
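The hops in the diagram can be sketched as a small set of layered types. This is only an illustration of the layering, not Tabby's code: every name here (ModelBackend, EchoBackend, Server, Agent, EditorRequest) is hypothetical, and the real agent and server communicate over LSP and HTTP rather than direct function calls.

```rust
// Hypothetical stand-ins for the hops in the sequence diagram.
// None of these types exist in Tabby; they only mirror the layering.

/// Model backend: turns a prompt into generated text.
trait ModelBackend {
    fn infer(&self, prompt: &str) -> String;
}

/// Toy backend that reports the prompt size so the flow is observable.
struct EchoBackend;

impl ModelBackend for EchoBackend {
    fn infer(&self, prompt: &str) -> String {
        format!("<completion for {} prompt bytes>", prompt.len())
    }
}

/// Server layer: prepends indexed context, then calls the model.
struct Server<B: ModelBackend> {
    backend: B,
    context_snippets: Vec<String>,
}

impl<B: ModelBackend> Server<B> {
    fn complete(&self, prefix: &str) -> String {
        let mut prompt = self.context_snippets.join("\n");
        if !prompt.is_empty() {
            prompt.push('\n');
        }
        prompt.push_str(prefix);
        self.backend.infer(&prompt)
    }
}

/// What the editor extension hands to the agent (greatly simplified).
struct EditorRequest {
    text_before_cursor: String,
}

/// Agent layer: translates an editor request into a server call.
struct Agent<B: ModelBackend> {
    server: Server<B>,
}

impl<B: ModelBackend> Agent<B> {
    fn handle(&self, req: &EditorRequest) -> String {
        self.server.complete(&req.text_before_cursor)
    }
}

fn main() {
    let agent = Agent {
        server: Server {
            backend: EchoBackend,
            context_snippets: vec!["fn helper() {}".to_string()],
        },
    };
    let reply = agent.handle(&EditorRequest {
        text_before_cursor: "fn main() {".to_string(),
    });
    println!("{}", reply);
}
```

In this sketch the editor-facing side never sees the context snippets or the backend type, which is the property the diagram implies: each layer talks only to its immediate neighbor.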
Repository Structure Orientation
| Path | Why You Care |
|---|---|
| clients/ | extension and agent-side integration patterns |
| crates/ | core server/runtime internals implemented in Rust |
| website/docs/ | operational and configuration guidance |
| ee/ | enterprise-oriented modules and integrations |
Design Implications
- API and LSP boundaries let teams update editor adapters independently.
- Model provider abstraction enables mixed local and remote deployment strategies.
- Context indexing is a first-class system, not an optional add-on.
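To make the mixed local and remote idea concrete, here is a minimal sketch of a per-role provider selection. The Provider enum, the ModelRoles struct, the model ids, and the endpoint URL are all hypothetical and do not reflect Tabby's actual configuration schema; the sketch only shows how one deployment could serve different model roles from different places.

```rust
/// Where a given model role is served from.
/// Hypothetical: this is not Tabby's configuration schema.
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    /// Model loaded and run on the same host as the server.
    Local { model_id: String },
    /// Model reached over the network at some endpoint.
    Remote { endpoint: String },
}

/// One provider per role, mirroring the completion/chat/embedding split
/// from the component map.
#[derive(Debug, Clone)]
struct ModelRoles {
    completion: Provider,
    chat: Provider,
    embedding: Provider,
}

impl ModelRoles {
    /// Names of the roles that depend on a network hop. Handy when
    /// reviewing which requests leave the host.
    fn remote_roles(&self) -> Vec<&'static str> {
        let mut roles = Vec::new();
        for (name, provider) in [
            ("completion", &self.completion),
            ("chat", &self.chat),
            ("embedding", &self.embedding),
        ] {
            if matches!(provider, Provider::Remote { .. }) {
                roles.push(name);
            }
        }
        roles
    }
}

fn main() {
    // Placeholder ids and URL, chosen only for illustration.
    let roles = ModelRoles {
        completion: Provider::Local { model_id: "example-completion-model".into() },
        chat: Provider::Remote { endpoint: "https://llm.example.internal/v1".into() },
        embedding: Provider::Local { model_id: "example-embedding-model".into() },
    };
    println!("roles served remotely: {:?}", roles.remote_roles());
}
```

Keeping the provider choice per role is one way to place latency-sensitive completion on local hardware while sending heavier chat requests elsewhere; whether that split suits a given team depends on its hardware and data-handling constraints.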
Summary
You now have a structural map for where behavior lives and how requests move across Tabby.
Next: Chapter 3: Model Serving and Completion Pipeline
Source Code Walkthrough
Use the following upstream sources to verify architecture and runtime component details while reading this chapter:
- Cargo.toml: the Rust workspace manifest that lists the crates composing the Tabby runtime (tabby, tabby-common, tabby-inference, tabby-index, tabby-crawler) and the enterprise ee/ components.
- crates/tabby-inference/src/lib.rs: the inference abstraction layer that defines the TextGenerationStream trait implemented by each model backend (llama.cpp, GGML, HTTP API).
Suggested trace strategy:
- read Cargo.toml to map the crate dependency graph and identify which crates handle serving, inference, and indexing
- trace tabby-inference/src/lib.rs to understand the trait abstraction that decouples the API server from specific model backends
- review crates/tabby-common/ for shared data types (completion request/response, configuration structs) used across crates
How These Components Connect
    flowchart LR
        A[HTTP API request] --> B[tabby crate: request routing]
        B --> C[tabby-inference: TextGenerationStream trait]
        C --> D[Backend: llama.cpp or GGML or HTTP model]
        D --> E[Completion tokens streamed back]
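The decoupling in this flowchart can be illustrated with a simplified trait. The trait name follows the one mentioned in this chapter, but the synchronous, iterator-based signature below is an assumption made so the example runs without an async runtime; check crates/tabby-inference/src/lib.rs for the real definition. CannedBackend, EchoWordsBackend, and handle_completion are hypothetical names.

```rust
/// Simplified sketch of a streaming text-generation trait.
/// Assumption: the upstream trait is likely async and takes generation
/// options; this synchronous version only illustrates the decoupling.
trait TextGenerationStream {
    fn generate<'a>(&'a self, prompt: &'a str) -> Box<dyn Iterator<Item = String> + 'a>;
}

/// Hypothetical backend that replays a fixed list of tokens.
struct CannedBackend {
    tokens: Vec<String>,
}

impl TextGenerationStream for CannedBackend {
    fn generate<'a>(&'a self, _prompt: &'a str) -> Box<dyn Iterator<Item = String> + 'a> {
        Box::new(self.tokens.iter().cloned())
    }
}

/// Hypothetical backend that streams the prompt's words back one by one.
struct EchoWordsBackend;

impl TextGenerationStream for EchoWordsBackend {
    fn generate<'a>(&'a self, prompt: &'a str) -> Box<dyn Iterator<Item = String> + 'a> {
        Box::new(prompt.split_whitespace().map(|word| word.to_string()))
    }
}

/// Request-routing layer: depends only on the trait, never on a
/// concrete backend type.
fn handle_completion(backend: &dyn TextGenerationStream, prompt: &str) -> String {
    let mut output = String::new();
    for token in backend.generate(prompt) {
        // A real server would forward each token to the client here.
        output.push_str(&token);
    }
    output
}

fn main() {
    let backends: Vec<Box<dyn TextGenerationStream>> = vec![
        Box::new(CannedBackend {
            tokens: vec!["Ok".to_string(), "(())".to_string()],
        }),
        Box::new(EchoWordsBackend),
    ];
    for backend in &backends {
        println!("{}", handle_completion(backend.as_ref(), "let x = 1;"));
    }
}
```

Because handle_completion accepts a trait object, swapping one backend for another in this sketch requires no change to the routing code, which is the boundary to look for when tracing the upstream crates.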