Chapter 2: Architecture and Runtime Components

April 13, 2026 · View on GitHub

Welcome to Chapter 2: Architecture and Runtime Components. In this part of Tabby Tutorial: Self-Hosted AI Coding Assistant Architecture and Operations, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Tabby is more than a single completion endpoint. It is a layered runtime that combines server services, context processing, and editor-facing agent bridges.

Learning Goals

  • map major runtime components and boundaries
  • understand request flow from editor to model backend
  • identify where to place custom integrations and controls

Core Component Map

ComponentResponsibility
Tabby serverserves completion/chat APIs and admin web UI
tabby-agentLSP bridge between editors and Tabby APIs
editor extensionuser interaction layer for completion/chat
model backendscompletion/chat/embedding inference providers
indexing subsystembuilds repository and document context

Runtime Flow

sequenceDiagram
    participant Dev as Developer
    participant Ext as Editor Extension
    participant Agent as tabby-agent
    participant Srv as Tabby Server
    participant Model as Model Backend

    Dev->>Ext: trigger completion/chat
    Ext->>Agent: send LSP request
    Agent->>Srv: call Tabby API
    Srv->>Model: inference request with context
    Model-->>Srv: generated output
    Srv-->>Agent: completion/chat payload
    Agent-->>Ext: LSP response
    Ext-->>Dev: inline result

Repository Structure Orientation

PathWhy You Care
clients/extension and agent-side integration patterns
crates/core server/runtime internals implemented in Rust
website/docs/operational and configuration guidance
ee/enterprise-oriented modules and integrations

Design Implications

  • API and LSP boundaries let teams update editor adapters independently.
  • Model provider abstraction enables mixed local and remote deployment strategy.
  • Context indexing is a first-class system, not an optional add-on.

Source References

Summary

You now have a structural map for where behavior lives and how requests move across Tabby.

Next: Chapter 3: Model Serving and Completion Pipeline

Source Code Walkthrough

Use the following upstream sources to verify architecture and runtime component details while reading this chapter:

  • Cargo.toml — the Rust workspace manifest that lists all crates composing the Tabby runtime: tabby, tabby-common, tabby-inference, tabby-index, tabby-crawler, and the enterprise ee/ components.
  • crates/tabby-inference/src/lib.rs — the inference abstraction layer that defines the TextGenerationStream trait implemented by each model backend (llama.cpp, GGML, HTTP API).

Suggested trace strategy:

  • read Cargo.toml to map the crate dependency graph and identify which crates handle serving, inference, and indexing
  • trace tabby-inference/src/lib.rs to understand the trait abstraction that decouples the API server from specific model backends
  • review crates/tabby-common/ for shared data types (completion request/response, configuration structs) used across crates

How These Components Connect

flowchart LR
    A[HTTP API request] --> B[tabby crate: request routing]
    B --> C[tabby-inference: TextGenerationStream trait]
    C --> D[Backend: llama.cpp or GGML or HTTP model]
    D --> E[Completion tokens streamed back]