Khoj AI: Deep Dive Tutorial

June 22, 2026

Project: Khoj — An open-source, self-hostable AI personal assistant that connects to your notes, documents, and online data.

License: AGPL v3 · Language: Python

Why This Track Matters

Khoj AI is increasingly relevant for developers working with modern AI/ML infrastructure. This track helps you understand its architecture, key patterns, and production considerations.

This track focuses on:

  • getting started with installation and self-hosting
  • the architecture and how its components fit together
  • data connectors for ingesting notes and documents
  • search and retrieval over indexed content

What Is Khoj?

Khoj is an open-source AI personal assistant that transforms your scattered notes, documents, and online data into a searchable, conversational knowledge base. It combines semantic search with LLM-backed chat to help you retrieve, synthesize, and act on your personal information. Khoj can be self-hosted for full data privacy or used via the hosted service.

| Feature | Description |
|---|---|
| Semantic Search | Symmetric and asymmetric search with cross-encoder re-ranking over personal data |
| Multi-Source | Org-mode, Markdown, PDF, GitHub repos, Notion pages, web content |
| LLM Chat | Conversational AI with context from your knowledge base (OpenAI, Anthropic, Ollama) |
| Automation | Scheduled tasks, autonomous agents, and tool use for proactive assistance |
| Self-Hostable | Full Docker deployment with PostgreSQL; data stays on your infrastructure |
| Client Integrations | Obsidian plugin, Emacs package, web UI, WhatsApp, and API access |
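
The "Semantic Search" row describes a two-stage pattern: a fast embedding lookup selects candidates, then a slower cross-encoder re-scores them against the query. The sketch below illustrates only the shape of that pipeline. It is not Khoj's code: `embed` is a toy bag-of-words stand-in for a sentence-embedding model, `rerank` is a toy stand-in for a cross-encoder, and every name here (`NOTES`, `retrieve`, `rerank`, `search`) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, top_k=3):
    """Stage 1: score every entry by vector similarity, keep the top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def rerank(query, candidates):
    """Stage 2: toy stand-in for a cross-encoder that scores query and
    document together. Here: the fraction of query tokens found in the doc."""
    q_tokens = set(tokenize(query))

    def joint_score(doc):
        if not q_tokens:
            return 0.0
        return len(q_tokens & set(tokenize(doc))) / len(q_tokens)

    return sorted(candidates, key=joint_score, reverse=True)


# Hypothetical personal notes used only for this illustration.
NOTES = [
    "Meeting notes: discussed quarterly budget and hiring plan",
    "Recipe for sourdough bread with a long cold ferment",
    "Budget spreadsheet tips: track quarterly spending by category",
    "Trip packing list for the mountains",
]


def search(query, corpus=NOTES, top_k=3):
    """Run candidate retrieval, then re-rank the survivors."""
    return rerank(query, retrieve(query, corpus, top_k))


if __name__ == "__main__":
    for hit in search("sourdough bread recipe"):
        print(hit)
```

The split matters because the second stage is usually far more expensive per document than the first, so it only runs on the short candidate list. In a real deployment both stand-ins would be replaced by trained models.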

Current Snapshot (auto-updated)

  • repository: khoj-ai/khoj
  • stars: about 35.2k
  • GitHub release reference: 2.0.0-beta.28 (checked 2026-06-22; release metadata on GitHub)

Mental Model

graph TB
    subgraph Clients["Client Layer"]
        WEB[Web UI]
        OBS[Obsidian Plugin]
        EMACS[Emacs Package]
        WHATSAPP[WhatsApp]
        API_CLIENT[REST API]
    end

    subgraph Server["Khoj Server (Django)"]
        API[API Routes]
        SEARCH[Search Engine]
        CHAT[Chat Engine]
        INDEX[Indexer Pipeline]
        AUTO[Automation Engine]
    end

    subgraph Data["Data Connectors"]
        ORG[Org-mode]
        MD[Markdown]
        PDF[PDF / Docx]
        GH[GitHub]
        NOTION[Notion]
        WEB_SRC[Web Content]
    end

    subgraph Storage["Storage Layer"]
        PG[(PostgreSQL)]
        EMBED[(Embeddings)]
        FILES[(File Store)]
    end

    subgraph LLM["LLM Providers"]
        OPENAI[OpenAI]
        ANTHROPIC[Anthropic]
        OLLAMA[Ollama / Local]
    end

    Clients --> Server
    Data --> INDEX
    INDEX --> Storage
    SEARCH --> EMBED
    CHAT --> LLM
    AUTO --> CHAT
    Server --> Storage
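
To make the chat path in the diagram concrete (search supplies context, the chat engine hands it to an LLM provider), here is a minimal sketch of that flow. It assumes nothing about Khoj's internals: `Entry`, `build_prompt`, `chat`, `fake_retriever`, and `fake_llm` are hypothetical names, and the retriever and LLM are plain callables so any provider could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entry:
    """One retrieved chunk of indexed content (hypothetical shape)."""
    source: str  # e.g. a file path or page title
    text: str


def build_prompt(question: str, context: List[Entry]) -> str:
    """Number each retrieved entry so the model can cite it as [n]."""
    lines = ["Answer using only the notes below. Cite notes as [n].", ""]
    for i, entry in enumerate(context, start=1):
        lines.append(f"[{i}] ({entry.source}) {entry.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def chat(
    question: str,
    retriever: Callable[[str], List[Entry]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve context for the question, then ask the LLM with that context."""
    context = retriever(question)
    return llm(build_prompt(question, context))


# Stand-ins so the sketch runs without a database or a model provider.
def fake_retriever(question: str) -> List[Entry]:
    return [Entry("notes/budget.md", "Q3 budget was approved on Friday.")]


def fake_llm(prompt: str) -> str:
    return f"(model reply to a prompt of {len(prompt)} characters)"


if __name__ == "__main__":
    print(build_prompt("When was the budget approved?", fake_retriever("")))
    print(chat("When was the budget approved?", fake_retriever, fake_llm))
```

Passing the retriever and the LLM in as callables mirrors the diagram's separation between the search engine, the chat engine, and the provider layer, and keeps each piece testable on its own.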

Chapter Guide

| Chapter | Topic | What You'll Learn |
|---|---|---|
| 1. Getting Started | Setup | Installation, self-hosting, connecting first data sources |
| 2. Architecture Overview | Design | System components, search indexing, LLM integration pipeline |
| 3. Data Connectors | Ingestion | Org-mode, Markdown, PDF, GitHub, Notion connector internals |
| 4. Search & Retrieval | Search | Symmetric/asymmetric search, embeddings, cross-encoder ranking |
| 5. Chat Interface | Conversation | Context management, conversation threads, tool use, citations |
| 6. Automation & Agents | Agents | Scheduled tasks, autonomous actions, tool integration |
| 7. Customization & Plugins | Extensibility | Custom data types, model configuration, extensions |
| 8. Production Deployment | Operations | Docker, scaling, security, monitoring, backup strategies |

Tech Stack

| Component | Technology |
|---|---|
| Backend | Python, Django |
| Database | PostgreSQL |
| Search | Sentence Transformers embeddings, cross-encoder re-ranking |
| LLM Providers | OpenAI, Anthropic, Google, Ollama (local) |
| Frontend | Next.js (web UI) |
| Clients | Obsidian plugin, Emacs package, WhatsApp bot |
| Deployment | Docker Compose, systemd |
| Task Queue | Django Q / APScheduler for automation |

Ready to begin? Start with Chapter 1: Getting Started.


Built with insights from the Khoj repository and community documentation.

What You Will Learn

  • Core architecture and key abstractions
  • Practical patterns for production use
  • Integration and extensibility approaches

Full Chapter Map

  1. Chapter 1: Getting Started
  2. Chapter 2: Architecture Overview
  3. Chapter 3: Data Connectors
  4. Chapter 4: Search & Retrieval
  5. Chapter 5: Chat Interface
  6. Chapter 6: Automation & Agents
  7. Chapter 7: Customization & Plugins
  8. Chapter 8: Production Deployment

Generated by AI Codebase Knowledge Builder