Chapter 2: Search Engine Architecture
March 2, 2026
Welcome to Chapter 2: Search Engine Architecture. In this part of the Perplexica Tutorial: AI-Powered Search Engine, you will first build an intuitive mental model, then move into concrete implementation details and practical production tradeoffs.
Understanding Perplexica's architecture is key to customizing and extending the search engine. This chapter explores every layer of the system -- from the moment a user types a query to the final synthesized answer with citations. By the end, you will know how each component communicates, where data is stored, and how to extend the architecture for your own use cases.
High-Level System Overview
Perplexica follows a layered architecture that cleanly separates concerns across the frontend, backend, search infrastructure, and AI layer. The diagram below shows the full request lifecycle.
flowchart TD
subgraph Client["Client Layer"]
A[Next.js Frontend]
B[WebSocket Connection]
end
subgraph Backend["Backend Layer"]
C[Express / API Server]
D[Route Handlers]
E[WebSocket Server]
end
subgraph Search["Search Infrastructure"]
F[SearXNG Meta-Search]
G[Tavily API]
H[Custom Search Providers]
end
subgraph AI["AI Layer"]
I[LangChain Orchestration]
J[Chat Model Providers]
K[Embedding Models]
end
subgraph Storage["Data Layer"]
L[(SQLite Database)]
M[Chat History Store]
N[Config Store]
end
A --> B
B --> E
A --> D
D --> C
C --> I
I --> F
I --> G
I --> H
I --> J
I --> K
C --> L
L --> M
L --> N
classDef client fill:#e1f5fe,stroke:#01579b
classDef backend fill:#f3e5f5,stroke:#4a148c
classDef search fill:#fff3e0,stroke:#ef6c00
classDef ai fill:#e8f5e8,stroke:#1b5e20
classDef storage fill:#fce4ec,stroke:#b71c1c
class A,B client
class C,D,E backend
class F,G,H search
class I,J,K ai
class L,M,N storage
Core Architecture Components
1. Query Processing Engine
The query processing engine is the entry point for every search. It receives the user's raw input and transforms it into a structured representation that downstream components can work with.
// src/lib/queryProcessor.ts
interface ProcessedQuery {
originalQuery: string;
reformulatedQuery: string;
searchQueries: string[];
focusMode: FocusMode;
conversationContext: Message[];
}
type FocusMode =
| 'webSearch'
| 'academicSearch'
| 'youtubeSearch'
| 'redditSearch'
| 'writingAssistant'
| 'wolframAlpha';
async function processQuery(
input: string,
history: Message[],
focusMode: FocusMode,
): Promise<ProcessedQuery> {
// Step 1: Analyze conversation context
const context = extractContext(history);
// Step 2: Reformulate query using LLM
const reformulated = await reformulateWithLLM(input, context);
// Step 3: Generate multiple search sub-queries
const searchQueries = await generateSearchQueries(reformulated);
return {
originalQuery: input,
reformulatedQuery: reformulated,
searchQueries,
focusMode,
conversationContext: history,
};
}
Key responsibilities:
| Responsibility | Description |
|---|---|
| Intent Recognition | Determines what kind of answer the user expects (factual, comparative, tutorial, etc.) |
| Query Expansion | Adds synonyms and related terms to broaden recall |
| Context Injection | Merges conversation history so follow-up questions resolve correctly |
| Focus Mode Routing | Sends the query to the correct search agent based on the selected mode |
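Context injection is the easiest of these responsibilities to picture in code. The sketch below is a minimal, hypothetical version of the extractContext helper called in processQuery. It assumes a Message shape with role and content fields (the chapter references Message without defining it) and keeps only the most recent messages so the reformulation prompt stays short.

```typescript
// Assumed message shape; the chapter references Message but does not define it.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical helper: flatten the last `maxTurns` messages into a
// prompt-ready string, one "role: content" line per message.
function extractContext(messages: Message[], maxTurns = 6): string {
  return messages
    .slice(-maxTurns) // keep only the most recent messages
    .map((m) => `${m.role}: ${m.content.trim()}`)
    .join('\n');
}

const sampleHistory: Message[] = [
  { role: 'user', content: 'What is SearXNG?' },
  { role: 'assistant', content: 'A self-hosted meta-search engine.' },
  { role: 'user', content: 'How do I run it in Docker?' },
];

// Only the last two messages are passed on to the reformulation step.
console.log(extractContext(sampleHistory, 2));
```

Capping the history is a design choice for this sketch: it trades some long-range context for a smaller, cheaper prompt.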
2. Agent and Chain Architecture
Perplexica uses LangChain agents to orchestrate multi-step reasoning. Each focus mode maps to a dedicated agent with its own prompt template and tool set.
flowchart LR
Q[User Query] --> R{Focus Mode Router}
R -->|Web| WA[Web Search Agent]
R -->|Academic| AA[Academic Search Agent]
R -->|YouTube| YA[YouTube Search Agent]
R -->|Reddit| RA[Reddit Search Agent]
R -->|Writing| WR[Writing Assistant Agent]
R -->|Wolfram| WO[Wolfram Alpha Agent]
WA --> SX[SearXNG]
AA --> SX
YA --> SX
RA --> SX
WR --> LLM[Chat Model]
WO --> WAP[Wolfram API]
SX --> RE[Result Enrichment]
WAP --> RE
RE --> LLM
LLM --> RESP[Final Response]
classDef router fill:#fff3e0,stroke:#ef6c00
classDef agent fill:#e1f5fe,stroke:#01579b
classDef tool fill:#e8f5e8,stroke:#1b5e20
classDef output fill:#f3e5f5,stroke:#4a148c
class R router
class WA,AA,YA,RA,WR,WO agent
class SX,WAP,LLM tool
class RE,RESP output
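One simple way to express the router in the diagram is a lookup table keyed by focus mode. The sketch below is illustrative only: the AgentSpec shape, the engine lists, and the routeFocusMode function are assumptions made for this example, not Perplexica's actual agent definitions.

```typescript
type FocusMode =
  | 'webSearch'
  | 'academicSearch'
  | 'youtubeSearch'
  | 'redditSearch'
  | 'writingAssistant'
  | 'wolframAlpha';

// Hypothetical description of what each agent needs.
interface AgentSpec {
  tool: 'searxng' | 'wolfram' | 'none'; // backend the agent calls, per the diagram
  engines: string[]; // SearXNG engines to query (assumed values)
}

// Record<FocusMode, AgentSpec> makes the compiler reject a missing mode.
const AGENTS: Record<FocusMode, AgentSpec> = {
  webSearch: { tool: 'searxng', engines: [] }, // empty list = instance defaults
  academicSearch: { tool: 'searxng', engines: ['arxiv', 'google scholar', 'pubmed'] },
  youtubeSearch: { tool: 'searxng', engines: ['youtube'] },
  redditSearch: { tool: 'searxng', engines: ['reddit'] },
  writingAssistant: { tool: 'none', engines: [] }, // goes straight to the chat model
  wolframAlpha: { tool: 'wolfram', engines: [] },
};

function routeFocusMode(mode: FocusMode): AgentSpec {
  return AGENTS[mode];
}

console.log(routeFocusMode('redditSearch'));
```

Keeping the mapping in a typed table means that adding a new focus mode to the FocusMode union produces a compile error until a matching agent entry exists.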
// src/agents/webSearchAgent.ts
import { RunnableSequence } from '@langchain/core/runnables';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

// chatModel, searchSearXNG, SearchResult, and synthesizeAnswer are assumed to be
// defined or imported elsewhere. searxngConfig is a hypothetical name for the
// SearXNGConfig object that searchSearXNG takes as its second argument.
const webSearchChain = RunnableSequence.from([
  // Step 1: Reformulate query for search engines
  ChatPromptTemplate.fromMessages([
    ['system', `You are a search query reformulator. Given a user question
and conversation history, produce 3 optimized search engine queries.
Return them as a JSON array of strings.`],
    ['human', '{query}'],
  ]),
  chatModel,
  new StringOutputParser(),
  // Step 2: Execute searches in parallel
  async (queries: string) => {
    // Throws if the model returns anything other than valid JSON
    const parsed: string[] = JSON.parse(queries);
    const results = await Promise.all(
      parsed.map((q: string) => searchSearXNG(q, searxngConfig)),
    );
    // Merge the per-query result lists into one flat list
    return results.flat();
  },
  // Step 3: Synthesize answer with citations
  async (results: SearchResult[]) => {
    return synthesizeAnswer(results, chatModel);
  },
]);
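The chain above passes the model's text straight to JSON.parse, which throws if the model wraps the array in prose. A more defensive parser could look like the following sketch. parseQueryList is a hypothetical helper, and falling back to the user's original question is a design assumption rather than documented Perplexica behavior.

```typescript
// Hypothetical helper: pull a list of search queries out of raw model output.
function parseQueryList(raw: string, fallback: string, max = 3): string[] {
  // Grab the span from the first "[" to the last "]" so surrounding prose is ignored.
  const match = raw.match(/\[[\s\S]*\]/);
  if (match) {
    try {
      const parsed: unknown = JSON.parse(match[0]);
      if (Array.isArray(parsed)) {
        const queries = parsed
          .filter((q): q is string => typeof q === 'string')
          .map((q) => q.trim())
          .filter((q) => q.length > 0);
        if (queries.length > 0) return queries.slice(0, max);
      }
    } catch {
      // Malformed JSON: fall through to the fallback below.
    }
  }
  // Nothing usable came back, so search for the fallback query instead.
  return [fallback];
}

console.log(parseQueryList('Sure! ["rust async", "tokio tutorial"]', 'rust async help'));
console.log(parseQueryList('Sorry, I cannot help.', 'rust async help'));
```

With a helper like this, a malformed model response degrades to a single search instead of failing the whole request.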
3. Search Aggregation System
Perplexica does not rely on a single search engine. It uses SearXNG as a meta-search engine to aggregate results from dozens of upstream providers, then optionally enriches them with Tavily's AI-optimized search API.
// src/lib/searxng.ts
interface SearXNGConfig {
apiUrl: string;
engines: string[]; // google, bing, duckduckgo, brave, etc.
categories: string[]; // general, science, it, news, etc.
language: string;
timeRange: string; // day, week, month, year
safesearch: number; // 0 = off, 1 = moderate, 2 = strict
}
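// SearchResult is assumed to be declared elsewhere with the fields mapped below.
async function searchSearXNG(
  query: string,
  config: SearXNGConfig,
): Promise<SearchResult[]> {
  const params = new URLSearchParams({
    q: query,
    format: 'json',
    engines: config.engines.join(','),
    categories: config.categories.join(','),
    language: config.language,
    time_range: config.timeRange,
    safesearch: String(config.safesearch),
  });
  const response = await fetch(`${config.apiUrl}/search?${params}`);
  // Fail loudly on HTTP errors instead of parsing an error page as JSON
  if (!response.ok) {
    throw new Error(`SearXNG request failed with status ${response.status}`);
  }
  const data = await response.json();
  // Treat a missing results array as "no results"
  return (data.results ?? []).map((r: any) => ({
    title: r.title,
    url: r.url,
    content: r.content,
    engine: r.engine,
    score: r.score,
    publishedDate: r.publishedDate,
  }));
}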
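Because several sub-queries run in parallel, the same page often comes back more than once. The following sketch shows one way to merge such lists. The SearchResult shape is reduced to the fields needed here, and normalizeUrl and dedupeResults are hypothetical helpers, not part of Perplexica or SearXNG.

```typescript
// Reduced result shape for this example (assumption).
interface SearchResult {
  title: string;
  url: string;
  content: string;
  score?: number;
}

// Normalize a URL so trivially different links compare equal:
// ignores the protocol, fragment, and trailing slashes.
function normalizeUrl(url: string): string {
  try {
    const u = new URL(url);
    const path = u.pathname.replace(/\/+$/, '');
    return `${u.hostname.toLowerCase()}${path}${u.search}`;
  } catch {
    // Not a parseable URL: fall back to a plain string comparison key.
    return url.trim().toLowerCase();
  }
}

// Keep the highest-scoring copy of each URL, then sort by score descending.
function dedupeResults(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const key = normalizeUrl(r.url);
    const existing = best.get(key);
    if (!existing || (r.score ?? 0) > (existing.score ?? 0)) {
      best.set(key, r);
    }
  }
  return Array.from(best.values()).sort(
    (a, b) => (b.score ?? 0) - (a.score ?? 0),
  );
}
```

Calling dedupeResults on the flattened output of the parallel searches keeps the context sent to the chat model free of repeated sources.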
Search provider comparison:
| Provider | Strengths | Latency | Cost |
|---|---|---|---|
| SearXNG (self-hosted) | Privacy, no rate limits, aggregation | 200-500ms | Free (self-hosted) |
| Tavily API | AI-optimized results, high relevance | 300-800ms | Pay-per-query |
| Google Custom Search | Largest index, freshest results | 100-300ms | 100 free/day, then paid |
| Bing Web Search | Strong for news and images | 150-400ms | 1000 free/month |
4. WebSocket Communication Layer
Real-time streaming of AI-generated answers is handled over WebSocket connections. This allows the frontend to display tokens as they are generated, giving users immediate feedback.
// src/websocket/connectionManager.ts
import http from 'http';
import { WebSocketServer, WebSocket } from 'ws';

interface WSMessage {
  type: 'query' | 'sources' | 'answer_chunk' | 'answer_end' | 'error';
  data: any;
  messageId: string;
}

// fetchSources and generateStreamingAnswer are assumed to be defined elsewhere.
function initWebSocketServer(server: http.Server): WebSocketServer {
  const wss = new WebSocketServer({ server });
  wss.on('connection', (ws: WebSocket) => {
    // Helper so every outgoing frame has the same shape
    const send = (type: WSMessage['type'], data: any, messageId: string) =>
      ws.send(JSON.stringify({ type, data, messageId }));

    ws.on('message', async (raw) => {
      let messageId = '';
      try {
        // ws delivers binary data, so convert it to a string before parsing
        const message: WSMessage = JSON.parse(raw.toString());
        messageId = message.messageId;
        if (message.type !== 'query') return;

        // Send sources first
        const sources = await fetchSources(message.data);
        send('sources', sources, messageId);

        // Stream answer tokens
        const stream = await generateStreamingAnswer(message.data, sources);
        for await (const chunk of stream) {
          send('answer_chunk', chunk, messageId);
        }
        send('answer_end', null, messageId);
      } catch (err) {
        // Report malformed input and downstream failures to the client
        // instead of leaving an unhandled promise rejection
        send('error', err instanceof Error ? err.message : 'Unknown error', messageId);
      }
    });
  });
  return wss;
}
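On the receiving side, the frontend has to fold these frames back into a single answer. The reducer below is a minimal sketch of that idea. AnswerState and applyMessage are names invented for this example, and the real Perplexica UI may organize its state differently.

```typescript
interface WSMessage {
  type: 'query' | 'sources' | 'answer_chunk' | 'answer_end' | 'error';
  data: any;
  messageId: string;
}

// Hypothetical client-side view of one answer in progress.
interface AnswerState {
  sources: unknown[];
  text: string;
  done: boolean;
  error?: string;
}

const emptyAnswer: AnswerState = { sources: [], text: '', done: false };

// Pure function: given the current state and one frame, return the next state.
function applyMessage(state: AnswerState, msg: WSMessage): AnswerState {
  switch (msg.type) {
    case 'sources':
      return { ...state, sources: msg.data };
    case 'answer_chunk':
      return { ...state, text: state.text + msg.data };
    case 'answer_end':
      return { ...state, done: true };
    case 'error':
      return { ...state, done: true, error: String(msg.data) };
    default:
      return state; // 'query' frames only travel from client to server
  }
}

const incoming: WSMessage[] = [
  { type: 'sources', data: [{ url: 'https://example.com' }], messageId: 'm1' },
  { type: 'answer_chunk', data: 'Hello ', messageId: 'm1' },
  { type: 'answer_chunk', data: 'world', messageId: 'm1' },
  { type: 'answer_end', data: null, messageId: 'm1' },
];

const finalState = incoming.reduce(applyMessage, emptyAnswer);
console.log(finalState.text, finalState.done);
```

Keeping the update logic in a pure function makes the streaming protocol easy to unit test without opening a socket.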
5. Data Storage Layer
Perplexica uses SQLite with Drizzle ORM for local persistence. Conversations, messages, and configuration are all stored on the host machine, which suits privacy-focused self-hosting because no chat data has to leave the server.
// src/db/schema.ts
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';
export const chats = sqliteTable('chats', {
id: text('id').primaryKey(),
title: text('title').notNull(),
createdAt: text('created_at').notNull(),
focusMode: text('focus_mode').notNull(),
});
export const messages = sqliteTable('messages', {
id: text('id').primaryKey(),
chatId: text('chat_id').references(() => chats.id),
role: text('role').notNull(), // 'user' | 'assistant'
content: text('content').notNull(),
sources: text('sources'), // JSON-serialized source array
createdAt: text('created_at').notNull(),
});
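Because the sources column is plain text holding a JSON-serialized array, reads and writes need a small conversion step. The helpers below are a hypothetical sketch of that step. The Source shape is an assumption, and decodeSources deliberately returns an empty list rather than throwing on a corrupt row.

```typescript
// Assumed shape of one stored source.
interface Source {
  title: string;
  url: string;
}

// Serialize for the nullable `sources` text column; null when there is nothing to store.
function encodeSources(sources: Source[]): string | null {
  return sources.length > 0 ? JSON.stringify(sources) : null;
}

// Parse the column back into an array, tolerating NULL and corrupt values.
function decodeSources(column: string | null): Source[] {
  if (!column) return [];
  try {
    const parsed: unknown = JSON.parse(column);
    return Array.isArray(parsed) ? (parsed as Source[]) : [];
  } catch {
    return [];
  }
}

const stored = encodeSources([{ title: 'SearXNG docs', url: 'https://docs.searxng.org' }]);
console.log(decodeSources(stored));
```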
Data Flow Architecture
The complete lifecycle of a search request follows five phases.
sequenceDiagram
participant U as User
participant FE as Next.js Frontend
participant WS as WebSocket Server
participant AG as Search Agent
participant SX as SearXNG
participant LLM as Chat Model
participant DB as SQLite
U->>FE: Enter search query
FE->>WS: Send query via WebSocket
WS->>AG: Route to focus-mode agent
AG->>LLM: Reformulate query
LLM-->>AG: Optimized sub-queries
AG->>SX: Execute parallel searches
SX-->>AG: Raw search results
AG-->>WS: Send sources to client
WS-->>FE: Display source cards
AG->>LLM: Synthesize answer with context
LLM-->>AG: Streamed answer tokens
AG-->>WS: Forward token stream
WS-->>FE: Render tokens in real-time
AG->>DB: Persist chat + messages
FE->>U: Display complete answer
Phase Breakdown
| Phase | Components | Description |
|---|---|---|
| 1. Input | Frontend, WebSocket | User query captured and sent via WebSocket |
| 2. Reformulation | Agent, LLM | Raw query transformed into optimized search queries |
| 3. Retrieval | SearXNG, Tavily | Parallel searches executed against multiple providers |
| 4. Synthesis | LLM, Agent | Results combined into a coherent, cited answer |
| 5. Delivery | WebSocket, Frontend | Answer streamed token-by-token to the UI |
Directory Structure
Understanding the file layout helps when extending Perplexica.
Perplexica/
├── src/
│ ├── agents/ # Focus-mode agent definitions
│ │ ├── webSearchAgent.ts
│ │ ├── academicSearchAgent.ts
│ │ ├── youtubeSearchAgent.ts
│ │ ├── redditSearchAgent.ts
│ │ ├── writingAssistant.ts
│ │ └── wolframAlphaSearchAgent.ts
│ ├── chains/ # LangChain runnables
│ ├── lib/ # Core utilities
│ │ ├── providers/ # LLM + embedding provider configs
│ │ ├── searxng.ts # SearXNG integration
│ │ └── linkDocument.ts # Document fetching + parsing
│ ├── routes/ # Express API routes
│ ├── websocket/ # WebSocket server logic
│ ├── db/ # Drizzle ORM schemas + migrations
│ └── config.ts # Configuration loading
├── ui/ # Next.js frontend
│ ├── app/ # App router pages
│ ├── components/ # React components
│ └── lib/ # Client-side utilities
├── docker-compose.yaml
├── Dockerfile
├── config.toml # User configuration file
└── drizzle.config.ts
Configuration System
Perplexica uses a TOML-based configuration file (config.toml) that is read at startup and can be updated at runtime through the settings UI.
# config.toml
[GENERAL]
PORT = 3001
SIMILARITY_MEASURE = "cosine" # cosine | dot | euclidean
[API_KEYS]
OPENAI = "sk-..."
GROQ = "gsk_..."
ANTHROPIC = "sk-ant-..."
[API_ENDPOINTS]
SEARXNG = "http://localhost:32768"
OLLAMA = "http://localhost:11434"
// src/config.ts
import fs from 'fs';
import toml from '@iarna/toml';
interface Config {
GENERAL: {
PORT: number;
SIMILARITY_MEASURE: 'cosine' | 'dot' | 'euclidean';
};
API_KEYS: Record<string, string>;
API_ENDPOINTS: Record<string, string>;
}
export function loadConfig(): Config {
const raw = fs.readFileSync('config.toml', 'utf-8');
return toml.parse(raw) as unknown as Config;
}
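The loader above casts the parsed TOML with "as unknown as Config", so a malformed file is not detected until a value is used. A lightweight runtime check can catch that at startup. The sketch below validates only the [GENERAL] section; validateGeneral is a hypothetical function written for this example, not part of Perplexica.

```typescript
type SimilarityMeasure = 'cosine' | 'dot' | 'euclidean';

interface GeneralConfig {
  PORT: number;
  SIMILARITY_MEASURE: SimilarityMeasure;
}

const MEASURES: SimilarityMeasure[] = ['cosine', 'dot', 'euclidean'];

// Hypothetical validator: accepts the object returned by a TOML parser
// and returns a checked [GENERAL] section or throws a descriptive error.
function validateGeneral(parsed: unknown): GeneralConfig {
  const general = (parsed as { GENERAL?: Record<string, unknown> } | null)?.GENERAL;
  if (!general) {
    throw new Error('config.toml: missing [GENERAL] section');
  }
  const port = general.PORT;
  if (typeof port !== 'number' || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('config.toml: GENERAL.PORT must be an integer between 1 and 65535');
  }
  const measure = general.SIMILARITY_MEASURE;
  if (typeof measure !== 'string' || MEASURES.indexOf(measure as SimilarityMeasure) === -1) {
    throw new Error(`config.toml: GENERAL.SIMILARITY_MEASURE must be one of ${MEASURES.join(', ')}`);
  }
  return { PORT: port, SIMILARITY_MEASURE: measure as SimilarityMeasure };
}

console.log(validateGeneral({ GENERAL: { PORT: 3001, SIMILARITY_MEASURE: 'cosine' } }));
```

Failing at startup with a message that names the offending key is usually easier to debug than a type error deep inside a request handler.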
Scalability Considerations
Horizontal Scaling
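For production deployments, each layer can be scaled independently. One caveat applies to the data tier: SQLite is a file-based database, so it works well when all application instances run on the same host, while multi-host deployments generally need a networked database such as PostgreSQL, as the diagram's data tier suggests.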
flowchart LR
subgraph LB["Load Balancer"]
NX[Nginx / Caddy]
end
subgraph App["Application Tier (N instances)"]
A1[Perplexica Instance 1]
A2[Perplexica Instance 2]
A3[Perplexica Instance N]
end
subgraph Search["Search Tier"]
S1[SearXNG Instance 1]
S2[SearXNG Instance 2]
end
subgraph DB["Data Tier"]
D1[(Primary SQLite / PostgreSQL)]
end
NX --> A1
NX --> A2
NX --> A3
A1 --> S1
A2 --> S2
A3 --> S1
A1 --> D1
A2 --> D1
A3 --> D1
classDef lb fill:#fff3e0,stroke:#ef6c00
classDef app fill:#e1f5fe,stroke:#01579b
classDef search fill:#e8f5e8,stroke:#1b5e20
classDef db fill:#fce4ec,stroke:#b71c1c
class NX lb
class A1,A2,A3 app
class S1,S2 search
class D1 db
Performance Optimization
| Strategy | Impact | Complexity |
|---|---|---|
| Query result caching (in-memory or Redis) | Reduces duplicate search calls by 40-60% | Low |
| LLM response caching with semantic keys | Saves API costs on repeated queries | Medium |
| SearXNG result pre-fetching | Lowers perceived latency for trending topics | Medium |
| WebSocket connection pooling | Handles more concurrent users per instance | Low |
| Database WAL mode for SQLite | Enables concurrent reads during writes | Low |
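The first strategy in the table, in-memory query result caching, can be sketched in a few lines. TtlCache and cacheKey below are hypothetical names for this example. The clock is injectable so expiry can be tested, and the normalization rule (trim, lowercase, collapse whitespace) is an assumption about what counts as a duplicate query.

```typescript
// Minimal in-memory cache whose entries expire after a fixed time-to-live.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Normalize so "  Rust   Async " and "rust async" share one cache entry per focus mode.
function cacheKey(query: string, focusMode: string): string {
  return `${focusMode}:${query.trim().toLowerCase().replace(/\s+/g, ' ')}`;
}

const searchCache = new TtlCache<string[]>(5 * 60 * 1000); // 5-minute TTL
searchCache.set(cacheKey('Rust async', 'webSearch'), ['https://example.com/rust-async']);
console.log(searchCache.get(cacheKey('  rust   ASYNC ', 'webSearch')));
```

A short TTL keeps results reasonably fresh. For multi-instance deployments the same interface could be backed by Redis so that all instances share one cache.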
Summary
Perplexica's architecture is designed around the principle of separation of concerns: the frontend handles presentation and streaming, the backend orchestrates agents and routing, the search layer retrieves raw data, and the AI layer synthesizes answers. Every component communicates through well-defined interfaces (WebSocket messages, HTTP endpoints, LangChain runnables), making the system modular and extensible.
Key Takeaways
- Agent-per-mode architecture -- each focus mode (web, academic, YouTube, Reddit, etc.) has its own agent with tailored prompts and tools.
- SearXNG as the search backbone -- a self-hosted meta-search engine eliminates dependency on any single commercial search API.
- WebSocket streaming -- answers are streamed token-by-token for a responsive user experience.
- TOML configuration -- all settings live in a single, human-readable file that can be updated at runtime.
- SQLite for local persistence -- conversations and settings are stored locally, preserving privacy.
Next Steps
Now that you understand how the pieces fit together, the next chapter dives deep into the AI layer -- how Perplexica connects to LLM providers, constructs prompts, and generates coherent answers from raw search results.
Ready for AI integration? Continue to Chapter 3: AI Integration
Built with insights from the Perplexica project.
What Problem Does This Solve?
Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for classDef, fill, stroke so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 2: Search Engine Architecture as an operating subsystem inside Perplexica Tutorial: AI-Powered Search Engine, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around config, Search, search as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, Chapter 2: Search Engine Architecture usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for classDef.
- Input normalization: shape incoming data so fill receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through stroke.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Source Walkthrough
Use the following upstream sources to verify implementation details while reading this chapter:
- View Repo (github.com). Why it matters: it is the authoritative reference for the upstream implementation.
Suggested trace strategy:
- search upstream code for classDef and fill to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production