System Architecture

May 20, 2026

Overview of Probe's architecture and component relationships.


TL;DR

Probe is a three-layer system:

  1. Rust Core: High-performance search and extraction
  2. Node.js SDK: AI agent orchestration
  3. Interfaces: CLI, Chat, MCP Server

System Overview

┌─────────────────────────────────────────────────────────────┐
│                    USER INTERFACES                          │
├──────────────────┬──────────────────┬──────────────────────┤
│  CLI             │  Chat CLI/Web    │  MCP Server          │
│  (search,        │  (examples/chat) │  (npm/src/mcp)       │
│   extract,       │                  │                      │
│   query)         │                  │                      │
└────────┬─────────┴──────────┬───────┴──────────┬───────────┘
         │                    │                   │
    ┌────▼────────────────────▼───────────────────▼───────────┐
    │           NODE.JS SDK LAYER (@probelabs/probe)          │
    ├─────────────────────────────────────────────────────────┤
    │ • ProbeAgent - AI orchestration                         │
    │ • Tool Definitions - search, query, extract, etc.       │
    │ • Binary Execution - Rust CLI wrapper                   │
    │ • Session Management - History, storage                 │
    │ • Telemetry - OpenTelemetry integration                 │
    └────┬────────────────────────────────────────────────────┘
         │
    ┌────▼────────────────────────────────────────────────────┐
    │              RUST CORE (src/)                           │
    ├─────────────────────────────────────────────────────────┤
    │  ┌────────────────┬───────────────┬──────────────────┐  │
    │  │ SEARCH         │ EXTRACT       │ LANGUAGE         │  │
    │  │                │               │                  │  │
    │  │ • Ripgrep      │ • Tree-sitter │ • Parsers        │  │
    │  │ • BM25/TF-IDF  │ • Line/Symbol │ • AST Blocks     │  │
    │  │ • Tokenization │               │ • Test Detection │  │
    │  └────────────────┴───────────────┴──────────────────┘  │
    └─────────────────────────────────────────────────────────┘

Rust Core

Search Pipeline

Input: SearchOptions

  1. Parse Query (elastic_query.rs)
  2. Tokenize & Normalize
  3. Scan Files (ripgrep)
  4. Parse AST (tree-sitter)
  5. Early Ranking (BM25)
  6. Extract Code Blocks
  7. Final Ranking
  8. Merge Adjacent Blocks
  9. Apply Limits

Output: LimitedSearchResults
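
The Merge Adjacent Blocks and Apply Limits stages can be illustrated with a small sketch. This is not Probe's implementation: the block shape (`file`, `start`, `end`, `score`), the `gap` threshold, and both function names are assumptions made for illustration.

```javascript
// Hypothetical block shape: { file, start, end, score }, lines 1-based and inclusive.

// Merge blocks from the same file whose ranges overlap or lie within `gap` lines.
function mergeAdjacentBlocks(blocks, gap = 1) {
  const sorted = [...blocks].sort(
    (a, b) => a.file.localeCompare(b.file) || a.start - b.start
  );
  const merged = [];
  for (const block of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.file === block.file && block.start <= last.end + gap) {
      // Extend the previous block and keep the better score.
      last.end = Math.max(last.end, block.end);
      last.score = Math.max(last.score, block.score);
    } else {
      merged.push({ ...block });
    }
  }
  return merged;
}

// Keep the highest-scoring blocks up to a result-count limit.
function applyLimits(blocks, maxResults) {
  return [...blocks].sort((a, b) => b.score - a.score).slice(0, maxResults);
}

const blocks = [
  { file: 'a.rs', start: 10, end: 20, score: 1.5 },
  { file: 'a.rs', start: 21, end: 30, score: 2.0 },
  { file: 'a.rs', start: 50, end: 60, score: 0.5 },
  { file: 'b.rs', start: 1, end: 5, score: 1.0 },
];
const merged = mergeAdjacentBlocks(blocks);
const limited = applyLimits(merged, 2);
console.log(limited);
```

Here the first two `a.rs` blocks collapse into one block covering lines 10 to 30, and the limit then keeps the two highest-scoring blocks.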

Key Modules

Module          Path                             Purpose
Search Runner   src/search/search_runner.rs      Main orchestration
Ripgrep         src/search/ripgrep_searcher.rs   Fast file scanning
Elastic Query   src/search/elastic_query.rs      Query parsing
Ranking         src/ranking.rs                   BM25/TF-IDF scoring
SIMD Ranking    src/simd_ranking.rs              Optimized scoring
Tokenization    src/search/tokenization.rs       NLP processing
Language        src/language/                    Tree-sitter parsers
Extract         src/extract/                     Code extraction
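
To make the Ranking and Tokenization modules more concrete, here is a textbook Okapi BM25 scorer paired with a deliberately simple tokenizer. It is a generic sketch, not the code in src/ranking.rs or src/search/tokenization.rs: the function names, the camelCase splitting rule, and the defaults `k1 = 1.2` and `b = 0.75` are assumptions, and Probe's own tokenization and scoring may differ.

```javascript
// Split identifiers on camelCase boundaries and non-alphanumerics, then lowercase.
function tokenize(text) {
  return text
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// Textbook Okapi BM25. Returns [{ index, score }] sorted by descending score.
function bm25Rank(query, documents, k1 = 1.2, b = 0.75) {
  const docs = documents.map(tokenize);
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const terms = [...new Set(tokenize(query))];

  // Document frequency: how many documents contain each query term.
  const df = new Map(
    terms.map((t) => [t, docs.filter((d) => d.includes(t)).length])
  );

  return docs
    .map((doc, index) => {
      let score = 0;
      for (const term of terms) {
        const tf = doc.filter((t) => t === term).length;
        if (tf === 0) continue;
        const n = df.get(term);
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        // Length normalization: long documents are penalized via b.
        const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
        score += (idf * tf * (k1 + 1)) / norm;
      }
      return { index, score };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = bm25Rank('parse query', [
  'fn parseQuery(input: &str) -> Query',
  'fn render_output(results: &[SearchResult])',
  'struct QueryPlan { terms: Vec<String> }',
]);
console.log(ranked);
```

The first snippet matches both query terms and ranks first, the third matches only `query`, and the second scores zero.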

Language Support

All languages implement the LanguageImpl trait:

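// Assumed import: TSLanguage aliases the tree-sitter crate's Language type.
use tree_sitter::{Language as TSLanguage, Node};

pub trait LanguageImpl {
    /// Tree-sitter grammar used to parse this language.
    fn get_tree_sitter_language(&self) -> TSLanguage;
    /// Whether `node` is an acceptable enclosing block for a match.
    fn is_acceptable_parent(&self, node: &Node) -> bool;
    /// Whether `node` is part of test code.
    fn is_test_node(&self, node: &Node, source: &[u8]) -> bool;
    /// Signature of the symbol at `node`, if it has one.
    fn get_symbol_signature(&self, node: &Node, source: &[u8]) -> Option<String>;
}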

Supported: Rust, JavaScript, TypeScript, Python, Go, C, C++, Java, Ruby, PHP, Swift, Solidity, C#, HTML, Markdown, YAML
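
How an implementation is selected for a given file is not described here. One plausible approach is a lookup keyed by file extension, sketched below. The table is an illustrative subset, and `LANGUAGE_BY_EXTENSION` and `languageForFile` are hypothetical names, not Probe's API.

```javascript
const path = require('node:path');

// Illustrative subset of an extension-to-language table (not Probe's actual table).
const LANGUAGE_BY_EXTENSION = new Map([
  ['.rs', 'rust'],
  ['.js', 'javascript'],
  ['.ts', 'typescript'],
  ['.py', 'python'],
  ['.go', 'go'],
  ['.java', 'java'],
  ['.rb', 'ruby'],
]);

// Return the language name for a file, or null when the extension is unknown.
function languageForFile(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return LANGUAGE_BY_EXTENSION.get(extension) ?? null;
}

console.log(languageForFile('src/main.rs'));
console.log(languageForFile('README.txt'));
```

Returning `null` for unknown extensions lets a caller fall back to plain line-based handling instead of failing.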

Performance Optimizations

Optimization    Purpose
Parser Pool     Reuse tree-sitter parsers
Tree Cache      Cache parsed ASTs
SIMD Scoring    Vector operations
Early Ranking   Skip non-relevant files
Session Cache   Avoid duplicate results
Rayon           Parallel processing
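
As an illustration of the Session Cache idea, the sketch below remembers which blocks a session has already received and filters them out of later results. The `SessionCache` class, the block key format, and the in-memory storage are all assumptions made for this example. Probe's cache lives in the Rust core and may work differently.

```javascript
// Hypothetical session cache: remembers which blocks each session has already seen.
class SessionCache {
  constructor() {
    this.seen = new Map(); // sessionId -> Set of block keys
  }

  // Return only blocks not yet returned in this session, and record them.
  filterNew(sessionId, blocks) {
    if (!this.seen.has(sessionId)) this.seen.set(sessionId, new Set());
    const seen = this.seen.get(sessionId);
    return blocks.filter((block) => {
      const key = `${block.file}:${block.start}-${block.end}`;
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }
}

const cache = new SessionCache();
const first = cache.filterNew('s1', [
  { file: 'a.rs', start: 1, end: 9 },
  { file: 'b.rs', start: 4, end: 8 },
]);
const second = cache.filterNew('s1', [
  { file: 'a.rs', start: 1, end: 9 },
  { file: 'c.rs', start: 2, end: 3 },
]);
console.log(first.length, second.length); // prints "2 1"
```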

Node.js SDK Layer

ProbeAgent

ProbeAgent is the SDK's AI agent. It is configured with a code path and an AI provider:

import { ProbeAgent } from '@probelabs/probe';

const agent = new ProbeAgent({
  path: './src',
  provider: 'anthropic'
});

const response = await agent.answer('How does auth work?');

Responsibilities:

  • Multi-turn conversation management
  • Tool execution loop (max 30 iterations)
  • JSON/Mermaid validation
  • Token tracking
  • Retry and fallback

Tool System

Tool       Function
search     Semantic code search
query      AST pattern matching
extract    Code block extraction
grep       Ripgrep search
bash       Shell execution
edit       File modification
delegate   Sub-agent creation

Supporting Infrastructure

Module               Purpose
probeTool.js         Rust binary execution
delegate.js          Sub-agent orchestration
storage/             Session persistence
hooks/               Event callbacks
mcp/                 MCP server
tokenCounter.js      Token tracking
RetryManager.js      Automatic retries
FallbackManager.js   Provider fallback
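
The retry and fallback responsibilities can be pictured with a generic sketch: retry a call with exponential backoff, then move to the next provider once retries are exhausted. `withRetry`, `withFallback`, and their options are hypothetical names for this example and do not reflect the actual interfaces of RetryManager.js or FallbackManager.js.

```javascript
// Retry an async operation with exponential backoff between attempts.
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Try each provider in order; move on once a provider's retries are exhausted.
async function withFallback(providers, call, retryOptions) {
  const errors = [];
  for (const provider of providers) {
    try {
      const value = await withRetry(() => call(provider), retryOptions);
      return { provider, value };
    } catch (error) {
      errors.push(`${provider}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Usage with a stubbed provider call: 'anthropic' always fails, 'openai' succeeds.
const attempts = [];
const outcome = withFallback(
  ['anthropic', 'openai'],
  async (provider) => {
    attempts.push(provider);
    if (provider === 'anthropic') throw new Error('rate limited');
    return `answer from ${provider}`;
  },
  { maxAttempts: 2, baseDelayMs: 1 }
);
outcome.then((result) => console.log(result.provider, attempts));
```

Keeping retries inside the fallback loop means a transient error does not immediately switch providers, while a persistent one eventually does.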

User Interfaces

CLI (Rust)

Direct command-line interface:

probe search "query" ./path
probe extract file.rs:42
probe query "pattern" --language rust
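
The `file.rs:42` argument combines a path and a line number. A minimal sketch of splitting such a target is shown below. `parseExtractTarget` is a hypothetical helper, and treating a target without a trailing line number as a whole-file request is an assumption, not documented CLI behavior.

```javascript
// Parse an extract target of the form "path:line", as in `probe extract file.rs:42`.
// A target without a trailing ":<digits>" is treated as a whole-file request
// (an assumption made for this sketch).
function parseExtractTarget(target) {
  const match = /^(.*):(\d+)$/.exec(target);
  if (!match) return { file: target, line: null };
  return { file: match[1], line: Number(match[2]) };
}

console.log(parseExtractTarget('file.rs:42'));
console.log(parseExtractTarget('src/main.rs'));
```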

Chat (Node.js)

Interactive AI chat:

probe-chat ./project          # CLI mode
probe-chat --web ./project    # Web mode

MCP Server

Model Context Protocol integration:

npx -y @probelabs/probe mcp

Tools exposed: search_code, query_code, extract_code


Data Flow

Search Request

  1. CLI Args → SearchOptions
  2. Rust: perform_probe()
  3. JSON Output → Node.js
  4. ProbeAgent processes result
  5. AI Response

AI Agent Loop

  1. User Message
  2. System Prompt + Context
  3. AI Provider (Anthropic/OpenAI/Google)
  4. Tool Call → Execute → Result
  5. Repeat steps 3-4 while the model keeps calling tools (up to 30 iterations)
  6. Final Response
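
The loop above can be sketched with a stubbed model. This is a simplified illustration, not ProbeAgent's code: `runAgentLoop`, `callModel`, the message shapes, and the choice to throw when the iteration cap is reached are all assumptions.

```javascript
const MAX_ITERATIONS = 30; // iteration cap stated in this document

// `callModel` and `tools` are hypothetical stand-ins for a provider client and a tool set.
async function runAgentLoop({ systemPrompt, userMessage, callModel, tools }) {
  const messages = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const reply = await callModel(messages);
    if (!reply.toolCall) return reply.text; // no tool requested: final response

    // Execute the requested tool and feed the result back to the model.
    const { name, args } = reply.toolCall;
    const tool = tools[name];
    const result = tool ? await tool(args) : `Unknown tool: ${name}`;
    messages.push({ role: 'assistant', content: JSON.stringify(reply.toolCall) });
    messages.push({ role: 'tool', name, content: String(result) });
  }
  throw new Error(`No final response after ${MAX_ITERATIONS} iterations`);
}

// Stubbed model: requests one search, then answers using the tool result.
const stubModel = async (messages) => {
  const toolMessage = messages.find((m) => m.role === 'tool');
  return toolMessage
    ? { text: `Found: ${toolMessage.content}` }
    : { toolCall: { name: 'search', args: { query: 'auth' } } };
};

const answer = runAgentLoop({
  systemPrompt: 'You are a code assistant.',
  userMessage: 'How does auth work?',
  callModel: stubModel,
  tools: { search: async ({ query }) => `3 matches for "${query}"` },
});
answer.then((text) => console.log(text)); // prints: Found: 3 matches for "auth"
```

The hard cap bounds cost and latency when a model keeps requesting tools without converging on an answer.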

Integration Points

Rust → Node.js

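// Binary execution: promisified execFile resolves to { stdout, stderr }
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const { stdout } = await promisify(execFile)('probe', ['search', query, path]);
const parsed = JSON.parse(stdout); // assumes the CLI printed JSON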

Node.js → AI Providers

import { streamText } from 'ai';
import { createAnthropic } from '@ai-sdk/anthropic';

const result = await streamText({
  model: createAnthropic()('claude-sonnet-4-6'),
  system: systemPrompt,
  messages: history,
  tools: toolDefinitions
});

MCP Integration

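import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// Tools registered
server.setRequestHandler(ListToolsRequestSchema, handleListTools);
server.setRequestHandler(CallToolRequestSchema, handleCallTool);

// STDIO transport (connect() returns a promise)
await server.connect(new StdioServerTransport());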

Module Organization

Rust Core

src/
├── main.rs             # CLI entry point
├── cli.rs              # Argument parsing
├── lib.rs              # Public API
├── models.rs           # Data structures
├── ranking.rs          # BM25/TF-IDF
├── simd_ranking.rs     # SIMD scoring
├── query.rs            # AST patterns
├── grep.rs             # Ripgrep wrapper
├── search/             # Search pipeline
│   ├── mod.rs
│   ├── search_runner.rs
│   ├── ripgrep_searcher.rs
│   ├── elastic_query.rs
│   ├── tokenization.rs
│   ├── result_ranking.rs
│   └── ...
├── extract/            # Code extraction
│   ├── mod.rs
│   ├── processor.rs
│   ├── formatter.rs
│   └── ...
├── language/           # Language support
│   ├── mod.rs
│   ├── factory.rs
│   ├── rust.rs
│   ├── javascript.rs
│   └── ...
└── path_resolver/      # Dependency resolution

Node.js SDK

npm/src/
├── index.js            # Public exports
├── agent/
│   ├── ProbeAgent.js   # Main agent class
│   └── tools.js        # Tool instantiation
├── tools/
│   ├── common.js       # Shared schemas
│   ├── vercel.js       # Vercel AI SDK
│   └── langchain.js    # LangChain
├── mcp/
│   └── index.ts        # MCP server
├── hooks/
│   └── index.js        # Hook system
├── storage/
│   └── JsonChatStorage.js
└── ...

Chat Application

examples/chat/
├── index.js            # Entry point
├── webServer.js        # Web server
├── probeChat.js        # Chat wrapper
├── ChatSessionManager.js
├── auth.js             # Authentication
├── index.html          # Web UI
├── storage/
│   └── JsonChatStorage.js
└── implement/          # Code editing

Key Design Patterns

Error Handling

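// Rust: Result<T> with anyhow
use anyhow::{Context, Result};
use std::fs;

pub fn search(options: SearchOptions) -> Result<Vec<SearchResult>> {
    // `options.path` and `find_matches` are illustrative placeholders
    let source = fs::read_to_string(&options.path)
        .context("Failed to read file")?;
    Ok(find_matches(&source, &options))
}

// Node.js: try/catch with context
try {
  const result = await search(options);
} catch (error) {
  throw new ProbeError(`Search failed: ${error.message}`);
}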

Thread Safety

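use std::{path::PathBuf, sync::Arc};
use dashmap::DashMap;
use tree_sitter::Parser;

// Parser pool with Arc (alias names are illustrative)
type ParserPool = Arc<DashMap<String, Vec<Parser>>>;

// Cache with concurrent access
type TreeCache = DashMap<PathBuf, CachedTree>;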

Event System

// Tool execution events
agent.events.on('toolCall', (event) => {
  console.log(`Tool: ${event.name}, Status: ${event.status}`);
});

Security Boundaries

Boundary         Protection
File Access      allowedFolders configuration
Path Traversal   Canonicalization
Bash Execution   Allow/deny patterns
API Keys         Environment variables
Web Access       Basic authentication
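
The File Access and Path Traversal rows work together: a requested path is canonicalized first and only then compared against the allowed folders. The sketch below shows that general technique. `isPathAllowed` is a hypothetical helper, rejecting nonexistent paths is an assumption, and Probe's actual checks may differ.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Resolve symlinks and ".." segments, then require the result to sit inside
// one of the allowed folders.
function isPathAllowed(candidate, allowedFolders) {
  let real;
  try {
    real = fs.realpathSync(candidate);
  } catch {
    return false; // nonexistent paths are rejected in this sketch
  }
  return allowedFolders.some((folder) => {
    const root = fs.realpathSync(folder);
    const relative = path.relative(root, real);
    const outside =
      relative === '..' ||
      relative.startsWith(`..${path.sep}`) ||
      path.isAbsolute(relative);
    return !outside;
  });
}

// Demo with a temporary workspace.
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'probe-demo-'));
const allowed = path.join(base, 'project');
fs.mkdirSync(path.join(allowed, 'src'), { recursive: true });
fs.writeFileSync(path.join(allowed, 'src', 'main.rs'), 'fn main() {}');
fs.writeFileSync(path.join(base, 'secret.txt'), 'top secret');

const inside = isPathAllowed(path.join(allowed, 'src', 'main.rs'), [allowed]);
const escaped = isPathAllowed(`${allowed}${path.sep}..${path.sep}secret.txt`, [allowed]);
console.log(inside, escaped); // prints "true false"

fs.rmSync(base, { recursive: true, force: true });
```

Comparing canonical paths rather than raw strings is what defeats `..` segments and symlinks that point outside the allowed folders.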

Performance Characteristics

Component       Performance
Ripgrep         ~1 GB/s scanning
Tree-sitter     ~1 ms per file parsing
SIMD Ranking    4-8x faster scoring
Parser Pool     Avoid re-initialization
Session Cache   Deduplicated results