Performance Tuning

February 14, 2026 · View on GitHub

Optimize Probe for large codebases and demanding workloads.


TL;DR

# Optimized search
PROBE_PARSER_POOL_SIZE=8 probe search "query" ./ \
  --language rust \
  --max-results 50 \
  --session my-search

# Skip warmup for scripts
PROBE_NO_PARSER_WARMUP=1 probe search "query" ./

# Debug performance
DEBUG=1 probe search "query" ./

Performance Architecture

Probe uses multiple optimization techniques:

┌─────────────────────────────────────────────────────┐
│                    Search Pipeline                   │
├─────────────────────────────────────────────────────┤
│  1. File Scanning (ripgrep)         │ Parallel     │
│  2. File Filtering (.gitignore)     │ O(1) lookup  │
│  3. Content Matching (SIMD)         │ Vectorized   │
│  4. AST Parsing (tree-sitter)       │ Cached       │
│  5. Block Extraction                │ Streamed     │
│  6. Ranking (BM25/SIMD)             │ Vectorized   │
│  7. Result Assembly                 │ Lazy         │
└─────────────────────────────────────────────────────┘

Configuration Options

Parser Pool Size

Control parallel parsing threads:

# Default: number of CPU cores
PROBE_PARSER_POOL_SIZE=8 probe search "query" ./

# For I/O bound workloads (network drives, SSDs)
PROBE_PARSER_POOL_SIZE=16 probe search "query" ./

# For CPU bound workloads (complex AST patterns)
PROBE_PARSER_POOL_SIZE=4 probe search "query" ./

Recommendation:

  • SSD: 2x CPU cores
  • HDD: 1x CPU cores
  • Network: 4x CPU cores (hide latency)

Tree Cache Size

Cache parsed ASTs for repeated queries:

# Increase cache for large codebases
PROBE_TREE_CACHE_SIZE=1000 probe search "query" ./

# Disable caching (low memory environments)
PROBE_TREE_CACHE_SIZE=0 probe search "query" ./

Default: Automatic based on available memory

Parser Warmup

Skip warmup for faster cold starts:

# Skip warmup (faster first query, slower subsequent)
PROBE_NO_PARSER_WARMUP=1 probe search "query" ./

# Default behavior: warm up parsers on first use
probe search "query" ./

Use cases:

  • CI/CD pipelines (single query)
  • Script automation
  • Quick one-off searches

SIMD Optimization

Probe uses SIMD (Single Instruction, Multiple Data) for:

  1. Tokenization: memchr, aho-corasick
  2. Ranking: simsimd for dot products
  3. Pattern Matching: Vectorized string operations

Disabling SIMD

For debugging or compatibility:

# Disable all SIMD
DISABLE_SIMD_TOKENIZATION=1 \
DISABLE_SIMD_RANKING=1 \
DISABLE_SIMD_PATTERN_MATCHING=1 \
probe search "query" ./

# Disable specific features
DISABLE_SIMD_RANKING=1 probe search "query" ./

Note: Disabling SIMD reduces performance by 2-10x.


Query Optimization

Limit Results Early

# Stop after 20 results (fastest)
probe search "function" ./ --max-results 20

# Limit by token count (for AI)
probe search "function" ./ --max-tokens 8000

# Limit by bytes
probe search "function" ./ --max-bytes 50000

Use Language Filters

# Search only Rust files (skip parsing others)
probe search "struct" ./ --language rust

# Use search hints
probe search "interface AND lang:typescript" ./

# Filter by extension
probe search "class AND ext:py" ./

Use Sessions for Pagination

# First batch
probe search "api" ./ --session api-search --max-results 100

# Subsequent batches (deduplicated)
probe search "api" ./ --session api-search --max-results 100

Sessions cache:

  • Parsed file list
  • MD5 hashes for invalidation
  • Previously returned blocks

Skip Block Merging

For maximum speed, disable block merging:

probe search "function" ./ --no-merge

Memory Optimization

Large Codebases

# Limit results to prevent memory issues
probe search "query" ./ --max-results 100

# Use pagination
probe search "query" ./ --session my-search --max-results 50

# Process files sequentially
PROBE_PARSER_POOL_SIZE=1 probe search "query" ./

Monitoring Memory

# On Linux
/usr/bin/time -v probe search "query" ./

# On macOS
/usr/bin/time -l probe search "query" ./

Benchmarking

Built-in Benchmarks

# Run all benchmarks
probe benchmark

# Run specific benchmark
probe benchmark --bench search

# Quick benchmarks only
probe benchmark --fast

# Save results
probe benchmark --format json --output results.json

# Compare with baseline
probe benchmark --compare --baseline previous.json

Manual Timing

# Verbose output with timing
DEBUG=1 probe search "query" ./

# Time multiple runs
for i in {1..5}; do
  time probe search "query" ./ --max-results 10 2>&1 | tail -1
done

Profiling

# With cargo flamegraph (requires installation)
cargo flamegraph -- search "query" ./path

# With perf (Linux)
perf record probe search "query" ./
perf report

Best Practices

1. Use Appropriate Reranker

RerankerSpeedQualityUse Case
bm25FastGoodDefault, most queries
tfidfFastGoodExact term matching
hybridMediumBetterBalanced
hybrid2MediumBetterFile-aware ranking
ms-marco-*SlowBestWhen quality matters
# Default (fast)
probe search "auth" ./

# High quality (slower)
probe search "auth" ./ --reranker ms-marco-tinybert

2. Directory Scope

# Bad: search everything
probe search "function" ./

# Good: search specific directories
probe search "function" ./src

# Better: multiple specific paths
probe search "function" ./src/core ./src/api

3. Ignore Patterns

# Exclude large directories
probe search "config" ./ \
  --ignore "node_modules/*" \
  --ignore "vendor/*" \
  --ignore "*.min.js"

4. Output Format

FormatSpeedUse Case
outlineFastestInteractive use
plainFastPiping to tools
jsonMediumMachine processing
colorMediumTerminal display
markdownSlowDocumentation
# Fast output for scripts
probe search "query" ./ --format plain

# Machine readable
probe search "query" ./ --format json

Performance Comparison

Typical performance on a medium codebase (100k lines):

OperationCold StartWarm
Simple search200ms50ms
With AST parsing500ms100ms
Full ranking800ms200ms
BERT reranking2000ms1500ms

vs Other Tools

Tool100k filesMemory
grep2s10MB
ripgrep0.5s50MB
Probe (search)0.8s100MB
Probe (query)1.5s200MB

Debugging Performance

Enable Debug Output

DEBUG=1 probe search "query" ./

Output shows:

  • File scanning time
  • Parsing time per file type
  • Ranking time
  • Total result count
  • Memory usage

Identify Bottlenecks

[DEBUG] Scan: 150ms (1234 files)
[DEBUG] Filter: 20ms (890 files)
[DEBUG] Parse: 400ms (890 files)  ← Parser pool too small?
[DEBUG] Rank: 50ms (234 blocks)
[DEBUG] Output: 30ms
[DEBUG] Total: 650ms

Common Bottlenecks

BottleneckSymptomSolution
File I/OSlow scan phaseUse SSD, increase pool
ParsingSlow parse phaseIncrease PARSER_POOL_SIZE
MemoryOOM errorsLimit results, use sessions
NetworkVery slow scanCache locally, increase pool