ADR-001: Ruvector Core Architecture

February 14, 2026

Status: Proposed
Date: 2026-01-18
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
SDK: Claude-Flow

Note: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |

Context

The Vector Database Challenge

Modern AI applications require vector databases that can:

  1. Store high-dimensional embeddings from LLMs and embedding models
  2. Search with sub-millisecond latency for real-time inference
  3. Scale to billions of vectors while maintaining performance
  4. Deploy anywhere - edge devices, browsers (WASM), cloud servers
  5. Integrate seamlessly with LLM inference pipelines

Current State of Vector Databases

Existing solutions fall into several categories:

| Category | Examples | Limitations |
|----------|----------|-------------|
| Cloud-only | Pinecone | No edge deployment, vendor lock-in |
| Heavy native | Milvus, Qdrant | Complex deployment, high memory |
| Python-first | ChromaDB, FAISS | Performance overhead, no WASM |
| Learning-capable | None | No existing solutions learn from usage |

The Ruvector Vision

Ruvector is designed as a high-performance, learning-capable vector database implemented in Rust that:

  • Achieves 61us p50 latency for k=10 search on 384-dim vectors
  • Provides 2-32x memory compression through tiered quantization
  • Runs anywhere - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
  • Learns from usage via GNN layers that improve search quality over time
  • Integrates with AI agent memory systems for policy, session state, and audit logs

Decision

Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                               |
|  AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              INDEX LAYER                                     |
|  HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR            |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              QUANTIZATION LAYER                              |
|  Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction        |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              DISTANCE LAYER                                  |
|  Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              SIMD INTRINSICS LAYER                           |
|  AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback       |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              STORAGE LAYER                                   |
|  REDB (native) | Memory-only (WASM) | PostgreSQL Extension                  |
+-----------------------------------------------------------------------------+

Key Components

1. SIMD Intrinsics Layer (simd_intrinsics.rs)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

Architecture Dispatch

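/// Dispatches to the fastest Euclidean distance kernel available on this target.
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    // Bounds check: the unsafe SIMD kernels assume equal-length inputs.
    assert_eq!(a.len(), b.len());

    #[cfg(target_arch = "x86_64")]
    {
        // AVX2 is detected at runtime; older x86_64 CPUs take the scalar path.
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        // NEON is part of the baseline aarch64 ISA, so no runtime check is needed.
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        // All other targets (including wasm32) use the portable scalar kernel.
        euclidean_distance_scalar(a, b)
    }
}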

Supported Operations

| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Dot Product | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Cosine Similarity | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Manhattan Distance | N/A | 4 floats/cycle | 1 float/cycle |

Performance Characteristics

| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| 512-dim Euclidean | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| 1536-dim Cosine | ~143ns | ~200ns | ~800ns |
| 384-dim Dot Product | ~33ns | ~50ns | ~150ns |

Security Guarantees

  • Bounds checking via assert_eq!(a.len(), b.len()) prevents buffer overflows
  • Unaligned loads (_mm256_loadu_ps, vld1q_f32) handle arbitrary alignment
  • Scalar fallback handles remainder elements after SIMD processing
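
The three guarantees above can be illustrated without any intrinsics. The following is a minimal scalar sketch, not the crate's implementation: `euclidean_distance_chunked` and `LANES` are hypothetical names, and the 8-wide accumulator only mimics the shape of an AVX2 loop (full chunks first, then a scalar pass over the remainder).

```rust
/// Number of f32 lanes processed per step; 8 mirrors a 256-bit AVX2 register.
/// (Hypothetical constant for illustration.)
const LANES: usize = 8;

/// Euclidean distance computed in fixed-width chunks with a scalar tail.
/// Hypothetical helper, not part of ruvector-core.
pub fn euclidean_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    // Length check up front, so no later index can run past either slice.
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately below.
    let a_rem = a_chunks.remainder();
    let b_rem = b_chunks.remainder();

    // One partial sum per lane, as a SIMD register would hold.
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Horizontal sum of the lanes, then the scalar remainder.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_rem.iter().zip(b_rem) {
        let d = x - y;
        sum += d * d;
    }
    sum.sqrt()
}
```

Slices have no alignment requirement beyond that of `f32`, which is why a real kernel would pair this loop shape with unaligned loads.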

2. Distance Metrics Layer (distance.rs)

High-level distance API with optional SimSIMD integration for additional acceleration.

Supported Metrics

pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}
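
To make the four definitions concrete, here is a scalar sketch of a `distance` function that follows the formulas in the comments above. It is an assumption-laden illustration: the enum is repeated so the block is self-contained, the helper `dot` is hypothetical, and the return type is a plain `f32` rather than the `Result` the crate's API appears to use.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DistanceMetric {
    Euclidean,
    Cosine,
    DotProduct,
    Manhattan,
}

/// Plain dot product (hypothetical helper).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar reference for all four metrics; smaller always means "closer".
pub fn distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    match metric {
        // sqrt(sum((a[i] - b[i])^2))
        DistanceMetric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // 1 - cosine_similarity; zero vectors are treated as maximally
        // dissimilar here (an assumption, the source does not say).
        DistanceMetric::Cosine => {
            let norms = dot(a, a).sqrt() * dot(b, b).sqrt();
            if norms == 0.0 {
                1.0
            } else {
                1.0 - dot(a, b) / norms
            }
        }
        // Negated so that a larger dot product yields a smaller distance.
        DistanceMetric::DotProduct => -dot(a, b),
        // sum(|a[i] - b[i]|)
        DistanceMetric::Manhattan => a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>(),
    }
}
```

Negating the dot product lets a single "minimize distance" code path serve all four metrics, which is presumably why the enum documents it that way.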

Feature Flags

| Feature | Description | Use Case |
|---------|-------------|----------|
| simd | SimSIMD acceleration | Native builds |
| parallel | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |

Batch Distance API

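/// Computes the distance from `query` to every vector in `vectors`.
pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    // Parallel path: Rayon is not usable on wasm32, so gate on both conditions.
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM and for builds without the `parallel` feature.
    #[cfg(not(all(feature = "parallel", not(target_arch = "wasm32"))))]
    {
        vectors.iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
}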

3. Index Structures (index/)

HNSW Index (index/hnsw.rs)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

Configuration Parameters:

| Parameter | Default | Description |
|-----------|---------|-------------|
| m | 32 | Connections per layer (higher = better recall, more memory) |
| ef_construction | 200 | Build-time search depth (higher = better graph, slower build) |
| ef_search | 100 | Query-time search depth (higher = better recall, slower query) |
| max_elements | 10M | Pre-allocated capacity |

Complexity Analysis:

| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |

*Note: HNSW deletion marks vectors as removed but does not restructure the graph.
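
The O(1) soft delete described in the note can be sketched as a tombstone set that is consulted when results are returned. This is an illustration under assumptions, not the crate's code: `Tombstones`, `mark_deleted`, and `filter_results` are hypothetical names, and candidates are assumed to arrive sorted by ascending distance.

```rust
use std::collections::HashSet;

/// Records deleted internal indices without touching the graph itself.
/// (Hypothetical type for illustration.)
#[derive(Default)]
pub struct Tombstones {
    deleted: HashSet<usize>,
}

impl Tombstones {
    /// O(1) expected time: only the set changes, graph edges stay in place.
    /// Returns false if the index was already marked.
    pub fn mark_deleted(&mut self, idx: usize) -> bool {
        self.deleted.insert(idx)
    }

    /// Drops tombstoned candidates, then keeps the first `k` survivors.
    pub fn filter_results(&self, candidates: Vec<(usize, f32)>, k: usize) -> Vec<(usize, f32)> {
        candidates
            .into_iter()
            .filter(|(idx, _)| !self.deleted.contains(idx))
            .take(k)
            .collect()
    }
}
```

One consequence of this design is that a search may need to fetch more than `k` candidates when many neighbors are tombstoned, and memory is only reclaimed by rebuilding the index.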

Serialization:

pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}

Flat Index

Linear scan index for small datasets or exact search.

Use Cases:

  • Datasets < 10K vectors
  • Exact k-NN required
  • Benchmarking HNSW recall
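
A linear scan is short enough to sketch in full. The block below is a hedged illustration of exact k-NN and of the recall measurement mentioned in the last use case; `flat_search`, `recall`, and `l2` are hypothetical names, and Euclidean distance is assumed as the metric.

```rust
/// Euclidean distance (hypothetical helper).
fn l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Exact k-NN: score every vector, sort ascending, keep the first k.
pub fn flat_search(vectors: &[(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = vectors
        .iter()
        .map(|(id, v)| (id.clone(), l2(query, v)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

/// Fraction of the exact neighbors that the approximate result also returned.
pub fn recall(approx_ids: &[&str], exact_ids: &[&str]) -> f32 {
    if exact_ids.is_empty() {
        return 1.0;
    }
    let hits = exact_ids.iter().filter(|id| approx_ids.contains(*id)).count();
    hits as f32 / exact_ids.len() as f32
}
```

Running `flat_search` and an HNSW query over the same data and feeding both id lists to `recall` gives the recall@k figure that `ef_search` tuning trades against latency.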

4. Quantization Strategies (quantization.rs)

Memory compression techniques trading precision for storage efficiency.

Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

pub struct ScalarQuantized {
    pub data: Vec<u8>,     // Quantized values
    pub min: f32,          // Minimum for dequantization
    pub scale: f32,        // Scale factor
}

Characteristics:

  • Compression: 4x (f32 -> u8)
  • Distance calculation: Uses average scale for symmetric distance
  • Reconstruction error: < 0.4% for typical embedding distributions
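
Min-max scaling can be sketched as follows. The struct fields and the method names `quantize` and `reconstruct` match those used elsewhere in this document, but the method bodies are an assumption about how such a scheme is typically written, not the crate's actual code.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>, // Quantized values
    pub min: f32,      // Minimum for dequantization
    pub scale: f32,    // Scale factor (width of one quantization step)
}

impl ScalarQuantized {
    /// Maps each f32 onto 0..=255 using the vector's own min and max.
    pub fn quantize(v: &[f32]) -> Self {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Guard against constant or empty vectors, where max - min is not positive.
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let data = v
            .iter()
            .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
            .collect();
        Self { data, min, scale }
    }

    /// Inverse mapping; each value is off by at most half a step.
    pub fn reconstruct(&self) -> Vec<f32> {
        self.data
            .iter()
            .map(|&q| self.min + q as f32 * self.scale)
            .collect()
    }
}
```

With 256 levels, the worst-case rounding error is half a step, or 1/510 (about 0.2%) of the vector's value range, which is in line with the sub-0.4% figure quoted above. The two `f32` fields add 8 bytes of overhead per vector, so the 4x ratio is approached only for reasonably long vectors.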

Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

pub struct ProductQuantized {
    pub codes: Vec<u8>,                    // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,     // Learned centroids
}

Training:

  • K-means clustering on subspace vectors
  • Codebook size typically 256 (fits in u8)
  • Iterations: 10-100 for convergence
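
Once codebooks exist, encoding is a nearest-centroid lookup per subspace. The sketch below skips k-means training and assumes the codebooks are given in the `Vec<Vec<Vec<f32>>>` layout shown above (subspace, then centroid, then component); `pq_encode`, `pq_decode`, and `sq_dist` are hypothetical names.

```rust
/// Squared Euclidean distance (hypothetical helper).
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encodes `vector` as one centroid index per subspace.
pub fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    assert!(!vector.is_empty() && !codebooks.is_empty());
    assert_eq!(vector.len() % codebooks.len(), 0, "dimensions must split evenly");
    let sub_dim = vector.len() / codebooks.len();

    vector
        .chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(sub, centroids)| {
            // At most 256 centroids so that every index fits in a u8.
            assert!(!centroids.is_empty() && centroids.len() <= 256);
            let mut best = 0usize;
            let mut best_d = f32::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let d = sq_dist(sub, c);
                if d < best_d {
                    best = i;
                    best_d = d;
                }
            }
            best as u8
        })
        .collect()
}

/// Approximate reconstruction: concatenate the selected centroids.
pub fn pq_decode(codes: &[u8], codebooks: &[Vec<Vec<f32>>]) -> Vec<f32> {
    codes
        .iter()
        .zip(codebooks)
        .flat_map(|(&code, centroids)| centroids[code as usize].iter().copied())
        .collect()
}
```

The compression ratio follows from the subspace width: each subspace of `sub_dim` floats (4 * `sub_dim` bytes) collapses to one byte, so 2-float subspaces give 8x and 4-float subspaces give 16x, not counting the shared codebooks.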

Binary Quantization (32x compression)

Single-bit representation based on sign.

pub struct BinaryQuantized {
    pub bits: Vec<u8>,      // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

Characteristics:

  • Compression: 32x (f32 -> 1 bit)
  • Distance: Hamming distance (XOR + popcount)
  • Best for: Filtering stage before exact distance on candidates
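
Sign-based packing and XOR-plus-popcount distance can be sketched in a few lines. The struct matches the one above; the method names `quantize` and `hamming`, the least-significant-bit-first packing order, and the choice to map zero to a cleared bit are assumptions made for this illustration.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

impl BinaryQuantized {
    /// Sets bit i when v[i] is strictly positive.
    pub fn quantize(v: &[f32]) -> Self {
        // Round up so a partial final byte is still allocated.
        let mut bits = vec![0u8; (v.len() + 7) / 8];
        for (i, &x) in v.iter().enumerate() {
            if x > 0.0 {
                bits[i / 8] |= 1u8 << (i % 8);
            }
        }
        Self { bits, dimensions: v.len() }
    }

    /// Hamming distance: XOR the packed bytes and count the set bits.
    pub fn hamming(&self, other: &Self) -> u32 {
        assert_eq!(self.dimensions, other.dimensions, "dimension mismatch");
        self.bits
            .iter()
            .zip(&other.bits)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

For a 384-dim embedding this shrinks 1,536 bytes of `f32` data to 48 bytes, which is where the 32x figure comes from. Padding bits in the last byte are always zero in both operands, so they never contribute to the distance.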

Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

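| Access Frequency | Format | Compression | Latency |
|------------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |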
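
The tier boundaries can be expressed as a small lookup. This sketch assumes the access frequency is available as a normalized score in the range 0.0 to 1.0 and that each boundary value belongs to the lower tier's upper edge as coded below; the source does not specify either point. `Tier`, `tier_for`, and `compression_range` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tier {
    Hot,     // f32, 1x
    Warm,    // f16, 2x
    Cool,    // scalar quantization, 4x
    Cold,    // product quantization, 8-16x
    Archive, // binary quantization, 32x
}

/// Maps a normalized access score to a storage tier.
/// NaN falls through every guard and lands in Archive.
pub fn tier_for(access_fraction: f32) -> Tier {
    match access_fraction {
        f if f > 0.80 => Tier::Hot,
        f if f >= 0.40 => Tier::Warm,
        f if f >= 0.10 => Tier::Cool,
        f if f >= 0.01 => Tier::Cold,
        _ => Tier::Archive,
    }
}

/// (min, max) compression ratio for each tier, taken from the table.
pub fn compression_range(tier: Tier) -> (u32, u32) {
    match tier {
        Tier::Hot => (1, 1),
        Tier::Warm => (2, 2),
        Tier::Cool => (4, 4),
        Tier::Cold => (8, 16),
        Tier::Archive => (32, 32),
    }
}
```

A real tiering policy would likely add hysteresis so that vectors near a boundary are not re-encoded on every small change in access rate.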

5. Memory Management

Arena Allocator (arena.rs)

Bump allocator for batch operations reducing allocation overhead.
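
The idea behind a bump allocator is that a batch of vectors shares one contiguous buffer and each allocation just advances an offset. The following safe-Rust sketch captures that behavior for fixed-dimension vectors; `VectorArena` and its methods are hypothetical and are not the API of `arena.rs`.

```rust
/// Stores fixed-dimension vectors back to back in one buffer.
/// (Hypothetical type for illustration.)
pub struct VectorArena {
    buf: Vec<f32>,
    dim: usize,
}

impl VectorArena {
    /// Reserves room for `capacity` vectors in a single allocation.
    pub fn with_capacity(dim: usize, capacity: usize) -> Self {
        assert!(dim > 0, "dimension must be non-zero");
        Self { buf: Vec::with_capacity(dim * capacity), dim }
    }

    /// Copies `v` to the end of the buffer and returns its slot number.
    /// While capacity lasts, this is a copy plus a length bump.
    pub fn alloc(&mut self, v: &[f32]) -> usize {
        assert_eq!(v.len(), self.dim, "dimension mismatch");
        let slot = self.buf.len() / self.dim;
        self.buf.extend_from_slice(v);
        slot
    }

    /// Returns the vector stored in `slot`, if any.
    pub fn get(&self, slot: usize) -> Option<&[f32]> {
        let start = slot.checked_mul(self.dim)?;
        let end = start.checked_add(self.dim)?;
        self.buf.get(start..end)
    }

    /// Frees every vector at once; the backing memory is kept for reuse.
    pub fn reset(&mut self) {
        self.buf.clear();
    }
}
```

The trade-off is the usual one for arenas: individual vectors cannot be freed, only the whole batch, in exchange for one allocation instead of one per vector.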

Lock-Free Structures (lockfree.rs)

  • Crossbeam-based concurrent data structures
  • Lock-free queues for batch ingestion
  • Available only with the parallel feature enabled (not on WASM)

Cache-Optimized Operations (cache_optimized.rs)

  • Prefetching hints for sequential access
  • Cache-line aligned storage
  • NUMA-aware allocation on supported platforms

6. Storage Layer (storage.rs)

Native Storage (REDB)

  • ACID transactions
  • Memory-mapped vectors
  • Configuration persistence
  • Connection pooling for multiple VectorDB instances

Table definitions:

const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");

Security:

  • Path traversal protection
  • Validates relative paths don't escape working directory
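
A lexical version of the path check can be written with the standard library alone. This is a sketch of the general technique, not the validation in `storage.rs`: `validate_relative_path` is a hypothetical name, the error type is a plain `String`, and symlinks are not resolved, so a production check would likely also canonicalize the result.

```rust
use std::path::{Component, Path, PathBuf};

/// Accepts a relative path only if it never climbs above its starting directory.
/// (Hypothetical helper for illustration.)
pub fn validate_relative_path(input: &str) -> Result<PathBuf, String> {
    let path = Path::new(input);
    // Depth below the working directory after each component.
    let mut depth: usize = 0;

    for component in path.components() {
        match component {
            // Absolute paths and Windows drive prefixes are rejected outright.
            Component::Prefix(_) | Component::RootDir => {
                return Err(format!("absolute path not allowed: {input}"));
            }
            // ".." is fine only while there is a directory to step out of.
            Component::ParentDir => {
                depth = depth
                    .checked_sub(1)
                    .ok_or_else(|| format!("path escapes working directory: {input}"))?;
            }
            Component::CurDir => {}
            Component::Normal(_) => depth += 1,
        }
    }
    Ok(path.to_path_buf())
}
```

For example, `a/../b` is accepted because it stays inside the working directory, while `a/../../b` is rejected at the second `..`.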

Memory-Only Storage (storage_memory.rs)

  • Pure in-memory for WASM
  • No persistence
  • DashMap for concurrent access

Integration Points

1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

+-------------------+       +-------------------+       +-------------------+
|   AI Agent        |       |   Policy Memory   |       |   ruvector-core   |
|                   | ----> |   (AgenticDB)     | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
|  this situation?" |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+

Use Cases:

  • Q-learning state-action lookups
  • Contextual bandit policy retrieval
  • Episodic memory for reasoning

2. Session State Index

Real-time session context for conversational AI:

+-------------------+       +-------------------+       +-------------------+
|   Chat Session    |       |   Session Index   |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+

Requirements:

  • < 10ms latency for interactive use
  • Session isolation via namespaces
  • TTL-based cleanup
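
Namespace isolation and TTL cleanup can be sketched with a map keyed by session. Everything here is hypothetical (`SessionIndex`, `SessionEntry`, and the method names); the current time is passed in explicitly so the eviction logic is deterministic and easy to test, which is a design choice of this sketch rather than something the source prescribes.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One stored turn: its embedding and when it was inserted.
pub struct SessionEntry {
    pub embedding: Vec<f32>,
    pub inserted_at: Instant,
}

/// Keeps each session's entries in a separate namespace.
pub struct SessionIndex {
    ttl: Duration,
    namespaces: HashMap<String, Vec<SessionEntry>>,
}

impl SessionIndex {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, namespaces: HashMap::new() }
    }

    /// Entries are only ever added under their own namespace key.
    pub fn insert(&mut self, namespace: &str, embedding: Vec<f32>, now: Instant) {
        self.namespaces
            .entry(namespace.to_string())
            .or_default()
            .push(SessionEntry { embedding, inserted_at: now });
    }

    /// Number of live entries in one namespace.
    pub fn len(&self, namespace: &str) -> usize {
        self.namespaces.get(namespace).map_or(0, Vec::len)
    }

    /// Removes entries whose age has reached the TTL; returns how many were dropped.
    pub fn evict_expired(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        let mut removed = 0;
        for entries in self.namespaces.values_mut() {
            let before = entries.len();
            entries.retain(|e| now.saturating_duration_since(e.inserted_at) < ttl);
            removed += before - entries.len();
        }
        // Drop namespaces that became empty so abandoned sessions do not accumulate.
        self.namespaces.retain(|_, entries| !entries.is_empty());
        removed
    }
}
```

A search restricted to one session would iterate only `namespaces[session]`, which is what keeps one conversation's context from leaking into another's results.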

3. Witness Log for Audit

Cryptographically-linked audit trail:

+-------------------+       +-------------------+       +-------------------+
|   Agent Action    |       |   Witness Log     |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+

Properties:

  • Immutable entries
  • Hash-chain linking
  • Semantic searchability
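
Hash-chain linking means each entry stores the hash of its predecessor, so altering any earlier entry invalidates everything after it. The sketch below shows the chaining and verification logic only. It deliberately uses the standard library's `DefaultHasher` as a stand-in, which is NOT a cryptographic hash; a real witness log would substitute a cryptographic digest such as SHA-256. `WitnessLog`, `WitnessEntry`, and `link_hash` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct WitnessEntry {
    pub payload: String,
    pub prev_hash: u64, // Hash of the previous entry (0 for the first)
    pub hash: u64,      // Hash over prev_hash and payload
}

/// Stand-in link hash. DefaultHasher is not cryptographic; illustration only.
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct WitnessLog {
    pub entries: Vec<WitnessEntry>,
}

impl WitnessLog {
    /// Appends an entry linked to the current tail and returns its hash.
    pub fn append(&mut self, payload: &str) -> u64 {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev_hash, payload);
        self.entries.push(WitnessEntry { payload: payload.to_string(), prev_hash, hash });
        hash
    }

    /// Recomputes every link; any edited payload or broken link fails the check.
    pub fn verify(&self) -> bool {
        let mut expected_prev = 0u64;
        for entry in &self.entries {
            if entry.prev_hash != expected_prev
                || entry.hash != link_hash(entry.prev_hash, &entry.payload)
            {
                return false;
            }
            expected_prev = entry.hash;
        }
        true
    }
}
```

In the integration described above, the action embedding would be stored in the vector index alongside the entry's hash, which is what makes the log both verifiable and semantically searchable.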

Decision Drivers

1. Performance (Sub-millisecond Latency)

| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |

2. Memory Efficiency (Quantization Support)

| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |

3. Cross-Platform Portability (WASM, Native)

| Platform | Features Available |
|----------|--------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |

4. LLM Integration

| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |

Alternatives Considered

Alternative 1: Python-First Implementation (NumPy/FAISS)

Rejected because:

  • 10-100x slower than Rust SIMD
  • No WASM support
  • GIL contention in concurrent workloads

Alternative 2: C++ with Bindings

Rejected because:

  • Memory safety concerns
  • Complex cross-compilation
  • Build system complexity (CMake)

Alternative 3: Qdrant/Milvus Integration

Rejected because:

  • External service dependency
  • No WASM support
  • Complex deployment for edge use cases

Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

Rejected because:

  • Not portable to edge/mobile
  • Driver dependencies
  • Overkill for < 100M vectors

Consequences

Benefits

  1. Performance: Sub-millisecond latency enables real-time AI applications
  2. Portability: Single codebase runs native, WASM, and PostgreSQL
  3. Memory Efficiency: 2-32x compression makes large datasets practical on edge
  4. Integration: Native Rust means zero-cost abstractions for embedding in other systems
  5. Learning: GNN layers can improve search quality without reindexing

Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |

Performance Targets

| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |

Implementation Status

Completed (v0.1.x)

| Module | Status | Description |
|--------|--------|-------------|
| simd_intrinsics | Complete | AVX2/NEON dispatch with scalar fallback |
| distance | Complete | All 4 metrics with SimSIMD integration |
| index/hnsw | Complete | Full HNSW with serialization |
| index/flat | Complete | Linear scan baseline |
| quantization | Complete | Scalar, Product, Binary |
| storage | Complete | REDB-based with connection pooling |
| storage_memory | Complete | In-memory for WASM |
| types | Complete | Core types with serde |
| error | Complete | Error types with thiserror |
| vector_db | Complete | High-level API |
| agenticdb | Complete | AI agent memory interface |

Advanced Features

| Module | Status | Description |
|--------|--------|-------------|
| advanced_features/filtered_search | Complete | Metadata-based filtering |
| advanced_features/hybrid_search | Complete | Dense + sparse (BM25) |
| advanced_features/mmr | Complete | Maximal Marginal Relevance |
| advanced_features/conformal_prediction | Complete | Uncertainty quantification |
| advanced_features/product_quantization | Complete | Enhanced PQ with training |

Research Features (advanced/)

| Module | Status | Description |
|--------|--------|-------------|
| hypergraph | Experimental | Hyperedge relationships |
| learned_index | Experimental | Neural index structures |
| neural_hash | Experimental | LSH with neural tuning |
| tda | Experimental | Topological data analysis |

Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| default | Yes | simd, storage, hnsw, api-embeddings, parallel |
| simd | Yes | SimSIMD acceleration |
| parallel | Yes | Rayon parallel processing |
| storage | Yes | REDB file-based storage |
| hnsw | Yes | HNSW index support |
| api-embeddings | Yes | HTTP-based embedding providers |
| memory-only | No | Pure in-memory (WASM) |
| real-embeddings | No | Deprecated, use api-embeddings |

Dependencies

Core Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| hnsw_rs | workspace | HNSW implementation |
| simsimd | workspace | SIMD distance functions |
| rayon | workspace | Parallel iteration |
| redb | workspace | Embedded database |
| bincode | workspace | Binary serialization |
| dashmap | workspace | Concurrent hash map |
| parking_lot | workspace | Optimized locks |

Optional Dependencies

| Dependency | Feature | Purpose |
|------------|---------|---------|
| reqwest | api-embeddings | HTTP client for embedding APIs |
| memmap2 | storage | Memory-mapped files |
| crossbeam | parallel | Lock-free data structures |

API Examples

use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
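let results = db.search(&query, 10)?;

Quantization

use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Example inputs (placeholder values)
let vector = vec![0.1_f32; 384];
let other = vec![0.2_f32; 384];

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);
let other_quantized = ScalarQuantized::quantize(&other);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();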

Batch Operations

use ruvector_core::distance::batch_distances;

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;

References

  1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.

  2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.

  3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/

  4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD


Appendix A: SIMD Register Usage

AVX2 (256-bit registers)

+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions

NEON (128-bit registers)

+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum

Appendix B: Memory Layout

VectorEntry

+------------------+------------------+------------------+
|     id: String   |  vector: Vec<f32>|  metadata: JSON  |
|     (optional)   |  (required)      |  (optional)      |
+------------------+------------------+------------------+

HNSW Graph Structure

Level 3:  [v0] -------- [v5]
            \            /
Level 2:  [v0] -- [v3] -- [v5] -- [v9]
            \    /    \    /    \
Level 1:  [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
            |    |    |    |    |    |    |
Level 0:  [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]

Appendix C: Benchmark Results

Platform: Apple M2 (ARM64 NEON)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us

Platform: Intel i7 (AVX2)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)

Related Decisions

  • ADR-002: RuvLLM Integration with Ruvector
  • ADR-003: SIMD Optimization Strategy
  • ADR-004: KV Cache Management
  • ADR-005: WASM Runtime Integration
  • ADR-006: Memory Management
  • ADR-007: Security Review & Technical Debt

Implementation Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |

Security Status: Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).


Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |