VelesDB Architecture

June 14, 2026 · View on GitHub

This document describes the internal architecture of VelesDB.

Architecture Status Update (2026-02-26)

VelesDB core architecture is explicitly hybrid by design:

Vector engine with 5 metrics (Cosine, Euclidean, DotProduct, Hamming, Jaccard) and SIMD acceleration.
Graph engine for nodes/edges/traversal inside collection runtime.
Multi-column engine (ColumnStore) for typed filtering and bitmap operations.
VelesQL control plane (parser/validation/planning/cache) orchestrating cross-domain execution paths.

High-Level Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                           CLIENT LAYER                                   │
├─────────────────────────────────────────────────────────────────────────┤
│  TypeScript SDK │ Python SDK │ REST Client │ VelesQL CLI │ Mobile SDK  │
│  (@velesdb/sdk) │ (velesdb)  │ (curl/HTTP) │ (velesdb)   │ (iOS/Android)│
└───────┬─────────┴──────┬─────┴───────┬─────┴──────┬──────┴──────┬──────┘
         │                  │                 │                 │
         ▼                  ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           API LAYER                                      │
├─────────────────────────────────────────────────────────────────────────┤
│   WASM Module     │   Python Bindings   │    REST Server    │   CLI    │
│  (velesdb-wasm)   │   (velesdb-python)  │  (velesdb-server) │  (REPL)  │
│                   │       PyO3          │      Axum         │          │
└────────┬──────────┴───────┬─────────────┴────────┬──────────┴────┬─────┘
         │                  │                      │               │
         ▼                  ▼                      ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          CORE ENGINE                                     │
│                         (velesdb-core)                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐ │
│  │   Database   │  │  Collection  │  │   VelesQL    │  │   Filter    │ │
│  │  Management  │  │  Operations  │  │   Parser     │  │   Engine    │ │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬──────┘ │
│         │                 │                 │                 │         │
│         ▼                 ▼                 ▼                 ▼         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                       INDEX LAYER                                │   │
│  ├─────────────────────────────────────────────────────────────────┤   │
│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐ │   │
│  │   │  HNSW Index │    │ BM25 Index  │    │  ColumnStore Filter │ │   │
│  │   │  (ANN)      │    │ (Full-Text) │    │  (RoaringBitmap)    │ │   │
│  │   └──────┬──────┘    └──────┬──────┘    └──────────┬──────────┘ │   │
│  └──────────┼──────────────────┼─────────────────────┼─────────────┘   │
│             │                  │                     │                  │
│             ▼                  ▼                     ▼                  │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                     DISTANCE LAYER (SIMD)                        │   │
│  ├─────────────────────────────────────────────────────────────────┤   │
│  │  Cosine  │  Euclidean  │  Dot Product  │  Hamming  │  Jaccard   │   │
│  │  (33.1ns)│   (22.5ns)  │    (19.8ns)   │  (35.8ns) │   (35.1ns) │   │
│  │                                                                  │   │
│  │  AVX2/AVX-512 │ ARM64 NEON │ Scalar fallback (incl. WASM —    │   │
│  │               │ (simd_neon)│ SIMD128 planned)                 │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         STORAGE LAYER                                    │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌────────────────┐ │
│  │ Vector Data │  │   Payload   │  │     WAL     │  │  Binary Export │ │
│  │  (mmap)     │  │   Storage   │  │  (durability│  │  (VELS format) │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └────────────────┘ │
│                                                                          │
│  File System / Memory / IndexedDB (WASM)                                │
└─────────────────────────────────────────────────────────────────────────┘

Component Details

1. Client Layer

Component	Language	Purpose
TypeScript SDK	TypeScript	Unified client for browser/Node.js
Python SDK	Python	Native bindings via PyO3
Mobile SDK	Swift/Kotlin	Native iOS and Android bindings via UniFFI
REST Client	Any	HTTP API access
VelesQL CLI	Rust	Interactive query REPL

2. API Layer

velesdb-wasm

WebAssembly module for browser/Node.js
Scalar distance calculations (SIMD128 kernels planned)
IndexedDB persistence via binary export/import
~430 KB gzipped (v1.18.0 npm artifact)

velesdb-server

Axum-based REST API server
OpenAPI/Swagger documentation
48 REST endpoints (55 method+path operations)
Prometheus metrics served by default (GET /metrics)

velesdb-python

PyO3 bindings for Python
NumPy array support
Zero-copy when possible

velesdb-mobile

UniFFI bindings for iOS (Swift) and Android (Kotlin)
Thread-safe Arc-wrapped handles
StorageMode support (Full, SQ8, Binary) for IoT/Edge
Targets: aarch64-apple-ios, aarch64-linux-android, etc.

3. Core Engine (velesdb-core)

Database

Collection management via typed registries:
- vector_collections: HashMap<String, VectorCollection>
- graph_collections: HashMap<String, GraphCollection>
- metadata_collections: HashMap<String, MetadataCollection>
Multi-collection support
Automatic persistence

Collection

Three typed collection variants: VectorCollection, GraphCollection, MetadataCollection
Point CRUD operations
Vector search (single & batch)
Text search (BM25)
Hybrid search (vector + text)

VelesQL Parser (v2.0)

SQL-like query language
~1.3M queries/sec parsing
Bound parameters support
v2.0 Features:
- GROUP BY / HAVING (AND/OR)
- ORDER BY (multi-column, similarity)
- JOIN with aliases
- UNION / INTERSECT / EXCEPT
- USING FUSION (hybrid search)
- WITH (max_groups, group_limit)

Filter Engine

ColumnStore-based filtering (adaptive per-collection payload mirror in the SELECT ... WHERE path)
RoaringBitmap for set operations
Up to 130x faster than JSON filtering (filtering-API micro-benchmark)

Aggregation Engine (EPIC-017/018)

Streaming aggregation executor
Performance Optimizations:
- process_batch() - SIMD-friendly vectorized aggregation
- Parallel aggregation with Rayon (10K+ datasets)
- Pre-computed hash for GROUP BY (vs JSON serialization)
- String interning to avoid allocations in hot path
~2x speedup on large aggregations

4. Knowledge Graph Layer (EPIC-019)

┌─────────────────────────────────────────────────────────────────────────┐
│                      KNOWLEDGE GRAPH ENGINE                              │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────────────┐ │
│  │   GraphSchema    │  │    GraphNode     │  │      GraphEdge         │ │
│  │  (labels, types) │  │ (id, properties) │  │ (src, tgt, label, props)│ │
│  └────────┬─────────┘  └────────┬─────────┘  └───────────┬────────────┘ │
│           │                     │                        │              │
│           ▼                     ▼                        ▼              │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                    ConcurrentEdgeStore                           │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │   │
│  │  │  256 Shards │  │  edge_ids   │  │    Label Indices        │  │   │
│  │  │ (RwLock<    │  │  HashMap    │  │  by_label, outgoing_    │  │   │
│  │  │  EdgeStore>)│  │ (edge→src)  │  │  by_label               │  │   │
│  │  └──────┬──────┘  └──────┬──────┘  └───────────┬─────────────┘  │   │
│  │         │                │                     │                 │   │
│  │         ▼                ▼                     ▼                 │   │
│  │  ┌──────────────────────────────────────────────────────────┐   │   │
│  │  │  Optimized Operations:                                    │   │   │
│  │  │  • add_edge: O(1) with cross-shard dual-insert           │   │   │
│  │  │  • remove_edge: O(1) 2-shard lookup (not 256)            │   │   │
│  │  │  • get_edges_by_label: O(k) via label index              │   │   │
│  │  └──────────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────────────┐ │
│  │   LabelTable     │  │   BfsIterator    │  │    GraphMetrics        │ │
│  │ String interning │  │ Streaming BFS    │  │  LatencyHistogram      │ │
│  │  LabelId (u32)   │  │ memory-bounded   │  │  node/edge counters    │ │
│  └──────────────────┘  └──────────────────┘  └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘

Scalability (10M+ edges):

Adaptive sharding: 1-512 shards based on graph size
2-shard removal: O(1) instead of O(256) lock acquisitions
Label indices: O(k) edge lookup by relationship type
String interning: ~60% memory reduction for labels
CsrSnapshot: Zero-copy CSR (Compressed Sparse Row) snapshot for cache-friendly BFS/DFS traversal. Built on-demand after load or build_read_snapshot(), auto-invalidated by writes. Returns neighbor IDs as contiguous &[u64] slices instead of per-shard edge lookups.
Parent-pointer BFS: BFS/DFS uses a FxHashMap parent-pointer map instead of cloning path vectors at every edge expansion. Paths are reconstructed on-demand via reconstruct_path() only when emitting results. Uses FxHashSet visited sets (via rustc_hash) for faster hashing than std::HashSet.
Parallel BFS: Multi-source BFS traversal (traverse_bfs_parallel) launches concurrent BFS from multiple source nodes with deduplication by path signature. Available across all components: server REST API, Python bindings (GIL-released), Mobile (UniFFI), Tauri plugin, and TypeScript SDK.

5. Index Layer

HNSW Index

                    Entry Point (Layer L)
                          │
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
         Node A ─────── Node B ─────── Node C   (Layer L-1)
            │             │             │
    ┌───────┼───────┐     │     ┌───────┼───────┐
    ▼       ▼       ▼     ▼     ▼       ▼       ▼
   ...     ...     ...   ...   ...     ...     ... (Layer 0)

Parameters:
- M: Max connections per node (default: 24-32, auto-tuned by dimension)
- ef_construction: Build-time search width (default: 300-400, auto-tuned by dimension)
- ef_search: Query-time search width (default: 160, Balanced mode). An explicit WITH (ef_search = N) is passed through as the requested budget (clamped to at least k, and still subject to the standard dataset-size scaling), instead of being snapped to a coarse named profile. (Updated 2026-06-14.)
Features:
- Thread-safe parallel insertions with lock-free CAS entry-point promotion
- Graduated ef_construction (3-phase VAMANA/DiskANN schedule for batches >= 1000)
- Pre-allocated vector storage (reserve + bulk push to minimize lock contention)
- Automatic level assignment
- Persistent storage with WAL recovery

BM25 Index

Term frequency / inverse document frequency
Tokenization with stopword removal
Persistent storage

ColumnStore

Columnar storage for typed metadata
String interning for efficient comparisons
RoaringBitmap for fast set operations

5. Distance Layer (SIMD)

Metric	Implementation	Latency (768D)
Dot Product	AVX2 FMA	21.7 ns
Euclidean	AVX2 FMA	26.0 ns
Cosine	AVX2 4-acc, single-sqrt finish	33.1 ns
Hamming	AVX2 FP-domain 4-acc	35.8 ns
Jaccard	AVX-512 4-acc	35.1 ns

Per-metric numbers above are the contract values in docs/reference/promise-contract.json. Raw micro-benchmark snapshots (March 27 2026 run on a specific machine) live in SIMD_PERFORMANCE.md and may differ by ~10% due to methodology / cache state.

SIMD Strategy:

Native (x86_64): AVX2/AVX-512 via core::arch intrinsics with 4-accumulator ILP
Native (aarch64): NEON 128-bit with 1-acc/4-acc variants
WASM: scalar fallback (SIMD128 kernels planned; wasm32 dispatches to SimdLevel::Scalar)
Fallback: Scalar with loop unrolling

6. Storage Layer

Vector Data

Memory-mapped files for large datasets
Contiguous f32 buffer for cache locality
Lazy loading support

Payload Storage

JSON-based payload storage
Nested field access with dot notation
Type-aware indexing

WAL (Write-Ahead Log)

Durability guarantees
Automatic recovery on restart
Configurable sync policy

$ ┌────────┬─────────┬───────────┬────────┬─────────┬─────────────────────┐ │ "\text{VELS}" │ \text{Version} │ \text{Dimension} │ \text{Metric} │ \text{Count} │ \text{Vectors} │ │ 4 \text{bytes}│ 1 \text{byte} │ 4 \text{bytes} │ 1 \text{byte} │ 8 \text{bytes} │ (\text{id} + \text{data}) \times \text{count} │ └────────┴─────────┴───────────┴────────┴─────────┴─────────────────────┘ $

Data Flow

Vector Search Flow

Query Vector
     │
     ▼
┌─────────────────┐
│  VelesQL Parse  │ (optional)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Filter Engine  │ (if filters present:
│ (secondary idx +│  secondary indexes + JSON
│  JSON filters)  │  payload filters)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   HNSW Search   │
│  (entry → L0)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SIMD Distance  │
│  Calculations   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Top-K Results  │
│  (min-heap)     │
└────────┬────────┘
         │
         ▼
   Sorted Results

Hybrid Search Flow

Query Vector + Text Query
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│ HNSW  │ │ BM25  │
│Search │ │Search │
└───┬───┘ └───┬───┘
    │         │
    ▼         ▼
┌─────────────────┐
│  RRF Fusion     │
│ (Reciprocal     │
│  Rank Fusion)   │
└────────┬────────┘
         │
         ▼
   Merged Results

VelesQL v2.0 Query Flow

┌─────────────────────────────────────────────────────────────────┐
│                      VelesQL v2.0 Parser                         │
├─────────────────────────────────────────────────────────────────┤
│  SQL Query                                                       │
│    │                                                             │
│    ▼                                                             │
│  ┌────────────────┐                                              │
│  │  Pest Grammar  │  compound_query → select_stmt [set_op]       │
│  └────────┬───────┘                                              │
│           │                                                      │
│           ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                        AST                                  │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │  Query {                                                    │ │
│  │    select: SelectStatement {                                │ │
│  │      columns, from, joins[], where_clause,                  │ │
│  │      group_by, having, order_by, limit, offset,             │ │
│  │      with_clause, fusion_clause                             │ │
│  │    },                                                       │ │
│  │    compound: Option<CompoundQuery> {                        │ │
│  │      operator: UNION|INTERSECT|EXCEPT,                      │ │
│  │      right: SelectStatement                                 │ │
│  │    }                                                        │ │
│  │  }                                                          │ │
│  └────────────────────────────────────────────────────────────┘ │
│           │                                                      │
│           ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    Execution Engine                         │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │  1. Filter Pushdown → ColumnStore                           │ │
│  │  2. Vector Search → HNSW (if NEAR clause)                   │ │
│  │  3. JOIN Execution → Cross-collection merge                 │ │
│  │  4. Aggregation → GROUP BY + HAVING                         │ │
│  │  5. Ordering → ORDER BY (columns, similarity)               │ │
│  │  6. Set Operations → UNION/INTERSECT/EXCEPT                 │ │
│  │  7. Fusion → RRF/Weighted/Maximum (if USING FUSION)         │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

VelesQL v2.0 Supported Syntax:

-- Aggregation with GROUP BY and HAVING
SELECT category, COUNT(*), AVG(price) 
FROM products 
GROUP BY category 
HAVING COUNT(*) > 5 AND AVG(price) > 50

-- ORDER BY with similarity function
SELECT * FROM docs 
ORDER BY similarity(vector, $query) DESC 
LIMIT 10

-- JOIN across collections
SELECT * FROM orders 
JOIN customers AS c ON orders.customer_id = c.id
WHERE status = 'active'

-- Set operations
SELECT * FROM active_users UNION SELECT * FROM archived_users

-- Hybrid fusion search (USING FUSION is a trailing clause: after LIMIT)
SELECT * FROM documents 
LIMIT 20 USING FUSION(strategy='rrf', k=60)

Performance Characteristics

Memory Usage

Component	Per Vector (768D)
Vector Data (f32)	3,072 bytes
Vector Data (f16)	1,536 bytes
Vector Data (SQ8)	768 bytes
HNSW Links	~256 bytes
Payload (avg)	~200 bytes

Throughput

Operation	Throughput
Insert	~3.8K-6.4K vec/sec (768D)
Search k=10 (10K vectors, 768D, HNSW index-only)	~55 µs
Search end-to-end p50 (10K/384D, WAL ON, recall ≥ 96%)	~450 µs
Search (100K vectors)	< 5 ms
VelesQL Parse	1.3M queries/sec
Export (WASM)	4,479 MB/s
Import (WASM)	2,943 MB/s

Platform Support

Platform	Status	SIMD	Performance
Linux x86_64	✅ Full	AVX2/AVX-512	100%
Windows x86_64	✅ Full	AVX2	100%
macOS x86_64	✅ Full	AVX2	100%
macOS ARM64	✅ Full	NEON	~90%
WASM (Browser)	✅ Full	Scalar (SIMD128 planned)	~70%
WASM (Node.js)	✅ Full	Scalar (SIMD128 planned)	~70%
iOS (ARM64)	✅ Full	NEON	~90%
Android (ARM64)	✅ Full	NEON	~90%
Android (ARMv7)	✅ Full	Fallback	~70%

ARM64 (Apple Silicon / Mobile) Note

On ARM64 platforms (macOS M1/M2/M3, iOS, Android), VelesDB uses native NEON SIMD instructions for distance calculations via the simd_native module, with both 1-accumulator and 4-accumulator variants depending on vector size.

Impact:

Distance calculations are ~10% slower than x86_64 with AVX2
All other operations (indexing, storage, queries) are unaffected
Overall search latency remains in the microsecond range

Future Architecture

Planned Components

┌─────────────────────────────────────────────────────────────────────────┐
│                       DISTRIBUTED LAYER (v1.0+)                          │
├─────────────────────────────────────────────────────────────────────────┤
│   Coordinator   │   Sharding   │   Replication   │   Consensus (Raft)  │
└─────────────────────────────────────────────────────────────────────────┘

GPU Acceleration: CUDA kernels for large-scale (wgpu-based, optional)

v1.6.0 Architecture Improvements (Shipped)

The following architectural changes, originally identified in the January 2026 technical audit, have been implemented as of v1.6.0:

Change	Before (v0.8.x)	After (v1.6.0)
Concurrency	Global `RwLock<HashMap>`	`DashMap` + 16-shard storage
Memory	`Vec<f32>` allocations per read	Zero-copy `&[f32]` from mmap
SIMD Dispatch	Per-call feature detection	`OnceLock` function pointer
Unsafe	`'static` lifetime tricks	Safe self-referential via `ouroboros`

See docs/internal/TECHNICAL_AUDIT_PLAN.md for the original audit plan.

Code-Truth Matrix (2026-02-26)

Capability	Runtime module(s)	Notes
VelesQL parser + validation + planning	`crates/velesdb-core/src/velesql/*`	Query control plane and parse cache
Vector engine (5 metrics)	`crates/velesdb-core/src/distance/`, `simd_native/`, `index/hnsw/*`	Cosine, Euclidean, DotProduct, Hamming, Jaccard
Graph engine	`crates/velesdb-core/src/collection/graph/*`	Nodes/edges/traversal/property indexes
Multi-column filtering	`crates/velesdb-core/src/column_store/*`	Typed filters + bitmap paths
Hybrid execution / fusion	`crates/velesdb-core/src/collection/search/query/*`	Pushdown + fusion strategies
Storage + WAL/recovery	`crates/velesdb-core/src/storage/*`	mmap storage and recovery tests

Governance links

Operations runbook: docs/reference/OPERATIONS_RUNBOOK.md