VelesDB Architecture
June 14, 2026
This document describes the internal architecture of VelesDB.
Architecture Status Update (2026-02-26)
VelesDB core architecture is explicitly hybrid by design:
- Vector engine with 5 metrics (`Cosine`, `Euclidean`, `DotProduct`, `Hamming`, `Jaccard`) and SIMD acceleration.
- Graph engine for nodes/edges/traversal inside collection runtime.
- Multi-column engine (`ColumnStore`) for typed filtering and bitmap operations.
- VelesQL control plane (parser/validation/planning/cache) orchestrating cross-domain execution paths.
High-Level Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
├─────────────────────────────────────────────────────────────────────────┤
│ TypeScript SDK │ Python SDK │ REST Client │ VelesQL CLI │ Mobile SDK │
│ (@velesdb/sdk) │ (velesdb) │ (curl/HTTP) │ (velesdb) │ (iOS/Android)│
└───────┬─────────┴──────┬─────┴───────┬─────┴──────┬──────┴──────┬──────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ API LAYER │
├─────────────────────────────────────────────────────────────────────────┤
│ WASM Module │ Python Bindings │ REST Server │ CLI │
│ (velesdb-wasm) │ (velesdb-python) │ (velesdb-server) │ (REPL) │
│ │ PyO3 │ Axum │ │
└────────┬──────────┴───────┬─────────────┴────────┬──────────┴────┬─────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ CORE ENGINE │
│ (velesdb-core) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Database │ │ Collection │ │ VelesQL │ │ Filter │ │
│ │ Management │ │ Operations │ │ Parser │ │ Engine │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ INDEX LAYER │ │
│ ├─────────────────────────────────────────────────────────────────┤ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │
│ │ │ HNSW Index │ │ BM25 Index │ │ ColumnStore Filter │ │ │
│ │ │ (ANN) │ │ (Full-Text) │ │ (RoaringBitmap) │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │ │
│ └──────────┼──────────────────┼─────────────────────┼─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DISTANCE LAYER (SIMD) │ │
│ ├─────────────────────────────────────────────────────────────────┤ │
│ │ Cosine │ Euclidean │ Dot Product │ Hamming │ Jaccard │ │
│ │ (33.1ns)│ (22.5ns) │ (19.8ns) │ (35.8ns) │ (35.1ns) │ │
│ │ │ │
│ │ AVX2/AVX-512 │ ARM64 NEON │ Scalar fallback (incl. WASM — │ │
│ │ │ (simd_neon)│ SIMD128 planned) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ STORAGE LAYER │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────────┐ │
│ │ Vector Data │ │ Payload │ │ WAL │ │ Binary Export │ │
│ │ (mmap) │ │ Storage │ │ (durability)│ │ (VELS format) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └────────────────┘ │
│ │
│ File System / Memory / IndexedDB (WASM) │
└─────────────────────────────────────────────────────────────────────────┘
Component Details
1. Client Layer
| Component | Language | Purpose |
|---|---|---|
| TypeScript SDK | TypeScript | Unified client for browser/Node.js |
| Python SDK | Python | Native bindings via PyO3 |
| Mobile SDK | Swift/Kotlin | Native iOS and Android bindings via UniFFI |
| REST Client | Any | HTTP API access |
| VelesQL CLI | Rust | Interactive query REPL |
2. API Layer
velesdb-wasm
- WebAssembly module for browser/Node.js
- Scalar distance calculations (SIMD128 kernels planned)
- IndexedDB persistence via binary export/import
- ~430 KB gzipped (v1.18.0 npm artifact)
velesdb-server
- Axum-based REST API server
- OpenAPI/Swagger documentation
- 48 REST endpoints (55 method+path operations)
- Prometheus metrics served by default (`GET /metrics`)
velesdb-python
- PyO3 bindings for Python
- NumPy array support
- Zero-copy when possible
velesdb-mobile
- UniFFI bindings for iOS (Swift) and Android (Kotlin)
- Thread-safe `Arc`-wrapped handles
- StorageMode support (Full, SQ8, Binary) for IoT/Edge
- Targets: `aarch64-apple-ios`, `aarch64-linux-android`, etc.
3. Core Engine (velesdb-core)
Database
- Collection management via typed registries:
  - `vector_collections: HashMap<String, VectorCollection>`
  - `graph_collections: HashMap<String, GraphCollection>`
  - `metadata_collections: HashMap<String, MetadataCollection>`
- Multi-collection support
- Automatic persistence
Collection
- Three typed collection variants: `VectorCollection`, `GraphCollection`, `MetadataCollection`
- Point CRUD operations
- Vector search (single & batch)
- Text search (BM25)
- Hybrid search (vector + text)
VelesQL Parser (v2.0)
- SQL-like query language
- ~1.3M queries/sec parsing
- Bound parameters support
- v2.0 Features:
  - `GROUP BY` / `HAVING` (AND/OR)
  - `ORDER BY` (multi-column, similarity)
  - `JOIN` with aliases
  - `UNION` / `INTERSECT` / `EXCEPT`
  - `USING FUSION` (hybrid search)
  - `WITH` (max_groups, group_limit)
Filter Engine
- ColumnStore-based filtering (adaptive per-collection payload mirror in the `SELECT ... WHERE` path)
- RoaringBitmap for set operations
- Up to 130x faster than JSON filtering (filtering-API micro-benchmark)
Aggregation Engine (EPIC-017/018)
- Streaming aggregation executor
- Performance Optimizations:
  - `process_batch()`: SIMD-friendly vectorized aggregation
  - Parallel aggregation with Rayon (10K+ datasets)
  - Pre-computed hash for GROUP BY (vs JSON serialization)
  - String interning to avoid allocations in hot path
  - ~2x speedup on large aggregations
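The streaming and string-interning ideas above can be illustrated with a small sketch. It is not the VelesDB executor: `StreamingAggregator`, `GroupState`, and the `finish` signature are hypothetical names, and the batch is modelled as `(group key, value)` pairs. Group keys are interned to `u32` ids so that a key is allocated only the first time it is seen, and each group keeps a running count and sum from which `COUNT(*)` and `AVG` are derived.

```rust
use std::collections::HashMap;

/// Running state for one group: enough to answer COUNT(*) and AVG(value).
#[derive(Default, Debug, Clone, Copy)]
pub struct GroupState {
    pub count: u64,
    pub sum: f64,
}

/// Hypothetical streaming GROUP BY aggregator with interned group keys.
#[derive(Default)]
pub struct StreamingAggregator {
    interner: HashMap<String, u32>, // group key -> compact id
    groups: HashMap<u32, GroupState>,
}

impl StreamingAggregator {
    /// Folds one batch of (group key, value) rows into the running state.
    pub fn process_batch(&mut self, rows: &[(&str, f64)]) {
        for &(key, value) in rows {
            // Allocate a String only the first time a key is seen.
            let id = match self.interner.get(key) {
                Some(&id) => id,
                None => {
                    let id = self.interner.len() as u32;
                    self.interner.insert(key.to_owned(), id);
                    id
                }
            };
            let state = self.groups.entry(id).or_default();
            state.count += 1;
            state.sum += value;
        }
    }

    /// Emits (key, COUNT(*), AVG) for groups with COUNT(*) > `min_count`,
    /// i.e. a HAVING COUNT(*) > min_count filter, sorted by key.
    pub fn finish(&self, min_count: u64) -> Vec<(String, u64, f64)> {
        let mut out: Vec<(String, u64, f64)> = self
            .interner
            .iter()
            .filter_map(|(name, id)| {
                let g = self.groups.get(id)?;
                (g.count > min_count).then(|| (name.clone(), g.count, g.sum / g.count as f64))
            })
            .collect();
        out.sort_by(|a, b| a.0.cmp(&b.0));
        out
    }
}
```

Because only per-group state is retained, memory grows with the number of distinct groups rather than the number of rows, which is the property a streaming executor relies on.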
4. Knowledge Graph Layer (EPIC-019)
┌─────────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH ENGINE │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ GraphSchema │ │ GraphNode │ │ GraphEdge │ │
│ │ (labels, types) │ │ (id, properties) │ │ (src, tgt, label, props)│ │
│ └────────┬─────────┘ └────────┬─────────┘ └───────────┬────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ConcurrentEdgeStore │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │
│ │ │ 256 Shards │ │ edge_ids │ │ Label Indices │ │ │
│ │ │ (RwLock< │ │ HashMap │ │ by_label, outgoing_ │ │ │
│ │ │ EdgeStore>)│ │ (edge→src) │ │ by_label │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Optimized Operations: │ │ │
│ │ │ • add_edge: O(1) with cross-shard dual-insert │ │ │
│ │ │ • remove_edge: O(1) 2-shard lookup (not 256) │ │ │
│ │ │ • get_edges_by_label: O(k) via label index │ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ LabelTable │ │ BfsIterator │ │ GraphMetrics │ │
│ │ String interning │ │ Streaming BFS │ │ LatencyHistogram │ │
│ │ LabelId (u32) │ │ memory-bounded │ │ node/edge counters │ │
│ └──────────────────┘ └──────────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
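As a rough illustration of the dual-insert and two-shard removal shown in the diagram, the sketch below shards edges by node id and keeps an `edge_ids` map from edge id to its endpoints. All type and method names here (`ShardedEdgeStore`, `Shard`, `outgoing`) are hypothetical, the modulo shard function is an assumption, and label indices are omitted. Removal in this sketch scans the node's edge list, so it only demonstrates the "two shards instead of all shards" lookup, not the exact cost of the real store.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical edge record; field names are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub struct Edge {
    pub id: u64,
    pub source: u64,
    pub target: u64,
    pub label: String,
}

/// One shard: edges indexed by the node they touch.
#[derive(Default)]
struct Shard {
    outgoing: HashMap<u64, Vec<Edge>>, // keyed by source node
    incoming: HashMap<u64, Vec<Edge>>, // keyed by target node
}

pub struct ShardedEdgeStore {
    shards: Vec<RwLock<Shard>>,
    /// edge id -> (source, target), so removal knows which shards to visit.
    edge_ids: RwLock<HashMap<u64, (u64, u64)>>,
}

impl ShardedEdgeStore {
    pub fn new(num_shards: usize) -> Self {
        let shards = (0..num_shards.max(1))
            .map(|_| RwLock::new(Shard::default()))
            .collect();
        Self { shards, edge_ids: RwLock::new(HashMap::new()) }
    }

    /// Assumed shard function: node id modulo shard count.
    fn shard_of(&self, node: u64) -> usize {
        (node % self.shards.len() as u64) as usize
    }

    /// Dual insert: once in the source's shard, once in the target's shard.
    /// Each lock is released at the end of its statement, so two locks are never held at once.
    pub fn add_edge(&self, edge: Edge) {
        self.edge_ids.write().unwrap().insert(edge.id, (edge.source, edge.target));
        let (s, t) = (self.shard_of(edge.source), self.shard_of(edge.target));
        self.shards[s].write().unwrap().outgoing.entry(edge.source).or_default().push(edge.clone());
        self.shards[t].write().unwrap().incoming.entry(edge.target).or_default().push(edge);
    }

    /// Removal visits at most two shards, located through `edge_ids`.
    pub fn remove_edge(&self, edge_id: u64) -> bool {
        let (src, tgt) = match self.edge_ids.write().unwrap().remove(&edge_id) {
            Some(endpoints) => endpoints,
            None => return false,
        };
        let (s, t) = (self.shard_of(src), self.shard_of(tgt));
        if let Some(edges) = self.shards[s].write().unwrap().outgoing.get_mut(&src) {
            edges.retain(|e| e.id != edge_id);
        }
        if let Some(edges) = self.shards[t].write().unwrap().incoming.get_mut(&tgt) {
            edges.retain(|e| e.id != edge_id);
        }
        true
    }

    /// Outgoing edges of `node`, read from a single shard.
    pub fn outgoing(&self, node: u64) -> Vec<Edge> {
        let shard = self.shards[self.shard_of(node)].read().unwrap();
        shard.outgoing.get(&node).cloned().unwrap_or_default()
    }
}
```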
Scalability (10M+ edges):
- Adaptive sharding: 1-512 shards based on graph size
- 2-shard removal: O(1) instead of O(256) lock acquisitions
- Label indices: O(k) edge lookup by relationship type
- String interning: ~60% memory reduction for labels
- CsrSnapshot: Zero-copy CSR (Compressed Sparse Row) snapshot for cache-friendly BFS/DFS traversal. Built on-demand after load or `build_read_snapshot()`, auto-invalidated by writes. Returns neighbor IDs as contiguous `&[u64]` slices instead of per-shard edge lookups.
- Parent-pointer BFS: BFS/DFS uses a `FxHashMap` parent-pointer map instead of cloning path vectors at every edge expansion. Paths are reconstructed on-demand via `reconstruct_path()` only when emitting results. Uses `FxHashSet` visited sets (via `rustc_hash`) for faster hashing than `std::collections::HashSet`.
- Parallel BFS: Multi-source BFS traversal (`traverse_bfs_parallel`) launches concurrent BFS from multiple source nodes with deduplication by path signature. Available across all components: server REST API, Python bindings (GIL-released), Mobile (UniFFI), Tauri plugin, and TypeScript SDK.
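The parent-pointer technique can be sketched in a few lines. This is an illustration under assumptions, not the VelesDB traversal code: it uses the standard-library `HashMap`/`HashSet` in place of `FxHashMap`/`FxHashSet` to stay dependency-free, the graph is a plain adjacency map, and the signatures of `bfs_path` and `reconstruct_path` are invented for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Walks parent pointers back from `node` to the BFS source and returns
/// the path in source-to-node order.
pub fn reconstruct_path(parents: &HashMap<u64, u64>, node: u64) -> Vec<u64> {
    let mut path = vec![node];
    let mut current = node;
    while let Some(&parent) = parents.get(&current) {
        path.push(parent);
        current = parent;
    }
    path.reverse();
    path
}

/// BFS from `source` to `goal`. Each discovered node stores only its parent,
/// so no path vector is cloned during expansion.
pub fn bfs_path(
    adjacency: &HashMap<u64, Vec<u64>>,
    source: u64,
    goal: u64,
) -> Option<Vec<u64>> {
    let mut visited: HashSet<u64> = HashSet::from([source]);
    let mut parents: HashMap<u64, u64> = HashMap::new();
    let mut queue = VecDeque::from([source]);

    while let Some(node) = queue.pop_front() {
        if node == goal {
            // The path is materialised only now, when a result is emitted.
            return Some(reconstruct_path(&parents, node));
        }
        if let Some(neighbors) = adjacency.get(&node) {
            for &next in neighbors {
                if visited.insert(next) {
                    parents.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```

The per-node cost during expansion is one map insert, whereas cloning a path vector per edge costs time and memory proportional to the current depth.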
5. Index Layer
HNSW Index
Entry Point (Layer L)
│
┌─────────────┼─────────────┐
▼ ▼ ▼
Node A ─────── Node B ─────── Node C (Layer L-1)
│ │ │
┌───────┼───────┐ │ ┌───────┼───────┐
▼ ▼ ▼ ▼ ▼ ▼ ▼
... ... ... ... ... ... ... (Layer 0)
- Parameters:
  - `M`: Max connections per node (default: 24-32, auto-tuned by dimension)
  - `ef_construction`: Build-time search width (default: 300-400, auto-tuned by dimension)
  - `ef_search`: Query-time search width (default: 160, Balanced mode). An explicit `WITH (ef_search = N)` is passed through as the requested budget (clamped to at least `k`, and still subject to the standard dataset-size scaling), instead of being snapped to a coarse named profile. (Updated 2026-06-14.)
- Features:
  - Thread-safe parallel insertions with lock-free CAS entry-point promotion
  - Graduated ef_construction (3-phase VAMANA/DiskANN schedule for batches >= 1000)
  - Pre-allocated vector storage (reserve + bulk push to minimize lock contention)
  - Automatic level assignment
  - Persistent storage with WAL recovery
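Two of the items above lend themselves to a short sketch. The first function shows the level draw from the original HNSW paper, `floor(-ln(u) * mL)` with `mL = 1 / ln(M)`; whether VelesDB uses exactly this normalisation is an assumption. The second shows the documented clamp of an explicit `ef_search` to at least `k`; applying the same clamp to the default, and omitting the dataset-size scaling, are simplifications. Both function names are hypothetical.

```rust
/// Draws an HNSW layer from a uniform sample `u` in (0, 1].
/// Uses level = floor(-ln(u) * mL) with mL = 1 / ln(M), as in the HNSW paper.
/// The caller supplies `u` so the sketch needs no random-number crate.
pub fn assign_level(u: f64, m: usize) -> usize {
    assert!(u > 0.0 && u <= 1.0, "u must lie in (0, 1]");
    assert!(m >= 2, "M must be at least 2");
    let ml = 1.0 / (m as f64).ln();
    (-u.ln() * ml).floor() as usize
}

/// Query-time budget: an explicit `ef_search` is honoured but never drops below `k`.
/// Dataset-size scaling, mentioned above, is deliberately left out of this sketch.
pub fn effective_ef_search(requested: Option<usize>, k: usize) -> usize {
    const BALANCED_DEFAULT: usize = 160; // default stated in the parameter list
    requested.unwrap_or(BALANCED_DEFAULT).max(k)
}
```

With this distribution most nodes land on layer 0 and each higher layer holds roughly a `1/M` fraction of the layer below, which is what keeps the upper layers sparse.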
BM25 Index
- Term frequency / inverse document frequency
- Tokenization with stopword removal
- Persistent storage
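For readers unfamiliar with BM25, the sketch below shows the standard per-term score and a simple tokenizer with stopword removal. It is a generic textbook formulation, not VelesDB's implementation: the function names are hypothetical, the IDF uses the common "+1 inside the logarithm" smoothing, and the parameters `k1` and `b` are passed in by the caller (values such as 1.2 and 0.75 are conventional choices, assumed here).

```rust
/// Lowercases, splits on non-alphanumeric characters, and drops stopwords.
pub fn tokenize(text: &str, stopwords: &[&str]) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !stopwords.contains(&t.as_str()))
        .collect()
}

/// Inverse document frequency with +1 smoothing so it is never negative.
pub fn bm25_idf(doc_count: usize, doc_freq: usize) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// BM25 contribution of one query term to one document.
/// `tf` is the term frequency in the document; `doc_len / avg_doc_len`
/// drives the length normalisation controlled by `b`.
pub fn bm25_term_score(tf: f64, doc_len: f64, avg_doc_len: f64, idf: f64, k1: f64, b: f64) -> f64 {
    let norm = 1.0 - b + b * (doc_len / avg_doc_len);
    idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)
}
```

A document's score for a query is the sum of `bm25_term_score` over the query terms it contains.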
ColumnStore
- Columnar storage for typed metadata
- String interning for efficient comparisons
- RoaringBitmap for fast set operations
6. Distance Layer (SIMD)
| Metric | Implementation | Latency (768D) |
|---|---|---|
| Dot Product | AVX2 FMA | 21.7 ns |
| Euclidean | AVX2 FMA | 26.0 ns |
| Cosine | AVX2 4-acc, single-sqrt finish | 33.1 ns |
| Hamming | AVX2 FP-domain 4-acc | 35.8 ns |
| Jaccard | AVX-512 4-acc | 35.1 ns |
Per-metric numbers above are the contract values in `docs/reference/promise-contract.json`. Raw micro-benchmark snapshots (March 27, 2026 run on a specific machine) live in `SIMD_PERFORMANCE.md` and may differ by ~10% due to methodology / cache state.
SIMD Strategy:
- Native (x86_64): AVX2/AVX-512 via `core::arch` intrinsics with 4-accumulator ILP
- Native (aarch64): NEON 128-bit with 1-acc/4-acc variants
- WASM: scalar fallback (SIMD128 kernels planned; `wasm32` dispatches to `SimdLevel::Scalar`)
- Fallback: Scalar with loop unrolling
7. Storage Layer
Vector Data
- Memory-mapped files for large datasets
- Contiguous f32 buffer for cache locality
- Lazy loading support
Payload Storage
- JSON-based payload storage
- Nested field access with dot notation
- Type-aware indexing
WAL (Write-Ahead Log)
- Durability guarantees
- Automatic recovery on restart
- Configurable sync policy
Binary Export (WASM)
┌────────┬─────────┬───────────┬────────┬─────────┬─────────────────────┐
│ "VELS" │ Version │ Dimension │ Metric │ Count   │ Vectors             │
│ 4 bytes│ 1 byte  │ 4 bytes   │ 1 byte │ 8 bytes │ (id + data) × count │
└────────┴─────────┴───────────┴────────┴─────────┴─────────────────────┘
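The 18-byte header implied by the layout above (4 + 1 + 4 + 1 + 8) can be read and written as in the following sketch. Several details are assumptions because the layout does not state them: little-endian integers, a `u64` id per record, and `f32` vector data. The `VelsHeader` type and the function names are hypothetical, and the numeric metric code is kept opaque since its mapping to metric names is not given here.

```rust
pub const MAGIC: &[u8; 4] = b"VELS";
pub const HEADER_LEN: usize = 4 + 1 + 4 + 1 + 8; // 18 bytes

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VelsHeader {
    pub version: u8,
    pub dimension: u32,
    pub metric: u8, // opaque metric code
    pub count: u64,
}

/// Serialises the header. ASSUMPTION: multi-byte fields are little-endian.
pub fn encode_header(h: &VelsHeader) -> [u8; HEADER_LEN] {
    let mut out = [0u8; HEADER_LEN];
    out[0..4].copy_from_slice(MAGIC);
    out[4] = h.version;
    out[5..9].copy_from_slice(&h.dimension.to_le_bytes());
    out[9] = h.metric;
    out[10..18].copy_from_slice(&h.count.to_le_bytes());
    out
}

/// Parses and validates the header at the start of `bytes`.
pub fn decode_header(bytes: &[u8]) -> Result<VelsHeader, String> {
    if bytes.len() < HEADER_LEN {
        return Err(format!("need {} header bytes, got {}", HEADER_LEN, bytes.len()));
    }
    if &bytes[0..4] != MAGIC.as_slice() {
        return Err("bad magic, expected \"VELS\"".to_string());
    }
    let mut dim = [0u8; 4];
    dim.copy_from_slice(&bytes[5..9]);
    let mut count = [0u8; 8];
    count.copy_from_slice(&bytes[10..18]);
    Ok(VelsHeader {
        version: bytes[4],
        dimension: u32::from_le_bytes(dim),
        metric: bytes[9],
        count: u64::from_le_bytes(count),
    })
}

/// Expected file length. ASSUMPTION: each record is a u64 id followed by
/// `dimension` f32 values.
pub fn expected_len(h: &VelsHeader) -> u64 {
    let record = 8 + 4 * h.dimension as u64;
    HEADER_LEN as u64 + record * h.count
}
```

Checking `expected_len` against the actual buffer length before reading records is a cheap way to reject truncated exports.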
Data Flow
Vector Search Flow
Query Vector
│
▼
┌─────────────────┐
│ VelesQL Parse │ (optional)
└────────┬────────┘
│
▼
┌─────────────────┐
│ Filter Engine │ (if filters present:
│ (secondary idx +│ secondary indexes + JSON
│ JSON filters) │ payload filters)
└────────┬────────┘
│
▼
┌─────────────────┐
│ HNSW Search │
│ (entry → L0) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ SIMD Distance │
│ Calculations │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Top-K Results │
│ (min-heap) │
└────────┬────────┘
│
▼
Sorted Results
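The final "Top-K (min-heap)" step of the flow can be sketched with the standard-library `BinaryHeap`. This assumes higher scores are better (a similarity); for a distance, the comparison would be inverted. `Hit` and `top_k` are hypothetical names. The heap holds at most `k` entries and its root is the weakest kept candidate, so each incoming hit costs O(log k).

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// A scored candidate. Ordering uses `f32::total_cmp`, giving a total order
/// even in the presence of NaN; ties are broken by id.
#[derive(Debug, Clone, Copy)]
pub struct Hit {
    pub id: u64,
    pub score: f32, // higher = more similar
}

impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.total_cmp(&other.score).then(self.id.cmp(&other.id))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Keeps the `k` highest-scoring hits and returns them best-first.
pub fn top_k(hits: impl IntoIterator<Item = Hit>, k: usize) -> Vec<Hit> {
    // `Reverse` turns the std max-heap into a min-heap on score.
    let mut heap: BinaryHeap<Reverse<Hit>> = BinaryHeap::new();
    for hit in hits {
        heap.push(Reverse(hit));
        if heap.len() > k {
            heap.pop(); // evicts the current lowest score
        }
    }
    let mut out: Vec<Hit> = heap.into_iter().map(|Reverse(h)| h).collect();
    out.sort_by(|a, b| b.cmp(a));
    out
}
```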
Hybrid Search Flow
Query Vector + Text Query
│
┌────┴────┐
▼ ▼
┌───────┐ ┌───────┐
│ HNSW │ │ BM25 │
│Search │ │Search │
└───┬───┘ └───┬───┘
│ │
▼ ▼
┌─────────────────┐
│ RRF Fusion │
│ (Reciprocal │
│ Rank Fusion) │
└────────┬────────┘
│
▼
Merged Results
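Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the result lists in which it appears, with 1-based ranks. The sketch below is a generic version of that formula, not VelesDB's fusion code; `rrf_fuse` is a hypothetical name, and `k = 60` in the usage note matches the value used in the VelesQL example later in this document.

```rust
use std::collections::HashMap;

/// Fuses ranked id lists (best first) with Reciprocal Rank Fusion.
/// Returns (id, fused score) pairs sorted by descending score, ties by id.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, &doc) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

For example, fusing an HNSW ranking `[1, 2, 3]` with a BM25 ranking `[3, 1, 4]` at `k = 60` orders the ids as 1, 3, 2, 4: documents found by both searches outrank those found by only one. Because only ranks are used, the two score scales never need to be normalised against each other.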
VelesQL v2.0 Query Flow
┌─────────────────────────────────────────────────────────────────┐
│ VelesQL v2.0 Parser │
├─────────────────────────────────────────────────────────────────┤
│ SQL Query │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Pest Grammar │ compound_query → select_stmt [set_op] │
│ └────────┬───────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ AST │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ Query { │ │
│ │ select: SelectStatement { │ │
│ │ columns, from, joins[], where_clause, │ │
│ │ group_by, having, order_by, limit, offset, │ │
│ │ with_clause, fusion_clause │ │
│ │ }, │ │
│ │ compound: Option<CompoundQuery> { │ │
│ │ operator: UNION|INTERSECT|EXCEPT, │ │
│ │ right: SelectStatement │ │
│ │ } │ │
│ │ } │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Execution Engine │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ 1. Filter Pushdown → ColumnStore │ │
│ │ 2. Vector Search → HNSW (if NEAR clause) │ │
│ │ 3. JOIN Execution → Cross-collection merge │ │
│ │ 4. Aggregation → GROUP BY + HAVING │ │
│ │ 5. Ordering → ORDER BY (columns, similarity) │ │
│ │ 6. Set Operations → UNION/INTERSECT/EXCEPT │ │
│ │ 7. Fusion → RRF/Weighted/Maximum (if USING FUSION) │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
VelesQL v2.0 Supported Syntax:
-- Aggregation with GROUP BY and HAVING
SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category
HAVING COUNT(*) > 5 AND AVG(price) > 50
-- ORDER BY with similarity function
SELECT * FROM docs
ORDER BY similarity(vector, $query) DESC
LIMIT 10
-- JOIN across collections
SELECT * FROM orders
JOIN customers AS c ON orders.customer_id = c.id
WHERE status = 'active'
-- Set operations
SELECT * FROM active_users UNION SELECT * FROM archived_users
-- Hybrid fusion search (USING FUSION is a trailing clause: after LIMIT)
SELECT * FROM documents
LIMIT 20 USING FUSION(strategy='rrf', k=60)
Performance Characteristics
Memory Usage
| Component | Per Vector (768D) |
|---|---|
| Vector Data (f32) | 3,072 bytes |
| Vector Data (f16) | 1,536 bytes |
| Vector Data (SQ8) | 768 bytes |
| HNSW Links | ~256 bytes |
| Payload (avg) | ~200 bytes |
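The table's figures follow from simple arithmetic: 768 dimensions at 4, 2, or 1 byte per component. The sketch below turns that into a capacity estimate. `Precision` and both function names are hypothetical, and the HNSW and payload overheads are the approximate values from the table, so the result is an order-of-magnitude guide rather than a measurement.

```rust
/// Hypothetical precision selector mirroring the rows of the table above.
#[derive(Clone, Copy, Debug)]
pub enum Precision {
    F32,
    F16,
    Sq8,
}

/// Bytes needed for one vector's raw data at the given precision.
pub fn vector_bytes(dim: u64, precision: Precision) -> u64 {
    match precision {
        Precision::F32 => dim * 4,
        Precision::F16 => dim * 2,
        Precision::Sq8 => dim,
    }
}

/// Rough total: vector data plus the approximate per-vector overheads.
pub fn estimate_total_bytes(count: u64, dim: u64, precision: Precision) -> u64 {
    const HNSW_LINK_BYTES: u64 = 256; // "~256 bytes" in the table
    const AVG_PAYLOAD_BYTES: u64 = 200; // "~200 bytes" in the table; workload-dependent
    count * (vector_bytes(dim, precision) + HNSW_LINK_BYTES + AVG_PAYLOAD_BYTES)
}
```

For 1,000,000 vectors at 768D this gives 3,528,000,000 bytes (about 3.5 GB) with f32 data and 1,224,000,000 bytes (about 1.2 GB) with SQ8.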
Throughput
| Operation | Throughput / latency |
|---|---|
| Insert | ~3.8K-6.4K vec/sec (768D) |
| Search k=10 (10K vectors, 768D, HNSW index-only) | ~55 µs |
| Search end-to-end p50 (10K/384D, WAL ON, recall ≥ 96%) | ~450 µs |
| Search (100K vectors) | < 5 ms |
| VelesQL Parse | 1.3M queries/sec |
| Export (WASM) | 4,479 MB/s |
| Import (WASM) | 2,943 MB/s |
Platform Support
| Platform | Status | SIMD | Performance |
|---|---|---|---|
| Linux x86_64 | ✅ Full | AVX2/AVX-512 | 100% |
| Windows x86_64 | ✅ Full | AVX2 | 100% |
| macOS x86_64 | ✅ Full | AVX2 | 100% |
| macOS ARM64 | ✅ Full | NEON | ~90% |
| WASM (Browser) | ✅ Full | Scalar (SIMD128 planned) | ~70% |
| WASM (Node.js) | ✅ Full | Scalar (SIMD128 planned) | ~70% |
| iOS (ARM64) | ✅ Full | NEON | ~90% |
| Android (ARM64) | ✅ Full | NEON | ~90% |
| Android (ARMv7) | ✅ Full | Fallback | ~70% |
ARM64 (Apple Silicon / Mobile) Note
On ARM64 platforms (macOS M1/M2/M3, iOS, Android), VelesDB uses native NEON SIMD
instructions for distance calculations via the simd_native module, with both
1-accumulator and 4-accumulator variants depending on vector size.
Impact:
- Distance calculations are ~10% slower than x86_64 with AVX2
- All other operations (indexing, storage, queries) are unaffected
- Overall search latency remains in the microsecond range
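The difference between the 1-accumulator and 4-accumulator variants can be shown in portable scalar Rust. This is only an illustration of the accumulator pattern: a real NEON kernel would use `core::arch::aarch64` intrinsics on 128-bit registers, and the names and the length threshold below are hypothetical. Four independent partial sums break the dependency chain between consecutive additions, which lets the CPU overlap them.

```rust
/// Single running sum: every addition waits for the previous one.
pub fn dot_1acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let mut sum = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        sum += x * y;
    }
    sum
}

/// Four independent partial sums over chunks of 4, plus a scalar tail.
pub fn dot_4acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let mut acc = [0.0f32; 4];
    let chunks_a = a.chunks_exact(4);
    let chunks_b = b.chunks_exact(4);
    let (tail_a, tail_b) = (chunks_a.remainder(), chunks_b.remainder());
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for lane in 0..4 {
            acc[lane] += ca[lane] * cb[lane];
        }
    }
    // Horizontal reduction of the partial sums, then the leftover elements.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (x, y) in tail_a.iter().zip(tail_b) {
        sum += x * y;
    }
    sum
}

/// Picks a variant by vector length. The threshold is an assumed value.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    const FOUR_ACC_MIN_LEN: usize = 16; // hypothetical cut-over point
    if a.len() >= FOUR_ACC_MIN_LEN {
        dot_4acc(a, b)
    } else {
        dot_1acc(a, b)
    }
}
```

Note that the two variants add terms in a different order, so on general floating-point inputs their results can differ in the last bits.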
Future Architecture
Planned Components
┌─────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED LAYER (v1.0+) │
├─────────────────────────────────────────────────────────────────────────┤
│ Coordinator │ Sharding │ Replication │ Consensus (Raft) │
└─────────────────────────────────────────────────────────────────────────┘
- GPU Acceleration: optional GPU kernels (wgpu-based) for large-scale workloads
v1.6.0 Architecture Improvements (Shipped)
The following architectural changes, originally identified in the January 2026 technical audit, have been implemented as of v1.6.0:
| Change | Before (v0.8.x) | After (v1.6.0) |
|---|---|---|
| Concurrency | Global RwLock<HashMap> | DashMap + 16-shard storage |
| Memory | Vec<f32> allocations per read | Zero-copy &[f32] from mmap |
| SIMD Dispatch | Per-call feature detection | OnceLock function pointer |
| Unsafe | 'static lifetime tricks | Safe self-referential via ouroboros |
See docs/internal/TECHNICAL_AUDIT_PLAN.md for the original audit plan.
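The "OnceLock function pointer" row can be illustrated as follows. This is a minimal sketch of the pattern rather than VelesDB's dispatch code: `DotFn`, `select_dot`, and `dot_avx2_stub` are hypothetical names, and the AVX2 path is a stub that delegates to the scalar kernel so that the example stays safe and portable. CPU feature detection runs once, on the first call, and every later call goes straight through the cached function pointer.

```rust
use std::sync::OnceLock;

type DotFn = fn(&[f32], &[f32]) -> f32;

/// Portable scalar kernel, used when no SIMD level is detected.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for an AVX2 kernel. A real one would use `core::arch::x86_64`
/// intrinsics behind `#[target_feature(enable = "avx2")]`.
#[allow(dead_code)]
fn dot_avx2_stub(a: &[f32], b: &[f32]) -> f32 {
    dot_scalar(a, b)
}

/// Runs feature detection and returns the best available kernel.
fn select_dot() -> DotFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return dot_avx2_stub;
        }
    }
    dot_scalar
}

/// Public entry point: detection happens at most once per process.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    static DOT: OnceLock<DotFn> = OnceLock::new();
    let kernel = *DOT.get_or_init(select_dot);
    kernel(a, b)
}
```

Compared with per-call detection, the hot path is reduced to one atomic load and an indirect call.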
Code-Truth Matrix (2026-02-26)
| Capability | Runtime module(s) | Notes |
|---|---|---|
| VelesQL parser + validation + planning | crates/velesdb-core/src/velesql/* | Query control plane and parse cache |
| Vector engine (5 metrics) | crates/velesdb-core/src/distance/*, simd_native/*, index/hnsw/* | Cosine, Euclidean, DotProduct, Hamming, Jaccard |
| Graph engine | crates/velesdb-core/src/collection/graph/* | Nodes/edges/traversal/property indexes |
| Multi-column filtering | crates/velesdb-core/src/column_store/* | Typed filters + bitmap paths |
| Hybrid execution / fusion | crates/velesdb-core/src/collection/search/query/* | Pushdown + fusion strategies |
| Storage + WAL/recovery | crates/velesdb-core/src/storage/* | mmap storage and recovery tests |
Governance links
- Operations runbook: `docs/reference/OPERATIONS_RUNBOOK.md`