VelesQL Language Specification
June 30, 2026 · View on GitHub
SQL-like query language for vector + graph + column-store search in VelesDB.
Version: 3.10.0 | Last Updated: 2026-06-30 (VelesDB v3.5.0)
Overview
VelesQL is a SQL-inspired query language designed for hybrid search workloads.
It combines familiar SQL syntax with vector-specific operations (NEAR,
SPARSE_NEAR, NEAR_FUSED), graph pattern matching (MATCH), full-text search
(MATCH BM25), and collection management (CREATE, DROP).
All keywords are case-insensitive: SELECT, select, and Select are
equivalent. Identifiers (collection names, column names) are case-sensitive.
Feature Support Status
| Feature | Status | Version |
|---|---|---|
| SELECT, FROM, WHERE | Stable | 1.0 |
| NEAR vector search | Stable | 1.0 |
| similarity() function | Stable | 1.3 |
| LIMIT, OFFSET | Stable | 1.0 |
| WITH clause | Stable | 1.0 |
| ORDER BY | Stable | 2.0 |
| GROUP BY, HAVING | Stable | 2.0 |
| JOIN (INNER ... ON) | Stable | 2.0 |
| JOIN (LEFT, RIGHT, FULL OUTER) | Stable (column-store, primary-key join only — see crates/velesdb-core/src/collection/search/query/join.rs:155-171) | 2.0 |
| JOIN USING | Stable (single-column only; multi-column USING (a, b, …) parses but is rejected at execution by validate_join_condition) | 2.0 |
| Set Operations (UNION, INTERSECT, EXCEPT) | Stable | 2.0 |
| USING FUSION | Stable | 2.0 |
| NOW() / INTERVAL temporal | Stable | 2.1 |
| MATCH graph traversal | Stable | 2.1 |
| SPARSE_NEAR sparse vector search | Stable | 2.2 |
| NEAR_FUSED multi-vector fusion | Stable | 2.2 |
| TRAIN QUANTIZER command | Stable | 2.2 |
| ORDER BY arithmetic scoring | Stable | 3.0 |
| LET score bindings | Stable | 3.2 |
| Agent Memory VelesQL queries | Stable | 3.2 |
| INSERT INTO statement | Stable | 3.2 |
| UPDATE statement | Stable | 3.2 |
| CREATE COLLECTION | Stable | 3.3 |
| DROP COLLECTION | Stable | 3.3 |
| INSERT EDGE graph mutation | Stable | 3.3 |
| DELETE FROM statement | Stable | 3.3 |
| DELETE EDGE graph mutation | Stable | 3.3 |
| DISTINCT modifier | Stable | 3.3 |
| ILIKE case-insensitive pattern | Stable | 3.3 |
| FROM / JOIN aliases | Stable (INNER JOIN) | 2.0 |
| Scalar subqueries | Executed and substituted as a literal; correlated ones rejected (V010, see below) | 3.2 |
SQL comments (--) | Stable | 1.0 |
| Identifier quoting (backtick, double-quote) | Stable | 1.3 |
| SHOW COLLECTIONS | Stable | 3.4 |
| DESCRIBE COLLECTION | Stable | 3.4 |
| EXPLAIN query plan | Stable | 3.4 |
| EXPLAIN ANALYZE execution stats | API-level (via explain_analyze_query() and /query/explain?analyze=true, not a parsed VelesQL statement) | 3.8 |
| CREATE INDEX / DROP INDEX | Stable | 3.5 |
| ANALYZE | Stable | 3.5 |
| TRUNCATE | Stable | 3.5 |
| ALTER COLLECTION | Stable | 3.5 |
| Multi-row INSERT | Stable | 3.5 |
| UPSERT INTO statement | Stable | 3.5 |
| WITH (quality='...') alias | Stable | 3.5 |
| SELECT EDGES graph query | Stable | 3.5 |
| INSERT NODE graph mutation | Stable | 3.5 |
| FLUSH / FLUSH FULL | Stable | 3.6 |
| CONTAINS array filter | Stable | 3.7 |
| GEO_DISTANCE / GEO_BBOX geospatial | Stable | 3.7 |
| GROUP BY MAX(score) / AVG(score) | Stable | 3.7 |
| FIRST(column) projection | Stable | 3.7 |
| CONTAINS_TEXT strict text filter | Stable | 3.8 |
Window functions (ROW_NUMBER, RANK, DENSE_RANK) with OVER, PARTITION BY, ORDER BY | Stable | 3.9 (VelesDB v1.13.0) |
CBO feedback calibration in EXPLAIN ANALYZE | Stable | 3.10 (VelesDB v1.15.0) |
| FUSE BY fusion clause | Planned | -- |
REST Contract Notes
/querysupports top-levelMATCH, but request body must includecollection./collections/{name}/matchis the collection-scoped graph endpoint.- Canonical payload contract:
docs/reference/VELESQL_CONTRACT.md. - Conformance test matrix:
docs/reference/VELESQL_CONFORMANCE_MATRIX.md. - Recommended developer syntax for mixed filters:
SELECT ... FROM <collection> WHERE ... AND MATCH (...). WhenFROMdeclares an alias, the MATCH anchor either reuses it or — when no pattern alias matches a declared alias — binds implicitly to the FROM rows (rule V011 — see Graph Match Predicate in WHERE).
Comments
VelesQL supports single-line comments with --. Everything after -- to the end
of the line is ignored by the parser.
-- This is a comment
SELECT * FROM docs LIMIT 10 -- inline comment after a statement
Block comments (/* ... */) are not supported.
SELECT Statement
The SELECT statement is the primary way to query data from collections.
Full Syntax
[LET <name> = <expr> ...]
SELECT [DISTINCT] <columns> [, <window_fn>() OVER (...) [AS <alias>] ...]
FROM <collection> [[AS] <alias>]
[JOIN <collection2> [[AS] <alias>] ON <condition> | USING (<column>)]
[WHERE <conditions>]
[GROUP BY <columns>]
[HAVING <aggregate_condition>]
[ORDER BY <expressions>]
[LIMIT <n>]
[OFFSET <n>]
[WITH (<options>)]
[USING FUSION (<strategy>)]
SELECT Clause
Select All Columns
SELECT * FROM documents
Select Specific Columns
SELECT id, score FROM documents
SELECT id, payload.title, payload.category FROM documents
Nested Payload Fields
Access nested JSON fields using dot notation at any depth:
SELECT payload.metadata.author FROM articles
SELECT payload.stats.views, payload.stats.likes FROM posts
Column Aliases
Rename output columns with AS:
SELECT id, payload.title AS title FROM docs
SELECT id AS doc_id, category AS cat FROM products
DISTINCT Modifier
Eliminate duplicate rows from the result set. DISTINCT applies to ALL selected columns -- two rows are considered duplicates only when every selected column has the same value.
-- Unique categories
SELECT DISTINCT category FROM products
-- Unique authors from nested payload
SELECT DISTINCT payload.author FROM docs
-- DISTINCT with WHERE filter
SELECT DISTINCT category FROM products WHERE price > 50
-- DISTINCT on multiple columns (unique combinations)
SELECT DISTINCT category, status FROM products
When combined with vector search, DISTINCT deduplicates based on payload values after the search results are returned:
SELECT DISTINCT payload.source FROM docs WHERE vector NEAR $v LIMIT 20
Qualified Wildcard
Select all columns from a specific alias in a JOIN:
SELECT ctx.* FROM docs AS ctx WHERE vector NEAR $v LIMIT 10
Aggregate Functions in SELECT
SELECT COUNT(*) FROM docs
SELECT category, COUNT(*), AVG(price), MAX(rating) FROM products GROUP BY category
SELECT SUM(quantity) AS total FROM orders
| Function | Description | Argument |
|---|---|---|
COUNT(*) | Count all rows | Wildcard |
COUNT(col) | Count non-null values in column | Column name |
SUM(col) | Sum of numeric values | Column name |
AVG(col) | Average of numeric values | Column name |
MIN(col) | Minimum value | Column name |
MAX(col) | Maximum value | Column name |
MAX(score) | Max similarity score across group (v3.7+) | score pseudo-column |
AVG(score) | Mean similarity score across group (v3.7+) | score pseudo-column |
FIRST(col) | Value from highest-scoring row in group (v3.7+) | Column name |
Similarity Score in SELECT
Use the zero-argument similarity() to project the pre-computed search score:
SELECT similarity() FROM docs WHERE vector NEAR $v LIMIT 10
SELECT similarity() AS score FROM docs WHERE vector NEAR $v LIMIT 10
SELECT id, payload.title, similarity() AS relevance
FROM docs WHERE vector NEAR $v ORDER BY similarity() DESC LIMIT 5
FROM Clause
Specify the source collection. An optional alias enables shorter column references
and is required for self-joins. The AS keyword is optional: FROM documents d
and FROM documents AS d are strictly equivalent. A bare alias may not be a
reserved clause keyword (WHERE, GROUP, HAVING, ORDER, LIMIT, OFFSET,
WITH, USING, UNION, INTERSECT, EXCEPT, INNER, LEFT, RIGHT,
FULL, OUTER, JOIN, ON, AS); quote it (`limit`) to use one anyway.
-- Simple
SELECT * FROM my_collection
-- With alias
SELECT * FROM documents AS d
-- Bare alias (no AS) — identical semantics
SELECT * FROM documents d
-- Alias used in column references
SELECT d.title, d.category FROM documents AS d WHERE d.price > 50
JOIN Clause (v2.0+)
Combine data from multiple collections.
Syntax
SELECT <columns>
FROM <table1> [[AS] <alias1>]
[INNER | LEFT | RIGHT | FULL] JOIN <table2> [[AS] <alias2>]
ON <alias1>.<col> = <alias2>.<col>
Join Types
| Type | Keyword | Description | Runtime |
|---|---|---|---|
| Inner | JOIN or INNER JOIN | Only matching rows from both sides | Fully executed |
| Left | LEFT JOIN or LEFT OUTER JOIN | All left rows; unmatched right side gets NULLs for joined columns | Fully executed (column-store, primary-key only — see collection/search/query/join.rs:164-171) |
| Right | RIGHT JOIN or RIGHT OUTER JOIN | All right rows; unmatched left side gets NULLs | Fully executed (join.rs:155-162) |
| Full | FULL JOIN or FULL OUTER JOIN | All rows from both sides with NULLs on the unmatched side | Fully executed (join.rs:155-171) |
Constraint: all four JOIN types require the join column to be the right-side collection's column-store primary key. The executor rejects
JOIN ... ON other_col = ...at runtime with"JOIN on table 'T' requires primary key 'PK', got 'other_col'"(seevalidate_join_conditioninjoin.rs:177-201).
ON Condition
Specify the join equality condition:
SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
USING Clause
Shorthand when both tables share the same column name:
SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c USING (customer_id)
Limitation:
USINGcurrently supports a single column only. Multi-columnUSING (a, b, …)parses successfully but is rejected at execution byvalidate_join_condition(collection/search/query/join.rs:177) with the error"JOIN on table 'T' must use ON condition or USING(single_column)".
Self-Join
Join a collection with itself using aliases:
SELECT e.name AS employee, m.name AS manager
FROM employees AS e
JOIN employees AS m ON e.manager_id = m.id
Multiple Joins
Chain multiple JOINs in a single query:
SELECT e.name, m.name AS manager, d.name AS director
FROM employees AS e
JOIN employees AS m ON e.manager_id = m.id
JOIN employees AS d ON m.manager_id = d.id
JOIN with Vector Search
SELECT p.title, c.name AS category_name
FROM products AS p
JOIN categories AS c ON p.category_id = c.id
WHERE similarity(p.embedding, $query) > 0.7
LIMIT 20
JOIN Optimizations (v1.12+)
The query engine automatically applies two optimizations to cross-collection JOINs:
Filter Pushdown: WHERE conditions that reference the joined table (using qualified column names like inventory.price > 100) are automatically pushed down and evaluated before loading points into the ColumnStore. This reduces memory usage and I/O for large joined collections.
-- inventory.stock > 0 is pushed down to filter inventory before JOIN
SELECT * FROM products
JOIN inventory ON products.id = inventory.id
WHERE inventory.stock > 0
Lookup Join: When the JOIN condition references the primary key (id) on both sides and no pushdown filters apply, the engine uses direct collection.get() lookups instead of building a full ColumnStore, achieving O(K) instead of O(N) retrieval.
-- Uses optimized lookup path (both sides reference id)
SELECT * FROM products
JOIN inventory ON products.id = inventory.id
Both optimizations are transparent and produce identical results to the non-optimized path.
WHERE Clause
The WHERE clause filters rows using conditions. Multiple conditions are combined
with AND, OR, and NOT.
Comparison Operators
| Operator | Description | Example |
|---|---|---|
= | Equal | category = 'tech' |
!= or <> | Not equal | status != 'deleted' |
> | Greater than | price > 100 |
>= | Greater or equal | score >= 0.8 |
< | Less than | count < 50 |
<= | Less or equal | rating <= 5 |
SELECT * FROM products WHERE price > 100 AND category = 'electronics'
SELECT * FROM users WHERE age >= 18 AND age <= 65
SELECT * FROM docs WHERE status <> 'deleted'
Logical Operators (AND, OR, NOT)
| Priority | Operator | Description |
|---|---|---|
| 1 (highest) | NOT | Negation |
| 2 | AND | Conjunction |
| 3 (lowest) | OR | Disjunction |
-- AND
SELECT * FROM docs WHERE category = 'tech' AND price > 50
-- OR
SELECT * FROM docs WHERE category = 'tech' OR category = 'science'
-- Parentheses to override precedence
SELECT * FROM docs WHERE (category = 'tech' OR category = 'ai') AND price > 100
NOT Expression
The NOT operator negates any condition, including comparisons, IN, BETWEEN,
and parenthesized groups.
-- Negate a comparison
SELECT * FROM docs WHERE NOT (price > 100)
-- Negate an IN list
SELECT * FROM docs WHERE NOT (category IN ('draft', 'archived'))
-- Negate a compound condition
SELECT * FROM docs WHERE NOT (status = 'deleted' AND archived = true)
-- NOT with vector search filter
SELECT * FROM docs
WHERE vector NEAR $v AND NOT (category = 'spam')
LIMIT 10
IN / NOT IN
Test membership in a list of values:
-- String values
SELECT * FROM docs WHERE category IN ('tech', 'science', 'ai')
-- Integer values
SELECT * FROM docs WHERE id IN (1, 2, 3, 4, 5)
-- NOT IN
SELECT * FROM docs WHERE category NOT IN ('draft', 'deleted')
SELECT * FROM docs WHERE id NOT IN (1, 2, 3)
Index acceleration: When a secondary index exists on the filtered column,
IN uses bitmap pre-filtering to restrict HNSW traversal to matching points
only. The engine builds a RoaringBitmap by unioning per-value B-tree lookups
(time complexity: O(N × log K) where N = list size, K = index cardinality).
NOT IN on indexed fields computes the universe bitmap minus the IN bitmap
(set subtraction). Queries on non-indexed fields fall back to post-filtering
transparently. Results are identical in both paths.
BETWEEN
Test if a value falls within a range (inclusive):
SELECT * FROM docs WHERE price BETWEEN 10 AND 100
SELECT * FROM docs WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
SELECT * FROM products WHERE rating BETWEEN 3.5 AND 5.0
LIKE (Pattern Matching)
Case-sensitive pattern matching with wildcards:
| Wildcard | Meaning |
|---|---|
% | Any sequence of characters (including empty) |
_ | Exactly one character |
-- Suffix match
SELECT * FROM docs WHERE title LIKE '%database%'
-- Prefix match
SELECT * FROM docs WHERE name LIKE 'vec%'
-- Single character wildcard
SELECT * FROM docs WHERE code LIKE 'v_les'
-- Combined with other filters
SELECT * FROM docs WHERE name LIKE 'John%' AND active = true
ILIKE (Case-Insensitive Pattern Matching)
ILIKE works identically to LIKE but performs case-insensitive matching.
The same % and _ wildcards apply.
-- Matches 'Database', 'DATABASE', 'database', etc.
SELECT * FROM docs WHERE title ILIKE '%database%'
-- Case-insensitive prefix match
SELECT * FROM docs WHERE name ILIKE 'vec%'
-- Single-character wildcard (case-insensitive)
SELECT * FROM docs WHERE code ILIKE 'v_les'
-- Combined with other filters
SELECT * FROM docs
WHERE category ILIKE '%science%' AND price > 50
LIMIT 20
| Operator | Case-Sensitive | Example Match for 'hello' |
|---|---|---|
LIKE | Yes | LIKE 'hello' matches hello only |
ILIKE | No | ILIKE 'hello' matches Hello, HELLO, hello |
Unicode behavior: case folding follows Rust's default Unicode lowercase rules. Characters without a defined case mapping (e.g., CJK ideographs) are compared as-is.
IS NULL / IS NOT NULL
Test for null values:
SELECT * FROM docs WHERE category IS NULL
SELECT * FROM docs WHERE category IS NOT NULL
SELECT * FROM users WHERE email IS NOT NULL AND verified = true
CONTAINS (Array Filter, v3.7+)
Test whether an array field contains specific values. Three modes are supported:
| Syntax | Description |
|---|---|
column CONTAINS value | Array contains the single value |
column CONTAINS ANY (v1, v2, ...) | Array contains at least one of the values |
column CONTAINS ALL (v1, v2, ...) | Array contains every listed value |
-- Single value containment
SELECT * FROM hotels WHERE amenities CONTAINS 'pool' LIMIT 10
-- At least one match (OR semantics)
SELECT * FROM hotels WHERE amenities CONTAINS ANY ('spa', 'bar') LIMIT 10
-- All values must be present (AND semantics)
SELECT * FROM hotels WHERE amenities CONTAINS ALL ('pool', 'gym') LIMIT 10
-- Combined with scalar filters
SELECT * FROM hotels
WHERE amenities CONTAINS 'pool' AND rating > 4.5
LIMIT 10
-- Combined with vector search
SELECT * FROM hotels
WHERE amenities CONTAINS 'spa' OR vector NEAR $v
LIMIT 10
-- Integer arrays
SELECT * FROM items WHERE scores CONTAINS 10 LIMIT 10
-- NOT negation
SELECT * FROM hotels WHERE NOT (amenities CONTAINS 'pool') LIMIT 10
-- Multiple array fields
SELECT * FROM hotels
WHERE amenities CONTAINS 'gym' AND tags CONTAINS 'luxury'
LIMIT 10
Behavior on edge cases:
- Empty array (
[]): CONTAINS returns no match - Null/missing field: row excluded from results
- Non-array field: CONTAINS returns no match (use
LIKEfor substring matching) - Non-existent field: returns empty result set
Full-Text Search (MATCH)
The MATCH operator performs BM25 full-text search against the collection's
text index.
MATCH alone -- returns BM25 text search results ranked by text relevance:
SELECT * FROM docs WHERE content MATCH 'vector database' LIMIT 10
MATCH + NEAR (hybrid search) -- triggers Reciprocal Rank Fusion (RRF)
between vector similarity and BM25 text relevance. In this mode, MATCH acts
as a score boost, not a strict filter. Results that do not contain the
keyword may still appear with lower fused scores:
-- Hybrid: vector similarity fused with BM25 text relevance via RRF
SELECT * FROM docs WHERE vector NEAR $v AND content MATCH 'database' LIMIT 10
Multi-condition hybrid search:
SELECT * FROM docs
WHERE vector NEAR $v AND title MATCH 'vector database' AND category = 'tech'
LIMIT 10
Known Limitations (v1.9.0):
- Column parameter ignored: The column name (e.g.,
content) is parsed but the execution engine searches all indexed text fields regardless.
Tip: For strict text filtering (exclude results that do not contain a keyword), use
CONTAINS_TEXTinstead ofMATCH. See the CONTAINS_TEXT section below.
Strict Text Filter (CONTAINS_TEXT, v3.8+)
The CONTAINS_TEXT operator performs case-sensitive substring matching on a
string payload field. Unlike MATCH (which boosts via RRF), CONTAINS_TEXT
is a strict filter: any result whose target field does not contain the
specified substring is excluded from the result set.
column CONTAINS_TEXT 'substring'
The keyword is case-insensitive (contains_text, Contains_Text, CONTAINS_TEXT
are all valid). The substring match itself is case-sensitive.
Standalone metadata filter — returns all documents where the field contains the substring:
SELECT * FROM docs WHERE content CONTAINS_TEXT 'rust' LIMIT 10
With vector search (hybrid strict filter) — applies as a post-filter on vector search results, guaranteeing every returned result contains the substring:
-- Every result is guaranteed to contain 'database' in the content field
SELECT * FROM docs
WHERE vector NEAR $v AND content CONTAINS_TEXT 'database'
LIMIT 10
Combined with MATCH (boost + strict filter) — MATCH provides RRF score
boosting while CONTAINS_TEXT enforces strict inclusion:
-- MATCH boosts results mentioning 'database' via RRF
-- CONTAINS_TEXT guarantees every result actually contains 'database'
SELECT * FROM docs
WHERE vector NEAR $v
AND content MATCH 'database'
AND content CONTAINS_TEXT 'database'
LIMIT 10
Combined with other filters:
SELECT * FROM docs
WHERE content CONTAINS_TEXT 'rust' AND category = 'tech'
LIMIT 10
MATCH vs CONTAINS_TEXT
| Feature | MATCH | CONTAINS_TEXT |
|---|---|---|
| Purpose | BM25 text search / RRF boost | Strict substring filter |
| With NEAR | Boosts score via RRF fusion | Excludes non-matching results |
| Non-matching results | May still appear (lower score) | Always excluded |
| Matching | BM25 tokenized relevance | Case-sensitive str::contains() |
| Best for | "Rank by text relevance" | "Must contain this text" |
Edge Cases
| Scenario | Behavior |
|---|---|
CONTAINS_TEXT '' (empty string) | Matches all string fields (every string contains "") |
| Missing payload field | Returns false (row excluded) |
| Non-string field (e.g., integer) | Returns false (row excluded) |
| All results filtered out | Empty result set, no error |
| Unicode text | Supported (case-sensitive byte-level matching) |
Vector Search (NEAR)
Use NEAR for approximate nearest neighbor search on dense vectors:
-- With parameter placeholder
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
-- With literal vector
SELECT * FROM docs WHERE vector NEAR [0.1, 0.2, 0.3, 0.4] LIMIT 5
-- With metadata filter
SELECT * FROM docs
WHERE vector NEAR $v AND category = 'tech' AND price > 50
LIMIT 10
-- With similarity score projection and ordering
SELECT id, title, similarity() AS score
FROM docs WHERE vector NEAR $query
ORDER BY similarity() DESC
LIMIT 5
-- With search quality tuning
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'accurate', ef_search = 256)
Sparse Vector Search (SPARSE_NEAR, v2.2+)
Use SPARSE_NEAR for sparse vector similarity search (SPLADE, BM42, or custom
sparse embeddings).
-- With parameter placeholder
SELECT * FROM docs WHERE vector SPARSE_NEAR $sv LIMIT 10
-- With inline sparse vector literal (dimension_index: weight)
SELECT * FROM docs
WHERE vector SPARSE_NEAR {0: 0.9, 14: 0.7, 256: 0.5, 1024: 0.3}
LIMIT 10
-- With named sparse index
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse USING 'bm42_index'
LIMIT 10
-- Combined dense + sparse (hybrid)
SELECT * FROM docs
WHERE vector NEAR $dense AND vector SPARSE_NEAR $sparse
LIMIT 10
Sparse vectors use {index: weight} format where index is a non-negative
integer (dimension) and weight is a float value. Scoring uses inner product
with MaxScore DAAT for efficient top-K retrieval.
Multi-Vector Fusion (NEAR_FUSED, v2.2+)
NEAR_FUSED combines multiple embedding vectors into a single similarity search
using a fusion strategy (multi-modal or ensemble search). It is executed by
routing to the engine's multi-vector fusion (multi_query_search): each query
vector runs its own search and the rankings are fused. USING FUSION 'rrf'
(default, k=60), 'average', and 'maximum' are honored; other strategies
fall back to RRF. An optional AND <metadata> predicate is applied as a
pre-fusion filter.
NEAR_FUSED must be the only vector predicate in the WHERE: combining it
with OR/NOT, or with another NEAR / similarity() / SPARSE_NEAR, is
rejected (it can only be AND-ed with a metadata filter).
-- Two-vector fusion with RRF
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_emb, $image_emb]
USING FUSION 'rrf' (k=60)
LIMIT 20
-- Three-vector ensemble
SELECT * FROM docs
WHERE vector NEAR_FUSED [$query_v1, $query_v2, $query_v3]
USING FUSION 'maximum'
LIMIT 10
-- Without explicit fusion (defaults to RRF)
SELECT * FROM docs
WHERE vector NEAR_FUSED [$v1, $v2]
LIMIT 5
The vector array [$v1, $v2, ...] accepts parameters ($name) or vector
literals ([0.1, 0.2, ...]).
| Strategy | Best For | Parameters |
|---|---|---|
rrf | General-purpose ensemble (default) | k (default: 60) |
average | Balanced score blending | (none) |
maximum | Conservative high-precision | (none) |
Only rrf, average, and maximum are honored. Any other strategy name
(e.g. rsf, weighted) is rejected at validation — per-branch dense/sparse
weights are ill-defined over N homogeneous query vectors.
Similarity Function (v1.3+)
The similarity() function enables threshold-based vector filtering -- filter
results by similarity score rather than just finding nearest neighbors.
Syntax
similarity(field, vector_expr) <operator> threshold
| Parameter | Description |
|---|---|
field | The vector field name (e.g., vector, embedding) |
vector_expr | A parameter ($v) or literal vector ([0.1, 0.2, ...]) |
operator | Comparison: >, >=, <, <=, = |
threshold | Similarity score (0.0 to 1.0 for cosine/dot) |
Examples
-- Find documents with similarity > 0.8
SELECT * FROM docs WHERE similarity(vector, $query) > 0.8
-- High precision filtering (>= 0.9)
SELECT * FROM docs WHERE similarity(embedding, $v) >= 0.9 LIMIT 10
-- Exclude very similar documents (deduplication)
SELECT * FROM docs WHERE similarity(vector, $ref) < 0.95
-- Combined with metadata filters
SELECT * FROM docs
WHERE similarity(vector, $q) > 0.7 AND category = 'technology'
LIMIT 20
NEAR vs similarity()
| Feature | NEAR | similarity() |
|---|---|---|
| Purpose | Find K nearest neighbors | Filter by score threshold |
| Returns | Top-K results | All matching results |
| Control | LIMIT N | Threshold value |
| Best for | "Find similar" | "Filter by quality" |
-- NEAR: "Give me 10 most similar docs"
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
-- similarity(): "Give me docs with similarity > 0.8"
SELECT * FROM docs WHERE similarity(vector, $v) > 0.8
Temporal Functions (v2.1+)
VelesQL supports temporal expressions for date/time filtering using NOW()
and INTERVAL.
NOW() Function
Returns the current Unix timestamp (seconds since epoch):
SELECT * FROM events WHERE timestamp > NOW()
SELECT * FROM items WHERE created_at > NOW()
INTERVAL Expression
Defines a time duration with the syntax INTERVAL '<magnitude> <unit>':
| Unit | Aliases | Seconds |
|---|---|---|
| seconds | s, sec, second | 1 |
| minutes | m, min, minute | 60 |
| hours | h, hour | 3,600 |
| days | d, day | 86,400 |
| weeks | w, week | 604,800 |
| months | month | ~2,592,000 (30 days) |
Temporal Arithmetic
Combine NOW() with INTERVAL using + or -:
-- Last 7 days
SELECT * FROM logs WHERE created_at > NOW() - INTERVAL '7 days'
-- Last hour
SELECT * FROM events WHERE timestamp > NOW() - INTERVAL '1 hour'
-- Next week (future events)
SELECT * FROM tasks WHERE due_date < NOW() + INTERVAL '7 days'
-- Shorthand units
SELECT * FROM metrics WHERE ts > NOW() - INTERVAL '30 min'
-- Filter items older than 30 days
SELECT * FROM logs WHERE created_at < NOW() - INTERVAL '30 days'
Common Temporal Patterns
| Use Case | Query |
|---|---|
| Last 24 hours | WHERE ts > NOW() - INTERVAL '24 hours' |
| This week | WHERE ts > NOW() - INTERVAL '7 days' |
| Last month | WHERE ts > NOW() - INTERVAL '1 month' |
| Recent activity | WHERE last_seen > NOW() - INTERVAL '5 minutes' |
Note:
NOW()returns a Unix timestamp in seconds (timezone-agnostic). Month intervals are approximated as 2,592,000 seconds (30 days).
Scalar Subqueries (v3.2+)
A scalar subquery in WHERE/HAVING (or an INSERT/UPDATE value) is
executed and substituted as a literal before the outer query runs
(EPIC-039). It compares against a value computed from another (or the same)
collection, and must return exactly one row and one column.
-- Executes: the subquery resolves to AVG(amount), then the outer filter runs.
SELECT * FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders)
A subquery is enclosed in parentheses and contains a full SELECT statement
(with optional WHERE, GROUP BY, HAVING, LIMIT). It runs through the core query
engine, so the same query executes identically on the REST /query endpoint and
the CLI REPL (both route through Database::execute_query).
Cardinality contract:
| Subquery result | Behavior |
|---|---|
| Exactly one row, one column | resolves to that scalar value |
| 0 rows | resolves to NULL (a comparison against NULL is never true, so the outer query returns no matching rows) |
| > 1 row | error VELES-010 (Query): returned N rows but must return at most one row |
> 1 column (e.g. SELECT *, multiple columns) | error VELES-010 (Query): must select exactly one column |
Supported forms:
- Aggregate subqueries —
(SELECT AVG(amount) FROM t),COUNT(*),MIN/MAX/SUM. - Single-column row projection —
(SELECT amount FROM t WHERE id = 1). WHERE,HAVING,BETWEEN,IN (...),CONTAINS (...)value positions.INSERT ... VALUES (..., (SELECT MAX(x) FROM t))andUPDATE ... SET c = (SELECT ...).UPDATE ... WHEREandDELETE ... WHEREpredicates — the subquery is resolved to a literal before the mutation runs, soUPDATE t SET flagged = true WHERE amount > (SELECT AVG(amount) FROM t)updates exactly the matching rows, andDELETE FROM t WHERE id = (SELECT … )deletes the resolved id (DELETE … WHERE idstill requires the resolved value to be an integer).- A subquery whose inner
WHEREfilters on a payload path — e.g.(SELECT AVG(price) FROM t WHERE meta.cat = 5)— is a plain payload filter on the subquery's own collection, not a correlation, so it executes normally. - Nesting up to 8 levels deep (a deeper nest errors with
VELES-010).
Not yet supported (rejected at validation with V010):
- Correlated subqueries — a subquery whose inner
WHEREreferences one of the outer query's tables/aliases (e.g. with the outer query... FROM orders ..., the inner(SELECT AVG(x) FROM s WHERE orders.id = 5)). A dotted reference is only correlated when its prefix names an outer table/alias; a payload path such asmeta.catis not. Rewrite without the outer reference.
Surface note. Scalar subqueries are resolved in
velesdb-core, so the REST/queryendpoint and the CLI REPL support them. WASM uses a separate executor that does not yet resolve subqueries and still rejects them.
Graph Match Predicate in WHERE
Embed a graph pattern inside a WHERE clause using the MATCH keyword:
SELECT * FROM docs
WHERE category = 'tech' AND MATCH (d:Doc)-[:REL]->(x)
LIMIT 10
This filters rows that participate in the specified graph pattern, combined with any other scalar conditions.
Anchor rule (V011) — explicit and implicit. The first node of the pattern (the anchor) must carry an alias. The pattern binds to the FROM rows:
- Explicitly — when the anchor alias is one of the aliases declared by
FROMor aJOIN, the pattern is anchored on the rows of that table. - Implicitly — when no alias in the pattern matches a declared alias,
the leftmost node binds to the FROM rows. Three guards apply:
- G1: if a pattern alias does match a declared alias, the anchor must be that alias (catches inverted pattern directions);
- G2: the implicit anchor alias must not appear in another
MATCHpredicate of the same WHERE clause — chain into a single pattern instead, e.g.MATCH (m)-[:R]->(f)-[:S]->(g); - G3: the anchor node must not carry a
@collectionoverride (it would resolve outside the FROM collection).
- When
FROMhas no alias, any anchor alias is accepted.
NOT MATCH applies the same rule uniformly (exact dual of the positive case).
Note. Implicit binding is not Cypher's existential MATCH semantics: the leftmost node always binds to the candidate row.
MATCH (ctx)-[:R]->(f)keeps the rows that have an outgoingRedge — it does not test whether the pattern exists anywhere in the graph.
Violations are rejected at validation time with error code V011
(MATCH predicate anchor must be an alias declared in FROM/JOIN):
-- OK: anchor 'd' matches the FROM alias (explicit)
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (d)-[:REL]->(x)
LIMIT 10
-- OK: no pattern alias matches 'd' — 'ctx' binds implicitly to the docs rows
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (ctx)-[:REL]->(x)
LIMIT 10
-- Rejected with V011 (G1): the FROM alias 'd' is in a non-anchor position;
-- anchor the pattern on it instead, e.g. MATCH (d)-[:REL]->(w)
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (w)-[:REL]->(d)
LIMIT 10
Execution note (GraphFirst anchored fetch). When every MATCH (...)
predicate is AND-required by the WHERE clause (not wrapped in OR/NOT),
the engine evaluates the graph patterns FIRST and fetches within their
anchor sets — retrieval is then exhaustive: a matching row is returned no
matter how it ranks globally. This covers vector NEAR (anchor sets up to
10 000 ids are scored exactly; larger ones go through the bitmap-filtered
HNSW path), metadata-only fetches (the anchors are hydrated directly),
sparse-only fetches (anchors feed the sparse index's per-id filter), and
NOT similarity() scans (restricted to the anchors).
Execution note (over-fetch window, residual shapes). Query shapes that
cannot use the anchored fetch keep the windowed execution: graph predicates
under OR/NOT, combinations with a similarity() threshold cascade, BM25
text MATCH fusion, or hybrid dense+sparse fusion. There, a ranked fetch
retrieves up to max(LIMIT, min(LIMIT × 10, 10 000)) candidates and keeps
those that satisfy the pattern (rows ranked beyond the window are not
returned — increase LIMIT to widen it); unranked shapes scan up to 100 000
candidates in storage order.
Vector Search with Filters
Combine vector search with any number of metadata filters:
SELECT * FROM docs
WHERE vector NEAR $v
AND category = 'tech'
AND price > 50
AND status IS NOT NULL
AND tags IN ('ai', 'ml')
LIMIT 10
GROUP BY Clause (v2.0+)
Group results by one or more columns and apply aggregate functions.
-- Count by category
SELECT category, COUNT(*) FROM products GROUP BY category
-- Multiple aggregates
SELECT category, COUNT(*), AVG(price), MAX(rating)
FROM products
GROUP BY category
-- Nested payload fields
SELECT payload.author, payload.metadata.language, COUNT(*)
FROM articles
GROUP BY payload.author, payload.metadata.language
-- With vector similarity threshold
SELECT category, COUNT(*)
FROM documents
WHERE similarity(embedding, $query) > 0.6
GROUP BY category
ORDER BY COUNT(*) DESC
LIMIT 10
Vector-Search GROUP BY (Parent-Document Retrieval, v3.7+)
When combined with a vector NEAR search, GROUP BY enables parent-document retrieval from chunked collections. This groups search results by a parent field and aggregates similarity scores across chunks.
Use MAX(score) for ColBERT-style MaxSim (max score across chunks),
AVG(score) for mean similarity, and FIRST(column) to surface the
excerpt from the highest-scoring chunk.
-- Parent-document retrieval with MAX_SIM scoring
SELECT parent_id, MAX(score) AS relevance, FIRST(text) AS excerpt
FROM chunks
WHERE vector NEAR $embedding
GROUP BY parent_id
ORDER BY relevance DESC
LIMIT 20
-- AVG_SIM scoring (mean across chunks)
SELECT document_id, AVG(score) AS avg_sim
FROM paragraphs
WHERE vector NEAR $query
GROUP BY document_id
ORDER BY avg_sim DESC
LIMIT 10
-- Combined with metadata filter
SELECT parent_id, MAX(score) AS relevance, FIRST(text) AS excerpt
FROM chunks
WHERE vector NEAR $v AND category = 'science'
GROUP BY parent_id
ORDER BY relevance DESC
LIMIT 10
| Function | Description | Requires |
|---|---|---|
MAX(score) | Maximum similarity score across chunks in group | NEAR + GROUP BY |
AVG(score) | Mean similarity score across chunks in group | NEAR + GROUP BY |
FIRST(col) | Value of col from the highest-scoring chunk | GROUP BY |
Behavior:
- Chunks missing the GROUP BY field are silently skipped
FIRST(col)returnsnullif the best chunk lacks the specified column- Multiple
FIRSTprojections all come from the same highest-scoring chunk FIRST(*)is not supported (parse error)MAX(score)/AVG(score)withoutNEARreturns an errorFIRST(col)withoutGROUP BYreturns an error
HAVING Clause (v2.0+)
Filter groups after aggregation. HAVING conditions compare aggregate functions
against values using AND/OR.
-- Single condition
SELECT category, COUNT(*) FROM docs
GROUP BY category
HAVING COUNT(*) > 10
-- Multiple conditions
SELECT category, COUNT(*) FROM items
GROUP BY category
HAVING COUNT(*) > 10 AND AVG(price) < 100
-- OR logic
SELECT region, SUM(amount) FROM sales
GROUP BY region
HAVING SUM(amount) > 1000 OR COUNT(*) > 50
-- Full pipeline: GROUP BY + HAVING + ORDER BY
SELECT category, COUNT(*) AS total, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY AVG(price) DESC
Window Functions (v1.13.0+)
Window functions compute a value for each row based on a "window" of related rows,
without collapsing the result set the way GROUP BY does. VelesDB supports
three ranking functions over a PARTITION BY / ORDER BY window.
Syntax
<function>() OVER ([PARTITION BY <expr> [, <expr> ...]] [ORDER BY <expr> [ASC|DESC] [, <expr> ...]])
Place the window call in the SELECT list, optionally with an alias.
Supported Functions
| Function | Description |
|---|---|
ROW_NUMBER() | Sequential position within the window (1, 2, 3, ...). Ties are broken by input order. |
RANK() | Rank with gaps after ties. Tied rows share a rank; the next distinct row skips ahead (e.g. 1, 2, 2, 4). |
DENSE_RANK() | Rank without gaps (e.g. 1, 2, 2, 3). |
All three functions take no arguments; the OVER (...) clause is mandatory.
Semantics
PARTITION BY(optional) groups rows into independent windows. Each partition restarts the row counter / rank at 1.ORDER BY(optional insideOVER) defines the ordering within each partition. Without anORDER BY, ranks are assigned in the order rows arrive at the evaluator — deterministic for a given input but not meaningful.- Stability: when two rows tie on the
ORDER BYkey, relative input order is preserved (stable ordering).
Evaluation Order
Window functions are evaluated after WHERE, GROUP BY, HAVING, and
DISTINCT, and before the outer ORDER BY / LIMIT / OFFSET.
Each window call pre-captures its inputs and emits one output value per
input row; the outer ORDER BY then sees those values as ordinary
projected columns.
Grammar: grammar.pest:window_item, grammar.pest:over_clause.
Evaluator: crates/velesdb-core/src/velesql/window_evaluator.rs.
Examples
-- Rank products by price within each category
SELECT
category,
name,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank
FROM products
Expected output (shape, not literal rows):
category name price rank
tech laptop-pro 1999 1
tech laptop-mid 999 2
tech laptop-entry 499 3
books encyclopedia 120 1
books paperback 12 2
-- DENSE_RANK on similarity scores inside each department
SELECT
payload.department AS dept,
payload.title,
similarity() AS score,
DENSE_RANK() OVER (PARTITION BY payload.department ORDER BY similarity() DESC) AS r
FROM docs
WHERE vector NEAR $query
LIMIT 100
-- Global row numbering, no partition
SELECT
id,
payload.title,
ROW_NUMBER() OVER (ORDER BY payload.created_at DESC) AS seq
FROM posts
LIMIT 50
Limitations
- Framing clauses (
ROWS BETWEEN ... AND ...,RANGE,GROUPS) are not supported. All three functions operate on the full partition. - Aggregate windows (
SUM() OVER,AVG() OVER,LAG,LEAD,NTILE,PERCENT_RANK,CUME_DIST,FIRST_VALUE,LAST_VALUE) are not yet supported. - A single query may contain multiple window calls, each with its own
OVER (...)clause.
ORDER BY Clause (v2.0+)
Sort results by one or more expressions, each with an optional direction.
Direction
| Keyword | Description |
|---|---|
ASC | Ascending (default) |
DESC | Descending |
Basic Sorting
SELECT * FROM docs ORDER BY created_at DESC
SELECT * FROM docs ORDER BY category ASC, price DESC
Order by Similarity
-- Highest similarity first
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() DESC
LIMIT 10
-- Two-argument similarity
SELECT * FROM docs
WHERE similarity(embedding, $query) > 0.5
ORDER BY similarity(embedding, $query) DESC
LIMIT 10
-- Multi-column with similarity
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() DESC, created_at DESC
LIMIT 20
CBO routing (v1.13.0+): When
ORDER BYcontainssimilarity(<col>, $q)(two-argument form) against a vector column with an HNSW index, the cost-based optimiser (CBO) automatically routes the query through the native HNSW path — even if the query does not contain an explicitNEARclause. The planner rewrites the sort into a top-k vector search and short-circuits when aLIMITis present. The legacy full-scan-then-sort path remains available and is chosen by the CBO when the top-k target is large relative to the collection size or when no HNSW index exists for the column. Seecrates/velesdb-core/src/velesql/planner/similarity_route.rsanddocs/guides/TUNING_GUIDE.mdfor the routing heuristics.
Order by Aggregate
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
ORDER BY COUNT(*) DESC
LIMIT 10
Arithmetic Scoring (v3.0+)
Combine multiple score components with arithmetic expressions in ORDER BY:
-- Weighted hybrid scoring: 70% vector + 30% text relevance
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'machine learning'
ORDER BY 0.7 * vector_score + 0.3 * bm25_score DESC
LIMIT 10
-- Average of two scores
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY (vector_score + graph_score) / 2 DESC
LIMIT 10
-- Boost similarity
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() * 2 + 1 ASC
LIMIT 10
-- Hybrid: vector + BM25 + graph
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'AI'
ORDER BY 0.5 * similarity() + 0.5 * bm25_score DESC
LIMIT 10
Arithmetic expressions support +, -, *, /, parentheses, numeric
literals, score variables, and similarity().
Score Variables
| Variable | Source | When Populated |
|---|---|---|
vector_score | HNSW dense search | NEAR clause present |
bm25_score | BM25 full-text search | MATCH text clause present |
sparse_score | Sparse vector search | SPARSE_NEAR clause present |
graph_score | Graph traversal | MATCH graph pattern present |
fused_score | After fusion pipeline | USING FUSION applied |
similarity() | Primary search score | Any search clause |
Component scores are populated independently by each search path. In hybrid
queries (NEAR + MATCH text), vector_score and bm25_score have different
values reflecting each component's individual contribution before fusion.
| Query Type | vector_score | bm25_score | sparse_score | graph_score |
|---|---|---|---|---|
| NEAR only | Populated | 0 | 0 | 0 |
| MATCH text only | 0 | Populated | 0 | 0 |
| NEAR + MATCH | Populated | Populated | 0 | 0 |
| SPARSE_NEAR | 0 | 0 | Populated | 0 |
| MATCH (graph) | 0 | 0 | 0 | Populated |
When a score variable is not populated, it defaults to 0.
similarity() always returns the primary search score regardless of type.
fused_score is populated after USING FUSION is applied.
LIMIT and OFFSET
-- Limit results
SELECT * FROM docs LIMIT 10
-- Pagination: skip first 20, return next 10
SELECT * FROM docs LIMIT 10 OFFSET 20
Default LIMIT. Unlike standard SQL, VelesQL applies LIMIT 10 by default
to every SELECT statement that has no explicit LIMIT clause — vector NEAR,
sparse, scalar-filter, and hybrid forms alike. VelesQL is ANN-first: a SELECT
is a top-k retrieval, so an unbounded default would defeat top-k vector search.
EXPLAIN surfaces the implicit limit as Limit: 10 (default).
Exceptions (no implicit LIMIT 10):
MATCH ... RETURNgraph queries return all matching rows, bounded only by the server-wide ceiling of 100 000 rows per query;- compound queries (
UNION/INTERSECT/EXCEPT) evaluate their operands up to the same 100 000-row ceiling; only an explicit outerLIMITcaps the merged result.
Always specify LIMIT explicitly — both to bound vector search and for
exhaustive retrieval beyond the default 10 rows.
LET Clause (Score Bindings, v3.2+)
Define named score bindings evaluated once and reusable in SELECT and ORDER BY. LET clauses appear before the SELECT statement.
Syntax
LET <name> = <arithmetic_expression>
Rules
- LET clauses appear before the SELECT statement.
- Each binding has a name and an arithmetic expression.
- Bindings can reference earlier bindings (forward references are invalid).
- Available in ORDER BY and SELECT projection.
- NOT available in WHERE (WHERE runs before scores exist).
- LET names take highest priority in variable resolution (overrides component scores).
- Case-insensitive keyword (
LET,let,Letall work).
Unsupported query shapes
LET bindings are evaluated in the SELECT finalization stage. Query shapes that take a dedicated early-return execution path bypass that stage, so the engine rejects (rather than silently dropping) a LET clause on any of these:
| Shape | Trigger | Error |
|---|---|---|
| Graph pattern | MATCH (...) RETURN ... | LET bindings are not supported with MATCH queries in this version |
| Sparse vector | WHERE vector SPARSE_NEAR ... | LET bindings are not supported with SPARSE_NEAR queries in this version |
| Multi-vector fusion | WHERE vector NEAR_FUSED ... | LET bindings are not supported with NEAR_FUSED queries in this version |
| Negated similarity | WHERE NOT similarity(...) ... | LET bindings are not supported with NOT similarity() queries in this version |
| OR / union | WHERE similarity(...) OR ... | LET bindings are not supported with OR/union queries in this version |
LET is supported on dense NEAR, text MATCH '...', hybrid NEAR + MATCH
text, and scalar-filter SELECTs.
Ordering by a built-in score on the excluded shapes
These shapes reject only the LET ... = ... clause, but ORDER BY still runs
in their own finalization. A bare built-in score variable in ORDER BY
(vector_score, bm25_score, sparse_score, graph_score, fused_score)
ranks by that component score, resolved from the result's component breakdown —
identical to the arithmetic form. (An absent component defaults to 0 on a
tagged result, per the score-variable rules above.)
-- Ranks by sparse score (bare and arithmetic forms are equivalent):
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse
ORDER BY sparse_score DESC
LIMIT 5
Expression Support
LET expressions support the same arithmetic as ORDER BY:
LET boost = 1.5
LET weighted = 0.7 * vector_score + 0.3 * bm25_score
LET final_score = weighted * boost
LET sim = similarity()
Available variables: vector_score, bm25_score, graph_score, sparse_score,
fused_score, similarity(), numeric literals, and earlier LET bindings.
Examples
-- RAG scoring: blend vector similarity with text relevance
-- (the evaluated binding 'relevance' is injected into each row's payload;
-- note that `SELECT *, expr` is not valid syntax — use `*` or an explicit list)
LET relevance = 0.7 * vector_score + 0.3 * bm25_score
SELECT *
FROM documents
WHERE vector NEAR $query AND content MATCH 'machine learning'
ORDER BY relevance DESC
LIMIT 10
-- Chained bindings: later bindings can reference earlier ones
LET base = 0.6 * vector_score + 0.4 * bm25_score
LET boosted = base * 1.5
SELECT * FROM docs
WHERE vector NEAR $query AND content MATCH 'AI'
ORDER BY boosted DESC
LIMIT 5
-- A LET clause on a MATCH graph query is REJECTED (see "Unsupported query
-- shapes" above). MATCH runs on a dedicated traversal path that bypasses
-- LET evaluation, so the engine returns an explicit error rather than
-- silently discarding the binding:
-- LET x = similarity() MATCH (a)-[r]->(b) RETURN a LIMIT 5
-- -> Error: LET bindings are not supported with MATCH queries in this version
-- Rank graph rows with RETURN ... ORDER BY instead.
-- Sparse vector queries also reject LET, but `ORDER BY` runs: rank by the
-- built-in sparse_score variable directly (the bare form resolves the component
-- score, just like the arithmetic form `ORDER BY sparse_score * 1.0 DESC`):
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse
ORDER BY sparse_score DESC
LIMIT 5
WITH Clause (Search Options)
Control search behavior with per-query configuration overrides.
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'accurate', ef_search = 512, timeout_ms = 5000)
Available Options
| Option | Type | Values | Description |
|---|---|---|---|
mode | string | fast, balanced, accurate, perfect, autotune | Search quality preset (maps to ef_search: 64/128/512/4096/auto) |
quality | string | same as mode | Alias for mode (v3.5+). If both are set, mode takes precedence. |
ef_search | integer | 16--4096 | HNSW ef_search parameter (overrides mode) |
timeout_ms | integer | >= 100 | Per-query timeout in milliseconds |
rerank | boolean | true/false | Two-stage SIMD reranking (retrieves 4x candidates, re-ranks with exact distance) |
quantization | string | f32, int8, dual, auto | Quantization mode for search |
oversampling | float | >= 1.0 | Oversampling ratio for dual-precision mode |
max_groups (alias group_limit) | integer | 1 .. 1,000,000 | GROUP BY group budget. Lowers the default (10,000); clamped down to the server ceiling of 1,000,000 — cannot raise it. See GROUP BY. |
Query guard-rails. Two hard limits protect the server from adversarial queries. (1) A query is rejected before parsing if its length exceeds the configured
max_query_lengthor its bracket/NOTnesting depth exceeds 64 (prevents a parser stack-overflow DoS). (2)WITH (max_groups = N)for GROUP BY is clamped to a server-side ceiling of 1,000,000 groups; the query can lower its budget but never raise the server memory ceiling.
Examples
-- Fast search for autocomplete
SELECT * FROM suggestions WHERE vector NEAR $v LIMIT 5 WITH (mode = 'fast')
-- High accuracy for production
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (mode = 'accurate')
-- Custom ef_search
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (ef_search = 512)
-- Combined options
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'balanced', ef_search = 256, rerank = true)
-- Dual quantization with oversampling
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (quantization = 'dual', oversampling = 2)
-- quality alias for mode (v3.5+)
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (quality = 'accurate')
USING FUSION -- Hybrid Search (v2.0+)
Combine multiple search strategies with result fusion.
Syntax
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'query'
USING FUSION(strategy = 'rrf', k = 60)
Fusion Strategies
| Strategy | Description | Parameters | Use Case |
|---|---|---|---|
rrf | Reciprocal Rank Fusion | k (default: 60) | Balanced ranking (default) |
weighted | Weighted combination | vector+text: vector_weight, graph_weight; dense+sparse: dense_weight, sparse_weight | Custom importance |
maximum | Take highest score | (none) | Best match wins |
rsf | Relative Score Fusion | dense_weight, sparse_weight (must sum to 1.0) | Dense + sparse blending |
USING FUSION requires at least two fusable branches (e.g.
vector NEAR
MATCH, orvector NEAR+vector SPARSE_NEAR) or a singleNEAR_FUSEDpredicate. Applied to a single-branch query it is rejected at validation with error codeV012(FusionMisconfigured). The same code covers RSF weights that do not sum to 1.0, negative weights, andweighted/rsfon aNEAR_FUSEDpredicate.dense_w/sparse_ware accepted as short aliases ofdense_weight/sparse_weight. Unknown strategy names and unknown option keys are rejected (no silent RRF fallback).
Examples
USING FUSION(...)is a trailing clause: it must come afterLIMIT(andOFFSET), not before it.
-- Default RRF fusion
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'neural networks'
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)
-- Weighted fusion (70% vector, 30% graph/text)
SELECT * FROM docs
WHERE vector NEAR $semantic AND content MATCH 'machine learning'
LIMIT 20 USING FUSION(strategy = 'weighted', vector_weight = 0.7, graph_weight = 0.3)
-- Dense + sparse hybrid with RSF
SELECT * FROM docs
WHERE vector NEAR $dense AND vector SPARSE_NEAR $sparse
LIMIT 10 USING FUSION(strategy = 'rsf', dense_weight = 0.7, sparse_weight = 0.3)
-- Maximum score fusion (requires two fusable branches; USING FUSION on a
-- single-branch query — e.g. similarity()-only or pure NEAR — is rejected)
SELECT * FROM docs
WHERE vector NEAR $q AND content MATCH 'transformers'
LIMIT 10 USING FUSION(strategy = 'maximum')
FUSE BY (Planned Syntax)
PLANNED:
FUSE BYis not yet implemented in the grammar. UseUSING FUSION(...)instead for all hybrid search queries.
Set Operations (v2.0+)
Combine results from multiple SELECT queries. N-ary chaining is supported.
Precedence note: VelesQL evaluates set operators strictly left-to-right, unlike standard SQL where
INTERSECTbinds tighter thanUNION. Use parenthesized subqueries if different evaluation order is needed.
UNION
Combine results, removing duplicates:
SELECT id, title FROM articles WHERE category = 'tech'
UNION
SELECT id, title FROM articles WHERE category = 'science'
UNION ALL
Combine results, keeping duplicates:
SELECT * FROM table1 WHERE similarity(v, $q) > 0.8
UNION ALL
SELECT * FROM table2 WHERE similarity(v, $q) > 0.8
INTERSECT
Only rows present in both queries:
SELECT item_id FROM user_likes WHERE user_id = 1
INTERSECT
SELECT item_id FROM user_likes WHERE user_id = 2
EXCEPT
Rows in first query but not in second:
SELECT id FROM all_items
EXCEPT
SELECT id FROM purchased_items
Three-Way Set Operations
-- Evaluated left-to-right: (A UNION B) UNION C
SELECT id FROM recent_docs WHERE created_at > 1000
UNION
SELECT id FROM popular_docs WHERE views > 100
UNION
SELECT id FROM featured_docs WHERE featured = TRUE
-- Mixed operators: (A UNION B) INTERSECT C
SELECT * FROM a UNION SELECT * FROM b INTERSECT SELECT * FROM c
MATCH Statement (Graph Queries, v2.1+)
Execute graph pattern matching queries with optional vector similarity filtering.
Syntax
MATCH <pattern>
[WHERE <conditions>]
RETURN <projection>
[ORDER BY <expression>]
[LIMIT <n>]
Pattern Syntax
Node Patterns
A node is enclosed in parentheses with optional alias, labels, and properties:
(alias) -- Any node
(alias:Label) -- Node with label
(alias:Label1:Label2) -- Node with multiple labels
(alias:Label {prop: value}) -- Node with property filter
(:Label) -- Anonymous node with label
() -- Anonymous node (any)
(alias:Label@collection) -- Node resolved from a specific collection (cross-collection)
Cross-Collection Annotation (@collection)
The @collection suffix on a node pattern specifies which collection the node's data
should be resolved from. This enables cross-collection MATCH queries where the graph
traversal runs on one collection (the primary/edge collection) and node payloads are
enriched from other collections.
-- Products in 'catalog' graph, enriched with pricing from 'inventory' collection
MATCH (p:Product)-[:STORED_IN]->(w:Warehouse@inventory)
RETURN p.name, w.price, w.stock
LIMIT 20
When using @collection:
- The graph traversal (edges) runs on the primary collection specified via
_collectionparam orFROMclause - Node payloads are looked up from the annotated collection after traversal
- Enriched fields are prefixed with the node alias (e.g.,
w.price,w.stock) - If the annotated collection doesn't exist, the enrichment is silently skipped
Relationship Patterns
Relationships connect nodes with direction and optional type/range:
-[r:TYPE]-> -- Outgoing, named
<-[r:TYPE]- -- Incoming, named
-[r:TYPE]- -- Undirected, named
-[:TYPE]-> -- Outgoing, anonymous
-[r:TYPE1|TYPE2]-> -- Multiple relationship types
-[*1..3]-> -- Variable-length (1 to 3 hops)
-[r:TYPE *2..5]-> -- Named with type and range
Range Specification
| Syntax | Meaning | Executable |
|---|---|---|
*1..3 | Between 1 and 3 hops | Yes |
*..5 | Up to 5 hops (lower bound defaults to 1) | Yes |
*3 | Exactly 3 hops | Yes |
*2.. | At least 2 hops, open-ended | Parses, rejected at validation |
* | Any number of hops | Parses, rejected at validation |
Hop bound. The maximum hop count of any range is capped by the validator at
DEFAULT_MAX_GRAPH_EXPANSION = 32 (configurable via
ValidationConfig::max_graph_expansion). Open-ended forms (*, *2..) map to
an unbounded upper limit and are therefore always rejected under the default
configuration with a Graph expansion exceeded: max=32 complexity error
(surfaced directly by Parser::parse) — use an explicit bounded range such as
*1..32 instead.
Variable-length alias semantics (openCypher lists). A relationship alias
on a ranged pattern (-[r:TYPE*1..3]->) binds the ordered list of
traversed relationships, not a single edge:
RETURN rprojects the ordered edge-id list (e.g.[100, 101]).RETURN r.propprojects the positional list of per-edge values (missing properties yieldnull), like openCypher's[rel IN r | rel.prop].WHERE r.prop = xuses ANY-element semantics: the path matches when at least one traversed edge satisfies the condition (openCypher'sany(rel IN r WHERE rel.prop = x)).- Distinct edge paths to the same target are distinct result rows, and
parallel edges between the same node pair yield one row per aliased edge.
Anonymous relationships (
-[:TYPE]->) keep the collapsed cardinality — bind an alias when parallel edges must be distinguished (a known divergence from openCypher, which always counts one row per relationship).
RETURN Clause
Project fields from matched nodes and relationships:
RETURN a.name, b.title -- Property access
RETURN a.name AS author -- With alias
RETURN * -- All bound variables
RETURN similarity() -- Pre-computed similarity score
RETURN a.name, similarity() AS score
Full Examples
-- Simple graph traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name, b.name
-- With similarity filter (RAG use case)
MATCH (doc:Document)
WHERE similarity(doc.embedding, $query) > 0.8
RETURN doc.title, doc.content
ORDER BY similarity() DESC
LIMIT 5
-- Multi-hop traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.department = 'engineering'
RETURN a.name, b.name, c.name
LIMIT 20
-- Variable-length paths
MATCH (start:Document)-[*1..3]->(end:Document)
WHERE start.topic = 'AI'
RETURN start.title, end.title
-- Bidirectional relationship
MATCH (a:Person)-[:COLLABORATES]-(b:Person)
RETURN a.name, b.name
-- Multiple relationship types
MATCH (p:Person)-[r:KNOWS|WORKS_WITH]->(q:Person)
RETURN p.name, q.name
LIMIT 10
-- Combined graph + vector + column (AIOps)
MATCH (incident:Ticket)-[:IMPACTS]->(service:Microservice)
WHERE similarity(incident.log_embedding, $error_vec) > 0.85
AND incident.status = 'RESOLVED'
AND service.criticality = 'HIGH'
RETURN incident.solution, service.name
ORDER BY similarity() DESC
LIMIT 3
Scope and Requirements
Collection Scope: MATCH patterns operate within the current collection's internal graph store. They do NOT traverse across different collections.
Label Requirement: Points must have a _labels array in their payload:
{ "_labels": ["Product"], "name": "Headphones", "price": 99.99 }
Edge Requirement: Edges must exist in the collection's edge store. Create
them via collection.add_edge() (Rust/Python) or the REST API.
Execution Strategies
The query planner automatically chooses the optimal strategy:
| Strategy | When Used | Description |
|---|---|---|
| GraphFirst | No similarity condition | Traverse graph, then filter |
| VectorFirst | Similarity on start node | Vector search, then validate graph |
| Parallel | Large collection, high threshold | Execute both in parallel |
REST API
POST /collections/{name}/match
Content-Type: application/json
{
"query": "MATCH (a:Person)-[:KNOWS]->(b) WHERE similarity(a.vec, $v) > 0.8 RETURN a.name",
"params": { "v": [0.1, 0.2, ...] }
}
Introspection Statements (v3.4+)
SHOW COLLECTIONS
Lists all collections in the database with their names and types.
-- List all collections
SHOW COLLECTIONS
-- With trailing semicolon
SHOW COLLECTIONS;
Returns one row per collection, each containing:
| Field | Type | Description |
|---|---|---|
name | string | Collection name |
type | string | Collection type: vector, graph, or metadata |
DESCRIBE COLLECTION
Returns metadata about a specific collection. The COLLECTION keyword is optional.
-- With COLLECTION keyword
DESCRIBE COLLECTION docs
-- Without COLLECTION keyword (equivalent)
DESCRIBE docs
Returns a single row with:
| Field | Type | Present When |
|---|---|---|
name | string | Always |
type | string | Always (vector, graph, or metadata) |
dimension | integer | Vector collections |
metric | string | Vector collections |
point_count | integer | All types |
Returns an error if the collection does not exist.
EXPLAIN
Returns the query execution plan for a SELECT query without executing it.
-- Explain a simple query
EXPLAIN SELECT * FROM docs LIMIT 10
-- Explain a complex query
EXPLAIN SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
Phase 2A enhancements (v1.13.0)
EXPLAIN output is now stat-aware: Filter nodes report a cardinality
estimate derived from column histograms (ANALYZE output) rather than a
hard-coded fallback, and join / sort cost factors are calibrated against
CollectionStats::calibrated_cost_factors. As a result, the estimated_cost
field returned in the plan tracks real execution time closely enough for
the CBO to decide between HNSW routing, bitmap pre-filter, and full scan
on the same query shape. Plan-cost consistency fixes (#607 / #608 /
#609) ensure a single query produces the same plan whether invoked via
the REST /query/explain endpoint, the CLI .explain command, or
EXPLAIN ANALYZE.
When ORDER BY similarity(<col>, $q) is present without NEAR, the plan
tree now explicitly surfaces the similarity-as-predicate routing decision
(see Order by Similarity above) — the VectorSearch node appears even
though the SQL text has no NEAR.
v3.2.0 — REST plan single-sourced from core
The REST /query/explain step list is now produced directly from the engine
plan via QueryPlan::to_plan_steps(), the same PlanNode tree the CLI
.explain renders — it is no longer reconstructed from the parsed AST. The
wire operation vocabulary is unchanged (additive): a full scan still reports
FullScan and joins still report {Type}Join. As a side effect, GROUP BY,
aggregation, ORDER BY, and JOIN steps now also appear in the CLI/Python
to_tree() output (previously only the REST view surfaced them), and an
indexed-equality predicate may surface an IndexLookup step.
Returns a single row with:
| Field | Type | Description |
|---|---|---|
plan | object | Structured query plan (JSON) |
tree | string | Human-readable plan tree |
The tree field now surfaces additional context when the query uses WITH, LET, or FUSION
clauses:
WITHoptions —ef_search,mode,rerank,timeout_msare read from theWITHclause and displayed as-is (no longer hardcoded to 100 foref_search).LETbindings — each named binding is listed under aLET bindings:node.FUSIONdetails — strategy name,k, and per-source weights appear under aFUSION:node.
Example enriched output:
── VectorSearch (docs) ef_search=512
├── WITH options:
│ ├── ef_search: 512
│ ├── mode: accurate
│ └── rerank: true
├── LET bindings:
│ └── boost = vector_score * 1.5
├── FUSION: rrf (k=60, vector_weight=0.7, bm25_weight=0.3)
├── Filter (PreFilter)
└── Limit 10
EXPLAIN ANALYZE (v3.8+)
Executes the query with lightweight instrumentation and returns both the
estimated plan and actual execution statistics side-by-side. Unlike EXPLAIN
(which only plans), EXPLAIN ANALYZE runs the query to collect real metrics.
Caution: EXPLAIN ANALYZE executes the query. Use with care on write queries (INSERT, UPDATE, DELETE) as they will modify data.
Invocation
EXPLAIN ANALYZE is not a parsed VelesQL statement — the grammar only
accepts EXPLAIN <select-query>. ANALYZE mode is requested at the API level:
- REST:
POST /query/explainwith"analyze": truein the request body. - Rust API:
Database::explain_analyze_query(...)/Collection::explain_analyze_query(...).
POST /query/explain
{
"query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
"params": { "v": [0.1, 0.2, 0.3] },
"analyze": true
}
Return Fields
The result includes the estimated plan (identical to EXPLAIN) plus:
| Field | Type | Description |
|---|---|---|
plan | object | Estimated query plan (same as EXPLAIN) |
actual_stats | object | Top-level actual execution statistics |
node_stats | array | Per-plan-node actual statistics |
cost_factors | object | null | Effective cost factors used for this plan (operation-by-operation cost weights). null when EXPLAIN is called without ANALYZE. |
calibration_source | string | null | Where the cost model picked its values: "histogram" if the collection's ANALYZE histogram was used, "default" otherwise. null without ANALYZE. |
feedback_calibration | object | null | Runtime CBO calibration from the EMA feedback loop (v1.15.0, issue #469 Phase 2). Present only once the loop has observed ≥ 10 vector queries on this collection. See subsection below. |
actual_stats fields:
| Field | Type | Description |
|---|---|---|
actual_rows | u64 | Number of rows returned by execution |
actual_time_ms | f64 | Wall-clock execution time in milliseconds |
loops | u64 | Number of execution iterations (always 1) |
nodes_visited | u64 | For MATCH queries, an approximate (best-effort, lower-bound) graph-traversal node count (start nodes examined + nodes reached by following edges); 0 for non-MATCH queries |
edges_traversed | u64 | For MATCH queries, an approximate (best-effort, lower-bound) count of edges followed during traversal; 0 for non-MATCH queries |
traversal_counters_approximate | bool | true when nodes_visited / edges_traversed are strategy-dependent approximations (a lower bound), not exact measured counts; false for non-graph queries where both counters are 0. |
Note:
nodes_visited/edges_traversedare an approximate, best-effort lower bound on the graph traversal across all MATCH strategies — not exact figures. For GraphFirst they are the start nodes examined plus the edges/nodes reached; for the similarity-anchored VectorFirst strategy (asimilarity()predicate on the start node) they are the candidate nodes evaluated plus the per-candidate existence-BFS edges/nodes — each BFS useslimit(1)and so undercounts the true frontier; the Parallel strategy sums both legs (a node touched by both is counted twice).
feedback_calibration fields (v1.15.0+, EXPLAIN ANALYZE only):
| Field | Type | Description |
|---|---|---|
ms_per_cost_unit | f64 | EMA-adjusted milliseconds per cost unit observed at runtime. Compares against the static default of 0.1 ms/unit. Values consistently below 0.1 indicate the cost model is over-estimating latency; above indicate under-estimation. |
sample_count | u64 | Number of vector query samples that produced this estimate. Low counts (< ~50) mean the EMA is still in warm-up; higher counts indicate a stable, deployment-specific calibration. |
The block is omitted from the JSON (via skip_serializing_if) when the
feedback loop has not yet warmed (< 10 samples for that collection) and on
plain EXPLAIN (no execution → no runtime samples).
See crates/velesdb-core/src/velesql/explain/types.rs:160-180 for the
canonical type definition and ExplainOutput::with_feedback_calibration
for how the planner attaches it.
node_stats entry fields:
| Field | Type | Description |
|---|---|---|
node_label | string | Plan node type (e.g. "VectorSearch", "Filter") |
actual_time_ms | f64 | Estimated (not measured) wall-clock time for this node, derived by distributing the plan-global actual_time_ms across nodes via fixed weight fractions. Heuristic until per-node instrumentation lands (#467). |
actual_rows_in | u64 | Estimated (heuristic) rows entering this node, not a measured count. |
actual_rows_out | u64 | Estimated (heuristic) rows leaving this node, not a measured count. |
loops | u64 | Loop iterations for this node |
estimated | bool | Always true until instrumented timing lands (#467): the actual_* row/time values above are heuristic estimates, not real per-node measurements. The actual_ prefix is kept for API stability. |
Filter plan fields (v3.9+):
When histogram data is available, Filter plan nodes include additional fields:
| Field | Type | Description |
|---|---|---|
| `estimated_rows$ | \text{u64}? | \text{Histogram}-\text{based} \text{cardinality} \text{estimate} (\text{selectivity} \times \text{total_rows}) |
| $estimation_method` | string? | "histogram", "cardinality", or "no histogram" |
Examples
Queries to analyze are passed in the query field of /query/explain with
"analyze": true (the EXPLAIN keyword itself must not be part of the string):
// Analyze a vector search query
{ "query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
"params": { "v": [0.1, 0.2] }, "analyze": true }
// Analyze a filtered query
{ "query": "SELECT * FROM products WHERE category = 'tech' AND price > 50 LIMIT 20",
"analyze": true }
After ≥ 10 vector queries against the same collection, the EMA feedback
loop (v1.15.0) attaches a feedback_calibration block:
{
"plan": { ... },
"actual_stats": {
"actual_rows": 10,
"actual_time_ms": 1.234,
"loops": 1,
"nodes_visited": 0,
"edges_traversed": 0
},
"node_stats": [ ... ],
"cost_factors": { ... },
"calibration_source": "histogram",
"feedback_calibration": {
"ms_per_cost_unit": 0.087,
"sample_count": 42
}
}
Read the block as: "On this deployment, every CBO cost unit has cost
0.087 ms in practice (vs the 0.1 ms default) across the last 42 vector
queries". A persistent gap between estimated and actual time (the ⚠
warning below) usually shows up here as ms_per_cost_unit drifting
significantly away from 0.1.
CLI Usage
The CLI REPL provides the .explain-analyze command:
velesdb> .explain-analyze SELECT * FROM docs WHERE vector NEAR [0.1, 0.2, 0.3] LIMIT 10
Query Plan:
└─ VectorSearch
├─ Collection: docs
├─ ef_search: 100
└─ Candidates: 10
Estimated cost: 0.050ms
Actual Statistics:
Actual rows: 10
Actual time: 1.234ms
Loops: 1
Nodes visited: 0
Edges traversed: 0
Per-Node Statistics:
VectorSearch: 1.173ms (rows: 10 → 10)
Limit: 0.012ms (rows: 10 → 10)
``$
\text{When} \text{the} \text{estimated} \text{cost} \text{diverges} \text{from} \text{actual} \text{time} \text{by} \text{more} \text{than} 10 \times , \text{a} $⚠`
warning marker is displayed to highlight potential cost model inaccuracies.
#### HTTP API
Send a POST to `/query/explain` with `"analyze": true` in the request body:
```json
{
"query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
"params": { "v": [0.1, 0.2, 0.3] },
"analyze": true
}
The response includes actual_stats and node_stats fields alongside the
plan. When analyze is false or absent, the endpoint behaves identically
to the existing EXPLAIN-only mode.
DDL Statements (v3.3+) and Admin Statements (v3.5+)
CREATE COLLECTION
Create vector, graph, or metadata-only collections.
Vector Collection
-- Minimal: dimension only (metric defaults to cosine)
CREATE COLLECTION docs (dimension = 768)
-- Explicit metric
CREATE COLLECTION docs (dimension = 768, metric = 'euclidean')
-- With storage quantization and HNSW parameters
CREATE COLLECTION docs (dimension = 768, metric = 'cosine')
WITH (storage = 'sq8', m = 16, ef_construction = 200)
-- Dot product metric
CREATE COLLECTION embeddings (dimension = 384, metric = 'dotproduct')
Graph Collection
-- Schemaless graph with embeddings
CREATE GRAPH COLLECTION knowledge (dimension = 768, metric = 'cosine') SCHEMALESS
-- Schemaless pure graph (no embeddings)
CREATE GRAPH COLLECTION pure_graph
-- Typed graph with schema definition
CREATE GRAPH COLLECTION ontology (dimension = 768, metric = 'cosine') WITH SCHEMA (
NODE Person (name: STRING, age: INTEGER),
NODE Document (title: STRING, year: INTEGER),
EDGE AUTHORED_BY FROM Person TO Document,
EDGE CITED_BY FROM Document TO Document
)
Metadata Collection
-- No vectors, metadata-only storage
CREATE METADATA COLLECTION tags
CREATE Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
dimension | integer | Yes (vector/graph) | -- | Embedding dimension |
metric | string | No | cosine | Distance metric: cosine, euclidean, dotproduct |
WITH Options
| Option | Type | Default | Description |
|---|---|---|---|
storage | string | full | Storage mode: full, sq8, binary |
m | integer | 16 | HNSW M parameter (max links per node) |
ef_construction | integer | 200 | HNSW build-time expansion factor |
Schema Type Names
| Type | Description |
|---|---|
STRING | Text value |
INTEGER | Integer value |
FLOAT | Floating-point value |
BOOLEAN | True/false value |
VECTOR | Vector field (special) |
DROP COLLECTION
-- Error if collection not found
DROP COLLECTION documents
-- No error if absent
DROP COLLECTION IF EXISTS documents
Index Management (v3.5+)
Create or drop secondary metadata indexes on collection payload fields. Secondary indexes use a BTree structure for O(log n) equality lookups on metadata fields, accelerating WHERE-clause filters.
CREATE INDEX
-- Create a secondary index on a metadata field
CREATE INDEX ON docs (category)
-- Case-insensitive keywords
create index on docs (status)
Index creation is idempotent -- creating an index that already exists is a no-op.
DROP INDEX
-- Remove a secondary metadata index
DROP INDEX ON docs (category)
Dropping a non-existent index succeeds silently (no error).
Notes
- Secondary indexes only apply to metadata payload fields (not vector data).
- Indexes accelerate equality filters in WHERE clauses (e.g.,
WHERE category = 'tech'). CREATE INDEXretroactively indexes all existing payloads for the field (a one-time backfill scan); the index is then maintained incrementally on subsequent upserts and deletes.- Secondary index definitions persist: a
CREATE INDEXrecords the field inconfig.jsonand the index is rebuilt from the stored payloads on the next open, so it survives a process restart (EPIC-081).DROP INDEXremoves the field fromconfig.json, so a dropped index does not reappear on restart.
ANALYZE (v3.5+, histograms v3.9+)
Computes cost-based optimizer (CBO) statistics for a collection. The statistics
are cached in memory and persisted to disk (collection.stats.json).
-- Compute statistics for query optimizer
ANALYZE docs
ANALYZE COLLECTION docs -- optional COLLECTION keyword
Returns a JSON payload with collection statistics including total_points,
row_count, deleted_count, avg_row_size_bytes, payload_size_bytes,
and per-field cardinality information.
Histogram Construction (v3.9+)
ANALYZE builds equi-depth histograms on Int, Float, and String columns to
enable accurate selectivity estimation for the CBO. Key characteristics:
- Sampling: up to 10,000 rows sampled per column (separate from the 1,000-row payload stats sample).
- Bucket count: 64 buckets by default. Fewer buckets when distinct values < 64 (one bucket per distinct value).
- String columns: mapped to ordinal ranks (lexicographic sort order) for histogram construction.
- Persistence: histograms are serialized as part of
collection.stats.jsonand restored on database startup. - Incremental maintenance: upsert/delete operations update histogram bucket
counts incrementally (O(log B) binary search). After 20% cumulative updates,
the histogram is marked stale and a
debug!log recommends re-running ANALYZE. - Backward compatibility: new histogram fields use
#[serde(default)]so pre-histogram stats files deserialize without error.
TRUNCATE (v3.5+)
Deletes all rows from a collection without dropping the collection itself.
Unlike DELETE FROM which requires a WHERE clause, TRUNCATE removes
all data unconditionally.
-- Delete all rows
TRUNCATE docs
TRUNCATE COLLECTION docs -- optional COLLECTION keyword
Returns a payload with {"deleted_count": N}. Truncating an empty collection
succeeds with deleted_count: 0. The collection structure (config, indexes)
is preserved after truncation.
ALTER COLLECTION (v3.5+)
Modifies collection settings at runtime. Uses the same key-value option
syntax as CREATE COLLECTION.
-- Enable auto-reindex
ALTER COLLECTION docs SET (auto_reindex = true)
-- Disable auto-reindex
ALTER COLLECTION docs SET (auto_reindex = false)
Supported options:
| Option | Type | Description |
|---|---|---|
auto_reindex | boolean | Enable/disable automatic HNSW parameter tuning |
Unknown options are rejected with an error message listing supported options.
The change is applied to the live collection and persisted immediately, so it
survives a restart — the auto-reindex policy is restored automatically on the
next collection open. Setting auto_reindex = false keeps the policy attached
but disabled (preserving any previously configured thresholds for a symmetric
round-trip).
FLUSH (v3.6+)
Persists collection data to disk. Supports fast flush (WAL only) and full flush (includes index serialization). Can target all collections or a specific one.
-- Fast flush all collections (WAL only)
FLUSH
-- Full flush all collections (includes index serialization)
FLUSH FULL
-- Fast flush a specific collection
FLUSH docs
-- Full flush a specific collection
FLUSH FULL docs
Returns a payload with {"status": "flushed", "full": true/false}.
- Fast flush (
FLUSH): Writes WAL data to disk. Fast but requires WAL replay on next startup if the process crashes before a full flush. - Full flush (
FLUSH FULL): Writes WAL data and serializes vector indexes. Slower but ensures fast startup without WAL replay. Recommended before graceful shutdown.
Flushing a nonexistent collection returns a CollectionNotFound error.
Flushing an empty database (no collections) succeeds with no error.
Note:
full(case-insensitive) is a reserved modifier for FLUSH. To flush a collection namedfull, useFLUSH FULL fullor quote the identifier:FLUSH `full`.
DML Statements (INSERT, UPDATE, DELETE)
INSERT INTO (v3.2+, multi-row v3.5+)
Insert one or more rows into a collection. Column names are listed in parentheses, followed by corresponding values. Multiple rows are separated by commas (v3.5+).
-- Insert a single row
INSERT INTO docs (id, title, category) VALUES (1, 'Getting Started', 'tutorial')
-- Insert with numeric values
INSERT INTO products (id, price, stock) VALUES (42, 29.99, 100)
-- Insert with boolean values
INSERT INTO flags (id, active, verified) VALUES (1, TRUE, FALSE)
-- Insert with NULL
INSERT INTO events (id, description, metadata) VALUES (10, 'system boot', NULL)
-- Insert with parameter-bound vector
INSERT INTO docs (id, vector, category) VALUES (1, $vector, 'test')
-- Multi-row insert (v3.5+)
INSERT INTO docs (id, vector, title) VALUES (1, $v1, 'Hello'), (2, $v2, 'World'), (3, $v3, 'Foo')
Column count must match value count in every row. Types are inferred from
literal values. Multi-row INSERT executes as a single batch upsert() call.
UPSERT INTO (v3.5+)
Insert or update rows. Identical syntax to INSERT; semantically equivalent since VelesDB's INSERT is always idempotent (existing IDs are overwritten). UPSERT makes this intent explicit at the SQL level.
-- Single-row upsert
UPSERT INTO docs (id, vector, title) VALUES (1, $v, 'Updated')
-- Multi-row upsert
UPSERT INTO docs (id, vector, title) VALUES (1, $v1, 'A'), (2, $v2, 'B')
UPDATE (v3.2+)
Update one or more fields in existing rows.
-- Update a single field
UPDATE docs SET status = 'archived' WHERE id = 42
-- Update multiple fields
UPDATE products SET price = 19.99, stock = 50 WHERE category = 'sale'
-- Update with boolean
UPDATE products SET featured = TRUE WHERE id = 1
-- Update without WHERE (updates all rows -- use with caution)
UPDATE products SET featured = FALSE
The WHERE clause is optional but strongly recommended.
DELETE FROM (v3.3+)
Delete rows from a collection. The WHERE clause is mandatory to prevent accidental full-collection deletion.
-- Delete by single ID
DELETE FROM docs WHERE id = 42
-- Delete by multiple IDs
DELETE FROM docs WHERE id IN (1, 2, 3)
-- Delete with string comparison
DELETE FROM logs WHERE level = 'debug'
-- Delete with compound condition
DELETE FROM events WHERE status = 'expired' AND created_at < 1000
VelesQL rejects
DELETE FROM <collection>without a WHERE predicate as a safety measure.
INSERT EDGE (v3.3+)
Insert a labeled edge between two nodes in a graph collection.
-- Basic edge insertion
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')
-- With explicit edge ID
INSERT EDGE INTO knowledge (id = 10, source = 1, target = 2, label = 'KNOWS')
-- With edge properties
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')
WITH PROPERTIES (year = 2026, confidence = 0.95)
-- With ID and properties
INSERT EDGE INTO kg (id = 20, source = 2, target = 3, label = 'KNOWS')
WITH PROPERTIES (weight = 0.9)
Edge Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
source | integer | Yes | Source node ID |
target | integer | Yes | Target node ID |
label | string | Yes | Edge relationship type |
id | integer | No | Explicit edge ID (auto-generated if omitted) |
WITH PROPERTIES accepts key-value pairs (strings, integers, or floats).
DELETE EDGE (v3.3+)
Remove an edge by its ID from a graph collection.
DELETE EDGE 123 FROM knowledge
DELETE EDGE 10 FROM kg
SELECT EDGES (v3.5+)
Query edges from a graph collection with optional filtering by source node, target node, or edge label.
-- All edges
SELECT EDGES FROM kg LIMIT 100
-- Outgoing edges from a node
SELECT EDGES FROM kg WHERE source = 1
-- Incoming edges to a node
SELECT EDGES FROM kg WHERE target = 2
-- Edges by label
SELECT EDGES FROM kg WHERE label = 'KNOWS'
-- Combined: outgoing with label
SELECT EDGES FROM kg WHERE source = 1 AND label = 'KNOWS'
Each result contains a JSON payload with edge_id, source, target, label,
and properties fields.
Limitation: WHERE supports at most two conditions combined with AND. Three-condition queries like
WHERE source = 1 AND label = 'X' AND target = 2are not supported. Use two-condition AND and filter results client-side if needed.
INSERT NODE (v3.5+)
Insert or update a node's payload in a graph collection.
-- With JSON payload
INSERT NODE INTO kg (id = 42, payload = '{"name": "Alice", "_labels": ["Person"]}')
-- With inline fields (collected into a JSON object)
INSERT NODE INTO kg (id = 10, name = 'Bob', age = 30)
If a node with the given id already exists, the payload is replaced (upsert
semantics).
TRAIN QUANTIZER Command (v2.2+)
Explicitly train a Product Quantizer on a collection's vector data.
Syntax
TRAIN QUANTIZER ON <collection> WITH (<parameters>)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
m | integer | 8 | Number of subspaces |
k | integer | 256 | Codebook size per subspace |
type | string | -- | Quantizer type: pq, opq, rabitq |
oversampling | integer | -- | Training oversampling ratio |
sample | integer | -- | Number of vectors to sample |
force | boolean | false | Force retrain if exists |
Examples
-- Train with standard parameters
TRAIN QUANTIZER ON my_embeddings WITH (m = 8, k = 256)
-- Higher compression (more subspaces)
TRAIN QUANTIZER ON large_collection WITH (m = 16, k = 256)
-- Finer codebooks
TRAIN QUANTIZER ON precision_collection WITH (m = 8, k = 512)
-- With quantizer type
TRAIN QUANTIZER ON vectors WITH (m = 16, k = 256, type = opq)
-- Force retrain with oversampling
TRAIN QUANTIZER ON my_coll WITH (m = 8, k = 256, oversampling = 4, force = true)
-- Semicolons optional
TRAIN QUANTIZER ON my_collection WITH (m = 8, k = 256);
Notes
- Training is explicit -- it does not happen automatically.
- The collection must contain enough vectors (recommended: at least
k). - Re-training overwrites the existing quantizer.
- OPQ can be enabled via the
typeparameter. - Trained quantizers are persisted in the collection directory
(
codebook.pq/rotation.opqfor PQ/OPQ,rabitq.idxfor RaBitQ) and restored on database open: the PQ cache and RaBitQ encodings are rebuilt by re-encoding the stored vectors (O(n) at open). type = rabitqalso installs the trained quantizer into the live index when the collection was created withstorage = 'rabitq'. For collections created with another storage mode, training persists the index and flips the collection's storage mode; the RaBitQ backend takes effect at the next open.
Agent Memory VelesQL Queries (v3.2+)
The Agent Memory SDK creates three internal collections queryable via VelesQL:
Semantic Memory (_semantic_memory)
SELECT * FROM _semantic_memory
WHERE vector NEAR $query_embedding
LIMIT 10 WITH (mode = 'accurate')
Payload fields: content (string).
Episodic Memory (_episodic_memory)
-- Recent events (last 24h)
SELECT * FROM _episodic_memory
WHERE timestamp > 1711234567
ORDER BY timestamp DESC
LIMIT 20
-- Similarity search on events
SELECT * FROM _episodic_memory
WHERE vector NEAR $query_embedding
LIMIT 5
Payload fields: description (string), timestamp (integer, Unix epoch).
Procedural Memory (_procedural_memory)
SELECT * FROM _procedural_memory
WHERE confidence > 0.7
ORDER BY confidence DESC
LIMIT 10
Payload fields: name (string), steps (array), confidence (float),
usage_count (integer), created_at (integer), last_used_at (integer).
ACT-R activation decay (v1.15.0, opt-in). When the surrounding
ProceduralMemoryis built withProceduralMemory::with_activation_decay(d)(Phase 1 of #433), theconfidencevalue returned by recall queries is adjusted on read by an ACT-R base-level activation decay (c_effective = c_stored × (Δt^-d), Anderson 1996, defaultd ≈ 0.5). The stored value is unchanged — VelesQL filters andORDER BYsee the decayed (effective) confidence. Queries written before v1.15.0 keep working unchanged because the strategy is opt-in.
Convenience API (Rust/Python)
let results = agent_memory.query_semantic(
"SELECT * FROM _semantic_memory WHERE vector NEAR $v LIMIT 5",
¶ms,
)?;
All three subsystems support: vector NEAR, payload filters, ORDER BY, LIMIT/OFFSET, WITH options, LET bindings, and USING FUSION.
VelesQL queryability of agent memory collections is verified by BDD integration
tests (tests/bdd/agent_memory.rs) that exercise the full
Database::execute_query pipeline, and by unit tests in
src/agent/velesql_bridge_tests.rs that test the Collection::execute_query_str
path.
Value Types
Strings
Single-quoted. To include a single quote, double it (''):
'hello world'
'O''Brien' -- produces: O'Brien
'It''s a ''test''' -- produces: It's a 'test'
'' -- empty string
Unicode characters are fully supported:
'日本語テキスト'
'🚀 Launch'
Warning: Backslash escaping (
\') is not supported. Always use''.
Numbers
42 -- integer
3.14 -- float
-100 -- negative integer
-0.5 -- negative float
Booleans
TRUE
FALSE
Case-insensitive: true, True, TRUE are all equivalent.
Vectors
[0.1, 0.2, 0.3, 0.4] -- literal vector (dense)
$query_vector -- parameter reference
Sparse Vectors
{0: 0.9, 14: 0.7, 256: 0.5} -- literal sparse vector
$sparse_query -- parameter reference
NULL
NULL
Temporal Values
NOW() -- current Unix timestamp
INTERVAL '7 days' -- time duration
NOW() - INTERVAL '24 hours' -- temporal arithmetic
Parameters
Use $name syntax for parameterized queries. Parameters are resolved at runtime
from the query context.
SELECT * FROM docs WHERE vector NEAR $query_vector AND category = $cat LIMIT 10
Parameters can be used for:
- Vectors:
$v,$query_embedding - Sparse vectors:
$sparse,$sv - Scalar values:
$category,$min_price
Execution Surfaces & CLI REPL Limitations
VelesQL is accepted by the REST API (velesdb-server), the language SDKs, and
the interactive CLI REPL (velesdb-cli). The grammar is identical across all
three, but the CLI REPL has two execution limitations a newcomer should know:
- Parameterized vector search is not executed in the REPL. A query whose
WHEREusesvector NEAR $param(orMATCH ... $param) prints "Vector search with $parameter requires REST API" and returns zero rows — the REPL has no way to bind an external vector. Use an inline vector literal (WHERE vector NEAR [0.1, 0.2, 0.3]), a metadata-only query, or the REST API with the parameter supplied in the request body. MATCHrequires an active collection. Run.use <collection_name>first; otherwise the REPL reports "MATCH queries require an active collection".
These limitations are specific to the REPL; the REST API and SDKs execute both
parameterized vector search and MATCH normally.
Divergences from standard SQL
- Bare table aliases reserve clause keywords:
FROM docs dandFROM docs AS dare both accepted and equivalent, but a bare alias may not be a clause keyword (WHERE,LIMIT,JOIN,UNION, ...) — quote it with backticks to use one anyway. vectorandscoreare reserved keywords, not free payload field names —vector NEAR ...andscorein aggregates/ordering refer to the query vector and the computed similarity score.
Identifier Quoting
Reserved Keywords
The following keywords cannot be used as identifiers without quoting:
| Category | Keywords |
|---|---|
| Query structure | SELECT, FROM, WHERE, AS |
| Logical operators | AND, OR, NOT |
| Comparison operators | IN, BETWEEN, LIKE, ILIKE, MATCH |
| NULL handling | IS, NULL |
| Boolean literals | TRUE, FALSE |
| Pagination | LIMIT, OFFSET |
| Sorting | ORDER, BY, ASC, DESC |
| Aggregation | GROUP, HAVING |
| Vector operations | NEAR, NEAR_FUSED, SPARSE_NEAR, SIMILARITY |
| Extensions | FUSE, TRAIN, QUANTIZER, WITH |
| DDL | CREATE, DROP, COLLECTION, GRAPH, METADATA |
| Graph/DML | EDGE, SCHEMALESS, SCHEMA, NODE, PROPERTIES |
| Conditional | IF, EXISTS |
| DML | INSERT, INTO, UPDATE, SET, DELETE, VALUES |
| Set operations | UNION, ALL, INTERSECT, EXCEPT |
| JOINs | JOIN, INNER, LEFT, RIGHT, FULL, OUTER, ON, USING |
| Bindings | LET, RETURN, MATCH |
| Temporal | NOW, INTERVAL |
| Misc | DISTINCT, COUNT, SUM, AVG, MIN, MAX, FIRST, CONTAINS |
Quoting Styles
To use reserved keywords or special characters as identifiers, quote them:
Backtick quoting (MySQL-style):
SELECT `select`, `from`, `order` FROM docs
Double-quote quoting (SQL standard):
SELECT "select", "from", "order" FROM docs
Mixed styles in the same query:
SELECT `select`, "order" FROM my_table WHERE `limit` > 10
Escaping Quotes Inside Identifiers
Double the quote character to include it:
-- Column named: col"name
SELECT "col""name" FROM docs
Examples
-- Reserved keyword as collection name
SELECT * FROM `order`
-- Reserved keywords in WHERE
SELECT * FROM docs WHERE `select` = 'value' AND `from` LIKE '%pattern%'
-- Reserved keywords in ORDER BY
SELECT * FROM docs ORDER BY `order` ASC
-- Reserved keywords in GROUP BY
SELECT `group`, COUNT(*) FROM docs GROUP BY `group`
-- Reserved keyword as alias
SELECT id AS `select` FROM docs
Default Values Reference
| Feature | Default | Notes |
|---|---|---|
| LIMIT | 10 (every SELECT form) | Exceptions: MATCH ... RETURN and compound queries (UNION/INTERSECT/EXCEPT) have no implicit LIMIT 10 (bounded by the 100 000-row server ceiling). Always specify LIMIT for exhaustive retrieval |
| OFFSET | 0 | |
| ORDER BY direction | ASC | Explicit DESC recommended for similarity |
| metric (CREATE) | cosine | |
| storage (CREATE) | full | full = no quantization |
| m (HNSW) | 16 | |
| ef_construction | 200 | |
| WITH mode | balanced | |
| USING FUSION strategy | rrf | |
| USING FUSION k | 60 | |
| ef_search | Depends on mode | fast=32, balanced=64, accurate=256, perfect=512 |
Error Handling
VelesQL returns structured errors:
| Error Type | Description |
|---|---|
SyntaxError | Invalid query syntax |
SemanticError | Valid syntax but invalid semantics |
CollectionNotFound | Referenced collection doesn't exist |
ColumnNotFound | Referenced column doesn't exist |
TypeMismatch | Incompatible types in comparison |
Timeout | Query exceeded timeout_ms |
Quick Reference
Statement Types
| Statement | Purpose | Example |
|---|---|---|
SELECT | Query data | SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 |
MATCH | Graph traversal | MATCH (a)-[:KNOWS]->(b) RETURN a.name |
INSERT INTO | Add rows (multi-row) | INSERT INTO docs (id, title) VALUES (1, 'A'), (2, 'B') |
UPSERT INTO | Insert or update rows | UPSERT INTO docs (id, title) VALUES (1, 'Updated') |
UPDATE | Modify rows | UPDATE docs SET status = 'done' WHERE id = 1 |
DELETE FROM | Remove rows | DELETE FROM docs WHERE id = 42 |
INSERT EDGE | Add graph edge | INSERT EDGE INTO kg (source=1, target=2, label='REL') |
DELETE EDGE | Remove graph edge | DELETE EDGE 123 FROM kg |
SELECT EDGES | Query graph edges | SELECT EDGES FROM kg WHERE source = 1 LIMIT 10 |
INSERT NODE | Add/update node payload | INSERT NODE INTO kg (id = 42, payload = '{"name": "Alice"}') |
CREATE COLLECTION | Create collection | CREATE COLLECTION docs (dimension=768) |
DROP COLLECTION | Delete collection | DROP COLLECTION IF EXISTS docs |
CREATE INDEX | Add metadata index | CREATE INDEX ON docs (category) |
DROP INDEX | Remove metadata index | DROP INDEX ON docs (category) |
TRAIN QUANTIZER | Train compression | TRAIN QUANTIZER ON docs WITH (m=8, k=256) |
SHOW COLLECTIONS | List collections | SHOW COLLECTIONS |
DESCRIBE | Collection metadata | DESCRIBE COLLECTION docs |
EXPLAIN | Query plan | EXPLAIN SELECT * FROM docs LIMIT 10 |
WHERE Operators
| Operator | Syntax | Example |
|---|---|---|
| NEAR | vector NEAR $v | WHERE vector NEAR $query |
| SPARSE_NEAR | vector SPARSE_NEAR $v | WHERE vector SPARSE_NEAR $sparse |
| NEAR_FUSED | vector NEAR_FUSED [$v1,$v2] | WHERE vector NEAR_FUSED [$a, $b] |
| similarity() | similarity(field, $v) op N | WHERE similarity(emb, $v) > 0.8 |
| MATCH (text) | column MATCH 'text' | WHERE content MATCH 'database' |
| CONTAINS_TEXT | column CONTAINS_TEXT 'literal' | WHERE name CONTAINS_TEXT 'rust' (case-insensitive, no BM25 ranking) |
| CONTAINS (array) | column CONTAINS value | WHERE tags CONTAINS 'rust' |
= != <> > >= < <= | column op value | WHERE price > 100 |
| IN / NOT IN | column [NOT] IN (values) | WHERE id IN (1, 2, 3) |
| BETWEEN | column BETWEEN low AND high | WHERE price BETWEEN 50 AND 200 |
| LIKE / ILIKE | column [I]LIKE 'pattern' | WHERE title LIKE 'rust%', WHERE name ILIKE '%rust%' |
| IS NULL / IS NOT NULL | column IS [NOT] NULL | WHERE email IS NOT NULL |
| Scalar subquery | column op (SELECT … ) | WHERE views > (SELECT AVG(views) FROM stats) — executed and substituted as a literal; correlated subqueries rejected with V010 (SubqueryNotExecutable) |
| Graph match predicate | MATCH (...) in WHERE | WHERE MATCH (a:Person)-[:KNOWS]->(b) AND a.id = $u |
| GEO_DISTANCE | GEO_DISTANCE(col, lat, lng) op meters | WHERE GEO_DISTANCE(location, 48.8566, 2.3522) < 500 |
| GEO_BBOX | GEO_BBOX(col, lat_min, lng_min, lat_max, lng_max) | WHERE GEO_BBOX(location, 48.8, 2.3, 48.9, 2.4) |
Geospatial Functions
GEO_DISTANCE
Computes the Haversine great-circle distance in meters between a GeoPoint column value and a reference coordinate pair.
SELECT * FROM places WHERE GEO_DISTANCE(location, 48.8566, 2.3522) < 500;
SELECT * FROM places WHERE GEO_DISTANCE(location, 48, 2) >= 1000;
- Coordinates accept both float (
48.8566) and integer (48) literals. - Returns distance in meters (SI unit).
- Null GeoPoint values are excluded from results.
- Non-GeoPoint or non-existent columns return empty results (no error).
- Combinable with AND, OR, NOT, and other WHERE operators.
GEO_BBOX
Tests whether a GeoPoint column value falls within a latitude/longitude bounding box (inclusive).
SELECT * FROM places WHERE GEO_BBOX(location, 48.8, 2.3, 48.9, 2.4);
- All four coordinate parameters accept float and integer literals.
- Boundary is inclusive: points exactly on the edge are included.
- Inverted coordinates (
lat_min > lat_max) return empty results. - Null GeoPoint values are excluded from results.
| NOT IN |
column NOT IN (values)|WHERE status NOT IN ('deleted')| | BETWEEN |column BETWEEN a AND b|WHERE price BETWEEN 10 AND 100| | LIKE |column LIKE 'pattern'|WHERE name LIKE 'John%'| | ILIKE |column ILIKE 'pattern'|WHERE name ILIKE '%john%'| | IS NULL |column IS NULL|WHERE deleted_at IS NULL| | IS NOT NULL |column IS NOT NULL|WHERE email IS NOT NULL| | AND / OR / NOT | logical combinators |WHERE a > 1 AND (b = 2 OR c = 3)|
Clauses
| Clause | Purpose | Example |
|---|---|---|
FROM ... AS | Source with alias | FROM docs AS d |
WHERE | Filter conditions | WHERE status = 'active' |
ORDER BY ... ASC/DESC | Sort results | ORDER BY similarity() DESC |
LIMIT N | Cap result count | LIMIT 10 |
OFFSET N | Skip first N results | OFFSET 20 |
GROUP BY | Aggregate grouping | GROUP BY category |
HAVING | Filter groups | HAVING COUNT(*) > 5 |
JOIN ... ON | Combine collections | JOIN users ON docs.author_id = users.id |
WITH (...) | Search options | WITH (mode = 'accurate') |
USING FUSION(...) | Hybrid strategy | USING FUSION(strategy = 'rrf') |
LET x = expr | Score binding | LET score = 0.7 * vector_score |
DISTINCT | Deduplicate | SELECT DISTINCT category FROM docs |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
COUNT(*) | Count rows | SELECT COUNT(*) FROM docs |
COUNT(col) | Count non-null | SELECT COUNT(email) FROM users |
SUM(col) | Sum values | SELECT SUM(price) FROM orders |
AVG(col) | Average value | SELECT AVG(rating) FROM reviews |
MIN(col) | Minimum value | SELECT MIN(created_at) FROM logs |
MAX(col) | Maximum value | SELECT MAX(score) FROM results |
MAX(score) | Max similarity in group | SELECT MAX(score) AS rel FROM chunks ... GROUP BY parent_id |
AVG(score) | Mean similarity in group | SELECT AVG(score) AS avg FROM chunks ... GROUP BY parent_id |
FIRST(col) | Value from best chunk | SELECT FIRST(text) AS excerpt FROM chunks ... GROUP BY parent_id |
Value Types
| Type | Syntax | Example |
|---|---|---|
| String | 'single quotes' | 'hello world' |
| Integer | digits | 42, -7 |
| Float | digits.digits | 3.14, -0.5 |
| Boolean | TRUE / FALSE | TRUE |
| NULL | NULL | NULL |
| Vector | [floats] | [0.1, 0.2, 0.3] |
| Sparse Vector | {idx: val} | {12: 0.8, 45: 0.3} |
| Parameter | $name | $query_vector |
| Temporal | NOW(), INTERVAL | NOW() - INTERVAL '7 days' |
Score Variables (for ORDER BY and LET)
| Variable | Source | When Populated |
|---|---|---|
vector_score | HNSW dense search | NEAR clause present |
bm25_score | Full-text BM25 | MATCH text clause present |
sparse_score | Sparse vector search | SPARSE_NEAR clause present |
graph_score | Graph traversal | MATCH graph pattern present |
fused_score | After fusion | USING FUSION applied |
similarity() | Primary search score | Any search clause |
Complete Examples
Vector Similarity Search
-- Basic: find 10 nearest documents
SELECT * FROM documents WHERE vector NEAR $query LIMIT 10
-- With similarity score in results
SELECT id, title, similarity() AS score
FROM documents WHERE vector NEAR $query
ORDER BY similarity() DESC LIMIT 5
-- With metadata filter
SELECT * FROM articles
WHERE vector NEAR $query AND category = 'science' AND year >= 2024
LIMIT 10
-- With search quality tuning
SELECT * FROM embeddings WHERE vector NEAR $query
ORDER BY similarity() DESC LIMIT 20
WITH (mode = 'accurate', ef_search = 256)
Full-Text Search
-- Keyword search with BM25 scoring
SELECT * FROM articles WHERE content MATCH 'machine learning' LIMIT 10
-- Case-insensitive pattern matching
SELECT * FROM users WHERE name ILIKE '%john%'
-- Pattern with wildcards
SELECT * FROM products WHERE sku LIKE 'SKU-2024-%'
Hybrid Search (Vector + Text)
-- Dense vector + BM25 text with RRF fusion
SELECT * FROM docs
WHERE vector NEAR $query AND content MATCH 'neural networks'
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)
-- With custom scoring weights (the evaluated binding 'score' is injected
-- into each row's payload; `SELECT *, expr` is not valid syntax)
LET score = 0.7 * vector_score + 0.3 * bm25_score
SELECT *
FROM documents
WHERE vector NEAR $query AND content MATCH 'transformer'
ORDER BY score DESC
LIMIT 10
-- Dense + sparse vector fusion
SELECT * FROM docs
WHERE vector NEAR $dense_query AND vector SPARSE_NEAR $sparse_query
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)
-- Multi-vector fusion (text + image embeddings) — inline NEAR_FUSED form.
-- NEAR_FUSED supports only rrf, average, or maximum; 'weighted'/'rsf' are
-- rejected (ill-defined over homogeneous query vectors).
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_emb, $image_emb] USING FUSION 'rrf'
LIMIT 20
Filtering and Aggregation
-- Complex WHERE conditions
SELECT * FROM products
WHERE category IN ('electronics', 'computers')
AND price BETWEEN 100 AND 500
AND brand IS NOT NULL
AND name NOT IN ('discontinued')
LIMIT 50
-- Aggregation with grouping
SELECT category, COUNT(*) AS total, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY avg_price DESC
-- Nested payload field access
SELECT payload.author, payload.metadata.language, COUNT(*)
FROM articles
GROUP BY payload.author, payload.metadata.language
-- Temporal filtering
SELECT * FROM events
WHERE timestamp > NOW() - INTERVAL '7 days'
ORDER BY timestamp DESC
LIMIT 100
-- Deduplicated results
SELECT DISTINCT category FROM products WHERE stock > 0
Graph Queries
-- Basic traversal
MATCH (person:Person)-[:AUTHORED]->(doc:Document)
WHERE similarity(doc.embedding, $query) > 0.7
RETURN person.name, doc.title, similarity() AS score
ORDER BY similarity() DESC
LIMIT 10
-- Multi-hop traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.department = 'engineering'
RETURN a.name, b.name, c.name
LIMIT 20
-- Variable-length paths (1 to 3 hops)
MATCH (start:Document)-[*1..3]->(end:Document)
WHERE start.topic = 'AI'
RETURN start.title, end.title
-- Graph mutations
INSERT NODE INTO knowledge (id = 1, payload = '{"name": "Alice", "_labels": ["Author"]}');
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')
WITH PROPERTIES (confidence = 0.95, year = 2026);
-- Query edges
SELECT EDGES FROM knowledge WHERE source = 1;
SELECT EDGES FROM knowledge WHERE label = 'AUTHORED_BY' LIMIT 10;
DELETE EDGE 42 FROM knowledge;
Collection Management (DDL)
-- Create vector collection
CREATE COLLECTION documents (dimension = 768)
-- Create with explicit parameters
CREATE COLLECTION embeddings (dimension = 384, metric = 'euclidean')
WITH (storage = 'sq8', m = 16, ef_construction = 200)
-- Create graph collection
CREATE GRAPH COLLECTION knowledge (dimension = 768, metric = 'cosine') SCHEMALESS
-- Create graph with typed schema
CREATE GRAPH COLLECTION ontology (dimension = 768) WITH SCHEMA (
NODE Person (name: STRING, age: INTEGER),
NODE Organization (name: STRING),
EDGE WORKS_AT FROM Person TO Organization
)
-- Create metadata-only collection
CREATE METADATA COLLECTION tags
-- Drop collection safely
DROP COLLECTION IF EXISTS old_data
-- Create a secondary index on a metadata field
CREATE INDEX ON documents (category)
-- Drop a secondary index
DROP INDEX ON documents (category)
Data Manipulation (DML)
-- Insert a row
INSERT INTO documents (id, title, category) VALUES (1, 'Getting Started', 'tutorial')
-- Update fields
UPDATE documents SET status = 'published', updated = TRUE WHERE id = 1
-- Delete with mandatory WHERE
DELETE FROM documents WHERE id IN (10, 11, 12)
DELETE FROM logs WHERE level = 'debug' AND timestamp < 1000
Join Queries
-- Inner join
SELECT o.id, c.name, o.total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.total > 100
-- Self-join
SELECT a.title AS original, b.title AS related
FROM articles AS a
JOIN articles AS b ON a.parent_id = b.id
WHERE a.category = 'tech'
-- Join with USING shorthand
SELECT * FROM orders AS o
JOIN products AS p USING (product_id)
Set Operations
-- Combine results from two queries
SELECT id FROM recommended_docs WHERE user_id = 1
UNION
SELECT id FROM trending_docs WHERE score > 0.8
-- Find common items
SELECT item_id FROM user_1_likes
INTERSECT
SELECT item_id FROM user_2_likes
-- Exclude items
SELECT id FROM all_docs
EXCEPT
SELECT id FROM archived_docs
Advanced Scoring with LET
-- Weighted hybrid scoring
LET relevance = 0.6 * vector_score + 0.4 * bm25_score
SELECT id, title, relevance AS score
FROM articles
WHERE vector NEAR $query AND content MATCH 'deep learning'
ORDER BY relevance DESC
LIMIT 10
-- Chained bindings
LET base = 0.7 * vector_score + 0.3 * bm25_score
LET boosted = base * 1.5
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'AI'
ORDER BY boosted DESC
LIMIT 5
Training
-- Train product quantizer
TRAIN QUANTIZER ON embeddings WITH (m = 8, k = 256)
-- Train with higher compression
TRAIN QUANTIZER ON large_collection WITH (m = 16, k = 256)
Multi-Row INSERT and UPSERT
-- Multi-row insert (batch)
INSERT INTO products (id, title, price) VALUES
(1, 'Laptop', 999.99),
(2, 'Mouse', 29.99),
(3, 'Keyboard', 79.99)
-- Upsert: insert or update if exists
UPSERT INTO products (id, title, price) VALUES (1, 'Laptop Pro', 1299.99)
-- Multi-row upsert
UPSERT INTO products (id, title, price) VALUES
(1, 'Laptop Pro', 1299.99),
(4, 'Monitor', 499.99)
Graph Edge and Node Operations
-- Query all edges in a graph collection
SELECT EDGES FROM knowledge LIMIT 100
-- Query outgoing edges from a specific node
SELECT EDGES FROM knowledge WHERE source = 1
-- Query edges by relationship type
SELECT EDGES FROM knowledge WHERE label = 'AUTHORED_BY'
-- Insert a node with payload into a graph collection
INSERT NODE INTO knowledge (id = 42, payload = '{"name": "Alice", "_labels": ["Person"]}')
Introspection
-- List all collections with their types
SHOW COLLECTIONS
-- Get collection metadata (dimension, metric, point count, type)
DESCRIBE COLLECTION documents
DESCRIBE documents
-- View query execution plan without running
EXPLAIN SELECT * FROM docs WHERE vector NEAR $v AND category = 'tech' LIMIT 10
For instrumented execution (plan + actual rows / time / per-node stats,
plus feedback_calibration once the EMA loop is warm), use
POST /query/explain with "analyze": true — see
EXPLAIN ANALYZE above (EXPLAIN ANALYZE is not a
parsed statement).
Window Functions
-- Rank top-3 results per category by vector similarity
SELECT id, title, category, similarity() AS score,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY similarity() DESC) AS rank_in_cat
FROM articles
WHERE vector NEAR $query
LIMIT 50
-- Dense rank per author by publication year (window ORDER BY accepts
-- columns or similarity(), not aggregate expressions like COUNT(*))
SELECT author, year, COUNT(*) AS papers,
DENSE_RANK() OVER (PARTITION BY author ORDER BY year DESC) AS recency_rank
FROM publications
GROUP BY author, year
Agent Memory
-- Recall the 5 facts most similar to a query embedding (semantic memory)
SELECT * FROM _semantic_memory
WHERE vector NEAR $query
ORDER BY similarity() DESC
LIMIT 5
-- Recent events from the last day (episodic memory)
SELECT * FROM _episodic_memory
WHERE timestamp > NOW() - INTERVAL '1 day'
ORDER BY timestamp DESC
LIMIT 50
-- Recall high-confidence procedures matching a goal embedding.
-- With v1.15.0 ACT-R decay enabled (`with_activation_decay(0.5)`),
-- `confidence` here is the time-decayed effective confidence.
SELECT * FROM _procedural_memory
WHERE vector NEAR $goal AND confidence > 0.6
ORDER BY confidence DESC
LIMIT 10
Administration
-- Create a secondary index for fast metadata lookups
CREATE INDEX ON documents (category)
-- Drop an index
DROP INDEX ON documents (category)
-- Compute CBO statistics for query optimizer
ANALYZE documents
-- Delete all data from a collection (preserves the collection)
TRUNCATE documents
-- Modify collection settings
ALTER COLLECTION documents SET (auto_reindex = true)
-- Flush data to disk (WAL only)
FLUSH
-- Full flush (includes index serialization)
FLUSH FULL
-- Flush a specific collection
FLUSH documents
EBNF Grammar (v3.10.0)
Derived from the executable PEG grammar
crates/velesdb-core/src/velesql/grammar.pest
(the source of truth — regenerate this annex whenever the grammar changes).
(* ═══════════════════════════════════════════════════════ *)
(* Entry point *)
(* ═══════════════════════════════════════════════════════ *)
query = let_clause* (show_collections_stmt | describe_stmt
| explain_stmt | analyze_stmt | truncate_stmt
| alter_collection_stmt | flush_stmt
| match_query | select_edges_stmt
| compound_query | train_stmt
| create_index_stmt | create_collection_stmt
| drop_index_stmt | drop_collection_stmt
| insert_node_stmt | insert_edge_stmt
| delete_edge_stmt | delete_stmt
| insert_stmt | upsert_stmt | update_stmt) [";"] ;
(* ═══════════════════════════════════════════════════════ *)
(* Introspection statements (v3.4) *)
(* ═══════════════════════════════════════════════════════ *)
show_collections_stmt = "SHOW" "COLLECTIONS" ;
describe_stmt = "DESCRIBE" ["COLLECTION"] identifier ;
explain_stmt = "EXPLAIN" compound_query ;
(* ═══════════════════════════════════════════════════════ *)
(* LET bindings (v3.2) *)
(* ═══════════════════════════════════════════════════════ *)
let_clause = "LET" identifier "=" order_by_arithmetic ;
(* ═══════════════════════════════════════════════════════ *)
(* MATCH graph queries (v2.1) *)
(* ═══════════════════════════════════════════════════════ *)
match_query = "MATCH" graph_pattern
[where_clause] return_clause
[order_by_clause] [limit_clause] ;
graph_pattern = node_pattern (relationship_pattern node_pattern)* ;
node_pattern = "(" [node_spec] ")" ;
node_spec = [node_alias] [node_labels] [collection_annotation]
[node_properties] ;
node_alias = identifier ;
node_labels = ":" label_name (":" label_name)* ;
collection_annotation = "@" identifier ;
node_properties = "{" property ("," property)* "}" ;
property = identifier ":" property_value ;
property_value = string | float | integer | boolean | null | parameter ;
relationship_pattern = rel_incoming | rel_outgoing | rel_undirected ;
rel_incoming = "<-" [rel_spec] "-" ;
rel_outgoing = "-" [rel_spec] "->" ;
rel_undirected = "-" [rel_spec] "-" ;
rel_spec = "[" [rel_details] "]" ;
rel_details = [rel_alias] [rel_types] [rel_range] [node_properties] ;
rel_alias = identifier ;
rel_types = ":" rel_type_name ("|" rel_type_name)* ;
rel_range = "*" [range_spec] ;
range_spec = range_bound ".." [range_bound]
| ".." range_bound
| integer ;
return_clause = "RETURN" return_item ("," return_item)* ;
return_item = return_expr ["AS" identifier] ;
return_expr = "similarity" "(" ")" | property_access | identifier | "*" ;
property_access = identifier "." identifier ;
(* ═══════════════════════════════════════════════════════ *)
(* SELECT + compound queries *)
(* ═══════════════════════════════════════════════════════ *)
compound_query = select_stmt (set_operator select_stmt)* ;
set_operator = "UNION" "ALL" | "UNION" | "INTERSECT" | "EXCEPT" ;
select_stmt = "SELECT" [distinct_modifier] select_list
"FROM" from_clause join_clause*
[where_clause]
[group_by_clause] [having_clause]
[order_by_clause]
[limit_clause] [offset_clause]
[with_clause]
[using_fusion_clause] ;
distinct_modifier = "DISTINCT" ;
(* SELECT list *)
select_list = "*" | select_item ("," select_item)* ;
select_item = "similarity" "(" ")" ["AS" identifier]
| window_item
| aggregate_function ["AS" identifier]
| qualified_wildcard
| column ["AS" identifier] ;
qualified_wildcard = identifier "." "*" ;
column = identifier ("." identifier)* ;
(* Window functions (v1.13.0) *)
window_item = window_function_name "(" ")" "OVER" "(" over_clause ")"
["AS" identifier] ;
window_function_name = "ROW_NUMBER" | "DENSE_RANK" | "RANK" ;
over_clause = [partition_by_clause] [window_order_by_clause] ;
partition_by_clause = "PARTITION" "BY" column ("," column)* ;
window_order_by_clause = "ORDER" "BY" window_order_by_item
("," window_order_by_item)* ;
window_order_by_item = (order_by_similarity_bare | column) ["ASC" | "DESC"] ;
(* FROM clause — alias with AS or bare (non-reserved identifier) *)
from_clause = identifier [["AS"] identifier] ;
(* JOIN clause (v2.0) *)
join_clause = [join_type] "JOIN" identifier [["AS"] identifier]
(on_clause | using_clause) ;
join_type = "LEFT" ["OUTER"]
| "RIGHT" ["OUTER"]
| "FULL" ["OUTER"]
| "INNER" ;
on_clause = "ON" column_ref "=" column_ref ;
using_clause = "USING" "(" identifier ("," identifier)* ")" ;
column_ref = identifier "." identifier ;
(* ═══════════════════════════════════════════════════════ *)
(* WHERE clause *)
(* ═══════════════════════════════════════════════════════ *)
where_clause = "WHERE" or_expr ;
or_expr = and_expr ("OR" and_expr)* ;
and_expr = primary_expr ("AND" primary_expr)* ;
primary_expr = "(" or_expr ")"
| not_expr
| graph_match_expr
| similarity_expr
| vector_fused_search
| sparse_vector_search
| vector_search
| match_expr
| in_expr
| between_expr
| like_expr
| is_null_expr
| contains_text_expr
| contains_expr
| geo_distance_expr
| geo_bbox_expr
| compare_expr ;
not_expr = "NOT" primary_expr ;
graph_match_expr = "MATCH" graph_pattern ;
(* Vector operations *)
vector_search = "vector" "NEAR" vector_expr ;
vector_fused_search = "vector" "NEAR_FUSED" vector_array [fusion_clause_inline] ;
sparse_vector_search = "vector" "SPARSE_NEAR" sparse_vector_expr ["USING" string] ;
similarity_expr = "similarity" "(" similarity_field "," vector_expr ")"
compare_op numeric_threshold ;
vector_expr = vector_literal | parameter ;
vector_literal = "[" (float | integer) ("," (float | integer))* "]" ;
vector_array = "[" vector_expr ("," vector_expr)* "]" ;
sparse_vector_expr = sparse_literal | parameter ;
sparse_literal = "{" sparse_entry ("," sparse_entry)* "}" ;
sparse_entry = integer ":" float ;
fusion_clause_inline = "USING" "FUSION" string ["(" fusion_param_list ")"] ;
fusion_param_list = fusion_param ("," fusion_param)* ;
fusion_param = identifier "=" (float | integer) ;
(* Comparisons *)
compare_expr = where_column compare_op value ;
compare_op = ">=" | "<=" | "<>" | "!=" | "=" | ">" | "<" ;
where_column = identifier ("." identifier)* ;
(* Special conditions *)
in_expr = where_column ["NOT"] "IN" "(" value ("," value)* ")" ;
between_expr = where_column "BETWEEN" value "AND" value ;
like_expr = where_column ("ILIKE" | "LIKE") string ;
is_null_expr = where_column "IS" ["NOT"] "NULL" ;
match_expr = where_column "MATCH" string ;
(* Array / substring containment *)
contains_text_expr = where_column "CONTAINS_TEXT" string ;
contains_expr = where_column "CONTAINS" "ALL" "(" value ("," value)* ")"
| where_column "CONTAINS" "ANY" "(" value ("," value)* ")"
| where_column "CONTAINS" value ;
(* Geospatial predicates *)
geo_number = float | integer ;
geo_distance_expr = "GEO_DISTANCE" "(" column "," geo_number "," geo_number ")"
compare_op geo_number ;
geo_bbox_expr = "GEO_BBOX" "(" column "," geo_number "," geo_number ","
geo_number "," geo_number ")" ;
(* ═══════════════════════════════════════════════════════ *)
(* GROUP BY, HAVING (v2.0) *)
(* ═══════════════════════════════════════════════════════ *)
group_by_clause = "GROUP" "BY" column ("," column)* ;
having_clause = "HAVING" having_condition (("AND" | "OR") having_condition)* ;
having_condition = aggregate_function compare_op value ;
aggregate_function = ("FIRST" | "COUNT" | "SUM" | "AVG" | "MIN" | "MAX")
"(" ("*" | "score" | column) ")" ;
(* ═══════════════════════════════════════════════════════ *)
(* ORDER BY (v2.0 + arithmetic v3.0) *)
(* ═══════════════════════════════════════════════════════ *)
order_by_clause = "ORDER" "BY" order_by_item ("," order_by_item)* ;
order_by_item = order_by_expr ["ASC" | "DESC"] ;
order_by_expr = aggregate_function
| property_access
| order_by_arithmetic ;
order_by_arithmetic = oa_additive ;
oa_additive = oa_multiplicative (("+" | "-") oa_multiplicative)* ;
oa_multiplicative = oa_atom (("*" | "/") oa_atom)* ;
oa_atom = float | integer
| order_by_similarity
| order_by_similarity_bare
| "(" oa_additive ")"
| identifier ;
order_by_similarity = "similarity" "(" similarity_field "," vector_expr ")" ;
order_by_similarity_bare = "similarity" "(" ")" ;
similarity_field = (letter | "_") (letter | digit | "_" | ".")* ;
(* ═══════════════════════════════════════════════════════ *)
(* LIMIT, OFFSET, WITH, USING FUSION *)
(* ═══════════════════════════════════════════════════════ *)
limit_clause = "LIMIT" integer ;
offset_clause = "OFFSET" integer ;
with_clause = "WITH" "(" with_option ("," with_option)* ")" ;
with_option = identifier "=" with_value ;
with_value = string | float | integer | boolean | identifier ;
using_fusion_clause = "USING" "FUSION" ["(" fusion_option_list ")"] ;
fusion_option_list = fusion_option ("," fusion_option)* ;
fusion_option = identifier "=" (string | float | integer) ;
(* ═══════════════════════════════════════════════════════ *)
(* INSERT statement (v3.2) *)
(* ═══════════════════════════════════════════════════════ *)
insert_stmt = "INSERT" "INTO" identifier
"(" identifier ("," identifier)* ")"
"VALUES" values_row ("," values_row)* ;
values_row = "(" value ("," value)* ")" ;
(* ═══════════════════════════════════════════════════════ *)
(* UPDATE statement (v3.2) *)
(* ═══════════════════════════════════════════════════════ *)
update_stmt = "UPDATE" identifier "SET"
assignment ("," assignment)*
[where_clause] ;
assignment = identifier "=" value ;
(* ═══════════════════════════════════════════════════════ *)
(* DELETE statement (v3.3) *)
(* ═══════════════════════════════════════════════════════ *)
delete_stmt = "DELETE" "FROM" identifier where_clause ;
(* ═══════════════════════════════════════════════════════ *)
(* Graph mutation statements (v3.3) *)
(* ═══════════════════════════════════════════════════════ *)
insert_edge_stmt = "INSERT" "EDGE" "INTO" identifier
"(" edge_field ("," edge_field)* ")"
["WITH" "PROPERTIES" "(" create_option_list ")"] ;
edge_field = identifier "=" value ;
delete_edge_stmt = "DELETE" "EDGE" value "FROM" identifier ;
(* ═══════════════════════════════════════════════════════ *)
(* DDL: CREATE / DROP COLLECTION, INDEX (v3.3 / v3.5) *)
(* ═══════════════════════════════════════════════════════ *)
create_index_stmt = "CREATE" "INDEX" "ON" identifier "(" identifier ")" ;
drop_index_stmt = "DROP" "INDEX" "ON" identifier "(" identifier ")" ;
create_collection_stmt = "CREATE" ["GRAPH" | "METADATA"] "COLLECTION" identifier
[create_body] ;
create_body = "(" create_option_list ")" [create_suffix] ;
create_option_list = create_option ("," create_option)* ;
create_option = identifier "=" create_option_value ;
create_option_value = string | float | integer | boolean | identifier ;
create_suffix = "SCHEMALESS"
| "WITH" "SCHEMA" "(" schema_def_list ")"
| with_clause ;
schema_def_list = schema_def ("," schema_def)* ;
schema_def = node_type_def | edge_type_def ;
node_type_def = "NODE" identifier "(" property_def_list ")" ;
edge_type_def = "EDGE" identifier "FROM" identifier "TO" identifier ;
property_def_list = property_def ("," property_def)* ;
property_def = identifier ":" type_name ;
type_name = "STRING" | "INTEGER" | "FLOAT" | "BOOLEAN" | "VECTOR" ;
drop_collection_stmt = "DROP" "COLLECTION" ["IF" "EXISTS"] identifier ;
(* ═══════════════════════════════════════════════════════ *)
(* TRAIN QUANTIZER (v2.2) *)
(* ═══════════════════════════════════════════════════════ *)
train_stmt = "TRAIN" "QUANTIZER" "ON" identifier with_clause ;
(* ═══════════════════════════════════════════════════════ *)
(* Admin statements (v3.5-3.6) *)
(* ═══════════════════════════════════════════════════════ *)
analyze_stmt = "ANALYZE" ["COLLECTION"] identifier ;
truncate_stmt = "TRUNCATE" ["COLLECTION"] identifier ;
alter_collection_stmt = "ALTER" "COLLECTION" identifier "SET"
"(" create_option_list ")" ;
flush_stmt = "FLUSH" ["FULL"] [identifier] ;
(* ═══════════════════════════════════════════════════════ *)
(* UPSERT (v3.5) *)
(* ═══════════════════════════════════════════════════════ *)
upsert_stmt = "UPSERT" "INTO" identifier
"(" identifier ("," identifier)* ")"
"VALUES" values_row ("," values_row)* ;
(* ═══════════════════════════════════════════════════════ *)
(* Graph edge/node queries (v3.5) *)
(* ═══════════════════════════════════════════════════════ *)
select_edges_stmt = "SELECT" "EDGES" "FROM" identifier
[where_clause] [limit_clause] ;
insert_node_stmt = "INSERT" "NODE" "INTO" identifier
"(" edge_field ("," edge_field)* ")" ;
(* ═══════════════════════════════════════════════════════ *)
(* Subqueries (v3.2) *)
(* ═══════════════════════════════════════════════════════ *)
subquery_expr = "(" "SELECT" select_list "FROM" identifier
[where_clause] [group_by_clause] [having_clause]
[limit_clause] ")" ;
(* ═══════════════════════════════════════════════════════ *)
(* Values and literals *)
(* ═══════════════════════════════════════════════════════ *)
value = subquery_expr | temporal_expr | float | integer
| string | boolean | null | parameter ;
temporal_expr = temporal_arithmetic | now_function | interval_expr ;
temporal_arithmetic = (now_function | interval_expr) ("+" | "-")
(now_function | interval_expr) ;
now_function = "NOW" "(" ")" ;
interval_expr = "INTERVAL" string ;
string = "'" (char | "''")* "'" ;
integer = ["-"] digit+ ;
float = ["-"] digit+ "." digit+ ;
boolean = "TRUE" | "FALSE" ;
null = "NULL" ;
parameter = "$" identifier ;
identifier = quoted_identifier | regular_identifier ;
regular_identifier = (letter | "_") (letter | digit | "_")* ;
quoted_identifier = "`" backtick_inner "`" | '"' doublequote_inner '"' ;
backtick_inner = (!"`" ANY)+ ;
doublequote_inner = (doublequote_escape | !('"') ANY)* ;
doublequote_escape = '""' ;
(* ═══════════════════════════════════════════════════════ *)
(* Whitespace and comments *)
(* ═══════════════════════════════════════════════════════ *)
WHITESPACE = " " | "\t" | "\r" | "\n" ;
COMMENT = "--" (!"\n" ANY)* ;
License
VelesDB Core License 1.0