VelesQL Language Specification

June 30, 2026 · View on GitHub

SQL-like query language for vector + graph + column-store search in VelesDB.

Version: 3.10.0 | Last Updated: 2026-06-30 (VelesDB v3.5.0)


Overview

VelesQL is a SQL-inspired query language designed for hybrid search workloads. It combines familiar SQL syntax with vector-specific operations (NEAR, SPARSE_NEAR, NEAR_FUSED), graph pattern matching (MATCH), full-text search (MATCH BM25), and collection management (CREATE, DROP).

All keywords are case-insensitive: SELECT, select, and Select are equivalent. Identifiers (collection names, column names) are case-sensitive.

Feature Support Status

FeatureStatusVersion
SELECT, FROM, WHEREStable1.0
NEAR vector searchStable1.0
similarity() functionStable1.3
LIMIT, OFFSETStable1.0
WITH clauseStable1.0
ORDER BYStable2.0
GROUP BY, HAVINGStable2.0
JOIN (INNER ... ON)Stable2.0
JOIN (LEFT, RIGHT, FULL OUTER)Stable (column-store, primary-key join only — see crates/velesdb-core/src/collection/search/query/join.rs:155-171)2.0
JOIN USINGStable (single-column only; multi-column USING (a, b, …) parses but is rejected at execution by validate_join_condition)2.0
Set Operations (UNION, INTERSECT, EXCEPT)Stable2.0
USING FUSIONStable2.0
NOW() / INTERVAL temporalStable2.1
MATCH graph traversalStable2.1
SPARSE_NEAR sparse vector searchStable2.2
NEAR_FUSED multi-vector fusionStable2.2
TRAIN QUANTIZER commandStable2.2
ORDER BY arithmetic scoringStable3.0
LET score bindingsStable3.2
Agent Memory VelesQL queriesStable3.2
INSERT INTO statementStable3.2
UPDATE statementStable3.2
CREATE COLLECTIONStable3.3
DROP COLLECTIONStable3.3
INSERT EDGE graph mutationStable3.3
DELETE FROM statementStable3.3
DELETE EDGE graph mutationStable3.3
DISTINCT modifierStable3.3
ILIKE case-insensitive patternStable3.3
FROM / JOIN aliasesStable (INNER JOIN)2.0
Scalar subqueriesExecuted and substituted as a literal; correlated ones rejected (V010, see below)3.2
SQL comments (--)Stable1.0
Identifier quoting (backtick, double-quote)Stable1.3
SHOW COLLECTIONSStable3.4
DESCRIBE COLLECTIONStable3.4
EXPLAIN query planStable3.4
EXPLAIN ANALYZE execution statsAPI-level (via explain_analyze_query() and /query/explain?analyze=true, not a parsed VelesQL statement)3.8
CREATE INDEX / DROP INDEXStable3.5
ANALYZEStable3.5
TRUNCATEStable3.5
ALTER COLLECTIONStable3.5
Multi-row INSERTStable3.5
UPSERT INTO statementStable3.5
WITH (quality='...') aliasStable3.5
SELECT EDGES graph queryStable3.5
INSERT NODE graph mutationStable3.5
FLUSH / FLUSH FULLStable3.6
CONTAINS array filterStable3.7
GEO_DISTANCE / GEO_BBOX geospatialStable3.7
GROUP BY MAX(score) / AVG(score)Stable3.7
FIRST(column) projectionStable3.7
CONTAINS_TEXT strict text filterStable3.8
Window functions (ROW_NUMBER, RANK, DENSE_RANK) with OVER, PARTITION BY, ORDER BYStable3.9 (VelesDB v1.13.0)
CBO feedback calibration in EXPLAIN ANALYZEStable3.10 (VelesDB v1.15.0)
FUSE BY fusion clausePlanned--

REST Contract Notes

  • /query supports top-level MATCH, but request body must include collection.
  • /collections/{name}/match is the collection-scoped graph endpoint.
  • Canonical payload contract: docs/reference/VELESQL_CONTRACT.md.
  • Conformance test matrix: docs/reference/VELESQL_CONFORMANCE_MATRIX.md.
  • Recommended developer syntax for mixed filters: SELECT ... FROM <collection> WHERE ... AND MATCH (...). When FROM declares an alias, the MATCH anchor either reuses it or — when no pattern alias matches a declared alias — binds implicitly to the FROM rows (rule V011 — see Graph Match Predicate in WHERE).

Comments

VelesQL supports single-line comments with --. Everything after -- to the end of the line is ignored by the parser.

-- This is a comment
SELECT * FROM docs LIMIT 10  -- inline comment after a statement

Block comments (/* ... */) are not supported.


SELECT Statement

The SELECT statement is the primary way to query data from collections.

Full Syntax

[LET <name> = <expr> ...]
SELECT [DISTINCT] <columns> [, <window_fn>() OVER (...) [AS <alias>] ...]
FROM <collection> [[AS] <alias>]
[JOIN <collection2> [[AS] <alias>] ON <condition> | USING (<column>)]
[WHERE <conditions>]
[GROUP BY <columns>]
[HAVING <aggregate_condition>]
[ORDER BY <expressions>]
[LIMIT <n>]
[OFFSET <n>]
[WITH (<options>)]
[USING FUSION (<strategy>)]

SELECT Clause

Select All Columns

SELECT * FROM documents

Select Specific Columns

SELECT id, score FROM documents
SELECT id, payload.title, payload.category FROM documents

Nested Payload Fields

Access nested JSON fields using dot notation at any depth:

SELECT payload.metadata.author FROM articles
SELECT payload.stats.views, payload.stats.likes FROM posts

Column Aliases

Rename output columns with AS:

SELECT id, payload.title AS title FROM docs
SELECT id AS doc_id, category AS cat FROM products

DISTINCT Modifier

Eliminate duplicate rows from the result set. DISTINCT applies to ALL selected columns -- two rows are considered duplicates only when every selected column has the same value.

-- Unique categories
SELECT DISTINCT category FROM products

-- Unique authors from nested payload
SELECT DISTINCT payload.author FROM docs

-- DISTINCT with WHERE filter
SELECT DISTINCT category FROM products WHERE price > 50

-- DISTINCT on multiple columns (unique combinations)
SELECT DISTINCT category, status FROM products

When combined with vector search, DISTINCT deduplicates based on payload values after the search results are returned:

SELECT DISTINCT payload.source FROM docs WHERE vector NEAR $v LIMIT 20

Qualified Wildcard

Select all columns from a specific alias in a JOIN:

SELECT ctx.* FROM docs AS ctx WHERE vector NEAR $v LIMIT 10

Aggregate Functions in SELECT

SELECT COUNT(*) FROM docs
SELECT category, COUNT(*), AVG(price), MAX(rating) FROM products GROUP BY category
SELECT SUM(quantity) AS total FROM orders
FunctionDescriptionArgument
COUNT(*)Count all rowsWildcard
COUNT(col)Count non-null values in columnColumn name
SUM(col)Sum of numeric valuesColumn name
AVG(col)Average of numeric valuesColumn name
MIN(col)Minimum valueColumn name
MAX(col)Maximum valueColumn name
MAX(score)Max similarity score across group (v3.7+)score pseudo-column
AVG(score)Mean similarity score across group (v3.7+)score pseudo-column
FIRST(col)Value from highest-scoring row in group (v3.7+)Column name

Similarity Score in SELECT

Use the zero-argument similarity() to project the pre-computed search score:

SELECT similarity() FROM docs WHERE vector NEAR $v LIMIT 10
SELECT similarity() AS score FROM docs WHERE vector NEAR $v LIMIT 10
SELECT id, payload.title, similarity() AS relevance
FROM docs WHERE vector NEAR $v ORDER BY similarity() DESC LIMIT 5

FROM Clause

Specify the source collection. An optional alias enables shorter column references and is required for self-joins. The AS keyword is optional: FROM documents d and FROM documents AS d are strictly equivalent. A bare alias may not be a reserved clause keyword (WHERE, GROUP, HAVING, ORDER, LIMIT, OFFSET, WITH, USING, UNION, INTERSECT, EXCEPT, INNER, LEFT, RIGHT, FULL, OUTER, JOIN, ON, AS); quote it (`limit`) to use one anyway.

-- Simple
SELECT * FROM my_collection

-- With alias
SELECT * FROM documents AS d

-- Bare alias (no AS) — identical semantics
SELECT * FROM documents d

-- Alias used in column references
SELECT d.title, d.category FROM documents AS d WHERE d.price > 50

JOIN Clause (v2.0+)

Combine data from multiple collections.

Syntax

SELECT <columns>
FROM <table1> [[AS] <alias1>]
[INNER | LEFT | RIGHT | FULL] JOIN <table2> [[AS] <alias2>]
  ON <alias1>.<col> = <alias2>.<col>

Join Types

TypeKeywordDescriptionRuntime
InnerJOIN or INNER JOINOnly matching rows from both sidesFully executed
LeftLEFT JOIN or LEFT OUTER JOINAll left rows; unmatched right side gets NULLs for joined columnsFully executed (column-store, primary-key only — see collection/search/query/join.rs:164-171)
RightRIGHT JOIN or RIGHT OUTER JOINAll right rows; unmatched left side gets NULLsFully executed (join.rs:155-162)
FullFULL JOIN or FULL OUTER JOINAll rows from both sides with NULLs on the unmatched sideFully executed (join.rs:155-171)

Constraint: all four JOIN types require the join column to be the right-side collection's column-store primary key. The executor rejects JOIN ... ON other_col = ... at runtime with "JOIN on table 'T' requires primary key 'PK', got 'other_col'" (see validate_join_condition in join.rs:177-201).

ON Condition

Specify the join equality condition:

SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id

USING Clause

Shorthand when both tables share the same column name:

SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c USING (customer_id)

Limitation: USING currently supports a single column only. Multi-column USING (a, b, …) parses successfully but is rejected at execution by validate_join_condition (collection/search/query/join.rs:177) with the error "JOIN on table 'T' must use ON condition or USING(single_column)".

Self-Join

Join a collection with itself using aliases:

SELECT e.name AS employee, m.name AS manager
FROM employees AS e
JOIN employees AS m ON e.manager_id = m.id

Multiple Joins

Chain multiple JOINs in a single query:

SELECT e.name, m.name AS manager, d.name AS director
FROM employees AS e
JOIN employees AS m ON e.manager_id = m.id
JOIN employees AS d ON m.manager_id = d.id
SELECT p.title, c.name AS category_name
FROM products AS p
JOIN categories AS c ON p.category_id = c.id
WHERE similarity(p.embedding, $query) > 0.7
LIMIT 20

JOIN Optimizations (v1.12+)

The query engine automatically applies two optimizations to cross-collection JOINs:

Filter Pushdown: WHERE conditions that reference the joined table (using qualified column names like inventory.price > 100) are automatically pushed down and evaluated before loading points into the ColumnStore. This reduces memory usage and I/O for large joined collections.

-- inventory.stock > 0 is pushed down to filter inventory before JOIN
SELECT * FROM products
JOIN inventory ON products.id = inventory.id
WHERE inventory.stock > 0

Lookup Join: When the JOIN condition references the primary key (id) on both sides and no pushdown filters apply, the engine uses direct collection.get() lookups instead of building a full ColumnStore, achieving O(K) instead of O(N) retrieval.

-- Uses optimized lookup path (both sides reference id)
SELECT * FROM products
JOIN inventory ON products.id = inventory.id

Both optimizations are transparent and produce identical results to the non-optimized path.


WHERE Clause

The WHERE clause filters rows using conditions. Multiple conditions are combined with AND, OR, and NOT.

Comparison Operators

OperatorDescriptionExample
=Equalcategory = 'tech'
!= or <>Not equalstatus != 'deleted'
>Greater thanprice > 100
>=Greater or equalscore >= 0.8
<Less thancount < 50
<=Less or equalrating <= 5
SELECT * FROM products WHERE price > 100 AND category = 'electronics'
SELECT * FROM users WHERE age >= 18 AND age <= 65
SELECT * FROM docs WHERE status <> 'deleted'

Logical Operators (AND, OR, NOT)

PriorityOperatorDescription
1 (highest)NOTNegation
2ANDConjunction
3 (lowest)ORDisjunction
-- AND
SELECT * FROM docs WHERE category = 'tech' AND price > 50

-- OR
SELECT * FROM docs WHERE category = 'tech' OR category = 'science'

-- Parentheses to override precedence
SELECT * FROM docs WHERE (category = 'tech' OR category = 'ai') AND price > 100

NOT Expression

The NOT operator negates any condition, including comparisons, IN, BETWEEN, and parenthesized groups.

-- Negate a comparison
SELECT * FROM docs WHERE NOT (price > 100)

-- Negate an IN list
SELECT * FROM docs WHERE NOT (category IN ('draft', 'archived'))

-- Negate a compound condition
SELECT * FROM docs WHERE NOT (status = 'deleted' AND archived = true)

-- NOT with vector search filter
SELECT * FROM docs
WHERE vector NEAR $v AND NOT (category = 'spam')
LIMIT 10

IN / NOT IN

Test membership in a list of values:

-- String values
SELECT * FROM docs WHERE category IN ('tech', 'science', 'ai')

-- Integer values
SELECT * FROM docs WHERE id IN (1, 2, 3, 4, 5)

-- NOT IN
SELECT * FROM docs WHERE category NOT IN ('draft', 'deleted')
SELECT * FROM docs WHERE id NOT IN (1, 2, 3)

Index acceleration: When a secondary index exists on the filtered column, IN uses bitmap pre-filtering to restrict HNSW traversal to matching points only. The engine builds a RoaringBitmap by unioning per-value B-tree lookups (time complexity: O(N × log K) where N = list size, K = index cardinality). NOT IN on indexed fields computes the universe bitmap minus the IN bitmap (set subtraction). Queries on non-indexed fields fall back to post-filtering transparently. Results are identical in both paths.

BETWEEN

Test if a value falls within a range (inclusive):

SELECT * FROM docs WHERE price BETWEEN 10 AND 100
SELECT * FROM docs WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
SELECT * FROM products WHERE rating BETWEEN 3.5 AND 5.0

LIKE (Pattern Matching)

Case-sensitive pattern matching with wildcards:

WildcardMeaning
%Any sequence of characters (including empty)
_Exactly one character
-- Suffix match
SELECT * FROM docs WHERE title LIKE '%database%'

-- Prefix match
SELECT * FROM docs WHERE name LIKE 'vec%'

-- Single character wildcard
SELECT * FROM docs WHERE code LIKE 'v_les'

-- Combined with other filters
SELECT * FROM docs WHERE name LIKE 'John%' AND active = true

ILIKE (Case-Insensitive Pattern Matching)

ILIKE works identically to LIKE but performs case-insensitive matching. The same % and _ wildcards apply.

-- Matches 'Database', 'DATABASE', 'database', etc.
SELECT * FROM docs WHERE title ILIKE '%database%'

-- Case-insensitive prefix match
SELECT * FROM docs WHERE name ILIKE 'vec%'

-- Single-character wildcard (case-insensitive)
SELECT * FROM docs WHERE code ILIKE 'v_les'

-- Combined with other filters
SELECT * FROM docs
WHERE category ILIKE '%science%' AND price > 50
LIMIT 20
OperatorCase-SensitiveExample Match for 'hello'
LIKEYesLIKE 'hello' matches hello only
ILIKENoILIKE 'hello' matches Hello, HELLO, hello

Unicode behavior: case folding follows Rust's default Unicode lowercase rules. Characters without a defined case mapping (e.g., CJK ideographs) are compared as-is.

IS NULL / IS NOT NULL

Test for null values:

SELECT * FROM docs WHERE category IS NULL
SELECT * FROM docs WHERE category IS NOT NULL
SELECT * FROM users WHERE email IS NOT NULL AND verified = true

CONTAINS (Array Filter, v3.7+)

Test whether an array field contains specific values. Three modes are supported:

SyntaxDescription
column CONTAINS valueArray contains the single value
column CONTAINS ANY (v1, v2, ...)Array contains at least one of the values
column CONTAINS ALL (v1, v2, ...)Array contains every listed value
-- Single value containment
SELECT * FROM hotels WHERE amenities CONTAINS 'pool' LIMIT 10

-- At least one match (OR semantics)
SELECT * FROM hotels WHERE amenities CONTAINS ANY ('spa', 'bar') LIMIT 10

-- All values must be present (AND semantics)
SELECT * FROM hotels WHERE amenities CONTAINS ALL ('pool', 'gym') LIMIT 10

-- Combined with scalar filters
SELECT * FROM hotels
WHERE amenities CONTAINS 'pool' AND rating > 4.5
LIMIT 10

-- Combined with vector search
SELECT * FROM hotels
WHERE amenities CONTAINS 'spa' OR vector NEAR $v
LIMIT 10

-- Integer arrays
SELECT * FROM items WHERE scores CONTAINS 10 LIMIT 10

-- NOT negation
SELECT * FROM hotels WHERE NOT (amenities CONTAINS 'pool') LIMIT 10

-- Multiple array fields
SELECT * FROM hotels
WHERE amenities CONTAINS 'gym' AND tags CONTAINS 'luxury'
LIMIT 10

Behavior on edge cases:

  • Empty array ([]): CONTAINS returns no match
  • Null/missing field: row excluded from results
  • Non-array field: CONTAINS returns no match (use LIKE for substring matching)
  • Non-existent field: returns empty result set

Full-Text Search (MATCH)

The MATCH operator performs BM25 full-text search against the collection's text index.

MATCH alone -- returns BM25 text search results ranked by text relevance:

SELECT * FROM docs WHERE content MATCH 'vector database' LIMIT 10

MATCH + NEAR (hybrid search) -- triggers Reciprocal Rank Fusion (RRF) between vector similarity and BM25 text relevance. In this mode, MATCH acts as a score boost, not a strict filter. Results that do not contain the keyword may still appear with lower fused scores:

-- Hybrid: vector similarity fused with BM25 text relevance via RRF
SELECT * FROM docs WHERE vector NEAR $v AND content MATCH 'database' LIMIT 10

Multi-condition hybrid search:

SELECT * FROM docs
WHERE vector NEAR $v AND title MATCH 'vector database' AND category = 'tech'
LIMIT 10

Known Limitations (v1.9.0):

  • Column parameter ignored: The column name (e.g., content) is parsed but the execution engine searches all indexed text fields regardless.

Tip: For strict text filtering (exclude results that do not contain a keyword), use CONTAINS_TEXT instead of MATCH. See the CONTAINS_TEXT section below.

Strict Text Filter (CONTAINS_TEXT, v3.8+)

The CONTAINS_TEXT operator performs case-sensitive substring matching on a string payload field. Unlike MATCH (which boosts via RRF), CONTAINS_TEXT is a strict filter: any result whose target field does not contain the specified substring is excluded from the result set.

column CONTAINS_TEXT 'substring'

The keyword is case-insensitive (contains_text, Contains_Text, CONTAINS_TEXT are all valid). The substring match itself is case-sensitive.

Standalone metadata filter — returns all documents where the field contains the substring:

SELECT * FROM docs WHERE content CONTAINS_TEXT 'rust' LIMIT 10

With vector search (hybrid strict filter) — applies as a post-filter on vector search results, guaranteeing every returned result contains the substring:

-- Every result is guaranteed to contain 'database' in the content field
SELECT * FROM docs
WHERE vector NEAR $v AND content CONTAINS_TEXT 'database'
LIMIT 10

Combined with MATCH (boost + strict filter)MATCH provides RRF score boosting while CONTAINS_TEXT enforces strict inclusion:

-- MATCH boosts results mentioning 'database' via RRF
-- CONTAINS_TEXT guarantees every result actually contains 'database'
SELECT * FROM docs
WHERE vector NEAR $v
  AND content MATCH 'database'
  AND content CONTAINS_TEXT 'database'
LIMIT 10

Combined with other filters:

SELECT * FROM docs
WHERE content CONTAINS_TEXT 'rust' AND category = 'tech'
LIMIT 10

MATCH vs CONTAINS_TEXT

FeatureMATCHCONTAINS_TEXT
PurposeBM25 text search / RRF boostStrict substring filter
With NEARBoosts score via RRF fusionExcludes non-matching results
Non-matching resultsMay still appear (lower score)Always excluded
MatchingBM25 tokenized relevanceCase-sensitive str::contains()
Best for"Rank by text relevance""Must contain this text"

Edge Cases

ScenarioBehavior
CONTAINS_TEXT '' (empty string)Matches all string fields (every string contains "")
Missing payload fieldReturns false (row excluded)
Non-string field (e.g., integer)Returns false (row excluded)
All results filtered outEmpty result set, no error
Unicode textSupported (case-sensitive byte-level matching)

Vector Search (NEAR)

Use NEAR for approximate nearest neighbor search on dense vectors:

-- With parameter placeholder
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10

-- With literal vector
SELECT * FROM docs WHERE vector NEAR [0.1, 0.2, 0.3, 0.4] LIMIT 5

-- With metadata filter
SELECT * FROM docs
WHERE vector NEAR $v AND category = 'tech' AND price > 50
LIMIT 10

-- With similarity score projection and ordering
SELECT id, title, similarity() AS score
FROM docs WHERE vector NEAR $query
ORDER BY similarity() DESC
LIMIT 5

-- With search quality tuning
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'accurate', ef_search = 256)

Sparse Vector Search (SPARSE_NEAR, v2.2+)

Use SPARSE_NEAR for sparse vector similarity search (SPLADE, BM42, or custom sparse embeddings).

-- With parameter placeholder
SELECT * FROM docs WHERE vector SPARSE_NEAR $sv LIMIT 10

-- With inline sparse vector literal (dimension_index: weight)
SELECT * FROM docs
WHERE vector SPARSE_NEAR {0: 0.9, 14: 0.7, 256: 0.5, 1024: 0.3}
LIMIT 10

-- With named sparse index
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse USING 'bm42_index'
LIMIT 10

-- Combined dense + sparse (hybrid)
SELECT * FROM docs
WHERE vector NEAR $dense AND vector SPARSE_NEAR $sparse
LIMIT 10

Sparse vectors use {index: weight} format where index is a non-negative integer (dimension) and weight is a float value. Scoring uses inner product with MaxScore DAAT for efficient top-K retrieval.

Multi-Vector Fusion (NEAR_FUSED, v2.2+)

NEAR_FUSED combines multiple embedding vectors into a single similarity search using a fusion strategy (multi-modal or ensemble search). It is executed by routing to the engine's multi-vector fusion (multi_query_search): each query vector runs its own search and the rankings are fused. USING FUSION 'rrf' (default, k=60), 'average', and 'maximum' are honored; other strategies fall back to RRF. An optional AND <metadata> predicate is applied as a pre-fusion filter.

NEAR_FUSED must be the only vector predicate in the WHERE: combining it with OR/NOT, or with another NEAR / similarity() / SPARSE_NEAR, is rejected (it can only be AND-ed with a metadata filter).

-- Two-vector fusion with RRF
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_emb, $image_emb]
USING FUSION 'rrf' (k=60)
LIMIT 20

-- Three-vector ensemble
SELECT * FROM docs
WHERE vector NEAR_FUSED [$query_v1, $query_v2, $query_v3]
USING FUSION 'maximum'
LIMIT 10

-- Without explicit fusion (defaults to RRF)
SELECT * FROM docs
WHERE vector NEAR_FUSED [$v1, $v2]
LIMIT 5

The vector array [$v1, $v2, ...] accepts parameters ($name) or vector literals ([0.1, 0.2, ...]).

StrategyBest ForParameters
rrfGeneral-purpose ensemble (default)k (default: 60)
averageBalanced score blending(none)
maximumConservative high-precision(none)

Only rrf, average, and maximum are honored. Any other strategy name (e.g. rsf, weighted) is rejected at validation — per-branch dense/sparse weights are ill-defined over N homogeneous query vectors.

Similarity Function (v1.3+)

The similarity() function enables threshold-based vector filtering -- filter results by similarity score rather than just finding nearest neighbors.

Syntax

similarity(field, vector_expr) <operator> threshold
ParameterDescription
fieldThe vector field name (e.g., vector, embedding)
vector_exprA parameter ($v) or literal vector ([0.1, 0.2, ...])
operatorComparison: >, >=, <, <=, =
thresholdSimilarity score (0.0 to 1.0 for cosine/dot)

Examples

-- Find documents with similarity > 0.8
SELECT * FROM docs WHERE similarity(vector, $query) > 0.8

-- High precision filtering (>= 0.9)
SELECT * FROM docs WHERE similarity(embedding, $v) >= 0.9 LIMIT 10

-- Exclude very similar documents (deduplication)
SELECT * FROM docs WHERE similarity(vector, $ref) < 0.95

-- Combined with metadata filters
SELECT * FROM docs
WHERE similarity(vector, $q) > 0.7 AND category = 'technology'
LIMIT 20

NEAR vs similarity()

FeatureNEARsimilarity()
PurposeFind K nearest neighborsFilter by score threshold
ReturnsTop-K resultsAll matching results
ControlLIMIT NThreshold value
Best for"Find similar""Filter by quality"
-- NEAR: "Give me 10 most similar docs"
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10

-- similarity(): "Give me docs with similarity > 0.8"
SELECT * FROM docs WHERE similarity(vector, $v) > 0.8

Temporal Functions (v2.1+)

VelesQL supports temporal expressions for date/time filtering using NOW() and INTERVAL.

NOW() Function

Returns the current Unix timestamp (seconds since epoch):

SELECT * FROM events WHERE timestamp > NOW()
SELECT * FROM items WHERE created_at > NOW()

INTERVAL Expression

Defines a time duration with the syntax INTERVAL '<magnitude> <unit>':

UnitAliasesSeconds
secondss, sec, second1
minutesm, min, minute60
hoursh, hour3,600
daysd, day86,400
weeksw, week604,800
monthsmonth~2,592,000 (30 days)

Temporal Arithmetic

Combine NOW() with INTERVAL using + or -:

-- Last 7 days
SELECT * FROM logs WHERE created_at > NOW() - INTERVAL '7 days'

-- Last hour
SELECT * FROM events WHERE timestamp > NOW() - INTERVAL '1 hour'

-- Next week (future events)
SELECT * FROM tasks WHERE due_date < NOW() + INTERVAL '7 days'

-- Shorthand units
SELECT * FROM metrics WHERE ts > NOW() - INTERVAL '30 min'

-- Filter items older than 30 days
SELECT * FROM logs WHERE created_at < NOW() - INTERVAL '30 days'

Common Temporal Patterns

Use CaseQuery
Last 24 hoursWHERE ts > NOW() - INTERVAL '24 hours'
This weekWHERE ts > NOW() - INTERVAL '7 days'
Last monthWHERE ts > NOW() - INTERVAL '1 month'
Recent activityWHERE last_seen > NOW() - INTERVAL '5 minutes'

Note: NOW() returns a Unix timestamp in seconds (timezone-agnostic). Month intervals are approximated as 2,592,000 seconds (30 days).

Scalar Subqueries (v3.2+)

A scalar subquery in WHERE/HAVING (or an INSERT/UPDATE value) is executed and substituted as a literal before the outer query runs (EPIC-039). It compares against a value computed from another (or the same) collection, and must return exactly one row and one column.

-- Executes: the subquery resolves to AVG(amount), then the outer filter runs.
SELECT * FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders)

A subquery is enclosed in parentheses and contains a full SELECT statement (with optional WHERE, GROUP BY, HAVING, LIMIT). It runs through the core query engine, so the same query executes identically on the REST /query endpoint and the CLI REPL (both route through Database::execute_query).

Cardinality contract:

Subquery resultBehavior
Exactly one row, one columnresolves to that scalar value
0 rowsresolves to NULL (a comparison against NULL is never true, so the outer query returns no matching rows)
> 1 rowerror VELES-010 (Query): returned N rows but must return at most one row
> 1 column (e.g. SELECT *, multiple columns)error VELES-010 (Query): must select exactly one column

Supported forms:

  • Aggregate subqueries — (SELECT AVG(amount) FROM t), COUNT(*), MIN/MAX/SUM.
  • Single-column row projection — (SELECT amount FROM t WHERE id = 1).
  • WHERE, HAVING, BETWEEN, IN (...), CONTAINS (...) value positions.
  • INSERT ... VALUES (..., (SELECT MAX(x) FROM t)) and UPDATE ... SET c = (SELECT ...).
  • UPDATE ... WHERE and DELETE ... WHERE predicates — the subquery is resolved to a literal before the mutation runs, so UPDATE t SET flagged = true WHERE amount > (SELECT AVG(amount) FROM t) updates exactly the matching rows, and DELETE FROM t WHERE id = (SELECT … ) deletes the resolved id (DELETE … WHERE id still requires the resolved value to be an integer).
  • A subquery whose inner WHERE filters on a payload path — e.g. (SELECT AVG(price) FROM t WHERE meta.cat = 5) — is a plain payload filter on the subquery's own collection, not a correlation, so it executes normally.
  • Nesting up to 8 levels deep (a deeper nest errors with VELES-010).

Not yet supported (rejected at validation with V010):

  • Correlated subqueries — a subquery whose inner WHERE references one of the outer query's tables/aliases (e.g. with the outer query ... FROM orders ..., the inner (SELECT AVG(x) FROM s WHERE orders.id = 5)). A dotted reference is only correlated when its prefix names an outer table/alias; a payload path such as meta.cat is not. Rewrite without the outer reference.

Surface note. Scalar subqueries are resolved in velesdb-core, so the REST /query endpoint and the CLI REPL support them. WASM uses a separate executor that does not yet resolve subqueries and still rejects them.

Graph Match Predicate in WHERE

Embed a graph pattern inside a WHERE clause using the MATCH keyword:

SELECT * FROM docs
WHERE category = 'tech' AND MATCH (d:Doc)-[:REL]->(x)
LIMIT 10

This filters rows that participate in the specified graph pattern, combined with any other scalar conditions.

Anchor rule (V011) — explicit and implicit. The first node of the pattern (the anchor) must carry an alias. The pattern binds to the FROM rows:

  • Explicitly — when the anchor alias is one of the aliases declared by FROM or a JOIN, the pattern is anchored on the rows of that table.
  • Implicitly — when no alias in the pattern matches a declared alias, the leftmost node binds to the FROM rows. Three guards apply:
    • G1: if a pattern alias does match a declared alias, the anchor must be that alias (catches inverted pattern directions);
    • G2: the implicit anchor alias must not appear in another MATCH predicate of the same WHERE clause — chain into a single pattern instead, e.g. MATCH (m)-[:R]->(f)-[:S]->(g);
    • G3: the anchor node must not carry a @collection override (it would resolve outside the FROM collection).
  • When FROM has no alias, any anchor alias is accepted.

NOT MATCH applies the same rule uniformly (exact dual of the positive case).

Note. Implicit binding is not Cypher's existential MATCH semantics: the leftmost node always binds to the candidate row. MATCH (ctx)-[:R]->(f) keeps the rows that have an outgoing R edge — it does not test whether the pattern exists anywhere in the graph.

Violations are rejected at validation time with error code V011 (MATCH predicate anchor must be an alias declared in FROM/JOIN):

-- OK: anchor 'd' matches the FROM alias (explicit)
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (d)-[:REL]->(x)
LIMIT 10

-- OK: no pattern alias matches 'd' — 'ctx' binds implicitly to the docs rows
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (ctx)-[:REL]->(x)
LIMIT 10

-- Rejected with V011 (G1): the FROM alias 'd' is in a non-anchor position;
-- anchor the pattern on it instead, e.g. MATCH (d)-[:REL]->(w)
SELECT * FROM docs AS d
WHERE category = 'tech' AND MATCH (w)-[:REL]->(d)
LIMIT 10

Execution note (GraphFirst anchored fetch). When every MATCH (...) predicate is AND-required by the WHERE clause (not wrapped in OR/NOT), the engine evaluates the graph patterns FIRST and fetches within their anchor sets — retrieval is then exhaustive: a matching row is returned no matter how it ranks globally. This covers vector NEAR (anchor sets up to 10 000 ids are scored exactly; larger ones go through the bitmap-filtered HNSW path), metadata-only fetches (the anchors are hydrated directly), sparse-only fetches (anchors feed the sparse index's per-id filter), and NOT similarity() scans (restricted to the anchors).

Execution note (over-fetch window, residual shapes). Query shapes that cannot use the anchored fetch keep the windowed execution: graph predicates under OR/NOT, combinations with a similarity() threshold cascade, BM25 text MATCH fusion, or hybrid dense+sparse fusion. There, a ranked fetch retrieves up to max(LIMIT, min(LIMIT × 10, 10 000)) candidates and keeps those that satisfy the pattern (rows ranked beyond the window are not returned — increase LIMIT to widen it); unranked shapes scan up to 100 000 candidates in storage order.

Vector Search with Filters

Combine vector search with any number of metadata filters:

SELECT * FROM docs
WHERE vector NEAR $v
  AND category = 'tech'
  AND price > 50
  AND status IS NOT NULL
  AND tags IN ('ai', 'ml')
LIMIT 10

GROUP BY Clause (v2.0+)

Group results by one or more columns and apply aggregate functions.

-- Count by category
SELECT category, COUNT(*) FROM products GROUP BY category

-- Multiple aggregates
SELECT category, COUNT(*), AVG(price), MAX(rating)
FROM products
GROUP BY category

-- Nested payload fields
SELECT payload.author, payload.metadata.language, COUNT(*)
FROM articles
GROUP BY payload.author, payload.metadata.language

-- With vector similarity threshold
SELECT category, COUNT(*)
FROM documents
WHERE similarity(embedding, $query) > 0.6
GROUP BY category
ORDER BY COUNT(*) DESC
LIMIT 10

Vector-Search GROUP BY (Parent-Document Retrieval, v3.7+)

When combined with a vector NEAR search, GROUP BY enables parent-document retrieval from chunked collections. This groups search results by a parent field and aggregates similarity scores across chunks.

Use MAX(score) for ColBERT-style MaxSim (max score across chunks), AVG(score) for mean similarity, and FIRST(column) to surface the excerpt from the highest-scoring chunk.

-- Parent-document retrieval with MAX_SIM scoring
SELECT parent_id, MAX(score) AS relevance, FIRST(text) AS excerpt
FROM chunks
WHERE vector NEAR $embedding
GROUP BY parent_id
ORDER BY relevance DESC
LIMIT 20

-- AVG_SIM scoring (mean across chunks)
SELECT document_id, AVG(score) AS avg_sim
FROM paragraphs
WHERE vector NEAR $query
GROUP BY document_id
ORDER BY avg_sim DESC
LIMIT 10

-- Combined with metadata filter
SELECT parent_id, MAX(score) AS relevance, FIRST(text) AS excerpt
FROM chunks
WHERE vector NEAR $v AND category = 'science'
GROUP BY parent_id
ORDER BY relevance DESC
LIMIT 10
FunctionDescriptionRequires
MAX(score)Maximum similarity score across chunks in groupNEAR + GROUP BY
AVG(score)Mean similarity score across chunks in groupNEAR + GROUP BY
FIRST(col)Value of col from the highest-scoring chunkGROUP BY

Behavior:

  • Chunks missing the GROUP BY field are silently skipped
  • FIRST(col) returns null if the best chunk lacks the specified column
  • Multiple FIRST projections all come from the same highest-scoring chunk
  • FIRST(*) is not supported (parse error)
  • MAX(score) / AVG(score) without NEAR returns an error
  • FIRST(col) without GROUP BY returns an error

HAVING Clause (v2.0+)

Filter groups after aggregation. HAVING conditions compare aggregate functions against values using AND/OR.

-- Single condition
SELECT category, COUNT(*) FROM docs
GROUP BY category
HAVING COUNT(*) > 10

-- Multiple conditions
SELECT category, COUNT(*) FROM items
GROUP BY category
HAVING COUNT(*) > 10 AND AVG(price) < 100

-- OR logic
SELECT region, SUM(amount) FROM sales
GROUP BY region
HAVING SUM(amount) > 1000 OR COUNT(*) > 50

-- Full pipeline: GROUP BY + HAVING + ORDER BY
SELECT category, COUNT(*) AS total, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY AVG(price) DESC

Window Functions (v1.13.0+)

Window functions compute a value for each row based on a "window" of related rows, without collapsing the result set the way GROUP BY does. VelesDB supports three ranking functions over a PARTITION BY / ORDER BY window.

Syntax

<function>() OVER ([PARTITION BY <expr> [, <expr> ...]] [ORDER BY <expr> [ASC|DESC] [, <expr> ...]])

Place the window call in the SELECT list, optionally with an alias.

Supported Functions

FunctionDescription
ROW_NUMBER()Sequential position within the window (1, 2, 3, ...). Ties are broken by input order.
RANK()Rank with gaps after ties. Tied rows share a rank; the next distinct row skips ahead (e.g. 1, 2, 2, 4).
DENSE_RANK()Rank without gaps (e.g. 1, 2, 2, 3).

All three functions take no arguments; the OVER (...) clause is mandatory.

Semantics

  • PARTITION BY (optional) groups rows into independent windows. Each partition restarts the row counter / rank at 1.
  • ORDER BY (optional inside OVER) defines the ordering within each partition. Without an ORDER BY, ranks are assigned in the order rows arrive at the evaluator — deterministic for a given input but not meaningful.
  • Stability: when two rows tie on the ORDER BY key, relative input order is preserved (stable ordering).

Evaluation Order

Window functions are evaluated after WHERE, GROUP BY, HAVING, and DISTINCT, and before the outer ORDER BY / LIMIT / OFFSET. Each window call pre-captures its inputs and emits one output value per input row; the outer ORDER BY then sees those values as ordinary projected columns.

Grammar: grammar.pest:window_item, grammar.pest:over_clause. Evaluator: crates/velesdb-core/src/velesql/window_evaluator.rs.

Examples

-- Rank products by price within each category
SELECT
  category,
  name,
  price,
  ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank
FROM products

Expected output (shape, not literal rows):

category   name             price   rank
tech       laptop-pro       1999    1
tech       laptop-mid       999     2
tech       laptop-entry     499     3
books      encyclopedia     120     1
books      paperback        12      2
-- DENSE_RANK on similarity scores inside each department
SELECT
  payload.department AS dept,
  payload.title,
  similarity() AS score,
  DENSE_RANK() OVER (PARTITION BY payload.department ORDER BY similarity() DESC) AS r
FROM docs
WHERE vector NEAR $query
LIMIT 100
-- Global row numbering, no partition
SELECT
  id,
  payload.title,
  ROW_NUMBER() OVER (ORDER BY payload.created_at DESC) AS seq
FROM posts
LIMIT 50

Limitations

  • Framing clauses (ROWS BETWEEN ... AND ..., RANGE, GROUPS) are not supported. All three functions operate on the full partition.
  • Aggregate windows (SUM() OVER, AVG() OVER, LAG, LEAD, NTILE, PERCENT_RANK, CUME_DIST, FIRST_VALUE, LAST_VALUE) are not yet supported.
  • A single query may contain multiple window calls, each with its own OVER (...) clause.

ORDER BY Clause (v2.0+)

Sort results by one or more expressions, each with an optional direction.

Direction

KeywordDescription
ASCAscending (default)
DESCDescending

Basic Sorting

SELECT * FROM docs ORDER BY created_at DESC
SELECT * FROM docs ORDER BY category ASC, price DESC

Order by Similarity

-- Highest similarity first
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() DESC
LIMIT 10

-- Two-argument similarity
SELECT * FROM docs
WHERE similarity(embedding, $query) > 0.5
ORDER BY similarity(embedding, $query) DESC
LIMIT 10

-- Multi-column with similarity
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() DESC, created_at DESC
LIMIT 20

CBO routing (v1.13.0+): When ORDER BY contains similarity(<col>, $q) (two-argument form) against a vector column with an HNSW index, the cost-based optimiser (CBO) automatically routes the query through the native HNSW path — even if the query does not contain an explicit NEAR clause. The planner rewrites the sort into a top-k vector search and short-circuits when a LIMIT is present. The legacy full-scan-then-sort path remains available and is chosen by the CBO when the top-k target is large relative to the collection size or when no HNSW index exists for the column. See crates/velesdb-core/src/velesql/planner/similarity_route.rs and docs/guides/TUNING_GUIDE.md for the routing heuristics.

Order by Aggregate

SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
ORDER BY COUNT(*) DESC
LIMIT 10

Arithmetic Scoring (v3.0+)

Combine multiple score components with arithmetic expressions in ORDER BY:

-- Weighted hybrid scoring: 70% vector + 30% text relevance
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'machine learning'
ORDER BY 0.7 * vector_score + 0.3 * bm25_score DESC
LIMIT 10

-- Average of two scores
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY (vector_score + graph_score) / 2 DESC
LIMIT 10

-- Boost similarity
SELECT * FROM docs
WHERE vector NEAR $v
ORDER BY similarity() * 2 + 1 ASC
LIMIT 10

-- Hybrid: vector + BM25 + graph
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'AI'
ORDER BY 0.5 * similarity() + 0.5 * bm25_score DESC
LIMIT 10

Arithmetic expressions support +, -, *, /, parentheses, numeric literals, score variables, and similarity().

Score Variables

VariableSourceWhen Populated
vector_scoreHNSW dense searchNEAR clause present
bm25_scoreBM25 full-text searchMATCH text clause present
sparse_scoreSparse vector searchSPARSE_NEAR clause present
graph_scoreGraph traversalMATCH graph pattern present
fused_scoreAfter fusion pipelineUSING FUSION applied
similarity()Primary search scoreAny search clause

Component scores are populated independently by each search path. In hybrid queries (NEAR + MATCH text), vector_score and bm25_score have different values reflecting each component's individual contribution before fusion.

Query Typevector_scorebm25_scoresparse_scoregraph_score
NEAR onlyPopulated000
MATCH text only0Populated00
NEAR + MATCHPopulatedPopulated00
SPARSE_NEAR00Populated0
MATCH (graph)000Populated

When a score variable is not populated, it defaults to 0. similarity() always returns the primary search score regardless of type. fused_score is populated after USING FUSION is applied.


LIMIT and OFFSET

-- Limit results
SELECT * FROM docs LIMIT 10

-- Pagination: skip first 20, return next 10
SELECT * FROM docs LIMIT 10 OFFSET 20

Default LIMIT. Unlike standard SQL, VelesQL applies LIMIT 10 by default to every SELECT statement that has no explicit LIMIT clause — vector NEAR, sparse, scalar-filter, and hybrid forms alike. VelesQL is ANN-first: a SELECT is a top-k retrieval, so an unbounded default would defeat top-k vector search. EXPLAIN surfaces the implicit limit as Limit: 10 (default).

Exceptions (no implicit LIMIT 10):

  • MATCH ... RETURN graph queries return all matching rows, bounded only by the server-wide ceiling of 100 000 rows per query;
  • compound queries (UNION / INTERSECT / EXCEPT) evaluate their operands up to the same 100 000-row ceiling; only an explicit outer LIMIT caps the merged result.

Always specify LIMIT explicitly — both to bound vector search and for exhaustive retrieval beyond the default 10 rows.


LET Clause (Score Bindings, v3.2+)

Define named score bindings evaluated once and reusable in SELECT and ORDER BY. LET clauses appear before the SELECT statement.

Syntax

LET <name> = <arithmetic_expression>

Rules

  • LET clauses appear before the SELECT statement.
  • Each binding has a name and an arithmetic expression.
  • Bindings can reference earlier bindings (forward references are invalid).
  • Available in ORDER BY and SELECT projection.
  • NOT available in WHERE (WHERE runs before scores exist).
  • LET names take highest priority in variable resolution (overrides component scores).
  • Case-insensitive keyword (LET, let, Let all work).

Unsupported query shapes

LET bindings are evaluated in the SELECT finalization stage. Query shapes that take a dedicated early-return execution path bypass that stage, so the engine rejects (rather than silently dropping) a LET clause on any of these:

ShapeTriggerError
Graph patternMATCH (...) RETURN ...LET bindings are not supported with MATCH queries in this version
Sparse vectorWHERE vector SPARSE_NEAR ...LET bindings are not supported with SPARSE_NEAR queries in this version
Multi-vector fusionWHERE vector NEAR_FUSED ...LET bindings are not supported with NEAR_FUSED queries in this version
Negated similarityWHERE NOT similarity(...) ...LET bindings are not supported with NOT similarity() queries in this version
OR / unionWHERE similarity(...) OR ...LET bindings are not supported with OR/union queries in this version

LET is supported on dense NEAR, text MATCH '...', hybrid NEAR + MATCH text, and scalar-filter SELECTs.

Ordering by a built-in score on the excluded shapes

These shapes reject only the LET ... = ... clause, but ORDER BY still runs in their own finalization. A bare built-in score variable in ORDER BY (vector_score, bm25_score, sparse_score, graph_score, fused_score) ranks by that component score, resolved from the result's component breakdown — identical to the arithmetic form. (An absent component defaults to 0 on a tagged result, per the score-variable rules above.)

-- Ranks by sparse score (bare and arithmetic forms are equivalent):
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse
ORDER BY sparse_score DESC
LIMIT 5

Expression Support

LET expressions support the same arithmetic as ORDER BY:

LET boost = 1.5
LET weighted = 0.7 * vector_score + 0.3 * bm25_score
LET final_score = weighted * boost
LET sim = similarity()

Available variables: vector_score, bm25_score, graph_score, sparse_score, fused_score, similarity(), numeric literals, and earlier LET bindings.

Examples

-- RAG scoring: blend vector similarity with text relevance
-- (the evaluated binding 'relevance' is injected into each row's payload;
--  note that `SELECT *, expr` is not valid syntax — use `*` or an explicit list)
LET relevance = 0.7 * vector_score + 0.3 * bm25_score
SELECT *
FROM documents
WHERE vector NEAR $query AND content MATCH 'machine learning'
ORDER BY relevance DESC
LIMIT 10

-- Chained bindings: later bindings can reference earlier ones
LET base = 0.6 * vector_score + 0.4 * bm25_score
LET boosted = base * 1.5
SELECT * FROM docs
WHERE vector NEAR $query AND content MATCH 'AI'
ORDER BY boosted DESC
LIMIT 5

-- A LET clause on a MATCH graph query is REJECTED (see "Unsupported query
-- shapes" above). MATCH runs on a dedicated traversal path that bypasses
-- LET evaluation, so the engine returns an explicit error rather than
-- silently discarding the binding:
--   LET x = similarity() MATCH (a)-[r]->(b) RETURN a LIMIT 5
--   -> Error: LET bindings are not supported with MATCH queries in this version
-- Rank graph rows with RETURN ... ORDER BY instead.

-- Sparse vector queries also reject LET, but `ORDER BY` runs: rank by the
-- built-in sparse_score variable directly (the bare form resolves the component
-- score, just like the arithmetic form `ORDER BY sparse_score * 1.0 DESC`):
SELECT * FROM docs
WHERE vector SPARSE_NEAR $sparse
ORDER BY sparse_score DESC
LIMIT 5

WITH Clause (Search Options)

Control search behavior with per-query configuration overrides.

SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'accurate', ef_search = 512, timeout_ms = 5000)

Available Options

OptionTypeValuesDescription
modestringfast, balanced, accurate, perfect, autotuneSearch quality preset (maps to ef_search: 64/128/512/4096/auto)
qualitystringsame as modeAlias for mode (v3.5+). If both are set, mode takes precedence.
ef_searchinteger16--4096HNSW ef_search parameter (overrides mode)
timeout_msinteger>= 100Per-query timeout in milliseconds
rerankbooleantrue/falseTwo-stage SIMD reranking (retrieves 4x candidates, re-ranks with exact distance)
quantizationstringf32, int8, dual, autoQuantization mode for search
oversamplingfloat>= 1.0Oversampling ratio for dual-precision mode
max_groups (alias group_limit)integer1 .. 1,000,000GROUP BY group budget. Lowers the default (10,000); clamped down to the server ceiling of 1,000,000 — cannot raise it. See GROUP BY.

Query guard-rails. Two hard limits protect the server from adversarial queries. (1) A query is rejected before parsing if its length exceeds the configured max_query_length or its bracket/NOT nesting depth exceeds 64 (prevents a parser stack-overflow DoS). (2) WITH (max_groups = N) for GROUP BY is clamped to a server-side ceiling of 1,000,000 groups; the query can lower its budget but never raise the server memory ceiling.

Examples

-- Fast search for autocomplete
SELECT * FROM suggestions WHERE vector NEAR $v LIMIT 5 WITH (mode = 'fast')

-- High accuracy for production
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (mode = 'accurate')

-- Custom ef_search
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (ef_search = 512)

-- Combined options
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (mode = 'balanced', ef_search = 256, rerank = true)

-- Dual quantization with oversampling
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10
WITH (quantization = 'dual', oversampling = 2)

-- quality alias for mode (v3.5+)
SELECT * FROM docs WHERE vector NEAR $v LIMIT 10 WITH (quality = 'accurate')

USING FUSION -- Hybrid Search (v2.0+)

Combine multiple search strategies with result fusion.

Syntax

SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'query'
USING FUSION(strategy = 'rrf', k = 60)

Fusion Strategies

StrategyDescriptionParametersUse Case
rrfReciprocal Rank Fusionk (default: 60)Balanced ranking (default)
weightedWeighted combinationvector+text: vector_weight, graph_weight; dense+sparse: dense_weight, sparse_weightCustom importance
maximumTake highest score(none)Best match wins
rsfRelative Score Fusiondense_weight, sparse_weight (must sum to 1.0)Dense + sparse blending

USING FUSION requires at least two fusable branches (e.g. vector NEAR

  • MATCH, or vector NEAR + vector SPARSE_NEAR) or a single NEAR_FUSED predicate. Applied to a single-branch query it is rejected at validation with error code V012 (FusionMisconfigured). The same code covers RSF weights that do not sum to 1.0, negative weights, and weighted/rsf on a NEAR_FUSED predicate. dense_w/sparse_w are accepted as short aliases of dense_weight/sparse_weight. Unknown strategy names and unknown option keys are rejected (no silent RRF fallback).

Examples

USING FUSION(...) is a trailing clause: it must come after LIMIT (and OFFSET), not before it.

-- Default RRF fusion
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'neural networks'
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)

-- Weighted fusion (70% vector, 30% graph/text)
SELECT * FROM docs
WHERE vector NEAR $semantic AND content MATCH 'machine learning'
LIMIT 20 USING FUSION(strategy = 'weighted', vector_weight = 0.7, graph_weight = 0.3)

-- Dense + sparse hybrid with RSF
SELECT * FROM docs
WHERE vector NEAR $dense AND vector SPARSE_NEAR $sparse
LIMIT 10 USING FUSION(strategy = 'rsf', dense_weight = 0.7, sparse_weight = 0.3)

-- Maximum score fusion (requires two fusable branches; USING FUSION on a
-- single-branch query — e.g. similarity()-only or pure NEAR — is rejected)
SELECT * FROM docs
WHERE vector NEAR $q AND content MATCH 'transformers'
LIMIT 10 USING FUSION(strategy = 'maximum')

FUSE BY (Planned Syntax)

PLANNED: FUSE BY is not yet implemented in the grammar. Use USING FUSION(...) instead for all hybrid search queries.


Set Operations (v2.0+)

Combine results from multiple SELECT queries. N-ary chaining is supported.

Precedence note: VelesQL evaluates set operators strictly left-to-right, unlike standard SQL where INTERSECT binds tighter than UNION. Use parenthesized subqueries if different evaluation order is needed.

UNION

Combine results, removing duplicates:

SELECT id, title FROM articles WHERE category = 'tech'
UNION
SELECT id, title FROM articles WHERE category = 'science'

UNION ALL

Combine results, keeping duplicates:

SELECT * FROM table1 WHERE similarity(v, $q) > 0.8
UNION ALL
SELECT * FROM table2 WHERE similarity(v, $q) > 0.8

INTERSECT

Only rows present in both queries:

SELECT item_id FROM user_likes WHERE user_id = 1
INTERSECT
SELECT item_id FROM user_likes WHERE user_id = 2

EXCEPT

Rows in first query but not in second:

SELECT id FROM all_items
EXCEPT
SELECT id FROM purchased_items

Three-Way Set Operations

-- Evaluated left-to-right: (A UNION B) UNION C
SELECT id FROM recent_docs WHERE created_at > 1000
UNION
SELECT id FROM popular_docs WHERE views > 100
UNION
SELECT id FROM featured_docs WHERE featured = TRUE

-- Mixed operators: (A UNION B) INTERSECT C
SELECT * FROM a UNION SELECT * FROM b INTERSECT SELECT * FROM c

MATCH Statement (Graph Queries, v2.1+)

Execute graph pattern matching queries with optional vector similarity filtering.

Syntax

MATCH <pattern>
[WHERE <conditions>]
RETURN <projection>
[ORDER BY <expression>]
[LIMIT <n>]

Pattern Syntax

Node Patterns

A node is enclosed in parentheses with optional alias, labels, and properties:

(alias)                          -- Any node
(alias:Label)                    -- Node with label
(alias:Label1:Label2)            -- Node with multiple labels
(alias:Label {prop: value})      -- Node with property filter
(:Label)                         -- Anonymous node with label
()                               -- Anonymous node (any)
(alias:Label@collection)         -- Node resolved from a specific collection (cross-collection)

Cross-Collection Annotation (@collection)

The @collection suffix on a node pattern specifies which collection the node's data should be resolved from. This enables cross-collection MATCH queries where the graph traversal runs on one collection (the primary/edge collection) and node payloads are enriched from other collections.

-- Products in 'catalog' graph, enriched with pricing from 'inventory' collection
MATCH (p:Product)-[:STORED_IN]->(w:Warehouse@inventory)
RETURN p.name, w.price, w.stock
LIMIT 20

When using @collection:

  • The graph traversal (edges) runs on the primary collection specified via _collection param or FROM clause
  • Node payloads are looked up from the annotated collection after traversal
  • Enriched fields are prefixed with the node alias (e.g., w.price, w.stock)
  • If the annotated collection doesn't exist, the enrichment is silently skipped

Relationship Patterns

Relationships connect nodes with direction and optional type/range:

-[r:TYPE]->                      -- Outgoing, named
<-[r:TYPE]-                      -- Incoming, named
-[r:TYPE]-                       -- Undirected, named
-[:TYPE]->                       -- Outgoing, anonymous
-[r:TYPE1|TYPE2]->               -- Multiple relationship types
-[*1..3]->                       -- Variable-length (1 to 3 hops)
-[r:TYPE *2..5]->                -- Named with type and range

Range Specification

SyntaxMeaningExecutable
*1..3Between 1 and 3 hopsYes
*..5Up to 5 hops (lower bound defaults to 1)Yes
*3Exactly 3 hopsYes
*2..At least 2 hops, open-endedParses, rejected at validation
*Any number of hopsParses, rejected at validation

Hop bound. The maximum hop count of any range is capped by the validator at DEFAULT_MAX_GRAPH_EXPANSION = 32 (configurable via ValidationConfig::max_graph_expansion). Open-ended forms (*, *2..) map to an unbounded upper limit and are therefore always rejected under the default configuration with a Graph expansion exceeded: max=32 complexity error (surfaced directly by Parser::parse) — use an explicit bounded range such as *1..32 instead.

Variable-length alias semantics (openCypher lists). A relationship alias on a ranged pattern (-[r:TYPE*1..3]->) binds the ordered list of traversed relationships, not a single edge:

  • RETURN r projects the ordered edge-id list (e.g. [100, 101]).
  • RETURN r.prop projects the positional list of per-edge values (missing properties yield null), like openCypher's [rel IN r | rel.prop].
  • WHERE r.prop = x uses ANY-element semantics: the path matches when at least one traversed edge satisfies the condition (openCypher's any(rel IN r WHERE rel.prop = x)).
  • Distinct edge paths to the same target are distinct result rows, and parallel edges between the same node pair yield one row per aliased edge. Anonymous relationships (-[:TYPE]->) keep the collapsed cardinality — bind an alias when parallel edges must be distinguished (a known divergence from openCypher, which always counts one row per relationship).

RETURN Clause

Project fields from matched nodes and relationships:

RETURN a.name, b.title             -- Property access
RETURN a.name AS author            -- With alias
RETURN *                           -- All bound variables
RETURN similarity()                -- Pre-computed similarity score
RETURN a.name, similarity() AS score

Full Examples

-- Simple graph traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name, b.name

-- With similarity filter (RAG use case)
MATCH (doc:Document)
WHERE similarity(doc.embedding, $query) > 0.8
RETURN doc.title, doc.content
ORDER BY similarity() DESC
LIMIT 5

-- Multi-hop traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.department = 'engineering'
RETURN a.name, b.name, c.name
LIMIT 20

-- Variable-length paths
MATCH (start:Document)-[*1..3]->(end:Document)
WHERE start.topic = 'AI'
RETURN start.title, end.title

-- Bidirectional relationship
MATCH (a:Person)-[:COLLABORATES]-(b:Person)
RETURN a.name, b.name

-- Multiple relationship types
MATCH (p:Person)-[r:KNOWS|WORKS_WITH]->(q:Person)
RETURN p.name, q.name
LIMIT 10

-- Combined graph + vector + column (AIOps)
MATCH (incident:Ticket)-[:IMPACTS]->(service:Microservice)
WHERE similarity(incident.log_embedding, $error_vec) > 0.85
  AND incident.status = 'RESOLVED'
  AND service.criticality = 'HIGH'
RETURN incident.solution, service.name
ORDER BY similarity() DESC
LIMIT 3

Scope and Requirements

Collection Scope: MATCH patterns operate within the current collection's internal graph store. They do NOT traverse across different collections.

Label Requirement: Points must have a _labels array in their payload:

{ "_labels": ["Product"], "name": "Headphones", "price": 99.99 }

Edge Requirement: Edges must exist in the collection's edge store. Create them via collection.add_edge() (Rust/Python) or the REST API.

Execution Strategies

The query planner automatically chooses the optimal strategy:

StrategyWhen UsedDescription
GraphFirstNo similarity conditionTraverse graph, then filter
VectorFirstSimilarity on start nodeVector search, then validate graph
ParallelLarge collection, high thresholdExecute both in parallel

REST API

POST /collections/{name}/match
Content-Type: application/json

{
  "query": "MATCH (a:Person)-[:KNOWS]->(b) WHERE similarity(a.vec, $v) > 0.8 RETURN a.name",
  "params": { "v": [0.1, 0.2, ...] }
}

Introspection Statements (v3.4+)

SHOW COLLECTIONS

Lists all collections in the database with their names and types.

-- List all collections
SHOW COLLECTIONS

-- With trailing semicolon
SHOW COLLECTIONS;

Returns one row per collection, each containing:

FieldTypeDescription
namestringCollection name
typestringCollection type: vector, graph, or metadata

DESCRIBE COLLECTION

Returns metadata about a specific collection. The COLLECTION keyword is optional.

-- With COLLECTION keyword
DESCRIBE COLLECTION docs

-- Without COLLECTION keyword (equivalent)
DESCRIBE docs

Returns a single row with:

FieldTypePresent When
namestringAlways
typestringAlways (vector, graph, or metadata)
dimensionintegerVector collections
metricstringVector collections
point_countintegerAll types

Returns an error if the collection does not exist.

EXPLAIN

Returns the query execution plan for a SELECT query without executing it.

-- Explain a simple query
EXPLAIN SELECT * FROM docs LIMIT 10

-- Explain a complex query
EXPLAIN SELECT * FROM docs WHERE vector NEAR $v LIMIT 10

Phase 2A enhancements (v1.13.0)

EXPLAIN output is now stat-aware: Filter nodes report a cardinality estimate derived from column histograms (ANALYZE output) rather than a hard-coded fallback, and join / sort cost factors are calibrated against CollectionStats::calibrated_cost_factors. As a result, the estimated_cost field returned in the plan tracks real execution time closely enough for the CBO to decide between HNSW routing, bitmap pre-filter, and full scan on the same query shape. Plan-cost consistency fixes (#607 / #608 / #609) ensure a single query produces the same plan whether invoked via the REST /query/explain endpoint, the CLI .explain command, or EXPLAIN ANALYZE.

When ORDER BY similarity(<col>, $q) is present without NEAR, the plan tree now explicitly surfaces the similarity-as-predicate routing decision (see Order by Similarity above) — the VectorSearch node appears even though the SQL text has no NEAR.

v3.2.0 — REST plan single-sourced from core

The REST /query/explain step list is now produced directly from the engine plan via QueryPlan::to_plan_steps(), the same PlanNode tree the CLI .explain renders — it is no longer reconstructed from the parsed AST. The wire operation vocabulary is unchanged (additive): a full scan still reports FullScan and joins still report {Type}Join. As a side effect, GROUP BY, aggregation, ORDER BY, and JOIN steps now also appear in the CLI/Python to_tree() output (previously only the REST view surfaced them), and an indexed-equality predicate may surface an IndexLookup step.

Returns a single row with:

FieldTypeDescription
planobjectStructured query plan (JSON)
treestringHuman-readable plan tree

The tree field now surfaces additional context when the query uses WITH, LET, or FUSION clauses:

  • WITH optionsef_search, mode, rerank, timeout_ms are read from the WITH clause and displayed as-is (no longer hardcoded to 100 for ef_search).
  • LET bindings — each named binding is listed under a LET bindings: node.
  • FUSION details — strategy name, k, and per-source weights appear under a FUSION: node.

Example enriched output:

── VectorSearch (docs) ef_search=512
   ├── WITH options:
   │   ├── ef_search: 512
   │   ├── mode: accurate
   │   └── rerank: true
   ├── LET bindings:
   │   └── boost = vector_score * 1.5
   ├── FUSION: rrf (k=60, vector_weight=0.7, bm25_weight=0.3)
   ├── Filter (PreFilter)
   └── Limit 10

EXPLAIN ANALYZE (v3.8+)

Executes the query with lightweight instrumentation and returns both the estimated plan and actual execution statistics side-by-side. Unlike EXPLAIN (which only plans), EXPLAIN ANALYZE runs the query to collect real metrics.

Caution: EXPLAIN ANALYZE executes the query. Use with care on write queries (INSERT, UPDATE, DELETE) as they will modify data.

Invocation

EXPLAIN ANALYZE is not a parsed VelesQL statement — the grammar only accepts EXPLAIN <select-query>. ANALYZE mode is requested at the API level:

  • REST: POST /query/explain with "analyze": true in the request body.
  • Rust API: Database::explain_analyze_query(...) / Collection::explain_analyze_query(...).
POST /query/explain
{
  "query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
  "params": { "v": [0.1, 0.2, 0.3] },
  "analyze": true
}

Return Fields

The result includes the estimated plan (identical to EXPLAIN) plus:

FieldTypeDescription
planobjectEstimated query plan (same as EXPLAIN)
actual_statsobjectTop-level actual execution statistics
node_statsarrayPer-plan-node actual statistics
cost_factorsobject | nullEffective cost factors used for this plan (operation-by-operation cost weights). null when EXPLAIN is called without ANALYZE.
calibration_sourcestring | nullWhere the cost model picked its values: "histogram" if the collection's ANALYZE histogram was used, "default" otherwise. null without ANALYZE.
feedback_calibrationobject | nullRuntime CBO calibration from the EMA feedback loop (v1.15.0, issue #469 Phase 2). Present only once the loop has observed ≥ 10 vector queries on this collection. See subsection below.

actual_stats fields:

FieldTypeDescription
actual_rowsu64Number of rows returned by execution
actual_time_msf64Wall-clock execution time in milliseconds
loopsu64Number of execution iterations (always 1)
nodes_visitedu64For MATCH queries, an approximate (best-effort, lower-bound) graph-traversal node count (start nodes examined + nodes reached by following edges); 0 for non-MATCH queries
edges_traversedu64For MATCH queries, an approximate (best-effort, lower-bound) count of edges followed during traversal; 0 for non-MATCH queries
traversal_counters_approximatebooltrue when nodes_visited / edges_traversed are strategy-dependent approximations (a lower bound), not exact measured counts; false for non-graph queries where both counters are 0.

Note: nodes_visited / edges_traversed are an approximate, best-effort lower bound on the graph traversal across all MATCH strategies — not exact figures. For GraphFirst they are the start nodes examined plus the edges/nodes reached; for the similarity-anchored VectorFirst strategy (a similarity() predicate on the start node) they are the candidate nodes evaluated plus the per-candidate existence-BFS edges/nodes — each BFS uses limit(1) and so undercounts the true frontier; the Parallel strategy sums both legs (a node touched by both is counted twice).

feedback_calibration fields (v1.15.0+, EXPLAIN ANALYZE only):

FieldTypeDescription
ms_per_cost_unitf64EMA-adjusted milliseconds per cost unit observed at runtime. Compares against the static default of 0.1 ms/unit. Values consistently below 0.1 indicate the cost model is over-estimating latency; above indicate under-estimation.
sample_countu64Number of vector query samples that produced this estimate. Low counts (< ~50) mean the EMA is still in warm-up; higher counts indicate a stable, deployment-specific calibration.

The block is omitted from the JSON (via skip_serializing_if) when the feedback loop has not yet warmed (< 10 samples for that collection) and on plain EXPLAIN (no execution → no runtime samples).

See crates/velesdb-core/src/velesql/explain/types.rs:160-180 for the canonical type definition and ExplainOutput::with_feedback_calibration for how the planner attaches it.

node_stats entry fields:

FieldTypeDescription
node_labelstringPlan node type (e.g. "VectorSearch", "Filter")
actual_time_msf64Estimated (not measured) wall-clock time for this node, derived by distributing the plan-global actual_time_ms across nodes via fixed weight fractions. Heuristic until per-node instrumentation lands (#467).
actual_rows_inu64Estimated (heuristic) rows entering this node, not a measured count.
actual_rows_outu64Estimated (heuristic) rows leaving this node, not a measured count.
loopsu64Loop iterations for this node
estimatedboolAlways true until instrumented timing lands (#467): the actual_* row/time values above are heuristic estimates, not real per-node measurements. The actual_ prefix is kept for API stability.

Filter plan fields (v3.9+):

When histogram data is available, Filter plan nodes include additional fields:

FieldTypeDescription
`estimated_rows$\text{u64}?\text{Histogram}-\text{based} \text{cardinality} \text{estimate} (\text{selectivity} \times \text{total_rows})
$estimation_method`string?"histogram", "cardinality", or "no histogram"

Examples

Queries to analyze are passed in the query field of /query/explain with "analyze": true (the EXPLAIN keyword itself must not be part of the string):

// Analyze a vector search query
{ "query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
  "params": { "v": [0.1, 0.2] }, "analyze": true }

// Analyze a filtered query
{ "query": "SELECT * FROM products WHERE category = 'tech' AND price > 50 LIMIT 20",
  "analyze": true }

After ≥ 10 vector queries against the same collection, the EMA feedback loop (v1.15.0) attaches a feedback_calibration block:

{
  "plan": { ... },
  "actual_stats": {
    "actual_rows": 10,
    "actual_time_ms": 1.234,
    "loops": 1,
    "nodes_visited": 0,
    "edges_traversed": 0
  },
  "node_stats": [ ... ],
  "cost_factors": { ... },
  "calibration_source": "histogram",
  "feedback_calibration": {
    "ms_per_cost_unit": 0.087,
    "sample_count": 42
  }
}

Read the block as: "On this deployment, every CBO cost unit has cost 0.087 ms in practice (vs the 0.1 ms default) across the last 42 vector queries". A persistent gap between estimated and actual time (the warning below) usually shows up here as ms_per_cost_unit drifting significantly away from 0.1.

CLI Usage

The CLI REPL provides the .explain-analyze command:

velesdb> .explain-analyze SELECT * FROM docs WHERE vector NEAR [0.1, 0.2, 0.3] LIMIT 10

Query Plan:
└─ VectorSearch
   ├─ Collection: docs
   ├─ ef_search: 100
   └─ Candidates: 10

Estimated cost: 0.050ms

Actual Statistics:
  Actual rows:      10
  Actual time:      1.234ms
  Loops:            1
  Nodes visited:    0
  Edges traversed:  0

Per-Node Statistics:
  VectorSearch:  1.173ms (rows: 10 → 10)
  Limit:         0.012ms (rows: 10 → 10)
``$

\text{When} \text{the} \text{estimated} \text{cost} \text{diverges} \text{from} \text{actual} \text{time} \text{by} \text{more} \text{than} 10 \times , \text{a} $⚠`
warning marker is displayed to highlight potential cost model inaccuracies.

#### HTTP API

Send a POST to `/query/explain` with `"analyze": true` in the request body:

```json
{
  "query": "SELECT * FROM docs WHERE vector NEAR $v LIMIT 10",
  "params": { "v": [0.1, 0.2, 0.3] },
  "analyze": true
}

The response includes actual_stats and node_stats fields alongside the plan. When analyze is false or absent, the endpoint behaves identically to the existing EXPLAIN-only mode.


DDL Statements (v3.3+) and Admin Statements (v3.5+)

CREATE COLLECTION

Create vector, graph, or metadata-only collections.

Vector Collection

-- Minimal: dimension only (metric defaults to cosine)
CREATE COLLECTION docs (dimension = 768)

-- Explicit metric
CREATE COLLECTION docs (dimension = 768, metric = 'euclidean')

-- With storage quantization and HNSW parameters
CREATE COLLECTION docs (dimension = 768, metric = 'cosine')
  WITH (storage = 'sq8', m = 16, ef_construction = 200)

-- Dot product metric
CREATE COLLECTION embeddings (dimension = 384, metric = 'dotproduct')

Graph Collection

-- Schemaless graph with embeddings
CREATE GRAPH COLLECTION knowledge (dimension = 768, metric = 'cosine') SCHEMALESS

-- Schemaless pure graph (no embeddings)
CREATE GRAPH COLLECTION pure_graph

-- Typed graph with schema definition
CREATE GRAPH COLLECTION ontology (dimension = 768, metric = 'cosine') WITH SCHEMA (
  NODE Person (name: STRING, age: INTEGER),
  NODE Document (title: STRING, year: INTEGER),
  EDGE AUTHORED_BY FROM Person TO Document,
  EDGE CITED_BY FROM Document TO Document
)

Metadata Collection

-- No vectors, metadata-only storage
CREATE METADATA COLLECTION tags

CREATE Parameters

ParameterTypeRequiredDefaultDescription
dimensionintegerYes (vector/graph)--Embedding dimension
metricstringNocosineDistance metric: cosine, euclidean, dotproduct

WITH Options

OptionTypeDefaultDescription
storagestringfullStorage mode: full, sq8, binary
minteger16HNSW M parameter (max links per node)
ef_constructioninteger200HNSW build-time expansion factor

Schema Type Names

TypeDescription
STRINGText value
INTEGERInteger value
FLOATFloating-point value
BOOLEANTrue/false value
VECTORVector field (special)

DROP COLLECTION

-- Error if collection not found
DROP COLLECTION documents

-- No error if absent
DROP COLLECTION IF EXISTS documents

Index Management (v3.5+)

Create or drop secondary metadata indexes on collection payload fields. Secondary indexes use a BTree structure for O(log n) equality lookups on metadata fields, accelerating WHERE-clause filters.

CREATE INDEX

-- Create a secondary index on a metadata field
CREATE INDEX ON docs (category)

-- Case-insensitive keywords
create index on docs (status)

Index creation is idempotent -- creating an index that already exists is a no-op.

DROP INDEX

-- Remove a secondary metadata index
DROP INDEX ON docs (category)

Dropping a non-existent index succeeds silently (no error).

Notes

  • Secondary indexes only apply to metadata payload fields (not vector data).
  • Indexes accelerate equality filters in WHERE clauses (e.g., WHERE category = 'tech').
  • CREATE INDEX retroactively indexes all existing payloads for the field (a one-time backfill scan); the index is then maintained incrementally on subsequent upserts and deletes.
  • Secondary index definitions persist: a CREATE INDEX records the field in config.json and the index is rebuilt from the stored payloads on the next open, so it survives a process restart (EPIC-081). DROP INDEX removes the field from config.json, so a dropped index does not reappear on restart.

ANALYZE (v3.5+, histograms v3.9+)

Computes cost-based optimizer (CBO) statistics for a collection. The statistics are cached in memory and persisted to disk (collection.stats.json).

-- Compute statistics for query optimizer
ANALYZE docs
ANALYZE COLLECTION docs   -- optional COLLECTION keyword

Returns a JSON payload with collection statistics including total_points, row_count, deleted_count, avg_row_size_bytes, payload_size_bytes, and per-field cardinality information.

Histogram Construction (v3.9+)

ANALYZE builds equi-depth histograms on Int, Float, and String columns to enable accurate selectivity estimation for the CBO. Key characteristics:

  • Sampling: up to 10,000 rows sampled per column (separate from the 1,000-row payload stats sample).
  • Bucket count: 64 buckets by default. Fewer buckets when distinct values < 64 (one bucket per distinct value).
  • String columns: mapped to ordinal ranks (lexicographic sort order) for histogram construction.
  • Persistence: histograms are serialized as part of collection.stats.json and restored on database startup.
  • Incremental maintenance: upsert/delete operations update histogram bucket counts incrementally (O(log B) binary search). After 20% cumulative updates, the histogram is marked stale and a debug! log recommends re-running ANALYZE.
  • Backward compatibility: new histogram fields use #[serde(default)] so pre-histogram stats files deserialize without error.

TRUNCATE (v3.5+)

Deletes all rows from a collection without dropping the collection itself. Unlike DELETE FROM which requires a WHERE clause, TRUNCATE removes all data unconditionally.

-- Delete all rows
TRUNCATE docs
TRUNCATE COLLECTION docs   -- optional COLLECTION keyword

Returns a payload with {"deleted_count": N}. Truncating an empty collection succeeds with deleted_count: 0. The collection structure (config, indexes) is preserved after truncation.

ALTER COLLECTION (v3.5+)

Modifies collection settings at runtime. Uses the same key-value option syntax as CREATE COLLECTION.

-- Enable auto-reindex
ALTER COLLECTION docs SET (auto_reindex = true)

-- Disable auto-reindex
ALTER COLLECTION docs SET (auto_reindex = false)

Supported options:

OptionTypeDescription
auto_reindexbooleanEnable/disable automatic HNSW parameter tuning

Unknown options are rejected with an error message listing supported options. The change is applied to the live collection and persisted immediately, so it survives a restart — the auto-reindex policy is restored automatically on the next collection open. Setting auto_reindex = false keeps the policy attached but disabled (preserving any previously configured thresholds for a symmetric round-trip).

FLUSH (v3.6+)

Persists collection data to disk. Supports fast flush (WAL only) and full flush (includes index serialization). Can target all collections or a specific one.

-- Fast flush all collections (WAL only)
FLUSH

-- Full flush all collections (includes index serialization)
FLUSH FULL

-- Fast flush a specific collection
FLUSH docs

-- Full flush a specific collection
FLUSH FULL docs

Returns a payload with {"status": "flushed", "full": true/false}.

  • Fast flush (FLUSH): Writes WAL data to disk. Fast but requires WAL replay on next startup if the process crashes before a full flush.
  • Full flush (FLUSH FULL): Writes WAL data and serializes vector indexes. Slower but ensures fast startup without WAL replay. Recommended before graceful shutdown.

Flushing a nonexistent collection returns a CollectionNotFound error. Flushing an empty database (no collections) succeeds with no error.

Note: full (case-insensitive) is a reserved modifier for FLUSH. To flush a collection named full, use FLUSH FULL full or quote the identifier: FLUSH `full`.


DML Statements (INSERT, UPDATE, DELETE)

INSERT INTO (v3.2+, multi-row v3.5+)

Insert one or more rows into a collection. Column names are listed in parentheses, followed by corresponding values. Multiple rows are separated by commas (v3.5+).

-- Insert a single row
INSERT INTO docs (id, title, category) VALUES (1, 'Getting Started', 'tutorial')

-- Insert with numeric values
INSERT INTO products (id, price, stock) VALUES (42, 29.99, 100)

-- Insert with boolean values
INSERT INTO flags (id, active, verified) VALUES (1, TRUE, FALSE)

-- Insert with NULL
INSERT INTO events (id, description, metadata) VALUES (10, 'system boot', NULL)

-- Insert with parameter-bound vector
INSERT INTO docs (id, vector, category) VALUES (1, $vector, 'test')

-- Multi-row insert (v3.5+)
INSERT INTO docs (id, vector, title) VALUES (1, $v1, 'Hello'), (2, $v2, 'World'), (3, $v3, 'Foo')

Column count must match value count in every row. Types are inferred from literal values. Multi-row INSERT executes as a single batch upsert() call.

UPSERT INTO (v3.5+)

Insert or update rows. Identical syntax to INSERT; semantically equivalent since VelesDB's INSERT is always idempotent (existing IDs are overwritten). UPSERT makes this intent explicit at the SQL level.

-- Single-row upsert
UPSERT INTO docs (id, vector, title) VALUES (1, $v, 'Updated')

-- Multi-row upsert
UPSERT INTO docs (id, vector, title) VALUES (1, $v1, 'A'), (2, $v2, 'B')

UPDATE (v3.2+)

Update one or more fields in existing rows.

-- Update a single field
UPDATE docs SET status = 'archived' WHERE id = 42

-- Update multiple fields
UPDATE products SET price = 19.99, stock = 50 WHERE category = 'sale'

-- Update with boolean
UPDATE products SET featured = TRUE WHERE id = 1

-- Update without WHERE (updates all rows -- use with caution)
UPDATE products SET featured = FALSE

The WHERE clause is optional but strongly recommended.

DELETE FROM (v3.3+)

Delete rows from a collection. The WHERE clause is mandatory to prevent accidental full-collection deletion.

-- Delete by single ID
DELETE FROM docs WHERE id = 42

-- Delete by multiple IDs
DELETE FROM docs WHERE id IN (1, 2, 3)

-- Delete with string comparison
DELETE FROM logs WHERE level = 'debug'

-- Delete with compound condition
DELETE FROM events WHERE status = 'expired' AND created_at < 1000

VelesQL rejects DELETE FROM <collection> without a WHERE predicate as a safety measure.

INSERT EDGE (v3.3+)

Insert a labeled edge between two nodes in a graph collection.

-- Basic edge insertion
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')

-- With explicit edge ID
INSERT EDGE INTO knowledge (id = 10, source = 1, target = 2, label = 'KNOWS')

-- With edge properties
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')
  WITH PROPERTIES (year = 2026, confidence = 0.95)

-- With ID and properties
INSERT EDGE INTO kg (id = 20, source = 2, target = 3, label = 'KNOWS')
  WITH PROPERTIES (weight = 0.9)

Edge Parameters

ParameterTypeRequiredDescription
sourceintegerYesSource node ID
targetintegerYesTarget node ID
labelstringYesEdge relationship type
idintegerNoExplicit edge ID (auto-generated if omitted)

WITH PROPERTIES accepts key-value pairs (strings, integers, or floats).

DELETE EDGE (v3.3+)

Remove an edge by its ID from a graph collection.

DELETE EDGE 123 FROM knowledge
DELETE EDGE 10 FROM kg

SELECT EDGES (v3.5+)

Query edges from a graph collection with optional filtering by source node, target node, or edge label.

-- All edges
SELECT EDGES FROM kg LIMIT 100

-- Outgoing edges from a node
SELECT EDGES FROM kg WHERE source = 1

-- Incoming edges to a node
SELECT EDGES FROM kg WHERE target = 2

-- Edges by label
SELECT EDGES FROM kg WHERE label = 'KNOWS'

-- Combined: outgoing with label
SELECT EDGES FROM kg WHERE source = 1 AND label = 'KNOWS'

Each result contains a JSON payload with edge_id, source, target, label, and properties fields.

Limitation: WHERE supports at most two conditions combined with AND. Three-condition queries like WHERE source = 1 AND label = 'X' AND target = 2 are not supported. Use two-condition AND and filter results client-side if needed.

INSERT NODE (v3.5+)

Insert or update a node's payload in a graph collection.

-- With JSON payload
INSERT NODE INTO kg (id = 42, payload = '{"name": "Alice", "_labels": ["Person"]}')

-- With inline fields (collected into a JSON object)
INSERT NODE INTO kg (id = 10, name = 'Bob', age = 30)

If a node with the given id already exists, the payload is replaced (upsert semantics).


TRAIN QUANTIZER Command (v2.2+)

Explicitly train a Product Quantizer on a collection's vector data.

Syntax

TRAIN QUANTIZER ON <collection> WITH (<parameters>)

Parameters

ParameterTypeDefaultDescription
minteger8Number of subspaces
kinteger256Codebook size per subspace
typestring--Quantizer type: pq, opq, rabitq
oversamplinginteger--Training oversampling ratio
sampleinteger--Number of vectors to sample
forcebooleanfalseForce retrain if exists

Examples

-- Train with standard parameters
TRAIN QUANTIZER ON my_embeddings WITH (m = 8, k = 256)

-- Higher compression (more subspaces)
TRAIN QUANTIZER ON large_collection WITH (m = 16, k = 256)

-- Finer codebooks
TRAIN QUANTIZER ON precision_collection WITH (m = 8, k = 512)

-- With quantizer type
TRAIN QUANTIZER ON vectors WITH (m = 16, k = 256, type = opq)

-- Force retrain with oversampling
TRAIN QUANTIZER ON my_coll WITH (m = 8, k = 256, oversampling = 4, force = true)

-- Semicolons optional
TRAIN QUANTIZER ON my_collection WITH (m = 8, k = 256);

Notes

  • Training is explicit -- it does not happen automatically.
  • The collection must contain enough vectors (recommended: at least k).
  • Re-training overwrites the existing quantizer.
  • OPQ can be enabled via the type parameter.
  • Trained quantizers are persisted in the collection directory (codebook.pq / rotation.opq for PQ/OPQ, rabitq.idx for RaBitQ) and restored on database open: the PQ cache and RaBitQ encodings are rebuilt by re-encoding the stored vectors (O(n) at open).
  • type = rabitq also installs the trained quantizer into the live index when the collection was created with storage = 'rabitq'. For collections created with another storage mode, training persists the index and flips the collection's storage mode; the RaBitQ backend takes effect at the next open.

Agent Memory VelesQL Queries (v3.2+)

The Agent Memory SDK creates three internal collections queryable via VelesQL:

Semantic Memory (_semantic_memory)

SELECT * FROM _semantic_memory
WHERE vector NEAR $query_embedding
LIMIT 10 WITH (mode = 'accurate')

Payload fields: content (string).

Episodic Memory (_episodic_memory)

-- Recent events (last 24h)
SELECT * FROM _episodic_memory
WHERE timestamp > 1711234567
ORDER BY timestamp DESC
LIMIT 20

-- Similarity search on events
SELECT * FROM _episodic_memory
WHERE vector NEAR $query_embedding
LIMIT 5

Payload fields: description (string), timestamp (integer, Unix epoch).

Procedural Memory (_procedural_memory)

SELECT * FROM _procedural_memory
WHERE confidence > 0.7
ORDER BY confidence DESC
LIMIT 10

Payload fields: name (string), steps (array), confidence (float), usage_count (integer), created_at (integer), last_used_at (integer).

ACT-R activation decay (v1.15.0, opt-in). When the surrounding ProceduralMemory is built with ProceduralMemory::with_activation_decay(d) (Phase 1 of #433), the confidence value returned by recall queries is adjusted on read by an ACT-R base-level activation decay (c_effective = c_stored × (Δt^-d), Anderson 1996, default d ≈ 0.5). The stored value is unchanged — VelesQL filters and ORDER BY see the decayed (effective) confidence. Queries written before v1.15.0 keep working unchanged because the strategy is opt-in.

Convenience API (Rust/Python)

let results = agent_memory.query_semantic(
    "SELECT * FROM _semantic_memory WHERE vector NEAR $v LIMIT 5",
    &params,
)?;

All three subsystems support: vector NEAR, payload filters, ORDER BY, LIMIT/OFFSET, WITH options, LET bindings, and USING FUSION.

VelesQL queryability of agent memory collections is verified by BDD integration tests (tests/bdd/agent_memory.rs) that exercise the full Database::execute_query pipeline, and by unit tests in src/agent/velesql_bridge_tests.rs that test the Collection::execute_query_str path.


Value Types

Strings

Single-quoted. To include a single quote, double it (''):

'hello world'
'O''Brien'                -- produces: O'Brien
'It''s a ''test'''        -- produces: It's a 'test'
''                        -- empty string

Unicode characters are fully supported:

'日本語テキスト'
'🚀 Launch'

Warning: Backslash escaping (\') is not supported. Always use ''.

Numbers

42            -- integer
3.14          -- float
-100          -- negative integer
-0.5          -- negative float

Booleans

TRUE
FALSE

Case-insensitive: true, True, TRUE are all equivalent.

Vectors

[0.1, 0.2, 0.3, 0.4]           -- literal vector (dense)
$query_vector                    -- parameter reference

Sparse Vectors

{0: 0.9, 14: 0.7, 256: 0.5}   -- literal sparse vector
$sparse_query                    -- parameter reference

NULL

NULL

Temporal Values

NOW()                            -- current Unix timestamp
INTERVAL '7 days'               -- time duration
NOW() - INTERVAL '24 hours'     -- temporal arithmetic

Parameters

Use $name syntax for parameterized queries. Parameters are resolved at runtime from the query context.

SELECT * FROM docs WHERE vector NEAR $query_vector AND category = $cat LIMIT 10

Parameters can be used for:

  • Vectors: $v, $query_embedding
  • Sparse vectors: $sparse, $sv
  • Scalar values: $category, $min_price

Execution Surfaces & CLI REPL Limitations

VelesQL is accepted by the REST API (velesdb-server), the language SDKs, and the interactive CLI REPL (velesdb-cli). The grammar is identical across all three, but the CLI REPL has two execution limitations a newcomer should know:

  • Parameterized vector search is not executed in the REPL. A query whose WHERE uses vector NEAR $param (or MATCH ... $param) prints "Vector search with $parameter requires REST API" and returns zero rows — the REPL has no way to bind an external vector. Use an inline vector literal (WHERE vector NEAR [0.1, 0.2, 0.3]), a metadata-only query, or the REST API with the parameter supplied in the request body.
  • MATCH requires an active collection. Run .use <collection_name> first; otherwise the REPL reports "MATCH queries require an active collection".

These limitations are specific to the REPL; the REST API and SDKs execute both parameterized vector search and MATCH normally.

Divergences from standard SQL

  • Bare table aliases reserve clause keywords: FROM docs d and FROM docs AS d are both accepted and equivalent, but a bare alias may not be a clause keyword (WHERE, LIMIT, JOIN, UNION, ...) — quote it with backticks to use one anyway.
  • vector and score are reserved keywords, not free payload field names — vector NEAR ... and score in aggregates/ordering refer to the query vector and the computed similarity score.

Identifier Quoting

Reserved Keywords

The following keywords cannot be used as identifiers without quoting:

CategoryKeywords
Query structureSELECT, FROM, WHERE, AS
Logical operatorsAND, OR, NOT
Comparison operatorsIN, BETWEEN, LIKE, ILIKE, MATCH
NULL handlingIS, NULL
Boolean literalsTRUE, FALSE
PaginationLIMIT, OFFSET
SortingORDER, BY, ASC, DESC
AggregationGROUP, HAVING
Vector operationsNEAR, NEAR_FUSED, SPARSE_NEAR, SIMILARITY
ExtensionsFUSE, TRAIN, QUANTIZER, WITH
DDLCREATE, DROP, COLLECTION, GRAPH, METADATA
Graph/DMLEDGE, SCHEMALESS, SCHEMA, NODE, PROPERTIES
ConditionalIF, EXISTS
DMLINSERT, INTO, UPDATE, SET, DELETE, VALUES
Set operationsUNION, ALL, INTERSECT, EXCEPT
JOINsJOIN, INNER, LEFT, RIGHT, FULL, OUTER, ON, USING
BindingsLET, RETURN, MATCH
TemporalNOW, INTERVAL
MiscDISTINCT, COUNT, SUM, AVG, MIN, MAX, FIRST, CONTAINS

Quoting Styles

To use reserved keywords or special characters as identifiers, quote them:

Backtick quoting (MySQL-style):

SELECT `select`, `from`, `order` FROM docs

Double-quote quoting (SQL standard):

SELECT "select", "from", "order" FROM docs

Mixed styles in the same query:

SELECT `select`, "order" FROM my_table WHERE `limit` > 10

Escaping Quotes Inside Identifiers

Double the quote character to include it:

-- Column named: col"name
SELECT "col""name" FROM docs

Examples

-- Reserved keyword as collection name
SELECT * FROM `order`

-- Reserved keywords in WHERE
SELECT * FROM docs WHERE `select` = 'value' AND `from` LIKE '%pattern%'

-- Reserved keywords in ORDER BY
SELECT * FROM docs ORDER BY `order` ASC

-- Reserved keywords in GROUP BY
SELECT `group`, COUNT(*) FROM docs GROUP BY `group`

-- Reserved keyword as alias
SELECT id AS `select` FROM docs

Default Values Reference

FeatureDefaultNotes
LIMIT10 (every SELECT form)Exceptions: MATCH ... RETURN and compound queries (UNION/INTERSECT/EXCEPT) have no implicit LIMIT 10 (bounded by the 100 000-row server ceiling). Always specify LIMIT for exhaustive retrieval
OFFSET0
ORDER BY directionASCExplicit DESC recommended for similarity
metric (CREATE)cosine
storage (CREATE)fullfull = no quantization
m (HNSW)16
ef_construction200
WITH modebalanced
USING FUSION strategyrrf
USING FUSION k60
ef_searchDepends on modefast=32, balanced=64, accurate=256, perfect=512

Error Handling

VelesQL returns structured errors:

Error TypeDescription
SyntaxErrorInvalid query syntax
SemanticErrorValid syntax but invalid semantics
CollectionNotFoundReferenced collection doesn't exist
ColumnNotFoundReferenced column doesn't exist
TypeMismatchIncompatible types in comparison
TimeoutQuery exceeded timeout_ms

Quick Reference

Statement Types

StatementPurposeExample
SELECTQuery dataSELECT * FROM docs WHERE vector NEAR $v LIMIT 10
MATCHGraph traversalMATCH (a)-[:KNOWS]->(b) RETURN a.name
INSERT INTOAdd rows (multi-row)INSERT INTO docs (id, title) VALUES (1, 'A'), (2, 'B')
UPSERT INTOInsert or update rowsUPSERT INTO docs (id, title) VALUES (1, 'Updated')
UPDATEModify rowsUPDATE docs SET status = 'done' WHERE id = 1
DELETE FROMRemove rowsDELETE FROM docs WHERE id = 42
INSERT EDGEAdd graph edgeINSERT EDGE INTO kg (source=1, target=2, label='REL')
DELETE EDGERemove graph edgeDELETE EDGE 123 FROM kg
SELECT EDGESQuery graph edgesSELECT EDGES FROM kg WHERE source = 1 LIMIT 10
INSERT NODEAdd/update node payloadINSERT NODE INTO kg (id = 42, payload = '{"name": "Alice"}')
CREATE COLLECTIONCreate collectionCREATE COLLECTION docs (dimension=768)
DROP COLLECTIONDelete collectionDROP COLLECTION IF EXISTS docs
CREATE INDEXAdd metadata indexCREATE INDEX ON docs (category)
DROP INDEXRemove metadata indexDROP INDEX ON docs (category)
TRAIN QUANTIZERTrain compressionTRAIN QUANTIZER ON docs WITH (m=8, k=256)
SHOW COLLECTIONSList collectionsSHOW COLLECTIONS
DESCRIBECollection metadataDESCRIBE COLLECTION docs
EXPLAINQuery planEXPLAIN SELECT * FROM docs LIMIT 10

WHERE Operators

OperatorSyntaxExample
NEARvector NEAR $vWHERE vector NEAR $query
SPARSE_NEARvector SPARSE_NEAR $vWHERE vector SPARSE_NEAR $sparse
NEAR_FUSEDvector NEAR_FUSED [$v1,$v2]WHERE vector NEAR_FUSED [$a, $b]
similarity()similarity(field, $v) op NWHERE similarity(emb, $v) > 0.8
MATCH (text)column MATCH 'text'WHERE content MATCH 'database'
CONTAINS_TEXTcolumn CONTAINS_TEXT 'literal'WHERE name CONTAINS_TEXT 'rust' (case-insensitive, no BM25 ranking)
CONTAINS (array)column CONTAINS valueWHERE tags CONTAINS 'rust'
= != <> > >= < <=column op valueWHERE price > 100
IN / NOT INcolumn [NOT] IN (values)WHERE id IN (1, 2, 3)
BETWEENcolumn BETWEEN low AND highWHERE price BETWEEN 50 AND 200
LIKE / ILIKEcolumn [I]LIKE 'pattern'WHERE title LIKE 'rust%', WHERE name ILIKE '%rust%'
IS NULL / IS NOT NULLcolumn IS [NOT] NULLWHERE email IS NOT NULL
Scalar subquerycolumn op (SELECT … )WHERE views > (SELECT AVG(views) FROM stats) — executed and substituted as a literal; correlated subqueries rejected with V010 (SubqueryNotExecutable)
Graph match predicateMATCH (...) in WHEREWHERE MATCH (a:Person)-[:KNOWS]->(b) AND a.id = $u
GEO_DISTANCEGEO_DISTANCE(col, lat, lng) op metersWHERE GEO_DISTANCE(location, 48.8566, 2.3522) < 500
GEO_BBOXGEO_BBOX(col, lat_min, lng_min, lat_max, lng_max)WHERE GEO_BBOX(location, 48.8, 2.3, 48.9, 2.4)

Geospatial Functions

GEO_DISTANCE

Computes the Haversine great-circle distance in meters between a GeoPoint column value and a reference coordinate pair.

SELECT * FROM places WHERE GEO_DISTANCE(location, 48.8566, 2.3522) < 500;
SELECT * FROM places WHERE GEO_DISTANCE(location, 48, 2) >= 1000;
  • Coordinates accept both float (48.8566) and integer (48) literals.
  • Returns distance in meters (SI unit).
  • Null GeoPoint values are excluded from results.
  • Non-GeoPoint or non-existent columns return empty results (no error).
  • Combinable with AND, OR, NOT, and other WHERE operators.

GEO_BBOX

Tests whether a GeoPoint column value falls within a latitude/longitude bounding box (inclusive).

SELECT * FROM places WHERE GEO_BBOX(location, 48.8, 2.3, 48.9, 2.4);
  • All four coordinate parameters accept float and integer literals.
  • Boundary is inclusive: points exactly on the edge are included.
  • Inverted coordinates (lat_min > lat_max) return empty results.
  • Null GeoPoint values are excluded from results. | NOT IN | column NOT IN (values) | WHERE status NOT IN ('deleted') | | BETWEEN | column BETWEEN a AND b | WHERE price BETWEEN 10 AND 100 | | LIKE | column LIKE 'pattern' | WHERE name LIKE 'John%' | | ILIKE | column ILIKE 'pattern' | WHERE name ILIKE '%john%' | | IS NULL | column IS NULL | WHERE deleted_at IS NULL | | IS NOT NULL | column IS NOT NULL | WHERE email IS NOT NULL | | AND / OR / NOT | logical combinators | WHERE a > 1 AND (b = 2 OR c = 3) |

Clauses

ClausePurposeExample
FROM ... ASSource with aliasFROM docs AS d
WHEREFilter conditionsWHERE status = 'active'
ORDER BY ... ASC/DESCSort resultsORDER BY similarity() DESC
LIMIT NCap result countLIMIT 10
OFFSET NSkip first N resultsOFFSET 20
GROUP BYAggregate groupingGROUP BY category
HAVINGFilter groupsHAVING COUNT(*) > 5
JOIN ... ONCombine collectionsJOIN users ON docs.author_id = users.id
WITH (...)Search optionsWITH (mode = 'accurate')
USING FUSION(...)Hybrid strategyUSING FUSION(strategy = 'rrf')
LET x = exprScore bindingLET score = 0.7 * vector_score
DISTINCTDeduplicateSELECT DISTINCT category FROM docs

Aggregate Functions

FunctionDescriptionExample
COUNT(*)Count rowsSELECT COUNT(*) FROM docs
COUNT(col)Count non-nullSELECT COUNT(email) FROM users
SUM(col)Sum valuesSELECT SUM(price) FROM orders
AVG(col)Average valueSELECT AVG(rating) FROM reviews
MIN(col)Minimum valueSELECT MIN(created_at) FROM logs
MAX(col)Maximum valueSELECT MAX(score) FROM results
MAX(score)Max similarity in groupSELECT MAX(score) AS rel FROM chunks ... GROUP BY parent_id
AVG(score)Mean similarity in groupSELECT AVG(score) AS avg FROM chunks ... GROUP BY parent_id
FIRST(col)Value from best chunkSELECT FIRST(text) AS excerpt FROM chunks ... GROUP BY parent_id

Value Types

TypeSyntaxExample
String'single quotes''hello world'
Integerdigits42, -7
Floatdigits.digits3.14, -0.5
BooleanTRUE / FALSETRUE
NULLNULLNULL
Vector[floats][0.1, 0.2, 0.3]
Sparse Vector{idx: val}{12: 0.8, 45: 0.3}
Parameter$name$query_vector
TemporalNOW(), INTERVALNOW() - INTERVAL '7 days'

Score Variables (for ORDER BY and LET)

VariableSourceWhen Populated
vector_scoreHNSW dense searchNEAR clause present
bm25_scoreFull-text BM25MATCH text clause present
sparse_scoreSparse vector searchSPARSE_NEAR clause present
graph_scoreGraph traversalMATCH graph pattern present
fused_scoreAfter fusionUSING FUSION applied
similarity()Primary search scoreAny search clause

Complete Examples

-- Basic: find 10 nearest documents
SELECT * FROM documents WHERE vector NEAR $query LIMIT 10

-- With similarity score in results
SELECT id, title, similarity() AS score
FROM documents WHERE vector NEAR $query
ORDER BY similarity() DESC LIMIT 5

-- With metadata filter
SELECT * FROM articles
WHERE vector NEAR $query AND category = 'science' AND year >= 2024
LIMIT 10

-- With search quality tuning
SELECT * FROM embeddings WHERE vector NEAR $query
ORDER BY similarity() DESC LIMIT 20
WITH (mode = 'accurate', ef_search = 256)
-- Keyword search with BM25 scoring
SELECT * FROM articles WHERE content MATCH 'machine learning' LIMIT 10

-- Case-insensitive pattern matching
SELECT * FROM users WHERE name ILIKE '%john%'

-- Pattern with wildcards
SELECT * FROM products WHERE sku LIKE 'SKU-2024-%'

Hybrid Search (Vector + Text)

-- Dense vector + BM25 text with RRF fusion
SELECT * FROM docs
WHERE vector NEAR $query AND content MATCH 'neural networks'
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)

-- With custom scoring weights (the evaluated binding 'score' is injected
-- into each row's payload; `SELECT *, expr` is not valid syntax)
LET score = 0.7 * vector_score + 0.3 * bm25_score
SELECT *
FROM documents
WHERE vector NEAR $query AND content MATCH 'transformer'
ORDER BY score DESC
LIMIT 10

-- Dense + sparse vector fusion
SELECT * FROM docs
WHERE vector NEAR $dense_query AND vector SPARSE_NEAR $sparse_query
LIMIT 10 USING FUSION(strategy = 'rrf', k = 60)

-- Multi-vector fusion (text + image embeddings) — inline NEAR_FUSED form.
-- NEAR_FUSED supports only rrf, average, or maximum; 'weighted'/'rsf' are
-- rejected (ill-defined over homogeneous query vectors).
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_emb, $image_emb] USING FUSION 'rrf'
LIMIT 20

Filtering and Aggregation

-- Complex WHERE conditions
SELECT * FROM products
WHERE category IN ('electronics', 'computers')
  AND price BETWEEN 100 AND 500
  AND brand IS NOT NULL
  AND name NOT IN ('discontinued')
LIMIT 50

-- Aggregation with grouping
SELECT category, COUNT(*) AS total, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY avg_price DESC

-- Nested payload field access
SELECT payload.author, payload.metadata.language, COUNT(*)
FROM articles
GROUP BY payload.author, payload.metadata.language

-- Temporal filtering
SELECT * FROM events
WHERE timestamp > NOW() - INTERVAL '7 days'
ORDER BY timestamp DESC
LIMIT 100

-- Deduplicated results
SELECT DISTINCT category FROM products WHERE stock > 0

Graph Queries

-- Basic traversal
MATCH (person:Person)-[:AUTHORED]->(doc:Document)
WHERE similarity(doc.embedding, $query) > 0.7
RETURN person.name, doc.title, similarity() AS score
ORDER BY similarity() DESC
LIMIT 10

-- Multi-hop traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.department = 'engineering'
RETURN a.name, b.name, c.name
LIMIT 20

-- Variable-length paths (1 to 3 hops)
MATCH (start:Document)-[*1..3]->(end:Document)
WHERE start.topic = 'AI'
RETURN start.title, end.title

-- Graph mutations
INSERT NODE INTO knowledge (id = 1, payload = '{"name": "Alice", "_labels": ["Author"]}');
INSERT EDGE INTO knowledge (source = 1, target = 2, label = 'AUTHORED_BY')
  WITH PROPERTIES (confidence = 0.95, year = 2026);

-- Query edges
SELECT EDGES FROM knowledge WHERE source = 1;
SELECT EDGES FROM knowledge WHERE label = 'AUTHORED_BY' LIMIT 10;

DELETE EDGE 42 FROM knowledge;

Collection Management (DDL)

-- Create vector collection
CREATE COLLECTION documents (dimension = 768)

-- Create with explicit parameters
CREATE COLLECTION embeddings (dimension = 384, metric = 'euclidean')
  WITH (storage = 'sq8', m = 16, ef_construction = 200)

-- Create graph collection
CREATE GRAPH COLLECTION knowledge (dimension = 768, metric = 'cosine') SCHEMALESS

-- Create graph with typed schema
CREATE GRAPH COLLECTION ontology (dimension = 768) WITH SCHEMA (
  NODE Person (name: STRING, age: INTEGER),
  NODE Organization (name: STRING),
  EDGE WORKS_AT FROM Person TO Organization
)

-- Create metadata-only collection
CREATE METADATA COLLECTION tags

-- Drop collection safely
DROP COLLECTION IF EXISTS old_data

-- Create a secondary index on a metadata field
CREATE INDEX ON documents (category)

-- Drop a secondary index
DROP INDEX ON documents (category)

Data Manipulation (DML)

-- Insert a row
INSERT INTO documents (id, title, category) VALUES (1, 'Getting Started', 'tutorial')

-- Update fields
UPDATE documents SET status = 'published', updated = TRUE WHERE id = 1

-- Delete with mandatory WHERE
DELETE FROM documents WHERE id IN (10, 11, 12)
DELETE FROM logs WHERE level = 'debug' AND timestamp < 1000

Join Queries

-- Inner join
SELECT o.id, c.name, o.total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.total > 100

-- Self-join
SELECT a.title AS original, b.title AS related
FROM articles AS a
JOIN articles AS b ON a.parent_id = b.id
WHERE a.category = 'tech'

-- Join with USING shorthand
SELECT * FROM orders AS o
JOIN products AS p USING (product_id)

Set Operations

-- Combine results from two queries
SELECT id FROM recommended_docs WHERE user_id = 1
UNION
SELECT id FROM trending_docs WHERE score > 0.8

-- Find common items
SELECT item_id FROM user_1_likes
INTERSECT
SELECT item_id FROM user_2_likes

-- Exclude items
SELECT id FROM all_docs
EXCEPT
SELECT id FROM archived_docs

Advanced Scoring with LET

-- Weighted hybrid scoring
LET relevance = 0.6 * vector_score + 0.4 * bm25_score
SELECT id, title, relevance AS score
FROM articles
WHERE vector NEAR $query AND content MATCH 'deep learning'
ORDER BY relevance DESC
LIMIT 10

-- Chained bindings
LET base = 0.7 * vector_score + 0.3 * bm25_score
LET boosted = base * 1.5
SELECT * FROM docs
WHERE vector NEAR $v AND content MATCH 'AI'
ORDER BY boosted DESC
LIMIT 5

Training

-- Train product quantizer
TRAIN QUANTIZER ON embeddings WITH (m = 8, k = 256)

-- Train with higher compression
TRAIN QUANTIZER ON large_collection WITH (m = 16, k = 256)

Multi-Row INSERT and UPSERT

-- Multi-row insert (batch)
INSERT INTO products (id, title, price) VALUES
  (1, 'Laptop', 999.99),
  (2, 'Mouse', 29.99),
  (3, 'Keyboard', 79.99)

-- Upsert: insert or update if exists
UPSERT INTO products (id, title, price) VALUES (1, 'Laptop Pro', 1299.99)

-- Multi-row upsert
UPSERT INTO products (id, title, price) VALUES
  (1, 'Laptop Pro', 1299.99),
  (4, 'Monitor', 499.99)

Graph Edge and Node Operations

-- Query all edges in a graph collection
SELECT EDGES FROM knowledge LIMIT 100

-- Query outgoing edges from a specific node
SELECT EDGES FROM knowledge WHERE source = 1

-- Query edges by relationship type
SELECT EDGES FROM knowledge WHERE label = 'AUTHORED_BY'

-- Insert a node with payload into a graph collection
INSERT NODE INTO knowledge (id = 42, payload = '{"name": "Alice", "_labels": ["Person"]}')

Introspection

-- List all collections with their types
SHOW COLLECTIONS

-- Get collection metadata (dimension, metric, point count, type)
DESCRIBE COLLECTION documents
DESCRIBE documents

-- View query execution plan without running
EXPLAIN SELECT * FROM docs WHERE vector NEAR $v AND category = 'tech' LIMIT 10

For instrumented execution (plan + actual rows / time / per-node stats, plus feedback_calibration once the EMA loop is warm), use POST /query/explain with "analyze": true — see EXPLAIN ANALYZE above (EXPLAIN ANALYZE is not a parsed statement).

Window Functions

-- Rank top-3 results per category by vector similarity
SELECT id, title, category, similarity() AS score,
       ROW_NUMBER() OVER (PARTITION BY category ORDER BY similarity() DESC) AS rank_in_cat
FROM articles
WHERE vector NEAR $query
LIMIT 50

-- Dense rank per author by publication year (window ORDER BY accepts
-- columns or similarity(), not aggregate expressions like COUNT(*))
SELECT author, year, COUNT(*) AS papers,
       DENSE_RANK() OVER (PARTITION BY author ORDER BY year DESC) AS recency_rank
FROM publications
GROUP BY author, year

Agent Memory

-- Recall the 5 facts most similar to a query embedding (semantic memory)
SELECT * FROM _semantic_memory
WHERE vector NEAR $query
ORDER BY similarity() DESC
LIMIT 5

-- Recent events from the last day (episodic memory)
SELECT * FROM _episodic_memory
WHERE timestamp > NOW() - INTERVAL '1 day'
ORDER BY timestamp DESC
LIMIT 50

-- Recall high-confidence procedures matching a goal embedding.
-- With v1.15.0 ACT-R decay enabled (`with_activation_decay(0.5)`),
-- `confidence` here is the time-decayed effective confidence.
SELECT * FROM _procedural_memory
WHERE vector NEAR $goal AND confidence > 0.6
ORDER BY confidence DESC
LIMIT 10

Administration

-- Create a secondary index for fast metadata lookups
CREATE INDEX ON documents (category)

-- Drop an index
DROP INDEX ON documents (category)

-- Compute CBO statistics for query optimizer
ANALYZE documents

-- Delete all data from a collection (preserves the collection)
TRUNCATE documents

-- Modify collection settings
ALTER COLLECTION documents SET (auto_reindex = true)

-- Flush data to disk (WAL only)
FLUSH

-- Full flush (includes index serialization)
FLUSH FULL

-- Flush a specific collection
FLUSH documents

EBNF Grammar (v3.10.0)

Derived from the executable PEG grammar crates/velesdb-core/src/velesql/grammar.pest (the source of truth — regenerate this annex whenever the grammar changes).

(* ═══════════════════════════════════════════════════════ *)
(* Entry point                                            *)
(* ═══════════════════════════════════════════════════════ *)

query             = let_clause* (show_collections_stmt | describe_stmt
                    | explain_stmt | analyze_stmt | truncate_stmt
                    | alter_collection_stmt | flush_stmt
                    | match_query | select_edges_stmt
                    | compound_query | train_stmt
                    | create_index_stmt | create_collection_stmt
                    | drop_index_stmt | drop_collection_stmt
                    | insert_node_stmt | insert_edge_stmt
                    | delete_edge_stmt | delete_stmt
                    | insert_stmt | upsert_stmt | update_stmt) [";"] ;

(* ═══════════════════════════════════════════════════════ *)
(* Introspection statements (v3.4)                        *)
(* ═══════════════════════════════════════════════════════ *)

show_collections_stmt = "SHOW" "COLLECTIONS" ;
describe_stmt         = "DESCRIBE" ["COLLECTION"] identifier ;
explain_stmt          = "EXPLAIN" compound_query ;

(* ═══════════════════════════════════════════════════════ *)
(* LET bindings (v3.2)                                    *)
(* ═══════════════════════════════════════════════════════ *)

let_clause        = "LET" identifier "=" order_by_arithmetic ;

(* ═══════════════════════════════════════════════════════ *)
(* MATCH graph queries (v2.1)                             *)
(* ═══════════════════════════════════════════════════════ *)

match_query       = "MATCH" graph_pattern
                    [where_clause] return_clause
                    [order_by_clause] [limit_clause] ;

graph_pattern     = node_pattern (relationship_pattern node_pattern)* ;
node_pattern      = "(" [node_spec] ")" ;
node_spec         = [node_alias] [node_labels] [collection_annotation]
                    [node_properties] ;
node_alias        = identifier ;
node_labels       = ":" label_name (":" label_name)* ;
collection_annotation = "@" identifier ;
node_properties   = "{" property ("," property)* "}" ;
property          = identifier ":" property_value ;
property_value    = string | float | integer | boolean | null | parameter ;

relationship_pattern = rel_incoming | rel_outgoing | rel_undirected ;
rel_incoming      = "<-" [rel_spec] "-" ;
rel_outgoing      = "-" [rel_spec] "->" ;
rel_undirected    = "-" [rel_spec] "-" ;
rel_spec          = "[" [rel_details] "]" ;
rel_details       = [rel_alias] [rel_types] [rel_range] [node_properties] ;
rel_alias         = identifier ;
rel_types         = ":" rel_type_name ("|" rel_type_name)* ;
rel_range         = "*" [range_spec] ;
range_spec        = range_bound ".." [range_bound]
                  | ".." range_bound
                  | integer ;

return_clause     = "RETURN" return_item ("," return_item)* ;
return_item       = return_expr ["AS" identifier] ;
return_expr       = "similarity" "(" ")" | property_access | identifier | "*" ;
property_access   = identifier "." identifier ;

(* ═══════════════════════════════════════════════════════ *)
(* SELECT + compound queries                              *)
(* ═══════════════════════════════════════════════════════ *)

compound_query    = select_stmt (set_operator select_stmt)* ;
set_operator      = "UNION" "ALL" | "UNION" | "INTERSECT" | "EXCEPT" ;

select_stmt       = "SELECT" [distinct_modifier] select_list
                    "FROM" from_clause join_clause*
                    [where_clause]
                    [group_by_clause] [having_clause]
                    [order_by_clause]
                    [limit_clause] [offset_clause]
                    [with_clause]
                    [using_fusion_clause] ;

distinct_modifier = "DISTINCT" ;

(* SELECT list *)
select_list       = "*" | select_item ("," select_item)* ;
select_item       = "similarity" "(" ")" ["AS" identifier]
                  | window_item
                  | aggregate_function ["AS" identifier]
                  | qualified_wildcard
                  | column ["AS" identifier] ;
qualified_wildcard = identifier "." "*" ;
column            = identifier ("." identifier)* ;

(* Window functions (v1.13.0) *)
window_item       = window_function_name "(" ")" "OVER" "(" over_clause ")"
                    ["AS" identifier] ;
window_function_name = "ROW_NUMBER" | "DENSE_RANK" | "RANK" ;
over_clause       = [partition_by_clause] [window_order_by_clause] ;
partition_by_clause = "PARTITION" "BY" column ("," column)* ;
window_order_by_clause = "ORDER" "BY" window_order_by_item
                         ("," window_order_by_item)* ;
window_order_by_item = (order_by_similarity_bare | column) ["ASC" | "DESC"] ;

(* FROM clause — alias with AS or bare (non-reserved identifier) *)
from_clause       = identifier [["AS"] identifier] ;

(* JOIN clause (v2.0) *)
join_clause       = [join_type] "JOIN" identifier [["AS"] identifier]
                    (on_clause | using_clause) ;
join_type         = "LEFT" ["OUTER"]
                  | "RIGHT" ["OUTER"]
                  | "FULL" ["OUTER"]
                  | "INNER" ;
on_clause         = "ON" column_ref "=" column_ref ;
using_clause      = "USING" "(" identifier ("," identifier)* ")" ;
column_ref        = identifier "." identifier ;

(* ═══════════════════════════════════════════════════════ *)
(* WHERE clause                                           *)
(* ═══════════════════════════════════════════════════════ *)

where_clause      = "WHERE" or_expr ;
or_expr           = and_expr ("OR" and_expr)* ;
and_expr          = primary_expr ("AND" primary_expr)* ;
primary_expr      = "(" or_expr ")"
                  | not_expr
                  | graph_match_expr
                  | similarity_expr
                  | vector_fused_search
                  | sparse_vector_search
                  | vector_search
                  | match_expr
                  | in_expr
                  | between_expr
                  | like_expr
                  | is_null_expr
                  | contains_text_expr
                  | contains_expr
                  | geo_distance_expr
                  | geo_bbox_expr
                  | compare_expr ;

not_expr          = "NOT" primary_expr ;
graph_match_expr  = "MATCH" graph_pattern ;

(* Vector operations *)
vector_search     = "vector" "NEAR" vector_expr ;
vector_fused_search = "vector" "NEAR_FUSED" vector_array [fusion_clause_inline] ;
sparse_vector_search = "vector" "SPARSE_NEAR" sparse_vector_expr ["USING" string] ;
similarity_expr   = "similarity" "(" similarity_field "," vector_expr ")"
                    compare_op numeric_threshold ;

vector_expr       = vector_literal | parameter ;
vector_literal    = "[" (float | integer) ("," (float | integer))* "]" ;
vector_array      = "[" vector_expr ("," vector_expr)* "]" ;
sparse_vector_expr = sparse_literal | parameter ;
sparse_literal    = "{" sparse_entry ("," sparse_entry)* "}" ;
sparse_entry      = integer ":" float ;

fusion_clause_inline = "USING" "FUSION" string ["(" fusion_param_list ")"] ;
fusion_param_list = fusion_param ("," fusion_param)* ;
fusion_param      = identifier "=" (float | integer) ;

(* Comparisons *)
compare_expr      = where_column compare_op value ;
compare_op        = ">=" | "<=" | "<>" | "!=" | "=" | ">" | "<" ;
where_column      = identifier ("." identifier)* ;

(* Special conditions *)
in_expr           = where_column ["NOT"] "IN" "(" value ("," value)* ")" ;
between_expr      = where_column "BETWEEN" value "AND" value ;
like_expr         = where_column ("ILIKE" | "LIKE") string ;
is_null_expr      = where_column "IS" ["NOT"] "NULL" ;
match_expr        = where_column "MATCH" string ;

(* Array / substring containment *)
contains_text_expr = where_column "CONTAINS_TEXT" string ;
contains_expr     = where_column "CONTAINS" "ALL" "(" value ("," value)* ")"
                  | where_column "CONTAINS" "ANY" "(" value ("," value)* ")"
                  | where_column "CONTAINS" value ;

(* Geospatial predicates *)
geo_number        = float | integer ;
geo_distance_expr = "GEO_DISTANCE" "(" column "," geo_number "," geo_number ")"
                    compare_op geo_number ;
geo_bbox_expr     = "GEO_BBOX" "(" column "," geo_number "," geo_number ","
                    geo_number "," geo_number ")" ;

(* ═══════════════════════════════════════════════════════ *)
(* GROUP BY, HAVING (v2.0)                                *)
(* ═══════════════════════════════════════════════════════ *)

group_by_clause   = "GROUP" "BY" column ("," column)* ;

having_clause     = "HAVING" having_condition (("AND" | "OR") having_condition)* ;
having_condition  = aggregate_function compare_op value ;

aggregate_function = ("FIRST" | "COUNT" | "SUM" | "AVG" | "MIN" | "MAX")
                     "(" ("*" | "score" | column) ")" ;

(* ═══════════════════════════════════════════════════════ *)
(* ORDER BY (v2.0 + arithmetic v3.0)                      *)
(* ═══════════════════════════════════════════════════════ *)

order_by_clause   = "ORDER" "BY" order_by_item ("," order_by_item)* ;
order_by_item     = order_by_expr ["ASC" | "DESC"] ;
order_by_expr     = aggregate_function
                  | property_access
                  | order_by_arithmetic ;

order_by_arithmetic = oa_additive ;
oa_additive       = oa_multiplicative (("+" | "-") oa_multiplicative)* ;
oa_multiplicative = oa_atom (("*" | "/") oa_atom)* ;
oa_atom           = float | integer
                  | order_by_similarity
                  | order_by_similarity_bare
                  | "(" oa_additive ")"
                  | identifier ;

order_by_similarity      = "similarity" "(" similarity_field "," vector_expr ")" ;
order_by_similarity_bare = "similarity" "(" ")" ;

similarity_field  = (letter | "_") (letter | digit | "_" | ".")* ;

(* ═══════════════════════════════════════════════════════ *)
(* LIMIT, OFFSET, WITH, USING FUSION                      *)
(* ═══════════════════════════════════════════════════════ *)

limit_clause      = "LIMIT" integer ;
offset_clause     = "OFFSET" integer ;

with_clause       = "WITH" "(" with_option ("," with_option)* ")" ;
with_option       = identifier "=" with_value ;
with_value        = string | float | integer | boolean | identifier ;

using_fusion_clause = "USING" "FUSION" ["(" fusion_option_list ")"] ;
fusion_option_list = fusion_option ("," fusion_option)* ;
fusion_option     = identifier "=" (string | float | integer) ;

(* ═══════════════════════════════════════════════════════ *)
(* INSERT statement (v3.2)                                *)
(* ═══════════════════════════════════════════════════════ *)

insert_stmt       = "INSERT" "INTO" identifier
                    "(" identifier ("," identifier)* ")"
                    "VALUES" values_row ("," values_row)* ;
values_row        = "(" value ("," value)* ")" ;

(* ═══════════════════════════════════════════════════════ *)
(* UPDATE statement (v3.2)                                *)
(* ═══════════════════════════════════════════════════════ *)

update_stmt       = "UPDATE" identifier "SET"
                    assignment ("," assignment)*
                    [where_clause] ;
assignment        = identifier "=" value ;

(* ═══════════════════════════════════════════════════════ *)
(* DELETE statement (v3.3)                                *)
(* ═══════════════════════════════════════════════════════ *)

delete_stmt       = "DELETE" "FROM" identifier where_clause ;

(* ═══════════════════════════════════════════════════════ *)
(* Graph mutation statements (v3.3)                       *)
(* ═══════════════════════════════════════════════════════ *)

insert_edge_stmt  = "INSERT" "EDGE" "INTO" identifier
                    "(" edge_field ("," edge_field)* ")"
                    ["WITH" "PROPERTIES" "(" create_option_list ")"] ;
edge_field        = identifier "=" value ;

delete_edge_stmt  = "DELETE" "EDGE" value "FROM" identifier ;

(* ═══════════════════════════════════════════════════════ *)
(* DDL: CREATE / DROP COLLECTION, INDEX (v3.3 / v3.5)     *)
(* ═══════════════════════════════════════════════════════ *)

create_index_stmt = "CREATE" "INDEX" "ON" identifier "(" identifier ")" ;
drop_index_stmt   = "DROP" "INDEX" "ON" identifier "(" identifier ")" ;

create_collection_stmt = "CREATE" ["GRAPH" | "METADATA"] "COLLECTION" identifier
                         [create_body] ;
create_body       = "(" create_option_list ")" [create_suffix] ;
create_option_list = create_option ("," create_option)* ;
create_option     = identifier "=" create_option_value ;
create_option_value = string | float | integer | boolean | identifier ;

create_suffix     = "SCHEMALESS"
                  | "WITH" "SCHEMA" "(" schema_def_list ")"
                  | with_clause ;
schema_def_list   = schema_def ("," schema_def)* ;
schema_def        = node_type_def | edge_type_def ;
node_type_def     = "NODE" identifier "(" property_def_list ")" ;
edge_type_def     = "EDGE" identifier "FROM" identifier "TO" identifier ;
property_def_list = property_def ("," property_def)* ;
property_def      = identifier ":" type_name ;
type_name         = "STRING" | "INTEGER" | "FLOAT" | "BOOLEAN" | "VECTOR" ;

drop_collection_stmt = "DROP" "COLLECTION" ["IF" "EXISTS"] identifier ;

(* ═══════════════════════════════════════════════════════ *)
(* TRAIN QUANTIZER (v2.2)                                 *)
(* ═══════════════════════════════════════════════════════ *)

train_stmt        = "TRAIN" "QUANTIZER" "ON" identifier with_clause ;

(* ═══════════════════════════════════════════════════════ *)
(* Admin statements (v3.5-3.6)                            *)
(* ═══════════════════════════════════════════════════════ *)

analyze_stmt      = "ANALYZE" ["COLLECTION"] identifier ;
truncate_stmt     = "TRUNCATE" ["COLLECTION"] identifier ;
alter_collection_stmt = "ALTER" "COLLECTION" identifier "SET"
                        "(" create_option_list ")" ;
flush_stmt        = "FLUSH" ["FULL"] [identifier] ;

(* ═══════════════════════════════════════════════════════ *)
(* UPSERT (v3.5)                                          *)
(* ═══════════════════════════════════════════════════════ *)

upsert_stmt       = "UPSERT" "INTO" identifier
                    "(" identifier ("," identifier)* ")"
                    "VALUES" values_row ("," values_row)* ;

(* ═══════════════════════════════════════════════════════ *)
(* Graph edge/node queries (v3.5)                         *)
(* ═══════════════════════════════════════════════════════ *)

select_edges_stmt = "SELECT" "EDGES" "FROM" identifier
                    [where_clause] [limit_clause] ;
insert_node_stmt  = "INSERT" "NODE" "INTO" identifier
                    "(" edge_field ("," edge_field)* ")" ;

(* ═══════════════════════════════════════════════════════ *)
(* Subqueries (v3.2)                                      *)
(* ═══════════════════════════════════════════════════════ *)

subquery_expr     = "(" "SELECT" select_list "FROM" identifier
                    [where_clause] [group_by_clause] [having_clause]
                    [limit_clause] ")" ;

(* ═══════════════════════════════════════════════════════ *)
(* Values and literals                                    *)
(* ═══════════════════════════════════════════════════════ *)

value             = subquery_expr | temporal_expr | float | integer
                  | string | boolean | null | parameter ;

temporal_expr     = temporal_arithmetic | now_function | interval_expr ;
temporal_arithmetic = (now_function | interval_expr) ("+" | "-")
                      (now_function | interval_expr) ;
now_function      = "NOW" "(" ")" ;
interval_expr     = "INTERVAL" string ;

string            = "'" (char | "''")* "'" ;
integer           = ["-"] digit+ ;
float             = ["-"] digit+ "." digit+ ;
boolean           = "TRUE" | "FALSE" ;
null              = "NULL" ;
parameter         = "$" identifier ;

identifier        = quoted_identifier | regular_identifier ;
regular_identifier = (letter | "_") (letter | digit | "_")* ;
quoted_identifier = "`" backtick_inner "`" | '"' doublequote_inner '"' ;
backtick_inner    = (!"`" ANY)+ ;
doublequote_inner = (doublequote_escape | !('"') ANY)* ;
doublequote_escape = '""' ;

(* ═══════════════════════════════════════════════════════ *)
(* Whitespace and comments                                *)
(* ═══════════════════════════════════════════════════════ *)

WHITESPACE        = " " | "\t" | "\r" | "\n" ;
COMMENT           = "--" (!"\n" ANY)* ;

License

VelesDB Core License 1.0