Architecture
June 10, 2026
A bird's-eye view of the system, with pointers into the code.
Layered diagram
┌─────────────────────────────────────────────────────────┐
│ REPL (src/main.rs, src/repl/)                           │
│ rustyline editor, prompt, history, input validation     │
└───────────────────────────┬─────────────────────────────┘
                            │ raw string lines
        ┌───────────────────┴─────────────┐
        │                                 │
        ▼                                 ▼
┌──────────────────┐                 ┌────────────────────┐
│ Meta dispatch    │                 │ SQL dispatch       │
│ src/meta_command │                 │ src/sql/mod.rs     │
│ .exit/.open/.save│                 │ .tables at fn top  │
└────────┬─────────┘                 └──────┬─────────────┘
         │ mutates Database                 │
         │ via pager::{open,save}           │ parses with sqlparser
         │                                  │ routes by Statement kind
         │                                  ▼
         │                           ┌───────────────────────┐
         │                           │ Parser layer          │
         │                           │ src/sql/parser/       │
         │                           │ create / insert /     │
         │                           │ select                │
         │                           └──────┬────────────────┘
         │                                  │ clean query structs
         │                                  ▼
         │                           ┌───────────────────────┐
         │                           │ Executor              │
         │                           │ src/sql/executor.rs   │
         │                           │ eval_expr, exec_*     │
         │                           └──────┬────────────────┘
         │                                  │ reads + mutates
         │                                  ▼
         └─────────────────────┬────────────────────────────┐
                               ▼                            │
                    ┌───────────────────────────┐           │
                    │ In-memory data model      │           │
                    │ src/sql/db/               │           │
                    │   database.rs             │           │
                    │   table.rs                │           │
                    └──────────┬────────────────┘           │
                               │ after write statements ────┘
                               │ auto-save triggers
                               ▼
                    ┌───────────────────────────┐
                    │ Pager + file format       │
                    │ src/sql/pager/            │
                    │   mod.rs (high-level)     │
                    │   pager.rs (cache+diff)   │
                    │   file.rs (raw I/O)       │
                    │   page.rs, header.rs      │
                    └──────────┬────────────────┘
                               │ one .sqlrite file
                               ▼
                    ┌───────────────────────────┐
                    │ Disk                      │
                    │ 4 KiB pages               │
                    │ page 0 = header           │
                    │ page 1+ = typed payload   │
                    └───────────────────────────┘
Workspace layout
The repo is a Cargo workspace. The engine is the root crate; everything else lives in a sibling directory.
| Crate / directory | Role |
|---|---|
| Root (`./`): `sqlrite-engine` on crates.io, `use sqlrite::…` in code | The engine. Library + the REPL `[[bin]]`. Library surface: `Connection`, `Statement`, `Rows`, `Value`, `Database`. |
| `sqlrite-ffi/` | C FFI shim. Builds `libsqlrite_c.{so,dylib,dll}` + cbindgen-generated `sqlrite.h`. Used by the Go SDK + by anyone wanting to `dlopen` SQLRite from another language. Phase 5b. |
| `sqlrite-ask/` | Pure-Rust LLM transport adapter (Anthropic-first; OpenAI / Ollama follow-ups pending). Takes a `&str` schema dump + `&str` question, returns generated SQL. No engine dep: the engine integration lives in `sqlrite-engine`'s `ask` feature. Phase 7g.1 + the 7g.2 dep-direction flip. |
| `sqlrite-mcp/` | Model Context Protocol server binary. Hand-rolled JSON-RPC 2.0 over stdio. Seven tools (`list_tables`, `describe_table`, `query`, `execute`, `schema_dump`, `vector_search`, `ask`). Phase 7h + 7g.8. See mcp.md. |
| `sdk/python/` | PyO3 bindings: `sqlrite` on PyPI. Phase 5c. |
| `sdk/nodejs/` | napi-rs bindings: `@joaoh82/sqlrite` on npm. Phase 5d. |
| `sdk/go/` | cgo wrapper over `sqlrite-ffi`. `database/sql` driver. Phase 5e. |
| `sdk/wasm/` | wasm-bindgen build: `@joaoh82/sqlrite-wasm` on npm. Phase 5g. (Not a workspace member; wasm32 target only.) |
| `desktop/src-tauri/` | Tauri 2.0 + Svelte 5 desktop app. Embeds the engine directly. Phase 2.5. |
The engine never depends on the SDK crates; the SDK crates each depend on the engine via path-dep. sqlrite-mcp depends on the engine (default-features = false) plus its own optional ask feature that re-enables the engine's ask feature, which pulls sqlrite-ask transitively. The whole graph is acyclic — see the 7g.2 dep-direction flip retrospective in roadmap.md for the work that made it so.
Module map (engine)
| Module | What it owns |
|---|---|
| `src/main.rs` | Binary entry: init `env_logger`, build rustyline editor, run the REPL loop, route input to either the meta or SQL dispatcher |
| `src/lib.rs` | Library entry: re-exports `Connection`, `Statement`, `Rows`, `Value`, `Database`, `process_command` / `process_command_with_render` / `CommandOutput`, the `ask` module (when feature on), etc. This is the stable public surface every SDK binds against |
| `src/connection.rs` | `Connection` / `Statement` / `Rows` / `Row` / `OwnedRow` / `FromValue`: the Phase 5a public API. SQLR-23 added the per-connection `prepare_cached` LRU + the bound `Statement::query_with_params` / `execute_with_params` (parsed AST cached on the `Statement`; `?` placeholders substituted at execute time without re-running sqlparser). |
| `src/sql/params.rs` | SQLR-23: placeholder rewriter (`?` → `?N` source-order numbering at prepare time) and AST substitution pass that lowers `&[Value]` into the same in-band literal shapes the executor already recognizes (scalars become `Expr::Value(...)`, vectors become bracket-array `Expr::Identifier`). Used by `Statement::{query,execute}_with_params`. |
| `src/ask/` | Engine integration with `sqlrite-ask`: `ConnectionAskExt`, `ask_with_database`, the `schema::dump_schema_for_database` helper. The `schema` submodule is always available; the rest is gated behind the `ask` feature. Phase 7g.2. |
| `src/repl/` | `REPLHelper` (implements rustyline's `Helper` trait: completer, hinter, highlighter, validator). Also `get_config` and `get_command_type` |
| `src/meta_command/` | `MetaCommand` enum, parsing (`.open FOO.db` → `Open(PathBuf)`, `.ask <Q>` → `Ask(String)`), and dispatch to persistence + ask helpers |
| `src/error.rs` | `SQLRiteError` (thiserror-derived), `Result<T>` alias, hand-rolled `PartialEq` that handles `io::Error` |
| `src/sql/mod.rs` | `SQLCommand` classifier, `process_command` / `process_command_with_render`: the top-level entries that parse a SQL string and route to the right executor. SQLR-23 added `process_ast_with_render(stmt, db)` for callers (the `Statement` API) that already hold a parsed AST and want to skip the sqlparser walk. Also triggers auto-save. Never writes to stdout: for `SELECT` statements, the rendered prettytable comes back inside `CommandOutput.rendered` so the REPL can print it (the engine itself doesn't); the SDK / FFI / MCP callers ignore it. |
| `src/sql/parser/` | Takes a `sqlparser::ast::Statement` and produces internal query structs (`CreateQuery`, `InsertQuery`, `SelectQuery`) with only the fields we actually use |
| `src/sql/executor.rs` | `execute_select`, `execute_delete`, `execute_update`, plus the shared expression evaluator `eval_expr` / `eval_predicate`. Also the bounded-heap top-k optimization (Phase 7c), the HNSW probe shortcut (Phase 7d.2), and the FTS probe shortcut (Phase 8b). |
| `src/sql/db/database.rs` | `Database`: table map + optional `source_path` + optional long-lived `Pager` + transaction-snapshot state |
| `src/sql/db/table.rs` | `Table`, `Column`, `Row`, `Value` (in-memory storage incl. `VECTOR` + `JSON` columns); helpers for row iteration (`rowids`, `get_value`, `set_value`, `delete_row`, `insert_row`) |
| `src/sql/hnsw.rs` | Standalone HNSW algorithm: insert / search / layer assignment / beam search. Phase 7d.1. |
| `src/sql/fts/` | Full-text search: standalone tokenizer, BM25 scorer, and in-memory `PostingList` inverted index. Wired into the executor via the `fts_match` / `bm25_score` scalar functions and the `try_fts_probe` optimizer hook. Phase 8a-8b; persistence in 8c. See docs/fts.md. |
| `src/sql/json.rs` | JSON column type + path-extraction functions (`json_extract`, `json_type`, `json_array_length`, `json_object_keys`). Phase 7e. |
| `src/sql/pragma.rs` | `PRAGMA` dispatcher (SQLR-13). `try_parse_pragma` peeks at the SQL token stream before sqlparser sees it and routes any `PRAGMA …` shape to `execute_pragma`. First pragma wired up: `auto_vacuum` (read + set, with `OFF` / `NONE` to disable). Add new pragmas as a single arm in `execute_pragma`. |
| `src/sql/pager/` | On-disk file format and I/O. See file-format.md and pager.md for details. WAL + checkpointer + shared/exclusive lock modes (Phase 4a-4e) live here. |
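To make the `?` → `?N` numbering from the `src/sql/params.rs` row concrete, here is a minimal character-level sketch. The function name `number_placeholders` is hypothetical, and scanning raw characters is an assumption made for brevity; nothing above says how the engine's rewriter walks its input. The sketch skips single-quoted string literals and does not handle comments or quoted identifiers.

```rust
/// Rewrites each bare `?` placeholder to `?1`, `?2`, ... in source order
/// and returns the rewritten SQL plus the number of placeholders found.
/// Hypothetical helper for illustration only.
fn number_placeholders(sql: &str) -> (String, usize) {
    let mut out = String::with_capacity(sql.len() + 8);
    let mut count = 0usize;
    let mut in_string = false;
    for ch in sql.chars() {
        match ch {
            // Toggle on every single quote. A doubled '' escape toggles
            // twice, so we correctly remain inside the literal.
            '\'' => {
                in_string = !in_string;
                out.push(ch);
            }
            // Only placeholders outside string literals get a number.
            '?' if !in_string => {
                count += 1;
                out.push('?');
                out.push_str(&count.to_string());
            }
            _ => out.push(ch),
        }
    }
    (out, count)
}
```

Calling `number_placeholders("SELECT * FROM t WHERE a = ? AND b = ?")` yields the rewritten string with `?1` and `?2` and a count of 2, which a caller could use to check the length of the bound parameter slice.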
Flow of a SQL statement
Take `UPDATE users SET age = age + 1 WHERE name = 'bob';`:
1. REPL reads a line, `repl::get_command_type` sees it doesn't start with `.`, so it's a `SQLCommand`.
2. `process_command` (`src/sql/mod.rs`) asks `sqlparser` to parse the string into a `Statement`. It sees `Statement::Update(_)`.
3. Before dispatching, it records `is_write_statement = true` so auto-save runs later.
4. It calls `executor::execute_update` (`src/sql/executor.rs`).
5. The executor destructures `Update { table, assignments, selection, .. }`, validates that the assignment targets exist on the table, then enters two passes:
   - Read pass: walk every rowid in the table, evaluate `selection` (the `WHERE`), evaluate the RHS of each assignment expression under the matched row's context, collect `(rowid, [(col, new_value)])` tuples.
   - Write pass: take `&mut` on the table and call `set_value(col, rowid, new_value)` for each planned write.
6. `set_value` enforces the declared column type and the `UNIQUE` constraint before touching storage, updates the `BTreeMap` row storage, and refreshes any index.
7. Control returns to `process_command`. Since `is_write_statement` is true and `db.source_path` is `Some`, it calls `pager::save_database(db, path)`.
8. `save_database` takes the long-lived `Pager` off the Database, re-serializes every table with `bincode`, stages the resulting pages into the pager, and commits. Commit diffs staged bytes against the pager's `on_disk` snapshot and only writes pages whose bytes actually changed.
Steps 1–7 are purely in-memory; step 8 is the only disk contact, and after the first save it writes only the changed pages rather than the whole file.
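The two-pass shape of the update (plan under shared access, then apply under `&mut`) can be sketched on a toy table. Everything here is an assumption made for illustration: `ToyTable` and `bump_age` are hypothetical names, the flat row tuple stands in for the real `Table`, and type and `UNIQUE` checks are omitted.

```rust
use std::collections::BTreeMap;

/// Toy table: rowid -> (name, age). Hypothetical stand-in for `Table`.
type ToyTable = BTreeMap<i64, (String, i64)>;

/// Applies `SET age = age + 1 WHERE name = <target>` in two passes and
/// returns the number of rows changed.
fn bump_age(table: &mut ToyTable, target: &str) -> usize {
    // Read pass: shared access only. Evaluate the predicate and the
    // right-hand side against the current row, and record planned writes.
    let planned: Vec<(i64, i64)> = table
        .iter()
        .filter(|(_, (name, _))| name.as_str() == target)
        .map(|(rowid, (_, age))| (*rowid, age + 1))
        .collect();

    // Write pass: take mutable access and apply each planned write.
    for (rowid, new_age) in &planned {
        if let Some(row) = table.get_mut(rowid) {
            row.1 = *new_age;
        }
    }
    planned.len()
}
```

Splitting the work this way means every right-hand side is evaluated against the pre-update state, and the borrow checker never sees a shared and a mutable borrow of the table at the same time.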
What lives where — by concern
- Parsing: `src/sql/parser/` + upstream `sqlparser` crate. Converts SQL strings → ASTs → simplified internal structs.
- Planning: intentionally not a thing yet. Execution is direct; a query plan is implicit in the executor code path.
- Execution: `src/sql/executor.rs` walks the internal structs, drives reads against `Table`, and writes via `Table::set_value` / `insert_row` / `delete_row`.
- Storage (in memory): `src/sql/db/table.rs`, a column-oriented `BTreeMap<rowid, value>` per column; indexes as separate `BTreeMap`s on UNIQUE/PK columns.
- Storage (on disk): `src/sql/pager/`, with 4 KiB pages, real B-Tree per table (Phase 3d), secondary indexes (3e), HNSW indexes as their own page tree (7d.3), FTS posting lists as their own page tree (8c, on-demand v5 file format), WAL + crash-safe checkpointer (4c-4d), shared/exclusive lock modes (4e).
- Persistence policy: `src/sql/mod.rs::process_command` for when to auto-save; `src/sql/pager/mod.rs::save_database` for how. Inside a `BEGIN` / `COMMIT` block, auto-save is suppressed and changes accumulate against an in-memory snapshot. `COMMIT` flushes the whole batch in one WAL frame; `ROLLBACK` restores the snapshot.
- Error handling: `src/error.rs` defines a single `SQLRiteError` enum used throughout, with `#[from]` conversions from `ParserError` and `io::Error`.
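The "how" half of the persistence policy, where a commit writes only the pages whose bytes changed, can be sketched as follows. `ToyPager` and its fields are hypothetical and modeled loosely on the `on_disk` snapshot described earlier. The sketch returns the dirty page numbers instead of doing file I/O, and it ignores pages that disappear between saves.

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096;
type Page = [u8; PAGE_SIZE];

/// Hypothetical stand-in for the pager: `on_disk` mirrors what was last
/// written, `staged` holds the pages produced by the current save.
struct ToyPager {
    on_disk: BTreeMap<u32, Page>,
    staged: BTreeMap<u32, Page>,
}

impl ToyPager {
    /// Returns the page numbers that would need a physical write, then
    /// promotes the staged pages into the snapshot.
    fn commit(&mut self) -> Vec<u32> {
        // A page is dirty if it is new or its bytes differ from the snapshot.
        let dirty: Vec<u32> = self
            .staged
            .iter()
            .filter(|(no, bytes)| self.on_disk.get(*no) != Some(*bytes))
            .map(|(no, _)| *no)
            .collect();
        // A real pager would write the `dirty` pages to the file here.
        let staged = std::mem::take(&mut self.staged);
        self.on_disk.extend(staged);
        dirty
    }
}
```

Under this scheme the first save reports every page as dirty, while a later save that re-serializes the whole database but changes one row reports only the pages that row touches.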
What's deliberately missing
The roadmap has shipped far enough that the original "deliberately missing" list mostly turned into shipped features. What's still left:
- No query optimizer beyond the bounded-heap top-k pass for KNN (Phase 7c) and the HNSW probe shortcut (7d.2). Equality-on-PK probes are direct; everything else is a table scan. Joins use plain nested-loop (O(N×M) per join level); hash / merge joins on equi-join shapes are a future increment.
- No network layer. SQLRite is embedded-only. The closest thing is the `sqlrite-mcp` server, which is stdio (not network). A real wire protocol isn't on the roadmap.
- No streaming row cursor. `Rows` is currently backed by an eager `Vec` (Phase 5a). The `Rows::next` API is shaped to support a real cursor; the swap is deferred to 5a.2.
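For readers unfamiliar with the bounded-heap top-k pass mentioned in the first bullet, the general technique looks roughly like this. It is a generic sketch, not the engine's code: `Scored` and `top_k` are hypothetical names, and distances are assumed to be precomputed `f32` values paired with rowids.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// (distance, rowid) pair ordered by distance so it can sit in a max-heap.
struct Scored(f32, i64);

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives f32 a total order, so NaN cannot break the heap.
        self.0.total_cmp(&other.0).then(self.1.cmp(&other.1))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

/// Returns the `k` rowids with the smallest distance, nearest first,
/// holding at most `k + 1` candidates in memory at any time.
fn top_k(rows: impl IntoIterator<Item = (i64, f32)>, k: usize) -> Vec<i64> {
    let mut heap: BinaryHeap<Scored> = BinaryHeap::new();
    for (rowid, dist) in rows {
        heap.push(Scored(dist, rowid));
        if heap.len() > k {
            heap.pop(); // evict the current worst (largest distance)
        }
    }
    // into_sorted_vec is ascending, so the nearest row comes first.
    heap.into_sorted_vec().into_iter().map(|s| s.1).collect()
}
```

Compared with sorting every row, this keeps memory proportional to `k` and costs O(N log k) comparisons over N scanned rows.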
Everything else from the original "deliberately missing" list (transactions, file locking, concurrency, embedding API, FFI, language SDKs, WASM, AI extensions) has shipped. See roadmap.md for the full ledger.