README.md
May 3, 2026 · View on GitHub
EMBEDDED KV WITH JSON QUERY AND SEMANTIC VECTOR SEARCH!
A Go embedded KV database with dot-notation JSON queries, inline vector embeddings, cosine top-K search, and AOF persistence
Table of Contents
Use Cases
ToriiDB is a four-in-one embedded database — a single Go binary providing:
- KV cache — Redis-style commands and data model
- Document DB — MongoDB-style JSON field queries and infix expressions
- Vector DB — OpenAI embeddings inlined on each key, cosine top-K semantic retrieval
- Local persistence — AOF append log + JSON snapshots, with an in-memory parsed cache
No extra service, no secondary index engine, no separate vector store; embeddings live inline on each key and flow through the same AOF and compaction paths as the KV values. Aimed at Go projects that want to replace a Redis + MongoDB + Pinecone stack with a single import, not run one behind a network.
Best suited for single-process, single-host embedded scenarios with up to ~10k entries per database:
- CLI tools — local config and state storage
- Prototypes / MVPs — skip the DB setup and embed directly
- Desktop apps and IoT devices — embedded storage, cross-compile friendly
- LINE bots and Discord bots — conversation state and user data
- AI personal assistants — memory, chat history, preferences, context cache, semantic recall
- Small-scale RAG and semantic search — documents, FAQs, notes with inline embeddings
- Single-host API servers — sessions, tokens, config, and other lightweight storage
- Personal blogs and small CMS backends — up to a few thousand posts or users
- Small-to-mid projects that don't warrant Redis / MongoDB / SQLite / a dedicated vector DB
Not suitable for: high-concurrency online services, datasets larger than memory, cross-machine access, multi-process writers, heavy queries that require index acceleration, or vector corpora beyond ~10k entries where linear scan becomes a bottleneck.
A MongoDB export utility is planned; the data model maps across natively.
Features
go get github.com/agenvoy/toriidb· Documentation
Redis × Mongo × Vector Hybrid DX
Redis-style KV commands, MongoDB-style field predicates, and semantic vector search behind a single command surface — cache, document lookup, and similarity retrieval in one API.
Inline Per-Key Vector Embeddings
SET ... VECTOR attaches an OpenAI embedding directly to the key — no separate index, no sidecar store; the vector rides alongside the value through AOF and compaction.
Content-Addressed Embedding Cache
Embeddings are cached under __torii:embed:<sha256(model|dim|text)> and transparently reused — identical texts never hit the OpenAI API twice, even across restarts.
Top-K Cosine Search
VSEARCH runs a linear scan with a min-heap, filters by glob MATCH, honours LIMIT, skips expired and dimension-mismatched entries, and reuses the cached query embedding.
16 Databases with Lazy Replay
Redis-compatible DB 0-15 namespaces, each with its own memory and AOF; replay only happens on first access, keeping startup free.
Dual Persistence
AOF appends every write in order while each key also lands as a JSON snapshot under an MD5 three-level directory for external tooling.
In-Place JSON Field Ops
GET / SET / DEL / INCR operate on nested fields via dot-notation without read-modify-write round trips.
Infix Query Expression
QUERY supports AND / OR / NOT with parentheses, exposed both as composable structs and as string expressions for untrusted input.
Auto-Chunked Parallel Scan
FIND / QUERY shards scans past 1024 entries across goroutines; complex predicates over 10k entries benchmark at ~3ms on Apple M5.
Split-Lock Parsed Cache
JSON writes warm a parsed cache; reads take only an RLock and hit the cache, eliminating repeated json.Unmarshal on hot paths.
Independent Sessions
A single Store spawns Sessions that carry their own db index so concurrent goroutines can switch databases without interfering.
Size-Triggered Compaction
AOF auto-compacts inline when it doubles its baseline; Close() drains pending async embeds via sync.WaitGroup then compacts all active DBs in parallel.
Reserved Internal Namespace
Scan commands (KEYS / FIND / QUERY / VSEARCH) skip the __torii:* prefix so internal keys like the embedding cache never leak to users.
Automatic Type Detection
Values are classified as JSON / String / Int / Float / Bool / Date on write and queryable via TYPE.
Architecture
graph TB
Client[REPL / Embed API] --> Exec[Exec Router]
Exec --> Core[core - db index + embedder]
Core --> Store[Store - 16 DBs]
Core --> Filter[Filter Engine]
Core --> Vector[Vector Engine]
Store --> Memory[In-Memory Map]
Store --> AOF[AOF Append Log]
Store --> Snapshot[MD5 JSON Snapshot]
Filter --> Scan[Chunked Parallel Scan]
Vector --> TopK[Top-K Cosine Scan]
Vector --> EmbedCache[Embedding Cache]
Vector --> OpenAI[OpenAI Embedding Client]
Scan --> Memory
TopK --> Memory
File Structure
ToriiDB/
├── cmd/
│ └── test/
│ └── main.go # REPL entry point
├── core/
│ ├── openai/ # text-embedding-3-small client (singleton)
│ ├── store/ # storage engine and command impls
│ │ ├── vector.go # float32 codec + cosine + isInternal
│ │ ├── vcache.go # __torii:embed:* cache
│ │ ├── vsearch.go # top-K min-heap cosine scan
│ │ ├── vsim.go # VSim + VGet
│ │ └── filter/ # query expression and operators
│ └── utils/ # shared helpers
├── .env # OPENAI_API_KEY (optional, for vector features)
├── go.mod
├── Makefile
└── README.md
Version History
| Version | Date | Highlights |
|---|---|---|
| v0.5.0 | 2026-04-18 | Semantic vector search — SET ... VECTOR inline embeddings, VSEARCH top-K cosine, VSIM / VGET, content-hash embedding cache under __torii:embed:*, AOF vector persistence |
| v0.4.4 | 2026-04-16 | Parsed JSON cache in Entry to eliminate repeated Unmarshal on hot paths; split-lock read/write APIs |
| v0.4.3 | 2026-04-10 | AOF compaction switched from line count to byte size with 1MB floor |
| v0.4.2 | 2026-04-10 | Inline AOF compaction triggered when inflation ratio exceeds 2x |
| v0.4.1 | 2026-04-10 | Shared core struct enabling Session, lazy DB replay, parallel compaction, custom storage path |
| v0.4.0 | 2026-04-09 | NE operator, dedicated filter package with infix expression parser (AND/OR/NOT), concurrent slice scan |
| v0.3.0 | 2026-04-08 | FIND / QUERY commands with EQ/GT/GE/LT/LE/LIKE operators, LIMIT clause, time-based sorting |
| v0.2.0 | 2026-04-08 | Document field ops (GetField/SetField/DelField), KEYS glob matching, INCR with dot-notation nested fields |
| v0.1.0 | 2026-04-08 | Initial release — REPL, KV commands (SET/GET/DEL/EXIST/TYPE), AOF persistence, TTL, 16 databases (DB 0-15) |
License
This project is licensed under the MIT LICENSE.
Author
邱敬幃 Pardn Chiu
Stars
©️ 2026 邱敬幃 Pardn Chiu