README.md

June 21, 2026

With great retrieval comes great legislation
— Uncle Ben (probably)

Getting Started · Overview · Repository Layout · License

Warning

Canary is still under heavy development.

Canary is an AI legal assistant for Spanish law (for now). It parses BOE (Boletín Oficial del Estado) documents into navigable temporal fragments and indexes them with poly-vector matryoshka embeddings for legal retrieval.

The point is not just to find matching text. Canary keeps the shape of each document, the links between norms, and the valid version of the law inside the retrieval model itself. That makes it possible to answer with the right article, the right surrounding context, and the right legal version for the question at hand.

It is built for legislators, lawyers, public sector practitioners, law and political science students, and anyone else who needs something better than keyword search over legal text.

Overview

Canary treats law as structured material.

It parses BOE documents into legal fragments such as libros, títulos, capítulos, secciones, artículos, and párrafos. Those fragments keep their place in the hierarchy, so the system can move up, down, and sideways through the document instead of treating every paragraph as an isolated chunk.
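As a rough illustration of what "up, down, and sideways" can mean in practice, here is a minimal sketch of an arena-backed fragment tree. The `Kind`, `Fragment`, and `Document` types, their methods, and the sample labels are all hypothetical and are not taken from the `crates/parser` code.

```rust
// Hypothetical types for illustration only; not Canary's actual parser model.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Libro,
    Titulo,
    Capitulo,
    Seccion,
    Articulo,
    Parrafo,
}

#[derive(Debug)]
struct Fragment {
    kind: Kind,
    label: String,
    parent: Option<usize>,
    children: Vec<usize>,
}

#[derive(Default)]
struct Document {
    fragments: Vec<Fragment>,
}

impl Document {
    /// Adds a fragment under `parent` and returns its index in the arena.
    fn add(&mut self, kind: Kind, label: &str, parent: Option<usize>) -> usize {
        let id = self.fragments.len();
        self.fragments.push(Fragment { kind, label: label.to_string(), parent, children: Vec::new() });
        if let Some(p) = parent {
            self.fragments[p].children.push(id);
        }
        id
    }

    /// Up: ancestors ordered from the nearest parent to the root.
    fn ancestors(&self, id: usize) -> Vec<usize> {
        let mut out = Vec::new();
        let mut current = self.fragments[id].parent;
        while let Some(p) = current {
            out.push(p);
            current = self.fragments[p].parent;
        }
        out
    }

    /// Up, with a filter: the nearest enclosing fragment of a given kind.
    fn enclosing(&self, id: usize, kind: Kind) -> Option<usize> {
        self.ancestors(id).into_iter().find(|&a| self.fragments[a].kind == kind)
    }

    /// Sideways: the other children of the same parent.
    fn siblings(&self, id: usize) -> Vec<usize> {
        match self.fragments[id].parent {
            Some(p) => self.fragments[p].children.iter().copied().filter(|&c| c != id).collect(),
            None => Vec::new(),
        }
    }

    /// Root-to-fragment path rendered as a breadcrumb string.
    fn path(&self, id: usize) -> String {
        let mut ids = self.ancestors(id);
        ids.reverse();
        ids.push(id);
        ids.iter().map(|&i| self.fragments[i].label.as_str()).collect::<Vec<_>>().join(" > ")
    }
}

/// Builds a tiny made-up document and returns it with the index of one párrafo.
fn sample_document() -> (Document, usize) {
    let mut doc = Document::default();
    let titulo = doc.add(Kind::Titulo, "Título I", None);
    let capitulo = doc.add(Kind::Capitulo, "Capítulo II", Some(titulo));
    let articulo = doc.add(Kind::Articulo, "Artículo 1", Some(capitulo));
    doc.add(Kind::Articulo, "Artículo 2", Some(capitulo));
    let parrafo = doc.add(Kind::Parrafo, "Párrafo 1", Some(articulo));
    doc.add(Kind::Parrafo, "Párrafo 2", Some(articulo));
    (doc, parrafo)
}

fn main() {
    let (doc, parrafo) = sample_document();
    println!("path: {}", doc.path(parrafo));
    println!("siblings: {:?}", doc.siblings(parrafo));
    println!("enclosing article: {:?}", doc.enclosing(parrafo, Kind::Articulo));
}
```

Storing parent and child indices on each fragment is one simple way to let a retrieval hit expand to its enclosing article or neighboring paragraphs without re-parsing the document.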

It also keeps temporal validity in the model. If a provision changes over time, the system can distinguish what is currently in force from what used to be in force, and can retrieve the version that actually matters for a question.
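One way to picture temporal validity is a list of versions per provision, each with a validity interval, queried by date. The sketch below assumes a half-open interval (`valid_from` inclusive, `valid_to` exclusive); the `Date` and `Version` types, the `version_on` function, and the sample dates and texts are invented for illustration and do not describe Canary's schema.

```rust
// Hypothetical model of versioned provisions; names and dates are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Date {
    // Field order matters: the derived ordering compares year, then month, then day.
    year: u16,
    month: u8,
    day: u8,
}

fn date(year: u16, month: u8, day: u8) -> Date {
    Date { year, month, day }
}

#[derive(Debug)]
struct Version {
    text: &'static str,
    valid_from: Date,       // inclusive
    valid_to: Option<Date>, // exclusive; None means still in force
}

/// Returns the version that was in force on the given date, if any.
fn version_on(versions: &[Version], on: Date) -> Option<&Version> {
    versions.iter().find(|v| {
        let not_ended = match v.valid_to {
            Some(end) => on < end,
            None => true,
        };
        v.valid_from <= on && not_ended
    })
}

/// Two made-up versions of the same provision.
fn sample_versions() -> Vec<Version> {
    vec![
        Version { text: "original wording", valid_from: date(2015, 1, 1), valid_to: Some(date(2020, 7, 1)) },
        Version { text: "amended wording", valid_from: date(2020, 7, 1), valid_to: None },
    ]
}

fn main() {
    let versions = sample_versions();
    for on in [date(2014, 6, 1), date(2018, 3, 15), date(2020, 7, 1)] {
        match version_on(&versions, on) {
            Some(v) => println!("{:?}: {}", on, v.text),
            None => println!("{:?}: not in force", on),
        }
    }
}
```

With half-open intervals, the day an amendment takes effect belongs to the new version, so consecutive versions never overlap.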

The same idea applies to legal references. Modifications, repeals, interpretations, and related citations are part of the retrieval path. If one norm changes another, Canary can follow that relationship instead of pretending the documents are unrelated.
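Following references can be thought of as a bounded graph walk. The sketch below assumes references are stored as typed edges between norm identifiers and collects the norms that affect a target, directly or transitively, up to a hop limit. The `Relation` and `Reference` types, the `affecting` function, and identifiers such as `norm-a` are hypothetical.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical edge model; the relation names mirror the README's wording.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relation {
    Modifies,
    Repeals,
    Interprets,
    Cites,
}

struct Reference {
    from: &'static str,
    to: &'static str,
    relation: Relation,
}

/// Breadth-first walk over incoming references: which norms act on `target`,
/// how they relate to the norm they point at, and how many hops away they are.
fn affecting<'a>(
    refs: &'a [Reference],
    target: &'a str,
    max_hops: usize,
) -> Vec<(&'a str, Relation, usize)> {
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(target);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    queue.push_back((target, 0));
    let mut out = Vec::new();

    while let Some((norm, hops)) = queue.pop_front() {
        if hops == max_hops {
            continue; // hop budget exhausted on this branch
        }
        for r in refs.iter().filter(|r| r.to == norm) {
            // `insert` returns false for norms already visited, which prevents cycles.
            if seen.insert(r.from) {
                out.push((r.from, r.relation, hops + 1));
                queue.push_back((r.from, hops + 1));
            }
        }
    }
    out
}

/// A made-up reference graph between placeholder norm identifiers.
fn sample_references() -> Vec<Reference> {
    vec![
        Reference { from: "norm-b", to: "norm-a", relation: Relation::Modifies },
        Reference { from: "norm-c", to: "norm-b", relation: Relation::Repeals },
        Reference { from: "norm-d", to: "norm-a", relation: Relation::Cites },
        Reference { from: "norm-e", to: "norm-z", relation: Relation::Interprets },
    ]
}

fn main() {
    let refs = sample_references();
    for (norm, relation, hops) in affecting(&refs, "norm-a", 2) {
        println!("{norm} ({relation:?}) at {hops} hop(s)");
    }
}
```

The hop limit keeps the expansion cheap and avoids pulling in loosely related material when a norm sits in a dense citation neighborhood.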

Retrieval model

Retrieval happens in stages.

First, Canary uses smaller scout vectors to search broadly and cheaply. Then it uses larger full vectors to rerank candidates with more precision. After that, it rebuilds context from the document hierarchy and follows relevant legal references when the answer depends on neighboring provisions or linked norms.
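The first two stages can be sketched as a coarse pass followed by a rerank. This sketch assumes that a scout vector is simply the leading dimensions of the full vector, which is the usual matryoshka convention but is not confirmed by this README, and it uses brute-force cosine similarity where a real system would use an index. The `retrieve` function, its parameters, and the toy vectors are hypothetical.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Two-stage retrieval over full vectors whose first `scout_dims` components
/// are assumed to form a usable lower-dimensional embedding.
/// All vectors must have the same length, at least `scout_dims`.
fn retrieve(
    query: &[f32],
    corpus: &[Vec<f32>],
    scout_dims: usize,
    shortlist: usize,
    k: usize,
) -> Vec<usize> {
    // Stage 1: cheap, broad search using only the truncated prefix.
    let mut coarse: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(&query[..scout_dims], &v[..scout_dims])))
        .collect();
    coarse.sort_by(|a, b| b.1.total_cmp(&a.1));
    coarse.truncate(shortlist);

    // Stage 2: rerank only the shortlist with the full vectors.
    let mut fine: Vec<(usize, f32)> = coarse
        .iter()
        .map(|&(i, _)| (i, cosine(query, &corpus[i])))
        .collect();
    fine.sort_by(|a, b| b.1.total_cmp(&a.1));
    fine.truncate(k);
    fine.into_iter().map(|(i, _)| i).collect()
}

/// Toy 4-dimensional data, chosen so that the scout ranking and the full ranking disagree.
fn sample_corpus() -> (Vec<f32>, Vec<Vec<f32>>) {
    let query = vec![1.0, 0.0, 1.0, 0.0];
    let corpus = vec![
        vec![1.0, 0.0, 0.0, 1.0], // best scout match, weak full match
        vec![1.0, 0.1, 1.0, 0.0], // best full match
        vec![0.0, 1.0, 0.0, 0.0], // unrelated
        vec![0.9, 0.5, 0.8, 0.1], // decent on both
    ];
    (query, corpus)
}

fn main() {
    let (query, corpus) = sample_corpus();
    let hits = retrieve(&query, &corpus, 2, 3, 2);
    println!("top hits after rerank: {:?}", hits);
}
```

In this toy data the fragment that looks best to the scout pass drops out after reranking, which is the case the second stage exists for. The shortlist size sets the trade-off: too small, and a good candidate never reaches the rerank.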

That combination matters. A legal question often depends on more than the first matching paragraph. It may depend on the article above it, the section around it, the norm that modified it, or the version that was valid on a particular date.

Canary is built around that reality.

Document model

BOE documents are represented as navigable legal hierarchies.

Each fragment carries:

  • its structural position in the document
  • its legal citation identity
  • its temporal validity
  • its links to related legal material

That gives retrieval a real document model to work with instead of a flat pile of embeddings.

Tech stack

The core of Canary is written in Rust. That includes the parser, the server, the file workflows, the database integration, and the supporting runtime components around them. The HTTP layer runs on Axum, operational state lives in SurrealDB, and orchestration uses Temporal.

The application layer around that core is written in TypeScript. The web UI, the TUI, the docs site, and the shared frontend tooling live in the TypeScript workspace and run on Bun.

Repository layout

Rust crates

  • crates/parser (document-hierarchy) parses BOE documents into the navigable document tree and fragment model
  • crates/server (canary-server) exposes the HTTP API, file flows, and runtime application wiring
  • crates/database owns SurrealDB runtime access and the Surrealkit schema project under crates/database/database/
  • crates/public-id provides typed public identifiers for API resources

TypeScript apps and packages

  • apps/web is the main web client
  • apps/tui is the terminal UI
  • apps/fumadocs is the documentation site
  • packages/env and packages/config hold shared TypeScript configuration

Getting started

Prerequisites

  • Rust toolchain
  • Mise
  • Bun
  • Docker for the local SurrealDB workflow

Install local tooling

mise i
bun install

Start the local database

just db-up
just db-sync

If you want seed data and schema-focused tests as well:

just db-seed
just db-test

Run the server

just server

At that point, SurrealDB is running locally, the schema is applied, and the Rust server is running against the current workspace code.

Run the frontend or docs

bun run dev:web
bun --cwd apps/fumadocs dev

Common commands

  • just help
  • just fmt
  • just check
  • just db-status
  • just db-down
  • cargo test --workspace

License

MIT