Python API

May 6, 2026

Installation

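Build the Rust extension and install it into the active virtual environment with maturin (`target-cpu=native` optimises for the build machine's CPU, so the resulting build may not run on other machines):

```bash
RUSTFLAGS="-C target-cpu=native" maturin develop --release
```

```python
import tachiom  # exposes: tachiom.Tachiom, tachiom.Tac
```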

Input format

All .npy inputs use C-contiguous (row-major) layout.

| File | Shape | Dtype | Description |
|---|---|---|---|
| `vectors.npy` | `[N, dim]` | `f16` | One row per token across all documents |
| `token_ids.npy` | `[N]` | `i64` or `u32` | Vocabulary id of each token |
| `doclens.npy` | `[n_docs]` | `i32` or `i64` | Number of tokens per document |

Tokens must be concatenated in document order: the first doclens[0] rows in vectors.npy belong to document 0, the next doclens[1] to document 1, and so on.
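A minimal sketch of producing the three input files with NumPy, assuming each document is available in memory as a `(vectors, token_ids)` pair. The helpers `pack_documents`, `save_inputs` and `doc_rows` are hypothetical (they are not part of `tachiom`), and they pick `int64` for both `token_ids.npy` and `doclens.npy`, which is one of the allowed dtype options.

```python
import os

import numpy as np


def pack_documents(docs):
    """Concatenate (vectors, token_ids) pairs, one pair per document, in list order.

    Returns vectors [N, dim] float16 (C-contiguous), token_ids [N] int64 and
    doclens [n_docs] int64.
    """
    for j, (vecs, toks) in enumerate(docs):
        if len(vecs) != len(toks):
            raise ValueError(f"document {j}: {len(vecs)} vectors but {len(toks)} token ids")
    vectors = np.ascontiguousarray(
        np.concatenate([np.asarray(v, dtype=np.float16) for v, _ in docs])
    )
    token_ids = np.concatenate([np.asarray(t, dtype=np.int64) for _, t in docs])
    doclens = np.array([len(t) for _, t in docs], dtype=np.int64)
    return vectors, token_ids, doclens


def save_inputs(out_dir, vectors, token_ids, doclens):
    # np.save stores C-contiguous arrays with fortran_order=False (row-major).
    np.save(os.path.join(out_dir, "vectors.npy"), vectors)
    np.save(os.path.join(out_dir, "token_ids.npy"), token_ids)
    np.save(os.path.join(out_dir, "doclens.npy"), doclens)


def doc_rows(doclens, j):
    """Half-open row range [start, end) of document j in vectors.npy / token_ids.npy."""
    offsets = np.concatenate([[0], np.cumsum(doclens)])
    return int(offsets[j]), int(offsets[j + 1])
```

`doc_rows(doclens, j)` gives the slice of `vectors.npy` owned by document `j`, which is a quick way to confirm the concatenation order before starting a long build.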

`Tachiom` — IVF-PQ index

Building

Full pipeline (TAC + PQ + HNSW)

```python
index = tachiom.Tachiom.build(
    vectors_path,
    token_ids_path,
    doclens_path,
    total_centroids=4_194_304,   # coarse centroid budget
    tac_n_iter=10,               # k-means iterations inside TAC
    pq_sample_size=10_000_000,   # training vectors for the PQ encoder
    pq_n_iter=10,                # PQ k-means iterations
    normalize=True,              # L2-normalise residuals before PQ encoding
    pq_seed=42,
    hnsw_m=32,                   # HNSW neighbour count
    ef_construction=1500,        # HNSW build-time beam width
    pq_subspaces=32,             # PQ subspace count (only 32 supported)
)
```
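The `normalize` and `pq_subspaces` comments refer to two standard IVF-PQ steps. The sketch below shows them in plain NumPy under two assumptions this page does not confirm: the residual is taken against each token's assigned coarse centroid, and PQ splits every residual into `pq_subspaces` contiguous chunks. The helper names are hypothetical; this is not tachiom's encoder.

```python
import numpy as np


def normalized_residuals(vectors, centroids, assignments, eps=1e-12):
    """Residual of each token w.r.t. its assigned coarse centroid, L2-normalised.

    Assumption: mirrors what normalize=True describes; not the library's code.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    residuals = vectors - centroids[assignments]
    norms = np.linalg.norm(residuals, axis=1, keepdims=True)
    # A token identical to its centroid has a zero residual; leave it at zero.
    return residuals / np.maximum(norms, eps)


def split_subspaces(x, n_subspaces=32):
    """Reshape [N, dim] into [N, n_subspaces, dim // n_subspaces] contiguous chunks."""
    n, dim = x.shape
    if dim % n_subspaces:
        raise ValueError(f"dim={dim} is not divisible by {n_subspaces} subspaces")
    return x.reshape(n, n_subspaces, dim // n_subspaces)
```

Under the contiguous-split assumption, `dim=128` with 32 subspaces gives 4-dimensional chunks, each quantised independently.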

From pre-computed TAC output

If you have already run TAC (e.g. to inspect centroids or tune the centroid budget separately), skip the clustering step:

```python
index = tachiom.Tachiom.build_from_tac(
    vectors_path,
    token_ids_path,
    doclens_path,
    centroids_path,    # [K, dim] f32 .npy
    assignments_path,  # [N]      u32 .npy
    pq_sample_size=10_000_000,
    pq_n_iter=10,
    normalize=True,
    pq_seed=42,
    hnsw_m=32,
    ef_construction=1500,
    pq_subspaces=32,
)
```
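Because `build_from_tac` reads the TAC output from disk, a cheap pre-flight check can catch dtype or shape mismatches before a long build. The hypothetical helper `check_tac_arrays` below checks the shapes and dtypes given in the comments above, plus that every assignment indexes an existing centroid. Passing arrays opened with `np.load(path, mmap_mode="r")` avoids reading large files fully into memory.

```python
import numpy as np


def check_tac_arrays(centroids, assignments, n_tokens, dim):
    """Return a list of problems; an empty list means the arrays look consistent."""
    problems = []
    if centroids.dtype != np.float32:
        problems.append(f"centroids dtype is {centroids.dtype}, expected float32")
    if centroids.ndim != 2 or centroids.shape[1] != dim:
        problems.append(f"centroids shape is {centroids.shape}, expected [K, {dim}]")
    if not centroids.flags["C_CONTIGUOUS"]:
        problems.append("centroids are not C-contiguous")
    if assignments.dtype != np.uint32:
        problems.append(f"assignments dtype is {assignments.dtype}, expected uint32")
    if assignments.shape != (n_tokens,):
        problems.append(f"assignments shape is {assignments.shape}, expected [{n_tokens}]")
    elif n_tokens > 0 and int(assignments.max()) >= len(centroids):
        # Every token must point at one of the K centroid rows.
        problems.append("assignments reference a centroid id >= K")
    return problems
```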

Saving and loading

```python
index.save("index.bin")
index = tachiom.Tachiom.load("index.bin")
```

Searching

Single query

```python
# query: [n_tokens, dim] f32 C-contiguous array
scores, doc_ids = index.search(
    query,
    k=10,
    k_centroids=20,       # coarse centroids retrieved per query token
    k_docs_to_score=500,  # candidates passed to PQ reranking
    ef_search=30,         # HNSW beam width during coarse scoring
    alpha=0.45,           # fraction of the k-th coarse score used as the candidate-pruning threshold
    beta=None,            # stop PQ reranking after this many candidates are scored
    lambda_=None,         # distance-adaptive HNSW early-exit factor
)
# scores:   [k] f32   (−∞ sentinel for unfilled positions)
# doc_ids:  [k] u32   (u32::MAX = 4294967295 sentinel for unfilled positions)
```
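When fewer than `k` documents survive, the tail of each result holds the sentinels described above. A small sketch for dropping them before use; `valid_hits` is a hypothetical helper that only relies on the documented sentinel values.

```python
import numpy as np

DOC_ID_SENTINEL = np.iinfo(np.uint32).max  # u32::MAX == 4294967295


def valid_hits(scores, doc_ids):
    """Return (doc_id, score) pairs for filled positions of one result row."""
    scores = np.asarray(scores)
    doc_ids = np.asarray(doc_ids)
    # A position is unfilled if its doc id is u32::MAX or its score is -inf.
    mask = (doc_ids != DOC_ID_SENTINEL) & ~np.isneginf(scores)
    return list(zip(doc_ids[mask].tolist(), scores[mask].tolist()))
```

For batch results, apply it row by row: `[valid_hits(s, d) for s, d in zip(scores, doc_ids)]`.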

Batch search

```python
# queries: [n_queries, n_tokens, dim] f32 C-contiguous array
scores, doc_ids = index.batch_search(
    queries,
    k=10,
    num_threads=0,        # 0 = all cores, 1 = serial, n = custom pool
    k_centroids=20,
    k_docs_to_score=500,
    ef_search=30,
    alpha=0.45,
    beta=None,
    lambda_=None,
)
# scores:   [n_queries, k] f32
# doc_ids:  [n_queries, k] u32
```
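`batch_search` takes a single 3-D array, so every query in a batch has the same `n_tokens`. This page does not say how queries of different lengths should be handled, so the hypothetical helper below refuses them instead of guessing a padding scheme; mixed-length queries can still go through `index.search` one at a time.

```python
import numpy as np


def stack_queries(queries):
    """Stack equal-shape [n_tokens, dim] queries into one [n_queries, n_tokens, dim] f32 array."""
    shapes = {np.shape(q) for q in queries}
    if len(shapes) != 1:
        raise ValueError(f"queries must share one [n_tokens, dim] shape, got {sorted(shapes)}")
    # np.stack allocates a fresh C-ordered array; the cast fixes the dtype to f32.
    return np.ascontiguousarray(np.stack(queries).astype(np.float32, copy=False))
```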

Search parameters

Search runs in two phases: Gather (HNSW traversal over TAC centroids) then Refine (PQ reranking of surviving candidates).

| Parameter | Default | Phase | Description |
|---|---|---|---|
| `k_centroids` | `20` | Gather | Coarse centroids retrieved per query token via HNSW. Higher values increase recall and latency. |
| `ef_search` | `30` | Gather | HNSW beam width. Increase together with `k_centroids` for deeper search. |
| `alpha` | `0.45` | Gather→Refine | After accumulating coarse scores, only documents scoring above `alpha × score_k` are forwarded to Refine, where `score_k` is the `k`-th highest coarse score. Higher values raise the threshold and prune more aggressively. Set to `None` to disable. |
| `k_docs_to_score` | `500` | Refine | Maximum number of candidates passed to PQ reranking (cap applied after alpha-pruning). |
| `beta` | `None` | Refine | Early-exit threshold: stop PQ reranking after `beta` candidates have been scored. Set to `None` to score all `k_docs_to_score` candidates. |
| `lambda_` | `None` | Gather | Distance-adaptive HNSW termination factor. Set to `None` to disable. |
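The Gather→Refine hand-off controlled by `alpha` and `k_docs_to_score` can be sketched in a few lines of NumPy. This is an illustration of the rule as stated in the table, not tachiom's implementation: `select_candidates` is hypothetical, it takes the accumulated coarse scores as given (how they are accumulated is not described here), assumes they are non-negative, and uses the lowest score as `score_k` when fewer than `k` candidates exist. `beta` and `lambda_` are left out.

```python
import numpy as np


def select_candidates(coarse_scores, k=10, alpha=0.45, k_docs_to_score=500):
    """Indices of candidates forwarded to PQ reranking, best coarse score first."""
    scores = np.asarray(coarse_scores, dtype=np.float32)
    order = np.argsort(-scores, kind="stable")  # descending coarse score
    if alpha is not None and len(order) > 0:
        # score_k: the k-th highest coarse score (or the lowest if fewer than k).
        score_k = scores[order[min(k, len(order)) - 1]]
        order = order[scores[order] > alpha * score_k]
    # The k_docs_to_score cap is applied after alpha-pruning.
    return order[:k_docs_to_score]
```

With coarse scores `[10, 1, 8, 4, 6, 2]` and `k=3`, `score_k` is 6: `alpha=0.45` keeps four candidates (threshold 2.7), while `alpha=0.9` keeps only three (threshold 5.4).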

Inspection

```python
index.len          # number of indexed documents
index.dim          # token-vector dimensionality
index.n_tokens     # total tokens across all documents
index.n_centroids  # number of coarse centroids
index.print_space_usage()  # per-component size in GB
```

`Tac` — Token-Aware Clustering

Tac runs a separate k-means per token type (each vocabulary id in `token_ids.npy`) and distributes a total centroid budget proportionally across these groups. Use it when you want to inspect or reuse the clustering step independently of the full index build.
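To make "distributes a total centroid budget proportionally" concrete, here is one possible allocation rule, assuming each group's share is proportional to its token count. The rounding (floor, at least 1, never more centroids than tokens) is a hypothetical choice for illustration; Tac's actual rule may differ. In this sketch, flooring and the per-group cap are what can make the total fall below the budget.

```python
import numpy as np


def allocate_budget(token_ids, total_centroids):
    """Hypothetical proportional split of a centroid budget across token-id groups.

    Returns {token_id: n_centroids}. Each group gets
    floor(total * group_size / N), at least 1, at most its token count.
    """
    ids, counts = np.unique(np.asarray(token_ids), return_counts=True)
    share = np.floor(total_centroids * counts / counts.sum()).astype(np.int64)
    alloc = np.minimum(counts, np.maximum(share, 1))
    return dict(zip(ids.tolist(), alloc.tolist()))
```

For example, a rare token with 3 occurrences out of 1,000 and a budget of 100 would still get one centroid rather than zero.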

Training

```python
tac = tachiom.Tac(
    n_centroids=2_000_000,  # total centroid budget
    n_iter=10,              # k-means iterations per token group
    verbose=True,
    max_sample_size=None,   # None = auto (cap at ~1M per group)
)
tac.train("vectors.npy", "token_ids.npy")
```

Inspecting results

```python
tac.n_centroids        # actual centroids produced (may be < budget)
tac.dim                # dimensionality
tac.centroids          # [K, dim] f32
tac.centroids_f16      # [K, dim] f16
tac.assignments        # [N]      u32: centroid id for each token
```
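Two quick checks on the clustering output, written as hypothetical helpers over plain arrays (for example `tac.assignments` and the contents of `token_ids.npy`). `centroid_occupancy` counts tokens per centroid, so empty centroids show up as zeros. `centroids_span_one_token_id` tests a property that running k-means separately per token type should imply, namely that no centroid mixes vocabulary ids; treat it as a sanity check based on that reading, not a documented guarantee.

```python
import numpy as np


def centroid_occupancy(assignments, n_centroids):
    """Tokens assigned to each centroid, as a [K] int64 array (0 = empty centroid)."""
    return np.bincount(np.asarray(assignments, dtype=np.int64), minlength=n_centroids)


def centroids_span_one_token_id(assignments, token_ids):
    """True if every centroid only receives tokens of a single vocabulary id."""
    pairs = np.unique(
        np.stack([np.asarray(assignments, np.int64), np.asarray(token_ids, np.int64)], axis=1),
        axis=0,
    )
    # After deduplicating (centroid, token id) pairs, each centroid must appear once.
    return len(np.unique(pairs[:, 0])) == len(pairs)
```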

Saving and feeding into `Tachiom`

```python
import numpy as np
import tachiom

np.save("centroids.npy",   tac.centroids)
np.save("assignments.npy", tac.assignments)

# Build parameters not passed here keep their defaults.
index = tachiom.Tachiom.build_from_tac(
    "vectors.npy", "token_ids.npy", "doclens.npy",
    "centroids.npy", "assignments.npy",
)
```

End-to-end example

See notebooks/tachiom_demo.ipynb for a complete walkthrough on the LOTTE dataset (2.4 M documents, 266 M tokens, dim=128, 2 M centroids, ~12.8 GB index, ~0.45 ms/query).
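The LOTTE figures translate into a few per-unit numbers that help when sizing an index for another corpus. The helper below is a hypothetical convenience taking values such as `index.len`, `index.n_tokens` and `index.n_centroids`; it assumes "GB" means 10^9 bytes, and since the inputs are rounded, the outputs are approximate.

```python
def corpus_stats(n_docs, n_tokens, n_centroids, index_gb):
    """Per-unit sizes derived from document, token and centroid counts and index size."""
    index_bytes = index_gb * 1e9  # assumption: decimal gigabytes
    return {
        "tokens_per_doc": n_tokens / n_docs,
        "tokens_per_centroid": n_tokens / n_centroids,
        "bytes_per_token": index_bytes / n_tokens,
    }


stats = corpus_stats(n_docs=2.4e6, n_tokens=266e6, n_centroids=2e6, index_gb=12.8)
```

For LOTTE this works out to roughly 111 tokens per document, 133 tokens per centroid and about 48 bytes per token (about 52 if the reported size is in GiB).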

See notebooks/tac_demo.ipynb for TAC centroid budget analysis and saving TAC output for later reuse.