API Reference
May 11, 2026
This page groups the SQL surface by role instead of listing every function in implementation order.
For the full signature inventory, result types, operational helpers, and catalog-visible support functions, see Functions.
Canonical Exact Retrieval
Preferred exact BM25 APIs:
- psql_bm25s_query_ids(regclass, int4[], k, weight_mask)
- psql_bm25s_query_tokens(regclass, text[], k, weight_mask)
These are the canonical exact BM25 interfaces and the main performance path.
For supported source-column types and their trade-offs, see Supported Input Types.
They are also the recommended exact retrieval surface for multicolumn fusion indexes. See Multi-Column Fusion Indexes.
Raw Query Retrieval
Convenience retrieval over text-backed indexes:
- text[]
- varchar[]
- text
- varchar
- psql_bm25s_query(...)
The optional tail arguments expose explicit query-time normalization:
- lowercase
- stopwords
- stem_english
- fold_diacritics
Prepared/index-bound query values:
- psql_bm25s_prepared_query(...)
- psql_bm25s_ranked_query(...)
- psql_bm25s_fusion_weighted_query(...)
- psql_bm25s_fusion_field_query(...)
- psql_bm25s_fusion_weighted_queries(index_names[], query_text, weights, ...)
- psql_bm25s_fusion_field_queries(field_names[], index_names[], query_text, weights, ...)
- psql_bm25s_query_prepared(...)
- psql_bm25s_filter_query(ranked_query)
- psql_bm25s_fusion_query_weighted(weighted_queries[], k, candidate_k, weight_mask)
- psql_bm25s_fusion_query_fields(field_queries[], k, candidate_k, weight_mask)
- psql_bm25s_fusion_query(index_names[], query_text, weights, k, candidate_k, weight_mask, ...)
- psql_bm25s_fusion_query(field_names[], index_names[], query_text, weights, k, candidate_k, weight_mask, ...)
- psql_bm25s_field_aware_query_tokens(index_name, query_tokens, field_names, weights, k)
- psql_bm25s_field_aware_query(index_name, query_text, field_names, weights, k)
- psql_bm25s_order_tokens(prepared_query)
- psql_bm25s_order_tokens(ranked_query)
- psql_bm25s_order_tokens(index_name, query_text, ...)
These let SQL bind a parsed query configuration to an explicit index context without changing the canonical execution path.
For scalar text and varchar indexes, omitted query options in
index-bound raw-query helpers inherit the named index's text reloptions.
That includes psql_bm25s_prepared_query(index_name, ...),
psql_bm25s_query(...), and
psql_bm25s_query_prepared(...). This
makes prepared-query predicates and helper functions stable even when the
surrounding SQL does not use a real index scan.
psql_bm25s_ranked_query(...) packages:
- a prepared/index-bound query
- the matching <=> order tokens
- the intended top-k
- an optional weight_mask
This is a convenience bundle for common filtered/ranked SQL shapes. It does not change ranking semantics and it does not introduce a second retrieval implementation.
For a full advanced example of weighted title/abstract/body search, see Multi-Field Search.
For one-index fused text[] retrieval across multiple indexed columns,
see Multi-Column Fusion Indexes.
For one-index field-aware retrieval over a multicolumn index
created with field_aware = true, generic direct APIs search all fields
with equal weight for token and simple raw term queries. Use
psql_bm25s_field_aware_query_tokens(...) or
psql_bm25s_field_aware_query(...) when a query needs custom
field weights or a field subset. The field-aware engine stores
field-scoped terms in one BM25 payload, but it does not implement BM25F
or per-field length normalization. See
Field-Aware Indexes.
For BM25/vector late fusion that keeps vector extensions optional, see Hybrid Vector/BM25 Search and Hybrid Fusion Engine.
For type-specific usage guidance, see Supported Input Types.
Recommended score-carrying SQL surface:
- psql_bm25s_query(...)
- psql_bm25s_query_prepared(...)
- psql_bm25s_fusion(left_hits, left_weight, right_hits, right_weight, k)
These return psql_bm25s_result_hit, which carries:
- ctid
- doc_id
- score
The intended pattern is to join ctid back to application rows when a
query needs both row data and the query-time score.
This is preferred over a hypothetical scalar score(id) API because:
- retrieval and scoring stay on the same exact top-k path
- scores remain attached to the current query execution
- SQL avoids row-by-row re-scoring outside the retrieval executor
- planner behavior stays easier to reason about
For limited multi-field or multi-index score fusion, use:
- psql_bm25s_fusion(...)
- psql_bm25s_fusion_weighted_query(...)
- psql_bm25s_fusion_field_query(...)
- psql_bm25s_fusion_query_weighted(...)
- psql_bm25s_fusion_query_fields(...)
These helpers return SETOF psql_bm25s_result_hit. Fusion is post-retrieval
composition of query-scoped hit rows, not a replacement for the
underlying exact retrieval path.
psql_bm25s_fusion_query_weighted(...) is the more structured helper
for a small set of field- or index-specific prepared queries with
weights. It runs exact retrieval per query, then fuses the top-k result
sets by weighted score sum.
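The weighted score sum described above can be illustrated with a minimal Python sketch. It is not the extension's implementation: the `fuse_weighted` name, the `(ctid, score)` tuple shape, the tie-break on ctid, and the choice to treat a row missing from one result set as contributing zero are all assumptions made for illustration.

```python
from collections import defaultdict


def fuse_weighted(result_sets, k):
    """Fuse per-query top-k hit lists by weighted score sum.

    result_sets: list of (weight, hits) pairs, where hits is a list of
    (ctid, score) tuples produced by one exact retrieval.
    """
    totals = defaultdict(float)
    for weight, hits in result_sets:
        for ctid, score in hits:
            # Assumption: a row absent from a result set adds nothing for it.
            totals[ctid] += weight * score
    # Highest fused score first; ctid breaks ties so the order is deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical per-field hits: title is weighted 2.0, body 1.0.
title_hits = [("(0,1)", 2.0), ("(0,2)", 1.0)]
body_hits = [("(0,2)", 3.0), ("(0,3)", 0.5)]
fused = fuse_weighted([(2.0, title_hits), (1.0, body_hits)], k=2)
```

In this toy input, row `(0,2)` appears in both result sets and ends up ahead of `(0,1)`, which only matched the title query.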
psql_bm25s_fusion_field_query(...) and
psql_bm25s_fusion_query_fields(...) make the field-labeled fusion contract
more explicit:
- each field has a stable field name
- each field still owns one weighted prepared query
- retrieval stays per field/index
- fusion still happens only after those top-k results exist
psql_bm25s_fusion_weighted_queries(...), psql_bm25s_fusion_field_queries(...),
and psql_bm25s_fusion_query(...) are the convenience layer for the
common case where multiple field indexes share the same query text and
normalization options.
For end-to-end examples, see Multi-Field Search and Multi-Column Fusion Indexes.
For the opt-in single-index field-aware engine, see Field-Aware Indexes.
Hybrid Vector/BM25 Fusion
Hybrid fusion accepts generic candidate rows and therefore does not require
pgvector, VectorChord, or any vector type at extension install time.
Types:
- psql_bm25s_result_hybrid_candidate
- psql_bm25s_result_hybrid_hit
Candidate constructors:
- psql_bm25s_hybrid_candidate(source_name, ctid, raw_value, source_rank, weight, normalizer, direction)
- psql_bm25s_hybrid_bm25_candidate(source_name, ctid, score, source_rank, weight, normalizer)
- psql_bm25s_hybrid_vector_candidate(source_name, ctid, distance, source_rank, weight, normalizer)
- psql_bm25s_hybrid_bm25_candidates(source_name, index_name, query_text, weight, candidate_k, normalizer, ...)
Fusion:
psql_bm25s_hybrid_fuse_candidates(candidates, k, fusion, rrf_k, epsilon)
The public fusion function is C-backed and is the supported API surface.
See Hybrid Fusion Engine for the execution model, performance boundary, and validation coverage.
Supported fusion methods:
- rrf
- score
Supported score normalizers:
- identity
- negative_distance
- inverse_distance
- minmax
- zscore
- rank
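To give a rough intuition for what normalizers with these names commonly do, here is a small Python sketch. The formulas are the conventional textbook definitions and are assumptions, not confirmed behavior of the extension. In particular, the use of `epsilon` as a divide-by-zero guard in `inverse_distance` is a guess, and `rank` is left out because its definition is not stated here.

```python
import statistics


def normalize(values, normalizer, epsilon=1e-9):
    """Map raw source values to scores where larger means better.

    All formulas below are assumed, conventional definitions.
    """
    if normalizer == "identity":
        return list(values)
    if normalizer == "negative_distance":
        # Smaller distance becomes a larger (less negative) score.
        return [-v for v in values]
    if normalizer == "inverse_distance":
        # Assumption: epsilon only guards against division by zero.
        return [1.0 / (v + epsilon) for v in values]
    if normalizer == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span > 0 else 0.0 for v in values]
    if normalizer == "zscore":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        return [(v - mean) / std if std > 0 else 0.0 for v in values]
    raise ValueError(f"unsupported normalizer in this sketch: {normalizer}")


scaled = normalize([1.0, 3.0, 5.0], "minmax")
```

Normalizers such as `minmax` and `zscore` depend on the whole candidate set, so the same raw score can map to different values when `candidate_k` changes.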
Use rrf as the default for mixed BM25/vector retrieval. Use score only
when the application has chosen and benchmarked a normalization strategy.
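One reason rank-based fusion is a reasonable default for mixed sources is that it ignores score scale entirely. The sketch below shows the commonly used reciprocal rank fusion formula, `weight / (rrf_k + rank)`, in plain Python. The function name, the default `rrf_k = 60`, and the tie-break are assumptions for illustration; the extension's exact formula is not stated here.

```python
from collections import defaultdict


def rrf_fuse(sources, k, rrf_k=60):
    """Reciprocal rank fusion over ranked candidate lists.

    sources: list of (weight, ranked_ctids) pairs; position 1 is the best hit.
    """
    fused = defaultdict(float)
    for weight, ranked_ctids in sources:
        for rank, ctid in enumerate(ranked_ctids, start=1):
            # Only the rank matters, so BM25 scores and vector distances
            # never need to share a common scale.
            fused[ctid] += weight / (rrf_k + rank)
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical ranked candidates from a BM25 source and a vector source.
bm25_ranked = ["(0,1)", "(0,2)", "(0,3)"]
vector_ranked = ["(0,3)", "(0,1)", "(0,4)"]
top = rrf_fuse([(1.0, bm25_ranked), (1.0, vector_ranked)], k=3)
```

Rows that rank well in both sources, here `(0,1)` and `(0,3)`, rise above rows that appear in only one list.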
Operators
Document predicate:
- tokens @@ 'query text'
- tokens @@@ psql_bm25s_prepared_query(...)
- tokens @@@ psql_bm25s_filter_query(psql_bm25s_ranked_query(...))
- column @@ 'query text'
- column @@@ psql_bm25s_prepared_query(...)
Ordered retrieval surface:
- ORDER BY tokens <=> ... ASC LIMIT k
- ORDER BY token_ids <=> ... ASC LIMIT k
- ORDER BY tokens <=> psql_bm25s_order_tokens(psql_bm25s_prepared_query(...))
- ORDER BY tokens <=> psql_bm25s_order_tokens(psql_bm25s_ranked_query(...))
- ORDER BY tokens <=> psql_bm25s_order_tokens(index_name, query_text, ...)
Important:
- @@ is a boolean document-match predicate
- @@@ is the prepared-query boolean predicate
- if you want prepared-query filtering and SQL-native ranking together, use @@@ plus psql_bm25s_order_tokens(prepared_query)
- @@ is useful for filtering, not for ranking
- for scalar text and varchar, raw @@ outside a real index scan does not discover hidden index reloptions on its own
- for scalar text and varchar, @@@ psql_bm25s_prepared_query(...) inherits omitted query options from the named index reloptions
- <=> is only true BM25 ordering when PostgreSQL chooses a real psql_bm25s index scan
- if you need the clearest exact BM25 contract regardless of planner shape, prefer psql_bm25s_query_tokens(...) or psql_bm25s_query_ids(...)
Local Scalar / Token Helpers
These helpers operate on one provided document value at a time. They are useful for SQL composition, diagnostics, UI rendering, and explicit scalar text handling outside an index scan. They do not run index top-k retrieval and should not be used as the primary search path for large tables.
- psql_bm25s_tokenize_text(text, ...)
- psql_bm25s_normalize_tokens(text[], ...)
- psql_bm25s_match_prepared_query(text[], psql_bm25s_result_prepared_query)
- psql_bm25s_match_prepared_query(text, psql_bm25s_result_prepared_query)
- psql_bm25s_match_prepared_query(varchar, psql_bm25s_result_prepared_query)
- psql_bm25s_match_query(text, index_name, query_text, ...)
- psql_bm25s_match_query(varchar, index_name, query_text, ...)
- psql_bm25s_score_prepared_query(text[], psql_bm25s_result_prepared_query)
- psql_bm25s_score_prepared_query(text, psql_bm25s_result_prepared_query)
- psql_bm25s_score_prepared_query(varchar, psql_bm25s_result_prepared_query)
- psql_bm25s_score_query(text, index_name, query_text, ...)
- psql_bm25s_score_query(varchar, index_name, query_text, ...)
- psql_bm25s_highlight(text[], text, ...)
- psql_bm25s_highlight(text, text, ...)
- psql_bm25s_highlight(varchar, text, ...)
- psql_bm25s_snippet(text[], text, ...)
- psql_bm25s_snippet(text, text, ...)
- psql_bm25s_snippet(varchar, text, ...)
Important:
- the local match and score helpers are document-local convenience helpers
- they use the existing local token match/score paths
- they do not access an index payload for top-k retrieval
- they are not the canonical exact BM25 retrieval contract
- wrappers that take index_name, query_text, ... are thin convenience layers over prepared-query helpers, not a separate scoring model
- when scalar text options are omitted, index_name overloads resolve them from the named index reloptions and then apply the local helper path
Introspection and Maintenance
- psql_bm25s_index_details(regclass)
- psql_bm25s_index_policy_recommend(regclass, profile text)
- psql_bm25s_index_refresh(regclass)
- psql_bm25s_index_maintain(regclass)
- psql_bm25s_index_try_maintain(regclass)
- psql_bm25s_index_maintain_due(max_indexes integer DEFAULT 1)
- psql_bm25s_generation_cache_clear()
- psql_bm25s_generation_cache_state(regclass)
- psql_bm25s_generation_cache_preload(regclass)
psql_bm25s_index_maintain(...) is intended for scheduled convergence of
pending or stale indexes. It no-ops when no maintenance is needed.
psql_bm25s_index_try_maintain(...) is the non-blocking scheduler primitive.
For query-first eventual indexes, it builds a replacement payload first and
takes only a short non-blocking swap lock. If the lock is busy, it returns a
retryable no-op result. Append-only delta records created while the replacement
was building are carried forward after the swap; non-tail-compatible concurrent
changes still return a retryable no-op result.
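The "build first, then take a short non-blocking lock" pattern can be sketched in a few lines of Python. This is only an analogy using an in-process `threading.Lock`; the names `try_maintain`, `build_replacement`, and `swap` are hypothetical, and the sketch does not model the carry-forward of append-only delta records.

```python
import threading

swap_lock = threading.Lock()


def try_maintain(build_replacement, swap):
    """Build a replacement first, then swap only if the lock is free now."""
    replacement = build_replacement()   # expensive work, done without the lock
    if not swap_lock.acquire(blocking=False):
        return "retry"                  # lock busy: retryable no-op, no waiting
    try:
        swap(replacement)               # short critical section
        return "swapped"
    finally:
        swap_lock.release()


# Hypothetical usage: the "index" is just a dict holding the active payload.
state = {"active": "gen1"}
result = try_maintain(lambda: "gen2", lambda gen: state.update(active=gen))
```

Because the caller never blocks on the lock, a scheduler can call this repeatedly and simply try again on the next tick when it gets a retry result.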
psql_bm25s_index_maintain_due(...) scans due eventual indexes owned by
the current role, then calls the same try-maintenance primitive. It is suitable
for pg_cron, systemd timers, or application schedulers. It prioritizes stale
indexes first, then indexes with larger record and byte maintenance debt.
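The prioritization described above amounts to a multi-key sort. The Python sketch below shows one plausible reading; the dictionary keys (`due`, `stale`, `pending_records`, `pending_bytes`) are hypothetical, and comparing record debt before byte debt is an assumption about how the two are combined.

```python
def maintenance_order(indexes, max_indexes=1):
    """Pick due indexes: stale first, then larger record and byte debt."""
    due = [ix for ix in indexes if ix["due"]]
    due.sort(
        key=lambda ix: (
            not ix["stale"],          # False sorts first, so stale indexes lead
            -ix["pending_records"],   # assumption: record debt compared first
            -ix["pending_bytes"],     # then byte debt
        )
    )
    return [ix["name"] for ix in due[:max_indexes]]


# Hypothetical maintenance state for four indexes.
candidates = [
    {"name": "a_idx", "due": True, "stale": False, "pending_records": 900, "pending_bytes": 4096},
    {"name": "b_idx", "due": True, "stale": True, "pending_records": 10, "pending_bytes": 512},
    {"name": "c_idx", "due": False, "stale": True, "pending_records": 5000, "pending_bytes": 65536},
    {"name": "d_idx", "due": True, "stale": False, "pending_records": 900, "pending_bytes": 8192},
]
picked = maintenance_order(candidates, max_indexes=3)
```

With the default `max_indexes = 1`, only the single highest-priority index is handled per call, which keeps each scheduler tick short.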
psql_bm25s_index_details(...) is the structured inspection surface for index
metadata, persisted maintenance state, consistency mode, and active reloptions.
psql_bm25s_index_policy_recommend(...) returns a structured recommendation
for a workload profile. Use it as planning guidance; it does not change the
index.
psql_bm25s_generation_cache_state(...) reports the observable immutable
generation key and shared-cache state for one index, including DSM descriptor
state, optional shared-preload arena counters, and whether that specific index
is resident or currently loading in the shared-preload arena. Use it for
debugging shared generation reuse and invalidation, not in latency-sensitive
query paths.
The shared-preload counters intentionally separate background worker slots from
current phase: active_background_workers is the total worker-slot usage,
while active_preload_workers and active_index_maintenance_workers show
whether active workers are warming resident generations or rebuilding indexes.
psql_bm25s_generation_cache_clear() clears backend-local cache state and
best-effort volatile shared-generation descriptors, failure markers,
interrupted temp descriptors, old lock files, and shared-preload registry
entries. It does not modify durable index contents; later readers can rebuild
from the index relation.
Advanced Diagnostics
These functions are for inspecting planner behavior and index eligibility. They are not retrieval APIs and should not be presented as part of the ordinary query flow:
- psql_bm25s_fast_path_advice(index_name)
- psql_bm25s_fast_path_plan(index_name, explain_plan_json)
- psql_bm25s_fast_path_explain(index_name, sql_text)
psql_bm25s_fast_path_advice(index_name) returns a JSON summary of:
- the index key type
- which predicate/order surfaces are supported
- whether filtered ranked SQL is eligible
- the canonical API
- the recommended SQL filter/order shape
psql_bm25s_fast_path_plan(...) and
psql_bm25s_fast_path_explain(...) report whether a concrete plan
actually used:
- a psql_bm25s index node
- bitmap versus ordered index scan
- @@ / @@@-style match predicates
- <=> ordering inside the plan
psql_bm25s_generation_cache_preload(...) opens the index and warms the best
available generation-cache tier for the current deployment. In a configured
shared-preload deployment it can populate the main shared-memory arena before
application traffic reaches an index; otherwise it warms the DSM tier for
share-eligible generations or the selected backend-local path for small
indexes.
Indexes can also set auto_preload = <priority> as a reloption. The default
priority is 0, which disables automatic preload. Positive priorities are
best-effort hints consumed by the shared-preload background worker; larger
values are attempted first.
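As a small illustration of the priority rule, the following Python sketch orders indexes by their `auto_preload` value. The function name and the alphabetical tie-break are assumptions. The only behavior taken from the text is that 0 disables preload and larger values go first.

```python
def preload_order(auto_preload_by_index):
    """Order indexes for warmup from their auto_preload reloption values."""
    # Priority 0 (the default) disables automatic preload.
    enabled = [(name, prio) for name, prio in auto_preload_by_index.items() if prio > 0]
    # Larger priorities are attempted first; name is an assumed tie-break.
    enabled.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in enabled]


# Hypothetical indexes and their auto_preload settings.
order = preload_order({"docs_idx": 10, "logs_idx": 0, "faq_idx": 50})
```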
Warmup uses psql_bm25s.preload_timer_interval_ms, drains all currently due
marked indexes per cycle, and is intentionally independent from
psql_bm25s.maintenance_timer_interval_ms so rebuild throttling does not slow
startup residency.
Automatic rebuilds are separately guarded by
psql_bm25s.maintenance_rebuild_memory_budget. Maintenance reports
builder=standard, builder=compact, or builder=spill when a rebuild is
admitted. Automatic workers choose standard only when
standard_estimated_bytes <= budget_bytes * 0.60 and the active payload is
below the standard payload cap, choose compact only when
compact_estimated_bytes <= budget_bytes * 0.75 and the active payload is
below the compact payload cap, and otherwise choose spill when
spill_estimated_bytes <= budget_bytes. If even spill does not fit,
maintenance reports reason=memory_budget with the estimate fields and leaves
the readable resident generation in place instead of risking swap pressure.
Explicit CREATE INDEX / REINDEX uses the same estimates for builder choice,
but falls through to spill with a NOTICE because the operator asked for a
controlled rebuild.
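The builder selection rules above can be read as a short decision chain. The Python sketch below restates them under stated assumptions: the function name, argument names, and the payload cap values are hypothetical, and "below the cap" is taken to mean a strict comparison. The 0.60 and 0.75 factors and the explicit-rebuild fallback come from the description above.

```python
def choose_builder(est, budget_bytes, active_payload_bytes,
                   standard_cap, compact_cap, explicit=False):
    """Return the builder label, or the refusal reason, for one rebuild.

    est: dict with estimated bytes for "standard", "compact", and "spill".
    """
    if (est["standard"] <= budget_bytes * 0.60
            and active_payload_bytes < standard_cap):
        return "builder=standard"
    if (est["compact"] <= budget_bytes * 0.75
            and active_payload_bytes < compact_cap):
        return "builder=compact"
    # Explicit CREATE INDEX / REINDEX falls through to spill even over budget.
    if est["spill"] <= budget_bytes or explicit:
        return "builder=spill"
    # Automatic maintenance refuses and keeps the resident generation.
    return "reason=memory_budget"


MIB = 1024 * 1024
estimates = {"standard": 700 * MIB, "compact": 400 * MIB, "spill": 100 * MIB}
caps = {"standard_cap": 2048 * MIB, "compact_cap": 4096 * MIB}  # hypothetical caps
choice = choose_builder(estimates, 1024 * MIB, 256 * MIB, **caps)
```

In this example the standard estimate exceeds 60% of a 1 GiB budget, so the compact builder is chosen. Shrinking the budget to 64 MiB yields `reason=memory_budget` for automatic workers and `builder=spill` for an explicit rebuild.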
See Index Policy for consistency modes, maintenance
behavior, and scheduler guidance.
See Shared Generation Cache for cache tiers,
large connection-pool deployment guidance, and the optional shared-preload
arena.
See Connection Memory and Index Prewarming for
workspace retention settings, memory sizing, and active warmup examples.
See Index Parameters for the complete
CREATE INDEX ... WITH (...) option reference.
Policy Recommendation Profiles
Current helper profiles:
- query_first
- balanced
- small_mixed_churn
- heavy_mixed_churn
- heavy_insert_skew
- longrun_mixed_churn
- write_tolerant_query_first