Performance and Benchmarks

May 11, 2026

This page is the default performance entry point for psql_bm25s.

Date: 2026-04-02

The current project reference is the latest PG18 15 x 5 BEIR matrix (15 BEIR datasets x 5 engines):

  • Python reference implementation bm25s
  • psql_bm25s ids
  • psql_bm25s text[]
  • ParadeDB pg_search
  • TensorChord vchord_bm25

Third-party project names identify the measured engines for reproducibility. The numbers on this page depend on the measured versions, configuration, hardware, workload, and query settings; they are not universal claims about any project outside this benchmark scope.

The exact BM25 PostgreSQL APIs remain the main anchor for psql_bm25s performance claims:

  • psql_bm25s_query_ids(...)
  • psql_bm25s_query_tokens(...)

The published matrix on this page is intentionally based on the pretokenized index paths:

  • psql_bm25s ids uses int4[]
  • psql_bm25s text[] uses text[]
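The two pretokenized paths carry the same tokens in different forms. The following is a minimal Python sketch of preparing both column values on the client side. It assumes a hypothetical lowercase whitespace tokenizer and a hypothetical vocabulary that hands out int4 IDs in first-seen order. This page does not specify either scheme.

```python
# Sketch only: the tokenizer and the vocabulary scheme are hypothetical
# assumptions, not the psql_bm25s tokenization contract.
INT4_MAX = 2**31 - 1  # largest value a PostgreSQL int4 can hold


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


class Vocabulary:
    """Maps token strings to int4-compatible IDs in first-seen order."""

    def __init__(self) -> None:
        self.ids: dict[str, int] = {}

    def encode(self, tokens: list[str]) -> list[int]:
        out = []
        for tok in tokens:
            if tok not in self.ids:
                new_id = len(self.ids)
                if new_id > INT4_MAX:
                    raise OverflowError("vocabulary exceeds the int4 range")
                self.ids[tok] = new_id
            out.append(self.ids[tok])
        return out


vocab = Vocabulary()
doc_tokens = tokenize("BM25 ranks documents by term BM25")
text_array = doc_tokens               # value for a text[] column
ids_array = vocab.encode(doc_tokens)  # value for an int4[] column
```

Whatever scheme is used, queries sent to the int4[] path must be encoded with the same vocabulary as the indexed documents. The text[] path does not need this extra bookkeeping.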

Scalar text and varchar source columns are supported in the extension, but they are not the basis of the public 2026-04-02 cross-engine matrix. See Supported Input Types for the type-by-type contract and trade-offs.

Raw data used by this page:

  • official-beir-ids-current-2026-04-02.json
  • official-beir-text-current-2026-04-02.json
  • official-beir-python-reference-current-2026-04-02.json
  • official-beir-pg18-comparison-current-2026-04-02.json
  • pg18-beir-extension-matrix-current-2026-04-02.json
  • pg18-beir-quality-matrix-current-2026-05-06.json

Raw per-dataset result cells:

  • data/raw/pg18-beir-current-matrix-2026-04-02/results/<dataset>/<engine>.json

Validation status:

  • 75/75 dataset-engine cells in the rolled-up current matrix were checked against the backing raw cells
  • 30/75 refreshed psql_bm25s cells come from the 2026-04-02 Google Cloud refresh run
  • 45/75 carried-forward Python reference / pg_search / vchord_bm25 cells remain pinned to the stable 2026-03-31 PG18 matrix
  • the current matrix and the raw cells match exactly on:
    • stats
    • build_ms
    • query.count
    • query.qps
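The exact-match check can be expressed in a few lines. The following is a minimal Python sketch. It assumes each cell is a JSON object in which query.count and query.qps are nested under a query key. The dotted names suggest this layout, but the schema is not documented here. The helpers load_raw_cell, get_path, and mismatched_fields and the engine file names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Fields the validation reports as matching exactly. Dotted names are
# assumed to address nested keys, e.g. {"query": {"qps": ...}}.
CHECKED_FIELDS = ("stats", "build_ms", "query.count", "query.qps")


def load_raw_cell(results_dir: Path, dataset: str, engine: str) -> dict[str, Any]:
    """Read results/<dataset>/<engine>.json; engine slugs are hypothetical."""
    with (results_dir / dataset / f"{engine}.json").open(encoding="utf-8") as fh:
        return json.load(fh)


def get_path(cell: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted field name such as 'query.qps' through nested dicts."""
    value: Any = cell
    for key in dotted.split("."):
        value = value[key]
    return value


def mismatched_fields(matrix_cell: dict[str, Any], raw_cell: dict[str, Any]) -> list[str]:
    """Return the checked fields whose values differ; empty means an exact match."""
    return [name for name in CHECKED_FIELDS
            if get_path(matrix_cell, name) != get_path(raw_cell, name)]
```

A raw cell would then be loaded with something like `load_raw_cell(Path("data/raw/pg18-beir-current-matrix-2026-04-02/results"), "nfcorpus", engine)` and compared against the matching rolled-up cell.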

How To Read This Section

This page is intentionally limited to the current public PG18 matrix and the raw data needed to verify it. No other benchmark artifacts are part of this public performance reference.

Scope

The current public read-performance report is based on two aligned Google Cloud (GCP) PG18 runs:

  • the stable 2026-03-31 full 15 x 5 matrix
  • the 2026-04-02 refresh run that updated only psql_bm25s ids and psql_bm25s text[]

Both runs use the same workload:

  • all 15 official BEIR subsets used in the BM25S benchmark set
  • the uploaded dataset cache
  • top_k = 1000

Benchmark scope:

  • cloud: Google Cloud
  • zone: us-east4-a
  • machine type: n2-standard-16
  • PostgreSQL: 18
  • dataset delivery: uploaded dataset cache, not runtime download
  • Python reference implementation path: Python bm25s carried forward from the stable 2026-03-31 matrix
  • PostgreSQL paths:
    • psql_bm25s_query_ids(...) refreshed on 2026-04-02
    • psql_bm25s_query_tokens(...) refreshed on 2026-04-02
    • ParadeDB pg_search carried forward from 2026-03-31
    • TensorChord vchord_bm25 carried forward from 2026-03-31

For psql_bm25s, the benchmark configuration matched the latest tuned exact path:

  • method = 'lucene'
  • idf_method = 'lucene'
  • k1 = 1.5
  • b = 0.75
  • delta = 0.5
  • consistency = 'manual'
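The following minimal Python sketch makes these parameters concrete. It implements Lucene-variant BM25 scoring as we understand the bm25s library's formulation: a Lucene-style IDF, and a term-frequency component without the constant (k1 + 1) factor, which does not change rankings. It illustrates that assumption only and is not the psql_bm25s implementation. The sketch omits delta because, in the bm25s formulation, delta only affects the BM25L and BM25+ variants.

```python
import math
from collections import Counter

# Parameters from the benchmark configuration above (delta is unused here).
K1 = 1.5
B = 0.75


def lucene_idf(n_docs: int, doc_freq: int) -> float:
    """Lucene-style IDF; stays positive even for terms found in every document."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_lucene_scores(corpus: list[list[str]], query: list[str]) -> list[float]:
    """Score every tokenized document in `corpus` against the query tokens."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = K1 * (1.0 - B + B * len(doc) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            score += lucene_idf(n_docs, doc_freq[term]) * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores
```

For example, `bm25_lucene_scores([["a", "b", "c"], ["a", "a", "d"], ["e", "f"]], ["a"])` ranks the second document first because it repeats the query term at the same document length.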

Query Summary

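Min, median, and max in the table below summarize the per-dataset ratio of engine QPS to Python reference implementation bm25s QPS across the full 15-dataset suite. "At or above Python reference" counts the datasets where that ratio is at least 1.00x.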

| Path | At or above Python reference | Min vs Python reference | Median vs Python reference | Max vs Python reference |
|---|---|---|---|---|
| psql_bm25s ids | 12/15 | 0.35x | 3.97x | 60.10x |
| psql_bm25s text[] | 11/15 | 0.31x | 3.93x | 51.06x |
| pg_search | 3/15 | 0.01x | 0.17x | 2.76x |
| vchord_bm25 | 7/15 | 0.07x | 0.54x | 11.31x |
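The summary row for psql_bm25s ids can be recomputed from the Dataset Table. The following minimal Python sketch copies those QPS values. The helper name ratio_summary is illustrative and is not part of the project tooling.

```python
from statistics import median

# QPS values copied from the Dataset Table (already rounded to two decimals).
PYTHON_QPS = {
    "arguana": 1158.34, "climate-fever": 3.04, "cqadupstack": 111.56,
    "dbpedia-entity": 3.47, "fever": 3.15, "fiqa": 810.51, "hotpotqa": 4.16,
    "msmarco": 1.61, "nfcorpus": 3155.35, "nq": 10.55, "quora": 90.56,
    "scidocs": 1203.09, "scifact": 2964.86, "trec-covid": 210.50,
    "webis-touche2020": 240.36,
}
IDS_QPS = {
    "arguana": 1402.63, "climate-fever": 57.78, "cqadupstack": 443.13,
    "dbpedia-entity": 128.19, "fever": 97.56, "fiqa": 1409.52, "hotpotqa": 55.40,
    "msmarco": 96.67, "nfcorpus": 3373.94, "nq": 174.34, "quora": 637.98,
    "scidocs": 1835.85, "scifact": 2557.47, "trec-covid": 191.94,
    "webis-touche2020": 82.97,
}


def ratio_summary(engine_qps: dict[str, float], reference_qps: dict[str, float]):
    """Return (datasets at or above 1.0x, min, median, max) of per-dataset QPS ratios."""
    ratios = [engine_qps[name] / reference_qps[name] for name in reference_qps]
    at_or_above = sum(ratio >= 1.0 for ratio in ratios)
    return at_or_above, min(ratios), median(ratios), max(ratios)


count, low, mid, high = ratio_summary(IDS_QPS, PYTHON_QPS)
print(f"{count}/{len(PYTHON_QPS)} {low:.2f}x {mid:.2f}x {high:.2f}x")
```

On the rounded table values, this prints 12/15 with a 0.35x minimum and a 3.97x median, matching the summary row. The maximum comes out near 60.04x rather than the published 60.10x. The likely cause is that the table rounds the msmarco Python reference QPS to 1.61.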

Index Build Summary

Build time here means index construction time only.

| Path | Total build ms | Relative to Python reference |
|---|---|---|
| Python reference implementation bm25s | 848046.35 | 1.00x |
| psql_bm25s ids | 262955.79 | 0.31x |
| psql_bm25s text[] | 443975.35 | 0.52x |
| pg_search | 356944.25 | 0.42x |
| vchord_bm25 | 739014.63 | 0.87x |

Dataset Table

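| Dataset | Docs | Queries | Python reference implementation bm25s QPS | psql_bm25s ids QPS | psql_bm25s text[] QPS | pg_search QPS | vchord_bm25 QPS |
|---|---|---|---|---|---|---|---|
| arguana | 8,674 | 1,406 | 1158.34 | 1402.63 | 1112.01 | 115.94 | 78.77 |
| climate-fever | 5,416,593 | 1,535 | 3.04 | 57.78 | 50.75 | 2.84 | 5.25 |
| cqadupstack | 457,199 | 13,145 | 111.56 | 443.13 | 438.42 | 13.99 | 60.51 |
| dbpedia-entity | 4,635,922 | 467 | 3.47 | 128.19 | 91.19 | 5.21 | 23.66 |
| fever | 5,416,568 | 123,142 | 3.15 | 97.56 | 80.15 | 5.62 | 12.13 |
| fiqa | 57,638 | 6,648 | 810.51 | 1409.52 | 1186.41 | 17.76 | 190.57 |
| hotpotqa | 5,233,329 | 97,852 | 4.16 | 55.40 | 49.86 | 3.42 | 9.31 |
| msmarco | 8,841,823 | 509,962 | 1.61 | 96.67 | 82.13 | 4.44 | 18.20 |
| nfcorpus | 3,633 | 3,237 | 3155.35 | 3373.94 | 3326.96 | 1132.17 | 1252.75 |
| nq | 2,681,468 | 3,452 | 10.55 | 174.34 | 176.69 | 6.28 | 21.96 |
| quora | 522,931 | 15,000 | 90.56 | 637.98 | 619.64 | 13.26 | 154.36 |
| scidocs | 25,657 | 1,000 | 1203.09 | 1835.85 | 1614.92 | 17.89 | 367.04 |
| scifact | 5,183 | 1,109 | 2964.86 | 2557.47 | 2240.18 | 500.04 | 629.42 |
| trec-covid | 171,332 | 50 | 210.50 | 191.94 | 154.48 | 8.75 | 75.66 |
| webis-touche2020 | 382,545 | 49 | 240.36 | 82.97 | 74.10 | 8.14 | 86.04 |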

Scale Trend

[Figure: QPS vs dataset scale]

Index Build Matrix

Query throughput is still the headline metric, but build time matters for every bulk load, refresh, and reproducible end-to-end deployment. The matrix below reports index construction time only. It excludes query execution and VM orchestration overhead so the comparison stays focused on the actual indexing cost of each engine.

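Rows here are ordered by corpus size (ascending document count) rather than alphabetically as in the query Dataset Table above, so build cost can be read directly against scale. Read together, the two tables answer different questions: the query table answers "how fast is query execution after the index exists?" and the build table answers "how expensive is it to get to that state?".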

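| Dataset | Docs | Queries | Python reference implementation bm25s build | psql_bm25s ids build | psql_bm25s text[] build | pg_search build | vchord_bm25 build |
|---|---|---|---|---|---|---|---|
| nfcorpus | 3,633 | 3,237 | 167ms | 82ms | 113ms | 254ms | 326ms |
| scifact | 5,183 | 1,109 | 224ms | 99ms | 150ms | 326ms | 449ms |
| arguana | 8,674 | 1,406 | 318ms | 101ms | 168ms | 376ms | 448ms |
| scidocs | 25,657 | 1,000 | 974ms | 395ms | 633ms | 490ms | 1.10s |
| fiqa | 57,638 | 6,648 | 1.88s | 672ms | 1.06s | 705ms | 1.32s |
| trec-covid | 171,332 | 50 | 6.67s | 2.65s | 4.61s | 2.86s | 3.80s |
| webis-touche2020 | 382,545 | 49 | 21.18s | 9.84s | 16.51s | 10.65s | 13.39s |
| cqadupstack | 457,199 | 13,145 | 15.68s | 5.83s | 9.29s | 7.21s | 19.56s |
| quora | 522,931 | 15,000 | 5.80s | 788ms | 1.06s | 936ms | 1.69s |
| nq | 2,681,468 | 3,452 | 1.2m | 23.71s | 40.80s | 31.46s | 55.57s |
| dbpedia-entity | 4,635,922 | 467 | 1.7m | 26.37s | 45.05s | 40.73s | 1.9m |
| hotpotqa | 5,233,329 | 97,852 | 1.8m | 30.79s | 50.38s | 41.97s | 1.9m |
| fever | 5,416,568 | 123,142 | 2.6m | 52.83s | 1.6m | 1.2m | 2.7m |
| climate-fever | 5,416,593 | 1,535 | 2.7m | 52.74s | 1.5m | 1.2m | 2.5m |
| msmarco | 8,841,823 | 509,962 | 3.2m | 56.06s | 1.5m | 1.2m | 1.7m |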

Build Trend

[Figure: Index build time vs dataset scale]

Build-time trend highlights:

  • Total build-time ratios versus Python reference implementation bm25s are 0.31x for psql_bm25s ids, 0.52x for psql_bm25s text[], 0.42x for pg_search, and 0.87x for vchord_bm25.
  • Dataset counts at or below Python reference build time are 15/15 for psql_bm25s ids, 15/15 for psql_bm25s text[], 12/15 for pg_search, and 7/15 for vchord_bm25.
  • The build-time matrix should be read together with query throughput and index size, because each path makes a different index-time versus query-time trade-off.
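The ratios and at-or-below counts can be recomputed from the two build tables. The following minimal Python sketch converts the Index Build Matrix's humanized durations (ms, s, m) back to milliseconds, in the table's row order. The per-dataset minute values are rounded to 0.1 minute, so the ratios use the exact totals from the Index Build Summary instead. The helpers to_ms and count_at_or_below are illustrative only.

```python
# Build times as printed in the Index Build Matrix, in its row order.
PYTHON_BUILD = ["167ms", "224ms", "318ms", "974ms", "1.88s", "6.67s", "21.18s",
                "15.68s", "5.80s", "1.2m", "1.7m", "1.8m", "2.6m", "2.7m", "3.2m"]
BUILD = {
    "psql_bm25s ids": ["82ms", "99ms", "101ms", "395ms", "672ms", "2.65s", "9.84s",
                       "5.83s", "788ms", "23.71s", "26.37s", "30.79s", "52.83s",
                       "52.74s", "56.06s"],
    "pg_search": ["254ms", "326ms", "376ms", "490ms", "705ms", "2.86s", "10.65s",
                  "7.21s", "936ms", "31.46s", "40.73s", "41.97s", "1.2m", "1.2m", "1.2m"],
    "vchord_bm25": ["326ms", "449ms", "448ms", "1.10s", "1.32s", "3.80s", "13.39s",
                    "19.56s", "1.69s", "55.57s", "1.9m", "1.9m", "2.7m", "2.5m", "1.7m"],
}
# Exact totals from the Index Build Summary table.
TOTAL_BUILD_MS = {
    "Python reference implementation bm25s": 848046.35,
    "psql_bm25s ids": 262955.79,
    "psql_bm25s text[]": 443975.35,
    "pg_search": 356944.25,
    "vchord_bm25": 739014.63,
}


def to_ms(value: str) -> float:
    """Convert '82ms', '1.10s', or '1.2m' into milliseconds ('ms' checked first)."""
    if value.endswith("ms"):
        return float(value[:-2])
    if value.endswith("s"):
        return float(value[:-1]) * 1_000.0
    if value.endswith("m"):
        return float(value[:-1]) * 60_000.0
    raise ValueError(f"unrecognized duration: {value!r}")


def count_at_or_below(engine: list[str], reference: list[str]) -> int:
    """Count datasets where the engine built no slower than the reference."""
    if len(engine) != len(reference):
        raise ValueError("columns must have the same number of rows")
    return sum(to_ms(e) <= to_ms(r) for e, r in zip(engine, reference))


reference_total = TOTAL_BUILD_MS["Python reference implementation bm25s"]
ratios = {name: total / reference_total for name, total in TOTAL_BUILD_MS.items()}
```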

Index Size Matrix

The matrix also records PostgreSQL index relation size as build_bytes. The Python reference implementation bm25s path is omitted here because it is not a PostgreSQL index relation and does not report an equivalent byte count.

| Dataset | Docs | psql_bm25s ids size | psql_bm25s text[] size | pg_search size | vchord_bm25 size |
|---|---|---|---|---|---|
| nfcorpus | 3,633 | 4.60 MiB | 4.80 MiB | 4.98 MiB | 165.37 MiB |
| scifact | 5,183 | 6.09 MiB | 6.35 MiB | 5.70 MiB | 227.52 MiB |
| arguana | 8,674 | 8.51 MiB | 8.73 MiB | 6.43 MiB | 205.01 MiB |
| scidocs | 25,657 | 24.81 MiB | 25.43 MiB | 19.67 MiB | 510.23 MiB |
| fiqa | 57,638 | 42.96 MiB | 43.62 MiB | 25.55 MiB | 533.44 MiB |
| trec-covid | 171,332 | 151.95 MiB | 153.91 MiB | 77.53 MiB | 1.49 GiB |
| webis-touche2020 | 382,545 | 484.02 MiB | 488.45 MiB | 290.91 MiB | 3.02 GiB |
| cqadupstack | 457,199 | 323.90 MiB | 334.80 MiB | 204.12 MiB | 6.92 GiB |
| quora | 522,931 | 52.29 MiB | 52.95 MiB | 28.55 MiB | 602.66 MiB |
| nq | 2,681,468 | 1.34 GiB | 1.35 GiB | 725.37 MiB | 7.75 GiB |
| dbpedia-entity | 4,635,922 | 1.50 GiB | 1.54 GiB | 1022.44 MiB | 22.13 GiB |
| hotpotqa | 5,233,329 | 1.56 GiB | 1.59 GiB | 1.11 GiB | 20.70 GiB |
| fever | 5,416,568 | 2.66 GiB | 2.69 GiB | 1.70 GiB | 25.47 GiB |
| climate-fever | 5,416,593 | 2.66 GiB | 2.69 GiB | 1.70 GiB | 25.47 GiB |
| msmarco | 8,841,823 | 2.98 GiB | 3.00 GiB | 1.65 GiB | 12.59 GiB |
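The size columns use binary units (1 MiB = 1024^2 bytes, 1 GiB = 1024^3 bytes). The following minimal Python sketch turns a build_bytes value into the notation shown above. It assumes the table switches from MiB to GiB at 1024 MiB, which is consistent with the 1022.44 MiB entry for dbpedia-entity. The helper format_index_size is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def format_index_size(build_bytes: int) -> str:
    """Render a byte count in MiB below 1 GiB and in GiB from 1 GiB upward."""
    if build_bytes >= GIB:
        return f"{build_bytes / GIB:.2f} GiB"
    return f"{build_bytes / MIB:.2f} MiB"
```

For a single index, PostgreSQL's pg_relation_size() returns the relation size in bytes. This page does not state whether the benchmark harness used it to populate build_bytes.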

Index Size Trend

[Figure: Index size vs dataset scale]

Quality Matrix

The quality matrix is a local PG18 relevance run over the same 15 BEIR datasets. It measures NDCG@10, MAP@100, Recall@100, and Precision@10 with top_k = 100.

This table uses qrels-bearing queries only. That means the evaluated query count can be smaller than the full query count shown in the QPS table.
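The following is a minimal per-query Python sketch of the four metrics. It assumes trec_eval-style conventions: linear gains for NDCG, any judgment above 0 counts as relevant, and Precision@k is divided by k. The page does not state which evaluator produced the table, so treat this as an illustration rather than the exact evaluation code. Each metric would then be averaged over the qrels-bearing queries; MAP@100 is the mean of average_precision_at_k(..., 100).

```python
import math


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int) -> float:
    """NDCG@k using the graded qrels values directly as gains."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def precision_at_k(ranking: list[str], qrels: dict[str, int], k: int) -> float:
    """Fraction of the top k positions holding a relevant document."""
    return sum(qrels.get(doc, 0) > 0 for doc in ranking[:k]) / k


def recall_at_k(ranking: list[str], qrels: dict[str, int], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    relevant = sum(rel > 0 for rel in qrels.values())
    hits = sum(qrels.get(doc, 0) > 0 for doc in ranking[:k])
    return hits / relevant if relevant else 0.0


def average_precision_at_k(ranking: list[str], qrels: dict[str, int], k: int) -> float:
    """Mean of precision at each relevant hit in the top k, over all relevant docs."""
    relevant = sum(rel > 0 for rel in qrels.values())
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / relevant if relevant else 0.0
```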

The local machine had all five comparison engines available for this relevance run: Python reference implementation bm25s, psql_bm25s ids, psql_bm25s text[], pg_search, and vchord_bm25.

The primary chart is an absolute-score heatmap rather than a dataset-scale line chart. Each cell is the metric value for one engine on one dataset. Bold cells are within 0.001 of the best engine for that dataset and metric.

[Figure: Quality score heatmap]
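The bold-cell rule can be stated precisely. The following minimal Python sketch assumes that "within 0.001" means an absolute gap of at most 0.001 from the best score for that dataset and metric. All four metrics are higher-is-better. The example uses the nfcorpus NDCG@10 values from the quality table. The helper near_best is hypothetical.

```python
# Absolute tolerance for the bold cells; all four metrics are higher-is-better.
TOLERANCE = 0.001


def near_best(scores: dict[str, float], tolerance: float = TOLERANCE) -> set[str]:
    """Return the engines whose score is within `tolerance` of the best score."""
    best = max(scores.values())
    # A tiny epsilon keeps gaps of exactly 0.001 from failing on float rounding.
    return {engine for engine, score in scores.items() if best - score <= tolerance + 1e-12}


# nfcorpus NDCG@10 values from the quality table.
nfcorpus_ndcg10 = {
    "Python reference implementation bm25s": 0.3230,
    "psql_bm25s ids": 0.3235,
    "psql_bm25s text[]": 0.3235,
    "pg_search": 0.3215,
    "vchord_bm25": 0.3209,
}
bold = near_best(nfcorpus_ndcg10)
```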

| Dataset | Docs | Eval queries | Engine | NDCG@10 | MAP@100 | Recall@100 | Precision@10 |
|---|---|---|---|---|---|---|---|
| nfcorpus | 3,633 | 323 | Python reference implementation bm25s | 0.3230 | 0.1533 | 0.2474 | 0.2319 |
| nfcorpus | 3,633 | 323 | psql_bm25s ids | 0.3235 | 0.1535 | 0.2503 | 0.2322 |
| nfcorpus | 3,633 | 323 | psql_bm25s text[] | 0.3235 | 0.1535 | 0.2504 | 0.2322 |
| nfcorpus | 3,633 | 323 | pg_search | 0.3215 | 0.1523 | 0.2486 | 0.2313 |
| nfcorpus | 3,633 | 323 | vchord_bm25 | 0.3209 | 0.1518 | 0.2468 | 0.2303 |
| scifact | 5,183 | 300 | Python reference implementation bm25s | 0.6863 | 0.6439 | 0.9127 | 0.0907 |
| scifact | 5,183 | 300 | psql_bm25s ids | 0.6863 | 0.6439 | 0.9127 | 0.0907 |
| scifact | 5,183 | 300 | psql_bm25s text[] | 0.6863 | 0.6439 | 0.9127 | 0.0907 |
| scifact | 5,183 | 300 | pg_search | 0.6819 | 0.6397 | 0.9127 | 0.0900 |
| scifact | 5,183 | 300 | vchord_bm25 | 0.6766 | 0.6350 | 0.9127 | 0.0893 |
| arguana | 8,674 | 1,406 | Python reference implementation bm25s | 0.3655 | 0.2524 | 0.9659 | 0.0760 |
| arguana | 8,674 | 1,406 | psql_bm25s ids | 0.3656 | 0.2524 | 0.9659 | 0.0760 |
| arguana | 8,674 | 1,406 | psql_bm25s text[] | 0.3656 | 0.2524 | 0.9659 | 0.0760 |
| arguana | 8,674 | 1,406 | pg_search | 0.3060 | 0.2100 | 0.9161 | 0.0654 |
| arguana | 8,674 | 1,406 | vchord_bm25 | 0.3597 | 0.2484 | 0.9580 | 0.0749 |
| scidocs | 25,657 | 1,000 | Python reference implementation bm25s | 0.1578 | 0.1077 | 0.3646 | 0.0816 |
| scidocs | 25,657 | 1,000 | psql_bm25s ids | 0.1578 | 0.1077 | 0.3646 | 0.0816 |
| scidocs | 25,657 | 1,000 | psql_bm25s text[] | 0.1578 | 0.1077 | 0.3646 | 0.0816 |
| scidocs | 25,657 | 1,000 | pg_search | 0.1567 | 0.1066 | 0.3607 | 0.0805 |
| scidocs | 25,657 | 1,000 | vchord_bm25 | 0.1561 | 0.1066 | 0.3616 | 0.0802 |
| fiqa | 57,638 | 648 | Python reference implementation bm25s | 0.2514 | 0.2041 | 0.5593 | 0.0699 |
| fiqa | 57,638 | 648 | psql_bm25s ids | 0.2514 | 0.2041 | 0.5593 | 0.0699 |
| fiqa | 57,638 | 648 | psql_bm25s text[] | 0.2514 | 0.2041 | 0.5593 | 0.0699 |
| fiqa | 57,638 | 648 | pg_search | 0.2504 | 0.2034 | 0.5589 | 0.0688 |
| fiqa | 57,638 | 648 | vchord_bm25 | 0.2517 | 0.2036 | 0.5542 | 0.0698 |
| trec-covid | 171,332 | 50 | Python reference implementation bm25s | 0.5988 | 0.3354 | 0.1121 | 0.6500 |
| trec-covid | 171,332 | 50 | psql_bm25s ids | 0.5994 | 0.3353 | 0.1121 | 0.6500 |
| trec-covid | 171,332 | 50 | psql_bm25s text[] | 0.5994 | 0.3353 | 0.1121 | 0.6500 |
| trec-covid | 171,332 | 50 | pg_search | 0.5903 | 0.3240 | 0.1104 | 0.6420 |
| trec-covid | 171,332 | 50 | vchord_bm25 | 0.5895 | 0.3271 | 0.1108 | 0.6460 |
| webis-touche2020 | 382,545 | 49 | Python reference implementation bm25s | 0.3259 | 0.2105 | 0.5557 | 0.3041 |
| webis-touche2020 | 382,545 | 49 | psql_bm25s ids | 0.3259 | 0.2106 | 0.5557 | 0.3041 |
| webis-touche2020 | 382,545 | 49 | psql_bm25s text[] | 0.3259 | 0.2106 | 0.5557 | 0.3041 |
| webis-touche2020 | 382,545 | 49 | pg_search | 0.3347 | 0.2120 | 0.5600 | 0.3102 |
| webis-touche2020 | 382,545 | 49 | vchord_bm25 | 0.3379 | 0.2140 | 0.5589 | 0.3122 |
| cqadupstack | 457,199 | 13,145 | Python reference implementation bm25s | 0.2994 | 0.2723 | 0.5543 | 0.0488 |
| cqadupstack | 457,199 | 13,145 | psql_bm25s ids | 0.2994 | 0.2723 | 0.5543 | 0.0488 |
| cqadupstack | 457,199 | 13,145 | psql_bm25s text[] | 0.2994 | 0.2723 | 0.5543 | 0.0488 |
| cqadupstack | 457,199 | 13,145 | pg_search | 0.3005 | 0.2735 | 0.5508 | 0.0489 |
| cqadupstack | 457,199 | 13,145 | vchord_bm25 | 0.3007 | 0.2734 | 0.5503 | 0.0490 |
| quora | 522,931 | 10,000 | Python reference implementation bm25s | 0.8045 | 0.7630 | 0.9771 | 0.1216 |
| quora | 522,931 | 10,000 | psql_bm25s ids | 0.8056 | 0.7643 | 0.9775 | 0.1218 |
| quora | 522,931 | 10,000 | psql_bm25s text[] | 0.8056 | 0.7643 | 0.9775 | 0.1218 |
| quora | 522,931 | 10,000 | pg_search | 0.8075 | 0.7662 | 0.9786 | 0.1223 |
| quora | 522,931 | 10,000 | vchord_bm25 | 0.8069 | 0.7658 | 0.9770 | 0.1220 |
| nq | 2,681,468 | 3,452 | Python reference implementation bm25s | 0.2849 | 0.2409 | 0.7430 | 0.0521 |
| nq | 2,681,468 | 3,452 | psql_bm25s ids | 0.2849 | 0.2408 | 0.7430 | 0.0521 |
| nq | 2,681,468 | 3,452 | psql_bm25s text[] | 0.2849 | 0.2408 | 0.7430 | 0.0521 |
| nq | 2,681,468 | 3,452 | pg_search | 0.2933 | 0.2487 | 0.7506 | 0.0532 |
| nq | 2,681,468 | 3,452 | vchord_bm25 | 0.2935 | 0.2491 | 0.7494 | 0.0532 |
| dbpedia-entity | 4,635,922 | 400 | Python reference implementation bm25s | 0.2801 | 0.2113 | 0.4472 | 0.2658 |
| dbpedia-entity | 4,635,922 | 400 | psql_bm25s ids | 0.2803 | 0.2110 | 0.4472 | 0.2658 |
| dbpedia-entity | 4,635,922 | 400 | psql_bm25s text[] | 0.2803 | 0.2110 | 0.4472 | 0.2658 |
| dbpedia-entity | 4,635,922 | 400 | pg_search | 0.2837 | 0.2169 | 0.4538 | 0.2683 |
| dbpedia-entity | 4,635,922 | 400 | vchord_bm25 | 0.2845 | 0.2172 | 0.4536 | 0.2685 |
| hotpotqa | 5,233,329 | 7,405 | Python reference implementation bm25s | 0.5689 | 0.4858 | 0.7586 | 0.1199 |
| hotpotqa | 5,233,329 | 7,405 | psql_bm25s ids | 0.5689 | 0.4859 | 0.7586 | 0.1199 |
| hotpotqa | 5,233,329 | 7,405 | psql_bm25s text[] | 0.5689 | 0.4859 | 0.7586 | 0.1199 |
| hotpotqa | 5,233,329 | 7,405 | pg_search | 0.5927 | 0.5094 | 0.7760 | 0.1242 |
| hotpotqa | 5,233,329 | 7,405 | vchord_bm25 | 0.5884 | 0.5050 | 0.7716 | 0.1234 |
| fever | 5,416,568 | 6,666 | Python reference implementation bm25s | 0.4811 | 0.4284 | 0.8494 | 0.0712 |
| fever | 5,416,568 | 6,666 | psql_bm25s ids | 0.4811 | 0.4284 | 0.8494 | 0.0712 |
| fever | 5,416,568 | 6,666 | psql_bm25s text[] | 0.4811 | 0.4284 | 0.8494 | 0.0712 |
| fever | 5,416,568 | 6,666 | pg_search | 0.5125 | 0.4598 | 0.8633 | 0.0744 |
| fever | 5,416,568 | 6,666 | vchord_bm25 | 0.5121 | 0.4593 | 0.8638 | 0.0744 |
| climate-fever | 5,416,593 | 1,535 | Python reference implementation bm25s | 0.1361 | 0.1020 | 0.3666 | 0.0429 |
| climate-fever | 5,416,593 | 1,535 | psql_bm25s ids | 0.1361 | 0.1020 | 0.3666 | 0.0429 |
| climate-fever | 5,416,593 | 1,535 | psql_bm25s text[] | 0.1361 | 0.1020 | 0.3666 | 0.0429 |
| climate-fever | 5,416,593 | 1,535 | pg_search | 0.1421 | 0.1064 | 0.3867 | 0.0456 |
| climate-fever | 5,416,593 | 1,535 | vchord_bm25 | 0.1403 | 0.1046 | 0.3784 | 0.0449 |
| msmarco | 8,841,823 | 43 | Python reference implementation bm25s | 0.4005 | 0.3212 | 0.4248 | 0.5767 |
| msmarco | 8,841,823 | 43 | psql_bm25s ids | 0.3996 | 0.3225 | 0.4247 | 0.5767 |
| msmarco | 8,841,823 | 43 | psql_bm25s text[] | 0.3996 | 0.3225 | 0.4247 | 0.5767 |
| msmarco | 8,841,823 | 43 | pg_search | 0.4097 | 0.3526 | 0.4506 | 0.5907 |
| msmarco | 8,841,823 | 43 | vchord_bm25 | 0.4093 | 0.3346 | 0.4393 | 0.5837 |

Quality readout:

  • All five engines sit in a close relevance band on this BM25 quality matrix. Average NDCG@10 ranges from 0.3976 to 0.4019 across the compared engines.
  • psql_bm25s ids and psql_bm25s text[] remain quality-neutral exact PostgreSQL paths in this run. Their largest absolute metric difference from the Python reference implementation is about 0.003 (nfcorpus Recall@100). Their engineering cost profile is covered by the QPS, build-time, and index-size matrices above.
  • pg_search and vchord_bm25 are also competitive on relevance in this quality validation run, but they have different throughput and storage trade-offs in the main performance matrix.
  • These quality metrics do not replace the QPS, build-time, or index-size matrix above. They only show that the compared engines are retrieving similarly relevant top-100 candidates under the BEIR qrels.
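The average-NDCG@10 range in the first bullet can be recomputed directly from the quality table. The following minimal Python sketch uses the table's NDCG@10 column in its row order. In this run, the text[] values are identical to the ids values.

```python
from statistics import mean

# NDCG@10 per dataset, in the quality table's row order (nfcorpus ... msmarco).
IDS_NDCG10 = [0.3235, 0.6863, 0.3656, 0.1578, 0.2514, 0.5994, 0.3259, 0.2994,
              0.8056, 0.2849, 0.2803, 0.5689, 0.4811, 0.1361, 0.3996]
NDCG10 = {
    "Python reference implementation bm25s": [
        0.3230, 0.6863, 0.3655, 0.1578, 0.2514, 0.5988, 0.3259, 0.2994,
        0.8045, 0.2849, 0.2801, 0.5689, 0.4811, 0.1361, 0.4005],
    "psql_bm25s ids": IDS_NDCG10,
    "psql_bm25s text[]": list(IDS_NDCG10),
    "pg_search": [
        0.3215, 0.6819, 0.3060, 0.1567, 0.2504, 0.5903, 0.3347, 0.3005,
        0.8075, 0.2933, 0.2837, 0.5927, 0.5125, 0.1421, 0.4097],
    "vchord_bm25": [
        0.3209, 0.6766, 0.3597, 0.1561, 0.2517, 0.5895, 0.3379, 0.3007,
        0.8069, 0.2935, 0.2845, 0.5884, 0.5121, 0.1403, 0.4093],
}

# Unweighted mean over the 15 datasets for each engine.
averages = {engine: mean(values) for engine, values in NDCG10.items()}
low, high = min(averages.values()), max(averages.values())
print(f"{low:.4f} to {high:.4f}")
```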

Readout

  • Median QPS ratios versus Python reference implementation bm25s are 3.97x for psql_bm25s ids, 3.93x for psql_bm25s text[], 0.54x for vchord_bm25, and 0.17x for pg_search.
  • Dataset counts at or above Python reference implementation bm25s are 12/15 for psql_bm25s ids, 11/15 for psql_bm25s text[], 7/15 for vchord_bm25, and 3/15 for pg_search.
  • On the largest workload, msmarco, the measured QPS was 96.67 for psql_bm25s ids, 82.13 for psql_bm25s text[], 18.20 for vchord_bm25, 4.44 for pg_search, and 1.61 for the Python reference implementation bm25s.
  • Build-time, index-size, and quality matrices should be read alongside QPS because the compared paths make different operational trade-offs.

Practical Conclusion

  • The current public reference is the refreshed PG18 15 x 5 matrix.
  • If throughput is the priority today, the recommended path is still int4[] plus psql_bm25s_query_ids(...).
  • text[] remains a strong, exact, and practical alternative when maintaining an int4[] token-ID mapping is not desirable.
  • The published cross-engine matrix is still anchored to the pretokenized int4[] and text[] paths.
  • The current matrix supports this throughput ordering by suite median:
    • psql_bm25s ids first
    • psql_bm25s text[] second
    • vchord_bm25 third
    • pg_search fourth
  • The PG18 matrix is the current public benchmark reference for this repository.