Pyserini: Reproducing Vector PRF Results

April 7, 2022

This guide provides instructions to reproduce the Vector PRF results reported in the following work, on all datasets and dense retriever (DR) models available in Pyserini:

Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

Starting with v0.12.0, you can reproduce these results directly from the Pyserini PyPI package. Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature. See package installation notes for more details.

Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS). However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective. Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.

Summary

Here's how our results stack up across all available models and datasets in Pyserini:

Passage Ranking Datasets

TREC DL 2019 Passage

| Model | Method | MAP | nDCG@10 | nDCG@100 | Recall@1000 |
|---|---|---|---|---|---|
| ANCE | Original | 0.3710 | 0.6452 | 0.5540 | 0.7554 |
| ANCE | Average PRF 3 | 0.4247 | 0.6532 | 0.5937 | 0.7739 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.4211 | 0.6539 | 0.5928 | 0.7825 |
| TCT-ColBERT V1 | Original | 0.3906 | 0.6700 | 0.5730 | 0.7916 |
| TCT-ColBERT V1 | Average PRF 3 | 0.4336 | 0.6639 | 0.6119 | 0.8230 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.4463 | 0.6875 | 0.6143 | 0.8393 |
| TCT-ColBERT V2 HN+ | Original | 0.4469 | 0.7204 | 0.6318 | 0.8261 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.4879 | 0.7312 | 0.6719 | 0.8586 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.4883 | 0.7111 | 0.6684 | 0.8694 |
| DistillBERT KD | Original | 0.4053 | 0.6994 | 0.5765 | 0.7653 |
| DistillBERT KD | Average PRF 3 | 0.4575 | 0.7096 | 0.6217 | 0.7939 |
| DistillBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.4548 | 0.7052 | 0.6189 | 0.8049 |
| DistillBERT Balanced | Original | 0.4590 | 0.7210 | 0.6360 | 0.8406 |
| DistillBERT Balanced | Average PRF 3 | 0.4856 | 0.7190 | 0.6526 | 0.8515 |
| DistillBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.4974 | 0.7231 | 0.6684 | 0.8775 |
| SBERT | Original | 0.4060 | 0.6930 | 0.5985 | 0.7872 |
| SBERT | Average PRF 3 | 0.4354 | 0.7001 | 0.6149 | 0.7937 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4371 | 0.6952 | 0.6149 | 0.7941 |
| ADORE | Original | 0.4188 | 0.6832 | 0.5946 | 0.7759 |
| ADORE | Average PRF 3 | 0.4672 | 0.6958 | 0.6263 | 0.7890 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4629 | 0.7021 | 0.6325 | 0.7950 |

TREC DL 2020 Passage

| Model | Method | MAP | nDCG@10 | nDCG@100 | Recall@1000 |
|---|---|---|---|---|---|
| ANCE | Original | 0.4076 | 0.6458 | 0.5679 | 0.7764 |
| ANCE | Average PRF 3 | 0.4325 | 0.6573 | 0.5793 | 0.7909 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.4315 | 0.6471 | 0.5800 | 0.7957 |
| TCT-ColBERT V1 | Original | 0.4290 | 0.6678 | 0.5826 | 0.8181 |
| TCT-ColBERT V1 | Average PRF 3 | 0.4725 | 0.6957 | 0.6101 | 0.8667 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.4625 | 0.6945 | 0.6056 | 0.8576 |
| TCT-ColBERT V2 HN+ | Original | 0.4754 | 0.6882 | 0.6206 | 0.8429 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.4811 | 0.6836 | 0.6228 | 0.8579 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.4860 | 0.6804 | 0.6254 | 0.8518 |
| DistillBERT KD | Original | 0.4159 | 0.6447 | 0.5728 | 0.7953 |
| DistillBERT KD | Average PRF 3 | 0.4214 | 0.6316 | 0.5755 | 0.8403 |
| DistillBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.4145 | 0.6289 | 0.5760 | 0.8433 |
| DistillBERT Balanced | Original | 0.4698 | 0.6854 | 0.6346 | 0.8727 |
| DistillBERT Balanced | Average PRF 3 | 0.4887 | 0.7086 | 0.6449 | 0.9030 |
| DistillBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.4879 | 0.7083 | 0.6470 | 0.8926 |
| SBERT | Original | 0.4124 | 0.6344 | 0.5734 | 0.7937 |
| SBERT | Average PRF 3 | 0.4258 | 0.6412 | 0.5781 | 0.8169 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4342 | 0.6559 | 0.5851 | 0.8226 |
| ADORE | Original | 0.4418 | 0.6655 | 0.5949 | 0.8151 |
| ADORE | Average PRF 3 | 0.4706 | 0.7086 | 0.6176 | 0.8323 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4760 | 0.7019 | 0.6193 | 0.8251 |

MS MARCO Passage V1

Vector PRF does not perform well with sparse judgments such as those in MS MARCO; the results below are included for completeness.

| Model | Method | MAP | nDCG@100 | Recall@1000 | MRR@10 |
|---|---|---|---|---|---|
| ANCE | Original | 0.3362 | 0.4457 | 0.9587 | 0.3302 |
| ANCE | Average PRF 3 | 0.3133 | 0.4247 | 0.9490 | 0.3073 |
| ANCE | Rocchio PRF 5 A0.4 B0.6 | 0.3115 | 0.4250 | 0.9545 | 0.3048 |
| TCT-ColBERT V1 | Original | 0.3416 | 0.4514 | 0.9640 | 0.3350 |
| TCT-ColBERT V1 | Average PRF 3 | 0.2882 | 0.4014 | 0.9452 | 0.2816 |
| TCT-ColBERT V1 | Rocchio PRF 5 A0.4 B0.6 | 0.2809 | 0.3988 | 0.9543 | 0.2740 |
| TCT-ColBERT V2 HN+ | Original | 0.3644 | 0.4750 | 0.9695 | 0.3590 |
| TCT-ColBERT V2 HN+ | Average PRF 3 | 0.3183 | 0.4325 | 0.9585 | 0.2995 |
| TCT-ColBERT V2 HN+ | Rocchio PRF 5 A0.4 B0.6 | 0.3190 | 0.4360 | 0.9659 | 0.2933 |
| DistillBERT KD | Original | 0.3309 | 0.4391 | 0.9553 | 0.3250 |
| DistillBERT KD | Average PRF 3 | 0.2830 | 0.3940 | 0.9325 | 0.2470 |
| DistillBERT KD | Rocchio PRF 5 A0.4 B0.6 | 0.2787 | 0.3937 | 0.9432 | 0.2716 |
| DistillBERT Balanced | Original | 0.3515 | 0.4651 | 0.9771 | 0.3443 |
| DistillBERT Balanced | Average PRF 3 | 0.2979 | 0.4151 | 0.9613 | 0.2630 |
| DistillBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.2969 | 0.4178 | 0.9702 | 0.2897 |
| SBERT | Original | 0.3373 | 0.4453 | 0.9558 | 0.3314 |
| SBERT | Average PRF 3 | 0.3094 | 0.4183 | 0.9446 | 0.3035 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.3034 | 0.4157 | 0.9529 | 0.2974 |
| ADORE | Original | 0.3523 | 0.4637 | 0.9688 | 0.3466 |
| ADORE | Average PRF 3 | 0.3188 | 0.4330 | 0.9583 | 0.3127 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.3209 | 0.4376 | 0.9669 | 0.3145 |

Reproducing Results

To reproduce the Average Vector PRF with different models, the same command can be used, varying only the parameter values:

$ python -m pyserini.search.faiss --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg
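Conceptually, Average PRF (`--prf-method avg`) replaces the query embedding with the element-wise mean of the original query vector and the embeddings of the top-k feedback passages from the first-pass retrieval. The following is a minimal sketch of that computation only; the function name and plain-list vectors are illustrative, not Pyserini's internal API:

```python
def average_prf(query_vec, feedback_vecs):
    """Average Vector PRF: element-wise mean of the query vector
    and the top-k feedback document vectors."""
    vecs = [query_vec] + feedback_vecs
    return [sum(dims) / len(vecs) for dims in zip(*vecs)]

# With --prf-depth 3, feedback_vecs holds the embeddings of the
# three highest-scoring passages from the first-pass retrieval.
```

The new vector is then used for a second retrieval pass over the same index.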

To reproduce the Rocchio Vector PRF with different models, the command is similar to the Average case:

$ python -m pyserini.search.faiss --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.rocchio_prf5_a0.4_b0.6.trec \
    --prf-depth 5 \
    --prf-method rocchio \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6
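Rocchio PRF weights the original query and the feedback passages separately: with `--rocchio-alpha 0.4 --rocchio-beta 0.6`, the new query vector is 0.4 times the original query embedding plus 0.6 times the centroid of the top-k feedback passage embeddings (no negative-feedback term is used in this setup). A sketch under those assumptions, again with illustrative names rather than Pyserini's internal API:

```python
def rocchio_prf(query_vec, feedback_vecs, alpha=0.4, beta=0.6):
    """Rocchio Vector PRF: alpha * original query vector
    + beta * centroid of the top-k feedback document vectors."""
    k = len(feedback_vecs)
    centroid = [sum(dims) / k for dims in zip(*feedback_vecs)]
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]
```

Note that Average PRF is the special case where the query and every feedback vector receive equal weight.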

The --topics, --index, and --encoder values differ across models and datasets. Since Pyserini provides all of these datasets and models, reproducing a different combination is simply a matter of passing in the corresponding values.

--topics:
    TREC DL 2019 Passage: dl19-passage
    TREC DL 2020 Passage: dl20
    MS MARCO Passage V1: msmarco-passage-dev-subset

--index:
    ANCE index with MS MARCO V1 passage collection: msmarco-passage-ance-bf
    TCT-ColBERT V1 index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-bf
    TCT-ColBERT V2 HN+ index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-v2-hnp-bf
    DistillBERT KD index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-margin_mse-T2-bf
    DistillBERT Balanced index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-tas_b-b256-bf
    SBERT index with MS MARCO V1 passage collection: msmarco-passage-sbert-bf

Note: TREC DL 2019, TREC DL 2020, and MS MARCO Passage V1 share the same passage collection, so for a given model the same index is used for all three datasets.

--encoder:
    ANCE: castorini/ance-msmarco-passage
    TCT-ColBERT V1: castorini/tct_colbert-msmarco
    TCT-ColBERT V2 HN+: castorini/tct_colbert-v2-hnp-msmarco
    DistillBERT KD: sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco
    DistillBERT Balanced: sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco
    SBERT: sentence-transformers/msmarco-distilbert-base-v3

Note: If you have pre-computed queries available, --encoder can be replaced with --encoded-queries to avoid on-the-fly query encoding; pass in the name (or path) of your pre-computed query set. For example, Pyserini provides pre-computed ANCE queries for MS MARCO Passage V1, so instead of --encoder castorini/ance-msmarco-passage, one can use --encoded-queries ance-msmarco-passage-dev-subset. For the ADORE model, only --encoded-queries can be used; on-the-fly encoding is not available.
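To sweep over several models, one option is to tabulate the --index/--encoder pairs listed above and assemble each command programmatically. The helper below is a hypothetical convenience, not part of Pyserini; the index and encoder names are the ones listed above (ADORE is omitted because it requires --encoded-queries rather than --encoder), while the dictionary keys are made up for illustration:

```python
# --index / --encoder pairs as listed above (keys are illustrative).
MODELS = {
    "ance": ("msmarco-passage-ance-bf",
             "castorini/ance-msmarco-passage"),
    "tct_colbert-v1": ("msmarco-passage-tct_colbert-bf",
                       "castorini/tct_colbert-msmarco"),
    "tct_colbert-v2-hnp": ("msmarco-passage-tct_colbert-v2-hnp-bf",
                           "castorini/tct_colbert-v2-hnp-msmarco"),
    "distilbert-kd": ("msmarco-passage-distilbert-dot-margin_mse-T2-bf",
                      "sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco"),
    "distilbert-balanced": ("msmarco-passage-distilbert-dot-tas_b-b256-bf",
                            "sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco"),
    "sbert": ("msmarco-passage-sbert-bf",
              "sentence-transformers/msmarco-distilbert-base-v3"),
}

def build_command(model, topics, output):
    """Assemble the Average PRF 3 command line for one model/topics pair."""
    index, encoder = MODELS[model]
    return ["python", "-m", "pyserini.search.faiss",
            "--topics", topics, "--index", index, "--encoder", encoder,
            "--batch-size", "64", "--threads", "12",
            "--output", output, "--prf-depth", "3", "--prf-method", "avg"]
```

Each returned list can be handed to subprocess.run to produce one run file per model.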

With these parameters, the results above can be reproduced directly. For example, to reproduce TREC DL 2019 Passage with ANCE and Average Vector PRF 3, the command is:

$ python -m pyserini.search.faiss --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg

To reproduce TREC DL 2019 Passage with ANCE Rocchio Vector PRF 5 Alpha 0.4 Beta 0.6, the command will be:

$ python -m pyserini.search.faiss --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.rocchio_prf5_a0.4_b0.6.trec \
    --prf-method rocchio \
    --prf-depth 5 \
    --rocchio-topk 5 \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6

To evaluate, we use the trec_eval tool bundled with Pyserini.

For TREC DL 2019, use this command to evaluate your run file:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl19-passage runs/run.ance.dl19-passage.average_prf3.trec
map                 all     0.4247
ndcg_cut_100        all     0.5937
recall_1000         all     0.7739
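trec_eval prints one metric per line in "metric  qid  value" form, so comparing reproduced scores against the tables above is easy to automate. A small illustrative helper (not part of Pyserini) that parses this output into a dict:

```python
def parse_trec_eval(output):
    """Parse trec_eval output lines like 'map  all  0.4247' into a dict,
    keeping only the aggregate ('all') rows."""
    scores = {}
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "all":
            scores[parts[0]] = float(parts[2])
    return scores
```

Feeding it the output above yields {"map": 0.4247, "ndcg_cut_100": 0.5937, "recall_1000": 0.7739}, which can then be asserted against the summary tables.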

The qrels file ships with Pyserini, so simply replace runs/run.ance.dl19-passage.average_prf3.trec with the path to your own run file to score your reproduced results.

Similarly, for TREC DL 2020:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl20-passage runs/run.ance.dl20-passage.average_prf3.trec
map                 all     0.4325
ndcg_cut_100        all     0.5793
recall_1000         all     0.7909

Again, the qrels file ships with Pyserini; replace runs/run.ance.dl20-passage.average_prf3.trec with the path to your own run file.

For MS MARCO Passage V1, the -l 2 option is not needed:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 msmarco-passage-dev-subset runs/run.ance.msmarco-passage.average_prf3.trec
map                 all     0.3133
ndcg_cut_100        all     0.4247
recall_1000         all     0.9490

As before, the qrels file ships with Pyserini; replace runs/run.ance.msmarco-passage.average_prf3.trec with the path to your own run file.

Reproduction Log*