June 28, 2022

This paper was accepted at CVPR 2022!

Slow description video.

This repo provides the code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation).

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data you need to:

  1. Extract the similarity matrix between the captions from the training split and the videos from the testing split, and save it to path/to/sims/train/test
  2. Extract the testing split similarity matrix (similarities between testing captions and testing videos), and save it to path/to/sims/test
  3. Run QB-Norm:
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test
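Steps 1 and 2 could be sketched as below. This is a minimal sketch under assumptions, not the extraction code used for the released matrices: it assumes cosine similarity between caption and video embeddings, and matrices stored as pickled NumPy arrays with one row per caption and one column per video. The embedding arrays, the helper `cosine_sims` and the output file names are hypothetical; compare the layout with the provided MSRVTT files before relying on it.

```python
import os
import pickle
import tempfile

import numpy as np


def cosine_sims(text_emb, video_emb):
    """Return a (num_captions, num_videos) matrix of cosine similarities."""
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    video_emb = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    return text_emb @ video_emb.T


# Random placeholders standing in for the embeddings produced by your model.
rng = np.random.default_rng(0)
train_caption_emb = rng.normal(size=(50, 16))  # training captions (querybank)
test_caption_emb = rng.normal(size=(10, 16))   # testing captions
test_video_emb = rng.normal(size=(10, 16))     # testing videos (gallery)

# Step 1: training captions vs. testing videos.
sims_train_test = cosine_sims(train_caption_emb, test_video_emb)
# Step 2: testing captions vs. testing videos.
sims_test = cosine_sims(test_caption_emb, test_video_emb)

out_dir = tempfile.mkdtemp()  # replace with your own output directory
for name, sims in [("sims_train_test.pkl", sims_train_test),
                   ("sims_test.pkl", sims_test)]:
    with open(os.path.join(out_dir, name), "wb") as f:
        pickle.dump(sims, f)
```

If the layout matches what the script expects, the two files can then be passed to --sims_train_test_path and --sims_test_path.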

Data

The similarity matrices for each method were extracted using the official repositories, as follows: CE+, TT-CE+, CLIP2Video, CLIP4Clip (for CLIP4Clip we used the official repo to train new models from scratch, since pre-trained weights are not provided), CLIP, MMT, Audio-Retrieval.

Here you can find our trained weights for CLIP4Clip: MSRVTT, DiDeMo, LSMDC, Activity-Net.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSRVTT 1kA CLIP2Video, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

The inverse temperature is set to 20, with the exception of CLIP2Video, where we used 1/1.99.
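The role of the inverse temperature can be illustrated with a minimal NumPy sketch of a dynamic inverted softmax. This is an illustration under assumptions, not the code in dynamic_inverted_softmax.py: the function and argument names are hypothetical, rows are assumed to index queries and columns gallery items, and the activation rule (normalise only those test queries whose top-1 result is also a top-k result for some querybank caption) is one reading of the method.

```python
import numpy as np


def dynamic_inverted_softmax(sims_train_test, sims_test, beta=20.0, k=1):
    """Sketch of querybank normalisation (all names are assumptions).

    sims_train_test: (num_querybank, num_gallery) similarities.
    sims_test:       (num_test_queries, num_gallery) similarities.
    beta:            inverse temperature applied before exponentiation.
    """
    # Gallery items retrieved in the top-k of at least one querybank query.
    topk = np.argsort(-sims_train_test, axis=1)[:, :k]
    activation_set = np.unique(topk)

    # Per-gallery-item normaliser: sum over the querybank of exp(beta * sim).
    normaliser = np.exp(beta * sims_train_test).sum(axis=0)

    # Test queries whose top-1 gallery item lies in the activation set.
    top1 = np.argmax(sims_test, axis=1)
    active = np.isin(top1, activation_set)

    # Only the active rows are re-scored; the others keep their raw values.
    out = sims_test.astype(float).copy()
    out[active] = np.exp(beta * sims_test[active]) / normaliser
    return out


# Toy example: gallery item 0 is a "hub" for both querybank captions.
querybank_sims = np.array([[0.9, 0.1],
                           [0.8, 0.2]])
test_sims = np.array([[0.6, 0.5],
                      [0.2, 0.7]])
normalised = dynamic_inverted_softmax(querybank_sims, test_sims, beta=20.0)
```

In this toy example the first test query initially prefers the hub (item 0), but after normalisation it ranks item 1 first; the second query is left unchanged because its top-1 item is not in the activation set. A larger `beta` sharpens the exponentials, so items that score highly for many querybank captions are penalised more strongly.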

QB-Norm Results on MSRVTT Benchmark

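| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 14.4(0.1) | 37.4(0.1) | 50.2(0.1) | 10.0(0.0) | 30.0(0.1) |
| CE+ (+QB-Norm) | Full | t2v | 16.4(0.0) | 40.3(0.1) | 52.9(0.1) | 9.0(0.0) | 32.7(0.1) |
| TT-CE+ | Full | t2v | 14.9(0.1) | 38.3(0.1) | 51.5(0.1) | 10.0(0.0) | 30.9(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.3(0.0) | 42.1(0.2) | 54.9(0.1) | 8.0(0.0) | 34.2(0.1) |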
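For reference, the metric columns can be computed from a similarity matrix roughly as below. This sketch assumes one row per query and one column per gallery item, with the correct match for query i at column i, and takes Geom to be the geometric mean of R@1, R@5 and R@10 (an assumption that is consistent with the numbers reported in the tables). The function name `retrieval_metrics` is hypothetical.

```python
import numpy as np


def retrieval_metrics(sims):
    """Compute R@1, R@5, R@10, MdR and Geom from a square similarity matrix."""
    n = sims.shape[0]
    # Gallery indices sorted from most to least similar for each query.
    order = np.argsort(-sims, axis=1)
    # Rank (1 = best) of the ground-truth item, assumed to sit at column i.
    ranks = np.argmax(order == np.arange(n)[:, None], axis=1) + 1

    metrics = {f"R@{k}": 100.0 * float(np.mean(ranks <= k)) for k in (1, 5, 10)}
    metrics["MdR"] = float(np.median(ranks))
    recalls = [metrics[f"R@{k}"] for k in (1, 5, 10)]
    metrics["Geom"] = float(np.prod(recalls) ** (1.0 / 3.0))
    return metrics


# Toy example: the first query ranks its correct item second.
toy_sims = np.array([[0.1, 0.9, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
toy_metrics = retrieval_metrics(toy_sims)
```

Higher is better for R@K and Geom, while lower is better for MdR (median rank).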

QB-Norm Results on MSVD Benchmark

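| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 25.4(0.3) | 56.9(0.4) | 71.3(0.2) | 4.0(0.0) | 46.9(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 28.9(0.3) | 62.0(0.4) | 74.8(0.3) | 3.0(0.0) | 51.2(0.1) |
| CLIP2Video | Full | t2v | 47.0 | 76.8 | 85.9 | 2.0 | 67.7 |
| CLIP2Video (+QB-Norm) | Full | t2v | 47.6 | 77.6 | 86.1 | 2.0 | 68.5 |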

QB-Norm Results on DiDeMo Benchmark

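| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 21.6(0.7) | 48.6(0.4) | 62.9(0.6) | 6.0(0.0) | 40.4(0.4) |
| TT-CE+ (+QB-Norm) | Full | t2v | 24.2(0.7) | 50.8(0.7) | 64.4(0.1) | 5.3(0.5) | 43.0(0.2) |
| CLIP4Clip | Full | t2v | 43.0 | 70.5 | 80.0 | 2.0 | 62.4 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 43.5 | 71.4 | 80.9 | 2.0 | 63.1 |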

QB-Norm Results on LSMDC Benchmark

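| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 17.2(0.4) | 36.5(0.6) | 46.3(0.3) | 13.7(0.5) | 30.7(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.8(0.4) | 37.7(0.5) | 47.6(0.6) | 12.7(0.5) | 31.7(0.3) |
| CLIP4Clip | Full | t2v | 21.3 | 40.0 | 49.5 | 11.0 | 34.8 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 22.3 | 40.1 | 49.5 | 11.0 | 35.4 |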

The temperature used for the CLIP4Clip method on the LSMDC dataset is 0.8.

QB-Norm Results on VaTeX Benchmark

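| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 53.2(0.2) | 87.4(0.1) | 93.3(0.0) | 1.0(0.0) | 75.7(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 54.8(0.1) | 88.2(0.1) | 93.8(0.1) | 1.0(0.0) | 76.8(0.0) |
| CLIP2Video | Full | t2v | 57.4 | 87.9 | 93.6 | 1.0 | 77.9 |
| CLIP2Video (+QB-Norm) | Full | t2v | 58.8 | 88.3 | 93.8 | 1.0 | 78.7 |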

QB-Norm Results on QuerYD Benchmark

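| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 13.2(2.0) | 37.1(2.9) | 50.5(1.9) | 10.3(1.2) | 29.1(2.2) |
| CE+ (+QB-Norm) | Full | t2v | 14.1(1.8) | 38.6(1.3) | 51.1(1.6) | 10.0(0.8) | 30.2(1.7) |
| TT-CE+ | Full | t2v | 14.4(0.5) | 37.7(1.7) | 50.9(1.6) | 9.8(1.0) | 30.3(0.9) |
| TT-CE+ (+QB-Norm) | Full | t2v | 15.1(1.6) | 38.3(2.4) | 51.2(2.8) | 10.3(1.7) | 30.9(2.3) |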

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

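| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CLIP | 5k | t2i | 30.3 | 56.1 | 67.1 | 4.0 | 48.5 |
| CLIP (+QB-Norm) | 5k | t2i | 34.8 | 59.9 | 70.4 | 3.0 | 52.8 |
| MMT-Oscar | 5k | t2i | 52.2 | 80.2 | 88.0 | 1.0 | 71.7 |
| MMT-Oscar (+QB-Norm) | 5k | t2i | 53.9 | 80.5 | 88.1 | 1.0 | 72.6 |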

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

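| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| AR-CE | Full | t2a | 23.1(0.6) | 55.1(0.7) | 70.7(0.6) | 4.7(0.5) | 44.8(0.7) |
| AR-CE (+QB-Norm) | Full | t2a | 23.9(0.2) | 57.1(0.3) | 71.6(0.4) | 4.0(0.0) | 46.0(0.3) |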

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@inproceedings{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      booktitle={CVPR},
      year={2022}
}