Microsoft COCO Dataset (Retrieval)

July 31, 2022

Samples from the COCO Caption dataset. Image credit: https://arxiv.org/pdf/1504.00325.pdf


Description

The Microsoft COCO dataset contains over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions are provided for each image.
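Because each image has five captions, retrieval evaluation usually flattens the captions into a single text gallery and keeps index maps between images and texts. The sketch below shows one way to build those maps. The record layout (`image` and `caption` keys) and the helper name `build_retrieval_index` are hypothetical, chosen for illustration, and are not the official COCO JSON schema or a LAVIS API.

```python
def build_retrieval_index(annotations):
    """Flatten per-image caption lists into one text gallery with index maps.

    Assumed (hypothetical) record layout: {"image": <path>, "caption": [<str>, ...]}.
    """
    images, texts = [], []
    img2txt = {}  # image index -> list of its caption indices
    txt2img = {}  # caption index -> index of the image it describes
    for img_id, record in enumerate(annotations):
        images.append(record["image"])
        img2txt[img_id] = []
        for caption in record["caption"]:
            txt_id = len(texts)
            texts.append(caption)
            img2txt[img_id].append(txt_id)
            txt2img[txt_id] = img_id
    return images, texts, img2txt, txt2img


# Two hypothetical images with five captions each, mirroring COCO train/val.
annotations = [
    {"image": "img_a.jpg", "caption": [f"caption a{j}" for j in range(5)]},
    {"image": "img_b.jpg", "caption": [f"caption b{j}" for j in range(5)]},
]
images, texts, img2txt, txt2img = build_retrieval_index(annotations)
print(len(texts), img2txt[1], txt2img[7])  # 10 [5, 6, 7, 8, 9] 1
```

The image-to-text map is one-to-many while the text-to-image map is one-to-one, which is why the two retrieval directions are scored slightly differently.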

Task

Cross-modal retrieval: (1) image-text: given an image as the query, retrieve texts from a gallery; (2) text-image: given a text as the query, retrieve images from a gallery.
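Both directions can be framed as ranking a gallery by similarity to the query. The sketch below assumes images and texts have already been embedded into a shared vector space and uses cosine similarity as the score; the toy vectors, the choice of cosine similarity, and the helper names `cosine` and `retrieve` are illustrative assumptions, not the scoring used by any particular model in the leaderboard.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, gallery, k):
    """Return indices of the k gallery items most similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical 3-d embeddings standing in for real encoder outputs.
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]

# (1) image-text: image 0 is the query, texts form the gallery.
print(retrieve(image_embs[0], text_embs, k=2))  # [0, 1]
# (2) text-image: text 1 is the query, images form the gallery.
print(retrieve(text_embs[1], image_embs, k=1))  # [1]
```

The same `retrieve` function serves both tasks; only the roles of query and gallery swap.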

Metrics

The common metric is recall@k: the percentage of queries for which at least one ground-truth match appears among the top k retrieved results.

We use TR to denote the image-text retrieval recall score and IR to denote the text-image retrieval recall score; for example, TR@5 is recall@5 for image-text retrieval.
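A minimal sketch of computing TR@k and IR@k from an image-by-text similarity matrix follows. It assumes one common convention: for image-text retrieval an image query counts as a hit if its best-ranked ground-truth caption is within the top k, and for text-image retrieval each caption has exactly one ground-truth image. The function name `recall_at_k` and the toy matrix (two captions per image instead of five, to keep it small) are hypothetical, and this is not presented as the exact evaluation code behind the leaderboard numbers.

```python
def recall_at_k(sims, img2txt, txt2img, ks=(1, 5, 10)):
    """Return (TR, IR) dicts mapping k -> recall@k in percent.

    sims[i][t] is the similarity between image i and text t.
    """
    n_img, n_txt = len(sims), len(sims[0])
    # TR: rank (0-based) of each image's best-ranked ground-truth caption.
    tr_ranks = []
    for i in range(n_img):
        order = sorted(range(n_txt), key=lambda t: sims[i][t], reverse=True)
        tr_ranks.append(min(order.index(t) for t in img2txt[i]))
    # IR: rank (0-based) of each caption's single ground-truth image.
    ir_ranks = []
    for t in range(n_txt):
        order = sorted(range(n_img), key=lambda i: sims[i][t], reverse=True)
        ir_ranks.append(order.index(txt2img[t]))
    tr = {k: 100.0 * sum(r < k for r in tr_ranks) / n_img for k in ks}
    ir = {k: 100.0 * sum(r < k for r in ir_ranks) / n_txt for k in ks}
    return tr, ir


# Hypothetical toy data: 2 images, 2 captions each.
sims = [
    [0.9, 0.2, 0.8, 0.1],
    [0.3, 0.7, 0.6, 0.5],
]
img2txt = {0: [0, 1], 1: [2, 3]}
txt2img = {0: 0, 1: 0, 2: 1, 3: 1}
tr, ir = recall_at_k(sims, img2txt, txt2img)
print(tr[1], ir[1])  # 50.0 50.0
```

Here image 1's closest text is caption 1, which belongs to image 0, so image 1 misses at k=1 but hits at k=5; this is what pulls TR@1 down to 50.0.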

Leaderboard

(Ranked by TR@1.)

Rank  Model   TR@1  TR@5  TR@10  IR@1  IR@5  IR@10  Resources
1     BLIP    82.4  95.4  97.9   65.1  86.3  91.8   paper, code, demo, blog
2     X-VLM   81.2  95.6  98.2   63.4  85.8  91.5   paper, code
3     ALBEF   77.6  94.3  97.2   60.7  84.3  90.5   paper, code, blog
4     ALIGN   77.0  93.5  96.9   59.9  83.3  89.8   paper
5     VinVL   75.4  92.9  96.2   58.8  83.5  90.3   paper, code
6     OSCAR   73.5  92.2  96.0   57.5  82.8  89.8   paper, code
7     UNITER  65.7  88.6  93.8   52.9  79.9  88.0   paper, code

Auto-Downloading

cd lavis/datasets/download_scripts && python download_coco.py

References

Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick. "Microsoft COCO Captions: Data Collection and Evaluation Server." arXiv:1504.00325, 2015.