Flickr30K Dataset (Retrieval)

July 31, 2022

Samples from Flickr30k dataset (Image credit: "https://bryanplummer.com/Flickr30kEntities/").


Description

The Flickr30k dataset contains 31k+ images collected from Flickr, each paired with 5 reference sentences written by human annotators.

Task

Cross-modal retrieval: (1) image-text retrieval: given an image as the query, retrieve matching texts from a gallery; (2) text-image retrieval: given a text as the query, retrieve matching images from a gallery.
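As a minimal sketch of both directions, assume each image and text has already been mapped to an embedding vector by some model and that relevance is scored by cosine similarity. Both are illustrative assumptions, not how any particular leaderboard model scores pairs. Under them, retrieval amounts to sorting the gallery by similarity to the query. The helper names `cosine` and `rank_gallery` and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy 2-D embeddings (hypothetical; real models produce high-dimensional vectors).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

# Image-text retrieval: an image is the query, texts form the gallery.
print(rank_gallery(image_embs[0], text_embs))  # [0, 2, 1]
# Text-image retrieval: a text is the query, images form the gallery.
print(rank_gallery(text_embs[1], image_embs))  # [1, 0]
```

Swapping the roles of query and gallery is the only difference between the two directions.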

Metrics

The standard metric is recall@k (R@k): the percentage of queries for which at least one ground-truth item appears among the top k retrieved results.
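Written as a formula, with notation introduced here for illustration only ($Q$ is the set of queries and $r(q)$ is the 1-based rank of the highest-ranked ground-truth item for query $q$):

```latex
\mathrm{R@}k = \frac{100}{|Q|} \sum_{q \in Q} \mathbb{1}\!\left[\, r(q) \le k \,\right]
```

For image-to-text retrieval each image has several reference sentences, so $r(q)$ is the best rank among them; for text-to-image retrieval each sentence has a single ground-truth image.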

We use TR (text retrieval) to denote image-to-text retrieval recall and IR (image retrieval) to denote text-to-image retrieval recall; for example, TR@5 is recall@5 for image-to-text retrieval.
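The sketch below computes TR@k and IR@k from an image-by-text similarity matrix, counting an image query as a hit when any of its reference captions is ranked within the top k. It assumes, hypothetically, that texts are ordered so that caption `t` belongs to image `t // captions_per_image` (for Flickr30k that would be 5 captions per image); the function name `recall_at_k` and the toy scores are illustrative only.

```python
def recall_at_k(sim, captions_per_image, ks=(1, 5, 10)):
    """Compute TR@k and IR@k (in percent) from an image-by-text similarity matrix.

    Assumes (hypothetically) that caption t belongs to image t // captions_per_image.
    """
    n_images = len(sim)
    n_texts = len(sim[0])

    # Image-to-text (TR): best 0-based rank of any ground-truth caption per image.
    tr_ranks = []
    for i in range(n_images):
        order = sorted(range(n_texts), key=lambda t: sim[i][t], reverse=True)
        gt = range(i * captions_per_image, (i + 1) * captions_per_image)
        tr_ranks.append(min(order.index(t) for t in gt))

    # Text-to-image (IR): 0-based rank of the single ground-truth image per caption.
    ir_ranks = []
    for t in range(n_texts):
        order = sorted(range(n_images), key=lambda i: sim[i][t], reverse=True)
        ir_ranks.append(order.index(t // captions_per_image))

    # A query is a hit at k if its ground truth sits in the top k (0-based rank < k).
    tr = {k: 100.0 * sum(r < k for r in tr_ranks) / n_images for k in ks}
    ir = {k: 100.0 * sum(r < k for r in ir_ranks) / n_texts for k in ks}
    return tr, ir


# Toy example: 2 images with 2 captions each (texts 0-1 -> image 0, texts 2-3 -> image 1).
sim = [
    [0.9, 0.75, 0.8, 0.1],  # image 0 vs. texts 0..3
    [0.3, 0.7, 0.6, 0.4],   # image 1 vs. texts 0..3
]
tr, ir = recall_at_k(sim, captions_per_image=2, ks=(1, 2))
print(tr)  # {1: 50.0, 2: 100.0}
print(ir)  # {1: 75.0, 2: 100.0}
```

Taking the minimum rank over an image's captions is what makes TR@k generally easier than IR@k, which is consistent with the TR scores in the leaderboard being higher than the IR scores.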

Leaderboard

(Ranked by TR@1; all scores are recall@k in %.)

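| Rank | Model  | TR@1 | TR@5  | TR@10 | IR@1 | IR@5 | IR@10 | Resources               |
|------|--------|------|-------|-------|------|------|-------|-------------------------|
| 1    | BLIP   | 97.2 | 99.9  | 100.0 | 87.5 | 97.7 | 98.9  | paper, code, demo, blog |
| 2    | X-VLM  | 97.1 | 100.0 | 100.0 | 86.9 | 97.3 | 98.7  | paper, code             |
| 3    | ALBEF  | 95.9 | 99.8  | 100.0 | 85.6 | 97.5 | 98.9  | paper, code, blog       |
| 4    | ALIGN  | 95.3 | 99.8  | 100.0 | 84.9 | 97.4 | 98.6  | paper                   |
| 5    | VILLA  | 87.9 | 97.5  | 98.8  | 76.3 | 94.2 | 96.8  | paper, code             |
| 6    | UNITER | 87.3 | 98.0  | 99.2  | 75.6 | 94.1 | 96.8  | paper, code             |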

Auto-Downloading

cd lavis/datasets/download_scripts && python download_flickr.py

References

Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik, Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models, IJCV, 123(1):74-93, 2017. [paper]