Flickr30K Dataset (Retrieval)

July 31, 2022

Samples from Flickr30k dataset (Image credit: "https://bryanplummer.com/Flickr30kEntities/")


Description

The Flickr30k dataset contains 31k+ images collected from Flickr, each paired with 5 reference sentences written by human annotators.

Task

Cross-modal retrieval: (1) image-text: given an image as the query, retrieve matching texts from a gallery; (2) text-image: given a text as the query, retrieve matching images from a gallery.
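Both directions reduce to ranking a gallery by its similarity to the query. The following is a minimal sketch, assuming images and texts have already been encoded into a shared embedding space (the encoders themselves are out of scope) and using cosine similarity as the score. The names `cosine` and `rank_gallery` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query.

    The same function covers both directions: an image embedding queried
    against text embeddings (image-text), or a text embedding queried
    against image embeddings (text-image).
    """
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real models use hundreds of dimensions.
    image_query = [1.0, 0.0]
    text_gallery = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
    print(rank_gallery(image_query, text_gallery))  # [1, 2, 0]
```

Swapping the roles of query and gallery gives the text-image direction without any other change.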

Metrics

The standard metric is recall@k (R@k): the percentage of queries for which a correct match appears among the top k retrieved results.

We use TR@k to denote recall@k for image-text retrieval and IR@k to denote recall@k for text-image retrieval.
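Because every Flickr30K image has 5 captions, the two directions are usually scored slightly differently: an image query counts as a hit if any of its captions lands in the top k, while each text query has exactly one correct image. The sketch below computes TR@k and IR@k from a precomputed image-by-text similarity matrix. The function names, the matrix layout, and the `text_to_image` mapping are assumptions for illustration; evaluation code in specific repositories may differ in details such as tie handling.

```python
from typing import Dict, List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def recall_at_k(
    sim: List[List[float]],      # sim[i][t]: similarity between image i and text t
    text_to_image: List[int],    # text_to_image[t]: index of the image text t describes
    k: int,
) -> Dict[str, float]:
    """Return TR@k and IR@k as percentages."""
    num_images, num_texts = len(sim), len(text_to_image)

    # Image-text (TR): hit if ANY ground-truth caption of image i is in the top k.
    tr_hits = 0
    for i in range(num_images):
        retrieved = top_k(sim[i], k)
        if any(text_to_image[t] == i for t in retrieved):
            tr_hits += 1

    # Text-image (IR): hit if the single ground-truth image is in the top k.
    ir_hits = 0
    for t in range(num_texts):
        column = [sim[i][t] for i in range(num_images)]
        if text_to_image[t] in top_k(column, k):
            ir_hits += 1

    return {
        f"TR@{k}": 100.0 * tr_hits / num_images,
        f"IR@{k}": 100.0 * ir_hits / num_texts,
    }


if __name__ == "__main__":
    # Toy data: 2 images with 2 captions each (Flickr30K itself has 5 per image).
    sim = [[0.9, 0.2, 0.8, 0.1],
           [0.3, 0.7, 0.6, 0.4]]
    text_to_image = [0, 0, 1, 1]
    print(recall_at_k(sim, text_to_image, k=1))  # {'TR@1': 50.0, 'IR@1': 50.0}
```

In the toy run, image 1's best-scoring text (index 1) belongs to image 0, so TR@1 drops to 50.0; raising k to 2 lets its own caption (index 2) into the top results.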

Leaderboard

(Ranked by TR@1; all scores are recall@k percentages.)

Rank	Model	TR@1	TR@5	TR@10	IR@1	IR@5	IR@10	Resources
1	BLIP	97.2	99.9	100.0	87.5	97.7	98.9	paper, code, demo, blog
2	X-VLM	97.1	100.0	100.0	86.9	97.3	98.7	paper, code
3	ALBEF	95.9	99.8	100.0	85.6	97.5	98.9	paper, code, blog
4	ALIGN	95.3	99.8	100.0	84.9	97.4	98.6	paper
5	VILLA	87.9	97.5	98.8	76.3	94.2	96.8	paper, code
6	UNITER	87.3	98.0	99.2	75.6	94.1	96.8	paper, code

Auto-Downloading

cd lavis/datasets/download_scripts && python download_flickr.py

References

Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik, Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models, IJCV, 123(1):74-93, 2017. [paper]
