Improving Cross-Modal Retrieval with Set of Diverse Embeddings

November 15, 2023

This repository contains the official source code for our paper:

Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim, Namyup Kim, and Suha Kwak
POSTECH CSE
CVPR (Highlight), Vancouver, 2023.

Acknowledgement

Parts of our code are adapted from the following repositories.

Dataset

data 
├─ coco_download.sh  
├─ coco # can be downloaded with coco_download.sh
│  ├─ images
│  │  └─ ......
│  └─ annotations 
│     └─ ......
├─ coco_butd
│  └─ precomp  
│     ├─ train_ids.txt
│     ├─ train_caps.txt
│     └─ ......   
├─ f30k 
│  ├─ images
│  │  └─ ......
│  ├─ dataset_flickr30k.json
│  └─ ......  
└─ f30k_butd
   └─ precomp  
      ├─ train_ids.txt
      ├─ train_caps.txt
      └─ ......

vocab # included in this repo
├─ coco_butd_vocab.pkl
└─ ......

coco_butd and f30k_butd: Datasets used with the Faster R-CNN image backbone. We use the pre-computed features provided by SCAN, which can be downloaded from https://github.com/kuanghuei/SCAN#download-data.
coco and f30k: Datasets used with the CNN backbones. To download the images and captions, refer to the COCO download script (coco_download.sh) and to the Flickr30K website and the Flickr30K .json file.

Note: Downloaded datasets should be placed according to the directory structure presented above.
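Before training, it can help to confirm that the data directory matches the layout above. The script below is a hypothetical helper, not part of the repository: it checks only the paths spelled out in the tree (the elided "......" entries are skipped), and the default data root `data` is an assumption. Because the *_butd datasets are needed only for the Faster R-CNN backbone and coco/f30k only for the CNN backbones, some "missing" lines may be expected for your setup.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: report which dataset paths
# from the README tree exist under a given data root.
# Only explicitly listed paths are checked; elided "......" entries are skipped.

check_dataset_layout() {
    root="$1"
    missing=0
    for path in \
        coco/images coco/annotations \
        coco_butd/precomp/train_ids.txt coco_butd/precomp/train_caps.txt \
        f30k/images f30k/dataset_flickr30k.json \
        f30k_butd/precomp/train_ids.txt f30k_butd/precomp/train_caps.txt
    do
        if [ -e "$root/$path" ]; then
            echo "found:   $root/$path"
        else
            echo "missing: $root/$path"
            missing=$((missing + 1))
        fi
    done
    echo "$missing path(s) missing"
}

# The data root defaults to "data" (an assumption); pass another path as $1.
check_dataset_layout "${1:-data}"
```

For example, saving it as check_layout.sh (a hypothetical file name) and running `sh check_layout.sh /path/to/data` prints one found/missing line per path followed by a count.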

Requirements

You can install the requirements using conda:

conda create --name <env> --file requirements.txt

Training on COCO

sh train_eval_coco.sh