Synthesizing Counterfactual Samples for Effective Image-Text Matching

January 1, 2023

Official PyTorch implementation of the paper Synthesizing Counterfactual Samples for Effective Image-Text Matching (MM 2022 Oral).

If you use any resources from this repo, please cite the paper with the following BibTeX entry:

@inproceedings{wei2022synthesizing,
  title={Synthesizing Counterfactual Samples for Effective Image-Text Matching},
  author={Wei, Hao and Wang, Shuhui and Han, Xinzhe and Xue, Zhe and Ma, Bin and Wei, Xiaoming and Wei, Xiaolin},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={4355--4364},
  year={2022}
}

Our codebase builds on the implementation of VSE_infty.

Preparation

Environment

We trained and evaluated our models with the following key dependencies:

Python 3.8.12
PyTorch 1.10.2
Transformers 4.14.1

Run pip install -r requirements.txt to install the exact dependencies used in our experiments.
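To confirm that an existing environment matches the versions listed above, a small check like the following can help. This is a minimal sketch, not part of the repo: the helper names (installed_version, check_versions) are hypothetical, and it assumes the packages are installed under the distribution names torch and transformers.

```python
from importlib import metadata

# Versions listed in this README; the distribution names are assumptions
# about how the packages are installed (pip names "torch", "transformers").
EXPECTED = {"torch": "1.10.2", "transformers": "4.14.1"}


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {package: (expected, installed)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        installed = lookup(name)
        # Local build tags such as "+cu113" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means both packages match; a different version is not necessarily a problem, only a deviation from the setup used in the experiments.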

Data

We organize all data used in the experiments in the following manner:

data
├── coco
│   └── precomp  # pre-computed BUTD region features for COCO, provided by SCAN 
│
├── f30k
│   └── precomp  # pre-computed BUTD region features for Flickr30K, provided by SCAN
│
└── vocab  # vocab files provided by SCAN (only used when the text backbone is BiGRU)

Download links for the pre-computed BUTD features and the corresponding vocabularies are available in the official SCAN repo. The precomp folders contain the region features, and the vocab folder contains the vocabularies.
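Before training, it can be useful to check that the data root follows the layout above. The snippet below is a minimal sketch under that assumption; missing_dirs is a hypothetical helper, not something the repo provides, and it only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to the data root.
EXPECTED_DIRS = ("coco/precomp", "f30k/precomp", "vocab")


def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # /tmp/data is the example data root used elsewhere in this README.
    missing = missing_dirs("/tmp/data")
    print("missing:", ", ".join(missing) if missing else "none")
```

Since the vocab folder is only used with the BiGRU text backbone, a missing vocab entry matters only for that configuration.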

Training

Assuming the data root is /tmp/data, we provide example training scripts that use BUTD region features for the image side and BERT-base for the text side. See train_coco.sh and train_f30k.sh.

Evaluation

Run eval.py to evaluate a specified model on either COCO or Flickr30K. To evaluate a pre-trained model on COCO, use the following command (assuming the local data path is /tmp/data and the model name is coco_butd_region_bert):

CUDA_VISIBLE_DEVICES=0 python eval.py --dataset coco --data_path /tmp/data/coco --model coco_butd_region_bert

To evaluate a pre-trained model on Flickr30K, use the command:

CUDA_VISIBLE_DEVICES=0 python eval.py --dataset f30k --data_path /tmp/data/f30k --model f30k_butd_region_bert
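The two commands differ only in the dataset name and the model name, so they can be generated from one template. The sketch below only uses the flags shown above (--dataset, --data_path, --model); build_eval_command is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex


def build_eval_command(dataset, model, data_root="/tmp/data"):
    """Return the eval.py argument list for one dataset, mirroring the commands above."""
    return [
        "python", "eval.py",
        "--dataset", dataset,
        "--data_path", f"{data_root}/{dataset}",
        "--model", model,
    ]


# Dataset and model names are the ones used in the example commands.
RUNS = [("coco", "coco_butd_region_bert"), ("f30k", "f30k_butd_region_bert")]

for dataset, model in RUNS:
    # Prefix the printed line with CUDA_VISIBLE_DEVICES=0 to select a GPU.
    print(shlex.join(build_eval_command(dataset, model)))
```

The returned list can also be passed to subprocess.run when the evaluation should be launched from Python.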

Results

Dataset	Image-to-text R1	Image-to-text R5	Text-to-image R1	Text-to-image R5	Link
COCO 1K	80.6	96.8	65.0	91.4	Google drive
COCO 5K	59.5	86.1	42.7	73.1
Flickr30K	82.7	95.5	62.6	86.9	Google drive
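The R1 and R5 columns report Recall@1 and Recall@5, the percentage of queries whose ground-truth match appears among the top 1 or top 5 retrieved items. The snippet below is a simplified illustration of the metric, not the repo's evaluation code: it assumes a square similarity matrix in which query i matches candidate i, whereas the real benchmarks pair each image with several captions. The function name recall_at_k and the toy matrix are made up for the example.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j; the
    ground-truth candidate for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank = number of candidates scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy similarity matrix: 3 queries x 3 candidates.
sim = [
    [0.9, 0.1, 0.3],  # ground truth ranked first
    [0.8, 0.7, 0.2],  # ground truth ranked second
    [0.1, 0.6, 0.5],  # ground truth ranked second
]

print(recall_at_k(sim, 1), recall_at_k(sim, 2))
```

In this toy case only one of the three queries retrieves its match at rank 1, while all three retrieve it within the top 2.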