MAFA: Managing False Negatives for Vision-Language Pre-training

December 9, 2024

This is the official PyTorch implementation of "MAFA: Managing False Negatives for Vision-Language Pre-training" (accepted to CVPR 2024).

Pre-training Dataset Download:

Downstream-task Datasets:

Json Files:

  • Use the same JSON files as ALBEF.
  • Change the image paths in the JSON files to match the location of your downloaded images. (In CC3M and SBU, some images can no longer be crawled, so account for these missing images when creating the JSON files.)
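
The path update and missing-image filtering described above could be scripted along the following lines. This is a minimal sketch and not part of the repository: it assumes each annotation file is a JSON list of objects whose `image` field holds the image path (check this against your own files), and the names `remap_annotations` and `remap_file` are hypothetical.

```python
import json
import os


def remap_annotations(entries, old_prefix, new_prefix, exists=os.path.exists):
    """Rewrite image paths and drop entries whose image file is missing.

    Assumption: every entry is a dict with an "image" key holding a path.
    """
    kept = []
    for entry in entries:
        path = entry["image"]
        # Swap the original root directory for the local one.
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        # Skip images that could not be crawled (common for CC3M and SBU).
        if not exists(path):
            continue
        # Copy the entry so the input list is left unmodified.
        kept.append({**entry, "image": path})
    return kept


def remap_file(src_json, dst_json, old_prefix, new_prefix):
    """Load one annotation file, remap it, and write the result."""
    with open(src_json, "r", encoding="utf-8") as f:
        entries = json.load(f)
    kept = remap_annotations(entries, old_prefix, new_prefix)
    with open(dst_json, "w", encoding="utf-8") as f:
        json.dump(kept, f)
    # Return (original count, kept count) so dropped images can be reported.
    return len(entries), len(kept)
```

Calling `remap_file` once per annotation file, with your own source prefix and local image root, yields copies that only reference images present on disk.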

Requirements:

  • pytorch 1.8.0
  • transformers 4.8.1
  • timm 0.4.9
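
To confirm that an environment matches the versions listed above, a quick check such as the following may help. It is only a sketch: it assumes Python 3.8 or later (for `importlib.metadata`) and that PyTorch is installed under the pip distribution name `torch`; the helper names are hypothetical.

```python
from importlib import metadata

# Versions listed in the requirements above; "torch" is the pip
# distribution name for PyTorch.
REQUIRED = {"torch": "1.8.0", "transformers": "4.8.1", "timm": "0.4.9"}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return {package: (wanted, found)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_requirements().items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means all three packages match the listed versions.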

Pre-training:

Pretrain.py will be uploaded soon.

  1. Pre-train the model for MAFA using 4 A100 GPUs (assuming a pre-trained filter model already exists):
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Pretrain.py --config ./configs/Pretrain.yaml --output_dir output/Pretrain/  

Downstream tasks:

  1. IRTR (MS-COCO) using 4 A100 GPUs:
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Retrieval.py --config ./configs/Retrieval_coco.yaml --output_dir output/Retrieval_coco/ --checkpoint [Pretrained checkpoint] --filter_config ./configs/filter.yaml --filter_checkpoint [Pretrained filter checkpoint]
  2. IRTR (Flickr) using 4 A100 GPUs:
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Retrieval.py --config ./configs/Retrieval_flickr.yaml --output_dir output/Retrieval_flickr/ --checkpoint [Pretrained checkpoint]
  3. NLVR using 4 A100 GPUs (first the NLVR-specific pre-training step, then fine-tuning from its checkpoint):
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env Pretrain_nlvr.py --config ./configs/NLVR_pretrain.yaml --output_dir output/NLVR_pretrain/ --checkpoint [Pretrained checkpoint]
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env NLVR.py --config ./configs/NLVR.yaml --output_dir output/NLVR/ --checkpoint [NLVR-Pretrained checkpoint]
  4. VQA using 4 A100 GPUs:
python3 -m torch.distributed.launch --nproc_per_node=4 --use_env VQA.py --config ./configs/VQA.yaml --output_dir output/vqa/ --checkpoint [Pretrained checkpoint]
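
The downstream commands above share the same launcher prefix and differ only in the script, config, output directory, and checkpoint flags. If you want to script several runs or use a different number of GPUs, a small helper like the one below can assemble them. It is a convenience sketch only: `build_launch_command` is a hypothetical name, the flags are the ones shown above, and the checkpoint path is a placeholder. Changing the GPU count may also change the effective batch size, so the results may not match a 4-GPU run.

```python
import shlex


def build_launch_command(script, config, output_dir, nproc=4, **extra):
    """Assemble a torch.distributed.launch command line as a list of args."""
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc}", "--use_env",
        script, "--config", config, "--output_dir", output_dir,
    ]
    # Extra keyword arguments become "--name value" pairs, e.g. checkpoint.
    for name, value in extra.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_launch_command(
        "VQA.py", "./configs/VQA.yaml", "output/vqa/",
        checkpoint="path/to/pretrained.pth",  # placeholder path
    )
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The returned list can also be passed directly to `subprocess.run` if you prefer to launch the job from Python.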

Check examples from ECM

You can inspect the false-negative examples obtained from ECM in Get_ECM_example.ipynb. Additional examples are available in FN_examples.

If you have any questions or run into problems with this code, please email wotjr3868@snu.ac.kr or dohoon.kim@snu.ac.kr. Thank you!

Acknowledgement:

Our implementation is largely borrowed from GRIT-VLP, since our method is mainly built upon it.