Introduction
September 24, 2019
This is the source code of Learning Fragment Self-Attention Embeddings for Image-Text Matching, ACM MM 2019.
Requirements
- Python 3.6
- PyTorch 0.4.1
Download data
We use the precomputed image features provided by SCAN. Please download data.zip from SCAN.
BERT model
We use the BERT code from BERT-pytorch. Please follow the instructions here to convert the Google BERT model to a PyTorch save file.
Training
python train.py --data_path /path/to/data --data_name f30k_precomp --bert_path /path/to/uncased_L-12_H-768_A-12/
python train.py --data_path /path/to/data --data_name coco_precomp --bert_path /path/to/uncased_L-12_H-768_A-12/
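The two commands above differ only in --data_name, so a small helper can generate both from one pair of paths. The following is a minimal sketch, not part of this repository: the function names build_train_command and format_command are hypothetical, and only the flags shown in the commands above are used. It avoids shlex.join so that it stays compatible with Python 3.6.

```python
import shlex

# Dataset names taken from the training commands above.
DATASETS = ("f30k_precomp", "coco_precomp")


def build_train_command(data_path, data_name, bert_path):
    """Return the argument list for one train.py run (hypothetical helper)."""
    if data_name not in DATASETS:
        raise ValueError("unknown data_name: {}".format(data_name))
    return [
        "python", "train.py",
        "--data_path", data_path,
        "--data_name", data_name,
        "--bert_path", bert_path,
    ]


def format_command(args):
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Placeholder paths, as in the commands above; replace with real ones.
    data_path = "/path/to/data"
    bert_path = "/path/to/uncased_L-12_H-768_A-12/"
    for name in DATASETS:
        print(format_command(build_train_command(data_path, name, bert_path)))
```

To launch a run instead of printing it, the argument list can be passed to subprocess.run(cmd, check=True), assuming train.py is in the current working directory.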