Recurrent Fusion Network for Image Captioning

October 23, 2018

This repository contains the implementation of Recurrent Fusion Network for Image Captioning.

Requirements

  • Python 3.6
  • PyTorch 0.3.1
  • Java

Training

0. Feature extraction

All scripts for feature extraction are included in data/feature_extraction. To perform data augmentation, first generate flipped and cropped images, then download the pre-trained models and extract features with the scripts. All extracted features should be placed in the data directory.
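
As a rough illustration of the augmentation step, the sketch below enumerates flipped and cropped variants of one image as (flip, crop box) pairs. The crop layout (four corners plus center), the 224x224 crop size, and the names `crop_boxes` and `augmentation_variants` are assumptions made for this example; the scripts in data/feature_extraction define the actual procedure.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def crop_boxes(width: int, height: int, crop_w: int, crop_h: int) -> List[Box]:
    """Return four corner crops plus a center crop (an assumed layout)."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop size must not exceed image size")
    max_x, max_y = width - crop_w, height - crop_h
    origins = [
        (0, 0),                    # top-left
        (max_x, 0),                # top-right
        (0, max_y),                # bottom-left
        (max_x, max_y),            # bottom-right
        (max_x // 2, max_y // 2),  # center
    ]
    return [(x, y, x + crop_w, y + crop_h) for x, y in origins]


def augmentation_variants(width: int, height: int, crop_w: int, crop_h: int):
    """Pair every crop box with an unflipped and a flipped version."""
    boxes = crop_boxes(width, height, crop_w, crop_h)
    return [(flip, box) for flip in (False, True) for box in boxes]


# Hypothetical sizes: a 256x256 image cropped to 224x224.
variants = augmentation_variants(256, 256, 224, 224)
```

Each box follows the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so the pairs can drive whatever image library does the actual cropping and flipping.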

1. Train with cross entropy loss

bash train_recurrent_fusion_model.sh
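
For readers unfamiliar with the objective, the following is a minimal pure-Python sketch of a per-caption cross entropy loss: the mean negative log-likelihood of the ground-truth tokens, with padding positions skipped. The function name `caption_cross_entropy` and the padding id of 0 are assumptions for illustration; the training script itself works on PyTorch tensors and may differ in detail.

```python
import math


def caption_cross_entropy(step_probs, target_ids, pad_id=0):
    """Mean negative log-likelihood of the target tokens, skipping padding.

    step_probs[t] is the predicted distribution over the vocabulary at step t;
    target_ids[t] is the ground-truth token id at step t.
    pad_id=0 is an assumed convention, not taken from the repository.
    """
    total, count = 0.0, 0
    for probs, target in zip(step_probs, target_ids):
        if target == pad_id:
            continue  # padded positions do not contribute to the loss
        total += -math.log(probs[target])
        count += 1
    return total / max(count, 1)


# Toy example: vocabulary of 3 tokens, caption of 2 words plus one pad step.
step_probs = [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7], [1.0, 0.0, 0.0]]
loss = caption_cross_entropy(step_probs, target_ids=[1, 2, 0])
```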

2. Train with reinforcement learning

bash train_recurrent_fusion_model_rl.sh
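
The general shape of a policy-gradient objective for captioning can be sketched as below: a sampled caption's log-probability is weighted by how much its reward exceeds a baseline. This README does not state which reward or baseline the script uses, so `reward` and `baseline` are left as plain inputs and `policy_gradient_loss` is a hypothetical name. One common choice, self-critical sequence training, takes the reward of the greedily decoded caption as the baseline.

```python
def policy_gradient_loss(sample_logprobs, reward, baseline):
    """REINFORCE-style surrogate loss for one sampled caption.

    sample_logprobs: log-probabilities of the sampled tokens.
    reward, baseline: scalar scores; their definitions are assumptions here.
    """
    advantage = reward - baseline
    # Minimizing this raises the likelihood of captions that beat the baseline
    # and lowers it for captions that fall short.
    return -advantage * sum(sample_logprobs)


# Toy numbers: the sampled caption scores 1.2 against a baseline of 0.8.
loss = policy_gradient_loss([-0.5, -1.0, -0.5], reward=1.2, baseline=0.8)
```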

Evaluation

Run eval_single.sh to obtain metric scores for a single model, or eval_ensemble.sh to obtain them for an ensemble of multiple models.
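
One simple way to ensemble caption models is to average their next-token distributions at each decoding step. The sketch below shows that idea for a single step. Whether eval_ensemble.sh combines models this way is an assumption, and `ensemble_next_token` is a hypothetical helper rather than a function from this repository.

```python
def ensemble_next_token(model_probs):
    """Average per-model distributions; return (best_token_id, averaged_probs).

    model_probs[m][i] is model m's probability for vocabulary item i.
    Plain averaging is an assumed combination rule.
    """
    n_models = len(model_probs)
    vocab_size = len(model_probs[0])
    avg = [sum(p[i] for p in model_probs) / n_models for i in range(vocab_size)]
    best = max(range(vocab_size), key=avg.__getitem__)
    return best, avg


# Three toy models over a 3-token vocabulary; the first model alone would
# pick token 0, but the ensemble picks token 1.
token, avg = ensemble_next_token(
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
)
```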

Reference

If you find this repo useful, please consider citing:

@InProceedings{Jiang_2018_ECCV,
author = {Jiang, Wenhao and Ma, Lin and Jiang, Yu-Gang and Liu, Wei and Zhang, Tong},
title = {Recurrent Fusion Network for Image Captioning},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Acknowledgements

Our code is based on Ruotian Luo's implementation and was reorganized by Zhiming Ma.