Recurrent Fusion Network for Image Captioning

October 23, 2018

This repository contains the implementation of Recurrent Fusion Network for Image Captioning.

Training

0. Feature extraction

All scripts for feature extraction are in data/feature_extraction. To perform data augmentation, first generate flipped and cropped images, then download the pre-trained models and extract features with the scripts. Put all extracted features in the data directory.
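The flip-and-crop augmentation mentioned above can be sketched roughly as follows. This is an illustration only, not the code in data/feature_extraction: the function names, the four-corner crop layout, and the crop fraction are all assumptions, and images are represented as plain nested lists so the example runs without an image library.

```python
def horizontal_flip(image):
    """Mirror an image left to right. `image` is a list of pixel rows."""
    return [row[::-1] for row in image]


def corner_crops(image, fraction=0.9):
    """Return four corner crops, each covering `fraction` of height and width.

    The corner layout and the 0.9 default are illustrative assumptions,
    not values taken from this repository.
    """
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height * fraction))
    crop_w = max(1, int(width * fraction))
    tops = (0, height - crop_h)
    lefts = (0, width - crop_w)
    # Order: top-left, top-right, bottom-left, bottom-right.
    return [
        [row[left:left + crop_w] for row in image[top:top + crop_h]]
        for top in tops
        for left in lefts
    ]


def augment(image, fraction=0.9):
    """Return the original, its mirror image, and the four corner crops."""
    return [image, horizontal_flip(image)] + corner_crops(image, fraction)


# Toy 4x5 "image" whose pixel values encode (row, column).
toy = [[10 * r + c for c in range(5)] for r in range(4)]
views = augment(toy, fraction=0.75)
```

In a real pipeline each augmented view would be saved as an image and passed through the feature extractor separately, so every training image yields several feature sets.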

1. Train with cross entropy loss

bash train_recurrent_fusion_model.sh
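For readers unfamiliar with the objective, the cross entropy loss for a caption is typically the mean negative log-likelihood of the reference words, with padding positions skipped. The sketch below is a framework-free illustration of that idea; the function names and the choice of `pad_id=0` are assumptions and may not match what the training script does.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def caption_cross_entropy(step_logits, target_ids, pad_id=0):
    """Mean negative log-likelihood of the reference words, skipping padding.

    step_logits: one list of vocabulary scores per time step.
    target_ids:  the reference word id at each time step.
    pad_id:      id treated as padding (assumed to be 0 here).
    """
    total, count = 0.0, 0
    for logits, target in zip(step_logits, target_ids):
        if target == pad_id:
            continue  # padded positions do not contribute to the loss
        probs = softmax(logits)
        total -= math.log(probs[target])
        count += 1
    return total / max(count, 1)


# With uniform scores over a 4-word vocabulary, each real word costs log(4).
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
loss = caption_cross_entropy(uniform, target_ids=[2, 1, 0])

# Scores that favour the reference words give a lower loss.
confident = [[0.0, 0.0, 5.0, 0.0], [0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
lower_loss = caption_cross_entropy(confident, target_ids=[2, 1, 0])
```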

2. Train with reinforcement learning

bash train_recurrent_fusion_model_rl.sh
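A common form of reinforcement learning for captioning is self-critical sequence training, where a sampled caption is rewarded relative to the greedily decoded caption. Whether the script above uses exactly this formulation is an assumption; the sketch below only illustrates the general idea, and the function name and inputs are hypothetical.

```python
def self_critical_loss(sample_log_probs, sample_reward, baseline_reward):
    """REINFORCE-style loss for one sampled caption with a greedy baseline.

    sample_log_probs: log-probability of each sampled word under the model.
    sample_reward:    metric score (for example CIDEr) of the sampled caption.
    baseline_reward:  metric score of the greedily decoded caption.
    """
    advantage = sample_reward - baseline_reward
    # Minimizing this value raises the likelihood of samples that beat the
    # baseline and lowers the likelihood of samples that fall short of it.
    return -advantage * sum(sample_log_probs)


log_probs = [-0.5, -1.0, -0.5]

# The sampled caption scores higher than the greedy one: positive advantage.
better = self_critical_loss(log_probs, sample_reward=1.2, baseline_reward=0.9)

# The sampled caption scores lower than the greedy one: negative advantage.
worse = self_critical_loss(log_probs, sample_reward=0.7, baseline_reward=0.9)
```

Using the model's own greedy output as the baseline avoids training a separate value estimator, which is one reason this variant is popular in captioning work.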

Evaluation

Run eval_single.sh to obtain metric scores for a single model, and eval_ensemble.sh to obtain them for an ensemble of multiple models.
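One simple way to ensemble captioning models is to average their next-word probability distributions at each decoding step. The sketch below illustrates that approach under the assumption that all models share one vocabulary; it is not taken from eval_ensemble.sh, and `ensemble_step` is a hypothetical name.

```python
def ensemble_step(distributions):
    """Average per-model next-word distributions and pick the best word id.

    distributions: one probability list per model, all over the same vocabulary.
    """
    vocab_size = len(distributions[0])
    averaged = [
        sum(dist[i] for dist in distributions) / len(distributions)
        for i in range(vocab_size)
    ]
    best_id = max(range(vocab_size), key=averaged.__getitem__)
    return best_id, averaged


# Three toy models over a 3-word vocabulary. The first model alone would
# pick word 0, but the averaged distribution prefers word 1.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.5, 0.3]
model_c = [0.1, 0.6, 0.3]
best_id, averaged = ensemble_step([model_a, model_b, model_c])
```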

Reference

If you find this repo useful, please consider citing:

@InProceedings{Jiang_2018_ECCV,
author = {Jiang, Wenhao and Ma, Lin and Jiang, Yu-Gang and Liu, Wei and Zhang, Tong},
title = {Recurrent Fusion Network for Image Captioning},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Acknowledgements

Our code is based on Ruotian Luo's implementation and was reorganized by Zhiming Ma.
