Recurrent Multimodal Interaction for Referring Image Segmentation
July 23, 2017
This repository contains code for Recurrent Multimodal Interaction for Referring Image Segmentation, ICCV 2017.
If you use the code, please cite:
@inproceedings{liu2017recurrent,
title={Recurrent Multimodal Interaction for Referring Image Segmentation},
author={Liu, Chenxi and Lin, Zhe and Shen, Xiaohui and Yang, Jimei and Lu, Xin and Yuille, Alan},
booktitle={{ICCV}},
year={2017}
}
Setup
- TensorFlow 1.2.1
- Download or use symlink, such that the MS COCO images are under data/coco/images/train2014/
- Download or use symlink, such that the ReferItGame data are under data/referit/images and data/referit/mask
- Run mkdir external. Download, git clone, or use symlink, such that TF-resnet and TF-deeplab are under external. Then strictly follow the Example Usage section of their README
- Download, git clone, or use symlink, such that refer is under external. Then strictly follow the Setup and Download section of its README. Also put the refer folder in PYTHONPATH
- Download, git clone, or use symlink, such that the MS COCO API is under external (i.e. external/coco/PythonAPI/pycocotools)
- pydensecrf
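Before moving on, it can help to confirm that the directories named in the list above are in place. The script below is a minimal sketch and not part of this repository. It checks only the paths spelled out above, because the exact folder names for TF-resnet, TF-deeplab, and refer under external are not stated here. The names EXPECTED_DIRS and missing_dirs are hypothetical.

```python
import os

# Paths named in the setup list, relative to the repository root.
# Assumption: the script is run from the repository root.
EXPECTED_DIRS = [
    "data/coco/images/train2014",
    "data/referit/images",
    "data/referit/mask",
    "external/coco/PythonAPI/pycocotools",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories found.")
```

os.path.isdir follows symbolic links, so a symlinked directory counts as present.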
Data Preparation
python build_batches.py -d Gref -t train
python build_batches.py -d Gref -t val
python build_batches.py -d unc -t train
python build_batches.py -d unc -t val
python build_batches.py -d unc -t testA
python build_batches.py -d unc -t testB
python build_batches.py -d unc+ -t train
python build_batches.py -d unc+ -t val
python build_batches.py -d unc+ -t testA
python build_batches.py -d unc+ -t testB
python build_batches.py -d referit -t trainval
python build_batches.py -d referit -t test
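The twelve commands above follow a single pattern, so they can also be generated and run from a short script. This is a sketch, not a file shipped with the repository. The dataset and split names are copied from the commands above, while DATASET_SPLITS, build_commands, and run_all are hypothetical helper names.

```python
import subprocess
import sys

# (dataset, splits) pairs, copied from the commands listed above.
DATASET_SPLITS = [
    ("Gref", ["train", "val"]),
    ("unc", ["train", "val", "testA", "testB"]),
    ("unc+", ["train", "val", "testA", "testB"]),
    ("referit", ["trainval", "test"]),
]


def build_commands(python=sys.executable):
    """Return one build_batches.py argument list per (dataset, split) pair."""
    return [
        [python, "build_batches.py", "-d", dataset, "-t", split]
        for dataset, splits in DATASET_SPLITS
        for split in splits
    ]


def run_all():
    """Run every command in order, stopping at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        subprocess.check_call(cmd)
```

Calling run_all() from the repository root runs the same commands in the same order as the list above.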
Training and Testing
Specify several options/flags and then run main.py:
- -g: Which GPU to use. Default is 0.
- -m: train or test. Training mode or testing mode.
- -w: resnet or deeplab. Specify pre-trained weights.
- -n: LSTM or RMI. Model name.
- -d: Gref or unc or unc+ or referit. Specify dataset.
- -t: train or trainval or val or test or testA or testB. Which set to train/test on.
- -i: Number of training iterations in training mode. The iteration number of a snapshot in testing mode.
- -s: Used only in training mode. How many iterations per snapshot.
- -v: Used only in testing mode. Whether to visualize the prediction. Default is False.
- -c: Used only in testing mode. Whether to also apply Dense CRF. Default is False.
For example, to train the ResNet + LSTM model on Google-Ref using GPU 2, run
python main.py -m train -w resnet -n LSTM -d Gref -t train -g 2 -i 750000 -s 50000
To test the 650000-iteration snapshot of the DeepLab + RMI model on UNC testA set using GPU 1 (with visualization and Dense CRF), run
python main.py -m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c
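To see how these flags combine, here is an illustrative argparse declaration that accepts both example commands. It is a hypothetical reconstruction from the option list in this README, not the actual contents of main.py. The dest names, the integer types, and the choice to leave unstated defaults unset are all assumptions.

```python
import argparse


def make_parser():
    """Illustrative flag layout; the real main.py may declare these differently."""
    p = argparse.ArgumentParser(description="Sketch of the main.py flags")
    # Defaults are given only where this README states them (-g, -v, -c).
    p.add_argument("-g", dest="gpu", type=int, default=0,
                   help="which GPU to use")
    p.add_argument("-m", dest="mode", choices=["train", "test"],
                   help="training mode or testing mode")
    p.add_argument("-w", dest="weights", choices=["resnet", "deeplab"],
                   help="pre-trained weights")
    p.add_argument("-n", dest="model", choices=["LSTM", "RMI"],
                   help="model name")
    p.add_argument("-d", dest="dataset",
                   choices=["Gref", "unc", "unc+", "referit"],
                   help="dataset")
    p.add_argument("-t", dest="split",
                   choices=["train", "trainval", "val",
                            "test", "testA", "testB"],
                   help="which set to train/test on")
    p.add_argument("-i", dest="iters", type=int,
                   help="training iterations, or snapshot iteration when testing")
    p.add_argument("-s", dest="snapshot", type=int,
                   help="iterations per snapshot (training only)")
    p.add_argument("-v", dest="visualize", action="store_true",
                   help="visualize the prediction (testing only)")
    p.add_argument("-c", dest="crf", action="store_true",
                   help="also apply Dense CRF (testing only)")
    return p


# Parse the flags of the testing example above.
args = make_parser().parse_args(
    "-m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c".split())
print(args.mode, args.model, args.dataset, args.split, args.gpu)
```

Because -v and -c are declared as store_true switches here, they are False unless given, which matches the defaults listed above.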
Miscellaneous
Code and data under util/ and data/referit/ are borrowed from text_objseg and slightly modified for compatibility with TensorFlow 1.2.1.
TODO
Add TensorBoard support.