Attribute Phrases

December 20, 2017 ยท View on GitHub

This is the dataset and the training code with Tensorflow used in the paper:

Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji, "Reasoning about Fine-grained Attribute Phrases using Reference Games", International Conference on Computer Vision (ICCV), 2017

@inproceedings{su2017reasoning,
    Author = {Jong-Chyi Su and Chenyun Wu and Huaizu Jiang and Subhransu Maji},
    Title = {Reasoning about Fine-grained Attribute Phrases using Reference Games},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2017}
}

[Project page] [Paper]

Dataset

Each pair has 1 pair of images and 5 pairs of corresponding attribute phrases

Image 1Image 2
commercial planevsprivate plane
large planevssmall plane
white and greyvswhite with blue and red stripes
twin enginesvssingle engine
more windows on bodyvsless windows on body

Stats about the dataset

  • Training set: 4700 pairs
  • Val set: 2350 pairs
  • Test set: 2350 pairs

Requirements

  • Python 2.7
  • Tensorflow v1.0+

Download Dataset

  • User descriptions are included in dataset/visdiff\_SET.json, where SET={train, val, test, trainval}
  • Download images from OID dataset (http://www.robots.ox.ac.uk/~vgg/data/oid)
  • Move images from oid-aircraft-beta-1/data/images/aeroplane/\*.jpg to the folder dataset/images/\*.jpg

Download ImageNet Pre-trained Model

Add pretrained model (e.g. vgg_16.ckpt) in models/checkpoints/

Extract image feature to numpy file to accelerate training

Go to utils/ and run: python get_feature.py --dataset train the numpy file will be saved in img_feat/vgg_16/train.npy

Train Listener Model

Step 1 fix image feature Step 2 finetune image feature

SL (Simple Listener)

  1. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 0 --max_steps 2000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 1 --max_steps 7500 --load_model_path model-fixed-2000 --learn_rate 0.00001

SLr (Simple Listener trained w/o contrastive data)

  1. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 0 --max_steps 5000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 1 --max_steps 10000 --load_model_path model-fixed-5000 --learn_rate 0.00001

DL (Discerning Listener)

  1. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 0 --max_steps 2000 --max_sent_length 17 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 1 --max_steps 7000 --load_model_path model-fixed-2000 --max_sent_length 17 --learn_rate 0.00001

Evaluate Listener Model

SL

  1. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 1 --load_model_path model-finetune-7500 --dataset val

SLr

  1. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 0 --load_model_path model-fixed-5000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 1 --load_model_path model-finetune-10000 --dataset val

DL

  1. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 1 --load_model_path model-finetune-7000 --dataset val

Train Speaker Model

  • Example: python train_speaker.py --speaker_mode=S --img_model=vgg_16 --train_img_model=1 --experiment_path=result/speaker/temp
  • Options:
    • --speaker_mode: S or DS
    • --img_model: alexnet, inception_v3, or vgg_16
    • --train_img_model: Fine-tune image model or not (0 as False, 1 as True)
    • --experiment_path: where to output and save the trained model
    • --load_model_dir: path to the pre-trained model. If not set, train from scratch
    • --load_model_name: model name (model-%steps) in load_model_dir
    • See more options in train_speaker.py

Use Speaker to Generate Attribute Phrases

  • Example: python inference_pairwise.py --input_path=result/speaker/temp --model_step=model-5000 --dataset_name=val
  • Options:
    • --input_path: path to the trained speaker model that you want to use
    • --model_step: model name (model-%steps) in input_path
    • --dataset_name: which sub-dataset to use (train / val / test)
    • See more options in inference_pairwise.py

Discerning Speaker Model

Here we use the listener model to re-rank attribute phrases generated by speaker model. To run this step, you need to have a listenter model, and generated phrases from a speaker model.

  • Example: pyhton rerank.py --listener_path=result/SL --listener_model=model-fixed-2000 --speaker_result_path=result/speaker/temp/infer_annotations_val_model-5000_case0_beam10_sent10.json --infer_dataset=val
  • Options:
    • --listener_path: path to the listener model used for reranking
    • --listener_model: model name (model-%steps) in listener_model
    • --speaker_result_path: the file that saves the phrases generated by a speaker model
    • --infer_dataset: which dataset to work on (train / val / test)
    • See more options in rerank.py

Generate Set-wise Attribute Phrases

  • In "inference_setwise.py", set "speaker_path" as the path to the trained speaker model you want to use
  • run python inference_setwise.py

Authors

Please contact jcsu@cs.umass.edu if you have any question.

  • Jong-Chyi Su (Umass-Amherst)
  • Chenyun Wu (Umass-Amherst)
  • Huaizu Jiang (Umass-Amherst)