Meta-Module-Network
May 13, 2021
Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"
Data Downloading
Download all the question files, scene graph files, and bottom-up features from the web server. This can take up to 300G of disk space.
bash get_data.sh
This script downloads the questions/ folder. In the paper, "trainval_all_programs.json" is used for bootstrapping and "trainval_unbiased_programs.json" is used for finetuning. Both "trainval_unbiased_programs.json" and "testdev_pred_programs.json" are generated by the program generator model.
Meta Module Network Implementation
For more details on the implementation of MMN, please refer to the README.
Description of different files
- sceneGraphs/trainval_bounding_box.json: the scene graph provided by the original GQA dataset
{
  imageId: {
    bounding_box_id: {
      x: number,
      y: number,
      w: number,
      h: number,
      relations: [{object: "bounding_box_id", name: "relation_name"} ... ],
      name: object_class,
      attributes: [attr1, attr2, ... ]
    },
    bounding_box_id: { ... },
  }
}
- questions: the question-program pairs and their associated images.
[ [ "ImageId", "Question", "Programs": [f1, f2, ..., fn], "QuestionId", "Answer" ] ]
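The two structures above could be traversed in Python roughly as follows. This is only a minimal sketch: the helper names `load_json`, `iter_objects` and `unpack_question` are hypothetical and not part of the repository, the sample values are made up, and it assumes each question entry is a five-element list in the order shown above.

```python
import json


def load_json(path):
    """Load one of the JSON files described above (hypothetical helper)."""
    with open(path, "r") as f:
        return json.load(f)


def iter_objects(scene_graphs):
    """Yield (image_id, box_id, obj) for every bounding box in the scene graphs."""
    for image_id, boxes in scene_graphs.items():
        for box_id, obj in boxes.items():
            yield image_id, box_id, obj


def unpack_question(entry):
    """Name the fields of one question entry (assumed positional order)."""
    image_id, question, programs, question_id, answer = entry
    return {
        "image_id": image_id,
        "question": question,
        "programs": programs,
        "question_id": question_id,
        "answer": answer,
    }


# Tiny in-memory example following the schema above (made-up values).
scene_graphs = {
    "img1": {
        "b1": {"x": 0, "y": 0, "w": 10, "h": 20,
               "relations": [{"object": "b2", "name": "to the left of"}],
               "name": "cat", "attributes": ["black"]},
        "b2": {"x": 15, "y": 0, "w": 10, "h": 20,
               "relations": [], "name": "dog", "attributes": []},
    }
}
questions = [["img1", "What is left of the dog?", ["f1", "f2"], "q1", "cat"]]

# Resolve each relation to the object it points at within the same image.
for image_id, box_id, obj in iter_objects(scene_graphs):
    for rel in obj["relations"]:
        target = scene_graphs[image_id][rel["object"]]
        print(obj["name"], rel["name"], target["name"])
```

For the real files, `load_json("sceneGraphs/trainval_bounding_box.json")` would take the place of the in-memory dictionary.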
Data Preprocessing [Optional]:
If you want to know how the programs and training data are generated, follow the steps below:
Preprocessing Question-Program Pairs:
Download the questions from the original GQA website and put them in the parent folder '../gqa-questions/'. The following steps convert the questions into program format:
- preprocess the trainval_all_question into trainval_all_programs.json
python preprocess.py trainval_all
- preprocess the "balanced" programs into different forms:
python preprocess.py create_balanced_programs
- convert the programs in trainval_all_programs.json into the "input" form:
python preprocess.py create_all_inputs
- convert the programs in *balanced.json into the "input" form:
python preprocess.py create_inputs
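The four preprocessing steps above can also be chained from a small driver script. The following is a sketch only: `PREPROCESS_STEPS`, `build_commands` and `run_all` are hypothetical names, the script assumes it is run from the repository root where preprocess.py lives, and it defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# The four preprocess.py sub-commands, in the order listed above.
PREPROCESS_STEPS = [
    "trainval_all",
    "create_balanced_programs",
    "create_all_inputs",
    "create_inputs",
]


def build_commands(python=sys.executable):
    """Return one argument list per preprocessing step."""
    return [[python, "preprocess.py", step] for step in PREPROCESS_STEPS]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Stopping at the first failure matters here because each step presumably consumes the files written by an earlier one.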
Using NL2Program Model to Predict Test-Dev Programs from input questions:
- Train the sequence-2-sequence model:
python generate_program.py --do_preprocess
- Evaluate the NL2Program model:
python generate_program.py --do_testdev
- Prepare the generated programs for the modular transformer:
python generate_program.py --do_trainval_unbiased
Meta Module Network Training and Evaluation
- Prepare the inputs for the modular transformer:
python preprocess.py create_pred_inputs
- Start the bootstrap training of the modular transformer, or download the pre-trained models directly from Google Drive. The bootstrap process can take quite a long time, so please be patient if you are training on your own:
python run_experiments.py --do_train_all --model TreeSparsePostv2 --id TreeSparsePost2Full --stacking 2 --batch_size 1024
- Start the finetuning on the balanced split:
python run_experiments.py --do_finetune --id FinetuneTreeSparseStack2RemovalFullValSeed6999 --model TreeSparsePostv2 --load_from models/TreeSparsePost2Full --seed 6999 --stacking 2
- Test the model on the testdev split:
python run_experiments.py --do_testdev_pred --id FinetuneTreeSparseStack2RemovalValSeed6777 --load_from [MODEL_NAME] --model TreeSparsePostv2 --stacking 2
Citation
If you find this paper useful, please add the following reference to your paper.
@article{chen2019meta,
title={Meta module network for compositional visual reasoning},
author={Chen, Wenhu and Gan, Zhe and Li, Linjie and Cheng, Yu and Wang, William and Liu, Jingjing},
journal={Proceedings of WACV},
year={2021}
}