Learning from Children: Improving Image-Caption Pretraining via Curriculum
June 11, 2023 ยท View on GitHub
This repository provides the official implementation of our ACL 2023 Findings paper titled, Learning from Children: Improving Image-Caption Pretraining via Curriculum. The code is built on top of Open Vocabulary Object Detection. We appreciate the work of the authors in this valuable project.

Installation
Create environment and set up data as instructed in this repo; with this exception -- install PyTorch version 1.2.0 instead of PyTorch 1.0 nightly.
Generating Curriculum Data
Generate curriculum data by running this notebook.
Do image-caption pretraining on COCO captions dataset via curriculum
For 4 gpus:
python -m torch.distributed.launch --nproc_per_node=4 --master_port 6254 tools/train_net.py --skip-test --config-file configs/mmss_rcnn_v01_4x_cur_rs.yaml OUTPUT_DIR runs/
Train baseline for the same config
Run the above command with the following changes in configs/mmss_rcnn_v01_4x_cur_rs.yaml:
- CURRICULUM.DO = False
- Comment out MODEL.MMSS_HEAD.GROUNDING.ALIGNMENT_CURRICULUM
Evaluation
You can evaluate using a similar command as above, by running tools/test_net.py.
License
This repository is released under the MIT license. See LICENSE for additional details.