May 13, 2022

A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

Paper (ACL 2022)

This repository contains the implementation of FewVLM described in the paper. The code is based on VL-T5.

Installation

pip install -r requirements.txt
python -c "import language_evaluation; language_evaluation.download('coco')"

Datasets

  • Datasets can be downloaded from Google Drive
  • For other datasets, we used the versions from the VL-T5 repository. Please refer to the VL-T5 repository for download instructions.

Pre-trained checkpoints

  • Pre-trained checkpoints are released in two sizes: base and large

Pre-training

# Pre-train with 8 GPUs
bash scripts/pretrain.sh 8 

Zero/few-shot Learning

All commands are runnable on a single GPU.

VQA

# for few-shot
bash scripts/VQA.sh 0 VQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/VQA.sh 0 VQA --test_only --prompt 3
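
Few-shot runs depend on which examples are drawn, so it can be useful to repeat the same command over several values of --dataseed. The sketch below only builds and prints the command lines; it does not run them. The helper name `few_shot_command` is hypothetical and not part of this repository, and the two positional arguments ("0" and "VQA") are copied verbatim from the few-shot command above without assuming what they mean.

```python
import shlex


def few_shot_command(script, positional, dataseed, num_data=16, extra=()):
    """Build one few-shot command line as a list of arguments.

    Hypothetical helper: it only assembles flags that appear in this README.
    """
    return [
        "bash", script, *positional,
        "--subsample",
        "--dataseed", str(dataseed),
        "--num_data", str(num_data),
        *extra,
    ]


# Sweep over example data seeds; every other flag matches the VQA command above.
commands = [
    few_shot_command(
        "scripts/VQA.sh", ("0", "VQA"), seed,
        extra=("--test_only", "--prompt", "3"),
    )
    for seed in (0, 42, 9595)
]

for cmd in commands:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell-quoting issues if it is later passed to `subprocess.run`.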

OKVQA

# for few-shot
bash scripts/OKVQA.sh 0 OKVQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/OKVQA.sh 0 OKVQA --test_only --prompt 3

GQA

# for few-shot
bash scripts/GQA.sh 0 GQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/GQA.sh 0 GQA --test_only --prompt 3

Flickr30k

# for few-shot
bash scripts/flickr.sh 0 flickr --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/flickr.sh 0 flickr --prefix image --test_only 

Nocaps

# for few-shot
bash scripts/nocaps.sh 0 nocaps --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/nocaps.sh 0 nocaps --prefix image --test_only 
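
The few-shot commands above combine --subsample, --dataseed, and --num_data. As a rough illustration of what seeded subsampling means, the sketch below draws a fixed-size subset that is reproducible for a given seed. This is an assumption about the general technique, not the repository's actual sampling code; the function name `subsample` and the toy `pool` are hypothetical.

```python
import random


def subsample(examples, num_data=16, dataseed=42):
    """Return num_data examples chosen reproducibly from examples."""
    rng = random.Random(dataseed)  # local RNG, leaves the global random state alone
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    return [examples[i] for i in indices[:num_data]]


# Toy stand-in for a training set.
pool = [f"example_{i}" for i in range(1000)]

train_a = subsample(pool, num_data=16, dataseed=42)
train_b = subsample(pool, num_data=16, dataseed=42)

# The same seed yields the same subset, which keeps few-shot runs comparable.
print(train_a == train_b)
```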

The most important command-line arguments are listed below:

| Arg | Values | Description | Notes |
|---|---|---|---|
| --load | path for trained checkpoints | load a checkpoint | |
| --dataseed | {0, 42, 9595, ...} | Random seed for data shuffling | default=42 |
| --seed | {0, 42, 9595, ...} | Random seed for parameter shuffling | default=9595 |
| --subsample | store_true | Subsample train and val sets for few-shot learning | |
| --num_data | {16, 40, ...} | Number of subsamples for train and val sets | default=16 |
| --test_only | store_true | Run test without training | |
| --prompt | {0, 1, 2, 3} | Prompts for VQA | default=0, 0: no prompt, 1: '[Q] <text_1>', 2: 'question: [Q] answer:', 3: 'question: [Q] answer: <text_1>' |
| --prefix | {None, 'image', 'picture', 'photo'} | Prompts for captioning | default=None, 'image': 'an image of', 'picture': 'a picture of', 'photo': 'a photo of' |
| --backbone | {'t5-base', 't5-large'} | Backbone architecture | default='t5-base' |
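
To make the --prompt and --prefix options concrete, the sketch below spells out the templates listed above as plain string formatting, with [Q] standing for the question text. The names `VQA_PROMPTS`, `CAPTION_PREFIXES`, `build_vqa_input`, and `caption_prefix` are hypothetical and only illustrate the mapping; the repository's own code may build its inputs differently.

```python
# Templates copied from the --prompt row; "{q}" stands in for [Q].
VQA_PROMPTS = {
    0: "{q}",                              # no prompt
    1: "{q} <text_1>",
    2: "question: {q} answer:",
    3: "question: {q} answer: <text_1>",
}

# Prefixes copied from the --prefix row; None means no prefix.
CAPTION_PREFIXES = {
    None: "",
    "image": "an image of",
    "picture": "a picture of",
    "photo": "a photo of",
}


def build_vqa_input(question, prompt=0):
    """Wrap a question in the template selected by the prompt id."""
    return VQA_PROMPTS[prompt].format(q=question)


def caption_prefix(prefix=None):
    """Return the text prefix selected for captioning."""
    return CAPTION_PREFIXES[prefix]


print(build_vqa_input("what color is the car?", prompt=3))
print(caption_prefix("image"))
```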