March 28, 2023

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

CoCoCON Dataset

We provide the CoCoCON evaluation dataset, consisting of 1,500 samples, at ./data/cococon.json. Each sample contains 1 to 5 contrast sets. See the paper for details and examples.
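The snippet below is a minimal sketch for loading the file and checking its size. It assumes only that the top level of cococon.json is a JSON list or object of samples. The per-sample field names are not documented in this README, so the code does not rely on them. load_cococon and count_samples are hypothetical helper names.

```python
import json
from typing import Any


def load_cococon(path: str = "./data/cococon.json") -> Any:
    """Load the CoCoCON annotations (default path taken from this README)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_samples(data: Any) -> int:
    """Count top-level entries in the loaded JSON.

    Assumption: the top level is either a list of samples or a dict keyed
    by sample id; the actual schema is not described in this README.
    """
    if isinstance(data, (list, dict)):
        return len(data)
    raise TypeError(f"unexpected top-level JSON type: {type(data).__name__}")
```

If the assumption about the top-level layout holds, count_samples(load_cococon()) should report the 1,500 samples described above.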

Inference using Unified-IO Models

We evaluate the pretrained Unified-IO checkpoints released by the original authors on CoCoCON.
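At a high level, a likelihood-based consistency check compares how a model scores a ground-truth output against a contrast-set output in two different tasks. The sketch below shows one simplified agreement measure under stated assumptions. ContrastScores, prefers_ground_truth, and agreement_rate are hypothetical names, and the log-likelihoods are assumed to be precomputed. This is not necessarily the exact metric that evaluate_cococon.sh or the paper computes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ContrastScores:
    """Hypothetical precomputed log-likelihoods for one contrast set,
    scored under two tasks (A and B)."""
    gt_task_a: float
    contrast_task_a: float
    gt_task_b: float
    contrast_task_b: float


def prefers_ground_truth(gt_loglik: float, contrast_loglik: float) -> bool:
    """True if the model assigns higher likelihood to the ground truth."""
    return gt_loglik > contrast_loglik


def agreement_rate(scores: Sequence[ContrastScores]) -> float:
    """Fraction of contrast sets where both tasks make the same choice
    (both prefer the ground truth, or both prefer the contrast output).
    A simplified illustration, not the official CoCoCON metric."""
    if not scores:
        raise ValueError("need at least one contrast set")
    agree = sum(
        prefers_ground_truth(s.gt_task_a, s.contrast_task_a)
        == prefers_ground_truth(s.gt_task_b, s.contrast_task_b)
        for s in scores
    )
    return agree / len(scores)
```

A model can score well on each task in isolation and still get a low agreement rate, which is the kind of cross-task inconsistency this measure is meant to surface.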

1. Change to the unified-io directory and follow the instructions in the original Unified-IO repository to create the JAX environment.

    cd unified-io

2. Download the pretrained Unified-IO checkpoints and save them in ./checkpoints/.
3. To run the likelihood-based evaluation of cross-task consistency on CoCoCON, execute the command below. <size> is one of small, base, large, or xl, and <path-to-image-directory> is the path to the MS-COCO validation split (val2014) images. Output files are saved to ./results/ by default.

    bash evaluate_cococon.sh <size> <path-to-image-directory>

4. To generate predictions for the samples in CoCoCON, execute:

    bash evaluate_tasks.sh <size> <path-to-image-directory>

5. To compute task-specific accuracies from the prediction files generated by evaluate_tasks.sh in step 4, follow the instructions under Evaluation of COCO Tasks below.
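To evaluate every model size in one pass, the two shell commands above can be wrapped in a small Python driver. This is a hypothetical convenience sketch, not part of the repository. build_commands and run_all are illustrative names. The sketch assumes it is launched from inside the unified-io directory with bash available, and by default it only prints the commands (dry_run=True).

```python
import subprocess
from typing import List, Sequence

# Model sizes accepted by the evaluation scripts, as listed in this README.
SIZES = ("small", "base", "large", "xl")


def build_commands(size: str, image_dir: str) -> List[List[str]]:
    """Return the consistency and prediction commands for one model size."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return [
        ["bash", "evaluate_cococon.sh", size, image_dir],  # likelihood-based consistency
        ["bash", "evaluate_tasks.sh", size, image_dir],    # task predictions
    ]


def run_all(
    image_dir: str,
    sizes: Sequence[str] = SIZES,
    repo_dir: str = ".",
    dry_run: bool = True,
) -> None:
    """Run (or just print) both evaluation scripts for each requested size."""
    for size in sizes:
        for cmd in build_commands(size, image_dir):
            if dry_run:
                print(" ".join(cmd))
            else:
                # check=True raises CalledProcessError on the first non-zero exit
                subprocess.run(cmd, check=True, cwd=repo_dir)
```

Calling run_all with your val2014 path and dry_run=False would run both scripts for all four sizes in sequence. Because of check=True, it stops at the first script that fails instead of silently continuing.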

Training and Inference using OFA Models

We first fine-tune pretrained OFA checkpoints on the four tasks in CoCoCON and then evaluate them on CoCoCON. Instructions for training OFA models are coming soon.

Evaluation of COCO Tasks

Change to the evaluators directory:

    cd evaluators/

Image Captioning

1. Install the packages required for COCO caption evaluation:

    pip install -r requirements.txt

2. Run the following command on an output file from Unified-IO or OFA:

    python coco_eval.py <path-to-output-file> ../data/cococon.json
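When comparing several models, the same command can be repeated over multiple output files. The following is a minimal sketch, assuming it is run from the evaluators directory. caption_eval_command and evaluate_many are hypothetical names, and the output file paths you pass in are your own.

```python
import subprocess
import sys
from typing import List, Sequence

# Annotation path used in this README's command (relative to evaluators/).
ANNOTATIONS = "../data/cococon.json"


def caption_eval_command(output_file: str, annotations: str = ANNOTATIONS) -> List[str]:
    """Build the coco_eval.py invocation for one model output file."""
    # sys.executable runs coco_eval.py with the same Python interpreter as this script
    return [sys.executable, "coco_eval.py", output_file, annotations]


def evaluate_many(output_files: Sequence[str], dry_run: bool = True) -> None:
    """Evaluate several output files in turn (or just print the commands)."""
    for path in output_files:
        cmd = caption_eval_command(path)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Using sys.executable rather than a bare python keeps the evaluation in the same environment where requirements.txt was installed, provided the driver itself is started from that environment.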

Acknowledgements

We thank the researchers behind Unified-IO and OFA for making their models available for training and inference.