Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

November 29, 2022

Paper (arXiv) | Project Page | Pre-trained Models

Shuquan Ye², Yujia Xie¹, Dongdong Chen¹, Yichong Xu¹, Lu Yuan¹, Chenguang Zhu¹, Jing Liao²

¹Microsoft, ²City University of Hong Kong

This repository contains the PyTorch implementation of DANCE [paper]. The code is built on PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs.
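The hardware requirement of 4 nodes with 8 GPUs each amounts to 32 GPUs in total. The README does not say how these processes are launched. The sketch below assumes a standard torchrun-style setup with one process per GPU, in which each process's global rank is its node rank times the GPUs per node, plus its local GPU index. The names `world_size` and `global_rank` are hypothetical helpers used only for illustration and are not part of the DANCE code.

```python
# Hypothetical sketch: the README only states "4 nodes, each with 8 A100 GPUs".
# The rank layout below ASSUMES a torchrun-style launch with one process per GPU;
# it is not taken from the DANCE code.
NUM_NODES = 4
GPUS_PER_NODE = 8


def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of training processes, one per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Global rank of the process driving GPU `local_rank` on node `node_rank`."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


if __name__ == "__main__":
    print(world_size(NUM_NODES, GPUS_PER_NODE))  # 32
    # Each node owns a contiguous block of 8 global ranks.
    for node in range(NUM_NODES):
        ranks = [global_rank(node, lr, GPUS_PER_NODE) for lr in range(GPUS_PER_NODE)]
        print(f"node {node}: ranks {ranks[0]}-{ranks[-1]}")
```

Under this assumption, a launch such as `torchrun --nnodes=4 --nproc_per_node=8 <training script>` on each node would produce global ranks 0-7 on node 0, 8-15 on node 1, and so on up to 31. The actual training entry point and its arguments are not given in this section.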

Catalog:

  • Code for DANCE-augmented Pre-training

  • Code for DANCE-augmented Fine-tuning

  • Code for Image-Text Retrieval and OK-VQA

  • Download of Pre-trained and Fine-tuned Checkpoints

BibTeX