MacCap
April 1, 2024
AAAI 2024 accepted paper: Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Setup
First, download and set up the repo:
git clone https://github.com/Artanic30/MacCap
cd MacCap
conda env create -f environment.yml
conda activate MacCap
Data preparation
Download coco_train and place it in the data directory.
Download cc3m_train and place it in the data directory.
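As a quick sanity check before training, the snippet below creates the data directory and lists its contents. It is a minimal sketch that assumes data is a directory at the repository root and that the commands run from inside MacCap; the names of the downloaded files are not specified here, so the check only reports whether the directory is empty.

```shell
# Assumption: downloads live in ./data at the repository root.
mkdir -p data

# List what is currently in data/ so missing downloads are easy to spot.
ls -lh data

# Warn if the directory is still empty.
if [ -z "$(ls -A data)" ]; then
    echo "data/ is empty: download coco_train or cc3m_train first" >&2
fi
```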
Training
./train_coco.sh
or
./train_cc3m.sh
Evaluation
Follow the instructions here to evaluate the generated captions.
Citation
@article{qiu2024mining,
title={Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training},
author={Qiu, Longtian and Ning, Shan and He, Xuming},
journal={arXiv preprint arXiv:2401.02347},
year={2024}
}
Acknowledgments
This repository is heavily based on ClipCap and DeCap. For training, we used data from the COCO dataset and Conceptual Captions.
Release Schedule
- Initial Code release
- Detail Document
- Data Preparation
- Training and Evaluation Scripts
- Checkpoints