MM_Robustness
January 25, 2024
Journal of Data-centric Machine Learning Research (DMLR)
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift
More details can be found on the project webpage.
This repository contains the code for generating multimodal robustness evaluation datasets for downstream image-text applications, including image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation.
Citation
If you find our code or models helpful in your research, please cite our paper:
@article{Qiu2022BenchmarkingRO,
title={Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift},
author={Jielin Qiu and Yi Zhu and Xingjian Shi and F. Wenzel and Zhiqiang Tang and Ding Zhao and Bo Li and Mu Li},
journal={Journal of Data-centric Machine Learning Research (DMLR)},
year={2024}
}
Installation
./install.sh
Datasets
- The original datasets can be downloaded from the original website:
Generate perturbation datasets
- For image perturbation, please see image_perturbation
- For text perturbation, please see text_perturbation
- For detection score, please see detection_score
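To give a rough idea of what an image or text perturbation looks like, here is a minimal, standard-library-only sketch. It is not the code in image_perturbation or text_perturbation: the function names `add_gaussian_noise` and `swap_adjacent_chars`, their parameters, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=0):
    """Return a copy of a grayscale image (rows of 0-255 ints) with Gaussian noise.

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    return [
        # Clamp each noisy pixel back into the valid 0-255 range.
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def swap_adjacent_chars(text, rate=0.1, seed=0):
    """Return text with some adjacent letter pairs swapped to mimic typos.

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


# Toy inputs (made up for this example).
image = [[0, 64, 128], [192, 255, 32]]
caption = "a man riding a wave on top of a surfboard"

noisy = add_gaussian_noise(image, sigma=10.0)
perturbed = swap_adjacent_chars(caption, rate=0.3)
```

Fixing the seed keeps the perturbed data reproducible, which matters when the same perturbed set is used to compare several models.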
Evaluation data for text-to-image generation
For the text-to-image generation evaluation, we used the captions from COCO as prompts to generate the corresponding images. We also share the generated images here.
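As a rough sketch of how captions can be turned into prompts, the snippet below groups captions by image from a COCO-style captions annotation. It assumes the usual COCO captions layout (an `annotations` list whose entries carry `image_id` and `caption`). The helper name `captions_to_prompts` and the sample records are made up for illustration, and the text-to-image model call itself is left out.

```python
import json
from collections import defaultdict


def captions_to_prompts(annotation_json):
    """Group captions by image_id from a COCO-style captions annotation string.

    Hypothetical helper for illustration; not part of this repository.
    """
    data = json.loads(annotation_json)
    prompts = defaultdict(list)
    for ann in data["annotations"]:
        # Strip stray whitespace so the caption can be used directly as a prompt.
        prompts[ann["image_id"]].append(ann["caption"].strip())
    return dict(prompts)


# Made-up sample in the assumed COCO captions layout.
sample = json.dumps({
    "annotations": [
        {"image_id": 1, "id": 10, "caption": "A dog runs on the beach. "},
        {"image_id": 1, "id": 11, "caption": "A brown dog near the ocean."},
        {"image_id": 2, "id": 12, "caption": "Two people riding bicycles."},
    ]
})

prompts = captions_to_prompts(sample)
```

Each caption in `prompts` would then be passed, one at a time, to the text-to-image model under evaluation.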
Baselines
For the evaluated baselines, please see evaluated_baselines
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.