MMCBench: Benchmarking Large Multimodal Models against Common Corruptions 🚀

January 23, 2024

Code for the paper "Benchmarking Large Multimodal Models against Common Corruptions".

Overview

MMCBench is a benchmarking framework for evaluating the robustness and self-consistency of Large Multimodal Models (LMMs) under common corruptions. It focuses on cross-modal interactions involving text, image, and speech, and covers four generative tasks: text-to-image, image-to-text, text-to-speech, and speech-to-text. The benchmark uses a novel methodology to select representative examples from large datasets and scores every cross-modal task with a consistent set of metrics.

Benchmarking Process 📈

The selection and evaluation process for cross-modality consistency in MMCBench involves two main steps:

Selection Process 🕵️‍♂️: Similarity is measured in the text modality. Non-text inputs are first converted to text using model-generated captions or transcriptions, while text inputs are compared directly before and after corruption.
Evaluation Process 📝: Self-consistency is measured in two ways: by comparing the clean input with the output generated from the corrupted input, and by comparing the outputs generated from the clean and corrupted inputs with each other.
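
The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the bag-of-words `cosine_similarity` is only a stand-in for whatever text or cross-modal embedding model is used in practice, the rule of keeping the least similar pairs is an assumption about how selection ranks examples, and `select_hardest` and `self_consistency` are hypothetical names.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_hardest(pairs, k):
    """Selection (assumed rule): keep the k (clean, corrupted) text pairs
    that are least similar. For image or speech inputs, the strings would be
    model-generated captions or transcriptions."""
    return sorted(pairs, key=lambda p: cosine_similarity(p[0], p[1]))[:k]


def self_consistency(clean_input, clean_output, corrupted_output):
    """Evaluation: the two comparisons, with every item represented as text
    here purely for illustration."""
    return {
        "input_vs_corrupted_output": cosine_similarity(
            clean_input, corrupted_output
        ),
        "clean_output_vs_corrupted_output": cosine_similarity(
            clean_output, corrupted_output
        ),
    }
```

In a real pipeline the similarity function would depend on the task, since a text-to-image comparison needs a cross-modal model rather than word overlap; the control flow is the part this sketch is meant to convey.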

Overview of the Selection and Evaluation Process 📌

Figure: overview of the selection and evaluation process.

Model Resilience Analysis 🛡️

We present radar charts of the relative consistency scores of selected models under various corruptions across four cross-modality tasks: text-to-image 🎨, image-to-text 📜, text-to-speech 🗣️, and speech-to-text 📝. For each corruption type, scores are normalized against the highest-scoring model, so each chart shows a model's resilience relative to the best performer.
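
The normalization described above can be sketched as follows. This is an illustrative example only: `normalize_by_best` is a hypothetical helper, the nested-dictionary layout is an assumption, and scores are assumed to be non-negative.

```python
def normalize_by_best(scores):
    """Rescale {model: {corruption: score}} so that, for each corruption,
    the highest-scoring model gets 1.0 and the others a fraction of it.
    Assumes non-negative scores (hypothetical data layout)."""
    corruptions = {c for per_model in scores.values() for c in per_model}
    best = {
        c: max(m[c] for m in scores.values() if c in m) for c in corruptions
    }
    return {
        model: {
            c: (s / best[c] if best[c] else 0.0) for c, s in per_model.items()
        }
        for model, per_model in scores.items()
    }
```

Calling it on made-up scores for two placeholder models, such as `{"model_a": {"blur": 0.8}, "model_b": {"blur": 0.4}}`, gives `model_a` a relative score of 1.0 and `model_b` a relative score of 0.5 for that corruption.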

Radar Charts of Model Consistency Scores 🎯

Figure: radar charts of model consistency scores.

Repository Structure 📂

MMCBench/
- image2text/: Image-to-Text generation tasks.
- speech2text/: Speech-to-Text generation tasks.
- text2image/: Text-to-Image generation tasks.
- text2speech/: Text-to-Speech generation tasks.

Environment Setup 🌐

To set up the environment for running MMCBench, we recommend using Conda, which can handle packages and dependencies effectively. Follow these steps to create and activate a Conda environment:

Create a Conda Environment: Open your terminal and run the following command to create a new environment named mmcbench:
```
conda create -n mmcbench python=3.9
```
Activate the Environment: Activate the newly created environment:
```
conda activate mmcbench
```
Install Required Packages: Install all necessary packages using the requirements.txt file included in the repository:
```
pip install -r requirements.txt
```
