December 4, 2025
This is the official GitHub repository for "MACIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval" (accepted to ICCV 2025).
Overview
Composed Image Retrieval (CIR) aims to retrieve a target image given a reference image and a conditioning text that specifies the desired modifications. Although recent approaches show steady performance gains on existing CIR benchmarks, we argue that it remains unclear whether these gains genuinely reflect better compositional understanding of visual and textual information. For example, current benchmarks do not explicitly consider negation, offer limited semantic diversity, and contain too few hard negatives to evaluate the CIR task thoroughly. To bridge this gap, we introduce the Multimodal Arithmetic Benchmark for CIR (MA-CIR), a challenging CIR benchmark that combines three arithmetic types (negation, replacement, and addition) with seven complex semantic categories (e.g., spatial reasoning and object reasoning). In addition, carefully constructed hard negatives are included to assess models in a controlled setting. On MA-CIR, we observe that current CIR models struggle with the negation and replacement arithmetic types and with semantic types that require complex reasoning, suggesting a possible reliance on object or concept information. To address this, we propose leveraging strong text encoders, particularly those based on large language models (LLMs), and fine-tuning them on carefully constructed text triplets that include hard negatives, thereby enhancing their compositional understanding.
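This README does not spell out the fine-tuning objective. A common way to train a text encoder on (query, positive, hard negative) triplets is an InfoNCE-style contrastive loss, in which the hard negative competes directly with the positive. The dependency-free sketch below assumes precomputed embeddings; the function names and the temperature value are illustrative assumptions, not code from this repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_infonce_loss(
    queries: List[Vector],
    positives: List[Vector],
    hard_negatives: List[Vector],
    temperature: float = 0.05,  # illustrative value, not taken from the paper
) -> float:
    """Mean InfoNCE loss over a batch of (query, positive, hard negative) triplets.

    For query i, the candidates are every positive in the batch (in-batch
    negatives) plus every hard negative; the correct candidate is positives[i].
    """
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, c) / temperature for c in candidates]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(queries)
```

When the hard negative is indistinguishable from the positive, the loss stays at log 2 for a single triplet, so the gradient keeps pushing the encoder to separate them.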
MACIR dataset
The dataset is available at this link.
Evaluation
For the default setting, which uses the whole database as the retrieval pool:
```shell
python3 eval.py \
    --eval-type phi \
    --dataset-path /path/to/your_dataset/ \
    --phi-checkpoint-name "/path/to/your_checkpoint/phi_best.pt" \
    --eval_level "full" \
    --clip_model_name large
```
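The default setting ranks each query against the whole database. This README does not state which metric `eval.py` reports; CIR benchmarks commonly use Recall@K, so the sketch below assumes that metric and precomputed query and image embeddings. `rank_of_target` and `recall_at_k` are hypothetical helper names, not functions from this repository.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_of_target(query: Vector, gallery: Dict[str, Vector], target_id: str) -> int:
    """1-based rank of the target image when the whole gallery is sorted by similarity."""
    target_score = cosine(query, gallery[target_id])
    # Only images scoring strictly higher than the target push it down (ties favor the target).
    return 1 + sum(
        1
        for image_id, emb in gallery.items()
        if image_id != target_id and cosine(query, emb) > target_score
    )


def recall_at_k(
    queries: List[Vector], target_ids: List[str], gallery: Dict[str, Vector], k: int
) -> float:
    """Percentage of queries whose target image is ranked within the top k."""
    hits = sum(1 for q, t in zip(queries, target_ids) if rank_of_target(q, gallery, t) <= k)
    return 100.0 * hits / len(queries)
```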
For the same default setting (whole database), but with results printed separately for each split:
```shell
python3 eval.py \
    --eval-type phi \
    --dataset-path /path/to/your_dataset/ \
    --phi-checkpoint-name "/path/to/your_checkpoint/phi_best.pt" \
    --eval_level "full_splits" \
    --clip_model_name large
```
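With `--eval_level "full_splits"`, results are printed per split. Assuming each query carries a split label (only `"remove"` appears in this README; `"add"` in the usage below is a hypothetical name) and the rank of its target is already known, a per-split breakdown could be computed as in this sketch. `recall_by_split` is a hypothetical helper, not part of `eval.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_by_split(results: Iterable[Tuple[str, int]], k: int) -> Dict[str, float]:
    """Recall@k (in %) computed separately for each split.

    `results` holds one (split_name, target_rank) pair per query, where
    target_rank is the 1-based rank of the ground-truth image.
    """
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for split, rank in results:
        counts[split] += 1
        if rank <= k:
            hits[split] += 1
    # Splits with no hits still appear, with a score of 0.0.
    return {split: 100.0 * hits[split] / counts[split] for split in counts}
```

For example, `recall_by_split([("remove", 1), ("remove", 5), ("add", 2)], k=1)` gives `{"remove": 50.0, "add": 0.0}`. Reporting per split keeps a weak arithmetic type from being averaged away in a single overall score.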
For the restricted evaluation on the "remove" split:
```shell
python3 eval.py \
    --eval-type phi \
    --dataset-path /path/to/your_dataset/ \
    --phi-checkpoint-name "/path/to/your_checkpoint/phi_best.pt" \
    --split "remove" \
    --eval_level "restricted" \
    --clip_model_name large
```
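This README does not define the restricted setting. One plausible reading, given the hard negatives described in the overview, is that each query is ranked only against its own target and its hard negatives rather than the whole database. The sketch below illustrates that interpretation only; `restricted_rank`, `restricted_accuracy`, and the (query, target, hard negatives) sample structure are hypothetical and may differ from what `eval.py` actually does.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def restricted_rank(query: Vector, target: Vector, hard_negatives: List[Vector]) -> int:
    """1-based rank of the target among itself and the query's own hard negatives."""
    target_score = cosine(query, target)
    return 1 + sum(1 for neg in hard_negatives if cosine(query, neg) > target_score)


def restricted_accuracy(samples: List[Tuple[Vector, Vector, List[Vector]]]) -> float:
    """Percentage of queries whose target outranks every one of its hard negatives."""
    correct = sum(1 for q, t, negs in samples if restricted_rank(q, t, negs) == 1)
    return 100.0 * correct / len(samples)
```

Because the candidate pool is small and built from hard negatives, this kind of score isolates whether a model follows the modification text, independent of how large the database is.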
Acknowledgement
Our implementation is largely based on LinCIR and E5-V. We thank the original authors for their invaluable contributions.