May 25, 2026

🚀 (AAAI 2026) INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

Zhiwei Chen¹, Yupeng Hu¹✉, Zhiheng Fu¹, Zixu Li¹, Jiale Huang¹, Qinlei Huang¹, Yinwei Wei¹

¹School of Software, Shandong University
✉ Corresponding author

📌 Introduction

Welcome to the official repository for INTENT. It contains the code for our paper, which proposes an approach to Composed Image Retrieval (CIR) with noisy correspondence, built on the BLIP-2 architecture.

Disclaimer: This codebase is intended for research purposes.

📰 News and Updates

[Mar 2026] 🚀 The official paper has been released at AAAI 2026.
[Mar 2026] 🚀 We have officially released the training and testing code for INTENT!
[Nov 2025] ⏳ INTENT was accepted to AAAI 2026.

🏃‍♂️ Experiment Results

CIR Task Performance

💡 Note for Fully-Supervised CIR Benchmarking: 🎯 The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
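This README does not describe how the noisy training sets are built. The sketch below illustrates one common noisy-correspondence protocol, assumed here and not taken from the INTENT code: select a `noise_ratio` fraction of (reference, text, target) triplets and swap their target images among themselves. The function name `inject_noise` and the tuple layout are hypothetical.

```python
import random


def inject_noise(triplets, noise_ratio, seed=0):
    """Return (noisy_triplets, corrupted_indices).

    Assumption: each triplet is a (reference, text, target) tuple, and noise
    is simulated by exchanging targets among a randomly chosen subset.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must lie in [0, 1]")

    rng = random.Random(seed)
    noisy = list(triplets)
    n_noisy = int(round(noise_ratio * len(noisy)))
    chosen = rng.sample(range(len(noisy)), n_noisy)

    # Rotate the selected targets by one position so each selected triplet
    # receives the target of another selected triplet. With a single selected
    # triplet (or duplicate targets) a triplet can remain unchanged.
    targets = [noisy[i][2] for i in chosen]
    rotated = targets[1:] + targets[:1]
    for i, new_target in zip(chosen, rotated):
        reference, text, _ = noisy[i]
        noisy[i] = (reference, text, new_target)

    return noisy, sorted(chosen)


if __name__ == "__main__":
    # Toy triplets with made-up identifiers.
    data = [(f"ref{i}", f"text{i}", f"tgt{i}") for i in range(10)]
    noisy, corrupted = inject_noise(data, noise_ratio=0.2)
    print("corrupted indices:", corrupted)
```

With `noise_ratio=0.0` the triplets come back unchanged, which corresponds to the fully-supervised setting described in the note above.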

CIRR:

Table 1. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.

FashionIQ:

Table 2. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.
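For readers new to the metrics named in the two captions, here is a minimal sketch of R@K and Rsub@K. It is not the evaluation code in test.py; the function names and toy data are illustrative assumptions. R@K is the percentage of queries whose target appears among the top K retrieved candidates. Rsub@K applies the same check after restricting each ranking to that query's CIRR subset.

```python
def recall_at_k(ranked_candidates, targets, k):
    """Percentage of queries whose target is within the top-k candidates."""
    if len(ranked_candidates) != len(targets):
        raise ValueError("one ranking is required per target")
    if not targets:
        return 0.0
    hits = sum(
        target in ranked[:k]
        for ranked, target in zip(ranked_candidates, targets)
    )
    return 100.0 * hits / len(targets)


def recall_subset_at_k(ranked_candidates, targets, subsets, k):
    """Recall@k after keeping only candidates from each query's subset."""
    filtered = []
    for ranked, subset in zip(ranked_candidates, subsets):
        allowed = set(subset)
        # Preserve the original ranking order among the allowed candidates.
        filtered.append([c for c in ranked if c in allowed])
    return recall_at_k(filtered, targets, k)


if __name__ == "__main__":
    # Toy rankings with made-up image names.
    rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
    targets = ["a", "a", "a"]
    subsets = [["a", "c"], ["a", "b"], ["a", "c"]]
    for k in (1, 2, 3):
        print(f"R@{k} = {recall_at_k(rankings, targets, k):.2f}")
    print(f"Rsub@1 = {recall_subset_at_k(rankings, targets, subsets, 1):.2f}")
```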

Image Intervention

📂 Project Structure

To help you navigate our codebase quickly, here is an overview of the main components:

├── lavis/                 # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py   # 🧠 The core INTENT model implementation.
├── train_INTENT.py        # 🚂 Main training script
├── test.py                # 🧪 General evaluation script
├── cirr_sub_BLIP2.py      # 📤 Script to generate submission files for the CIRR dataset
├── datasets.py            # 📊 Data loading and processing utilities
└── utils.py               # 🛠️ Helper functions (logging, metrics, etc.)

🛠️ Setup

We recommend running this code on a Linux system with an NVIDIA GPU.

1. Clone the repository

git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT

2. Create a virtual environment

conda create -n intent_env python=3.9
conda activate intent_env

# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

3. Install dependencies

pip install -r requirements.txt
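After installation, it can help to confirm that the environment still matches the pinned versions, since a later install step may change them. The following is a small sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the commands above.

```python
from importlib import metadata

# Versions copied from the install commands in this section.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "open-clip-torch": "2.24.0",
    "scikit-learn": "1.3.2",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "2.1.0+cu121" still count as a match.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    for name, (expected, installed) in problems.items():
        found = installed or "not installed"
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```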

💾 Data Preparation

Before training or testing, you need to download and structure the datasets.

Download the CIRR and FashionIQ datasets from the CIRR official repo and the FashionIQ official repo, respectively.

Organize the data as follows:

1) FashionIQ:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   ├── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

2) CIRR:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json

(Note: Please modify datasets.py if your local data paths differ from the default setup.)
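Before launching training, a quick check that the folders match the layout above can save a failed run. The sketch below is not part of the repository; `missing_entries` and the dataset roots `./FashionIQ` and `./CIRR` are hypothetical, and only the annotation files and top-level image folders listed above are checked, not individual images.

```python
from pathlib import Path

FIQ_CATEGORIES = ("dress", "toptee", "shirt")
CIRR_SPLITS = ("train", "val", "test1")

# Relative paths derived from the FashionIQ tree shown above.
FASHIONIQ_EXPECTED = (
    [f"captions/cap.{c}.{s}.json" for c in FIQ_CATEGORIES for s in ("train", "val")]
    + [
        f"image_splits/split.{c}.{s}.json"
        for c in FIQ_CATEGORIES
        for s in ("train", "val", "test")
    ]
    + list(FIQ_CATEGORIES)  # image folders
)

# Relative paths derived from the CIRR tree shown above.
CIRR_EXPECTED = (
    ["train", "dev", "test1"]  # image folders
    + [f"cirr/captions/cap.rc2.{s}.json" for s in CIRR_SPLITS]
    + [f"cirr/image_splits/split.rc2.{s}.json" for s in CIRR_SPLITS]
)


def missing_entries(root, expected):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical dataset roots; point these at your local copies.
    checks = [
        ("FashionIQ", "./FashionIQ", FASHIONIQ_EXPECTED),
        ("CIRR", "./CIRR", CIRR_EXPECTED),
    ]
    for name, root, expected in checks:
        missing = missing_entries(root, expected)
        status = "OK" if not missing else f"missing {len(missing)} entries"
        print(f"{name}: {status}")
        for rel in missing:
            print(f"  - {rel}")
```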

🚀 Quick Start

1. Training & Evaluating the Model

To train the INTENT model from scratch, use the train_INTENT.py script. You can specify hyperparameters via command line arguments or a config file.

python train_INTENT.py

Evaluation runs as part of the training script. (Tip: See utils.py for logging details during training. Checkpoints are saved automatically.)

2. Generating Submissions (CIRR Dataset)

If you are evaluating on the CIRR test server, we provide a dedicated script to generate the required JSON submission files.

python cirr_sub_BLIP2.py \
  --checkpoint_path ./checkpoints/intent_run/best_model.pth \
  --output_file ./submission.json
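For orientation, the sketch below shows the JSON layout that, as far as we are aware, the CIRR test server expects: a `version` field, a `metric` field (`recall` or `recall_subset`), and one entry per query pair ID mapping to its ranked candidate image names. This is an assumption to verify against the official CIRR instructions, not a description of what cirr_sub_BLIP2.py does internally. The helper name `build_submission`, the top-50 cutoff, and the sample IDs are illustrative.

```python
import json


def build_submission(predictions, metric="recall", top_k=50, version="rc2"):
    """Build a submission dict from {pair_id: ranked_image_names}.

    Assumed layout: metadata keys plus one key per pair ID (as a string)
    holding the top-k ranked candidate names.
    """
    submission = {"version": version, "metric": metric}
    for pair_id, ranked_names in predictions.items():
        submission[str(pair_id)] = list(ranked_names)[:top_k]
    return submission


if __name__ == "__main__":
    # Made-up pair IDs and image names for illustration only.
    toy_predictions = {
        12: ["test1-0-0-img0", "test1-0-0-img1", "test1-1-0-img0"],
        13: ["test1-1-0-img0", "test1-0-0-img0", "test1-0-0-img1"],
    }
    submission = build_submission(toy_predictions, top_k=2)
    print(json.dumps(submission, indent=2))
```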

🤔 Some More Discussion

While developing the first module of INTENT, we experimented with a straightforward causal mechanism: explicitly aligning the intervened image with the original one. Conceptually, this operation seems to help mitigate spurious correlations by blocking potential backdoor paths in our specific setting. We wonder if this causal perspective could also provide some inspiration for the Zero-Shot Composed Image Retrieval (ZS-CIR) community. Given that ZS-CIR models heavily rely on large-scale pre-training, they might occasionally be influenced by dataset biases and high-frequency co-occurrences. Introducing a similar causal alignment mechanism could potentially be an interesting direction to explore for decoupling the true modification intent from inherent background noise. Although this is just a preliminary thought rather than a definitive conclusion, we hope it might spark some fresh ideas and discussions for future research towards more robust ZS-CIR models.

📝 Citation

If you find our work or this code useful in your research, please consider leaving a star or citing our paper 🥰:

@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

🙏 Acknowledgements

This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC, and TME codebases. We thank the authors for their open-source contributions.

✉️ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at zivczw@gmail.com.

Ecosystem & Other Works from our Team

TEMA (ACL'26): Paper | Project | Code
ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
HABIT (AAAI'26): Project | Code | Paper
ReTrack (AAAI'26): Project | Code | Paper
HUD (ACM MM'25): Project | Code | Paper
OFFSET (ACM MM'25): Project | Code | Paper
ENCODER (AAAI'25): Project | Code | Paper

📄 License

This project is released under the terms of the LICENSE file included in this repository.

If this project helps you, please leave a Star!
