May 25, 2026
(AAAI 2026) INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval
1 School of Software, Shandong University | Corresponding author
Introduction
Welcome to the official repository for INTENT. This project provides the codebase of our paper, offering a novel approach to Composed Image Retrieval with Noisy Correspondence using BLIP-2 architectures.
Disclaimer: This codebase is intended for research purposes.
News and Updates
- [Mar 2026] The official paper is released at AAAI 2026.
- [Mar 2026] We have officially released the training and testing code for INTENT!
- [Nov 2025] INTENT is accepted by AAAI 2026.
INTENT Pipeline (based on LAVIS)

Table of Contents
- Experiment Results
- Project Structure
- Setup
- Data Preparation
- Quick Start
- Some More Discussion
- Citation
- Acknowledgements
Experiment Results
CIR Task Performance
Note for Fully-Supervised CIR Benchmarking: The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
CIRR:
Table 1. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.
FashionIQ:
Table 2. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.
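For readers new to the metrics named in the two captions, the sketch below shows how R@K and a subset-restricted variant in the spirit of Rsub@K could be computed. It is an illustrative sketch, not the repository's evaluation code: the function name `recall_at_k`, the dictionary layout, and the toy image names are all assumptions made for this example.

```python
from typing import Dict, Optional, Sequence, Set


def recall_at_k(
    rankings: Dict[str, Sequence[str]],
    targets: Dict[str, str],
    k: int,
    subsets: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Return recall at K in percent.

    If `subsets` is given, each query is ranked only among the candidates
    in its own subset (the Rsub@K-style variant).
    """
    hits = 0
    for query_id, ranked in rankings.items():
        if subsets is not None:
            # Keep the original order but drop candidates outside the subset.
            ranked = [name for name in ranked if name in subsets[query_id]]
        if targets[query_id] in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(rankings)


# Toy data (hypothetical image names), best match first.
rankings = {
    "q1": ["a", "b", "c", "d"],
    "q2": ["c", "a", "d", "b"],
}
targets = {"q1": "b", "q2": "b"}
subsets = {"q1": {"b", "c"}, "q2": {"d", "b"}}

r_at_1 = recall_at_k(rankings, targets, k=1)
r_at_2 = recall_at_k(rankings, targets, k=2)
rsub_at_1 = recall_at_k(rankings, targets, k=1, subsets=subsets)
rsub_at_2 = recall_at_k(rankings, targets, k=2, subsets=subsets)
print(r_at_1, r_at_2, rsub_at_1, rsub_at_2)  # 0.0 50.0 50.0 100.0
```

Restricting the ranking to a small subset removes easy negatives from the candidate pool, so the subset metric mainly measures fine-grained discrimination among visually similar images.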
Image Intervention

Project Structure
To help you navigate our codebase quickly, here is an overview of the main components:
├── lavis/                      # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py    # The core INTENT model implementation.
├── train_INTENT.py             # Main training script
├── test.py                     # General evaluation script
├── cirr_sub_BLIP2.py           # Script to generate submission files for the CIRR dataset
├── datasets.py                 # Data loading and processing utilities
└── utils.py                    # Helper functions (logging, metrics, etc.)
Setup
We recommend running this code on a Linux system with an NVIDIA GPU.
1. Clone the repository
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
2. Create a virtual environment
conda create -n intent_env python=3.9
conda activate intent_env
# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
3. Install dependencies
pip install -r requirements.txt
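After installation, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch using only the Python standard library; the helper name `find_mismatches` is hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Versions copied from the install commands above.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "open-clip-torch": "2.24.0",
    "scikit-learn": "1.3.2",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}


def find_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return one message per package that is missing or at a different version."""
    problems = []
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # CUDA wheels report versions such as "2.1.0+cu121"; compare the base part.
        if installed.split("+")[0] != wanted:
            problems.append(f"{name}: found {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in find_mismatches(PINNED) or ["All pinned packages match."]:
        print(line)
```

Run it inside the activated `intent_env` environment. A mismatch is not necessarily an error, but it is a useful first thing to check if results differ from the reported ones.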
Data Preparation
Before training or testing, you need to download and structure the datasets.
Download the CIRR and FashionIQ datasets from the official CIRR and FashionIQ repositories.
Organize the data as follows:
1) FashionIQ:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   ├── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
2) CIRR:
├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json
(Note: Please modify datasets.py if your local data paths differ from the default setup.)
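Before launching a run, a quick check that the folders and annotation files from the two layouts above are in place can save a failed job. The sketch below is an assumption-laden helper, not part of this repository: the function names are hypothetical, and the roots `./FashionIQ` and `./CIRR` are placeholders for your local paths.

```python
from pathlib import Path
from typing import List


def expected_fashioniq() -> List[str]:
    """Relative paths taken from the FashionIQ layout above."""
    paths = ["dress", "shirt", "toptee"]
    for category in ("dress", "toptee", "shirt"):
        paths += [f"captions/cap.{category}.{split}.json" for split in ("train", "val")]
        paths += [
            f"image_splits/split.{category}.{split}.json"
            for split in ("train", "val", "test")
        ]
    return paths


def expected_cirr() -> List[str]:
    """Relative paths taken from the CIRR layout above."""
    paths = ["train", "dev", "test1"]
    for split in ("train", "val", "test1"):
        paths.append(f"cirr/captions/cap.rc2.{split}.json")
        paths.append(f"cirr/image_splits/split.rc2.{split}.json")
    return paths


def missing_paths(root: Path, expected: List[str]) -> List[str]:
    """Return the expected relative paths that do not exist under `root`."""
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Placeholder roots; point these at your local copies.
    for name, root, expected in [
        ("FashionIQ", Path("./FashionIQ"), expected_fashioniq()),
        ("CIRR", Path("./CIRR"), expected_cirr()),
    ]:
        missing = missing_paths(root, expected)
        print(f"{name}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```

The check only covers directories and annotation files, not individual images, so it stays fast even on the full datasets.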
Quick Start
1. Training & Evaluating the Model
To train the INTENT model from scratch, use the train_INTENT.py script. You can specify hyperparameters via command line arguments or a config file.
python train_INTENT.py
The training script also runs the evaluation process. (Tip: Check out utils.py for logging details during training. Checkpoints will be automatically saved.)
2. Generating Submissions (CIRR Dataset)
If you are evaluating on the CIRR test server, we provide a dedicated script to generate the required JSON submission files.
python cirr_sub_BLIP2.py \
--checkpoint_path ./checkpoints/intent_run/best_model.pth \
--output_file ./submission.json
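Before uploading, it may be worth inspecting the generated file. The sketch below assumes only that the submission is a single JSON object in which some top-level entries are scalar fields and the rest are ranked lists of image names. That layout, the helper name `summarize_submission`, and the toy keys and values are assumptions for illustration, so please confirm the exact required format against the CIRR test server instructions.

```python
import json
from typing import Any, Dict


def summarize_submission(data: Dict[str, Any]) -> Dict[str, Any]:
    """Split top-level entries into scalar fields and ranked lists, and report list lengths."""
    scalars = {key: value for key, value in data.items() if not isinstance(value, list)}
    lists = {key: value for key, value in data.items() if isinstance(value, list)}
    lengths = sorted({len(value) for value in lists.values()})
    return {
        "scalar_fields": scalars,
        "num_ranked_entries": len(lists),
        "list_lengths": lengths,
    }


# Toy object standing in for the loaded submission file; keys and values are made up.
example = {"version": "rc2", "101": ["img-a", "img-b"], "102": ["img-c", "img-d"]}
summary = summarize_submission(example)
print(json.dumps(summary, indent=2))
```

To apply it to a real file, load `./submission.json` with `json.load` and pass the result to the function. More than one value in `list_lengths` would suggest that some queries received fewer candidates than others.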
Some More Discussion
While developing the first module of INTENT, we experimented with a straightforward causal mechanism: explicitly aligning the intervened image with the original one. Conceptually, this operation seems to help mitigate spurious correlations by blocking potential backdoor paths in our specific setting. We wonder if this causal perspective could also provide some inspiration for the Zero-Shot Composed Image Retrieval (ZS-CIR) community. Given that ZS-CIR models heavily rely on large-scale pre-training, they might occasionally be influenced by dataset biases and high-frequency co-occurrences. Introducing a similar causal alignment mechanism could potentially be an interesting direction to explore for decoupling the true modification intent from inherent background noise. Although this is just a preliminary thought rather than a definitive conclusion, we hope it might spark some fresh ideas and discussions for future research towards more robust ZS-CIR models.
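To make the idea of aligning an intervened image with its original more concrete, here is one simple way such an alignment term could be written: the mean of one minus the cosine similarity between paired embeddings. This is an illustrative sketch only and is not the loss used in INTENT (see lavis/models/blip2_models/blip2_cir.py for the actual implementation). The function names and toy vectors are hypothetical, and plain Python lists stand in for model embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(
    original: Sequence[Sequence[float]],
    intervened: Sequence[Sequence[float]],
) -> float:
    """Mean (1 - cosine similarity) over paired embeddings.

    The value is 0 when every intervened embedding points in the same
    direction as its original, and grows as the pairs drift apart.
    """
    losses = [1.0 - cosine_similarity(o, i) for o, i in zip(original, intervened)]
    return sum(losses) / len(losses)


# Toy embeddings: the first intervened batch only rescales the originals,
# the second rotates one of the two vectors by 90 degrees.
original = [[1.0, 0.0], [0.0, 2.0]]
intervened_same = [[2.0, 0.0], [0.0, 1.0]]
intervened_shifted = [[0.0, 1.0], [0.0, 1.0]]

loss_same = alignment_loss(original, intervened_same)
loss_shifted = alignment_loss(original, intervened_shifted)
print(loss_same, loss_shifted)
```

Minimizing a term of this kind pushes the representation to ignore whatever the intervention changed, which is the intuition behind using it to suppress spurious background cues.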
Citation
If you find our work or this code useful in your research, please consider leaving a star or citing our paper:
@inproceedings{INTENT,
title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}
Acknowledgements
This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC, and TME libraries. We thank the authors for their open-source contributions.
Contact
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at zivczw@gmail.com.
Related Projects
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Paper | Project | Code
- ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
- Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
- HABIT (AAAI'26): Project | Code | Paper
- ReTrack (AAAI'26): Project | Code | Paper
- HUD (ACM MM'25): Project | Code | Paper
- OFFSET (ACM MM'25): Project | Code | Paper
- ENCODER (AAAI'25): Project | Code | Paper
License
This project is released under the terms of the LICENSE file included in this repository.







