
May 25, 2026

[CVPR 2026] Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

Zhiheng Fu¹, Yupeng Hu¹✉, Qianyun Yang¹, Shiqi Zhang¹, Zhiwei Chen¹, Zixu Li¹

¹School of Software, Shandong University
✉ Corresponding author

AirKnow Teaser


📌 Introduction

Welcome to the official repository for Air-Know, a robust network for Composed Image Retrieval (CIR) that targets Noisy Correspondence Learning (NCL), where some of the training triplets are mismatched.

Disclaimer: This codebase is intended for research purposes.

📢 News and Updates

  • [2026-04-22] 🚀 The arXiv version is released.
  • [2026-04-02] 🚀 All code is released.
  • [2026-02-21] 🔥 Air-Know is accepted by CVPR 2026. Code is coming soon.

Air-Know Pipeline (based on LAVIS)

airknow architecture

Figure 1. The proposed Air-Know consists of three primary modules: (a) External Prior Arbitration leverages an offline multimodal expert to generate reliable arbitration priors for CIR triplets, bypassing the unreliable small-loss hypothesis. (b) Expert-Knowledge Internalization transfers these priors into a lightweight proxy network, structurally preventing the memorization of ambiguous partial matches. Finally, (c) Dual-Stream Reconciliation dynamically integrates the internalized knowledge to provide robust online feedback, guiding the final representation learning. Figure best viewed in color.


🏃‍♂️ Experiment-Results

CIR Task Performance

💡 Note for Fully-Supervised CIR Benchmarking:
🎯 The 0% noise setting in the tables below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.

FashionIQ:

Table 1. Performance comparison on the FashionIQ validation set in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.

CIRR:

Table 2. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.
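
For readers new to the metric, R@K is the percentage of queries whose ground-truth target appears among the top-K retrieved candidates. The sketch below illustrates that definition only and is not the repository's evaluation code; the function name `recall_at_k` and the toy rankings are assumptions made for this example.

```python
from typing import Dict, Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    targets: Sequence[str],
    ks: Sequence[int] = (1, 5, 10, 50),
) -> Dict[int, float]:
    """Return R@K (in %) for each K in ks.

    ranked_candidates[i] lists the retrieved image names for query i,
    best match first; targets[i] is the ground-truth target for query i.
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("one ranked list is required per target")
    if not targets:
        raise ValueError("at least one query is required")

    results = {}
    for k in ks:
        # A query counts as a hit when its target is within the top-k slice.
        hits = sum(
            1
            for ranking, target in zip(ranked_candidates, targets)
            if target in ranking[:k]
        )
        results[k] = 100.0 * hits / len(targets)
    return results


if __name__ == "__main__":
    rankings = [
        ["a", "b", "c"],  # target "a" is ranked 1st
        ["b", "c", "a"],  # target "a" is ranked 3rd
    ]
    print(recall_at_k(rankings, ["a", "a"], ks=(1, 3)))
```

Rsub@K on CIRR follows the same idea, with each ranking first restricted to the candidate subset associated with the query.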



📦 Install

1. Clone the repository

git clone https://github.com/ZhihFu/Air-Know
cd Air-Know

2. Setup Python Environment

The code was tested with Python 3.8.10 and CUDA 12.6. We recommend using Anaconda to create an isolated virtual environment:

conda create -n airknow python=3.8
conda activate airknow

# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
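
After installation, it can help to confirm that the pinned versions were actually picked up. The snippet below is a small sketch using the standard-library `importlib.metadata`; the helper name `find_mismatches` is an assumption for this example and is not part of the repository.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Pinned versions copied from the pip commands above.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "scikit-learn": "1.3.2",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}


def find_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatching package to (expected, found); found is None if missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index carry a local tag such as "+cu121",
        # so compare only the part before "+".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```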



📂 Project Structure

To help you navigate our codebase quickly, here is an overview of the main components:

├── lavis/                     # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py   # 🧠 The core model implementation
├── train_BLIP2.py             # 🚂 Main training script
├── test_BLIP2.py              # 🧪 General evaluation script
├── cirr_sub_BLIP2.py          # 📤 Script to generate submission files for the CIRR dataset
├── datasets.py                # 📊 Data loading and processing utilities
└── utils.py                   # 🛠️ Helper functions (logging, metrics, etc.)

💾 Data Preparation

Before training or testing, you need to download and structure the datasets.

Download the CIRR and FashionIQ datasets from the CIRR official repo and the FashionIQ official repo, respectively.

Organize the data as follows:

1) FashionIQ:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   ├── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

2) CIRR:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json

(Note: Please modify datasets.py if your local data paths differ from the default setup.)
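
Before launching a run, a quick check that the folders match the trees above can save a failed job. The following is a sketch only: `missing_paths` and the two path lists are assumptions written for this example (the relative paths are copied from the trees above), and the root folders are placeholders.

```python
from pathlib import Path
from typing import Iterable, List

CATEGORIES = ("dress", "toptee", "shirt")

# Relative paths copied from the FashionIQ tree above.
FASHIONIQ_PATHS = (
    [f"captions/cap.{c}.{s}.json" for c in CATEGORIES for s in ("train", "val")]
    + [
        f"image_splits/split.{c}.{s}.json"
        for c in CATEGORIES
        for s in ("train", "val", "test")
    ]
    + list(CATEGORIES)  # one image folder per category
)

# Relative paths copied from the CIRR tree above.
CIRR_PATHS = (
    ["train", "dev", "test1"]
    + [f"cirr/captions/cap.rc2.{s}.json" for s in ("train", "val", "test1")]
    + [f"cirr/image_splits/split.rc2.{s}.json" for s in ("train", "val", "test1")]
)


def missing_paths(root: str, expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Placeholder roots; replace them with your local dataset folders.
    for name, root, expected in (
        ("FashionIQ", "/path/to/FashionIQ/", FASHIONIQ_PATHS),
        ("CIRR", "/path/to/CIRR/", CIRR_PATHS),
    ):
        missing = missing_paths(root, expected)
        print(name, "OK" if not missing else f"missing: {missing}")
```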



🚀 Quick Start

1. Training under Noisy Settings

Our implementation exposes a --noise_ratio argument that sets the level of simulated NTC (Noisy Triplet Correspondence) interference. To reproduce the experimental results from the paper, set --noise_ratio to one of the evaluated values: 0.0, 0.2, 0.5, or 0.8.
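
A common way to simulate noisy correspondence is to reassign the target image for a randomly chosen fraction of the training triplets. The sketch below illustrates that general idea under this assumption; it is not necessarily how `train_BLIP2.py` applies `--noise_ratio`, and the helper `inject_noise` and the `Triplet` alias are hypothetical names.

```python
import random
from typing import List, Tuple

# (reference image, modification text, target image)
Triplet = Tuple[str, str, str]


def inject_noise(
    triplets: List[Triplet], noise_ratio: float, seed: int = 0
) -> List[Triplet]:
    """Return a copy in which about noise_ratio of the triplets get a shuffled target."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must lie in [0, 1]")

    rng = random.Random(seed)
    n_noisy = int(round(noise_ratio * len(triplets)))
    noisy_idx = rng.sample(range(len(triplets)), n_noisy)

    # Permute targets among the selected triplets. A triplet may keep its own
    # target by chance, so the effective noise can be slightly below the ratio.
    sources = noisy_idx[:]
    rng.shuffle(sources)

    noisy = list(triplets)
    for dst, src in zip(noisy_idx, sources):
        reference, text, _ = triplets[dst]
        noisy[dst] = (reference, text, triplets[src][2])
    return noisy


if __name__ == "__main__":
    clean = [(f"ref{i}.png", f"edit {i}", f"tgt{i}.png") for i in range(5)]
    for triplet in inject_noise(clean, noise_ratio=0.8, seed=42):
        print(triplet)
```

Fixing the seed keeps the corrupted subset identical across runs, which makes comparisons at the same noise ratio reproducible.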

Training on FashionIQ:

python train_BLIP2.py \
    --dataset fashioniq \
    --fashioniq_path "/path/to/FashionIQ/" \
    --model_dir "./checkpoints/fashioniq_noise0.8" \
    --noise_ratio 0.8 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 1e-5

Training on CIRR:

python train_BLIP2.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_noise0.8" \
    --noise_ratio 0.8 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 2e-5
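
To run all evaluated noise ratios in sequence, the two commands above can be generated programmatically. The helper below is a sketch: `build_train_command` is a hypothetical name, the flags and default values are copied from the commands above, and the dataset paths are placeholders.

```python
import shlex
from typing import List

# Learning rates copied from the two training commands above.
DEFAULT_LR = {"fashioniq": "1e-5", "cirr": "2e-5"}


def build_train_command(
    dataset: str,
    data_path: str,
    noise_ratio: float,
    batch_size: int = 256,
    num_epochs: int = 20,
) -> List[str]:
    """Build the argument list for one train_BLIP2.py run."""
    if dataset not in DEFAULT_LR:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python", "train_BLIP2.py",
        "--dataset", dataset,
        f"--{dataset}_path", data_path,
        "--model_dir", f"./checkpoints/{dataset}_noise{noise_ratio}",
        "--noise_ratio", str(noise_ratio),
        "--batch_size", str(batch_size),
        "--num_epochs", str(num_epochs),
        "--lr", DEFAULT_LR[dataset],
    ]


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run to launch them.
    for ratio in (0.0, 0.2, 0.5, 0.8):
        print(shlex.join(build_train_command("cirr", "/path/to/CIRR/", ratio)))
```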

2. Testing

To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the following command:

python cirr_sub_BLIP2.py checkpoints/cirr_noise0.8/

(The script automatically writes the .json prediction files for online evaluation, using the best checkpoints saved in the given folder.)



🙏 Acknowledgements

This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC, and TME libraries. We thank the authors for their open-source contributions.


✉️ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to us at fuzhiheng8@gmail.com.


Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Paper | Project | Code
  • ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
  • HABIT (AAAI'26): Paper | Project | Code
  • ReTrack (AAAI'26): Paper | Project | Code
  • INTENT (AAAI'26): Paper | Project | Code
  • HUD (ACM MM'25): Paper | Project | Code
  • OFFSET (ACM MM'25): Paper | Project | Code
  • ENCODER (AAAI'25): Paper | Project | Code

📝⭐️ Citation

If you find our work or this code useful in your research, please consider leaving a Star⭐️ or Citing📝 our paper 🥰. Your support is our greatest motivation!

@InProceedings{Air-Know,
    title={Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval},
    author={Fu, Zhiheng and Hu, Yupeng and Yang, Qianyun and Zhang, Shiqi and Chen, Zhiwei and Li, Zixu},
    booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    year={2026}
}



📄 License

This project is released under the terms of the LICENSE file included in this repository.


If this project helps you, please leave a Star!
