2025-CVPR-TME

June 9, 2025 · View on GitHub

[CVPR 2025] Code for the paper "Learning with Noisy Triplet Correspondence for Composed Image Retrieval"

Preparation

Project structure

The proper project structure for running the code is as follows:

TME/
├── bert-base-uncased/
├── data/
│   ├── CIRR/
│   │   ├── captions/
│   │   │   ├── captions/
│   │   │   │   │ cap.rc2.test1.json
│   │   │   │   │ cap.rc2.train.json
│   │   │   │   │ cap.rc2.val.json
│   │   │   ├── image_splits/
│   │   │   │   │ split.rc2.test1.json
│   │   │   │   │ split.rc2.train.json
│   │   │   │   │ split.rc2.val.json
│   │   ├── dev/
│   │   │   │ dev-0-0-img0.png
│   │   │   │ ...
│   │   ├── test1/
│   │   │   │ test1-0-0-img0.png
│   │   │   │ ...
│   │   ├── train/
│   │   │   ├── 0/
│   │   │   ├── 1/
│   │   │   ├── ...
│   ├── FashionIQ/
│   │   ├── captions
│   │   │   │ cap.dress.train.json
│   │   │   │ cap.dress.val.json
│   │   │   │ ...
│   │   ├── image_splits
│   │   │   │ split.dress.train.json
│   │   │   │ split.dress.val.json
│   │   │   │ ...
│   │   ├── images
│   │   │   │ B00006M009.jpg
│   │   │   │ B00006M00B.jpg
│   │   │   │ ...
├── src/
│   │ ...
├── weight/
│   │ blip2_pretrained.pth
│   │ eva_vit_g.pth

The following commands will create a local Anaconda environment with the necessary packages installed. We provide all package versions, but using the exact same versions is not necessary to run the code; any version shall be fine. Note that Using MKL versions 2024.1+ with PyTorch may result in an "ImportError: undefined symbol: iJIT_NotifyEvent".

conda create -n TME -y python=3.10
conda activate TME
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

dataset setting

Please modify FashioniqPath and CIRRPath in data.py to point to your dataset paths. Also, check the folder names in data.py to ensure they match the names of your dataset folders. The default dataset settings are as follows:

# data.py:
FashioniqPath = './data/FashionIQ/' # end with '/'
CIRRPath = './data/CIRR/' # end with '/'

Pretrained wegiht

Download the BLIP2 weights and place them in the weight directory.

blip2_pretrained.pth: 'https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth'
eva_vit_g.pth: 'https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth'

To use the online url during training, simply edit the following file: src/lavis/models/blip2_models/blip2.py line 85-98 and src/lavis/models/eva_vit.py line 428-447.

Runing

Precompute

Encodings via Frozen ViT remain unchanged during training. To save time, we precompute the ViT encodings using precompute.py, which creates a features directory containing the precomputed features inside the project directory TME.

python src/precompute.py \
    --dataset {'CIRR' or 'FashionIQ'} \
    --split {'test1', 'train' or 'val'} \
    --batch_size 512

Storing features requires a lot of space. If your storage is insufficient, you can call compute_vit_features() in precompute.py to generate the features (as a CPU tensor) and the name_to_idx dictionary. Then, replace load_image_features() with load_image_features_tensor_and_dict() in precompute_train.py and precompute_test.py to avoid loading features from local disk storage.

Training

python src/precompute_train.py \
    --exp_name {your_exp_name} \
    --shuffle_seed 42 \
    --seed 42 \
    --dataset {'CIRR' or FashionIQ} \
    --noise_ratio 0.0 \
    --nc_type "mix" \
    --batch_size 128 \
    --num_epochs 30 \
    --warmup_qformer 3 \
    --warmup_proj 2 \
    --warmup_last 1 \
    --partitioner "GMM" \
    --split_type "loss" \
    --threshold 0.5 \
    --lr "1e-5" \
    --lpm 1.0 \
    --lsa 1.0 \
    --lrd {'0.2' for CIRR and '0.1' for FashionIQ} \
    --save_training \
    --gpu {gpu_default_0}

Test or Evaluation

Note that the test mode is only for CIRR.

python src/precompute_test.py \
    --dataset {CIRR or FashionIQ} \
    --mode {validate or test} \
    --model-path {your_model_path} \
    --gpu {gpu_default_0} \
    --name {save_folder_name}

@InProceedings{TME,
    author    = {Li, Shuxian and He, Changhao and Liu, Xiting and Zhou, Joey Tianyi and Peng, Xi and Hu, Peng},
    title     = {Learning with Noisy Triplet Correspondence for Composed Image Retrieval},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {19628-19637}
}

Acknowledgement

Our implementation is based on CLIP4Cir and LAVIS.

2025-CVPR-TME

Preparation

Project structure

Prerequisites

dataset setting

Pretrained wegiht

Runing

Precompute

Training

Test or Evaluation

Checkpoints

Experiment Results

CIRR performance:

FashionIQ performance:

Citation

Acknowledgement