May 25, 2026

(ACM MM 2025) OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval

1 School of Software, Shandong University  
2 Department of Data Science, City University of Hong Kong  
3 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)  
✉ Corresponding author  

Accepted by ACM MM 2025: A novel network designed to address visual inhomogeneity and text-priority biases in Composed Image Retrieval (CIR) through dominant portion segmentation and textually guided focus revision.

📌 Introduction

Welcome to the official repository for OFFSET (Segmentation-based Focus Shift Revision for Composed Image Retrieval).

Existing CIR approaches often overlook the inhomogeneity between dominant and noisy portions in visual data, which degrades the query features. They also ignore the priority of textual data in the image modification process, which biases the visual focus. OFFSET tackles these limitations with a focus mapping-based feature extractor and a textually guided focus revision module, achieving state-of-the-art (SOTA) performance across multiple datasets.

📢 News

  • [2026-03-20] 🚀 We migrated all training and evaluation code of OFFSET from Google Drive to a GitHub repository.
  • [2025-07-05] 🔥 OFFSET has been accepted by ACM MM 2025.
  • [2024-12-26] 📍 We released the main code and data of OFFSET!

✨ Key Features

Our framework introduces key innovative modules to achieve precise multimodal semantic alignment:

  • 🔍 Dominant Portion Segmentation: Utilizes vision-language models to generate image captions as a role-supervised signal, separating dominant from noisy regions so that noisy information can be masked out.
  • 🔗 Dual Focus Mapping: Features Visual Focus Mapping (VFM) and Textual Focus Mapping (TFM) branches. Guided by the dominant segmentation, it accomplishes adaptive focus mapping on both visual and textual data.
  • 🧩 Textually Guided Focus Revision: Utilizes the modification requirements embedded in the textual feature to perform adaptive focus revision on the reference image, enhancing the perception of the modification focus.
  • 🏆 SOTA Performance: Demonstrates superior generalization and achieves remarkable improvements across both fashion-domain (FashionIQ, Shoes) and open-domain (CIRR) datasets.
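
As a rough illustration of the masking idea behind Dominant Portion Segmentation, the toy sketch below pools patch features while ignoring patches flagged as noisy. It is not the OFFSET implementation: the function name `masked_mean_pool`, the binary mask, the fallback rule, and the plain-list features are all assumptions made for illustration. The actual model is defined in model_OFFSET.py.

```python
from typing import List, Sequence


def masked_mean_pool(patches: Sequence[Sequence[float]],
                     dominant_mask: Sequence[int]) -> List[float]:
    """Average only the patch features whose mask entry is 1 (dominant)."""
    if not patches:
        raise ValueError("at least one patch is required")
    if len(patches) != len(dominant_mask):
        raise ValueError("one mask entry is required per patch")
    kept = [p for p, m in zip(patches, dominant_mask) if m]
    if not kept:
        # Hypothetical fallback: if nothing is marked dominant, pool every patch.
        kept = list(patches)
    dim = len(kept[0])
    return [sum(p[i] for p in kept) / len(kept) for i in range(dim)]


# Three 2-d patch features; the third is flagged as noise and is ignored.
patches = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean_pool(patches, mask))  # [2.0, 1.0]
```

The outlier patch does not affect the pooled vector, which is the intuition behind masking noisy regions before building the query feature.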

🏗️ Architecture

Figure 1. The overall architecture of OFFSET. It consists of three key modules: Dominant Portion Segmentation, Dual Focus Mapping, and Textually Guided Focus Revision.

📊 Experiment Results

OFFSET consistently outperforms existing baselines on widely used datasets, surpassing strong competitors such as DQU-CIR and ENCODER.

1. FashionIQ & Shoes Datasets

(Evaluated using Recall@K)

FashionIQ and Shoes Results
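
For readers unfamiliar with the metric, Recall@K is the percentage of queries whose target image appears among the top K retrieved candidates. The sketch below is a minimal, dependency-free illustration. The function name `recall_at_k` and the dictionary-based inputs are assumptions, not the interface used in utils.py.

```python
from typing import Dict, List, Sequence


def recall_at_k(rankings: Dict[str, List[str]],
                targets: Dict[str, str],
                ks: Sequence[int] = (1, 10, 50)) -> Dict[int, float]:
    """Percentage of queries whose target is within the top-k ranked candidates."""
    if not rankings:
        raise ValueError("no queries to evaluate")
    results = {}
    for k in ks:
        hits = sum(1 for query, ranked in rankings.items()
                   if targets[query] in ranked[:k])
        results[k] = 100.0 * hits / len(rankings)
    return results


# Two toy queries, each with a ranked list of candidate image names.
rankings = {"q1": ["a", "b", "c"], "q2": ["c", "a", "b"]}
targets = {"q1": "a", "q2": "b"}
print(recall_at_k(rankings, targets, ks=(1, 2, 3)))  # {1: 50.0, 2: 50.0, 3: 100.0}
```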

2. CIRR Dataset

(Evaluated using R@K and R_subset@K)

CIRR Results
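
R_subset@K differs from R@K in that each query is ranked only against a small group of visually similar images rather than the full gallery. The following sketch shows one way to compute it from a full ranking by filtering to the subset first. The function name `recall_subset_at_k` and the input layout are assumptions for illustration, not the repository's code.

```python
from typing import Dict, List, Sequence


def recall_subset_at_k(rankings: Dict[str, List[str]],
                       targets: Dict[str, str],
                       subsets: Dict[str, List[str]],
                       ks: Sequence[int] = (1, 2, 3)) -> Dict[int, float]:
    """Recall@k computed after restricting each ranking to the query's subset."""
    if not rankings:
        raise ValueError("no queries to evaluate")
    hits = {k: 0 for k in ks}
    for query, ranked in rankings.items():
        members = set(subsets[query])
        # Keep the original order but drop candidates outside the subset.
        in_subset = [c for c in ranked if c in members]
        for k in ks:
            if targets[query] in in_subset[:k]:
                hits[k] += 1
    return {k: 100.0 * n / len(rankings) for k, n in hits.items()}


rankings = {"q1": ["x", "s2", "y", "s1", "s3"]}
targets = {"q1": "s1"}
subsets = {"q1": ["s1", "s2", "s3"]}
print(recall_subset_at_k(rankings, targets, subsets))  # {1: 0.0, 2: 100.0, 3: 100.0}
```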

📂 Repository Structure

Our codebase is highly modular. Here is a brief overview of the core files:

OFFSET/
├── cirr_test_submission.py  # 📄 CIRR submission file generator
├── datasets.py            # 📚 Dataset loader and preprocessing
├── model_OFFSET.py        # 🧠 OFFSET model architecture and forward pass
├── test.py                # 🧪 Evaluation/Test entry point
├── train.py               # 🚀 Training entry point
├── utils.py               # 🛠️ Utility functions (metrics, helper methods)
└── README.md              # 📝 Documentation and result visualization

This section helps users quickly locate the core components and get started with development.

🚀 Installation

1. Clone the repository

git clone https://github.com/ZivChen-Ty/OFFSET.git
cd OFFSET

2. Set up the environment

We recommend using Conda to manage your environment:

conda create -n offset_env python=3.8.10
conda activate offset_env

# Install PyTorch (Ensure it matches your CUDA version. Tested on PyTorch 2.0.0, NVIDIA A40 48G)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install required packages
pip install -r requirements.txt
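
After installation, it can help to confirm which versions actually ended up in the environment. The snippet below is a small, optional check that uses only the Python standard library. The helper name `installed_versions` is hypothetical and not part of this repository.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(packages: Sequence[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", sys.version.split()[0])
for name, version in installed_versions(["torch", "torchvision", "torchaudio"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If `torch` is reported as installed, `torch.cuda.is_available()` can then be used to check that the CUDA build matches your driver.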

📂 Data Preparation

🛟【OURS】Pre-computed Dominant Portion Segmentation Data (Official Release)

The dominant portion segmentation data of OFFSET is available on Google Drive.

🔥 This is our officially released data for reproducing the results.

OFFSET is evaluated on FashionIQ, Shoes, and CIRR. Please download the datasets from their official sources and arrange them as follows.

Shoes

Download the Shoes dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── Shoes
│   ├── captions_shoes.json
│   ├── eval_im_names.txt
│   ├── relative_captions_shoes.json
│   ├── train_im_names.txt
│   ├── [womens_athletic_shoes | womens_boots | ...]
│   │   ├── [0 | 1]
│   │   ├── [img_womens_athletic_shoes_375.jpg | descr_womens_athletic_shoes_734.txt | ...]

FashionIQ

Download the FashionIQ dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

CIRR

Download the CIRR dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json
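
Before training, a quick sanity check of the folder layouts above can save a failed run. The sketch below only tests that the top-level entries listed in the three trees exist. The `EXPECTED` table and the helper `missing_entries` are hypothetical names, and the check does not validate file contents.

```python
from pathlib import Path
from typing import Dict, List

# Top-level entries taken from the directory trees in this README.
EXPECTED: Dict[str, List[str]] = {
    "shoes": ["captions_shoes.json", "eval_im_names.txt",
              "relative_captions_shoes.json", "train_im_names.txt"],
    "fashioniq": ["captions", "image_splits", "dress", "shirt", "toptee"],
    "cirr": ["train", "dev", "test1", "cirr/captions", "cirr/image_splits"],
}


def missing_entries(dataset: str, root: str) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    if dataset not in EXPECTED:
        raise ValueError(f"unknown dataset: {dataset!r}")
    base = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]


# Placeholder roots; replace them with your own dataset locations.
for name, root in [("shoes", "path/to/Shoes"),
                   ("fashioniq", "path/to/FashionIQ"),
                   ("cirr", "path/to/CIRR")]:
    missing = missing_entries(name, root)
    print(name, "OK" if not missing else f"missing: {missing}")
```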

🏃‍♂️ Quick Start

1. Training the Model

Train OFFSET on Shoes, FashionIQ, or CIRR using the train.py script.

# --dataset accepts one of: shoes, fashioniq, cirr
python3 train.py \
    --model_dir ./checkpoints/ \
    --dataset cirr \
    --cirr_path "path/to/CIRR" \
    --fashioniq_path "path/to/FashionIQ" \
    --shoes_path "path/to/Shoes"
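
If you launch several runs from a script, the command above can be assembled programmatically. The sketch below mirrors the flags shown in this README and nothing else. The helper `build_train_command` is a hypothetical convenience wrapper, not part of the repository, and it assumes train.py is invoked from the repository root.

```python
import shlex
from typing import List

DATASETS = ("shoes", "fashioniq", "cirr")


def build_train_command(dataset: str,
                        cirr_path: str,
                        fashioniq_path: str,
                        shoes_path: str,
                        model_dir: str = "./checkpoints/") -> List[str]:
    """Build the argument list for train.py using the flags documented above."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    return ["python3", "train.py",
            "--model_dir", model_dir,
            "--dataset", dataset,
            "--cirr_path", cirr_path,
            "--fashioniq_path", fashioniq_path,
            "--shoes_path", shoes_path]


cmd = build_train_command("cirr", "path/to/CIRR", "path/to/FashionIQ", "path/to/Shoes")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` starts the run and raises an error if training exits with a non-zero status.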

2. Test for CIRR

To generate the prediction file for upload to the CIRR Evaluation Server, run the following command:

python cirr_test_submission.py model_path

(Here, model_path is the path to the OFFSET checkpoint trained on CIRR.)
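
For orientation, the sketch below shows how a prediction file of this kind might be assembled from ranked candidates. It is an assumption-laden illustration, not the logic of cirr_test_submission.py: the `"version"` and `"metric"` keys, the top-50 truncation, and the helper name `build_submission` reflect our understanding of the CIRR server format and should be verified against the official CIRR instructions.

```python
import json
from typing import Dict, List


def build_submission(predictions: Dict[str, List[str]],
                     metric: str = "recall",
                     top_k: int = 50) -> Dict[str, object]:
    """Assemble a submission dict: header fields plus top-k names per query id."""
    # Assumed header fields; check them against the CIRR server documentation.
    submission: Dict[str, object] = {"version": "rc2", "metric": metric}
    for pair_id, ranked in predictions.items():
        submission[str(pair_id)] = list(ranked[:top_k])
    return submission


# Toy predictions: query pair id -> candidate image names, best first.
predictions = {"0": ["test1-0-0-img1", "test1-0-0-img0"]}
print(json.dumps(build_submission(predictions), indent=2))
```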

🤝 Acknowledgements

This project builds upon recent advancements in Composed Image Retrieval and Vision-Language pre-training. We express our sincere gratitude to the open-source community for their contributions. Supported in part by the National Natural Science Foundation of China.

✉️ Contact

If you have any questions, feel free to open an issue or reach out to me at zivczw@gmail.com ☺️

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Paper | Project | Code
  • ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
  • Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
  • HABIT (AAAI'26): Project | Code | Paper
  • ReTrack (AAAI'26): Project | Code | Paper
  • INTENT (AAAI'26): Project | Code | Paper
  • HUD (ACM MM'25): Project | Code | Paper
  • ENCODER (AAAI'25): Project | Code | Paper

📝⭐️ Citation

If you find our work or this code useful in your research, please consider leaving a Star⭐️ or Citing📝 our paper 🥰. Your support is our greatest motivation!

@inproceedings{OFFSET, 
  title = {OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval}, 
  author = {Chen, Zhiwei and Hu, Yupeng and Li, Zixu and Fu, Zhiheng and Song, Xuemeng and Nie, Liqiang}, 
  booktitle = {Proceedings of the ACM International Conference on Multimedia}, 
  pages = {6113--6122}, 
  year = {2025}
}

🫡 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

  • Open an Issue for discussions or bug reports.
  • Submit a Pull Request to improve the codebase.

📄 License

This project is released under the terms of the LICENSE file included in this repository.
