IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
April 6, 2025
Overview
IteRPrimE is a framework for zero-shot referring image segmentation that combines an Iterative Grad-CAM Refinement Strategy (IGRS) with a Primary Word Emphasis Module (PWEM) to improve localization accuracy, especially for expressions with complex spatial descriptions. Because it relies on pre-trained vision-language models, IteRPrimE needs no further training or fine-tuning, and it achieves state-of-the-art performance on benchmarks such as RefCOCO, RefCOCO+, RefCOCOg, and PhraseCut.
News
- [2025.04.05] The v1 code is now officially open-sourced!
- [2024.12.10] IteRPrimE was accepted to AAAI 2025!
Installation
- Create and activate a Conda environment:
conda create -n iterprime python=3.8.19
conda activate iterprime
- Install the dependencies:
pip install -r requirements.txt
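As a quick sanity check after installation, the sketch below reports whether the interpreter matches the Python 3.8 line used in the conda command and whether a few packages can be found. The package names in `CANDIDATE_PACKAGES` are illustrative assumptions, not a list taken from this repository; `requirements.txt` remains the authoritative source.

```python
import importlib.util
import sys

# Matches the major.minor version of the conda command above (python=3.8.19).
EXPECTED_PYTHON = (3, 8)

# Assumption: illustrative names only; the real dependency list is requirements.txt.
CANDIDATE_PACKAGES = ["torch", "torchvision", "transformers", "stanza"]


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """Return True when the interpreter's major.minor equals the expected pair."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual[:2])
    return actual == tuple(expected)


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_matches())
    print("Packages not found:", missing_packages(CANDIDATE_PACKAGES))
```

Run it inside the activated `iterprime` environment. An empty "not found" list only shows that the packages are importable, not that their versions match `requirements.txt`.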
If you encounter any issues with the environment setup, the environment configurations of GroundVLP and Mask2Former, the two open-source projects this work references, may help.
Preparing Pretrained Model Weights
Before you can run the code, you'll need to prepare the pretrained model weights and dataset. Please follow the steps below:
1. Create the necessary directories
In the root directory of the project, create two directories: checkpoints and data.
mkdir checkpoints
mkdir data
2. Download the pretrained model weights
Download the following pretrained model weights and place them in the checkpoints folder. You can download them directly from the given links.
- Mask2Former weights: model_final_f07440.pkl [pt6z]
- ALBEF weights: ALBEF.pth [8u5y]
- Stanza weights: pweight.zip [6f84]
- BERT-base-uncased weights: bert-base-uncased.zip [bggq]
After downloading, extract the weight files (if in a compressed format) into the checkpoints folder. Your folder structure should look like this:
<root_directory>/
│
├── checkpoints/
│   ├── model_final_f07440.pkl
│   ├── ALBEF.pth
│   ├── pweight/
│   └── bert-base-uncased/
└── data/
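The layout above can be checked before running anything heavier. The following is a minimal sketch, assuming it is run from the project root. The file and folder names come from the structure shown above, while the helper `missing_checkpoints` is a hypothetical name introduced here and is not part of the repository.

```python
from pathlib import Path

# Names taken from the folder structure shown above.
EXPECTED_FILES = ["model_final_f07440.pkl", "ALBEF.pth"]
EXPECTED_DIRS = ["pweight", "bert-base-uncased"]


def missing_checkpoints(checkpoint_dir):
    """Return the expected entries that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumption: the current working directory is the project root.
    absent = missing_checkpoints("checkpoints")
    print("All checkpoints found." if not absent else f"Missing: {absent}")
```

The check only confirms that the entries exist. It does not validate the contents of the weight files.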
3. Download the COCO dataset
Download the COCO dataset from the provided link and place it in the data folder.
- COCO dataset: coco.zip [egus]
After downloading, extract the coco.zip file to the data folder. The final directory structure should look like this:
<root_directory>/
│
├── data/
│   └── coco/
└── checkpoints/
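If you prefer to script the extraction step, a minimal sketch using Python's standard `zipfile` module is shown below. It assumes that `coco.zip` was downloaded into the project root and that the archive contains a top-level `coco/` folder. Neither detail is confirmed here, so check the printed top-level entries against the structure above.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return its top-level entry names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        # Zip member names always use forward slashes, regardless of platform.
        top_level = sorted({name.split("/")[0] for name in archive.namelist() if name})
    return top_level


if __name__ == "__main__":
    # Assumption: coco.zip sits in the project root next to data/.
    archive = Path("coco.zip")
    if archive.is_file():
        print("Extracted top-level entries:", extract_archive(archive, "data"))
    else:
        print("coco.zip not found; download it first.")
```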
Once the weights and dataset are in place, you're ready to run the code and start testing the model.
Running the Demo
To quickly verify your setup and see IteRPrimE in action, follow these steps:
- Navigate to the demo directory:
cd demo
- Run the main script:
Run the following command to evaluate IteRPrimE on the RefCOCO testA split:
python IteRPrimE.py --data-set refcoco --image-set testA
This script will:
- Load the necessary configurations and pretrained model weights.
- Perform zero-shot referring image segmentation on a sample image or dataset.
- Output the segmentation results for you to inspect.
Depending on your configuration, the output images or logs may be saved to a specific folder (e.g., outputs/). Check the script and your config settings for details.
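To evaluate more than one split, the command shown earlier can be generated programmatically. The sketch below only builds and prints command lines and does not execute them. The `--data-set` and `--image-set` flags and the `refcoco`/`testA` values come from the command above. The other split names in `SPLITS` follow the common RefCOCO convention and are an assumption about what the script accepts.

```python
import shlex


def build_eval_command(data_set, image_set, script="IteRPrimE.py"):
    """Build the argument list for one evaluation run of the demo script."""
    return ["python", script, "--data-set", data_set, "--image-set", image_set]


# Assumption: only refcoco/testA is confirmed by this README; val and testB
# are the usual companion splits and may not be accepted by the script.
SPLITS = {"refcoco": ["val", "testA", "testB"]}


if __name__ == "__main__":
    for data_set, image_sets in SPLITS.items():
        for image_set in image_sets:
            # shlex.join (Python 3.8+) renders the list as a shell-safe string.
            print(shlex.join(build_eval_command(data_set, image_set)))
```

To run a command rather than print it, the list returned by `build_eval_command` can be passed to `subprocess.run(cmd, check=True)` from inside the demo directory.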
If you encounter any issues or have questions, feel free to open an issue or start a discussion.
Citation
If you use IteRPrimE in your research, please cite our paper:
@article{wang2025iterprime,
title={IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis},
author={Wang, Yuji and Ni, Jingchen and Liu, Yong and Yuan, Chun and Tang, Yansong},
journal={arXiv preprint arXiv:2503.00936},
year={2025}
}
Acknowledgements
We appreciate the support from our collaborators and funding agencies. Stay tuned for updates; the full code release is coming soon!
We would also like to express our gratitude to the authors of GroundVLP and Mask2Former for their inspiring work and open-source contributions, which served as valuable references for this project.