
March 4, 2026

✨ VCP: Visual Consensus Prompting for Co-Salient Object Detection ✨

📢 News

  • [2026/03/04] 🚀 Released VCP weights (DUT_class+COCO-SEG) and inference code.
  • [2025/06/17] 👉 paper link
  • [2025/02/27] 🎉🎉🎉 Accepted to CVPR 2025.

💡 Abstract

Existing co-salient object detection (CoSOD) methods generally employ a three-stage architecture (i.e., encoding, consensus extraction & dispersion, and prediction) together with a typical full fine-tuning paradigm. Although they yield certain benefits, they exhibit two notable limitations: 1) the architecture relies on encoded features to facilitate consensus extraction, but the meticulously extracted consensus does not provide timely guidance to the encoding stage; 2) the paradigm globally updates all parameters of the model, which is parameter-inefficient and hinders the effective use of the knowledge held in the foundation model for this task. In this paper, we therefore propose a concise, interaction-effective and parameter-efficient architecture for the CoSOD task that addresses both limitations. It introduces, for the first time, a parameter-efficient prompt tuning paradigm and seamlessly embeds consensus into the prompts to formulate task-specific Visual Consensus Prompts (VCP). VCP aims to induce the frozen foundation model to perform better on CoSOD tasks while keeping the number of tunable parameters minimal. Concretely, the key insight behind the Consensus Prompt Generator (CPG) is to force the limited tunable parameters to focus on co-salient representations and generate consensus prompts. The Consensus Prompt Disperser (CPD) then uses these consensus prompts to form task-specific visual consensus prompts, thereby unlocking the potential of pre-trained models for CoSOD tasks. Extensive experiments demonstrate that our concise VCP outperforms 13 cutting-edge full fine-tuning models and achieves a new state of the art, with a 6.8% improvement in the F_m metric on the most challenging CoCA dataset.
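For readers reproducing the reported numbers: in the salient object detection literature F_m commonly denotes the F-measure F_β = (1 + β²)·P·R / (β²·P + R) with β² = 0.3, often reported as the maximum over binarization thresholds. The sketch below assumes that convention (maximum over 256 thresholds on a flattened saliency map). It is not the repository's evaluation code, and whether the abstract's F_m is the maximum or the mean variant is an assumption here.

```python
from typing import Sequence

BETA2 = 0.3  # beta^2 = 0.3, the usual choice in SOD/CoSOD evaluation (assumption)


def f_measure(pred: Sequence[float], gt: Sequence[int], threshold: float) -> float:
    """F-measure of a flattened saliency map (values in [0, 1]) against a binary mask."""
    binary = [p >= threshold for p in pred]
    tp = sum(1 for b, g in zip(binary, gt) if b and g)  # true positives
    predicted_pos = sum(binary)                           # pixels predicted salient
    actual_pos = sum(1 for g in gt if g)                  # pixels salient in the mask
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + BETA2) * precision * recall / (BETA2 * precision + recall)


def max_f_measure(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Maximum F-measure over 256 evenly spaced thresholds in [0, 1]."""
    return max(f_measure(pred, gt, t / 255) for t in range(256))


if __name__ == "__main__":
    saliency = [0.9, 0.8, 0.4, 0.1]
    mask = [1, 1, 1, 0]
    print(f"F at threshold 0.5: {f_measure(saliency, mask, 0.5):.4f}")
    print(f"max F:              {max_f_measure(saliency, mask):.4f}")
```

Setting β² below 1 weights precision more heavily than recall, which is why a map that over-segments the background is penalized more than one that misses a few object pixels.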


πŸ—οΈ Consensus Prompt Generator & Consensus Prompt Disperser

πŸ›οΈ Model Zoo

VCP_ModelSegformerDatasetPrediction results
DUT_class+COCO-SEGb4Testgoogle-drive Hug

⚠️ Please download the following before running inference:

  1. VCP_Model → place it in ./train_segformer_vcp_cosod/
  2. Segformer model → place it in the repository root (./)
  3. CoSOD test dataset → place it in your configured data path
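Before running `test_CoSOD.py`, a small pre-flight check can catch misplaced downloads. The list above gives target folders but not file names, so the patterns below are assumptions to adapt: any file inside `./train_segformer_vcp_cosod/` for the VCP model, any `*.pth` file in the repository root for the Segformer weights, and a non-empty `data_root` directory for the test data. `missing_assets` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List

# Hypothetical patterns: the README names the folders, not the exact files.
VCP_DIR = "train_segformer_vcp_cosod"
SEGFORMER_PATTERN = "*.pth"


def missing_assets(root: Path, data_root: Path) -> List[str]:
    """Return the names of downloads that could not be found."""
    checks = {
        "VCP_Model": any(root.glob(f"{VCP_DIR}/*")),
        "Segformer model": any(root.glob(SEGFORMER_PATTERN)),
        "CoSOD test dataset": data_root.is_dir() and any(data_root.iterdir()),
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Adjust data_root to the data path configured for inference.
    missing = missing_assets(Path("."), Path("./data"))
    print("missing: " + ", ".join(missing) if missing else "all downloads found")
```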

📈 Quantitative and qualitative comparison with SOTA methods

[Figure 6: quantitative and qualitative comparison with SOTA methods]

📈 Extension to the RGB-D CoSOD task

We use the most straightforward early fusion strategy, which introduces no additional parameters, to validate the effectiveness and generalization of the proposed VCP on the RGB-D CoSOD task.

[Figure 8: quantitative and qualitative comparison with SOTA methods on RGB-D CoSOD]
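The exact fusion operation is not spelled out here. One parameter-free early-fusion option, assumed purely for illustration, is to blend the depth map into each RGB channel: the input keeps three channels, so the frozen backbone's pretrained patch embedding can be reused without any new weights. `early_fuse` and its `alpha` blending weight are hypothetical names, not part of the repository.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x 3, values in [0, 1]
Depth = List[List[float]]        # H x W, values in [0, 1]


def early_fuse(rgb: Image, depth: Depth, alpha: float = 0.5) -> Image:
    """Blend depth into every RGB channel: (1 - alpha) * rgb + alpha * depth.

    No learnable weights are involved, and the output keeps 3 channels.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [[(1 - alpha) * c + alpha * d for c in pixel] for pixel, d in zip(row, drow)]
        for row, drow in zip(rgb, depth)
    ]


if __name__ == "__main__":
    rgb = [[[1.0, 0.0, 0.5], [0.2, 0.4, 0.6]]]  # 1 x 2 image
    depth = [[0.0, 1.0]]                          # matching 1 x 2 depth map
    print(early_fuse(rgb, depth))
```

A convex blend keeps values in the same [0, 1] range the backbone was pretrained on; concatenating depth as a fourth channel would instead require a modified (and therefore new, trainable) input projection.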

🏁 Quick Start

```bash
# 1️⃣ Create and activate the conda environment
conda create -n CoSOD python=3.10 -y
conda activate CoSOD

# 2️⃣ Install PyTorch, torchvision and torchaudio (CUDA 11.3 builds)
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 \
  --extra-index-url https://download.pytorch.org/whl/cu113

# 3️⃣ Install the key OpenMMLab packages
pip install mmcv-full==1.7.1
pip install mmsegmentation==0.30.0
pip install mmcls==0.25.0

# 4️⃣ Run inference
# Replace 0 with the id of the GPU you want to use
CUDA_VISIBLE_DEVICES=0 python test_CoSOD.py
```
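After installation, a quick way to confirm that the environment matches the pinned versions is to compare installed distribution metadata against the pins using only the standard library (`importlib.metadata`, Python 3.8+). The helper below is a hypothetical convenience, not part of the repository. A local build tag such as `+cu113` is only required when the pin itself includes one, because pip may install e.g. `0.11.0+cu113` for a plain `torchaudio==0.11.0` pin from the CUDA index.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List

# Pins copied from the Quick Start commands.
PINNED: Dict[str, str] = {
    "torch": "1.11.0+cu113",
    "torchvision": "0.12.0+cu113",
    "torchaudio": "0.11.0",
    "mmcv-full": "1.7.1",
    "mmsegmentation": "0.30.0",
    "mmcls": "0.25.0",
}


def version_matches(found: str, expected: str) -> bool:
    """Exact match, or same public version when the pin has no local tag."""
    if found == expected:
        return True
    return "+" not in expected and found.split("+", 1)[0] == expected


def version_mismatches(pinned: Dict[str, str] = PINNED) -> List[str]:
    """Describe every pinned distribution that is missing or at another version."""
    problems = []
    for dist, expected in pinned.items():
        try:
            found = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (expected {expected})")
            continue
        if not version_matches(found, expected):
            problems.append(f"{dist}: found {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    for line in version_mismatches() or ["all pinned packages match"]:
        print(line)
```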

📌 Citation

```bibtex
@inproceedings{wang2025visual,
  title={Visual consensus prompting for co-salient object detection},
  author={Wang, Jie and Yu, Nana and Zhang, Zihao and Han, Yahong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={9591--9600},
  year={2025}
}
```