NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval (CVPR'2025 🔥)

April 14, 2025

The official implementation of the CVPR 2025 paper NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval.

TL;DR: NeighborRetr tackles the hubness problem in cross-modal retrieval by distinguishing between good hubs (relevant) and bad hubs (irrelevant) during training, offering a direct solution rather than relying on post-processing methods that require prior data distributions.

📌 Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@article{lin2025neighborretr,
  title={NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval},
  author={Lin, Zengrong and Wang, Zheng and Qian, Tianwen and Mu, Pan and Chan, Sixian and Bai, Cong},
  journal={arXiv preprint arXiv:2503.10526},
  year={2025}
}

The hubness problem in cross-modal retrieval refers to the phenomenon where certain items (hubs) frequently emerge as the nearest neighbors of many other samples, while the majority of samples rarely appear as neighbors. This leads to biased representations and degraded retrieval accuracy. Unlike previous approaches that apply post-hoc normalization at inference time, NeighborRetr addresses hubness during training. It:

Distinguishes between good hubs (semantically relevant) and bad hubs (semantically irrelevant)
Applies adaptive neighborhood adjustment during training
Employs uniform regularization to balance hub formation
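To make the hubness definition above concrete, here is a minimal sketch that measures it with the k-occurrence count (how often each gallery item lands in a query's top-k) and the skewness of that count, a common hubness indicator. This is a generic diagnostic, not the NeighborRetr training objective. The function names `k_occurrence` and `skewness` are hypothetical, and random vectors stand in for real text and video embeddings.

```python
import numpy as np

def k_occurrence(queries, gallery, k=10):
    """Count how often each gallery item appears in the top-k neighbors of the queries."""
    # L2-normalize so the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sim = q @ g.T  # shape: (num_queries, num_gallery)
    # Indices of the k most similar gallery items per query (order within the top-k is irrelevant).
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=gallery.shape[0])

def skewness(counts):
    """Skewness of the k-occurrence distribution; larger positive values suggest stronger hubness."""
    c = counts.astype(np.float64)
    std = c.std()
    if std == 0:
        return 0.0
    return float(((c - c.mean()) ** 3).mean() / std ** 3)

# Placeholder embeddings (assumption): replace with real text and video features.
rng = np.random.default_rng(0)
text_emb = rng.standard_normal((1000, 64))
video_emb = rng.standard_normal((1000, 64))

counts = k_occurrence(text_emb, video_emb, k=10)
print("most frequent neighbor appears", counts.max(), "times")
print("items never retrieved:", int((counts == 0).sum()))
print("k-occurrence skewness:", skewness(counts))
```

A gallery item with a very high count is a hub. Whether it is a good or a bad hub depends on whether it is actually relevant to the queries that retrieve it, which this count alone does not reveal.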

😍 Visualization

Our method significantly improves the quality of nearest neighbors, reducing irrelevant hubs and promoting more meaningful semantic relationships.

🔄 Updates

[2025/04/13]: Code released! 🎉
[2025/03/14]: Initial version submitted to arXiv.
[2025/02/27]: Our paper was accepted to CVPR 2025!

🚀 Quick Start

Setup

Environment Setup

# Create and activate conda environment
conda create -n NeighborRetr python=3.8 -y
conda activate NeighborRetr

# Install dependencies
pip install -r requirements.txt

Download CLIP Model

cd NeighborRetr/models
wget https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
# Optional: for ViT-B-16
# wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
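After downloading, it can be worth checking that the checkpoint is not truncated or corrupted. The sketch below assumes that the long hexadecimal path segment in the download URL is the file's SHA-256 digest, which is how the OpenAI CLIP package appears to validate its own downloads; treat that as an assumption. The helper names are hypothetical, and the local path assumes the file was saved in NeighborRetr/models as in the commands above.

```python
import hashlib
from pathlib import Path

# URL taken from the download command above.
VIT_B_32_URL = (
    "https://openaipublic.azureedge.net/clip/models/"
    "40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
)

def expected_sha256_from_url(url):
    """Return the second-to-last path segment, assumed to be the SHA-256 digest."""
    return url.rstrip("/").split("/")[-2]

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checkpoint_is_intact(path, url):
    """True if the file's SHA-256 digest matches the digest embedded in the URL."""
    return sha256_of_file(path) == expected_sha256_from_url(url)

# Assumed location of the downloaded checkpoint, relative to the repository root.
ckpt = Path("NeighborRetr/models/ViT-B-32.pt")
if ckpt.exists():
    print("checksum OK" if checkpoint_is_intact(ckpt, VIT_B_32_URL) else "checksum mismatch")
else:
    print(f"{ckpt} not found; download it first")
```

If the check reports a mismatch, re-downloading the file is usually the simplest fix.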

Download Datasets

Datasets	Baidu Yun
MSR-VTT	Download
MSVD	Download
ActivityNet	Download
DiDeMo	Download

Training

Train on MSR-VTT

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch \
--master_port 4501 \
--nproc_per_node=4 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--epochs 5 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${ANNO_PATH} \
--video_path ${VIDEO_PATH} \
--datatype msrvtt \
--max_words 24 \
--max_frames 12 \
--output_dir ${OUTPUT_PATH} \
--mb_batch 15 \
--memory_size 512

Train on ActivityNet Captions

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--master_port 4501 \
--nproc_per_node=8 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--epochs 10 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${ANNO_PATH} \
--video_path ${VIDEO_PATH} \
--datatype activity \
--max_words 64 \
--max_frames 64 \
--output_dir ${OUTPUT_PATH} \
--mb_batch 15 \
--memory_size 1024

📚 License

This repository is released under the Apache License 2.0. This permissive license allows users to freely use, modify, distribute, and sublicense the code while maintaining copyright and license notices.

✨ Acknowledgement

Our work is primarily built upon HBI, CLIP, and CLIP4Clip. We thank the authors for generously open-sourcing their code and for their significant contributions to the community.