RAM: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
March 25, 2025
Official implementation of the CVPR 2025 paper:
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei
Introduction
RAM is an efficient matching framework for OVMLR (Open-Vocabulary Multi-Label Recognition). It has two components: (1) LLA (Local Adapter), which recovers regional semantics, and (2) KCOT (Knowledge-Constrained Optimal Transport), which finds precise region-to-label matches.
Installation
We recommend setting up the environment with conda and pip:
conda create -n ram python=3.10
conda activate ram
# Install the dependencies
pip install -r requirements.txt
Running the code
model/model.py: Implementation of the RAM model
model/ot_solver.py: Implementation of the Sinkhorn algorithm
clip/adapters.py: Implementation of LLA (Local Adapter)
loss/mmc_loss.py: Implementation of MMC loss (Multi-Matching loss)
Run the following command to start training:
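For readers unfamiliar with the Sinkhorn algorithm named above, here is a minimal pure-Python sketch of entropic-regularized optimal transport. It is not the code in model/ot_solver.py and it omits the knowledge constraints of KCOT. The function name `sinkhorn`, the parameters `eps` and `n_iters`, and the toy cost values are all assumptions made for illustration.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Illustrative Sinkhorn iterations (hypothetical helper, not the repo's solver).

    cost: n x m nested list, e.g. 1 - cosine similarity between regions and labels
    a:    length-n marginal (mass per image region), sums to 1
    b:    length-m marginal (mass per label), sums to 1
    Returns the n x m transport plan as a nested list.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example with made-up costs: 3 regions, 2 labels, uniform marginals
cost = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
a = [1 / 3] * 3
b = [0.5, 0.5]
plan = sinkhorn(cost, a, b)
```

Each row of `plan` shows how one region's mass is split across labels. A smaller `eps` gives a sharper, closer to one-to-one matching, but usually needs more iterations to converge.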
python train.py --config_file configs/coco.yml
To log the run with wandb:
python train.py --config_file configs/coco.yml WANDB True
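The trailing `WANDB True` in the command above looks like the common pattern of KEY VALUE overrides appended after the config file. The sketch below shows how such overrides are often parsed with argparse. Whether train.py works this way is an assumption, and `parse_args`, `apply_overrides`, and the dict-based config are hypothetical names used only for illustration.

```python
import argparse
import ast

def parse_args(argv):
    """Hypothetical parser: one flag plus trailing KEY VALUE overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", default="")
    # Everything after the known flags is collected verbatim
    parser.add_argument("opts", nargs=argparse.REMAINDER, default=[])
    return parser.parse_args(argv)

def apply_overrides(cfg, opts):
    """Return a copy of cfg with KEY VALUE pairs from opts applied."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    new_cfg = dict(cfg)
    for key, raw in zip(opts[0::2], opts[1::2]):
        try:
            # Interpret literals such as True, 3, or 0.5
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Fall back to the raw string
            value = raw
        new_cfg[key] = value
    return new_cfg

args = parse_args(["--config_file", "configs/coco.yml", "WANDB", "True"])
cfg = apply_overrides({"WANDB": False}, args.opts)
```

Here `cfg` stands in for whatever configuration object the YAML file is loaded into, and the override replaces the default value of `WANDB`.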
Discussion
The core contribution is the OT-based matching pipeline, which we found beneficial to the OVMLR task while remaining highly efficient. The matching framework can be easily extended to dense prediction tasks (e.g., semantic segmentation). We welcome efforts to transfer our approach to segmentation scenarios.
If you find our work useful, please cite our paper:
@inproceedings{tan2025recoverandmatch,
title={Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport},
author={Hao Tan and Zichang Tan and Jun Li and Ajian Liu and Jun Wan and Zhen Lei},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
Acknowledgements
This repo benefits from MaPLe, CLIP-Surgery and POT. We thank the authors for their wonderful work.