RAM: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

March 25, 2025

Official implementation of the CVPR 2025 paper:

Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei

📨 Introduction

RAM is an efficient matching framework for Open-Vocabulary Multi-Label Recognition (OVMLR). To address the limitations of existing methods, RAM combines two components: (1) LLA (Local Adapter), which recovers regional semantics, and (2) KCOT (Knowledge-Constrained Optimal Transport), which finds a precise region-to-label matching.

RAM Framework

🔧 Installation

We recommend setting up the environment with conda and pip:

conda create -n ram python=3.10
conda activate ram

# Install the dependencies
pip install -r requirements.txt

🎯 Running the code

model/model.py: Implementation of the RAM model
model/ot_solver.py: Implementation of the Sinkhorn algorithm
clip/adapters.py: Implementation of LLA (Local Adapter)
loss/mmc_loss.py: Implementation of the MMC loss (Multi-Matching loss)
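For readers unfamiliar with the Sinkhorn algorithm mentioned above, the following is a minimal textbook sketch of entropic-regularized optimal transport in NumPy. It is not the code in model/ot_solver.py and does not include the knowledge constraints of KCOT; the function name `sinkhorn`, the parameters `eps` and `n_iters`, the uniform marginals, and the random cost matrix are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Textbook Sinkhorn iterations for entropic-regularized OT.

    cost: (n, m) cost matrix, e.g. between n regions and m labels.
    a:    (n,) source marginal, b: (m,) target marginal (each sums to 1).
    Returns an (n, m) transport plan whose row sums approximate `a`
    and whose column sums approximate `b`.
    """
    K = np.exp(-cost / eps)      # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows toward marginal a
        v = b / (K.T @ u)        # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


# Hypothetical toy problem: 6 regions, 3 labels, uniform marginals.
rng = np.random.default_rng(0)
cost = rng.random((6, 3))
a = np.full(6, 1 / 6)
b = np.full(3, 1 / 3)
plan = sinkhorn(cost, a, b)
```

Each entry `plan[i, j]` can be read as the amount of mass region `i` sends to label `j`. A smaller `eps` gives a sharper plan but converges more slowly and can underflow, so practical solvers often work in the log domain.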

Run the following command to start training:

python train.py --config_file configs/coco.yml

To log the run with wandb, append WANDB True to the command:

python train.py --config_file configs/coco.yml WANDB True
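The trailing `WANDB True` looks like the common "KEY VALUE" override convention, where arguments after the named flags replace entries loaded from the config file. The sketch below shows how such a command line can be parsed with the standard library alone. It is an assumption about the convention, not the actual contents of train.py; `parse_args`, `apply_overrides`, and the plain-dict config are hypothetical.

```python
import argparse
import ast


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", default="")
    # Everything after the known flags is collected as KEY VALUE overrides.
    parser.add_argument("opts", nargs=argparse.REMAINDER, default=[])
    return parser.parse_args(argv)


def apply_overrides(cfg, opts):
    """Apply KEY VALUE pairs to a config dict (hypothetical helper)."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        try:
            value = ast.literal_eval(raw)   # "True" -> True, "8" -> 8
        except (ValueError, SyntaxError):
            value = raw                     # keep plain strings as-is
        cfg[key] = value
    return cfg


args = parse_args(["--config_file", "configs/coco.yml", "WANDB", "True"])
cfg = apply_overrides({"WANDB": False}, args.opts)
```

Under this convention, omitting the trailing pair leaves the value from the config file in place.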

💬 Discussion

The core contribution is the matching pipeline based on optimal transport (OT), which we found beneficial for the OVMLR task while remaining highly efficient. The matching framework can be readily extended to dense prediction tasks (e.g., semantic segmentation), and we welcome efforts to apply our approach to segmentation scenarios.
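As one possible reading of the dense-prediction extension, a region-to-label transport plan can be turned into a coarse label map by assigning each region the label that receives most of its mass. This is only an illustrative sketch, not something provided by the repository; the function `plan_to_label_map`, the 2x2 grid, and the example plan are hypothetical.

```python
import numpy as np


def plan_to_label_map(plan, grid_hw):
    """Assign each region its highest-mass label and reshape to a grid.

    plan:    (h * w, num_labels) transport plan, one row per region.
    grid_hw: (h, w) layout of the regions in the image.
    """
    h, w = grid_hw
    if plan.shape[0] != h * w:
        raise ValueError("plan must have one row per grid cell")
    return plan.argmax(axis=1).reshape(h, w)


# Hypothetical plan for a 2x2 grid of regions and 2 labels.
plan = np.array([
    [0.20, 0.05],
    [0.05, 0.20],
    [0.15, 0.10],
    [0.02, 0.23],
])
label_map = plan_to_label_map(plan, (2, 2))
```

A real segmentation pipeline would also need to upsample this coarse map to pixel resolution, which is outside the scope of this sketch.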

If you find our work useful, please cite our paper:

@inproceedings{tan2025recoverandmatch,
  title={Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport},
  author={Tan, Hao and Tan, Zichang and Li, Jun and Liu, Ajian and Wan, Jun and Lei, Zhen},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

Acknowledgements

This repo builds on MaPLe, CLIP-Surgery, and POT. We thank the authors for their excellent work.