Multimodal Label Relevance Ranking via Reinforcement Learning (ECCV2024)

August 6, 2025

This is the official PyTorch implementation of LR2PPO. The ECCV 2024 paper is available on arXiv.
Introduction video: YouTube

Getting Started

Data Preparation

For LRMovieNet Benchmark

For MSLR-Web10K → MQ2008 Transfer Task

Initialization Weights

Download required weights for both benchmarks:

  • roberta_base_en_model and vit_base_patch16_224_model
  • Source: Google Drive or the models' official repositories
  • Save in: ./pretrained_models/
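Before launching training, it can help to confirm that both sets of weights are where the scripts will look for them. The following is a minimal sketch, not part of the repository: `check_weights` is a hypothetical helper, and it assumes each model is stored under `./pretrained_models/<model_name>` (the README only specifies the parent directory, so adjust the layout if yours differs).

```shell
# Hypothetical helper: report any expected weight entry that is missing.
# Assumption: weights live at <weights_dir>/<model_name> (file or directory).
check_weights() {
    weights_dir="${1:-./pretrained_models}"
    missing=0
    for name in roberta_base_en_model vit_base_patch16_224_model; do
        if [ ! -e "$weights_dir/$name" ]; then
            echo "missing: $weights_dir/$name"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Calling `check_weights` with no argument checks `./pretrained_models`; a non-zero exit status means at least one entry was not found.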

Prerequisites

pip3 install -r requirements.txt

Hardware Requirement: 4 GPUs
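To check the GPU requirement up front, a sketch along the following lines may be useful. It is an illustration rather than something shipped with the repository: `count_gpus` and `has_enough_gpus` are hypothetical names, and the count relies on `nvidia-smi -L` printing one line per GPU, which assumes NVIDIA hardware with the driver tools installed.

```shell
# Hypothetical helper: count visible NVIDIA GPUs (prints 0 if nvidia-smi is absent).
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # `nvidia-smi -L` lists one "GPU <index>: ..." line per device.
        nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
    else
        echo 0
    fi
}

# Hypothetical helper: succeed when the available count meets the requirement.
# $1 = available GPUs, $2 = required GPUs (defaults to 4, as stated above).
has_enough_gpus() {
    [ "$1" -ge "${2:-4}" ]
}
```

For example, `has_enough_gpus "$(count_gpus)" || echo "fewer than 4 GPUs visible"` prints a warning before any training script is started.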

Usage Instructions

For LRMovieNet Benchmark

# Stage 1: Base Model
sh pointwise.sh <your_stage1>

# Stage 2: Reward Model
sh reward_pair_dataloader.sh <your_stage2>

# Stage 3: LR2PPO
sh ppo.sh <your_stage3>

# Evaluation
sh ppo_eval.sh <your_eval>
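Since each stage builds on the previous one, the commands above can be chained so that a failure stops the pipeline early. The sketch below is one possible wrapper, not part of the repository: `run_stages` is a hypothetical function that takes script/argument pairs and passes each argument through unchanged, because the README does not specify what the placeholder arguments denote.

```shell
# Hypothetical wrapper: run "script argument" pairs in order, stopping at the first failure.
run_stages() {
    while [ "$#" -ge 2 ]; do
        echo "==> sh $1 $2"
        sh "$1" "$2" || { echo "stage failed: $1" >&2; return 1; }
        shift 2
    done
}
```

A possible invocation, with placeholder arguments, would be `run_stages pointwise.sh my_stage1 reward_pair_dataloader.sh my_stage2 ppo.sh my_stage3 ppo_eval.sh my_eval`.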

For MSLR-Web10K → MQ2008 Transfer Task

# Stage 1: Base Model
sh pointwise_trad.sh <your_stage1>

# Stage 2: Reward Model
sh reward_trad.sh <your_stage2>

# Stage 3: LR2PPO
sh ppo_trad.sh <your_stage3>

# Evaluation
sh ppo_eval_trad.sh <your_eval>
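The two benchmarks follow the same four steps and differ only in which scripts are called. As a quick reference, the mapping can be written as a small lookup. This is a sketch only: `stage_scripts` and the keys `lrmovienet` and `transfer` are hypothetical labels chosen here, while the script names are the ones listed in the usage instructions.

```shell
# Hypothetical lookup: print the stage scripts (in order) for a given benchmark key.
stage_scripts() {
    case "$1" in
        lrmovienet)
            echo "pointwise.sh reward_pair_dataloader.sh ppo.sh ppo_eval.sh" ;;
        transfer)
            echo "pointwise_trad.sh reward_trad.sh ppo_trad.sh ppo_eval_trad.sh" ;;
        *)
            echo "unknown benchmark: $1" >&2
            return 1 ;;
    esac
}
```

For instance, `stage_scripts transfer` prints the four scripts for the MSLR-Web10K → MQ2008 task in the order they are meant to be run.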

Model Checkpoints

LRMovieNet Benchmark

MSLR-Web10K → MQ2008 Transfer

License

See LICENSE for details.

Acknowledgments

Code components borrowed from:

We are grateful for these excellent works and repositories.

Citation

If you find our work helpful in your research, please consider citing it.

@inproceedings{guo2024multimodal,
  title={Multimodal Label Relevance Ranking via Reinforcement Learning},
  author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
  booktitle={European Conference on Computer Vision},
  pages={391--408},
  year={2024},
  organization={Springer}
}