Multimodal Label Relevance Ranking via Reinforcement Learning (ECCV2024)

August 6, 2025

This is the official PyTorch implementation of LR²PPO. The ECCV 2024 paper is available on arXiv.
Introduction video: YouTube

Getting Started

Data Preparation

For LRMovieNet Benchmark

Download the dataset: Hugging Face Hub
Optional: the original MovieNet dataset (Official Website)

For MSLR-Web10K → MQ2008 Transfer Task

Download the pre-processed datasets (datasets_trad): Google Drive
Optional preparation:
- Follow the dataset generation guide in datasets_trad/README.md
- Download the source datasets:
  - MSLR-Web10K: Microsoft Research
  - MQ2008: LETOR 4.0
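The raw MSLR-Web10K and MQ2008 files use the LETOR text format: one document per line, written as `<relevance> qid:<query id> <feature index>:<value> ...`, optionally followed by a `#` comment (MQ2008 uses it for document ids). Below is a minimal sketch of a parser that groups documents by query. It assumes the raw LETOR layout; the pre-processed datasets_trad files may be stored differently, and parse_letor_line and group_by_query are hypothetical helpers, not functions from this repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers for raw LETOR files; not part of the LR²PPO codebase.

def parse_letor_line(line: str) -> Tuple[int, str, Dict[int, float]]:
    """Parse one LETOR line into (relevance label, query id, sparse features)."""
    # Drop the optional trailing comment (e.g. "#docid = ..." in MQ2008).
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(tokens[0])
    if not tokens[1].startswith("qid:"):
        raise ValueError(f"expected a qid field, got {tokens[1]!r}")
    qid = tokens[1][len("qid:"):]
    features: Dict[int, float] = {}
    for tok in tokens[2:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return label, qid, features


def group_by_query(
    lines: Iterable[str], num_features: int
) -> Dict[str, Tuple[List[List[float]], List[int]]]:
    """Group documents by query id into dense feature rows and label lists."""
    groups: Dict[str, Tuple[List[List[float]], List[int]]] = defaultdict(lambda: ([], []))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, qid, feats = parse_letor_line(line)
        # LETOR feature indices are 1-based; absent indices default to 0.0.
        row = [feats.get(i, 0.0) for i in range(1, num_features + 1)]
        rows, labels = groups[qid]
        rows.append(row)
        labels.append(label)
    return dict(groups)
```

For the raw files, num_features would be 136 for MSLR-Web10K and 46 for MQ2008.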

Initialization Weights

Download the pretrained weights required by both benchmarks:

Models: roberta_base_en_model and vit_base_patch16_224_model
Source: Google Drive, or the models' official repositories
Save to: ./pretrained_models/
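Before training, it can help to confirm that both models are in place. The sketch below assumes each model is saved under ./pretrained_models/ with the name listed above, either as a non-empty directory or as a single file; the README does not specify the files inside, so only presence is checked. missing_pretrained_models is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Iterable, List

# Model names taken from this README; the expected files inside each entry
# are an unknown, so only presence is checked (hypothetical helper).
REQUIRED_MODELS = ("roberta_base_en_model", "vit_base_patch16_224_model")


def _is_present(path: Path) -> bool:
    """Accept a single file, or a directory that contains at least one entry."""
    if path.is_file():
        return True
    return path.is_dir() and any(path.iterdir())


def missing_pretrained_models(
    root: str = "./pretrained_models", required: Iterable[str] = REQUIRED_MODELS
) -> List[str]:
    """Return the required model names that are absent or empty under root."""
    base = Path(root)
    return [name for name in required if not _is_present(base / name)]
```

Called from the repository root, missing_pretrained_models() returns an empty list once both models have been saved.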

Prerequisites

pip3 install -r requirements.txt

Hardware Requirement: 4 GPUs

Usage Instructions

For LRMovieNet Benchmark

# Stage 1: Base Model
sh pointwise.sh <your_stage1>

# Stage 2: Reward Model
sh reward_pair_dataloader.sh <your_stage2>

# Stage 3: LR²PPO
sh ppo.sh <your_stage3>

# Evaluation
sh ppo_eval.sh <your_eval>
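This README does not list the metrics reported by the evaluation step. As an assumption, the sketch below shows NDCG@k, a standard metric for ranking items by graded relevance, using exponential gain 2^rel - 1 and a log2 position discount. dcg_at_k and ndcg_at_k are hypothetical helpers, and the repository's evaluation scripts may compute their metrics differently.

```python
import math
from typing import Sequence

# Hypothetical NDCG@k helpers; not taken from ppo_eval.sh.


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k with exponential gain (2^rel - 1) and log2 position discount."""
    return sum(
        # pos is 0-based, so the first item is divided by log2(2) = 1.
        (2.0 ** rel - 1.0) / math.log2(pos + 2)
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order.

    ranked_relevances holds the ground-truth relevance of every candidate
    label, listed in the order the model ranked them.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant labels: define NDCG as 0
    return dcg_at_k(ranked_relevances, k) / ideal
```

A perfectly ordered list scores 1.0, and placing a relevant label below an irrelevant one lowers the score.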

For MSLR-Web10K → MQ2008 Transfer Task

# Stage 1: Base Model
sh pointwise_trad.sh <your_stage1>

# Stage 2: Reward Model
sh reward_trad.sh <your_stage2>

# Stage 3: LR²PPO
sh ppo_trad.sh <your_stage3>

# Evaluation
sh ppo_eval_trad.sh <your_eval>
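Both benchmarks run the same four steps in a fixed order (base model, reward model, LR²PPO, evaluation), and each stage presumably consumes the outputs of the previous one. The sketch below chains the scripts listed above with subprocess and stops at the first failure. The per-stage arguments (<your_stage1>, <your_stage2>, and so on) are left to the user because this README does not define them; PIPELINES, build_commands, and run_pipeline are hypothetical names.

```python
import subprocess
from typing import Dict, List, Sequence

# Script names from this README, in execution order:
# base model -> reward model -> LR²PPO -> evaluation (hypothetical driver).
PIPELINES: Dict[str, List[str]] = {
    "lrmovienet": ["pointwise.sh", "reward_pair_dataloader.sh", "ppo.sh", "ppo_eval.sh"],
    "mslr_to_mq2008": ["pointwise_trad.sh", "reward_trad.sh", "ppo_trad.sh", "ppo_eval_trad.sh"],
}


def build_commands(task: str, stage_args: Sequence[str]) -> List[List[str]]:
    """Pair each script of the task with its user-supplied argument."""
    scripts = PIPELINES[task]
    if len(stage_args) != len(scripts):
        raise ValueError(f"expected {len(scripts)} arguments, got {len(stage_args)}")
    return [["sh", script, arg] for script, arg in zip(scripts, stage_args)]


def run_pipeline(task: str, stage_args: Sequence[str], dry_run: bool = False) -> List[List[str]]:
    """Run the stages in order; check=True raises on the first failing script."""
    commands = build_commands(task, stage_args)
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Passing dry_run=True prints the commands without executing them, which is a quick way to check the argument order before a multi-GPU run.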

Model Checkpoints

LRMovieNet Benchmark

Download: Google Drive

MSLR-Web10K → MQ2008 Transfer

Download: Google Drive

License

See LICENSE for details.

Acknowledgments

Code components borrowed from:

We are grateful for these excellent works and repositories.

Citation

If you find our work helpful in your research, please consider citing it:

@inproceedings{guo2024multimodal,
  title={Multimodal Label Relevance Ranking via Reinforcement Learning},
  author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
  booktitle={European Conference on Computer Vision},
  pages={391--408},
  year={2024},
  organization={Springer}
}