MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models via Reinforcement Learning

September 17, 2025

Installation

Run the setup script to configure the environment:

bash setup.sh

This script will:

  • Create conda environment medvlm-r1
  • Install necessary dependencies
  • Configure the open-r1-multimodal framework

Quick Start

Run Demo

Use the Jupyter notebook to try the model quickly:

jupyter notebook demo.ipynb

The demo includes:

  • Model loading
  • Medical image VQA examples
  • Inference process demonstration

Example Output

The model generates a structured reasoning process followed by its final answer:

<think>
    The image is a magnetic resonance imaging (MRI) scan of a knee joint. The scan shows a chondral abnormality, which is a type of cartilage damage. This is evident from the irregular shape and the presence of a defect in the cartilage.
</think>

<answer>A</answer>
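
To use this output programmatically, the two tagged sections can be separated with a regular expression. The following is a minimal sketch, not code from this repository: `parse_model_output` and `ParsedOutput` are hypothetical names, and it assumes the output contains at most one `<think>` block and one `<answer>` block.

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    reasoning: Optional[str]  # text inside <think>...</think>, or None if absent
    answer: Optional[str]     # text inside <answer>...</answer>, or None if absent


# DOTALL lets "." match newlines, since the reasoning usually spans several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_model_output(text: str) -> ParsedOutput:
    """Split a generated string into its reasoning and answer parts."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return ParsedOutput(
        reasoning=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


example = """<think>
    The image is a magnetic resonance imaging (MRI) scan of a knee joint.
</think>

<answer>A</answer>"""

parsed = parse_model_output(example)
print(parsed.answer)  # A
```

Returning `None` for a missing tag, rather than raising, makes it easy to count malformed generations separately from wrong answers.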

Dataset Download

Training and Testing Datasets

Download the HuatuoGPT-Vision data via the Hugging Face CLI. The command below pulls the FreedomIntelligence/PubMedVision dataset repository:

# 1) Install Hugging Face CLI (if not already)
pip install -U "huggingface_hub[cli]"

# 2) (Optional) Login if the dataset requires auth
# huggingface-cli login

# 3) Download the dataset to a local directory
# Replace <TARGET_DIR> with your local path, e.g., /data/datasets/PubMedVision
hf download FreedomIntelligence/PubMedVision \
  --repo-type dataset \
  --local-dir <TARGET_DIR> \
  --local-dir-use-symlinks False \
  --include "*"

# After download, set <DATASET_PATH_ROOT>=<TARGET_DIR> in your scripts

The dataset contains:

  • MRI, CT, X-ray medical images
  • Corresponding visual question-answer pairs
  • Multi-modal medical reasoning tasks
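
The same download can also be started from Python with `snapshot_download` from `huggingface_hub`. This is a sketch under assumptions: the helper names `build_download_kwargs` and `download_dataset` are hypothetical, and the target directory is whatever you would otherwise pass as <TARGET_DIR>.

```python
from pathlib import Path


def build_download_kwargs(target_dir: str) -> dict:
    """Arguments mirroring the CLI call: the whole dataset repo into a local directory."""
    return {
        "repo_id": "FreedomIntelligence/PubMedVision",
        "repo_type": "dataset",
        "local_dir": str(Path(target_dir).expanduser()),
    }


def download_dataset(target_dir: str) -> str:
    """Download the dataset and return the local path (needs network access)."""
    # Imported lazily so build_download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(target_dir))


# Example call (commented out because it starts a large download):
# download_dataset("/data/datasets/PubMedVision")
```

The returned path is the value to use for <DATASET_PATH_ROOT> in the scripts.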

Training and Testing

Training

Run the training script:

bash train_script.sh

Note: Before running, replace the following placeholders in the script:

  • <DATASET_NAME>: Dataset name
  • <GPU_NUM>: Number of GPUs
  • <LOG_PATH>: Log output path
  • <HF_CACHE_DIR>: Hugging Face cache directory
  • <WANDB_ENTITY>: Weights & Biases entity
  • <WANDB_PROJECT>: Project name
  • <OUTPUT_DIR_ROOT>: Output directory root path
  • <MODEL_REPO_OR_DIR>: Model path
  • <DATASET_PATH_ROOT>: Dataset root path
  • <MASTER_ADDR>: Master node address
  • <MASTER_PORT>: Master node port
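
Because a forgotten placeholder usually surfaces only as a confusing runtime error, it can help to scan a script for leftover `<UPPER_CASE>` tokens before launching it. The sketch below is not part of the repository; `find_placeholders` and `check_script` are hypothetical helpers, and the sample text is made up for illustration rather than copied from `train_script.sh`.

```python
import re
from pathlib import Path

# Matches tokens such as <DATASET_NAME> or <GPU_NUM>: angle brackets around
# an upper-case identifier made of letters, digits, and underscores.
PLACEHOLDER_RE = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_placeholders(script_text: str) -> list[str]:
    """Return the distinct unfilled placeholders, in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in PLACEHOLDER_RE.finditer(script_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def check_script(path: str) -> list[str]:
    """Read a script from disk and list the placeholders still present in it."""
    return find_placeholders(Path(path).read_text(encoding="utf-8"))


# Illustrative script fragment (hypothetical content).
sample = (
    "MODEL=<MODEL_REPO_OR_DIR>\n"
    "DATA=/data/datasets/PubMedVision\n"
    "OUT=<OUTPUT_DIR_ROOT>/run1\n"
    "CKPT=<MODEL_REPO_OR_DIR>\n"
)
print(find_placeholders(sample))  # ['<MODEL_REPO_OR_DIR>', '<OUTPUT_DIR_ROOT>']
```

Calling `check_script("train_script.sh")` or `check_script("test_script.sh")` should return an empty list once every value has been filled in.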

Testing

Run the testing script:

bash test_script.sh

Note: Before running, replace the following placeholders in the script:

  • <HF_CACHE_DIR>: Hugging Face cache directory
  • <CUDA_DEVICES>: CUDA devices
  • <MODEL_REPO_OR_DIR>: Model path
  • <DATASET_PATH_ROOT>: Dataset root path
  • <OUTPUT_DIR>: Output directory

Testing Configuration

The testing script supports the following parameters:

  • MODALITY: Modality type (MRI, CT, Ultrasound, Xray, Dermoscopy, Microscopy, Fundus)
  • PROMPT_TYPE: Prompt type (simple, complex)
  • BSZ: Batch size
  • MAX_NEW_TOKENS: Maximum new tokens to generate
  • DO_SAMPLE: Whether to sample during decoding instead of decoding greedily
  • TEMPERATURE: Sampling temperature (typically relevant only when DO_SAMPLE is enabled)
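
The decoding parameters above correspond to standard keyword arguments of the Hugging Face `generate` method (`max_new_tokens`, `do_sample`, `temperature`). The following sketch shows one way such settings could be turned into those arguments. It assumes the values arrive as environment variables and uses made-up defaults; how `test_script.sh` actually passes them is not shown here, and `generation_kwargs` is a hypothetical helper.

```python
import os
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common shell-style truthy strings."""
    return value.strip().lower() in {"1", "true", "yes", "y"}


def generation_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for model.generate(...) from string settings.

    The defaults (1024 tokens, greedy decoding, temperature 1.0) are
    assumptions for this sketch, not values taken from the repository.
    """
    do_sample = _as_bool(env.get("DO_SAMPLE", "false"))
    kwargs = {
        "max_new_tokens": int(env.get("MAX_NEW_TOKENS", "1024")),
        "do_sample": do_sample,
    }
    # Temperature only affects sampling, so it is omitted for greedy decoding.
    if do_sample:
        kwargs["temperature"] = float(env.get("TEMPERATURE", "1.0"))
    return kwargs


print(generation_kwargs({"DO_SAMPLE": "true", "TEMPERATURE": "0.7", "MAX_NEW_TOKENS": "256"}))
```

Leaving `temperature` out when `DO_SAMPLE` is off keeps the run deterministic and avoids passing a setting that greedy decoding does not use.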

Project Structure

r1-v-med/
├── demo.ipynb                    # Demo notebook
├── setup.sh                      # Setup script
├── train_script.sh               # Training script
├── test_script.sh                # Testing script
├── MRI_CT_XRAY_300each_dataset.json  # Test dataset
├── images/                       # Example images
│   ├── successful_cases/         # Successful cases
│   └── failure_cases/            # Failure cases
└── src/
    ├── eval/                     # Evaluation code
    │   └── test_qwen2vl_med.py   # Testing script
    ├── distill_r1/               # R1 distillation related
    └── open-r1-multimodal/       # Base framework
        └── src/open_r1/
            ├── grpo.py           # GRPO training code
            └── trainer/
                └── grpo_trainer.py  # GRPO trainer
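
For readers new to GRPO (Group Relative Policy Optimization), its central idea is that several responses are sampled for the same prompt and each response's reward is normalized against the others in that group. The sketch below illustrates only that normalization step in plain Python. It is not taken from `grpo_trainer.py`; the function name, the use of the population standard deviation, and the `eps` value are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one prompt's sampled responses within the group.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat their siblings get positive values and the rest get negative ones.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question: two rewarded, two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

When every response in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal for the step.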

Acknowledgement

Citation

If you find our work helpful, please cite:

@article{pan2025medvlm,
  title={MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning},
  author={Pan, Jiazhen and Liu, Che and Wu, Junde and Liu, Fenglin and Zhu, Jiayuan and Li, Hongwei Bran and Chen, Chen and Ouyang, Cheng and Rueckert, Daniel},
  journal={arXiv preprint arXiv:2502.19634},
  year={2025}
}

Base Frameworks

Our code builds on open-source projects, including the open-r1-multimodal framework (see src/open-r1-multimodal/).

Thanks to these excellent open-source projects for providing a solid foundation for our research.

License

This project is licensed under the Apache 2.0 License.