How to train, infer, and evaluate ColonR1
January 2, 2026
Figure 1: Details of our colonoscopy-specific reasoning model, ColonR1.
Installation guide
Important
Troubleshooting guide. If you encounter any issues during installation or execution, please refer to our Troubleshooting Guide for solutions to common problems.
- First, clone the repository and enter its directory:

git clone git@github.com:ai4colonoscopy/Colon-X.git
cd Colon-X

- Create and activate a Conda environment, then install the required dependencies. Note that our default setup uses CUDA 11.8; other CUDA versions are not guaranteed to work.

conda create -n colonr1 python=3.10 -y
conda activate colonr1
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn --no-build-isolation
pip install -r ColonR1/requirements.txt

- Download the pretrained weights for inference:
  - Qwen2.5-VL-3B-Instruct
  - all-MiniLM-L6-v2
  - Our ColonR1 (Google Drive & Hugging Face)
- Prepare the data. For details, please refer to the data preparation instructions. We assume you have already done this.
- Finally, double-check that your directory has the following structure:

cache/                                          # all cached data, weights, and structured dataset files
├── checkpoints/                                # trained ColonR1 model checkpoints
│   └── ColonR1-Qwen2.5-VL-GRPO-thinking-StageII
│
├── data/                                       # dataset root containing all images and annotations
│   ├── Positive-images/                        # images with positive clinical findings (polyps, lesions, etc.)
│   ├── Negative-images/                        # normal images without pathology
│   └── JSON/                                   # annotation files for training / validation / testing
│       ├── Train-Val-merge/                    # combined training + validation JSONs
│       └── Test/                               # test JSONs for inference and evaluation
│
├── download-weights/                           # downloaded pretrained model weights
│   ├── Qwen2.5-VL-3B-Instruct
│   ├── gpt-oss-20b
│   └── all-MiniLM-L6-v2
│
└── ColonR1/                                    # main ColonR1 codebase for training, inference, and evaluation
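The layout above can be checked mechanically before training or inference. The following is a minimal sketch, not part of the repository: `check_cache_layout` is a hypothetical helper name, and it only tests the directories named in the tree that the download and data steps are expected to produce.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ColonR1 repository).
# Prints every expected sub-directory that is missing under the given
# cache root and returns 1 if at least one is missing, 0 otherwise.
check_cache_layout() {
    local cache_root="${1:-cache}"   # defaults to ./cache
    local status=0
    local rel
    for rel in \
        "checkpoints" \
        "data/Positive-images" \
        "data/Negative-images" \
        "data/JSON/Train-Val-merge" \
        "data/JSON/Test" \
        "download-weights/Qwen2.5-VL-3B-Instruct" \
        "download-weights/all-MiniLM-L6-v2"; do
        if [ ! -d "$cache_root/$rel" ]; then
            echo "missing: $cache_root/$rel"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the function, run `check_cache_layout cache` from the directory that contains `cache/`; no output and a zero exit status mean every listed directory was found.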
Training
Before starting training, please update the configs as needed:
- Set S1_OUTPUT_FILE and S1_OUTPUT_DIR: the output name and path for Stage-I.
- Set IMAGE_ROOT and S1_JSON_FILE: typically cache/data and ColonReason_GRPO.json.
- Set S1_BASE_MODEL: path to the Qwen2.5-VL-3B-Instruct weights.
- Set S2_OUTPUT_FILE and S2_OUTPUT_DIR: the output name and path for Stage-II.
Then start training:
bash ColonR1/script/train/ColonR1_grpo_thinking.sh
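A forgotten setting is cheaper to catch before a long training run than after it. The guard below is a sketch under assumptions: `require_vars` is a hypothetical helper, not something the repository provides, and it presumes the settings listed above are plain shell variables that are visible wherever the guard is called (for example, pasted after the assignments in your copy of the training script).

```shell
#!/bin/bash
# Hypothetical guard (not part of the ColonR1 repository).
# Prints the name of every listed variable that is unset or empty and
# returns 1 if there is at least one, 0 otherwise.
require_vars() {
    local status=0
    local name
    local value
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            status=1
        fi
    done
    return "$status"
}
```

A possible call is `require_vars S1_OUTPUT_FILE S1_OUTPUT_DIR IMAGE_ROOT S1_JSON_FILE S1_BASE_MODEL S2_OUTPUT_FILE S2_OUTPUT_DIR || exit 1`, which stops the script before any GPU work starts.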
Inference
Single-image Inference
To use ColonR1 for single-image chat, use the following command:
- Set MODEL_PATH and IMAGE_PATH to the paths of the saved checkpoint and the image you want to evaluate, respectively.
- Run bash ColonR1/script/infer_eval/infer_single.sh, then enter your instruction; the result will be printed on the screen.
Batch Inference
We provide a one-command inference script. If you use ColonEval or follow the same data organization format, you only need to modify a few settings in ColonR1/script/infer_eval/infer.sh to perform inference.
You can also run inference on your own data:
- Set IMAGE_BASE_PATH and ROOT_PATH to the paths of cache/data and cache/data/JSON/Test, respectively.
- Set EXP_MODEL_ID to the path of the model weights you want to run inference with.
- Then run bash ColonR1/script/infer_eval/infer.sh to start inference.
- An example of an inference script is as follows:

#!/bin/bash
IMAGE_BASE_PATH=cache/data
ROOT_PATH=cache/data/JSON/Test
EXP_MODEL_ID=cache/checkpoints/ft-exp/ColonR1-Qwen2.5-VL-GRPO-thinking-StageII

mkdir -p $EXP_MODEL_ID/pred

export CUDA_VISIBLE_DEVICES=0
nohup python ColonR1/serve/inference.py \
    --model_path $EXP_MODEL_ID \
    --image_dir $IMAGE_BASE_PATH \
    --json_file $ROOT_PATH/ColonEval/Task_1_ColonEval.json \
    --output_path $EXP_MODEL_ID/pred/pred_Task_1_ColonEval.json > $EXP_MODEL_ID/pred/nohup-pred_task1.txt 2>&1 &
Gradio Web Demo Inference
Note
What is Gradio? Gradio is an open-source Python library that allows you to quickly create customizable web-based interfaces for machine learning models. It enables users to interact with models through a user-friendly graphical interface, making it easier to demonstrate and test model capabilities without requiring extensive coding knowledge.
To launch the Gradio web demo for ColonR1, follow these steps:
conda activate colonr1
# `--model_path` should point to your ColonR1 model checkpoint
python ColonR1/serve/inference_gradio_web_demo.py --model_path cache/checkpoints/ft-exp/ColonR1-Qwen2.5-VL-GRPO-thinking-StageII
This will start a local web server, and you can access the demo by navigating to http://localhost:7860 in your web browser. You can upload colonoscopy images and interact with the ColonR1 model through the web interface.
The image below shows an example of ColonR1 responding interactively in the Gradio UI demo.
Evaluation
- To perform the evaluation, set EXP_MODEL_ID to the path of the model you want to evaluate.
- Then, if you wish to use ColonEval for evaluation, set EVAL_MODE to pilot.
- Finally, run the following commands to begin the evaluation. (For ColonEval's environment configuration, please refer to the ColonEval setup instructions.)

conda activate coloneval
bash ColonR1/script/infer_eval/eval.sh

- An example of an evaluation script is as follows:

#!/bin/bash
EXP_MODEL_ID=cache/checkpoints/ft-exp/ColonR1-Qwen2.5-VL-GRPO-thinking-StageII
EVAL_MODE=pilot

python ColonR1/serve/understanding_eval.py \
    --task_id 1 \
    --data_type reasoning \
    --eval_mode $EVAL_MODE \
    --input_file $EXP_MODEL_ID/pred/pred_Task_1_ColonEval.json \
    --output_file $EXP_MODEL_ID/pred/Task_1.txt > $EXP_MODEL_ID/pred/eval_task_1_log.txt 2>&1
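The evaluation script reads the prediction file written by batch inference, and the example inference script launches that job in the background, so the file may not exist yet when evaluation starts. The sketch below is one way to guard against that; `has_predictions` is a hypothetical helper, not part of the repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ColonR1 repository).
# Succeeds only if the given prediction file exists and is non-empty;
# otherwise prints a message to stderr and returns 1.
has_predictions() {
    local pred_file="$1"
    if [ -s "$pred_file" ]; then
        return 0
    fi
    echo "no predictions at: $pred_file" >&2
    return 1
}
```

For example, `has_predictions "$EXP_MODEL_ID/pred/pred_Task_1_ColonEval.json" && bash ColonR1/script/infer_eval/eval.sh` runs the evaluation only when predictions are present. A non-empty file does not prove that the inference job has finished, so it is still worth confirming that the background process has exited.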
Results
Table 1 compares multimodal reasoning abilities under various fine-tuning methods. NS and SP denote the use of negative sampling and self-evolving prompting, respectively. The overall accuracy of ColonR1 on ColonEval is reported in the last column. All prediction results and evaluation scores for ColonR1 are available on Google Drive.
Table 1: Comparison of multimodal reasoning abilities under various fine-tuning methods.
Figure 2: Qualitative comparison of ColonR1 with Med-R1 and Qwen-SFT.