PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning

December 7, 2025 · View on GitHub

This is the evaluation module for our work Perception-Aware Policy Optimization for Multimodal Reasoning.

  • This module is also embedded into PAPO for convenient inference and evaluation
  • Feel free to use PAPO directly for the complete training-and-evaluation workflow!

🚀 Evaluation for PAPO

1. Env Setup

We follow the environment setup instructions from LLaMA-Factory:

cd PAPO-Eval
conda env create -f env.yml
conda activate papo_eval
pip install -e ".[all]"

2. Data Preprocessing

All evaluation data can be downloaded from: https://huggingface.co/datasets/PAPO-Galaxy/PAPO_eval
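One way to fetch the data is with the Hugging Face Hub CLI; the local directory name below is an illustrative choice, not a path the evaluation scripts require:

```shell
# Download the PAPO evaluation data from the Hugging Face Hub.
# Requires the CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download PAPO-Galaxy/PAPO_eval \
  --repo-type dataset \
  --local-dir ./data/PAPO_eval
```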


Prepare the evaluation dataset(s) for PAPO evaluation:

  • Set the specific dataset(s) you would like to use for evaluation:

    • AUTO_UNZIP (bool): Whether to automatically unzip images
      • If set to true, the downloaded image ZIP file will be automatically unzipped, and the ZIP file will then be removed
    • SPLIT_NAME (str): Which dataset to use for evaluation. Currently available datasets:
      • hiyouga/geometry3k: SPLIT_NAME="hiyouga_geometry3k"
      • AI4Math/MathVerse: SPLIT_NAME="AI4Math_MathVerse"
      • AI4Math/MathVista: SPLIT_NAME="AI4Math_MathVista"
      • We_Math/We_Math: SPLIT_NAME="We_Math"
      • FanqingM/MMK12: SPLIT_NAME="PAPO_MMK12"
      • Vision-dependent subset of AI4Math/MathVerse: SPLIT_NAME="AI4Math_MathVerse_vision_dependent"
      • BUAADreamer/clevr_count_70k: SPLIT_NAME="BUAADreamer_clevr_count_70k"
      • lscpku/LogicVista: SPLIT_NAME="lscpku_LogicVista"
      • MMMU/MMMU_Pro: SPLIT_NAME="MMMU_MMMU_Pro"
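Concretely, the two settings above are shell variables edited inside the preprocessing script; the snippet below is a sketch (the exact placement within preprocess.sh is assumed), using one of the dataset names listed above:

```shell
# Settings inside papo_eval/preprocess/preprocess.sh (illustrative values)
AUTO_UNZIP=true                   # unzip downloaded image ZIPs, then remove the ZIP files
SPLIT_NAME="hiyouga_geometry3k"   # which dataset to preprocess for evaluation
```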
  • Run data preprocessing:

cd PAPO-Eval
bash papo_eval/preprocess/preprocess.sh

3. Run Evaluation

3.1 Run Model Inference

  • Please set the dataset and other eval parameters in PAPO-Eval/papo_eval/run_infer.sh

    • DATASET (str): The dataset you would like to run inference on
      • hiyouga/geometry3k: DATASET="hiyouga_geometry3k"
      • AI4Math/MathVerse: DATASET="AI4Math_MathVerse"
      • AI4Math/MathVista: DATASET="AI4Math_MathVista"
      • We_Math/We_Math: DATASET="We-Math_We-Math"
      • FanqingM/MMK12: DATASET="PAPO_MMK12"
      • Vision-dependent subset of AI4Math/MathVerse: DATASET="AI4Math_MathVerse_vision_dependent"
      • BUAADreamer/clevr_count_70k: DATASET="BUAADreamer_clevr_count_70k"
      • lscpku/LogicVista: DATASET="lscpku_LogicVista"
      • MMMU/MMMU_Pro: DATASET="MMMU_MMMU_Pro"
    • MODEL (str): The PAPO model you would like to run inference with
      • For example: MODEL="PAPOGalaxy/PAPO-G-H-Qwen2.5-VL-7B"
      • Our model collection on Hugging Face: PAPO-Qwen
        • PAPO-GRPO model collection: PAPO-G
        • PAPO-DAPO model collection: PAPO-D
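Putting the two settings together, the inference script might be configured as follows (a sketch; the exact placement of these variables within run_infer.sh is assumed, and the values are taken from the examples above):

```shell
# Settings inside papo_eval/run_infer.sh (illustrative values)
DATASET="AI4Math_MathVerse"                 # dataset to run inference on
MODEL="PAPOGalaxy/PAPO-G-H-Qwen2.5-VL-7B"   # PAPO checkpoint from the Hugging Face collection
```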
  • Run inference:

    cd PAPO-Eval
    bash papo_eval/run_infer.sh
    
  • Inference outputs will be saved under PAPO-Eval/infer_outputs

    • The first and last lines of the output also show the exact save path

3.2 Run Evaluation On Model Inference

  • Please set the dataset and other eval parameters in PAPO-Eval/papo_eval/run_eval.sh

    • JSONL_PATH (str): Path to the inference results you want to evaluate
      • JSONL path: Give the JSONL file path directly to evaluate the accuracy of a specific dataset's inference results
      • Model dir: Give only the model directory (without a JSONL path) to evaluate vision-dependent accuracy
    • N_ROLLOUT (int): Number of rollouts
      • We set N_ROLLOUT=8 in our paper
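As with inference, these are shell variables set inside the evaluation script; the sketch below uses an invented JSONL filename for illustration (use the path printed by run_infer.sh instead):

```shell
# Settings inside papo_eval/run_eval.sh (illustrative values)
JSONL_PATH="infer_outputs/AI4Math_MathVerse/results.jsonl"   # hypothetical filename; use the printed path
N_ROLLOUT=8                                                  # value used in the paper
```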
  • Run evaluation:

    cd PAPO-Eval
    bash papo_eval/run_eval.sh
    
  • Detailed results will be saved to ./eval_results/<eval_output_name>.json

    • Results are also printed in the final section of the output, together with the exact save path of the evaluation results

🥰 Acknowledgement

Huge thanks to the LLaMA-Factory team for providing this awesome codebase!

  • We adapted the LLaMA-Factory codebase to implement model inference and evaluation for PAPO.

๐Ÿ“ Citation

@article{wang2025perception,
  title={Perception-Aware Policy Optimization for Multimodal Reasoning},
  author={Wang, Zhenhailong and Guo, Xuehang and Stoica, Sofia and Xu, Haiyang and Wang, Hongru and Ha, Hyeonjeong and Chen, Xiusi and Chen, Yangyi and Yan, Ming and Huang, Fei and others},
  journal={arXiv preprint arXiv:2507.06448},
  year={2025}
}