README.md
November 11, 2025
Introduction
We investigate Code-Integrated Reasoning, an approach in which models generate and execute code during reasoning to improve performance, particularly on complex mathematical tasks.
Challenges & Strategies: We analyze the challenges in tool-augmented RL, such as instability and lack of exploration, and propose strategies that balance exploration and stability. These include progressively increasing the tool interaction budget, precisely matching the interaction boundaries, and masking the external feedback.
Performance: Our method demonstrates substantial performance gains, achieving state-of-the-art results across multiple benchmarks with an average accuracy of 52.4%, surpassing several competitive baselines.
Mechanistic Insights: We further provide an in-depth analysis of the mechanisms behind code-integrated reasoning, explaining why and how it is effective.
- Capability Expansion: Code integration significantly extends the model's capability boundaries, enhancing performance on complex tasks.
- Efficiency: It produces more concise and efficient reasoning paths compared to traditional long-chain-of-thought (long-CoT) methods.
- Error Feedback: Non-executable code generates informative error feedback, compelling the model to reflect and revise, ultimately improving accuracy.
- Selective Benefits: While highly effective for algebra, number theory, and combinatorics, code integration shows minimal impact on geometry problems.
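Two of the ideas above, a tool interaction budget that grows during training and masking of external feedback, together with the error-feedback mechanism, can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in this repository: the function names `budget_for_step`, `run_snippet`, and `build_loss_mask`, the linear schedule, and its default values are all assumptions made for the example.

```python
import contextlib
import io


def budget_for_step(step, start=1, end=4, ramp_steps=300):
    """Hypothetical linear schedule for the maximum number of tool calls.

    The values 1, 4, and 300 are placeholders, not settings from the paper.
    """
    if step >= ramp_steps:
        return end
    return start + (end - start) * step // ramp_steps


def run_snippet(code):
    """Execute a code string and return (ok, feedback).

    On success the feedback is the captured stdout; on failure it is a short
    error message that could be appended to the model's context.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # No sandboxing here; a real setup needs isolation and timeouts.
            exec(code, {})
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue().strip()


def build_loss_mask(segments):
    """Flatten (tokens, source) segments and mask out tool feedback.

    source is "model" or "tool". Model tokens get mask 1 and contribute to
    the loss; tool tokens get mask 0 and are ignored.
    """
    tokens, mask = [], []
    for seg_tokens, source in segments:
        tokens.extend(seg_tokens)
        mask.extend([1 if source == "model" else 0] * len(seg_tokens))
    return tokens, mask


if __name__ == "__main__":
    ok, feedback = run_snippet("print(1 / 0)")
    print(ok, feedback)
    print(build_loss_mask([(["Let", "x"], "model"), ([feedback], "tool")]))
```

In this sketch the error message from a failed snippet is treated like any other tool output: it is visible to the model on the next turn but excluded from the loss by the mask.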
Quick Start
Environment Setup
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn --no-build-isolation
cd verl/verl-main
pip install -e .
pip install vllm==0.8.5
pip install pebble timeout_decorator
pip install "math-verify[antlr4_9_3]"
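After installation, a quick check of the pinned versions can catch a mismatched environment before training starts. The helper below is a hypothetical convenience script, not part of this repository; the name `check_environment` and the `EXPECTED` table are assumptions, with pins copied from the install commands above.

```python
from importlib import metadata

# Pins taken from the install commands above; None means "any version".
EXPECTED = {
    "torch": "2.6.0",
    "vllm": "0.8.5",
    "flash-attn": None,
    "math-verify": None,
}


def check_environment(expected=EXPECTED):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local version tags such as "+cu124" when comparing.
        base = installed.split("+")[0]
        if pinned is None or base == pinned:
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"mismatch: found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```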
Train
cd verl/verl-main/scripts/run
bash ray_start.sh
bash run.sh
Eval
pip install "git+https://github.com/tongyx361/symeval.git"
cd evaluation
bash script/eval_qwen_math.sh
bash script/eval_qwen3.sh
Citation
@article{bai2025towards,
title={Towards Effective Code-Integrated Reasoning},
author={Bai, Fei and Min, Yingqian and Zhang, Beichen and Chen, Zhipeng and Zhao, Wayne Xin and Fang, Lei and Liu, Zheng and Wang, Zhongyuan and Wen, Ji-Rong},
journal={arXiv preprint arXiv:2505.24480},
year={2025}
}