Training Large Language Models to Reason Efficiently
April 13, 2025
This is the codebase for our paper "Training Language Models to Reason Efficiently". The codebase has been tested on GH200 GPUs with Python 3.10.15 and CUDA 12.6; other environments may require adjustments when installing Flash Attention or vLLM.
Trained Checkpoints
Trained checkpoints can be found at this link.
Installation
conda create -n efficient_reasoning python=3.10.15
conda activate efficient_reasoning
cd utils/latex2sympy
pip install -e .
cd ../../
pip install -e .
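To verify the installation, a quick import check like the one below can confirm that the key dependencies load (a minimal sketch; the package names are the standard PyPI ones, and the exact versions in your environment may differ):

# Quick environment sanity check (illustrative only).
import torch
import flash_attn  # Flash Attention
import vllm        # vLLM

print("CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__, "| vllm:", vllm.__version__)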
Dataset
Download the dataset used in the paper using:
huggingface-cli download daman1209arora/compression_dataset --repo-type dataset --local-dir datasets/compression_dataset
This dataset is a random split of easily parsed problems from the MATH, cn_k12, AIME, AoPS, and Olympiads subsets of the NuminaMath dataset.
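To sanity-check the download, the data can be inspected with the Hugging Face datasets library (a minimal sketch; the split name and field layout below are assumptions, so adjust them to whatever the downloaded files actually contain):

# Minimal sketch for inspecting the dataset; split and field names are assumptions.
from datasets import load_dataset

ds = load_dataset("daman1209arora/compression_dataset")
print(ds)              # available splits and sizes
print(ds["train"][0])  # peek at one example (assumes a "train" split)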
Design
Our codebase is adapted from the OpenRLHF library.
To keep changes to the codebase minimal, we launch a remote reward server, defined in reward_server/math_server.py, which is then passed to the trainer defined in OpenRLHF.
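As a rough illustration of this design, a remote reward server could look like the sketch below; the actual implementation lives in reward_server/math_server.py, and the route name, payload fields, and scoring logic here are placeholders rather than the real interface.

# Hypothetical sketch of a remote reward server, NOT the actual
# reward_server/math_server.py. Route name, payload shape, and the
# answer-checking logic are placeholders for illustration.
from flask import Flask, request, jsonify

app = Flask(__name__)

def score(response: str, answer: str) -> float:
    # Placeholder check; the real server verifies math answers
    # (e.g. using the latex2sympy utilities installed earlier).
    return 1.0 if answer.strip() in response else 0.0

@app.route("/get_reward", methods=["POST"])
def get_reward():
    data = request.get_json()
    rewards = [score(r, a) for r, a in zip(data["responses"], data["answers"])]
    return jsonify({"rewards": rewards})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)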
Usage
For an illustrative example, we provide scripts to run on a slurm cluster:
- run_rloo_1.5B.sh: 1.5B model with 4 GH200 GPUs on 1 node.
- run_rloo_7B.sh: 7B model with 8 GH200 GPUs on 2 nodes.
To train the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model, set WANDB_KEY and ALPHA in run_rloo_1.5B.sh and run the following command:
sbatch run_rloo_1.5B.sh
Evaluation
To evaluate a model, use the script provided in the main directory:
python evaluate_model.py \
    --model_path='scale:1.5B_alpha:0.1/' \
    --dataset=openai/gsm8k \
    --scale=1.5B
Citation
If you find this code repository useful, please cite us!
@misc{arora2025traininglanguagemodelsreason,
      title={Training Language Models to Reason Efficiently},
      author={Daman Arora and Andrea Zanette},
      year={2025},
      eprint={2502.04463},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.04463},
}