README.md
How to Use?
Installation
git clone https://github.com/cmu-l3/l1.git
cd l1
conda create -n l1 python=3.12
conda activate l1
pip install flash-attn --no-build-isolation
pip install git+https://github.com/volcengine/verl.git
pip install -r requirements.txt
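After installation, a quick sanity check (a minimal sketch, assuming the l1 environment is active) that the two key dependencies import cleanly:

```python
# Minimal sanity check for the "l1" environment: confirms that the two key
# dependencies installed above (flash-attn and verl) import without errors.
import flash_attn  # noqa: F401
import verl        # noqa: F401

print("flash-attn and verl imported successfully")
```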
Note: In the latest version of this repository, you are free to use the latest version of verl; just make sure to update the configs accordingly.
Prepare Dataset
You can use the scripts in scripts/data to prepare your own datasets.
For example, to generate training data for L1-Exact:
python scripts/data/deepscaler_dataset.py
For L1-Max:
python scripts/data/deepscaler_dataset.py --use_both_both
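Both variants train on prompts that carry an explicit token-budget instruction (the LCPO setup from the paper). The sketch below only illustrates that idea; the exact instruction wording, budget range, and output format are defined in scripts/data/deepscaler_dataset.py and may differ:

```python
# Illustrative sketch of LCPO-style length-controlled prompts. This is NOT the
# exact template used by scripts/data/deepscaler_dataset.py; the wording and
# budget values here are assumptions for illustration only.
import random

def l1_exact_prompt(question: str, n_tokens: int) -> str:
    # L1-Exact: the sampled budget is an exact target length.
    return f"{question} Think for exactly {n_tokens} tokens."

def l1_max_prompt(question: str, n_tokens: int) -> str:
    # L1-Max: the sampled budget is an upper bound on thinking length.
    return f"{question} Think for at most {n_tokens} tokens."

question = "What is the sum of the first 100 positive integers?"
budget = random.choice([512, 1024, 2048, 3600])  # example budgets only
print(l1_exact_prompt(question, budget))
print(l1_max_prompt(question, budget))
```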
To generate evaluation datasets for AIME2025, GPQA, LSAT, and MMLU:
python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
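These scripts write dataset files to disk; assuming they follow verl's usual parquet format, a quick way to inspect a generated split is shown below (the path is a placeholder, not the scripts' actual default):

```python
# Inspect a generated evaluation split. Assumes parquet output (verl's usual
# dataset format); replace the placeholder path with wherever the generation
# script wrote your data.
import pandas as pd

df = pd.read_parquet("path/to/aime2025.parquet")  # hypothetical path
print(df.columns.tolist())  # e.g. prompt / answer fields
print(df.iloc[0])
```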
Train Models
You can skip this step if you want to use our pre-trained models.
You can run the scripts in scripts/train to train your own models. Make sure to specify the correct data path.
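If you use the released checkpoints instead, they can be loaded directly from the Hugging Face Hub. Below is a minimal sketch of length-controlled generation with transformers; the token-budget instruction appended to the prompt is an assumption (mirroring the data-prep sketch above), and the eval scripts in the next section remain the supported path:

```python
# Minimal sketch: length-controlled generation with a released L1 checkpoint.
# The "Think for exactly N tokens." suffix is an assumption; check the data
# scripts for the template actually used during training.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "l3lab/L1-Qwen-1.5B-Exact"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "What is the sum of the first 100 positive integers? Think for exactly 512 tokens."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```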
Evaluate Models
Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.
For example, to evaluate L1-Exact on AIME2025:
./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
Replicate Results
To replicate results for L1-Exact and L1-Max from the paper, you can use scripts in scripts/replicate.
- Prepare data:
./scripts/replicate/prepare_data.sh
- Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max
Acknowledgments
- We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
- Qwen for releasing the super-awesome Qwen-2.5 Math models, and
- Agentica for open-sourcing their codebase, models, and datasets! This codebase is built on top of their work.
Citation
If you use L1/LCPO in your research, please cite:
@misc{aggarwal2025l1controllinglongreasoning,
title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
author={Pranjal Aggarwal and Sean Welleck},
year={2025},
eprint={2503.04697},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.04697},
}