🧠 Plan-and-Budget
March 2, 2026
Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
Accepted at ICLR 2026 🎉
Junhong Lin*¹, Xinyue Zeng*², Jie Zhu², Song Wang², Julian Shun¹, Jun Wu⁴, Dawei Zhou²
¹ MIT CSAIL, ² Virginia Tech, ³ University of Virginia, ⁴ Michigan State University
(*Equal Contribution)
📌 Overview
Plan-and-Budget is a training-free test-time reasoning framework that improves both reasoning accuracy and efficiency in large language models (LLMs).
Modern reasoning LLMs often suffer from:
- 🔄 Overthinking: excessive and redundant reasoning on simple queries
- ⚡ Underthinking: premature termination on complex tasks
We introduce:
- BAM (Budget Allocation Model): a theoretical framework for adaptive token allocation
- Plan-and-Budget: a practical inference-time implementation using structured decomposition and local budget scheduling
- E³ (Efficiency-aware Effectiveness Evaluation): a principled metric balancing accuracy and compute
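To make the idea of trading accuracy against compute concrete, here is a minimal Python sketch. It assumes a score of the form accuracy squared divided by the average number of tokens per query. That form and the function name `e3_score` are illustrative assumptions here, so treat the paper as the authoritative definition of E³.

```python
def e3_score(accuracy: float, avg_tokens: float) -> float:
    """Hypothetical efficiency-aware score: accuracy**2 / avg_tokens.

    ASSUMPTION: this formula is an illustration of a metric that rewards
    accuracy and penalizes token usage; it is not taken from this repository.
    """
    if accuracy < 0:
        raise ValueError("accuracy must be non-negative")
    if avg_tokens <= 0:
        raise ValueError("avg_tokens must be positive")
    return accuracy ** 2 / avg_tokens


if __name__ == "__main__":
    # Same accuracy with half the tokens yields a higher score.
    print(e3_score(0.8, 1000.0))
    print(e3_score(0.8, 500.0))
```

Under this assumed form, a method that keeps accuracy fixed while halving token usage doubles its score, while a drop in accuracy is penalized quadratically.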
This repository contains the full inference and evaluation pipeline to reproduce our results.
⚙️ Environment Setup
conda create -n plan_budget python=3.12 -y
conda activate plan_budget
pip install vllm
pip install -r requirements.txt
🔐 Environment Variables
cp .env_template .env
Edit .env:
- API_BASE (e.g., http://localhost:7878/v1)
- API_KEY (use "DUMMY" for local vLLM)
You can specify a different config at runtime:
ENV_FILE=path/to/your/.env python ...
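As a rough illustration of how an `ENV_FILE` override like this can work, the sketch below reads the path from the `ENV_FILE` environment variable, falls back to `.env`, and parses simple `KEY=VALUE` lines. The function name `load_env_file` and the parsing rules are assumptions for illustration; the repository's own loader may behave differently.

```python
import os
from pathlib import Path


def load_env_file(default: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE pairs from the file named by ENV_FILE, or `default`.

    ASSUMPTION: hypothetical helper, not the repository's actual loader.
    """
    path = Path(os.environ.get("ENV_FILE", default))
    settings: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings
```

With a file containing `API_BASE=http://localhost:7878/v1` and `API_KEY="DUMMY"`, the returned dictionary would hold both values with the quotes removed.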
📂 Datasets
Download the TravelPlanner database:
https://drive.google.com/file/d/1pF1Sw6pBmq2sFkJvm-LzJOqrmfWoQgxE/view?usp=drive_link
Unzip into:
dataset/TravelPlanner/
Pre-decomposed datasets:
- dataset/MATH-500
- dataset/NaturalInstruction-Sampled-500
- dataset/TravelPlanner
To re-run decomposition:
python -m dataset.break_down_question \
--num-workers 32 \
--queue-size 32 \
--dataset DATASET_NAME
DATASET_NAME ∈ {math, instruction, travelplanner}
🚀 Reproducing Experimental Results
Example (MATH-500):
python -m run.run_inf --num-workers 32 --dataset math --model vanilla
python -m run.run_inf --num-workers 32 --dataset math --model planned
python -m run.run_inf --num-workers 32 --dataset math --model global_budget
python -m run.run_inf --num-workers 32 --dataset math --model planned_global
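If you want to run the four baseline variants above for a different dataset without retyping them, a small helper along these lines may be convenient. It only assembles and prints the same command lines shown above; the helper name `baseline_commands` is hypothetical and not part of this repository.

```python
import shlex

# Model names exactly as used in the commands above.
BASELINES = ["vanilla", "planned", "global_budget", "planned_global"]


def baseline_commands(dataset: str, num_workers: int = 32) -> list[list[str]]:
    """Build one run.run_inf argument list per baseline variant."""
    return [
        [
            "python", "-m", "run.run_inf",
            "--num-workers", str(num_workers),
            "--dataset", dataset,
            "--model", model,
        ]
        for model in BASELINES
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in baseline_commands("math"):
        print(shlex.join(cmd))
```

To actually launch the runs, each argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.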
Plan-and-Budget (Local Allocation):
python -m run.run_inf --dataset math --model planned_local_uniform
python -m run.run_inf --dataset math \
--model planned_local_weighted \
--decay polynomial --postfix polynomial
--postfix only affects log naming.
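The uniform and weighted variants above differ in how a total token budget is split across sub-questions. The sketch below illustrates one plausible reading: uniform gives every step the same share, while polynomial decay gives step i a weight proportional to 1/(i+1)^p, so earlier steps receive more tokens. The function name `allocate_budget`, the exponent `p`, and the weight formula are all assumptions made for illustration, not the repository's implementation.

```python
def allocate_budget(
    total_tokens: int,
    num_steps: int,
    decay: str = "uniform",
    p: float = 1.0,
) -> list[int]:
    """Split total_tokens across num_steps sub-questions.

    ASSUMPTION: "polynomial" uses weights 1 / (i + 1) ** p; the real
    scheduler in this repository may use a different formula.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if decay == "uniform":
        weights = [1.0] * num_steps
    elif decay == "polynomial":
        weights = [1.0 / (i + 1) ** p for i in range(num_steps)]
    else:
        raise ValueError(f"unknown decay: {decay}")
    norm = sum(weights)
    budgets = [int(total_tokens * w / norm) for w in weights]
    # Tokens lost to rounding down go to the first step so the total is preserved.
    budgets[0] += total_tokens - sum(budgets)
    return budgets


if __name__ == "__main__":
    print(allocate_budget(1200, 3))
    print(allocate_budget(1200, 3, decay="polynomial"))
```

Whatever the exact weights, the per-step budgets always sum to the total, so a weighted schedule redistributes tokens rather than adding any.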
📊 Evaluation
For MATH-500 and NaturalInstructions:
- Results are computed automatically.
For TravelPlanner:
- Evaluation requires a secondary LLM that produces structured JSON output.
ENV_FILE=.env.eval python -m run.run_eval \
--dataset travelplanner \
--model MODEL_NAME \
--postfix POSTFIX
Ensure .env.eval specifies a model supporting JSON output.
📚 Citation
@inproceedings{lin2026plan,
title={Plan-and-Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning},
author={Lin, Junhong and Zeng, Xinyue and Zhu, Jie and Wang, Song and Shun, Julian and Wu, Jun and Zhou, Dawei},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
⭐ If you find this repository useful, please consider starring it!