OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

May 24, 2025

🎉🎉 OPT-Tree has been accepted by TACL.

Introduction
Installation
Demo
- With independent draft models
- With EAGLE draft models
Evaluation on datasets
- With independent draft models
- With EAGLE draft models
Citation

We propose an adaptive and scalable draft tree structure for speculative decoding that supports any autoregressive draft model. With OPT-Tree, more than 10 tokens can be generated in a single decoding step. In the accompanying example, blue tokens are drafted by llama-2-chat-7b and verified by llama-2-chat-70b in a single decoding step, and red tokens are generated by llama-2-chat-70b.
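The idea of choosing a draft tree by its expected acceptance length can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: it assumes the draft model's probabilities approximate the probability that the target model accepts each token, so the expected acceptance length of a tree is the sum, over its nodes, of the product of draft probabilities along each node's path. Under that assumption, keeping the `n_nodes` highest-scoring paths maximizes the expectation for a fixed node budget. `toy_draft`, `build_tree` and `expected_acceptance_length` are hypothetical names, and the best-first search here stands in for the layer-by-layer drafting a real draft model performs.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[int, ...]
# Hypothetical draft-model interface: prefix -> [(token, draft probability), ...]
DraftFn = Callable[[Path], List[Tuple[int, float]]]


def build_tree(draft: DraftFn, n_nodes: int, max_depth: int) -> Dict[Path, float]:
    """Keep the n_nodes paths with the highest cumulative draft probability.

    Returns {path: product of draft probabilities along the path}.
    """
    tree: Dict[Path, float] = {}
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-p, (tok,)) for tok, p in draft(())]
    heapq.heapify(frontier)
    while frontier and len(tree) < n_nodes:
        neg_p, path = heapq.heappop(frontier)
        tree[path] = -neg_p
        if len(path) < max_depth:
            for tok, p in draft(path):
                # A child never scores higher than its parent (p <= 1), so
                # nodes are popped in non-increasing score order and every
                # selected node's ancestors are already in the tree.
                heapq.heappush(frontier, (neg_p * p, path + (tok,)))
    return tree


def expected_acceptance_length(tree: Dict[Path, float]) -> float:
    # Each node contributes the (assumed) probability that its whole path is accepted.
    return sum(tree.values())


def toy_draft(prefix: Path) -> List[Tuple[int, float]]:
    # Hypothetical draft distribution: two candidates at every position.
    return [(0, 0.6), (1, 0.3)]


if __name__ == "__main__":
    tree = build_tree(toy_draft, n_nodes=4, max_depth=3)
    print(sorted(tree), round(expected_acceptance_length(tree), 3))
```

With two candidates per position (probabilities 0.6 and 0.3), a 4-node budget keeps (0,), (0, 0), (1,) and (0, 0, 0), whose cumulative probabilities 0.6, 0.36, 0.3 and 0.216 give an expected acceptance length of about 1.48 tokens. The resulting tree is deep along the likely branch and shallow elsewhere, which is the kind of shape a fixed tree template cannot adapt to.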

Installation

pip install -r requirements.txt

Demo

With independent draft models

export CUDA_VISIBLE_DEVICES=0  # multiple GPUs are also supported
python demo_opt_classic.py

With EAGLE draft models

export CUDA_VISIBLE_DEVICES=0  # multiple GPUs are also supported
python demo_opt_eagle.py

Evaluation on datasets

With independent draft models

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m evaluation.eval_opt_classic \
		 --draft-model-path JackFram/llama-68m \
		 --base-model-path meta-llama/Llama-2-7b-chat-hf \
		 --bench-name mt_bench \
		 --answer-file ./mt_classic_opt.jsonl \
		 --temperature 0 \
		 --nodes 60 \
		 --threshold 0.5 \
		 --max_depth 10
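The `--nodes`, `--threshold` and `--max_depth` flags can be read against the following sketch. Treat the mapping as an assumption rather than documentation of the evaluation scripts: here `nodes` is the node budget of the draft tree, `max_depth` caps the number of draft layers, and `threshold` is taken to be the minimum gain in expected acceptance length that justifies drafting one more layer. Each node is scored by the product of draft probabilities along its path, used as a proxy for its acceptance probability. `toy_draft`, `build_tree`, `expected_length` and `adaptive_tree` are hypothetical names.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[int, ...]
# Hypothetical draft-model interface: prefix -> [(token, draft probability), ...]
DraftFn = Callable[[Path], List[Tuple[int, float]]]


def build_tree(draft: DraftFn, nodes: int, depth: int) -> Dict[Path, float]:
    """Best-first selection of the `nodes` paths (length <= depth) with the highest cumulative probability."""
    tree: Dict[Path, float] = {}
    frontier = [(-p, (tok,)) for tok, p in draft(())]  # negated scores for a max-heap
    heapq.heapify(frontier)
    while frontier and len(tree) < nodes:
        neg_p, path = heapq.heappop(frontier)
        tree[path] = -neg_p
        if len(path) < depth:
            for tok, p in draft(path):
                heapq.heappush(frontier, (neg_p * p, path + (tok,)))
    return tree


def expected_length(tree: Dict[Path, float]) -> float:
    # Sum of (assumed) acceptance probabilities of all drafted paths.
    return sum(tree.values())


def adaptive_tree(draft: DraftFn, nodes: int, threshold: float, max_depth: int) -> Dict[Path, float]:
    """Deepen the tree one layer at a time while the expected gain is at least `threshold`."""
    best = build_tree(draft, nodes, 1)
    for depth in range(2, max_depth + 1):
        candidate = build_tree(draft, nodes, depth)
        if expected_length(candidate) - expected_length(best) < threshold:
            break  # another draft layer is assumed not to be worth its cost
        best = candidate
    return best


def toy_draft(prefix: Path) -> List[Tuple[int, float]]:
    # Hypothetical draft distribution with two candidates per position.
    return [(0, 0.6), (1, 0.3)]


if __name__ == "__main__":
    tree = adaptive_tree(toy_draft, nodes=4, threshold=0.5, max_depth=10)
    print(max(len(p) for p in tree), round(expected_length(tree), 3))
```

With this toy distribution and a 4-node budget, going from depth 1 to depth 2 raises the expected length from 0.9 to about 1.44 tokens, but depth 3 adds only about 0.04, so the loop stops at depth 2 under `threshold=0.5`. For clarity the sketch rebuilds the tree at each depth; a practical implementation would reuse the layers already drafted.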

With EAGLE draft models

EAGLE draft models can be downloaded from https://github.com/SafeAILab/EAGLE.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m evaluation.eval_opt_eagle \
		 --ea-model-path yuhuili/EAGLE-llama2-chat-7B \
		 --base-model-path meta-llama/Llama-2-7b-chat-hf \
		 --bench-name mt_bench \
		 --answer-file ./mt_eagle_opt.jsonl \
		 --temperature 0 \
		 --nodes 60 \
		 --threshold 0.5 \
		 --max_depth 10
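With `--temperature 0`, verifying a draft tree reduces to greedy matching: starting from the root, a drafted child is accepted if it equals the target model's argmax at its parent, and once no drafted child matches, the target's own token is emitted and the step ends. The sketch below shows that rule in isolation. It assumes the target's greedy next token for every tree node is already available (in tree-based speculative decoding these come from one forward pass with a tree attention mask); `verify_greedy` and `target_argmax` are hypothetical names, not the repository's API.

```python
from typing import Callable, List, Set, Tuple

Path = Tuple[int, ...]


def verify_greedy(tree: Set[Path], target_argmax: Callable[[Path], int]) -> List[int]:
    """Return the tokens emitted in one decoding step under greedy (temperature 0) verification.

    `tree` holds every drafted path (root excluded); `target_argmax(prefix)`
    gives the target model's greedy next token after the accepted prefix.
    """
    accepted: Path = ()
    while True:
        token = target_argmax(accepted)
        child = accepted + (token,)
        if child in tree:
            accepted = child  # drafted token matches the target: accept it
        else:
            # No drafted child matches; the target's own token is emitted,
            # so every step yields at least one token.
            return list(accepted) + [token]


if __name__ == "__main__":
    drafted = {(0,), (0, 0), (0, 0, 0), (1,)}
    greedy = {(): 0, (0,): 0, (0, 0): 1}
    print(verify_greedy(drafted, lambda prefix: greedy.get(prefix, 2)))  # [0, 0, 1]
```

Here two drafted tokens are accepted and the target contributes a third, so the step emits three tokens. Because every emitted token equals the target's greedy choice, the output matches plain greedy decoding with the target model.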

Citation

@article{wang2025opt,
  title={{OPT-Tree}: Speculative Decoding with Adaptive Draft Tree Structure},
  author={Wang, Jikai and Su, Yi and Li, Juntao and Xia, Qingrong and Ye, Zi and Duan, Xinyu and Wang, Zhefeng and Zhang, Min},
  journal={Transactions of the Association for Computational Linguistics},
  volume={13},
  pages={188--199},
  year={2025},
  publisher={MIT Press}
}