OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

May 24, 2025

🎉🎉 OPT-Tree has been accepted by TACL.

Introduction

OPT-Tree

We propose an adaptive and scalable draft tree structure for speculative decoding that supports any autoregressive draft model. With OPT-Tree, more than 10 tokens can be generated in a single decoding step. In the accompanying example image, blue tokens are drafted by llama-2-chat-7b and verified by llama-2-chat-70b in a single decoding step; red tokens are generated by llama-2-chat-70b.
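The idea of an adaptive draft tree can be illustrated with a small, self-contained sketch. It is not the repository's implementation: it assumes that the tree is grown layer by layer, that only the paths with the highest cumulative draft probability are kept, and that the `--nodes`, `--threshold`, and `--max_depth` options used in the evaluation commands correspond to the node budget, the minimum gain in expected accepted length, and the depth limit. The names `build_draft_tree`, `accept_longest_path`, `toy_draft`, and `toy_target` are hypothetical, and verification is shown only for greedy decoding (temperature 0).

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[int, ...]  # token ids from the root (the verified prefix) to a node


def build_draft_tree(
    next_probs: Callable[[Path], Dict[int, float]],
    nodes: int,
    threshold: float,
    max_depth: int,
) -> Tuple[Dict[Path, float], float]:
    """Greedy sketch: keep the `nodes` paths with the highest cumulative probability."""
    tree: Dict[Path, float] = {}  # path -> product of draft probabilities along it
    frontier: List[Path] = [()]   # nodes added in the last round, not yet expanded
    expected = 0.0                # sum of path probabilities (assumed length estimate)
    for _ in range(max_depth):
        candidates = dict(tree)
        for path in frontier:
            parent_p = tree.get(path, 1.0)  # the root has probability 1
            for token, p in next_probs(path).items():
                candidates[path + (token,)] = parent_p * p
        # A child is never more probable than its parent, so the kept set stays a tree.
        best = heapq.nlargest(nodes, candidates.items(), key=lambda kv: kv[1])
        new_tree = dict(best)
        new_expected = sum(new_tree.values())
        frontier = [path for path in new_tree if path not in tree]
        gain = new_expected - expected
        tree, expected = new_tree, new_expected
        if gain < threshold or not frontier:
            break  # another draft step is no longer expected to pay off
    return tree, expected


def accept_longest_path(tree: Dict[Path, float], target_token: Callable[[Path], int]) -> Path:
    """Greedy verification: follow the target model's choices while they stay in the tree."""
    accepted: Path = ()
    while True:
        token = target_token(accepted)
        if accepted + (token,) not in tree:
            return accepted + (token,)  # accepted drafts plus the target's own token
        accepted += (token,)


def toy_draft(path: Path) -> Dict[int, float]:
    # Hypothetical draft model: the same distribution after every prefix.
    return {0: 0.6, 1: 0.3, 2: 0.1}


def toy_target(path: Path) -> int:
    # Hypothetical target model: always picks token 0.
    return 0


tree, expected = build_draft_tree(toy_draft, nodes=4, threshold=0.05, max_depth=3)
print(sorted(tree), round(expected, 3))
print(accept_longest_path(tree, toy_target))
```

In a real system the target model would score all tree nodes in one forward pass rather than being queried per prefix; the loop above only shows which path would be accepted.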

Installation

pip install -r requirements.txt

Demo

With independent draft models

export CUDA_VISIBLE_DEVICES=0  # Multiple GPUs are also supported
python demo_opt_classic.py

With EAGLE draft models

export CUDA_VISIBLE_DEVICES=0  # Multiple GPUs are also supported
python demo_opt_eagle.py

Evaluation on datasets

With independent draft models

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path meta-llama/Llama-2-7b-chat-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10

With EAGLE draft models

EAGLE draft models can be downloaded from https://github.com/SafeAILab/EAGLE.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m evaluation.eval_opt_eagle \
    --ea-model-path yuhuili/EAGLE-llama2-chat-7B \
    --base-model-path meta-llama/Llama-2-7b-chat-hf \
    --bench-name mt_bench \
    --answer-file ./mt_eagle_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10
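To compare several tree sizes, the evaluation commands can be generated programmatically. The sketch below only assembles and prints command lines from the flags shown above; it does not run anything. The helper name `build_eval_command`, the swept `--nodes` value of 30, and the per-run answer file names are illustrative assumptions, not part of the repository.

```python
import shlex
from typing import List


def build_eval_command(
    module: str,
    model_flag: str,
    model_path: str,
    answer_file: str,
    nodes: int,
    threshold: float = 0.5,
    max_depth: int = 10,
    base_model_path: str = "meta-llama/Llama-2-7b-chat-hf",
    bench_name: str = "mt_bench",
    temperature: float = 0,
) -> List[str]:
    """Assemble one evaluation command using only the flags shown in this README."""
    return [
        "python", "-m", module,
        model_flag, model_path,
        "--base-model-path", base_model_path,
        "--bench-name", bench_name,
        "--answer-file", answer_file,
        "--temperature", str(temperature),
        "--nodes", str(nodes),
        "--threshold", str(threshold),
        "--max_depth", str(max_depth),
    ]


# Hypothetical sweep over the node budget; each run writes to its own answer file.
for nodes in (30, 60):
    command = build_eval_command(
        module="evaluation.eval_opt_eagle",
        model_flag="--ea-model-path",
        model_path="yuhuili/EAGLE-llama2-chat-7B",
        answer_file=f"./mt_eagle_opt_n{nodes}.jsonl",
        nodes=nodes,
    )
    print(shlex.join(command))
```

Each printed line can be pasted into a shell after setting `CUDA_VISIBLE_DEVICES`, or the list can be passed to `subprocess.run` directly.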

Citation

@article{wang2025opt,
  title={{OPT-Tree}: Speculative Decoding with Adaptive Draft Tree Structure},
  author={Wang, Jikai and Su, Yi and Li, Juntao and Xia, Qingrong and Ye, Zi and Duan, Xinyu and Wang, Zhefeng and Zhang, Min},
  journal={Transactions of the Association for Computational Linguistics},
  volume={13},
  pages={188--199},
  year={2025},
  publisher={MIT Press 255 Main Street, 9th Floor, Cambridge, Massachusetts 02142, USA~…}
}