README.md
January 29, 2026
This is the PyTorch implementation of our ICLR'26 paper: Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling.

Abstract

SpecTTS-Bench currently supports evaluation of the following open-source speculative decoding methods: EAGLE-3, Speculative Sampling, Prompt Lookup Decoding, TokenRecycling, REST, Lookahead Decoding, PIA, SAM-Decoding, and SAM[EAGLE-3].
Requirement
Install the necessary packages.
conda create -n specTTS python=3.10
torch==2.1.1+cu121
transformers==4.43.1 # for DeepSeek-R1-Distill-Llama-8B
transformers==4.53.1 # for Qwen3 series
More details about the environment are provided in ./code/environment_deepseek.txt and ./code/environment_qwen3.txt.
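Because the two target-model families need different transformers versions, it can help to confirm which environment is active before launching a run. The following is a minimal sketch, not part of the repository: the helper name `check_environment` and the `EXPECTED` table are hypothetical, and it assumes the torch pin listed above applies to both environments. The environment files remain the authoritative source.

```python
from importlib import metadata

# Version pins copied from the Requirement section of this README.
# Assumption: the torch pin applies to both environments.
EXPECTED = {
    "deepseek": {"torch": "2.1.1+cu121", "transformers": "4.43.1"},
    "qwen3": {"torch": "2.1.1+cu121", "transformers": "4.53.1"},
}


def check_environment(target, installed=None):
    """Return a list of mismatch messages (empty if every pin matches).

    `installed` may be a dict of {package: version} for offline checks;
    when omitted, versions are read from the active Python environment.
    """
    problems = []
    for package, wanted in EXPECTED[target].items():
        if installed is not None:
            found = installed.get(package)
        else:
            try:
                found = metadata.version(package)
            except metadata.PackageNotFoundError:
                found = None  # package is not installed at all
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_environment("deepseek")
    for line in issues or ["environment matches the README pins"]:
        print(line)
```

Pass `"qwen3"` instead of `"deepseek"` when preparing to run the Qwen3 script.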
Code Structure
SpecTTS-Bench/
├── code/ # Core implementation directory
│ ├── scripts/ # Shell scripts to execute the benchmarks
│ │ ├── deepseek.sh # 🚀 Run here: Script for DeepSeek-R1-Distill-Llama-8B
│ │ └── qwen3.sh # 🚀 Run here: Script for Qwen3 series
│ ├── model/ # Speculative decoding methods
│ ├── evaluation/ # Launching inference with speculative decoding
│ ├── data/ # Reasoning Dataset
│ ├── environment_deepseek.txt # Python dependency requirements for DeepSeek-R1-Distill-Llama-8B
│ └── environment_qwen3.txt # Python dependency requirements for Qwen3 series
├── fig/ # Figures and images for the README/Paper
├── LICENSE # MIT License
└── README.md # Main project documentation
Run
cd code
bash scripts/deepseek.sh # bash scripts/qwen3.sh
We provide the checkpoints for REST here.
Model Weight
Download the corresponding model weights (if required) and update the checkpoint paths in the scripts under code/scripts/.
| Model | Type | URL |
|---|---|---|
| DeepSeek-R1-Distill-Llama-8B | Target Model | Link |
| Qwen3-4B | Target Model | Link |
| Qwen3-8B | Target Model | Link |
| Qwen3-14B | Target Model | Link |
| Qwen3-0.6B | Draft Model | Link |
| EAGLE3-DeepSeek-R1-Distill-LLaMA-8B | Draft Model | Link |
| EAGLE3-Qwen3-4B | Draft Model | Link |
| EAGLE3-Qwen3-8B | Draft Model | Link |
| EAGLE3-Qwen3-14B | Draft Model | Link |
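Before editing the scripts, a quick check that each downloaded checkpoint directory looks complete can save a failed run. The sketch below is illustrative only: `missing_checkpoints` is a hypothetical helper, the example paths are placeholders, and it assumes the checkpoints follow the usual Hugging Face layout with a `config.json` at the top level of each directory.

```python
from pathlib import Path


def missing_checkpoints(paths):
    """Return the names whose directory is absent or has no config.json.

    `paths` maps a label (e.g. "target", "draft") to a local directory.
    Assumption: each checkpoint directory contains a top-level config.json.
    """
    missing = []
    for name, directory in paths.items():
        if not (Path(directory) / "config.json").is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder paths; replace them with the locations used in code/scripts/.
    checkpoints = {
        "target": "/path/to/DeepSeek-R1-Distill-Llama-8B",
        "draft": "/path/to/EAGLE3-DeepSeek-R1-Distill-LLaMA-8B",
    }
    absent = missing_checkpoints(checkpoints)
    print("missing:", ", ".join(absent) if absent else "none")
```

The check only confirms that a directory looks like a checkpoint. It does not verify that the weights are complete or that a draft model matches its target.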
Acknowledgment of Open-Source Code Contributions
The code builds on the open-source repositories Spec-Bench, EAGLE, and Medusa. Many thanks to the authors!
You are welcome to cite our paper:
@inproceedings{SunLi25,
title={Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling},
author={Shengyin Sun and Yiming Li and Xing Li and Yingzhao Lian and Weizhe Lin and Hui-Ling Zhen and Zhiyuan Yang and Chen Chen and Xianzhi Yu and Mingxuan Yuan and Chen Ma},
booktitle={arXiv:2509.04474},
year={2025}
}