October 13, 2025

DREAM

Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

An open-source framework to accelerate Vision Language Model (VLM) inference by up to 3x with no quality loss.

🔥 Our work has been accepted to NeurIPS 2025! The paper is now available on arXiv. ✨

🚀 Overview

DREAM is a framework that accelerates inference for Vision Language Models (VLMs) such as LLaVA. Using speculative decoding, it achieves up to a 3x speedup over standard autoregressive decoding without compromising the quality of the output.

At its core, DREAM combines drafting with refined target features and entropy-adaptive cross-attention fusion. A draft model proposes several tokens at a time, and the target model verifies them in parallel, so more than one token can be accepted per verification step.
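
The draft-then-verify idea can be sketched with a toy greedy example. This is a minimal illustration of generic speculative decoding, not DREAM's implementation: `toy_target`, `toy_draft`, `speculative_step`, and `generate` are hypothetical names, and the "models" are plain arithmetic functions over token IDs.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next greedy token


def toy_target(prefix: List[int]) -> int:
    """Stand-in for the large target model (hypothetical)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target most of the time."""
    tok = toy_target(prefix)
    return tok if len(prefix) % 3 else (tok + 1) % 50


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the tokens to append."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposed position. A real system scores all
    #    k positions in one batched forward pass; this toy loops instead.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]
```

Every token kept by `speculative_step` is the token the target would have chosen greedily, so `generate(toy_target, toy_draft, [1, 2, 3], 20)` returns the same sequence as decoding with `toy_target` alone.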

✨ Key Features

High-Performance Inference: Up to 3x faster inference for Vision Language Models (VLMs) compared to standard autoregressive decoding.
Zero Quality Loss: Maintains the same output distribution as the original model.
Multimodal Support: Fully compatible with multimodal models like LLaVA.
Efficient Training: Includes scripts for training the auto-regression head using DeepSpeed.
Interactive Web UI: Comes with a Gradio-based web interface for easy testing and demonstration.
Comprehensive Tooling: Provides scripts for training data generation and performance evaluation.
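
The "same output distribution" property is what the standard speculative sampling acceptance rule provides. The sketch below shows that generic rule, assuming a draft distribution `q` and a target distribution `p` over a small vocabulary; the source does not spell out DREAM's verification step, and all function names here are hypothetical.

```python
import random
from typing import List


def residual(p: List[float], q: List[float]) -> List[float]:
    """Normalized max(p - q, 0): the distribution to resample from on rejection."""
    diff = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(diff)
    # If p == q the draft is never rejected, so this fallback is never sampled.
    return [d / total for d in diff] if total > 0 else list(p)


def verify_token(p: List[float], q: List[float], rng: random.Random) -> int:
    """Return one token distributed according to p, using a proposal drawn from q."""
    x = rng.choices(range(len(q)), weights=q)[0]   # draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):       # accept with prob min(1, p/q)
        return x
    return rng.choices(range(len(p)), weights=residual(p, q))[0]


def output_distribution(p: List[float], q: List[float]) -> List[float]:
    """Exact distribution of verify_token's result, computed analytically."""
    accept = [min(pi, qi) for pi, qi in zip(p, q)]  # q(x) * min(1, p(x)/q(x))
    reject_mass = 1.0 - sum(accept)
    res = residual(p, q)
    return [a + reject_mass * r for a, r in zip(accept, res)]
```

For example, with `p = [0.5, 0.3, 0.2]` and `q = [0.2, 0.5, 0.3]`, `output_distribution(p, q)` recovers `p` up to floating-point error, even though every proposal came from `q`.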

🎥 Demo

[Demo: side-by-side comparison of Vanilla decoding and DREAM]

🛠️ Setup & Installation

Clone the repository:

```
git clone https://github.com/SAI-Lab-NYU/DREAM.git
cd DREAM
```

Install dependencies: We recommend creating a virtual environment first.
```
pip install -e .
```
Note: `-e` installs the project in editable mode.

Download model weights: See the Model Weights section below for links to the available models.

⚡ Quick Start

1. Inference with Web UI

Run our Gradio-based web interface for an interactive experience. The command automatically handles model allocation across multiple GPUs.

```
python -m dream.application.webui \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]
```

[PATH_TO_DREAM_WEIGHTS]: Path to the downloaded DREAM weights (e.g., ./DREAM-llava-v1.6-vicuna-7b).
[PATH_TO_BASE_MODEL]: Path to the original base model weights (see the Base Model column in the Model Weights section).
total-token (optional): Number of draft tokens. Adjust this for your hardware to get the best performance, or set it to -1 for auto-configuration.

Once the model is loaded, a URL will be displayed in the terminal. Open it in your browser to use the interface.

2. Training the Auto-regression Head

First, generate the necessary training data (see ./ge_data for detailed instructions and generation scripts):

```
python -m dream.ge_data.allocation_mix665
```

Then, use the following DeepSpeed command to start training:

```
cd dream/train
deepspeed main_deepspeed.py \
    --deepspeed_config ./ds_config.json \
    --tmpdir [PATH_TO_TRAINING_DATA] \
    --cpdir [PATH_TO_SAVE_CHECKPOINTS] \
    --configpath ./vicuna_7B_config.json
```

3. Evaluation

Test the inference speed of DREAM on benchmarks like MT-Bench.

```
python -m dream.evaluation.eval_llava \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]
```

This will generate a .jsonl file containing the generation results and wall time.
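
To turn such a file into a throughput number, something like the following could be used. This is a sketch under assumptions: the field names `wall_time` and `new_tokens` are hypothetical and assume one flat JSON object per line, so inspect your own .jsonl output and adjust the keys (and any nesting) to match.

```python
import json
from typing import Iterable

# Hypothetical field names: check the keys in your own .jsonl output.
TIME_KEY = "wall_time"      # assumed: seconds spent generating one answer
TOKENS_KEY = "new_tokens"   # assumed: number of tokens generated for that answer


def tokens_per_second(lines: Iterable[str]) -> float:
    """Aggregate throughput over all records in a .jsonl stream."""
    total_tokens = 0
    total_time = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total_tokens += record[TOKENS_KEY]
        total_time += record[TIME_KEY]
    return total_tokens / total_time


def speedup(dream_lines: Iterable[str], baseline_lines: Iterable[str]) -> float:
    """Ratio of DREAM throughput to baseline throughput."""
    return tokens_per_second(dream_lines) / tokens_per_second(baseline_lines)
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `with open(path) as f: tokens_per_second(f)`.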

📦 Model Weights

| Model | Base Model | Download |
|---|---|---|
| `DREAM-llava-v1.6-vicuna-7b` | `vicuna-7b-v1.6` | 🤗 HideonBed12138/DREAM-llava-v1.6-vicuna-7b |
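
If the Download entry above is a Hugging Face Hub repository ID (an assumption based on the 🤗 marker), the weights could be fetched with `huggingface_hub.snapshot_download`. The helper name `download_dream_weights` and the default target directory are illustrative choices, not part of the DREAM codebase.

```python
from pathlib import Path

# Repository ID taken from the Model Weights table; assumed to live on the Hugging Face Hub.
REPO_ID = "HideonBed12138/DREAM-llava-v1.6-vicuna-7b"


def download_dream_weights(local_dir: str = "./DREAM-llava-v1.6-vicuna-7b") -> Path:
    """Download the DREAM weights into local_dir and return the resulting path."""
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

The returned path can then be passed as `--ea-model-path` in the commands from the Quick Start section.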

📄 Citation

If you find our work useful for your research, please consider citing our paper:

```
@misc{hu2025dreamdraftingrefinedtarget,
  title={DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding},
  author={Yunhai Hu and Tianhua Xia and Zining Liu and Rahul Raman and Xingyu Liu and Bo Bao and Eric Sather and Vithursan Thangarasa and Sai Qian Zhang},
  year={2025},
  eprint={2505.19201},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19201},
}
```

🙏 Acknowledgements

This project is built upon the incredible work of the open-source community. We are especially grateful to the developers of Medusa, EAGLE, and FastChat.

📜 License

DREAM is licensed under the Apache 2.0 License.