October 13, 2025

DREAM

Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

An open-source framework to accelerate Vision Language Model (VLM) inference by up to 3x with no quality loss.

🔥 Our work has been accepted to NeurIPS 2025! The paper is now available on arXiv. ✨

🚀 Overview

DREAM is a framework for accelerating inference in Vision Language Models (VLMs) such as LLaVA. It uses speculative decoding to achieve up to a 3x speedup over standard autoregressive decoding without compromising output quality.

The core of DREAM is the approach named in its title: drafting with refined target features and entropy-adaptive cross-attention fusion for multimodal speculative decoding. Several tokens are drafted at a time and then verified together by the target model, so each expensive forward pass can yield more than one token.
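
The draft-and-verify loop behind speculative decoding can be sketched in a few lines of Python. This is a toy greedy version under stated assumptions, not DREAM's implementation: `target` and `draft` are hypothetical callables that map a token prefix to the next token, and `speculative_greedy_decode` and `draft_len` are names chosen for illustration. A real system verifies all drafted positions in one batched forward pass of the target model; the sketch calls the target once per position for clarity.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy_decode(target: NextToken, draft: NextToken,
                              prompt: List[int], max_new_tokens: int,
                              draft_len: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding (toy sketch)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. Draft: the cheap model proposes up to draft_len tokens.
        proposal: List[int] = []
        for _ in range(min(draft_len, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. Verify: compare each drafted token with the target's own choice.
        for i, tok in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every drafted token matched the target.
            tokens.extend(proposal)
    return tokens


# Toy models: the target counts upward mod 10; the draft is wrong after 2, 5, 8.
target = lambda prefix: (prefix[-1] + 1) % 10
draft = lambda prefix: 0 if prefix[-1] % 3 == 2 else (prefix[-1] + 1) % 10

print(speculative_greedy_decode(target, draft, [1], max_new_tokens=12))
```

Every emitted token is one the target model would have chosen itself, so the result equals plain greedy decoding of the target. The draft only determines how many positions can be confirmed per verification step.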

✨ Key Features

  • High-Performance Inference: Up to 3x faster inference for Vision Language Models (VLMs) compared to standard methods.
  • Zero Quality Loss: Maintains the same output distribution as the original model.
  • Multimodal Support: Fully compatible with multimodal models like LLaVA.
  • Efficient Training: Includes scripts for training the auto-regression head using DeepSpeed.
  • Interactive Web UI: Comes with a Gradio-based web interface for easy testing and demonstration.
  • Comprehensive Tooling: Provides scripts for training data generation and performance evaluation.
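
The README does not spell out the verification rule behind the "same output distribution" claim. As an illustration, the sketch below uses the accept/reject step commonly used in speculative sampling: a drafted token x is accepted with probability min(1, p(x)/q(x)), and on rejection a token is resampled from the renormalized residual max(0, p - q). Whether DREAM uses exactly this rule is an assumption. The function name and the example distributions are hypothetical. The code computes the exact output distribution of one step and shows that it equals the target distribution p.

```python
from typing import List


def speculative_output_distribution(p: List[float], q: List[float]) -> List[float]:
    """Exact distribution of the token emitted by one accept/reject step.

    p: target model distribution, q: draft model distribution (same vocabulary).
    """
    # P(x drafted and accepted) = q[x] * min(1, p[x] / q[x]) = min(p[x], q[x]).
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_prob = 1.0 - sum(accepted)
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    norm = sum(residual)
    if norm == 0.0:
        # p and q are identical, so nothing is ever rejected.
        return accepted
    return [a + reject_prob * r / norm for a, r in zip(accepted, residual)]


# Made-up distributions over a three-token vocabulary, for illustration only.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(speculative_output_distribution(p, q))
```

The printed distribution matches p up to floating-point rounding, even though q is a poor approximation of it. A worse draft lowers the acceptance rate and therefore the speedup, not the output quality.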

🎥 Demo

Demo recordings: Vanilla (baseline decoding) and DREAM.

🛠️ Setup & Installation

  1. Clone the repository:

    git clone https://github.com/SAI-Lab-NYU/DREAM.git
    cd DREAM
    
  2. Install dependencies: We recommend creating a virtual environment first.

    pip install -e .
    

    Note: -e installs the project in editable mode.

  3. Download Model Weights: See the Model Weights section below for links to the available models.

⚡ Quick Start

1. Inference with Web UI

Run our Gradio-based web interface for an interactive experience. The command automatically handles model allocation across multiple GPUs.

    python -m dream.application.webui \
        --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
        --base-model-path [PATH_TO_BASE_MODEL]

  • [PATH_TO_DREAM_WEIGHTS]: Path to the downloaded DREAM weights (e.g., ./DREAM-llava-v1.6-vicuna-7b).
  • [PATH_TO_BASE_MODEL]: Path to the original base model weights (e.g., the original vicuna-7b-v1.3).
  • total-token: Number of draft tokens. Adjust this based on your hardware for optimal performance. Set to -1 for auto-configuration.

Once the model is loaded, a URL will be displayed in the terminal.
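
If you prefer to launch the web UI from a script, the command can be assembled programmatically. The following is a minimal sketch: `build_webui_command` is a hypothetical helper, and it assumes the total-token parameter is passed as a `--total-token` flag, which the command shown in this README does not confirm.

```python
import shlex
import sys
from typing import List


def build_webui_command(ea_model_path: str, base_model_path: str,
                        total_token: int = -1) -> List[str]:
    """Assemble the argv list for the web UI launch command (hypothetical helper)."""
    return [
        sys.executable, "-m", "dream.application.webui",
        "--ea-model-path", ea_model_path,
        "--base-model-path", base_model_path,
        # Assumption: total-token is exposed as `--total-token`; -1 requests
        # auto-configuration as described in the parameter list.
        "--total-token", str(total_token),
    ]


cmd = build_webui_command("./DREAM-llava-v1.6-vicuna-7b", "[PATH_TO_BASE_MODEL]")
print(shlex.join(cmd))
# To actually launch the UI: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when the model paths contain spaces.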

2. Training the Auto-regression Head

First, generate the necessary training data (see ./ge_data for detailed instructions and generation scripts):

    python -m dream.ge_data.allocation_mix665

Then, use the following DeepSpeed command to start training:

    cd dream/train
    deepspeed main_deepspeed.py \
        --deepspeed_config ./ds_config.json \
        --tmpdir [PATH_TO_TRAINING_DATA] \
        --cpdir [PATH_TO_SAVE_CHECKPOINTS] \
        --configpath ./vicuna_7B_config.json

3. Evaluation

Measure DREAM's inference speed on benchmarks such as MT-Bench.

    python -m dream.evaluation.eval_llava \
        --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
        --base-model-path [PATH_TO_BASE_MODEL]

This will generate a .jsonl file containing the generation results and wall time.
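
To turn the results file into a throughput number, you can aggregate generated tokens and wall time across records. The schema of the .jsonl file is not documented here, so the field names `new_tokens` and `wall_time` below are assumptions, and `tokens_per_second` is a hypothetical helper; adjust the keys to match the actual output.

```python
import json
from typing import Iterable


def tokens_per_second(lines: Iterable[str]) -> float:
    """Aggregate throughput over JSON Lines records (hypothetical schema)."""
    total_tokens = 0
    total_time = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumed keys: generated token count and elapsed seconds per record.
        total_tokens += record["new_tokens"]
        total_time += record["wall_time"]
    if total_time == 0.0:
        raise ValueError("no wall time recorded")
    return total_tokens / total_time


# Synthetic records for illustration only; these are not benchmark results.
sample = [
    '{"new_tokens": 100, "wall_time": 2.0}',
    '{"new_tokens": 50, "wall_time": 1.0}',
]
print(tokens_per_second(sample))
```

With a real file you would pass an open file handle, for example `tokens_per_second(open(path))`. Dividing the DREAM throughput by the throughput of a baseline run on the same prompts gives the speedup ratio.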

📦 Model Weights

| Model | Base Model | Download |
| --- | --- | --- |
| DREAM-llava-v1.6-vicuna-7b | vicuna-7b-v1.6 | 🤗 HideonBed12138/DREAM-llava-v1.6-vicuna-7b |

📄 Citation

If you find our work useful for your research, please consider citing our paper:

@misc{hu2025dreamdraftingrefinedtarget,
  title={DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding},
  author={Yunhai Hu and Tianhua Xia and Zining Liu and Rahul Raman and Xingyu Liu and Bo Bao and Eric Sather and Vithursan Thangarasa and Sai Qian Zhang},
  year={2025},
  eprint={2505.19201},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19201},
}

🙏 Acknowledgements

This project is built upon the incredible work of the open-source community. We are especially grateful to the developers of Medusa, EAGLE, and FastChat.

📜 License

DREAM is licensed under the Apache 2.0 License.