Draft Attention
May 22, 2025
This repository provides an overview of all resources for the paper "DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance".
Draft Attention is a plug-and-play acceleration method for video diffusion transformers.
Draft Attention reshapes the long query and key sequences into frame-wise feature maps and applies 2D average pooling to downsample them.
The attention computed on these low-resolution maps serves as the reference for sparse attention over the full-length sequence.
Draft Attention introduces minimal overhead because it compresses the number of tokens by 128x or more.
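The pooling and guidance steps described above can be sketched in NumPy as follows. This is a minimal illustration and not the repository's implementation: the function names, the frame-major token ordering, and the per-row top-k selection rule are all assumptions made for the example.

```python
import numpy as np

def avg_pool_tokens(x, frames, h, w, pool_h, pool_w):
    """Average-pool a (frames*h*w, dim) token sequence over pool_h x pool_w windows.

    Assumes tokens are ordered frame by frame, then row by row (hypothetical layout).
    """
    dim = x.shape[-1]
    # Reshape the flat sequence into frame-wise feature maps split into windows.
    x = x.reshape(frames, h // pool_h, pool_h, w // pool_w, pool_w, dim)
    # Average inside each window, then flatten to one token per window.
    return x.mean(axis=(2, 4)).reshape(-1, dim)

def draft_block_mask(q, k, frames, h, w, pool_h, pool_w, sparsity_ratio):
    """Return a boolean (n_draft, n_draft) mask of the highest-scoring draft entries."""
    q_low = avg_pool_tokens(q, frames, h, w, pool_h, pool_w)
    k_low = avg_pool_tokens(k, frames, h, w, pool_h, pool_w)
    # Low-resolution ("draft") attention scores between pooled queries and keys.
    scores = q_low @ k_low.T / np.sqrt(q.shape[-1])
    # Keep the top (1 - sparsity_ratio) fraction of pooled keys per pooled query
    # (an assumed selection rule, chosen for simplicity).
    keep = max(1, int(round(scores.shape[1] * (1.0 - sparsity_ratio))))
    top = np.argsort(-scores, axis=1)[:, :keep]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

rng = np.random.default_rng(0)
frames, h, w, dim = 2, 8, 16, 4
q = rng.standard_normal((frames * h * w, dim))
k = rng.standard_normal((frames * h * w, dim))
mask = draft_block_mask(q, k, frames, h, w, pool_h=4, pool_w=8, sparsity_ratio=0.75)
print(mask.shape, mask.sum(axis=1))
```

In this toy setting, 256 tokens are pooled down to 8 draft tokens, and each draft query keeps 2 of the 8 draft keys. Each kept entry would then correspond to one block of the full-length attention map.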
News
- [2025/05] We support HunyuanCustom with classifier-free guidance.
Demo
Hunyuan
Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)
Prompt:
"The banks of the Thames, as the camera moves vertically from low to high."
Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)
Prompt:
"On the green grass, the white-walled Leaning Tower of Pisa stands tall. The camera moves vertically from top to bottom during filming."
Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)
Prompt:
"A blue long dress fell from the balcony clothes rack and dropped into the water on the ground."
All prompts are from the Penguin Video Benchmark.
Videos are generated with 90% sparsity and seed 42, using the Hunyuan model at 768p on an A100 GPU.
HunyuanCustom
Input Image | Dense Attention | Draft Attention (Ours)
Prompt:
"Realistic, High-quality. A woman is drinking coffee at a café."
Videos are generated with seed 42 in 768p resolution on 8xA100 GPUs, with either dense attention or 90% sparse attention.
Quick Start
Model Preparation
Follow the environment setup instructions and download the checkpoints from HunyuanVideo, Wan2.1, and HunyuanCustom.
Sparse Attention
We mainly adopt block sparse attention for draft attention.
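To make the semantics of a block sparse mask concrete, here is a dense NumPy reference that attends only where a block-level mask is True. It is a sketch for illustration: the function name and arguments are hypothetical, and a real block sparse kernel skips the masked blocks entirely instead of computing and discarding them as this reference does.

```python
import numpy as np

def block_sparse_attention_reference(q, k, v, block_mask, block_size):
    """Dense reference for block sparse attention (illustrative only).

    block_mask[i, j] is True when query block i may attend to key block j.
    Every row of block_mask must keep at least one block.
    """
    # Expand each block-level entry to block_size x block_size token entries.
    token_mask = np.repeat(np.repeat(block_mask, block_size, axis=0), block_size, axis=1)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Masked positions get -inf so they receive zero weight after softmax.
    scores = np.where(token_mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
block_mask = np.eye(2, dtype=bool)  # each 4-token block attends only to itself
out = block_sparse_attention_reference(q, k, v, block_mask, block_size=4)
print(out.shape)
```

With the diagonal mask used here, the output for the first four tokens depends only on the first four keys and values, which is the saving a block sparse kernel exploits.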
Video Generation
To generate videos, run the scripts in hunyuan/, wan/, or hunyuan_custom/.
The evaluation results in the paper were mainly obtained with VBench on the Penguin Video Benchmark, using HunyuanVideo and Wan2.1.
Use in Your Own Code
You can use draft attention in much the same way as flash attention, through Draft_Attention, which is defined in draft_attention.py and draft_attention_classifier_free_guidance.py.
Here is an example for the Hunyuan model:
from draft_attention import Draft_Attention

draft_attention = Draft_Attention(
    pool_h=8,
    pool_w=16,
    latent_h=48,
    latent_w=80,
    visual_len=126_720,
    text_len=256,
    sparsity_ratio=0.9,
)

x = draft_attention(
    q,
    k,
    v,
    attn_mask=attn_mask,
    causal=causal,
    drop_rate=drop_rate,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_kv=cu_seqlens_kv,
    max_seqlen_q=max_seqlen_q,
    max_seqlen_kv=max_seqlen_kv,
    batch_size=batch_size,
)
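The numbers in the example above fit together arithmetically. The short check below assumes that visual_len counts frames x latent_h x latent_w tokens and that pool_h x pool_w is the pooling window. Both readings are inferred from the parameter names and are not stated in this README.

```python
# Values copied from the Hunyuan example above.
pool_h, pool_w = 8, 16
latent_h, latent_w = 48, 80
visual_len = 126_720

tokens_per_frame = latent_h * latent_w    # tokens in one latent frame
frames = visual_len // tokens_per_frame   # latent frames (assumed interpretation)
compression = pool_h * pool_w             # tokens merged into one pooled token
draft_len = frames * (latent_h // pool_h) * (latent_w // pool_w)

# The visual sequence splits evenly into frames and into pooling windows.
assert visual_len == frames * tokens_per_frame
assert draft_len * compression == visual_len
print(frames, compression, draft_len)
```

Under these assumptions the sequence covers 33 latent frames of 3,840 tokens each, and pooling reduces 126,720 visual tokens to 990 draft tokens, which matches the 128x compression mentioned at the top.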
TODO
- Support any-resolution video generation with padding.
- Support reordering of further block sparse grouping for faster hardware execution.
Acknowledgement
This work was mainly contributed by Xuan and Chenxia.
BibTeX
If you find Draft Attention interesting, please cite it using the following BibTeX entry:
@article{shen2025draft,
  title={DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance},
  author={Shen, Xuan and Han, Chenxia and Zhou, Yufa and Xie, Yanyue and Gong, Yifan and Wang, Quanyi and Wang, Yiwei and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
  journal={arXiv preprint arXiv:2505.14708},
  year={2025}
}











