Draft Attention

May 22, 2025

This repository provides an overview of all resources for the paper "DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance".

Draft Attention is a plug-and-play acceleration method for video diffusion transformers.

Draft Attention reshapes long queries and keys into frame-wise feature maps and applies 2D average pooling to downsample them.
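The reshape-and-pool step can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation: the function name `pool_tokens`, the frame-major and row-major token ordering, and the small toy dimensions are all assumptions made for the example.

```python
import numpy as np

def pool_tokens(x, frames, latent_h, latent_w, pool_h, pool_w):
    """Average-pool a flat token sequence frame by frame.

    x has shape (frames * latent_h * latent_w, dim). Tokens are assumed to be
    ordered frame-major, then row-major within each frame (an assumption of
    this sketch, not something the README states).
    """
    dim = x.shape[-1]
    out_h, out_w = latent_h // pool_h, latent_w // pool_w
    # View every frame as a grid of (pool_h x pool_w) tiles.
    tiles = x.reshape(frames, out_h, pool_h, out_w, pool_w, dim)
    # Average over the two within-tile axes: one pooled token per tile.
    pooled = tiles.mean(axis=(2, 4))
    return pooled.reshape(frames * out_h * out_w, dim)

rng = np.random.default_rng(0)
frames, latent_h, latent_w, dim = 2, 8, 16, 4   # toy sizes for illustration
q = rng.standard_normal((frames * latent_h * latent_w, dim))
q_low = pool_tokens(q, frames, latent_h, latent_w, pool_h=4, pool_w=8)
print(q.shape, "->", q_low.shape)  # (256, 4) -> (8, 4)
```

The same function would be applied to the keys, so that queries and keys are downsampled on the same grid.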

The low-resolution draft attention map computed from the pooled queries and keys serves as the reference for sparse attention at full sequence length.
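One way a low-resolution attention map could guide full-length sparse attention is sketched below. The per-row top-k selection rule, the helper name `draft_block_mask`, and the assumption that each pooled token corresponds to a contiguous block of full-resolution tokens are illustrative choices; the paper's actual selection and reordering scheme may differ.

```python
import numpy as np

def draft_block_mask(q_low, k_low, sparsity):
    """Keep the highest-scoring (1 - sparsity) fraction of key blocks per query block."""
    n_q, n_k = q_low.shape[0], k_low.shape[0]
    # Low-resolution ("draft") attention scores between pooled queries and keys.
    scores = q_low @ k_low.T / np.sqrt(q_low.shape[-1])
    keep = max(1, int(round((1.0 - sparsity) * n_k)))
    # Column indices of the `keep` largest scores in every row.
    top = np.argpartition(scores, n_k - keep, axis=1)[:, n_k - keep:]
    mask = np.zeros((n_q, n_k), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

rng = np.random.default_rng(0)
n_low, dim, block = 10, 16, 128   # toy sizes; block = tokens per pooled token
q_low = rng.standard_normal((n_low, dim))
k_low = rng.standard_normal((n_low, dim))

mask = draft_block_mask(q_low, k_low, sparsity=0.9)
# Expand each low-resolution entry to a (block x block) tile of the full mask,
# assuming tokens of the same tile are stored contiguously.
full_mask = mask.repeat(block, axis=0).repeat(block, axis=1)
print(mask.shape, "->", full_mask.shape)
```

Because the mask is chosen at block granularity, the full-length attention can skip whole tiles rather than individual token pairs, which is the layout block-sparse kernels typically expect.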

Draft Attention introduces minimal overhead because it compresses the number of tokens by a factor of 128x or more.

🔥 News

[2025/05] We support HunyuanCustom with classifier-free guidance.

🎥 Demo

Hunyuan

Each comparison shows three videos: Dense Attention, Sparse Video Generation (SVG), and Draft Attention (Ours).

Prompt: "The banks of the Thames, as the camera moves vertically from low to high."

Prompt: "On the green grass, the white-walled Leaning Tower of Pisa stands tall. The camera moves vertically from top to bottom during filming."

Prompt: "A blue long dress fell from the balcony clothes rack and dropped into the water on the ground."

Prompts are all from the Penguin Video Benchmark.

Videos are generated with 90% sparsity and seed 42, using the Hunyuan model at 768p on an A100 GPU.

HunyuanCustom

The comparison shows the input image, followed by videos from Dense Attention and Draft Attention (Ours).

Prompt: "Realistic, High-quality. A woman is drinking coffee at a café."

Videos are generated with seed 42 at 768p resolution on 8xA100 GPUs, using either dense attention or 90% sparse attention.

🚀 Quick Start

Use for Your Own

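from draft_attention import Draft_Attention

# Configure the module once for a given resolution and sequence layout.
draft_attention = Draft_Attention(
    pool_h=8,             # pooling window height, in latent tokens
    pool_w=16,            # pooling window width, in latent tokens
    latent_h=48,          # latent feature-map height per frame
    latent_w=80,          # latent feature-map width per frame
    visual_len=126_720,   # number of visual tokens in the sequence
    text_len=256,         # number of text tokens in the sequence
    sparsity_ratio=0.9,   # 0.9 corresponds to 90% sparsity
)

# q, k, v and the remaining arguments are the ones your model already
# passes to its attention call; they are not defined in this snippet.
x = draft_attention(
    q,
    k,
    v,
    attn_mask=attn_mask,
    causal=causal,
    drop_rate=drop_rate,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_kv=cu_seqlens_kv,
    max_seqlen_q=max_seqlen_q,
    max_seqlen_kv=max_seqlen_kv,
    batch_size=batch_size,
)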
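The numbers in the configuration above fit together arithmetically, which is a useful sanity check when adapting it to another resolution. The short script below works through them. Reading `visual_len` as frames x `latent_h` x `latent_w` is inferred from the values and is an assumption; the README does not state it.

```python
pool_h, pool_w = 8, 16
latent_h, latent_w = 48, 80
visual_len = 126_720

# The pooling window must tile the latent feature map exactly.
assert latent_h % pool_h == 0 and latent_w % pool_w == 0

tokens_per_frame = latent_h * latent_w                     # 48 * 80
frames, remainder = divmod(visual_len, tokens_per_frame)   # assumed frame count
assert remainder == 0

tokens_per_block = pool_h * pool_w       # tokens merged into one pooled token
draft_len = visual_len // tokens_per_block

# Query-key pairs scored by dense attention vs. by the low-resolution draft.
dense_pairs = visual_len ** 2
draft_pairs = draft_len ** 2

print(frames, tokens_per_block, draft_len, dense_pairs // draft_pairs)
# 33 128 990 16384
```

With an 8 x 16 window, each pooled token stands for 128 full-resolution tokens, which matches the 128x compression mentioned earlier, and the draft attention map has 128 x 128 = 16384 times fewer entries than the dense one.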

✏️ TODO

Support any-resolution video generation with padding.
Support reordering for further block-sparse grouping, enabling faster hardware execution.

📑 Acknowledgement

This work was mainly contributed by Xuan and Chenxia.

🔗 BibTeX

If you find Draft Attention interesting, please cite it using the following BibTeX entry:

@article{shen2025draft,
  title={DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance},
  author={Shen, Xuan and Han, Chenxia and Zhou, Yufa and Xie, Yanyue and Gong, Yifan and Wang, Quanyi and Wang, Yiwei and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
  journal={arXiv preprint arXiv:2505.14708},
  year={2025}
}
