# Dynamic Diffusion Transformer
🔥🔥🔥 2025.03.26 update: The enhanced version of our method (including training code) is now available at https://github.com/alibaba-damo-academy/DyDiT/tree/main 🔥🔥🔥
The official implementation of "Dynamic Diffusion Transformer" (ICLR 2025).
Wangbo Zhao1, Yizeng Han2, Jiasheng Tang2,3, Kai Wang1, Yibing Song2,3, Gao Huang4, Fan Wang2, Yang You1
1National University of Singapore, 2DAMO Academy, Alibaba Group, 3Hupan Lab, 4Tsinghua University
https://github.com/user-attachments/assets/44ef5f81-cfe0-4e59-b228-14cc0729f5c6
We compare the generation speed of the original DiT and the proposed DyDiT on an NVIDIA V100 32GB GPU.
Images generated by DyDiT.

Abstract: Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To address this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation. Specifically, we introduce a Timestep-wise Dynamic Width (TDW) approach that adapts model width conditioned on the generation timesteps. In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations. Extensive experiments on various datasets and different-sized models verify the superiority of DyDiT. Notably, with <3% additional fine-tuning iterations, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73×, and achieves a competitive FID score of 2.07 on ImageNet.
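As a rough illustration of the Spatial-wise Dynamic Token (SDT) idea, the sketch below lets "easy" tokens bypass a transformer block entirely, computing the block only for tokens whose predicted difficulty exceeds a threshold. All names, the scoring mechanism, and the threshold here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def sdt_block(tokens, difficulty, block_fn, threshold=0.5):
    """Apply block_fn only to tokens whose difficulty score exceeds
    the threshold; easy tokens bypass the block via the skip path.
    tokens: (N, D) array, difficulty: (N,) scores in [0, 1]."""
    mask = difficulty > threshold        # True -> compute this token
    out = tokens.copy()                  # easy tokens pass through unchanged
    out[mask] = block_fn(tokens[mask])   # heavy computation only where needed
    return out, mask

# Toy "block": a fixed linear transform standing in for an MLP/attention block.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
block = lambda x: x @ W

x = rng.standard_normal((16, 8))
scores = rng.uniform(size=16)
y, mask = sdt_block(x, scores, block)
print(mask.sum(), "of", len(x), "tokens computed")
```

Since the skipped tokens never enter `block_fn`, the FLOPs of the block scale with the number of "hard" tokens rather than the full sequence length.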
## 🚀 News
- **2025.03.26**: The enhanced version of our method (including training code) is now available at https://github.com/alibaba-damo-academy/DyDiT/tree/main
- **2025.01.23**: DyDiT is accepted by ICLR 2025!!! We will update the code and paper soon.
- **2024.12.19**: We release the code for inference.
- **2024.10.04**: Our paper is released.
## 🎯 TODO
- [x] Release the code for inference.
- [x] Release the code for training.
- [ ] Release the code for applying our method to additional models (e.g., U-ViT, SiT).
- [ ] Release the code for applying our method to text-to-image and text-to-video generation diffusion models.
## 💥 Overview
(a) The loss difference between DiT-S and DiT-XL across all diffusion timesteps (T = 1000). The difference is slight at most timesteps.
(b) Loss maps (normalized to the range [0, 1]) at different timesteps show that the noise in different patches varies in how difficult it is to predict.
(c) Difference in the inference paradigm between the static DiT and the proposed DyDiT.
Overview of the proposed Dynamic Diffusion Transformer (DyDiT). It reduces the computational redundancy in DiT along both the timestep and spatial dimensions.
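To make the Timestep-wise Dynamic Width (TDW) idea concrete, the sketch below shows a linear layer whose output channels are grouped, with a per-timestep mask deciding which groups are computed. The gating shown here is hand-picked for illustration; in DyDiT the activation pattern is learned, and the real model operates on attention heads and MLP channels:

```python
import numpy as np

def tdw_linear(x, W, group_mask):
    """Linear layer whose output-channel groups can be switched off.
    x: (N, D_in), W: (D_in, D_out), group_mask: (G,) booleans over
    G equal-sized groups of output channels."""
    d_out = W.shape[1]
    g = len(group_mask)
    keep = np.repeat(group_mask, d_out // g)  # expand to a per-channel mask
    y = x @ W[:, keep]                        # compute only the active channels
    out = np.zeros((x.shape[0], d_out))
    out[:, keep] = y                          # inactive channels contribute zero
    return out

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))
x = rng.standard_normal((4, 8))
# e.g., full width at a hard timestep, half the groups at an easy one
full = tdw_linear(x, W, np.array([True, True, True, True]))
slim = tdw_linear(x, W, np.array([True, False, True, False]))
```

With half the groups disabled, the matrix multiply touches only half of `W`'s columns, which is where the FLOPs saving along the timestep dimension comes from.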
## 🔨 Install
We provide an environment.yml file to help create the Conda environment used in our experiments. Other environments may also work well.
```shell
git clone https://github.com/NUS-HPC-AI-Lab/Dynamic-Diffusion-Transformer.git
conda env create -f environment.yml
conda activate DyDiT
```
## ⚙️ Inference
Currently, we provide a pre-trained checkpoint of DyDiT.
| model | FLOPs (G) | FID | download |
|---|---|---|---|
| DiT | 118.69 | 2.27 | - |
| DyDiT | 84.33 | 2.12 | 🤗 |
| DyDiT | - | - | in progress |
Run sample_0.7.sh to sample images and evaluate the performance.
```shell
bash sample_0.7.sh
```
The sample_ddp.py script samples 50,000 images in parallel. It generates a folder of samples as well as a .npz file that can be used directly with ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics. Please follow its instructions to download the reference batch VIRTUAL_imagenet256_labeled.npz.
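Before handing the sample pack to the evaluation suite, it can be worth sanity-checking its contents. The snippet below builds a tiny stand-in .npz and inspects it; the array key `arr_0` and the (N, 256, 256, 3) uint8 layout are assumptions about the generated file, so check them against your own output:

```python
import numpy as np

# A tiny stand-in for the sampler output: an (N, 256, 256, 3) uint8
# image array stored under the key "arr_0" (assumed key; verify against
# the actual file produced by sample_ddp.py).
samples = np.zeros((4, 256, 256, 3), dtype=np.uint8)
np.savez("samples.npz", arr_0=samples)

data = np.load("samples.npz")
images = data["arr_0"]
print(images.shape, images.dtype)  # sanity-check before computing FID
```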
## 📖 Cite DyDiT
If you find our work useful, please consider citing us:
```bibtex
@article{zhao2024dynamic,
  title={Dynamic diffusion transformer},
  author={Zhao, Wangbo and Han, Yizeng and Tang, Jiasheng and Wang, Kai and Song, Yibing and Huang, Gao and Wang, Fan and You, Yang},
  journal={arXiv preprint arXiv:2410.03456},
  year={2024}
}
```
## ☎️ Contact
If you're interested in collaborating with us, feel free to reach out via email at wangbo.zhao96@gmail.com.