README.md

March 19, 2026

📒A curated list of Awesome Diffusion Inference papers with code. For LLM inference, see 📖Awesome-LLM-Inference.

📖 News 🔥🔥

  • [2026/03] The Cache-DiT 🎉v1.3.0 release is ready. Major updates include: Ring Attention w/ batched P2P, USP (Hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.

🤖Contents

©️Citations

@misc{Awesome-DiT-Inference@2024,
  title={Awesome-DiT-Inference: A small curated list of Awesome Diffusion Inference.},
  url={https://github.com/xlite-dev/Awesome-DiT-Inference},
  note={Open-source software available at https://github.com/xlite-dev/Awesome-DiT-Inference},
  author={xlite-dev},
  year={2024}
}

📙 Sampling

| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2020.06 | 🔥[DDPM] Denoising Diffusion Probabilistic Models(@UC Berkeley) | [pdf] | [diffusion] | ⭐️⭐️ |
| 2020.10 | 🔥[DDIM] DENOISING DIFFUSION IMPLICIT MODELS(@cs.stanford.edu) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2022.02 | 🔥[PNDM] PSEUDO NUMERICAL METHODS FOR DIFFUSION MODELS ON MANIFOLDS(@) | [pdf] | [PNDM] | ⭐️⭐️ |
| 2022.02 | 🔥[DPM-Solver] DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps(@Cheng Lu) | [pdf] | [dpm-solver] | ⭐️⭐️ |
| 2022.11 | 🔥[DPM-Solver++] DPM-SOLVER++: FAST SOLVER FOR GUIDED SAMPLING OF DIFFUSION PROBABILISTIC MODELS(@Cheng Lu) | [pdf] | [dpm-solver] | ⭐️⭐️ |
| 2023.10 | 🔥[DPM-Solver-v3] DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics(@Kaiwen Zheng) | [pdf] | [DPM-Solver-v3] | ⭐️⭐️ |
| 2023.11 | 🔥[Parallel Sampling] Parallel Sampling of Diffusion Models(@Stanford University) | [pdf] | [paradigms] | ⭐️⭐️ |
| 2023.11 | 🔥[SAMPLER SCHEDULER] SAMPLER SCHEDULER FOR DIFFUSION MODELS(@sysu) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.02 | 🔥[Parallel Sampling] Accelerating Parallel Sampling of Diffusion Models(@Zhiwei Tang) | [pdf] | [ParaTAA-Diffusion] | ⭐️⭐️ |
| 2024.01 | 🔥[YONOS] You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation(@Samsung AI) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.01 | 🔥[S^2-DM] S^2-DMs: Skip-Step Diffusion Models(@Yixuan Wang) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.08 | 🔥[StepSaver] StepSaver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation(@intel) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.09 | 🔥[DC-Solver] DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation(@Tsinghua University) | [pdf] | [DC-Solver] | ⭐️⭐️ |

📙 Caching

  • UNet Based (DeepCache)
  • DiT Based (Fast-Forward Caching)

| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2023.05 | 🔥🔥[Cache-Enabled Sparse Diffusion] Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference(@pku.edu.cn etc) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2023.12 | 🔥🔥[DeepCache] DeepCache: Accelerating Diffusion Models for Free(@nus.edu) | [pdf] | [DeepCache] | ⭐️⭐️ |
| 2023.12 | 🔥🔥[Block Caching] Cache Me if You Can: Accelerating Diffusion Models through Block Caching(@Meta GenAI etc) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2023.12 | 🔥🔥[Approximate Caching] Approximate Caching for Efficiently Serving Diffusion Models(@Adobe) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.06 | 🔥🔥[Layer Caching] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching(@nus.edu) | [pdf] | [learning-to-cache] | ⭐️⭐️ |
| 2024.07 | 🔥[ElasticCache-LVLM] Efficient Inference of Vision Instruction-Following Models with Elastic Cache(@Tsinghua University etc) | [pdf] | [ElasticCache] | ⭐️ |
| 2024.07 | 🔥🔥[Fast-Forward Caching(DiT)] FORA: Fast-Forward Caching in Diffusion Transformer Acceleration(@microsoft.com etc) | [pdf] | [FORA] | ⭐️⭐️ |
| 2024.07 | 🔥🔥[Faster I2V Generation] Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions(@Ashkan Taghipour etc) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.04 | 🔥🔥[T-GATE V1] Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models(@Wentian Zhang etc) | [pdf] | [T-GATE] | ⭐️⭐️ |
| 2024.04 | 🔥🔥[T-GATE V2] Faster Diffusion via Temporal Attention Decomposition(@Haozhe Liu etc) | [pdf] | [T-GATE] | ⭐️⭐️ |
| 2024.06 | 🔥🔥[DiTFastAttn] DiTFastAttn: Attention Compression for Diffusion Transformer Models(@Zhihang Yuan etc) | [pdf] | [DiTFastAttn] | ⭐️⭐️ |
| 2024.06 | 🔥🔥[∆-DiT] ∆-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers(@Fudan University) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.09 | 🔥🔥[TokenCache] Token Caching for Diffusion Transformer Acceleration(@Institute of Automation, Chinese Academy of Sciences) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.11 | 🔥🔥[AdaCache] Adaptive Caching for Faster Video Generation with Diffusion Transformers(@Meta) | [pdf] | [AdaCache] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[TeaCache] Timestep Embedding Tells: It’s Time to Cache for Video Diffusion Model(@Alibaba) | [pdf] | [TeaCache] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[LazyDiT] LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers(@Adobe Research) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.11 | 🔥🔥[Ca2-VDM] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing(@ZJU) | [pdf] | [CausalCache-VDM] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[SmoothCache] SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers(@Roblox) | [pdf] | [SmoothCache] | ⭐️⭐️ |
| 2024.10 | 🔥🔥[FasterCache] FASTERCACHE: TRAINING-FREE VIDEO DIFFUSION MODEL ACCELERATION WITH HIGH QUALITY(@S-Lab) | [pdf] | [FasterCache] | ⭐️⭐️ |
| 2024.10 | 🔥🔥[ToCa] ToCa: Accelerating Diffusion Transformers with Token-wise Feature Caching(@SJTU) | [pdf] | [ToCa] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[SkipCache] Accelerating Vision Diffusion Transformers with Skip Branches(@SJTU) | [pdf] | [Skip-DiT] | ⭐️⭐️ |
| 2024.12 | 🔥🔥[DuCa] Accelerating Diffusion Transformers with Dual Feature Caching(@SJTU) | [pdf] | [DuCa] | ⭐️⭐️ |
| 2025.01 | 🔥🔥[FBCache] Fastest HunyuanVideo Inference with Context Parallelism and First Block Cache on NVIDIA L20 GPUs(@chengzeyi) | [docs] | [ParaAttention] | ⭐️⭐️ |
| 2025.01 | 🔥🔥[FlexCache] FlexCache: Flexible Approximate Cache System for Video Diffusion(@University of Waterloo) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2025.01 | 🔥🔥[Token Pruning] Token Pruning for Caching Better: 9× Acceleration on Stable Diffusion for Free(@SJTU) | [pdf] | [DaTo] | ⭐️⭐️ |
| 2025.04 | 🔥🔥[AB-Cache] AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse(@USTC) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2025.03 | 🔥🔥[DiTFastAttnV2] DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers(@Infinigence AI) | [pdf] | [DiTFastAttn] | ⭐️⭐️ |
| 2025.03 | 🔥🔥[TaylorSeers] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers(@SJTU) | [pdf] | [TaylorSeer] | ⭐️⭐️ |
| 2025.04 | 🔥🔥[Increment-Calibrated Cache] Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition(@PKU) | [pdf] | [icc] | ⭐️⭐️ |
| 2025.05 | 🔥🔥[FastCache] FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation(@yale) | [pdf] | [FastCache-xDiT] | ⭐️⭐️ |
| 2025.06 | 🔥🔥[DBCache] DBCache: Dual Block Caching for Diffusion Transformers(@DefTruth, @vipshop, etc) | [docs] | [cache-dit] | ⭐️⭐️ |
| 2025.06 | 🔥🔥[DBPrune] DBPrune: Dynamic Block Prune with Residual Caching(@DefTruth, @vipshop, etc) | [docs] | [cache-dit] | ⭐️⭐️ |
| 2025.06 | 🔥🔥[BACache] Block-wise Adaptive Caching for Accelerating Diffusion Policy(@THU) | [pdf] | ⚠️ | ⭐️⭐️ |

📙 Parallelism

  • UNet Based: Displaced Patch parallelism (DistriFusion)
  • DiT Based: Displaced Patch parallelism (PipeFusion)
| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2024.02 | 🔥🔥[DistriFusion] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models(@MIT etc) | [pdf] | [distrifuser] | ⭐️⭐️ |
| 2024.05 | 🔥🔥[PipeFusion] PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models(@Tencent etc) | [pdf] | [xDiT] | ⭐️⭐️ |
| 2024.06 | 🔥🔥[AsyncDiff] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising(@nus.edu) | [pdf] | [AsyncDiff] | ⭐️⭐️ |
| 2024.05 | 🔥🔥[TensorRT-LLM SDXL] SDXL Distributed Inference with TensorRT-LLM and synchronous comm(@Zars19) | [pdf] | [SDXL-TensorRT-LLM] | ⭐️⭐️ |
| 2024.06 | 🔥🔥[Clip Parallelism] Video-Infinity: Distributed Long Video Generation(@nus.edu) | [pdf] | [Video-Infinity] | ⭐️⭐️ |
| 2024.05 | 🔥🔥[FIFO-Diffusion] FIFO-Diffusion: Generating Infinite Videos from Text without Training(@Seoul National University) | [pdf] | [FIFO-Diffusion] | ⭐️⭐️ |
| 2025.01 | 🔥🔥[ParaAttention] Context parallel attention that accelerates DiT model inference with dynamic caching(@chengzeyi) | [docs] | [ParaAttention] | ⭐️⭐️ |
| 2026.01 | 🔥🔥[Cache-DiT] 🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs. | [docs] | [cache-dit] | ⭐️⭐️ |

📙 Quantization

| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2024.08 | 🔥[Transfusion] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model(@meta) | [pdf] | [transfusion-pytorch] | ⭐️⭐️ |
| 2024.08 | 🔥[MixDQ] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization(@THU&Infinigence AI) | [pdf] | [mixdq] | ⭐️⭐️ |
| 2024.08 | 🔥[ViDiT-Q] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation(@THU&Infinigence AI) | [pdf] | [viditq] | ⭐️⭐️ |
| 2024.08 | 🔥[VQ4DiT] VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers(@ZJU) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.08 | 🔥[LBQ] Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models(@toronto.edu) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.08 | 🔥[EE-Diffusion] A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models(@KAIST AI) | [pdf] | [ee-diffusion] | ⭐️⭐️ |
| 2024.08 | 🔥[TFM-PTQ] Temporal Feature Matters: A Framework for Diffusion Model Quantization(@SenseTime) | [pdf] | ⚠️ | ⭐️⭐️ |
| 2024.08 | 🔥[Diffusion-RWKV] Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models(@Zhengcong Fei) | [pdf] | [Diffusion-RWKV] | ⭐️⭐️ |
| 2024.09 | 🔥[LinFusion] LINFUSION: 1 GPU, 1 MINUTE, 16K IMAGE(@NUS) | [pdf] | [LinFusion] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[SVDQuant] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | [pdf] | [nunchaku] | ⭐️⭐️ |

📙 Attention

| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2024.10 | 🔥🔥[SageAttention] SAGEATTENTION: ACCURATE 8-BIT ATTENTION FOR PLUG-AND-PLAY INFERENCE ACCELERATION(@thu-ml) | [pdf] | [SageAttention] | ⭐️⭐️ |
| 2024.11 | 🔥🔥[SageAttention-2] SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization(@thu-ml) | [pdf] | [SageAttention] | ⭐️⭐️ |
| 2025.03 | 🔥🔥[SpargeAttention] SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference(@thu-ml) | [pdf] | [SpargeAttn] | ⭐️⭐️ |
| 2025.05 | 🔥🔥[SageAttention-3] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training(@thu-ml) | [pdf] | [SageAttention] | ⭐️⭐️ |
| 2025.05 | 🔥🔥[DraftAttention] DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance(@Northeastern University) | [pdf] | [draft-attention] | ⭐️⭐️ |

©️License

GNU General Public License v3.0

🎉Contribute

You are welcome to star this repo and submit a PR!
