🎨 Awesome Diffusion Acceleration Cache 🚀

November 4, 2025

If you would like to contribute to this repository, feel free to email me at ljc.mytcl@gmail.com!

📚 Repository Description

A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration techniques.

This repository aims to provide a comprehensive and up-to-date collection of academic works on Diffusion Cache, a promising approach for accelerating diffusion models by caching intermediate features or latent states. It includes papers on model efficiency, memory optimization, reuse mechanisms, and inference speed-up in diffusion-based generative systems.
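The core idea, caching intermediate features and reusing them across timesteps, can be illustrated with a minimal sketch. It assumes the simplest possible policy (recompute every fixed number of steps, reuse the stored output otherwise) and a toy stand-in for the network block. The names `IntervalFeatureCache`, `toy_block`, and `run_sampler` are hypothetical and are not taken from any paper listed here; the methods below use far more refined reuse and forecasting strategies.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


class IntervalFeatureCache:
    """Wrap a block so it is recomputed every `interval` steps and reused otherwise."""

    def __init__(self, block: Callable[[Vector], Vector], interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.block = block
        self.interval = interval
        self.cached: Optional[Vector] = None
        self.full_computes = 0  # how many times the real block actually ran

    def __call__(self, x: Vector, step: int) -> Vector:
        # Refresh the cache on the first call and on every `interval`-th step.
        if self.cached is None or step % self.interval == 0:
            self.cached = self.block(x)
            self.full_computes += 1
        # On all other steps the stale feature is returned without recomputation.
        return self.cached


def toy_block(x: Vector) -> Vector:
    # Hypothetical stand-in for an expensive attention/MLP block.
    return [2.0 * v + 1.0 for v in x]


def run_sampler(num_steps: int, interval: int) -> Tuple[Vector, int]:
    """Run a toy iterative update and report how often the block was evaluated."""
    cached_block = IntervalFeatureCache(toy_block, interval)
    x = [1.0, -0.5]
    for step in range(num_steps):
        feature = cached_block(x, step)
        # Toy update rule; a real sampler would apply its scheduler here.
        x = [v - 0.01 * f for v, f in zip(x, feature)]
    return x, cached_block.full_computes


if __name__ == "__main__":
    exact, exact_calls = run_sampler(num_steps=10, interval=1)
    approx, approx_calls = run_sampler(num_steps=10, interval=3)
    print("block evaluations without caching:", exact_calls)
    print("block evaluations with interval=3:", approx_calls)
    print("max deviation:", max(abs(a - b) for a, b in zip(exact, approx)))
```

In this toy setting the cached run evaluates the block less often at the cost of a small deviation from the uncached result; that speed versus fidelity trade-off is what the papers in this list aim to improve.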

This repository is maintained by EPIC Lab at Shanghai Jiao Tong University.

🔥 Update News

  • 2025/03/10 💥💥 We propose TaylorSeer, achieving ~5× acceleration for DiTs with Taylor expansion-based feature forecasting!

  • 2024/12/24 💥💥 Our work DuCa has been accepted by ICLR 2025! Congratulations to all collaborators!

  • 2024/10/12 🚀🚀 We release our work ToCa, achieving nearly lossless acceleration of 1.51× on FLUX, 1.93× on PixArt-α, and 2.36× on OpenSora!

  • 2024/08/24 🤗🤗 We release an open-source repo for diffusion model acceleration and caching techniques!

📚 Contents

💬 Keywords

📝 Papers

2023

  • [1] Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference, AAAI 2024.

    Yu, Zihao and Li, Haoyang and Fu, Fangcheng and Miao, Xupeng and Cui, Bin.

    [Paper] [Code]

  • [2] DeepCache: Accelerating Diffusion Models for Free, CVPR 2024.

    Ma, Xinyin and Fang, Gongfan and Wang, Xinchao.

    [Paper] [Code]

  • [3] Cache Me if You Can: Accelerating Diffusion Models through Block Caching, CVPR 2024.

    Wimbauer, Felix and Wu, Bichen and Schoenfeld, Edgar and Dai, Xiaoliang and others.

    [Paper]

  • [4] Approximate Caching for Efficiently Serving Diffusion Models, NSDI 2024.

    Adobe Research Team.

    [Paper]

  • [5] Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference, NeurIPS 2024.

    Li, Senmao and Hu, Taihang and Khan, Fahad Shahbaz and Li, Linxuan and Yang, Shiqi and others.

    [Paper] [Code]

2024

  • [6] Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, arXiv 2024.

    Liu, Haozhe and Zhang, Wentian and Xie, Jinheng and others.

    [Paper] [Code]

  • [7] Faster Diffusion via Temporal Attention Decomposition, TMLR 2024.

    Liu, Haozhe and Zhang, Wentian and Xie, Jinheng and Faccio, Francesco and others.

    [Paper] [Code]

  • [8] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching, NeurIPS 2024.

    Ma, Xinyin and Fang, Gongfan and Mi, Michael Bi and Wang, Xinchao.

    [Paper] [Code]

  • [9] ∆-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers, arXiv 2024.

    Chen, Pengtao and Shen, Mingzhu and Ye, Peng and Cao, Jianjian and others.

    [Paper]

  • [10] DiTFastAttn: Attention Compression for Diffusion Transformer Models, NeurIPS 2024.

    Yuan, Zhihang and Lu, Pu and Zhang, Hanling and Ning, Xuefei and others.

    [Paper] [Code]

  • [11] Efficient Inference of Vision Instruction-Following Models with Elastic Cache, ECCV 2024.

    Liu, Zuyan and others.

    [Paper] [Code]

  • [12] FORA: Fast-Forward Caching in Diffusion Transformer Acceleration, arXiv 2024.

    Selvaraju, Pratheba and Ding, Tianyu and Chen, Tianyi and Zharkov, Ilya and others.

    [Paper] [Code]

  • [13] Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions, arXiv 2024.

    University of Western Australia Research Team.

    [Paper]

  • [14] Token Caching for Diffusion Transformer Acceleration, arXiv 2024.

    Jinming Lou and Wenyang Luo and Yufan Liu and Bing Li and others.

    [Paper]

  • [15] FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models, ECCV 2024.

    Lee, Jungwon and Park, Suhyeon and others.

    [Paper] [Code]

  • [16] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality, arXiv 2024.

    Zhengyao Lv and Chenyang Si and Junhao Song and Zhenyu Yang and others.

    [Paper] [Code]

  • [17] ToCa: Accelerating Diffusion Transformers with Token-wise Feature Caching, ICLR 2025.

    Chang Zou and Xuyang Liu and Ting Liu and Siteng Huang and Linfeng Zhang.

    [Paper] [Code]

  • [18] HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration, ICML 2025.

    SenseTime Research Team.

    [Paper] [Code]

  • [19] Adaptive Caching for Faster Video Generation with Diffusion Transformers, arXiv 2024.

    Kumara Kahatapitiya and Haozhe Liu and Sen He and Ding Liu and others.

    [Paper] [Code]

  • [20] Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model, CVPR 2025.

    Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and others.

    [Paper] [Code]

  • [21] LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers, AAAI 2025.

    Rice, Shawn and others.

    [Paper] [Code]

  • [22] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing, ICML 2025.

    Gao, Kaifeng and Shi, Jiaxin and Zhang, Hanwang and others.

    [Paper] [Code]

  • [23] SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers, CVPR eLVM 2024.

    Roblox Research Team.

    [Paper] [Code]

  • [24] Accelerating Diffusion Transformers with Dual Feature Caching, ICLR 2025.

    Chang Zou and Evelyn Zhang and Runlin Guo and Haohang Xu and Conghui He and others.

    [Paper] [Code]

2025

  • [25] Fastest HunyuanVideo Inference with Context Parallelism and First Block Cache on NVIDIA L20 GPUs, arXiv 2025.

    Zheng, Zeyi and others.

    [Paper] [Code]

  • [26] FlexCache: Flexible Approximate Cache System for Video Diffusion, arXiv 2025.

    University of Waterloo Research Team.

    [Paper]

  • [27] Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free, arXiv 2025.

    Evelyn Zhang and Bang Xiao and Jiayi Tang and Qianli Ma and Chang Zou and others.

    [Paper]

  • [28] Accelerating Diffusion Transformer via Error-Optimized Cache, arXiv 2025.

    University of Science and Technology of China Research Team.

    [Paper] [Code]

  • [29] Real-Time Video Generation with Pyramid Attention Broadcast, ICLR 2025.

    Xuanlei Zhao and Xiaolong Jin and Kai Wang and Yang You.

    [Paper] [Code]

  • [30] MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models, arXiv 2025.

    University of Michigan Research Team.

    [Paper] [Code]

  • [31] BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers, CVPR 2025.

    Fudan University Research Team.

    [Paper]

  • [32] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers, ICCV 2025.

    Chang Zou and others.

    [Paper] [Code]

  • [33] FEB-Cache: Frequency-Guided Exposure Bias Reduction for Enhancing Diffusion Transformer Caching, arXiv 2025.

    Xu, Haohang and Zou, Chang and Liu, Xuyang and Guo, Runlin and He, Conghui and others.

    [Paper] [Code]

  • [34] CacheQuant: Comprehensively Accelerated Diffusion Models, CVPR 2025.

    Zhang, Evelyn and Tang, Jiayi and others.

    [Paper] [Code]

  • [35] QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation, ICCV 2025.

    Wu, Junyi and Zou, Chang and Liu, Xuyang and Guo, Runlin and Xu, Haohang and others.

    [Paper] [Code]

  • [36] AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse, arXiv 2025.

    Xu, Haohang and Zou, Chang and Liu, Xuyang and Guo, Runlin and He, Conghui and others.

    [Paper]

  • [37] ProfilingDiT: Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models, arXiv 2025.

    Huazhong University of Science and Technology Research Team.

    [Paper] [Code]

  • [38] Compute only 16 tokens in one timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching, ACM MM 2025.

    Zheng, Zhixin and Zou, Chang and Liu, Xuyang and others.

    [Paper] [Code]

  • [39] SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching, ACM MM 2025.

    Zou, Chang and Liu, Xuyang and Guo, Runlin and others.

    [Paper] [Code]

  • [40] Accelerate Diffusion Transformers with Feature Momentum, arXiv 2025.

    Liu, Xuyang and Zou, Chang and Guo, Runlin and others.

    [Paper] [Code]

  • [41] Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation, NeurIPS 2025.

    The University of British Columbia and d-Matrix Research Team.

    [Paper] [Code]

  • [42] ParaStep: Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism, arXiv 2025.

    Shanghai Jiao Tong University Research Team.

    [Paper]

  • [43] RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy, arXiv 2025.

    Huawei Technologies Co., Ltd Research Team.

    [Paper]

  • [44] Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition, CVPR 2025.

    Qiu, Jianxin and others.

    [Paper] [Code]

  • [45] FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation, CVPR 2025.

    Liu, Noak and others.

    [Paper] [Code]

  • [46] Chipmunk: Training-Free Acceleration of Diffusion, ICML 2025.

    University of California Research Team.

    [Paper] [Code]

  • [47] ECAD: Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model, arXiv 2025.

    University of Maryland Research Team.

    [Paper] [Code]

  • [48] MagCache: Fast Video Generation with Magnitude-Aware Cache, arXiv 2025.

    Peking University Research Team.

    [Paper] [Code]

  • [49] DBPrune: Dynamic Block Prune with Residual Caching, arXiv 2025.

    Vipshop Research Team.

    [Paper] [Code]

  • [50] DBCache: Dual Block Caching for Diffusion Transformers, arXiv 2025.

    Vipshop Research Team.

    [Paper] [Code]

  • [51] Block-wise Adaptive Caching for Accelerating Diffusion Policy, arXiv 2025.

    Zhang, Yunbo and Liu, Zhenyu and others.

    [Paper]

  • [52] Accelerating Vision Diffusion Transformers with Skip Branches, ICCV 2025.

    Chen, Guanjie and Zhao, Xinyu and Zhou, Yucheng and Chen, Tianlong and Yu, Cheng.

    [Paper] [Code]

  • [53] Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers, arXiv 2025.

    Zou, Chang and Liu, Xuyang and Guo, Runlin and others.

    [Paper] [Code]

  • [54] HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching, arXiv 2025.

    Guo, Runlin and Zou, Chang and Liu, Xuyang and others.

    [Paper] [Code]

  • [55] WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation, arXiv 2025.

    Guo, Runlin and Zou, Chang and Liu, Xuyang and others.

    [Paper] [Code]

  • [56] EasyCache: Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching, arXiv 2025.

    Huazhong University of Science and Technology Research Team.

    [Paper] [Code]

  • [57] Accelerating Diffusion Transformer via Gradient-Optimized Cache, ICCV 2025.

    Qiu, Jianxin and others.

    [Paper] [Code]

  • [58] TaoCache: Structure-Maintained Video Generation Acceleration, arXiv 2025.

    Huawei Research Team.

    [Paper]

  • [59] DiCache: Let Diffusion Model Determine Its Own Cache, arXiv 2025.

    Shanghai Jiao Tong University Research Team.

    [Paper] [Code]

  • [60] HERO: Hierarchical Extrapolation and Refresh for Efficient World Model, arXiv 2025.

    Tsinghua University Research Team.

    [Paper]

  • [61] ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion, arXiv 2025.

    ByteDance Research Team.

    [Paper] [Code]

  • [62] OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models, ICCV 2025.

    Zhipu AI Research Team.

    [Paper]

  • [63] MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration, arXiv 2025.

    Sun Yat-sen University Research Team.

    [Paper]

  • [64] Z-Cache: Accelerating Diffusion Transformers via Self-Reflection, arXiv 2025.

    Zou, Chang and Liu, Xuyang and Guo, Runlin and others.

    [Paper] [Code]

  • [65] SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation, arXiv 2025.

    Shanghai Jiao Tong University Research Team.

    [Paper]

  • [66] BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching, arXiv 2025.

    Beijing Normal University Research Team.

    [Paper]

  • [67] DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration, arXiv 2025.

    Xiamen University, Zhejiang University, Wuhan University Research Team.

    [Paper]

  • [68] LightningCP: Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation, arXiv 2025.

    Nanyang Technological University Research Team.

    [Paper]

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Thanks to all researchers and contributors who have worked on diffusion model acceleration and caching techniques.