MagCache4FLUX
June 11, 2025
MagCache can speed up FLUX by 2.8x with little visual quality degradation, in a training-free manner.

Inference Latency Comparisons on a Single A800
| FLUX.1 [dev] | TeaCache (0.6) | MagCache (E024K5R01) |
|---|---|---|
| ~14.26 s | ~5.65 s (2.5x speedup) | ~5.05 s (2.8x speedup) |


Prompt: A photo of a black bicycle.
Installation
pip install --upgrade diffusers[torch] transformers protobuf tokenizers sentencepiece
Usage
You can modify `magcache_thresh`, `magcache_K`, and `retention_ratio` (lines 455-457 of `magcache_flux.py`) to obtain your desired trade-off between latency and visual quality. For single-GPU inference, use the following command:
python magcache_flux.py
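To make the roles of the three parameters concrete, here is a minimal, self-contained sketch of a magnitude-aware skip decision. It is not the repository's implementation; the parameter names are taken from the README, while the function name `should_skip`, the `state` dict, and the exact accumulation rule are illustrative assumptions.

```python
def should_skip(step, total_steps, mag_ratio, state,
                magcache_thresh=0.24, magcache_K=5, retention_ratio=0.1):
    """Decide whether to reuse the cached residual at this denoising step.

    mag_ratio: estimated magnitude ratio between consecutive model outputs
    (close to 1.0 when outputs change slowly). state: dict tracking the
    accumulated error ("err") and consecutive skipped steps ("skips").
    This is an illustrative sketch, not the actual MagCache code.
    """
    # retention_ratio: always compute the early steps, which change fastest.
    if step < int(retention_ratio * total_steps):
        state["err"], state["skips"] = 0.0, 0
        return False
    # magcache_thresh: budget on the accumulated error introduced by reuse.
    projected = state["err"] + abs(1.0 - mag_ratio)
    # magcache_K: cap on how many consecutive steps may be skipped.
    if projected <= magcache_thresh and state["skips"] < magcache_K:
        state["err"] = projected
        state["skips"] += 1
        return True   # reuse the cached residual, skip the transformer forward
    state["err"], state["skips"] = 0.0, 0
    return False      # run the full transformer forward and refresh the cache
```

Raising `magcache_thresh` or `magcache_K` skips more steps (lower latency, more quality risk); raising `retention_ratio` protects more of the early, fast-changing steps.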
Citation
If you find MagCache useful in your research or applications, please consider giving us a star and citing it with the following BibTeX entry:
@misc{ma2025magcachefastvideogeneration,
  title={MagCache: Fast Video Generation with Magnitude-Aware Cache},
  author={Zehong Ma and Longhui Wei and Feng Wang and Shiliang Zhang and Qi Tian},
  year={2025},
  eprint={2506.09045},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.09045},
}
Acknowledgements
We would like to thank the contributors to FLUX, TeaCache, and Diffusers.