Optimal Stepsize for Diffusion Sampling (OSS)
April 13, 2025
Official PyTorch implementation of our paper Optimal Stepsize for Diffusion Sampling, a plug-and-play algorithm that searches for the optimal sampling stepsize schedule in diffusion sampling.
Left to right: FLUX (100 steps); FLUX + OSS (10 steps); FLUX (10 steps).
https://github.com/user-attachments/assets/d903c467-7050-4e79-9659-47bb672f001c
Left to right: Wan (100 steps); Wan + OSS (20 steps); Wan (20 steps).
:fire: Latest News!!
- Apr 13, 2025: :clap: OSS is integrated into ComfyUI as OptimalStepsScheduler! It supports fast sampling for the text-to-image model FLUX and the text-to-video model Wan. Here, we give a comparison of the FLUX workflow with 10 sampling steps.
Demo workflows are provided for both models: FLUX-workflow and Wan-workflow. We highly recommend 10 steps for FLUX and 20 steps for Wan. More details are in our pull request.
:smiley: Overview
In this repo, we provide examples of applying our algorithm to DiT, FLUX, Open-Sora, and Wan2.1.
OSS is not limited to these models; we also provide guidance on adapting it to other diffusion models.
:rocket: Quick Start Guide
Step 0: Prepare the environment
Prepare the environment for the target model you want to use, such as DiT, FLUX, Open-Sora, Wan2.1 or other models.
Step 1: Clone the repository
git clone https://github.com/bebebe666/OSS.git
cd OSS
Step 2: Run inference
# DiT
bash scripts/dit.sh
# FLUX
bash scripts/flux.sh
# Open-Sora
bash scripts/opensora.sh
# Wan2.1
bash scripts/wan.sh
:airplane: Applying to other models
Model preparation
Before using our algorithm, wrap your diffusion model in a unified format that satisfies the following:
- The model output should be a velocity prediction (v-pred), as in Flow Matching.
- The sampling trajectory should be straight, i.e., follow the linear interpolation between data and noise used in Flow Matching.
See the examples in model_wrap.py.
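To illustrate the two requirements above, here is a minimal sketch under stated assumptions: a hypothetical model that predicts the clean sample x0, and the straight path x_t = (1 - t) * x0 + t * noise with t = 1 as pure noise. Under that path the velocity is dx_t/dt = noise - x0 = (x_t - x0) / t. VelocityWrapper and euler_step are illustrative names, not part of this repo.

```python
import numpy as np


class VelocityWrapper:
    """Hypothetical wrapper exposing a unified v-prediction interface.

    Assumes the wrapped model predicts x0 and that the forward path is the
    straight line x_t = (1 - t) * x0 + t * noise, with t = 1 as pure noise.
    """

    def __init__(self, x0_model):
        self.x0_model = x0_model

    def __call__(self, x_t, t, context=None):
        x0 = self.x0_model(x_t, t, context)
        # From x_t = (1 - t) x0 + t noise:  v = noise - x0 = (x_t - x0) / t.
        # Only valid for t > 0; sampling never evaluates the model at t = 0.
        return (x_t - x0) / t


def euler_step(model, x_t, t, t_next, context=None):
    """One Euler step of dx/dt = v(x, t) from t to t_next."""
    return x_t + (t_next - t) * model(x_t, t, context)


# With a perfect x0 prediction the path is exactly straight, so Euler steps
# of any size land on the trajectory: halfway at t = 0.5, then on x0 at t = 0.
rng = np.random.default_rng(0)
x0_true = rng.standard_normal(4)
noise = rng.standard_normal(4)
oracle = VelocityWrapper(lambda x_t, t, ctx: x0_true)
x_half = euler_step(oracle, noise, 1.0, 0.5)
x_final = euler_step(oracle, x_half, 0.5, 0.0)
```

This is why the straight-trajectory requirement matters: on a straight path large steps are cheap, and the remaining error on real models comes from how far the learned trajectory bends, which is what a stepsize search has to account for.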
Searching
We provide the following search function:
oss_steps = search_OSS(model, z, search_batch, context, device, teacher_steps=10, student_steps=5, model_kwargs=model_kwargs)
- z is the input noise;
- search_batch is the number of images used for the search;
- context is the class embedding in class-conditional image generation, or the prompt embedding in text-to-image generation.
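The search itself can be pictured as a dynamic program over a fine teacher grid. The sketch below is a simplified illustration, not the repo's search_OSS: it assumes a velocity function v(x, t), Euler steps, an MSE cost against the teacher trajectory, and that the student schedule is a subset of the teacher's timesteps. For every cell (student step k, teacher index j) it keeps only the best-scoring student state. euler and search_steps_dp are hypothetical names.

```python
import numpy as np


def euler(v, x, t, t_next):
    """One Euler step of dx/dt = v(x, t) from t to t_next."""
    return x + (t_next - t) * v(x, t)


def search_steps_dp(v, z, timesteps, student_steps):
    """Hypothetical DP stepsize search (simplified sketch, not search_OSS).

    timesteps: teacher grid t_0 > t_1 > ... > t_N, with t_0 = 1 as pure noise.
    Returns indices 0 = i_0 < ... < i_K = N into `timesteps` and the final cost.
    """
    n = len(timesteps) - 1
    # Teacher trajectory: Euler integration over every step of the fine grid.
    teacher = [z]
    for j in range(n):
        teacher.append(euler(v, teacher[-1], timesteps[j], timesteps[j + 1]))

    cost = np.full((student_steps + 1, n + 1), np.inf)
    parent = np.full((student_steps + 1, n + 1), -1, dtype=int)
    state = [[None] * (n + 1) for _ in range(student_steps + 1)]
    cost[0, 0], state[0][0] = 0.0, z

    for k in range(1, student_steps + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if state[k - 1][i] is None:
                    continue  # index i is not reachable in exactly k - 1 steps
                # One large student step from teacher index i straight to j ...
                x = euler(v, state[k - 1][i], timesteps[i], timesteps[j])
                # ... scored by its distance to the teacher state at j.
                c = float(np.mean((x - teacher[j]) ** 2))
                if c < cost[k, j]:
                    cost[k, j], parent[k, j], state[k][j] = c, i, x

    # Walk the parent pointers back from (K, N) to recover the schedule.
    path, j = [n], n
    for k in range(student_steps, 0, -1):
        j = int(parent[k, j])
        path.append(j)
    return path[::-1], float(cost[student_steps, n])


def v(x, t):
    """Toy velocity field with a curved (time-dependent) trajectory."""
    return -2.0 * t * x + np.sin(4.0 * t)


# Mirrors teacher_steps=10 and student_steps=5 from the call above.
timesteps = np.linspace(1.0, 0.0, 11)
z = np.random.default_rng(0).standard_normal(8)
steps, c = search_steps_dp(v, z, timesteps, student_steps=5)
print(steps, c)
```

Keeping a single state per cell makes the search tractable (on the order of K·N² velocity evaluations for K student and N teacher steps) but greedy, so this sketch does not guarantee the best schedule among all subsets.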
For video generation models, we provide search_OSS_video, which lets you choose which frames and channels enter the cost. By default, cost_type="all" and channel_type="all" use all frames and channels for the cost calculation. To use only part of them, pass a number as a string, such as "4"; it must not exceed the total number of frames or channels.
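A possible shape for the frame- and channel-restricted cost is sketched below. It is hypothetical: it assumes video latents laid out as (C, F, H, W), an MSE cost, and that a numeric string such as "4" keeps the first k channels or frames; search_OSS_video may select them differently. video_cost is an illustrative name.

```python
import numpy as np


def video_cost(student, teacher, cost_type="all", channel_type="all"):
    """Hypothetical MSE cost over a subset of frames and channels.

    Assumes latents shaped (C, F, H, W); a numeric string "k" keeps the
    first k frames (cost_type) or the first k channels (channel_type).
    """
    n_channels, n_frames = student.shape[:2]
    k_ch = n_channels if channel_type == "all" else int(channel_type)
    k_fr = n_frames if cost_type == "all" else int(cost_type)
    if not (0 < k_ch <= n_channels and 0 < k_fr <= n_frames):
        raise ValueError("requested more channels/frames than the latent has")
    diff = student[:k_ch, :k_fr] - teacher[:k_ch, :k_fr]
    return float(np.mean(diff ** 2))


# Two latents that differ only from frame 10 onward.
a = np.zeros((16, 21, 4, 4))
b = a.copy()
b[:, 10:] = 1.0
print(video_cost(a, b), video_cost(a, b, cost_type="4"))
```

Restricting the cost to fewer frames or channels makes each comparison cheaper, at the price of ignoring errors that only appear in the excluded part of the latent, as the second call shows.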
Inference
Once you have oss_steps, pass them to the inference function to generate samples:
samples = infer_OSS(oss_steps, model, z, context, device, model_kwargs=model_kwargs)
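As a rough picture of what inference with a searched schedule involves, the sketch below runs plain Euler steps only at the selected timesteps. It assumes, hypothetically, that the schedule is a list of increasing indices into a teacher timestep grid running from t = 1 (noise) to t = 0 (data) and that the model returns a velocity; infer_with_schedule is an illustrative name and not infer_OSS.

```python
import numpy as np


def infer_with_schedule(v, z, timesteps, step_indices):
    """Hypothetical Euler sampler that follows a searched schedule.

    Assumes `step_indices` are increasing indices into the teacher grid
    `timesteps` (t = 1 is noise, t = 0 is data) and v(x, t) is a velocity.
    """
    ts = [float(timesteps[i]) for i in step_indices]
    x = z
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * v(x, t)  # one Euler step of dx/dt = v(x, t)
    return x, ts


# On a perfectly straight path the velocity is constant (noise - x0), so any
# schedule, even an uneven one, lands on the data point.
rng = np.random.default_rng(0)
x0, noise = rng.standard_normal(4), rng.standard_normal(4)
sample, ts = infer_with_schedule(lambda x, t: noise - x0, noise,
                                 np.linspace(1.0, 0.0, 11), [0, 3, 7, 10])
```

On a real model the trajectory bends, so different schedules give different samples; that gap is what choosing the schedule by search is meant to close.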
🙏 Acknowledgments
This codebase builds on solid prior work; we thank DiT, FLUX, Open-Sora, and Wan2.1 for their excellent generation ability.
📖 Citation
If you find this project helpful for your research or use it in your own work, please cite our paper:
@misc{pei2025optimalstepsizediffusionsampling,
title={Optimal Stepsize for Diffusion Sampling},
author={Jianning Pei and Han Hu and Shuyang Gu},
year={2025},
eprint={2503.21774},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.21774},
}
⭐️ If this repository helped your research, please star 🌟 this repo 👍!