🚀[CVPR 2025] Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling✨

September 17, 2025

📰 News

[2025.03.08] 🚀 STG is now integrated into the Diffusers community pipeline!
👉 Check it out on Hugging Face
[2025.02.07] 🏆 STG officially accepted to CVPR 2025!
🎤 Stay tuned for our presentation at the conference.
[2024.12.20] 🔥 STG added to LTXVideo’s official repository!
📂 Now part of LTXVideo’s main repository.
[2024.12.19] 🖥️ ComfyUI STG support for LTXVideo!
🎬 Implemented in ComfyUI, enhancing LTXVideo support.

🎥Video Examples

Example videos showcasing the enhanced video quality achieved through STG are provided for Mochi, CogVideoX, SVD (Stable Video Diffusion), and LTX-Video.

🗺️Start Guide

🧪Diffusers-based code

To run the test script, refer to the inference.py file in each model's folder. Below is an example using Mochi:

# inference.py
import torch
from pipeline_stg_mochi import MochiSTGPipeline
from diffusers.utils import export_to_video
import os

# Ensure the samples directory exists
os.makedirs("samples", exist_ok=True)

ckpt_path = "genmo/mochi-1-preview"
# Load the pipeline
pipe = MochiSTGPipeline.from_pretrained(ckpt_path, variant="bf16", torch_dtype=torch.bfloat16)

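# Optional memory savings: uncomment the two lines below.
# If you enable CPU offload, drop the .to("cuda") call, since offloading
# manages device placement itself.
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()
pipe = pipe.to("cuda")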

#--------Option--------#
prompt = "A close-up of a beautiful woman's face with colored powder exploding around her, creating an abstract splash of vibrant hues, realistic style."
stg_applied_layers_idx = [34]
stg_mode = "STG"  # used only to label the output filename below
stg_scale = 1.0  # 0.0 for CFG (default)
do_rescaling = False  # False (default)
#----------------------#

# Generate video frames
frames = pipe(
    prompt,
    height=480,
    width=480,
    num_frames=81,
    stg_applied_layers_idx=stg_applied_layers_idx,
    stg_scale=stg_scale,
    generator=torch.Generator().manual_seed(42),
    do_rescaling=do_rescaling,
).frames[0]

# Construct the video filename
if stg_scale == 0:
    video_name = f"CFG_rescale_{do_rescaling}.mp4"
else:
    layers_str = "_".join(map(str, stg_applied_layers_idx))
    video_name = f"{stg_mode}_scale_{stg_scale}_layers_{layers_str}_rescale_{do_rescaling}.mp4"

# Save video to samples directory
video_path = os.path.join("samples", video_name)
export_to_video(frames, video_path, fps=30)

print(f"Video saved to {video_path}")
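
The script comments that `stg_scale = 0.0` corresponds to plain CFG. The sketch below illustrates one way that can hold. It assumes STG adds a second guidance term on top of classifier-free guidance, pushing the conditional prediction away from a prediction made with the selected layers skipped. This README does not state the formula, so the combination rule, the function name `combine_guidance`, the `cfg_scale` value, and the toy numbers are all assumptions for illustration; the rescaling option (`do_rescaling`) is not modeled. Check `pipeline_stg_mochi.py` for the actual implementation.

```python
from typing import List


def combine_guidance(
    uncond: List[float],
    cond: List[float],
    perturbed: List[float],
    cfg_scale: float,
    stg_scale: float,
) -> List[float]:
    """Combine three noise predictions element-wise (illustrative only).

    Assumed form, not taken from this README:
        uncond + cfg_scale * (cond - uncond) + stg_scale * (cond - perturbed)
    """
    return [
        u + cfg_scale * (c - u) + stg_scale * (c - p)
        for u, c, p in zip(uncond, cond, perturbed)
    ]


# Toy values standing in for flattened noise predictions.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
perturbed = [0.5, 2.0]  # hypothetical prediction with the chosen layers skipped

# With stg_scale = 0.0 the extra term vanishes and only the CFG term remains.
cfg_only = combine_guidance(uncond, cond, perturbed, cfg_scale=4.5, stg_scale=0.0)
with_stg = combine_guidance(uncond, cond, perturbed, cfg_scale=4.5, stg_scale=1.0)
print(cfg_only, with_stg)
```

Under this assumed form, raising `stg_scale` moves the result further from the layer-skipped prediction, while `stg_applied_layers_idx` would decide which layers are skipped to produce that prediction.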

For details on memory efficiency, inference acceleration, and more, refer to the original pages below:

🙏Acknowledgements

This project is built upon the following works:

📖 BibTeX

@article{hyung2024spatiotemporal,
  title={Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling},
  author={Hyung, Junha and Kim, Kinam and Hong, Susung and Kim, Min-Jung and Choo, Jaegul},
  journal={arXiv preprint arXiv:2411.18664},
  year={2024}
}
