Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models

May 7, 2026

CVPR 2026 arXiv

🎉 This work has been accepted to Findings of CVPR 2026! 🎉

FlashU is a training-free, plug-and-play acceleration framework for unified multimodal models. A single apply_flashu_patch() call yields a ~2x speedup on text-to-image generation with negligible quality loss.

🚀 Quick Start

from models import Showo2Qwen2_5  # Original Show-o2 baseline
from flashu import apply_flashu_patch, FlashUConfig

model = Showo2Qwen2_5.from_pretrained("showlab/show-o2-1.5B")
apply_flashu_patch(model, FlashUConfig())  # ⚡ That's it!

# Use the accelerated model exactly as before
images = model.t2i_generate(...)

🛠️ Installation

# Clone this repository
git clone https://github.com/Rirayh/FlashU-Show-o2.git
cd FlashU-Show-o2

# Install dependencies (same as Show-o2)
pip install torch torchvision accelerate transformers einops omegaconf tqdm pillow

# Download WAN 2.1 VAE weights
wget https://path/to/Wan2.1_VAE.pth -P .

Requirements:

  • Python >= 3.10
  • PyTorch >= 2.4.0 (with FlexAttention support)
  • CUDA >= 12.1
  • GPU: >= 24GB VRAM
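
To check the Python and PyTorch minimums listed above before running anything heavy, a small script along these lines may help. This is a minimal sketch and not part of FlashU: parse_version, meets_minimum, and check_environment are hypothetical helper names, and the CUDA and VRAM requirements are not checked here because that would require importing torch.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.4.0+cu121' into a tuple like (2, 4, 0)."""
    parts = []
    # Drop any local build tag ("+cu121") and read the leading digits of each piece.
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so that "2.4" compares equal to "2.4.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Report whether the Python and PyTorch minimums from this README are met."""
    report = {"python": sys.version_info[:2] >= (3, 10)}
    try:
        report["torch"] = meets_minimum(metadata.version("torch"), "2.4.0")
    except metadata.PackageNotFoundError:
        report["torch"] = False  # PyTorch is not installed in this environment
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'below minimum or missing'}")
```

Comparing version tuples rather than raw strings avoids the classic mistake of ranking "2.10.0" below "2.4.0".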

📖 Usage

Basic Usage

from models import Showo2Qwen2_5
from flashu import apply_flashu_patch, FlashUConfig

# Load baseline model
model = Showo2Qwen2_5.from_pretrained("showlab/show-o2-1.5B").cuda().eval()

# Apply FlashU acceleration
config = FlashUConfig(
    r_p=0.20,           # FFN pruning ratio
    r_LS=0.20,          # Layer skipping ratio
    T_LS=10,            # Layer importance recalc interval
    T_cache=5,          # Diffusion head cache interval
    tau=10,             # Hybrid FFN final steps
    schedule=[(4, 5.0), (8, 7.5), (20, 11.0)],  # Adaptive guidance
)
apply_flashu_patch(model, config)

# Use model as normal
images = model.t2i_generate(prompts, ...)
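
The T_cache parameter above is described only as a cache interval for the diffusion head. One common way such an interval works is to recompute a value every N steps and reuse it in between. The sketch below illustrates that general idea. It assumes this reading of T_cache, and IntervalCache is a hypothetical class, not FlashU's actual caching code.

```python
class IntervalCache:
    """Recompute a value every `interval` steps and reuse it in between."""

    def __init__(self, interval):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self.value = None
        self.recomputed_steps = []  # kept only to make the behaviour visible

    def get(self, step, compute):
        # Steps 0, interval, 2 * interval, ... trigger a fresh computation;
        # every other step returns the most recently cached value.
        if self.value is None or step % self.interval == 0:
            self.value = compute(step)
            self.recomputed_steps.append(step)
        return self.value


# Assumption: interval=5 mirrors T_cache=5 from the config above.
cache = IntervalCache(interval=5)
outputs = [cache.get(step, lambda s: f"features@{s}") for step in range(12)]
print(cache.recomputed_steps)
```

With an interval of 5, only about one step in five pays for the expensive computation, which is where the speedup from this kind of caching would come from.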

DPG-Bench Evaluation

# Quick demo (5 prompts)
bash scripts/run_dpg_accelerated.sh --demo

# Full benchmark (1065 prompts)
bash scripts/run_dpg_accelerated.sh

The script automatically runs both baseline and FlashU, then prints a comparison table.

Custom Configuration

from flashu import FlashUConfig

# Aggressive speedup (lower quality)
fast_config = FlashUConfig(r_p=0.30, r_LS=0.30, tau=5)

# Conservative (higher quality)
quality_config = FlashUConfig(r_p=0.10, r_LS=0.10, tau=15)
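
To build intuition for how a ratio such as r_LS could translate into concrete layers, the sketch below picks the lowest-scoring fraction of layers from a list of importance scores. It is illustrative only: select_layers_to_skip is a hypothetical helper, the scores are made up, and rounding the count down is an assumption. FlashU's actual selection rule may differ.

```python
def select_layers_to_skip(importance, ratio):
    """Return the indices of the lowest-importance layers, in ascending order."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    num_skip = int(len(importance) * ratio)  # assumption: round down
    # Rank layer indices from least to most important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return sorted(ranked[:num_skip])


# Made-up importance scores for a 10-layer model (higher means more important).
scores = [0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4]
skipped = select_layers_to_skip(scores, ratio=0.2)
print(skipped)
```

Raising the ratio skips more layers, which is why the aggressive configuration above trades quality for speed.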

Reverting to Baseline

from flashu import remove_flashu_patch

remove_flashu_patch(model)  # Restores original forward passes
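
The apply/remove pair follows the usual monkey-patching pattern: save the original method, swap in a replacement, and restore the original later. The toy sketch below shows that pattern on a plain Python object. TinyModel, apply_patch, remove_patch, and the _original_forward attribute are all hypothetical names for illustration, and FlashU's internals may be organised differently.

```python
import types

_ORIGINAL_ATTR = "_original_forward"  # hypothetical attribute used to stash the original


class TinyModel:
    """Stand-in for a real model; forward simply doubles its input."""

    def forward(self, x):
        return x * 2


def apply_patch(model, replacement):
    """Swap model.forward for `replacement`, remembering the original method."""
    if hasattr(model, _ORIGINAL_ATTR):
        return  # already patched; do not overwrite the saved original
    setattr(model, _ORIGINAL_ATTR, model.forward)
    model.forward = types.MethodType(replacement, model)


def remove_patch(model):
    """Restore the original forward method if a patch is active."""
    original = model.__dict__.pop(_ORIGINAL_ATTR, None)
    if original is not None:
        model.forward = original


def approximate_forward(self, x):
    # Placeholder for a cheaper computation; here it just returns a marker value.
    return -1


model = TinyModel()
before = model.forward(3)
apply_patch(model, approximate_forward)
patched = model.forward(3)
remove_patch(model)
after = model.forward(3)
print(before, patched, after)
```

Stashing the original on the instance keeps the patch reversible and makes a second apply_patch call a no-op instead of losing the baseline method.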

🔬 Analysis Tools

OBD Importance Score Computation

python -m evaluation.calculate_obd_cache_new \
    config=configs/showo2_1.5b_demo_432x432.yaml \
    num_prompts=20
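
For background, the classic Optimal Brain Damage (OBD) criterion scores each parameter as half the diagonal Hessian entry times the squared weight. The sketch below shows that textbook formula on toy numbers. Whether the script above uses exactly this form is not stated here, so treat it as context and not as a description of the script. The obd_saliency function and its inputs are hypothetical.

```python
def obd_saliency(weights, hessian_diag):
    """Classic OBD saliency: s_k = 0.5 * h_kk * w_k ** 2 for each parameter."""
    if len(weights) != len(hessian_diag):
        raise ValueError("weights and hessian_diag must have the same length")
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


# Toy values: three weights and their diagonal Hessian entries.
weights = [2.0, -1.0, 0.5]
hessian_diag = [0.5, 4.0, 8.0]
saliency = obd_saliency(weights, hessian_diag)
print(saliency)
```

Note that a large weight with low curvature can score the same as a small weight with high curvature, which is the point of weighting by the Hessian instead of ranking by magnitude alone.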

Importance Score Visualization

python -m evaluation.visualize_obd_scores \
    --scores_dir <path_to_scores>

Activation Analysis

python -m evaluation.analyze_t2i_activations1 \
    config=configs/showo2_1.5b_demo_432x432.yaml

🎓 Citation

@article{ke2026flashunified,
  title={Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models},
  author={Ke, Junlong and Wen, Zichen and Yang, Boxue and Yang, Yantai and Liu, Xuyang and Liao, Chenfei and Chen, Zhaorun and Wang, Shaobo and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2603.15271},
  year={2026}
}

📜 License

Apache License 2.0. See headers in source files for details.

🙏 Acknowledgements

  • Show-o2 by NUS Show Lab - The amazing baseline model
  • Wanda - Structural pruning method
  • FlexAttention (PyTorch 2.4+) - Efficient attention implementation