model_zoo.md

March 16, 2026

SANA

| Model | Resolution | pth link | diffusers | Precision | Description |
|---|---|---|---|---|---|
| Sana-0.6B | 512px | Sana_600M_512px | Efficient-Large-Model/Sana_600M_512px_diffusers | fp16/fp32 | Multi-Language |
| Sana-0.6B | 1024px | Sana_600M_1024px | Efficient-Large-Model/Sana_600M_1024px_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 512px | Sana_1600M_512px | Efficient-Large-Model/Sana_1600M_512px_diffusers | fp16/fp32 | - |
| Sana-1.6B | 512px | Sana_1600M_512px_MultiLing | Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | Sana_1600M_1024px | Efficient-Large-Model/Sana_1600M_1024px_diffusers | fp16/fp32 | - |
| Sana-1.6B | 1024px | Sana_1600M_1024px_MultiLing | Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | Sana_1600M_1024px_BF16 | Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers | bf16/fp32 | Multi-Language |
| Sana-1.6B-int4 | 1024px | - | mit-han-lab/svdq-int4-sana-1600m | int4 | Multi-Language |
| Sana-1.6B | 2Kpx | Sana_1600M_2Kpx_BF16 | Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers | bf16/fp32 | Multi-Language |
| Sana-1.6B | 4Kpx | Sana_1600M_4Kpx_BF16 | Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers | bf16/fp32 | Multi-Language |
| ControlNet | | | | | |
| Sana-1.6B-ControlNet | 1Kpx | Sana_1600M_1024px_BF16_ControlNet_HED | Coming soon | bf16/fp32 | Multi-Language |
| Sana-0.6B-ControlNet | 1Kpx | Sana_600M_1024px_ControlNet_HED | Coming soon | fp16/fp32 | - |

SANA-1.5

| Model | Resolution | pth link | diffusers | Precision | Description |
|---|---|---|---|---|---|
| SANA1.5-4.8B | 1024px | SANA1.5_4.8B_1024px | Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers | bf16 | Multi-Language |
| SANA1.5-1.6B | 1024px | SANA1.5_1.6B_1024px | Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers | bf16 | Multi-Language |

SANA-Sprint

| Model | Resolution | pth link | diffusers | Precision | Description |
|---|---|---|---|---|---|
| Sana-Sprint-0.6B | 1024px | Sana-Sprint_0.6B_1024px | Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers | bf16 | Multi-Language |
| Sana-Sprint-1.6B | 1024px | Sana-Sprint_1.6B_1024px | Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers | bf16 | Multi-Language |

SANA-Video

| Model | Resolution | pth link | diffusers | Precision | Description |
|---|---|---|---|---|---|
| Sana-Video-2B | 480p | Sana-Video_2B_480p | Efficient-Large-Model/Sana-Video_2B_480p_diffusers | bf16 | 5s Pre-train model |
| Sana-Video-2B | 720p | Sana-Video_2B_720p | Efficient-Large-Model/SANA-Video_2B_720p_diffusers | bf16 | 5s 720p model (LTX2 VAE) |
| LongSANA-Video-2B | 480p | SANA-Video_2B_480p_LongLive | Efficient-Large-Model/SANA-Video_2B_480p_LongLive_diffusers | bf16 | 27FPS Minute-length model |
| LongSANA-Video-2B-ODE-Init | 480p | LongSANA_2B_480p_ode | --- | bf16 | LongSANA first step model initialized from ODE trajectories |
| LongSANA-Video-2B-Self-Forcing | 480p | LongSANA_2B_480p_self_forcing | --- | bf16 | LongSANA second step model trained by Self-Forcing |

❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.

We provide two examples: one for fp16 weights and one for bf16 weights.

❗️ Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.

1). For fp16 models

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana.png")

2). For bf16 models

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('sana.png')
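
The `variant`/`torch_dtype` pairing used in the two examples above can be kept in one place so the two settings never drift apart. The following is a minimal sketch: `PRECISION_BY_REPO`, `DTYPE_NAME`, and `precision_kwargs` are hypothetical names introduced here for illustration, and the mapping lists only a few repos copied from the model table. The dtype is returned as a string so the snippet runs without torch installed.

```python
# Weight variant per diffusers repo, copied from the model table.
# Only a few entries are listed here; extend the mapping as needed.
PRECISION_BY_REPO = {
    "Efficient-Large-Model/Sana_600M_1024px_diffusers": "fp16",
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers": "fp16",
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers": "bf16",
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers": "bf16",
}

# Name of the torch dtype attribute that matches each weight variant.
DTYPE_NAME = {"fp16": "float16", "bf16": "bfloat16"}


def precision_kwargs(repo_id):
    """Return the variant and torch dtype name for a known repo (hypothetical helper)."""
    try:
        variant = PRECISION_BY_REPO[repo_id]
    except KeyError:
        raise ValueError(
            f"Unknown repo {repo_id!r}; add it to PRECISION_BY_REPO"
        ) from None
    return {"variant": variant, "torch_dtype_name": DTYPE_NAME[variant]}


kwargs = precision_kwargs("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers")
print(kwargs)
```

To use the result, resolve the dtype with `getattr(torch, kwargs["torch_dtype_name"])` and pass it as `torch_dtype=` together with `variant=kwargs["variant"]` to `from_pretrained`.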

❗ 3. 2K & 4K models

4K models need VAE tiling to avoid out-of-memory (OOM) errors. (16 GPU is recommended)

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

# 2K model: Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers
# 4K model: Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

# VAE tiling avoids OOM when generating 4096x4096 images; feel free to adjust the tile size
if pipe.transformer.config.sample_size == 128:
    pipe.vae.enable_tiling(
        tile_sample_min_height=1024,
        tile_sample_min_width=1024,
        tile_sample_stride_height=896,
        tile_sample_stride_width=896,
    )
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana_4K.png")
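
To get a feel for the tile settings above, the arithmetic can be worked through by hand. The sketch below assumes that one tile starts every `stride` pixels from the origin along each axis; that is an assumption made for illustration, not a description of the diffusers implementation, which may tile in latent space. `tile_starts` is a hypothetical helper.

```python
def tile_starts(image_size, stride):
    """Start offsets of tiles along one axis, assuming one tile begins every `stride` pixels."""
    return list(range(0, image_size, stride))


image_size = 4096  # height and width requested in the example
tile_size = 1024   # tile_sample_min_height / tile_sample_min_width
stride = 896       # tile_sample_stride_height / tile_sample_stride_width

starts = tile_starts(image_size, stride)
overlap = tile_size - stride    # pixels shared by neighbouring tiles
tiles_total = len(starts) ** 2  # same tiling along both axes

print(starts)       # [0, 896, 1792, 2688, 3584]
print(overlap)      # 128
print(tiles_total)  # 25
```

Under this assumption, a smaller tile size lowers the memory needed for each tile but raises the tile count, which is the trade-off to keep in mind when adjusting the values.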

❗ 4. int4 inference

This int4 model is quantized with SVDQuant-Nunchaku. First install the nunchaku engine by following its installation guide, then use the following code snippet to run inference with the int4 Sana model.

Here we show the code snippet for SanaPipeline. For SanaPAGPipeline, please refer to the SanaPAGPipeline section.

import torch
from diffusers import SanaPipeline

from nunchaku.models.transformer_sana import NunchakuSanaTransformer2DModel

transformer = NunchakuSanaTransformer2DModel.from_pretrained("mit-han-lab/svdq-int4-sana-1600m")
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    transformer=transformer,
    variant="bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

image = pipe(
    prompt="A cute 🐼 eating 🎋, ink drawing style",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("sana_1600m.png")
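
As a rough guide to what int4 saves, the storage needed for the transformer weights alone can be estimated from the bit width. This is a back-of-the-envelope sketch, not a measurement: it assumes a nominal 1.6e9 parameters (taken from the model name) and ignores activations, the text encoder, the VAE, and any parts of the quantized model that are kept in higher precision. `BITS_PER_PARAM` and `weight_gib` are hypothetical names.

```python
# Bits used to store one parameter at each precision listed in the model table.
BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "bf16": 16, "int4": 4}


def weight_gib(num_params, precision):
    """Approximate weight storage in GiB for `num_params` parameters (hypothetical helper)."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 2**30


NOMINAL_PARAMS = 1.6e9  # assumption: nominal size implied by the name "Sana-1.6B"

for precision in ("fp32", "bf16", "int4"):
    print(f"{precision}: {weight_gib(NOMINAL_PARAMS, precision):.2f} GiB")
```

In this estimate the int4 weights take a quarter of the space of the bf16 weights; actual GPU memory use during inference will be higher than any of these figures.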

🔧 5. Convert .pth to diffusers .safetensors

python tools/convert_scripts/convert_sana_to_diffusers.py \
      --orig_ckpt_path Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth \
      --model_type SanaMS_1600M_P1_D20 \
      --dtype bf16 \
      --dump_path output/Sana_1600M_1024px_BF16_diffusers \
      --save_full_pipeline
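
When several checkpoints need converting, the command above can be assembled programmatically. The sketch below uses only the flags shown in the command; `build_convert_command` is a hypothetical helper, and it assumes the script is run from the repository root.

```python
import shlex


def build_convert_command(orig_ckpt_path, model_type, dtype, dump_path,
                          save_full_pipeline=True):
    """Build the argument list for the conversion script (hypothetical helper)."""
    cmd = [
        "python", "tools/convert_scripts/convert_sana_to_diffusers.py",
        "--orig_ckpt_path", orig_ckpt_path,
        "--model_type", model_type,
        "--dtype", dtype,
        "--dump_path", dump_path,
    ]
    if save_full_pipeline:
        cmd.append("--save_full_pipeline")
    return cmd


cmd = build_convert_command(
    orig_ckpt_path="Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth",
    model_type="SanaMS_1600M_P1_D20",
    dtype="bf16",
    dump_path="output/Sana_1600M_1024px_BF16_diffusers",
)
print(shlex.join(cmd))  # shell-ready form of the same command
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.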