MindSpore ONE

December 24, 2025

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.12.24] We release v0.5.0, compatible with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. This release also introduces initial ComfyUI integration. Happy exploring!
  • [2025.11.02] v0.4.0 is released, with 280+ transformers models and 70+ diffusers pipelines supported. See here
  • [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora2.0, Movie Gen 30B, CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
  • [2024.11.06] v0.2.0 is released

Quick tour

To install v0.5.0, please install MindSpore 2.6.0–2.7.1 and run `pip install mindone`

Alternatively, to install the latest version from the master branch, please run:

```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# the pipeline returns a tuple; the first element holds the generated images
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

run hf diffusers on mindspore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0–2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 Diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see the support list
  • 18+ training examples: ControlNet, DreamBooth, LoRA, and more
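LoRA weights produced by the training examples can typically be loaded back into a pipeline, mirroring the upstream Diffusers workflow. A minimal sketch, assuming the mirrored `load_lora_weights` API; the local directory `./sd3-lora` is a hypothetical output path from a LoRA training run:

```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
# "./sd3-lora" is a hypothetical directory produced by the LoRA training example
pipe.load_lora_weights("./sd3-lora")
image = pipe("A cat holding a sign that says 'Hello MindSpore'")[0][0]
image.save("sd3_lora.png")
```

Running this requires an Ascend device and downloads the base model weights from the Hub.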

run hf transformers on mindspore

  • mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0–2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 Transformers v4.57.1
  • provides 350+ state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal domains for inference; see the support list
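Mirroring the Stable Diffusion 3 example above, a text-generation model can be run through the Transformers port. A minimal sketch, assuming the mirrored `AutoModelForCausalLM` API and `mindspore_dtype` argument; the model ID is only an example, and the tokenizer still comes from upstream `transformers`:

```python
import mindspore
from transformers import AutoTokenizer
from mindone.transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # example model, swap in any supported one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    mindspore_dtype=mindspore.float16,
)

# tokenize to NumPy, then wrap as a MindSpore tensor for generation
inputs = tokenizer("Hello MindSpore!", return_tensors="np")
output_ids = model.generate(
    input_ids=mindspore.Tensor(inputs["input_ids"]),
    max_new_tokens=32,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

As with the Diffusers example, this needs an Ascend device and fetches weights from the Hub on first run.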

supported models under mindone/examples

| Task | Model | Inference | Finetune | Pretrain | Institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ✖️ | ✖️ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ✖️ | ✖️ | Meta |
| Segmentation | sam2 | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | sparktts | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ✖️ | Stability AI |

supported captioner

| Task | Model | Inference | Finetune | Pretrain | Features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ✖️ | ✖️ | supports video and image captioning |

training-free acceleration

Introduces DiT inference acceleration: DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1.
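The idea behind a first-block cache can be illustrated in plain Python: run the first transformer block, and if its output barely changed since the previous diffusion step, reuse the cached final result instead of running the remaining blocks. This is a toy sketch of the general caching technique, not the mindone implementation; the class name `FirstBlockCache` and the scalar "blocks" are hypothetical:

```python
class FirstBlockCache:
    """Toy first-block cache: skip the expensive remaining blocks when the
    first block's output is close to the previous step's (hypothetical sketch)."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold  # relative-change tolerance for a cache hit
        self.prev_first = None      # first-block output from the previous step
        self.prev_out = None        # cached final output
        self.hits = 0               # number of skipped (cached) steps

    def __call__(self, first_block, rest_blocks, x):
        h1 = first_block(x)
        if (self.prev_first is not None
                and abs(h1 - self.prev_first)
                    <= self.threshold * (abs(self.prev_first) + 1e-8)):
            self.hits += 1
            return self.prev_out  # cache hit: reuse last step's result
        h = h1
        for block in rest_blocks:
            h = block(h)
        self.prev_first, self.prev_out = h1, h
        return h


cache = FirstBlockCache(threshold=0.05)
first = lambda x: 2.0 * x          # stand-in for the first DiT block
rest = [lambda x: x + 1.0]         # stand-ins for the remaining blocks
outs = [cache(first, rest, t) for t in (1.0, 1.001, 1.5)]
print(outs, cache.hits)  # the middle step is served from the cache
```

Real implementations apply the same test to feature tensors (e.g. an L2 or relative norm of the residual) and tune the threshold per model, trading a small quality loss for fewer block evaluations per step.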