# MindSpore ONE
December 24, 2025
This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.
ONE is short for "ONE for all".
## News
- [2025.12.24] We release v0.5.0, bringing compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. This release also introduces initial ComfyUI integration. Happy exploring!
- [2025.11.02] v0.4.0 is released, with 280+ Transformers models and 70+ Diffusers pipelines supported. See here
- [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
- [2024.11.06] v0.2.0 is released
## Quick tour
To install v0.5.0, please install MindSpore 2.6.0-2.7.1 and run `pip install mindone`.
Alternatively, to install the latest version from the master branch, please run:
```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
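Since mindone v0.5.0 targets MindSpore 2.6.0-2.7.1, it can be worth confirming your MindSpore version before installing. A minimal, hypothetical helper for checking a version string against that range (not part of mindone; names are illustrative):

```python
def parse_version(v: str) -> tuple:
    """Split a dotted version string into a tuple of ints, e.g. '2.6.0' -> (2, 6, 0)."""
    return tuple(int(part) for part in v.split("."))

def is_supported(version: str, low: str = "2.6.0", high: str = "2.7.1") -> bool:
    """Return True if `version` falls inside the supported MindSpore range [low, high]."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)

print(is_supported("2.6.0"))  # True
print(is_supported("2.5.0"))  # False
```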
We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.
Hello MindSpore from Stable Diffusion 3!
```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 weights in float16 to reduce memory usage
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The call returns a tuple; the first element is the list of generated images
image = pipe(prompt)[0][0]
image.save("sd3.png")
```
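The double indexing `pipe(prompt)[0][0]` follows the Diffusers-style return convention: the pipeline call returns a tuple whose first element is the list of generated images. A toy stub (the classes below are illustrative, not the real mindone API) showing why two indexes are needed:

```python
class FakeImage:
    """Stand-in for a generated image object."""
    def save(self, path):
        print(f"saved {path}")

class FakePipeline:
    """Mimics the (images, ...) tuple that diffusers-style pipelines return."""
    def __call__(self, prompt, num_images_per_prompt=1):
        images = [FakeImage() for _ in range(num_images_per_prompt)]
        return (images,)  # tuple: first element is the list of images

pipe = FakePipeline()
image = pipe("a cat")[0][0]  # [0] -> list of images, [0] -> first image
```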
## run hf diffusers on mindspore
- mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Diffusers v0.35.2, with preview support for SoTA v0.36 pipelines, see the support list
- 18+ training examples - ControlNet, DreamBooth, LoRA, and more
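As background for the LoRA training examples, here is a minimal NumPy sketch of the low-rank adapter idea they build on (illustrative only, not mindone's implementation): a frozen weight plus a trainable low-rank update that starts at zero, so training begins from the base model's behavior.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r),
    the core idea behind LoRA fine-tuning."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(out_dim, in_dim))      # frozen base weight
        self.A = rng.normal(size=(rank, in_dim)) * 0.01  # trainable down-projection
        self.B = np.zeros((out_dim, rank))               # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + scale * B (A x); with B = 0 the output equals the base layer
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(8, 4)
x = np.ones(8)
assert np.allclose(layer(x), layer.W @ x)  # zero-init update leaves base behavior unchanged
```

Only `A` and `B` would be updated during fine-tuning, which is why LoRA checkpoints are small compared to the base model.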
## run hf transformers on mindspore
- mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Transformers v4.57.1
- provides 350+ state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal tasks for inference, see the support list
## supported models under mindone/examples
| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ❌ | ❌ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ❌ | ❌ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ❌ | ❌ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ❌ | ❌ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ❌ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ❌ | ❌ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ❌ | ❌ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ❌ | ❌ | Meta |
| Segmentation | sam2 | ✅ | ❌ | ❌ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ❌ | ❌ | StepFun |
| Text-to-Speech | sparktts | ✅ | ❌ | ❌ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ❌ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ❌ | Stability AI |
## supported captioner
| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ❌ | ❌ | supports video and image captioning |
## training-free acceleration
Introduces training-free DiT inference acceleration: DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1.
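The common thread in these methods is reusing work across denoising steps instead of retraining anything. A rough, illustrative sketch of that caching idea (not mindone's implementation; names and threshold are hypothetical): recompute a block only when its input has drifted noticeably since the last real computation, trading a small approximation error for skipped compute.

```python
import numpy as np

class CachedBlock:
    """Recomputes an expensive block only when its input drifts
    beyond `threshold` (relative change) from the last computed input."""
    def __init__(self, fn, threshold=0.05):
        self.fn = fn
        self.threshold = threshold
        self.last_x = None
        self.last_out = None
        self.calls = 0  # counts real computations

    def __call__(self, x):
        if self.last_x is not None:
            drift = np.linalg.norm(x - self.last_x) / (np.linalg.norm(self.last_x) + 1e-8)
            if drift < self.threshold:
                return self.last_out  # input barely changed: reuse cached output
        self.calls += 1
        self.last_x = x.copy()
        self.last_out = self.fn(x)
        return self.last_out

block = CachedBlock(lambda x: x * 2.0, threshold=0.05)
x = np.ones(4)
block(x)          # computed
block(x + 0.001)  # ~0.1% drift -> served from cache
block(x + 1.0)    # large drift -> recomputed
print(block.calls)  # 2
```

In a diffusion sampler, adjacent steps produce similar activations for some blocks, which is exactly the regime where this kind of cache pays off.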