# MindSpore ONE
December 24, 2025
This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.
ONE is short for "ONE for all".
## News
- [2025.12.24] We release v0.5.0, bringing compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. This release also introduces initial ComfyUI integration. Happy exploring!
- [2025.11.02] v0.4.0 is released, with 280+ Transformers models and 70+ Diffusers pipelines supported. See here
- [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
- [2024.11.06] v0.2.0 is released
## Quick tour
To install v0.5.0, please install MindSpore 2.6.0-2.7.1 and run `pip install mindone`.
Alternatively, to install the latest version from the master branch, please run:
```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
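Since mindone v0.5.0 targets MindSpore 2.6.0-2.7.1, it can be worth confirming your MindSpore version before installing. A minimal, hypothetical helper for checking a version string against that range (not part of mindone; names are illustrative):

```python
def parse_version(v: str) -> tuple:
    """Split a dotted version string into a tuple of ints, e.g. '2.6.0' -> (2, 6, 0)."""
    return tuple(int(part) for part in v.split("."))

def is_supported(version: str, low: str = "2.6.0", high: str = "2.7.1") -> bool:
    """Return True if `version` falls inside the supported MindSpore range [low, high]."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)

print(is_supported("2.6.0"))  # True
print(is_supported("2.5.0"))  # False
```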
We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.
Hello MindSpore from Stable Diffusion 3!
```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 weights in float16 to reduce memory usage
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The call returns a tuple; the first element is the list of generated images
image = pipe(prompt)[0][0]
image.save("sd3.png")
```
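The double indexing `pipe(prompt)[0][0]` follows the Diffusers-style return convention: the pipeline call returns a tuple whose first element is the list of generated images. A toy stub (the classes below are illustrative, not the real mindone API) showing why two indexes are needed:

```python
class FakeImage:
    """Stand-in for a generated image object."""
    def save(self, path):
        print(f"saved {path}")

class FakePipeline:
    """Mimics the (images, ...) tuple that diffusers-style pipelines return."""
    def __call__(self, prompt, num_images_per_prompt=1):
        images = [FakeImage() for _ in range(num_images_per_prompt)]
        return (images,)  # tuple: first element is the list of images

pipe = FakePipeline()
image = pipe("a cat")[0][0]  # [0] -> list of images, [0] -> first image
```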
## run hf diffusers on mindspore
- mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Diffusers v0.35.2, with preview support for SoTA v0.36 pipelines, see the support list
- 18+ training examples - ControlNet, DreamBooth, LoRA, and more
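As background for the LoRA training examples, here is a minimal NumPy sketch of the low-rank adapter idea they build on (illustrative only, not mindone's implementation): a frozen weight plus a trainable low-rank update that starts at zero, so training begins from the base model's behavior.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r),
    the core idea behind LoRA fine-tuning."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(out_dim, in_dim))      # frozen base weight
        self.A = rng.normal(size=(rank, in_dim)) * 0.01  # trainable down-projection
        self.B = np.zeros((out_dim, rank))               # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + scale * B (A x); with B = 0 the output equals the base layer
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(8, 4)
x = np.ones(8)
assert np.allclose(layer(x), layer.W @ x)  # zero-init update leaves base behavior unchanged
```

Only `A` and `B` would be updated during fine-tuning, which is why LoRA checkpoints are small compared to the base model.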
## run hf transformers on mindspore
- mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Transformers v4.57.1
- provides 350+ state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal tasks for inference, see the support list
## supported models under mindone/examples
| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ❌ | ❌ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ❌ | ❌ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ❌ | ❌ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ❌ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ❌ | ❌ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ❌ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ❌ | ❌ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ❌ | ❌ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ❌ | ❌ | Meta |
| Segmentation | sam2 | ✅ | ❌ | ❌ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ❌ | ❌ | StepFun |
| Text-to-Speech | sparktts | ✅ | ❌ | ❌ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ❌ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ❌ | Stability AI |
## supported captioner
| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ❌ | ❌ | supports video and image captioning |
## training-free acceleration
Introduces training-free DiT inference acceleration: DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1.
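The common thread in these methods is reusing work across denoising steps instead of retraining anything. A rough, illustrative sketch of that caching idea (not mindone's implementation; names and threshold are hypothetical): recompute a block only when its input has drifted noticeably since the last real computation, trading a small approximation error for skipped compute.

```python
import numpy as np

class CachedBlock:
    """Recomputes an expensive block only when its input drifts
    beyond `threshold` (relative change) from the last computed input."""
    def __init__(self, fn, threshold=0.05):
        self.fn = fn
        self.threshold = threshold
        self.last_x = None
        self.last_out = None
        self.calls = 0  # counts real computations

    def __call__(self, x):
        if self.last_x is not None:
            drift = np.linalg.norm(x - self.last_x) / (np.linalg.norm(self.last_x) + 1e-8)
            if drift < self.threshold:
                return self.last_out  # input barely changed: reuse cached output
        self.calls += 1
        self.last_x = x.copy()
        self.last_out = self.fn(x)
        return self.last_out

block = CachedBlock(lambda x: x * 2.0, threshold=0.05)
x = np.ones(4)
block(x)          # computed
block(x + 0.001)  # ~0.1% drift -> served from cache
block(x + 1.0)    # large drift -> recomputed
print(block.calls)  # 2
```

In a diffusion sampler, adjacent steps produce similar activations for some blocks, which is exactly the regime where this kind of cache pays off.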