May 1, 2026

ACE-Step 1.5

Pushing the Boundaries of Open-Source Music Generation

ACEMusic | Project | Hugging Face | ModelScope | Space Demo | Discord | Technical Report | Awesome ACE-Step

StepFun Logo | ACEMusic - Try ACE-Step Online

📰 News

🎵 Want a faster & more stable experience? Try acemusic.ai, 100% free!

  • [2026-04-02] 🎉 ACE-Step 1.5 XL (4B DiT) Released! We introduce the XL series with a 4B-parameter DiT decoder for higher audio quality. Three variants are available: xl-base, xl-sft, and xl-turbo. They require ≥12GB VRAM (with offload); ≥20GB is recommended. All LM models are fully compatible. See Model Zoo for details.

📝 Abstract

🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast: under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities (such as cover generation, repainting, and vocal-to-BGM conversion) while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸

✨ Features

ACE-Step Framework

⚡ Performance

  • ✅ Ultra-Fast Generation: Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)
  • ✅ Flexible Duration: Supports 10 seconds to 10 minutes (600s) of audio generation
  • ✅ Batch Generation: Generate up to 8 songs simultaneously
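
The duration and batch limits listed above can be checked before a request is submitted. The following is a minimal sketch only: `validate_request` and its parameter names are hypothetical and are not part of the ACE-Step API.

```python
MIN_DURATION_S = 10    # shortest supported clip, per the list above
MAX_DURATION_S = 600   # 10 minutes
MAX_BATCH_SIZE = 8     # songs generated simultaneously


def validate_request(duration_s: float, batch_size: int = 1) -> None:
    """Raise ValueError if a request falls outside the documented limits.

    Hypothetical helper for illustration; ACE-Step may enforce these
    limits differently.
    """
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S}s, got {duration_s}"
        )
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"batch size must be 1-{MAX_BATCH_SIZE}, got {batch_size}"
        )
```

Calling `validate_request(180, batch_size=4)` returns silently, while `validate_request(601)` raises `ValueError`.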

🎵 Generation Quality

  • ✅ Commercial-Grade Output: Quality beyond most commercial music models (between Suno v4.5 and Suno v5)
  • ✅ Rich Style Support: 1000+ instruments and styles with fine-grained timbre description
  • ✅ Multi-Language Lyrics: Supports 50+ languages with lyrics prompt for structure & style control

🎛️ Versatility & Control

| Feature | Description |
| --- | --- |
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation & training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |
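
To illustrate what timestamped lyrics (the LRC Generation feature) look like, the sketch below formats `(start_seconds, text)` pairs in the widely used `[mm:ss.xx]text` LRC layout. It is an assumption that ACE-Step emits exactly this layout, and `format_lrc` is a hypothetical helper, not a project function.

```python
def format_lrc(lines):
    """Format (start_seconds, text) pairs as LRC lines: [mm:ss.xx]text.

    Timestamps are rounded to centiseconds first so that values such as
    59.999 roll over to the next minute instead of printing "60.00".
    """
    out = []
    for start_s, text in lines:
        total_cs = round(start_s * 100)          # centiseconds as an integer
        minutes, rem_cs = divmod(total_cs, 6000)
        seconds, cs = divmod(rem_cs, 100)
        out.append(f"[{minutes:02d}:{seconds:02d}.{cs:02d}]{text}")
    return "\n".join(out)


print(format_lrc([(0.0, "First line"), (75.5, "Second line")]))
# [00:00.00]First line
# [01:15.50]Second line
```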

🔔 Staying ahead

Star ACE-Step on GitHub and be instantly notified of new releases.

🤝 Partners

ComfyUI Zilliz Milvus Zeabur Majik's Music Studio

⚡ Quick Start

🎵 Don't want to install locally? Try acemusic.ai: 100% free, no GPU required!

Requirements: Python 3.11-3.12, CUDA GPU recommended (also supports MPS / ROCm / Intel XPU / CPU)

Note: ROCm on Windows requires Python 3.12 (AMD officially provides Python 3.12 wheels only)

# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh          # macOS / Linux
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows

# 2. Clone & install
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync

# 3. Launch Gradio UI (models auto-download on first run)
uv run acestep

# Or launch REST API server
uv run acestep-api

Open http://localhost:7860 (Gradio) or http://localhost:8001 (API).
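
To confirm that the servers came up, it can help to probe the two default ports mentioned above. This is a small sketch using only the Python standard library; `is_listening` is a hypothetical helper, and it only checks that something accepts TCP connections on the port, not that it is ACE-Step.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Default ports taken from the Quick Start text above.
    for name, port in (("Gradio UI", 7860), ("REST API", 8001)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```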

📦 Windows users: A portable package with pre-installed dependencies is available. See Installation Guide.

📦 macOS users: A portable package with pre-installed dependencies is available. See Installation Guide.

📖 Full installation guide (AMD/ROCm, Intel GPU, CPU, environment variables, command-line options): English | 中文 | 日本語

💡 Which Model Should I Choose?

| Your GPU VRAM | Recommended DiT | Recommended LM Model | Backend | Notes |
| --- | --- | --- | --- | --- |
| ≤6GB | 2B turbo | None (DiT only) | - | LM disabled by default; INT8 quantization + full CPU offload |
| 6-8GB | 2B turbo | acestep-5Hz-lm-0.6B | pt | Lightweight LM with PyTorch backend |
| 8-16GB | 2B turbo/sft | acestep-5Hz-lm-0.6B / 1.7B | vllm | 0.6B for 8-12GB, 1.7B for 12-16GB |
| 16-20GB | 2B sft or XL turbo | acestep-5Hz-lm-1.7B | vllm | XL requires CPU offload below 20GB |
| 20-24GB | XL turbo/sft | acestep-5Hz-lm-1.7B | vllm | XL fits without offload; 4B LM available |
| ≥24GB | XL sft (or xl-base for extract/lego/complete) | acestep-5Hz-lm-4B | vllm | Best quality, all models fit without offload |

XL (4B) models (acestep-v15-xl-*) offer higher audio quality with ~9GB VRAM for weights (vs ~4.7GB for 2B). They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible with XL.

The UI automatically selects the best configuration for your GPU. All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.
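
The tier table above can be read as a simple lookup from available VRAM to a recommended configuration. The sketch below encodes that table as data. It is not the UI's actual selection logic: `recommend` and `Recommendation` are hypothetical names, and because the table's ranges share their endpoints (for example, 8GB appears in two rows), this sketch assumes a boundary value belongs to the higher tier, except for the "≤6GB" row.

```python
from typing import NamedTuple, Optional


class Recommendation(NamedTuple):
    dit: str
    lm: Optional[str]       # None means "run the DiT without an LM"
    backend: Optional[str]  # "pt" or "vllm"; None when no LM is used


# (exclusive upper VRAM bound in GB, recommendation), in ascending order.
_TIERS = [
    (8.0, Recommendation("2B turbo", "acestep-5Hz-lm-0.6B", "pt")),
    (12.0, Recommendation("2B turbo/sft", "acestep-5Hz-lm-0.6B", "vllm")),
    (16.0, Recommendation("2B turbo/sft", "acestep-5Hz-lm-1.7B", "vllm")),
    (20.0, Recommendation("2B sft or XL turbo", "acestep-5Hz-lm-1.7B", "vllm")),
    (24.0, Recommendation("XL turbo/sft", "acestep-5Hz-lm-1.7B", "vllm")),
]


def recommend(vram_gb: float) -> Recommendation:
    """Map available VRAM (GB) to the matching row of the tier table."""
    if vram_gb <= 6:
        return Recommendation("2B turbo", None, None)
    for upper_bound, rec in _TIERS:
        if vram_gb < upper_bound:
            return rec
    return Recommendation("XL sft", "acestep-5Hz-lm-4B", "vllm")
```

For example, `recommend(10)` returns the 2B turbo/sft DiT with `acestep-5Hz-lm-0.6B` on the `vllm` backend, matching the "0.6B for 8-12GB" note in the table.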

📖 GPU compatibility details: English | 中文 | 日本語 | 한국어

🚀 Launch Scripts

Ready-to-use launch scripts for all platforms with auto environment detection, update checking, and dependency installation.

| Platform | Scripts | Backend |
| --- | --- | --- |
| Windows | start_gradio_ui.bat, start_api_server.bat | CUDA |
| Windows (ROCm) | start_gradio_ui_rocm.bat, start_api_server_rocm.bat | AMD ROCm |
| Linux | start_gradio_ui.sh, start_api_server.sh | CUDA |
| macOS | start_gradio_ui_macos.sh, start_api_server_macos.sh | MLX (Apple Silicon) |

# Windows
start_gradio_ui.bat

# Linux
chmod +x start_gradio_ui.sh && ./start_gradio_ui.sh

# macOS (Apple Silicon)
chmod +x start_gradio_ui_macos.sh && ./start_gradio_ui_macos.sh
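
For wrappers that need to choose a script programmatically, the platform table above can be expressed as a lookup keyed on `platform.system()`. This is an illustrative sketch: `gradio_script` is a hypothetical helper, and the backend labels (`"cuda"`, `"rocm"`, `"mlx"`) are names chosen here, not flags the scripts accept.

```python
import platform
from typing import Optional

# (platform.system() value, backend label) -> Gradio launch script from the table.
_GRADIO_SCRIPTS = {
    ("Windows", "cuda"): "start_gradio_ui.bat",
    ("Windows", "rocm"): "start_gradio_ui_rocm.bat",
    ("Linux", "cuda"): "start_gradio_ui.sh",
    ("Darwin", "mlx"): "start_gradio_ui_macos.sh",
}


def gradio_script(system: Optional[str] = None, backend: Optional[str] = None) -> str:
    """Return the launch script name for a platform/backend pair."""
    system = system or platform.system()
    if backend is None:
        # Assumed defaults: MLX on macOS, CUDA elsewhere.
        backend = "mlx" if system == "Darwin" else "cuda"
    try:
        return _GRADIO_SCRIPTS[(system, backend)]
    except KeyError:
        raise ValueError(f"no launch script listed for {system}/{backend}") from None
```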

⚙️ Customizing Launch Settings

Recommended: Create a .env file to customize models, ports, and other settings. Your .env configuration will survive repository updates.

# Copy the example file
cp .env.example .env

# Edit with your preferred settings
# Examples in .env:
ACESTEP_CONFIG_PATH=acestep-v15-turbo
ACESTEP_LM_MODEL_PATH=acestep-5Hz-lm-1.7B
PORT=7860
LANGUAGE=en
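
The `.env` file is a list of `KEY=value` lines. To show how such a file can be read, here is a minimal parser sketch. It is not how ACE-Step itself loads the file (the project may rely on a dedicated library), and `parse_env` is a hypothetical helper that ignores features such as variable expansion and multi-line values.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")   # split on the first "=" only
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


example = """\
# Examples in .env:
ACESTEP_CONFIG_PATH=acestep-v15-turbo
ACESTEP_LM_MODEL_PATH=acestep-5Hz-lm-1.7B
PORT=7860
LANGUAGE=en
"""
settings = parse_env(example)
print(settings["ACESTEP_CONFIG_PATH"], int(settings["PORT"]))
# acestep-v15-turbo 7860
```

All values come back as strings, so numeric settings such as `PORT` need an explicit conversion.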

📖 Script configuration & customization: English | 中文 | 日本語

📚 Documentation

Usage Guides

| Method | Description | Documentation |
| --- | --- | --- |
| 🖥️ Gradio Web UI | Interactive web interface for music generation | Guide |
| 🧭 UI Support Baseline | Supported UI boundary and future UI parity checklist | Guide |
| 🎛️ VST3 Plugin | Standalone VST3 plugin (C++/GGML) for DAW integration | acestep.vst3 |
| 🐍 Python API | Programmatic access for integration | Guide |
| 🌐 REST API | HTTP-based async API for services | Guide |
| ⌨️ CLI | Interactive wizard and configuration | Guide |

Setup & Configuration

| Topic | Documentation |
| --- | --- |
| 📦 Installation (all platforms) | English \| 中文 \| 日本語 |
| 🎮 GPU Compatibility | English \| 中文 \| 日本語 |
| 🔧 GPU Troubleshooting | English |
| 🔬 Benchmark & Profiling | English \| 中文 |

Multi-Language Docs

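| Language | API | Gradio | Inference | Tutorial | LoRA Training | Install | Benchmark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🇺🇸 English | Link | Link | Link | Link | Link | Link | Link |
| 🇨🇳 中文 | Link | Link | Link | Link | Link | Link | Link |
| 🇯🇵 日本語 | Link | Link | Link | Link | Link | Link | - |
| 🇰🇷 한국어 | Link | Link | Link | Link | Link | - | - |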

📖 Tutorial

🎯 Must Read: Comprehensive guide to ACE-Step 1.5's design philosophy and usage methods.

| Language | Link |
| --- | --- |
| 🇺🇸 English | English Tutorial |
| 🇨🇳 中文 | 中文教程 |
| 🇯🇵 日本語 | 日本語チュートリアル |

This tutorial covers: mental models and design philosophy, model architecture and selection, input control (text and audio), inference hyperparameters, random factors and optimization strategies.

🔨 Train

📖 LoRA Training Tutorial: step-by-step guide covering data preparation, annotation, preprocessing, and training:

| Language | Link |
| --- | --- |
| 🇺🇸 English | LoRA Training Tutorial |
| 🇨🇳 中文 | LoRA 训练教程 |
| 🇯🇵 日本語 | LoRA トレーニングチュートリアル |
| 🇰🇷 한국어 | LoRA 학습 튜토리얼 |

See also the LoRA Training tab in Gradio UI for one-click training, or Gradio Guide - LoRA Training for UI reference.

🔧 Advanced Training with Side-Step: CLI-based training toolkit with corrected timestep sampling, LoKR adapters, VRAM optimization, gradient sensitivity analysis, and more. See the Side-Step documentation.

🏗️ Architecture

ACE-Step Framework

🦁 Model Zoo

Model Zoo

DiT Models

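| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| acestep-v15-base | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | Link |
| acestep-v15-sft | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | Link |
| acestep-v15-turbo | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | Link |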

XL (4B) DiT Models

XL models use a larger 4B-parameter DiT decoder (~9GB bf16) for higher audio quality. They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible.

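| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| acestep-v15-xl-base | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | High | High | Easy | Link |
| acestep-v15-xl-sft | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Easy | Link |
| acestep-v15-xl-turbo | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | Link |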
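
The task columns of the two DiT tables above can be encoded as data, which makes it easy to find the models that support a given task. This is a sketch for illustration only: the lowercase task labels and the `models_for` helper are hypothetical names, not identifiers from the ACE-Step codebase.

```python
# Tasks every DiT variant supports, per the tables above.
COMMON_TASKS = frozenset({"text2music", "cover", "repaint"})
# Tasks marked as supported only for the base variants.
BASE_ONLY_TASKS = frozenset({"extract", "lego", "complete"})

DIT_TASKS = {
    "acestep-v15-base": COMMON_TASKS | BASE_ONLY_TASKS,
    "acestep-v15-sft": COMMON_TASKS,
    "acestep-v15-turbo": COMMON_TASKS,
    "acestep-v15-xl-base": COMMON_TASKS | BASE_ONLY_TASKS,
    "acestep-v15-xl-sft": COMMON_TASKS,
    "acestep-v15-xl-turbo": COMMON_TASKS,
}

# Value of the "Step" column for each model.
DIT_STEPS = {name: (8 if name.endswith("turbo") else 50) for name in DIT_TASKS}


def models_for(task: str) -> list[str]:
    """Return the DiT models whose table row marks `task` as supported."""
    return sorted(name for name, tasks in DIT_TASKS.items() if task in tasks)


print(models_for("lego"))
# ['acestep-v15-base', 'acestep-v15-xl-base']
```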

LM Models

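| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| acestep-5Hz-lm-0.6B | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
| acestep-5Hz-lm-1.7B | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
| acestep-5Hz-lm-4B | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | ✅ |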

🔬 Benchmark

ACE-Step 1.5 includes profile_inference.py, a profiling & benchmarking tool that measures LLM, DiT, and VAE timing across devices and configurations.

python profile_inference.py                        # Single-run profile
python profile_inference.py --mode benchmark       # Configuration matrix
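
To show the general idea of per-stage timing (LLM, DiT, VAE), the sketch below accumulates wall-clock time per named stage with a context manager. It does not reproduce how `profile_inference.py` is implemented; the `stage` helper is hypothetical and `time.sleep` stands in for the real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(timings: dict, name: str):
    """Add the wall-clock time spent inside the block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


timings: dict = {}
for stage_name in ("llm", "dit", "vae"):
    with stage(timings, stage_name):
        time.sleep(0.01)  # placeholder for the real stage

for stage_name, seconds in timings.items():
    print(f"{stage_name}: {seconds * 1000:.1f} ms")
```

On a GPU, asynchronous kernels usually need an explicit device synchronization before each timestamp for the numbers to be meaningful.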

📖 Full guide (all modes, CLI options, output interpretation): English | 中文

📜 License & Disclaimer

This project is licensed under the MIT License.

ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.

🔔 Important Notice
The only official website for the ACE-Step project is our GitHub Pages site.
We do not operate any other websites.
🚫 Fake domains include but are not limited to: ac**p.com, a**p.org, a***c.org
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.

Community & Ecosystem

Check out Awesome ACE-Step, a curated list of community projects, alternative UIs, ComfyUI nodes, cloud deployments, training tools, and more built around ACE-Step.

🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.

📖 Citation

If you find this project useful for your research, please consider citing:

@misc{gong2026acestep,
	title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
	author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
	howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
	year={2026},
	note={GitHub repository}
}