ACE-Step API Client Documentation

March 30, 2026



This service provides an HTTP-based asynchronous music generation API.

Basic Workflow:

  1. Call POST /release_task to submit a task and obtain a task_id.
  2. Call POST /query_result to batch query task status until status is 1 (succeeded) or 2 (failed).
  3. Download audio files via GET /v1/audio?path=... URLs returned in the result.
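The three-step workflow can be sketched in Python with only the standard library. This is a minimal, unofficial sketch. It assumes the server runs at http://localhost:8001 with authentication disabled, and `post_json` and `generate` are hypothetical helper names. The `post` parameter exists only so the polling logic can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8001"  # assumption: default ACESTEP_API_HOST/PORT

Post = Callable[[str, dict], dict]


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the API and return the decoded response wrapper."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def generate(payload: dict, post: Post = post_json,
             poll_interval: float = 2.0, timeout: float = 600.0) -> list[dict[str, Any]]:
    """Submit a task, poll /query_result until it finishes, return parsed results."""
    task_id = post("/release_task", payload)["data"]["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        item = post("/query_result", {"task_id_list": [task_id]})["data"][0]
        if item["status"] == 1:  # succeeded
            return json.loads(item["result"])  # result is a JSON string
        if item["status"] == 2:  # failed
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_interval)  # 0 = still queued or running
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

For example, `generate({"prompt": "upbeat pop song", "lyrics": "Hello world"})` would return the parsed result entries, whose `file` fields are /v1/audio URLs. Because the loop only inspects the integer `status` of each /query_result entry, it does not depend on the string status returned by /release_task.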



1. Authentication

The API supports optional API key authentication. When enabled, a valid key must be provided in requests.

Authentication Methods

Two authentication methods are supported:

Method A: ai_token in request body

{
  "ai_token": "your-api-key",
  "prompt": "upbeat pop song",
  ...
}

Method B: Authorization header

curl -X POST http://localhost:8001/release_task \
  -H 'Authorization: Bearer your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "upbeat pop song"}'
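Both methods carry the same key, so a client can use either one. The sketch below, built around the hypothetical helper `build_request`, shows how a Python client might attach the key as a Bearer header (Method B) or as an `ai_token` field in the body (Method A). The returned headers and body can be passed to `urllib.request.Request`.

```python
import json
from typing import Optional


def build_request(payload: dict, api_key: Optional[str],
                  use_header: bool = True) -> tuple[dict, bytes]:
    """Return (headers, body) for a JSON request, attaching the API key if given."""
    headers = {"Content-Type": "application/json"}
    body = dict(payload)  # copy so the caller's dict is not mutated
    if api_key:
        if use_header:
            headers["Authorization"] = f"Bearer {api_key}"  # Method B
        else:
            body["ai_token"] = api_key  # Method A
    return headers, json.dumps(body).encode("utf-8")
```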

Configuring API Key

Set via environment variable or command-line argument:

# Environment variable
export ACESTEP_API_KEY=your-secret-key

# Or command-line argument
python -m acestep.api_server --api-key your-secret-key

2. Response Format

All API responses use a unified wrapper format:

{
  "data": { ... },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

| Field | Type | Description |
|---|---|---|
| data | any | Actual response data |
| code | int | Status code (200 = success) |
| error | string | Error message (null on success) |
| timestamp | int | Response timestamp (milliseconds) |
| extra | any | Extra information (usually null) |
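Because every successful response shares this wrapper, a client can check it in one place. The `unwrap` helper and `ApiError` class below are hypothetical; they assume that `code` is 200 and `error` is null on success, as the table states. Errors reported through HTTP status codes may instead return a `{"detail": ...}` body (see Error Handling), so this check complements HTTP status handling rather than replacing it.

```python
from typing import Any


class ApiError(Exception):
    """Raised when the wrapper reports a non-200 code or an error message."""


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return envelope["data"], or raise ApiError if the call did not succeed."""
    if envelope.get("code") != 200 or envelope.get("error") is not None:
        raise ApiError(f"code={envelope.get('code')}: {envelope.get('error')}")
    return envelope.get("data")
```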

3. Task Status Description

Task status (status) is represented as integers:

| Status Code | Status Name | Description |
|---|---|---|
| 0 | queued/running | Task is queued or in progress |
| 1 | succeeded | Generation succeeded; result is ready |
| 2 | failed | Generation failed |
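In a Python client these integers map naturally onto an `IntEnum`. `TaskStatus` is a hypothetical name. Note that the /release_task response in 4.3 reports `status` as the string "queued", so this mapping is intended for the integer statuses returned by /query_result.

```python
from enum import IntEnum


class TaskStatus(IntEnum):
    QUEUED_OR_RUNNING = 0
    SUCCEEDED = 1
    FAILED = 2

    @property
    def is_terminal(self) -> bool:
        """True once the task will not change state again."""
        return self is not TaskStatus.QUEUED_OR_RUNNING
```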

4. Create Generation Task

4.1 API Definition

  • URL: /release_task
  • Method: POST
  • Content-Type: application/json, multipart/form-data, or application/x-www-form-urlencoded

4.2 Request Parameters

Parameter Naming Convention

The API accepts both snake_case and camelCase names for most parameters, and some parameters have additional aliases. For example:

  • audio_duration / duration / audioDuration
  • key_scale / keyscale / keyScale
  • time_signature / timesignature / timeSignature
  • sample_query / sampleQuery / description / desc
  • use_format / useFormat / format

Additionally, metadata can be passed in a nested object (metas, metadata, or user_metadata).

Method A: JSON Request (application/json)

Suitable for passing only text parameters, or referencing audio file paths that already exist on the server.

Basic Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| prompt | string | "" | Music description prompt (alias: caption) |
| lyrics | string | "" | Lyrics content |
| thinking | bool | false | Whether to use the 5Hz LM to generate audio codes (lm-dit behavior) |
| vocal_language | string | "en" | Lyrics language (en, zh, ja, etc.) |
| audio_format | string | "mp3" | Output format: flac, mp3, opus, aac, wav, wav32 |

Sample/Description Mode Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| sample_mode | bool | false | Enable random sample generation mode (auto-generates caption/lyrics/metas via the LM) |
| sample_query | string | "" | Natural-language description for sample generation (e.g., "a soft Bengali love song"). Aliases: description, desc |
| use_format | bool | false | Use the LM to enhance/format the provided caption and lyrics. Alias: format |

Multi-Model Support:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model | string | null | Select which DiT model to use (e.g., "acestep-v15-turbo", "acestep-v15-turbo-shift3"). Use /v1/models to list available models. If not specified, the default model is used. |

thinking Semantics (Important):

  • thinking=false:
    • The server will NOT use 5Hz LM to generate audio_code_string.
    • DiT runs in text2music mode and ignores any provided audio_code_string.
  • thinking=true:
    • The server will use 5Hz LM to generate audio_code_string (lm-dit behavior).
    • DiT runs with LM-generated codes for enhanced music quality.

Note: The LM is skipped for the cover, repaint, and extract task types even when thinking=true, because these tasks work directly from the source audio and do not benefit from LM planning. The LM is used only when the task type is text2music, lego, or complete.
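The rules above reduce to a small decision function. `lm_generates_codes` is a hypothetical helper that restates the documented behavior. It covers only audio-code generation, since metadata auto-completion (described next) can call the LM independently of `thinking`.

```python
# Task types for which the 5Hz LM is used, per the note above.
LM_TASK_TYPES = {"text2music", "lego", "complete"}


def lm_generates_codes(task_type: str, thinking: bool) -> bool:
    """True if the server would use the 5Hz LM to produce audio_code_string."""
    return thinking and task_type in LM_TASK_TYPES
```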

Metadata Auto-Completion (Conditional):

When use_cot_caption=true, use_cot_language=true, or some metadata fields are missing, the server may call the 5Hz LM to fill in the missing fields based on the caption and lyrics:

  • bpm
  • key_scale
  • time_signature
  • audio_duration

User-provided values always win; LM only fills the fields that are empty/missing.
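The merge rule can be illustrated with a short function. This is not the server's code: `merge_metas` is a hypothetical name, and treating both null and an empty string as "missing" is an assumption based on the defaults listed in the Music Attribute Parameters table.

```python
META_FIELDS = ("bpm", "key_scale", "time_signature", "audio_duration")


def merge_metas(user: dict, lm_suggested: dict) -> dict:
    """Keep every non-empty user value; take LM values only for empty/missing fields."""
    merged = {}
    for field in META_FIELDS:
        value = user.get(field)
        if value in (None, ""):  # assumption: null and "" both count as missing
            value = lm_suggested.get(field)
        merged[field] = value
    return merged
```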

Music Attribute Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| bpm | int | null | Specify tempo (BPM), range 30-300 |
| key_scale | string | "" | Key/scale (e.g., "C Major", "Am"). Aliases: keyscale, keyScale |
| time_signature | string | "" | Time signature (2, 3, 4, 6 for 2/4, 3/4, 4/4, 6/8). Aliases: timesignature, timeSignature |
| audio_duration | float | null | Generation duration (seconds), range 10-600. Aliases: duration, target_duration |
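A client can catch out-of-range values before submitting a task. The hypothetical `check_music_attributes` below encodes only the ranges documented in this table. Whether the server also accepts other time-signature spellings (such as "4/4") is not specified here, so the check is deliberately strict.

```python
from typing import Optional

TIME_SIGNATURES = {"2", "3", "4", "6"}  # 2/4, 3/4, 4/4, 6/8


def check_music_attributes(bpm: Optional[int] = None,
                           time_signature: str = "",
                           audio_duration: Optional[float] = None) -> list[str]:
    """Return human-readable problems; an empty list means the values look valid."""
    problems = []
    if bpm is not None and not 30 <= bpm <= 300:
        problems.append(f"bpm {bpm} outside 30-300")
    if time_signature and time_signature not in TIME_SIGNATURES:
        problems.append(f"time_signature {time_signature!r} not one of 2, 3, 4, 6")
    if audio_duration is not None and not 10 <= audio_duration <= 600:
        problems.append(f"audio_duration {audio_duration} outside 10-600 s")
    return problems
```

Leaving a field at null or "" is also valid; the server may then fill it in as described above.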

Audio Codes (Optional):

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| audio_code_string | string or string[] | "" | Audio semantic tokens (5Hz) for llm_dit. Alias: audioCodeString |

Generation Control Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| inference_steps | int | 8 | Number of inference steps. Turbo model: 1-20 (recommended 8). Base model: 1-200 (recommended 32-64). |
| guidance_scale | float | 7.0 | Prompt guidance coefficient. Only effective for the base model. |
| use_random_seed | bool | true | Whether to use a random seed |
| seed | int | -1 | Specify seed (when use_random_seed=false) |
| batch_size | int | 2 | Batch generation count (max 8) |
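The ranges above can likewise be checked client-side. In the sketch below, `check_generation_params` is hypothetical, the lower bound of 1 for `batch_size` is an assumption, and so is the reading that a negative seed is meaningless when `use_random_seed=false`.

```python
def check_generation_params(is_turbo: bool, inference_steps: int = 8,
                            batch_size: int = 2, use_random_seed: bool = True,
                            seed: int = -1) -> list[str]:
    """Return problems with the generation-control values; empty means they look valid."""
    problems = []
    max_steps = 20 if is_turbo else 200  # turbo: 1-20, base: 1-200
    if not 1 <= inference_steps <= max_steps:
        problems.append(f"inference_steps must be 1-{max_steps} for this model")
    if not 1 <= batch_size <= 8:  # assumption: minimum batch size is 1
        problems.append("batch_size must be 1-8")
    if not use_random_seed and seed < 0:  # assumption: -1 means "no seed given"
        problems.append("set a non-negative seed when use_random_seed is false")
    return problems
```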

Advanced DiT Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| shift | float | 3.0 | Timestep shift factor (range 1.0-5.0). Only effective for base models, not turbo models. |
| infer_method | string | "ode" | Diffusion inference method: "ode" (Euler, faster) or "sde" (stochastic). |
| timesteps | string | null | Custom timesteps as comma-separated values (e.g., "0.97,0.76,0.615,0.5,0.395,0.28,0.18,0.085,0"). Overrides inference_steps and shift. |
| use_adg | bool | false | Use Adaptive Dual Guidance (base model only) |
| cfg_interval_start | float | 0.0 | CFG application start ratio (0.0-1.0) |
| cfg_interval_end | float | 1.0 | CFG application end ratio (0.0-1.0) |
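Custom schedules are easiest to manage as a list of floats that is serialized only when the request is built. The hypothetical `format_timesteps` helper below does that. Its checks (values in [0, 1], strictly decreasing) are assumptions inferred from the example schedule in the table, not documented constraints.

```python
def format_timesteps(values: list[float]) -> str:
    """Serialize a custom schedule as the comma-separated `timesteps` string."""
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("timesteps should lie in [0, 1]")  # assumption
    if any(a <= b for a, b in zip(values, values[1:])):
        raise ValueError("timesteps should be strictly decreasing")  # assumption
    return ",".join(f"{v:g}" for v in values)
```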

5Hz LM Parameters (Optional, server-side):

These parameters control 5Hz LM sampling, which is used for metadata auto-completion and (when thinking=true) audio code generation.

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| lm_model_path | string | null | 5Hz LM checkpoint directory name (e.g., acestep-5Hz-lm-0.6B) |
| lm_backend | string | "vllm" | vllm or pt |
| lm_temperature | float | 0.85 | Sampling temperature |
| lm_cfg_scale | float | 2.5 | CFG scale (>1 enables CFG) |
| lm_negative_prompt | string | "NO USER INPUT" | Negative prompt used by CFG |
| lm_top_k | int | null | Top-k (0/null disables) |
| lm_top_p | float | 0.9 | Top-p (>=1 is treated as disabled) |
| lm_repetition_penalty | float | 1.0 | Repetition penalty |
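The "disabled" conventions in this table are easy to misread, so the sketch below spells them out. `effective_lm_sampling` is a hypothetical helper that mirrors the descriptions (0/null disables top-k, top-p >= 1 disables top-p, CFG is active only when `lm_cfg_scale` > 1); it does not reproduce the server's sampler.

```python
from typing import Optional


def effective_lm_sampling(lm_top_k: Optional[int], lm_top_p: float,
                          lm_cfg_scale: float) -> dict:
    """Apply the table's 'disabled' conventions to LM sampling settings."""
    return {
        "top_k": lm_top_k if lm_top_k else None,      # 0 or null disables top-k
        "top_p": lm_top_p if lm_top_p < 1 else None,  # >= 1 disables top-p
        "cfg_enabled": lm_cfg_scale > 1,              # CFG only when scale > 1
    }
```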

LM CoT (Chain-of-Thought) Parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| use_cot_caption | bool | true | Let the LM rewrite/enhance the input caption via CoT reasoning. Aliases: cot_caption, cot-caption |
| use_cot_language | bool | true | Let the LM detect the vocal language via CoT. Aliases: cot_language, cot-language |
| constrained_decoding | bool | true | Enable FSM-based constrained decoding for structured LM output. Aliases: constrainedDecoding, constrained |
| constrained_decoding_debug | bool | false | Enable debug logging for constrained decoding |
| allow_lm_batch | bool | true | Allow LM batch processing for efficiency |

Edit/Reference Audio Parameters (paths must be absolute paths on the server):

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| reference_audio_path | string | null | Reference audio path (style transfer) |
| src_audio_path | string | null | Source audio path (repainting/cover) |
| task_type | string | "text2music" | Task type: text2music, cover, repaint, lego, extract, complete |
| instruction | string | auto | Edit instruction (auto-generated from task_type if not provided) |
| repainting_start | float | 0.0 | Repainting start time (seconds) |
| repainting_end | float | null | Repainting end time (seconds); -1 means the end of the audio |
| audio_cover_strength | float | 1.0 | Cover strength (0.0-1.0). Use lower values (e.g., 0.2) for style transfer. |
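A repaint request built from server-side paths might look like the following. `repaint_payload` is a hypothetical helper. It assumes the server uses POSIX-style paths (hence `posixpath.isabs`) and passes -1 as `repainting_end` to mean "until the end of the audio".

```python
import posixpath


def repaint_payload(src_audio_path: str, prompt: str,
                    start: float = 0.0, end: float = -1.0) -> dict:
    """Build a JSON body that repaints [start, end] of audio already on the server."""
    if not posixpath.isabs(src_audio_path):  # assumption: POSIX server paths
        raise ValueError("src_audio_path must be an absolute path on the server")
    if end != -1 and end <= start:
        raise ValueError("repainting_end must be greater than repainting_start, or -1")
    return {
        "task_type": "repaint",
        "src_audio_path": src_audio_path,
        "prompt": prompt,
        "repainting_start": start,
        "repainting_end": end,  # -1 = until the end of the audio
    }
```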

Method B: File Upload (multipart/form-data)

Use this when you need to upload local audio files as reference or source audio.

All of the fields above can also be sent as form fields. In addition, the following file fields are supported:

  • reference_audio or ref_audio: (File) Upload reference audio file
  • src_audio or ctx_audio: (File) Upload source audio file

Note: When a file is uploaded, the corresponding _path parameter (reference_audio_path or src_audio_path) is ignored, and the server uses the uploaded file's temporary path instead.
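Python's standard library has no multipart/form-data encoder, so a client without third-party packages has to build the body itself. The `encode_multipart` helper below is a hypothetical minimal encoder; the field names `src_audio` and `task_type` come from the parameter lists above. If the third-party `requests` package is available, `requests.post(url, data=fields, files=files)` produces an equivalent request.

```python
import mimetypes
import uuid


def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, bytes]]) -> tuple[str, bytes]:
    """Return (content_type, body) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():  # plain form fields
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data) in files.items():  # file fields
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {ctype}".encode(),
            b"",
            data,
        ]
    lines += [f"--{boundary}--".encode(), b""]  # closing boundary
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

The returned content type and body can then be sent with `urllib.request.Request("http://localhost:8001/release_task", data=body, headers={"Content-Type": content_type}, method="POST")`.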

4.3 Response Example

{
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "queue_position": 1
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

4.4 Usage Examples (cURL)

Basic JSON Method:

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "upbeat pop song",
    "lyrics": "Hello world",
    "inference_steps": 8
  }'

With thinking=true (LM generates codes + fills missing metas):

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "upbeat pop song",
    "lyrics": "Hello world",
    "thinking": true,
    "lm_temperature": 0.85,
    "lm_cfg_scale": 2.5
  }'

Description-driven generation (sample_query):

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "sample_query": "a soft Bengali love song for a quiet evening",
    "thinking": true
  }'

With format enhancement (use_format=true):

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "pop rock",
    "lyrics": "[Verse 1]\nWalking down the street...",
    "use_format": true,
    "thinking": true
  }'

Select specific model:

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "electronic dance music",
    "model": "acestep-v15-turbo",
    "thinking": true
  }'

With custom timesteps:

curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "jazz piano trio",
    "timesteps": "0.97,0.76,0.615,0.5,0.395,0.28,0.18,0.085,0",
    "thinking": true
  }'

File Upload Method:

curl -X POST http://localhost:8001/release_task \
  -F "prompt=remix this song" \
  -F "src_audio=@/path/to/local/song.mp3" \
  -F "task_type=repaint"

5. Batch Query Task Results

5.1 API Definition

  • URL: /query_result
  • Method: POST
  • Content-Type: application/json or application/x-www-form-urlencoded

5.2 Request Parameters

| Parameter Name | Type | Description |
|---|---|---|
| task_id_list | string (JSON array) or array | List of task IDs to query |

5.3 Response Example

{
  "data": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": 1,
      "result": "[{\"file\": \"/v1/audio?path=...\", \"wave\": \"\", \"status\": 1, \"create_time\": 1700000000, \"env\": \"development\", \"prompt\": \"upbeat pop song\", \"lyrics\": \"Hello world\", \"metas\": {\"bpm\": 120, \"duration\": 30, \"genres\": \"\", \"keyscale\": \"C Major\", \"timesignature\": \"4\"}, \"generation_info\": \"...\", \"seed_value\": \"12345,67890\", \"lm_model\": \"acestep-5Hz-lm-0.6B\", \"dit_model\": \"acestep-v15-turbo\"}]"
    }
  ],
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

Result Field Description (result is a JSON-encoded string; after parsing, it is a list whose entries contain):

| Field | Type | Description |
|---|---|---|
| file | string | Audio file URL (use with the /v1/audio endpoint) |
| wave | string | Waveform data (usually empty) |
| status | int | Status code (0 = in progress, 1 = success, 2 = failed) |
| create_time | int | Creation time (Unix timestamp) |
| env | string | Environment identifier |
| prompt | string | Prompt used |
| lyrics | string | Lyrics used |
| metas | object | Metadata (bpm, duration, genres, keyscale, timesignature) |
| generation_info | string | Generation info summary |
| seed_value | string | Seed values used (comma-separated) |
| lm_model | string | LM model name used |
| dit_model | string | DiT model name used |
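Since `result` is itself a JSON string, it needs a second decoding step after the response wrapper has been parsed. The hypothetical `parse_result` helper below does that for one /query_result entry and splits `seed_value` into integers; the added `seeds` key is a client-side convenience, not a server field.

```python
import json


def parse_result(item: dict) -> list[dict]:
    """Decode the JSON-string `result` of one /query_result entry."""
    entries = json.loads(item["result"]) if item.get("result") else []
    for entry in entries:
        # seed_value is a comma-separated string such as "12345,67890"
        seed_text = entry.get("seed_value", "")
        entry["seeds"] = [int(s) for s in seed_text.split(",") if s.strip()]
    return entries
```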

5.4 Usage Example

curl -X POST http://localhost:8001/query_result \
  -H 'Content-Type: application/json' \
  -d '{
    "task_id_list": ["550e8400-e29b-41d4-a716-446655440000"]
  }'

6. Format Input

6.1 API Definition

  • URL: /format_input
  • Method: POST

This endpoint uses the LM to enhance and format a user-provided caption and lyrics.

6.2 Request Parameters

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| prompt | string | "" | Music description prompt |
| lyrics | string | "" | Lyrics content |
| temperature | float | 0.85 | LM sampling temperature |
| param_obj | string (JSON) | "{}" | JSON object containing metadata (duration, bpm, key, time_signature, language) |
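`param_obj` is a JSON-encoded string inside the JSON body, which is why the cURL example in 6.4 escapes its quotes. Building it with `json.dumps` avoids escaping by hand. The `format_input_payload` helper is hypothetical; the metadata keys are those listed in the table.

```python
import json


def format_input_payload(prompt: str, lyrics: str,
                         temperature: float = 0.85, **metas) -> dict:
    """Build a /format_input body; param_obj must be a JSON string, not an object."""
    return {
        "prompt": prompt,
        "lyrics": lyrics,
        "temperature": temperature,
        "param_obj": json.dumps(metas),  # e.g. '{"duration": 180, "language": "en"}'
    }
```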

6.3 Response Example

{
  "data": {
    "caption": "Enhanced music description",
    "lyrics": "Formatted lyrics...",
    "bpm": 120,
    "key_scale": "C Major",
    "time_signature": "4",
    "duration": 180,
    "vocal_language": "en"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

6.4 Usage Example

curl -X POST http://localhost:8001/format_input \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "pop rock",
    "lyrics": "Walking down the street",
    "param_obj": "{\"duration\": 180, \"language\": \"en\"}"
  }'

7. Get Random Sample

7.1 API Definition

  • URL: /create_random_sample
  • Method: POST

This endpoint returns random sample parameters from pre-loaded example data for form filling.

7.2 Request Parameters

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| sample_type | string | "simple_mode" | Sample type: "simple_mode" or "custom_mode" |

7.3 Response Example

{
  "data": {
    "caption": "Upbeat pop song with guitar accompaniment",
    "lyrics": "[Verse 1]\nSunshine on my face...",
    "bpm": 120,
    "key_scale": "G Major",
    "time_signature": "4",
    "duration": 180,
    "vocal_language": "en"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
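The fields returned here (and by /format_input in 6.3) can seed a /release_task request. The mapping below is a hypothetical sketch: `caption` maps to `prompt` (its documented alias) and `duration` to `audio_duration`, while empty values are dropped so the server can auto-complete them.

```python
# Response field -> /release_task parameter (client-side assumption)
FIELD_MAP = {
    "caption": "prompt",
    "lyrics": "lyrics",
    "bpm": "bpm",
    "key_scale": "key_scale",
    "time_signature": "time_signature",
    "duration": "audio_duration",
    "vocal_language": "vocal_language",
}


def sample_to_task(sample: dict, **overrides) -> dict:
    """Turn a sample/format response into a /release_task body."""
    payload = {
        FIELD_MAP[key]: value
        for key, value in sample.items()
        if key in FIELD_MAP and value not in (None, "")  # let the server fill blanks
    }
    payload.update(overrides)  # e.g. thinking=True, model="..."
    return payload
```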

7.4 Usage Example

curl -X POST http://localhost:8001/create_random_sample \
  -H 'Content-Type: application/json' \
  -d '{"sample_type": "simple_mode"}'

8. List Available Models

8.1 API Definition

  • URL: /v1/models
  • Method: GET

Returns a list of available DiT models loaded on the server.

8.2 Response Example

{
  "data": {
    "models": [
      {
        "name": "acestep-v15-turbo",
        "is_default": true
      },
      {
        "name": "acestep-v15-turbo-shift3",
        "is_default": false
      }
    ],
    "default_model": "acestep-v15-turbo"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
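A client can use this list to avoid requesting a model that is not available. `pick_model` is a hypothetical helper that falls back to `default_model` instead of failing; raise an error instead if your application requires a specific model.

```python
from typing import Optional


def pick_model(models_data: dict, preferred: Optional[str] = None) -> str:
    """Return `preferred` if the server lists it, else the server's default model."""
    names = {m["name"] for m in models_data.get("models", [])}
    if preferred is not None and preferred in names:
        return preferred
    return models_data["default_model"]
```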

8.3 Usage Example

curl http://localhost:8001/v1/models

9. Initialize or Switch Models

9.1 API Definition

  • URL: /v1/init
  • Method: POST

Initialize or switch DiT and LM models on demand without restarting the server.

9.2 Request Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | null | DiT model name to load (e.g., "acestep-v15-base"). If omitted, re-initializes the current model for the target slot. |
| slot | int (1-3) | 1 | Handler slot to initialize. Slots 2 and 3 require ACESTEP_CONFIG_PATH2 / ACESTEP_CONFIG_PATH3 to have been set at startup. |
| init_llm | bool | false | Whether to also initialize the LM in this request. |
| lm_model_path | string | null | LM model path override (e.g., "acestep-5Hz-lm-1.7B"). |

9.3 Response Example

{
  "data": {
    "message": "Model initialization completed",
    "slot": 2,
    "loaded_model": "acestep-v15-base",
    "loaded_lm_model": null,
    "models": [
      {"name": "acestep-v15-base", "is_default": false, "is_loaded": true},
      {"name": "acestep-v15-turbo", "is_default": true, "is_loaded": true}
    ],
    "lm_models": [],
    "llm_initialized": false
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

9.4 Usage Examples

# Initialize default slot (slot 1)
curl -X POST http://localhost:8001/v1/init \
  -H 'Content-Type: application/json' \
  -d '{"model": "acestep-v15-base"}'

# Load a different model into slot 2
curl -X POST http://localhost:8001/v1/init \
  -H 'Content-Type: application/json' \
  -d '{"model": "acestep-v15-base", "slot": 2}'

# Initialize slot 1 with LM
curl -X POST http://localhost:8001/v1/init \
  -H 'Content-Type: application/json' \
  -d '{"model": "acestep-v15-turbo", "init_llm": true, "lm_model_path": "acestep-5Hz-lm-1.7B"}'

Note: Slots 2 and 3 are only available when ACESTEP_CONFIG_PATH2 / ACESTEP_CONFIG_PATH3 environment variables were set before starting the server. Attempting to initialize an unavailable slot returns a 400 error.


10. Server Statistics

10.1 API Definition

  • URL: /v1/stats
  • Method: GET

Returns server runtime statistics.

10.2 Response Example

{
  "data": {
    "jobs": {
      "total": 100,
      "queued": 5,
      "running": 1,
      "succeeded": 90,
      "failed": 4
    },
    "queue_size": 5,
    "queue_maxsize": 200,
    "avg_job_seconds": 8.5
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
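These statistics allow a rough client-side wait estimate. The formula below (jobs ahead multiplied by `avg_job_seconds`, divided by the number of queue workers) is an assumption rather than a documented guarantee. Whether `queue_size` already includes running jobs is not specified, and the worker count (ACESTEP_QUEUE_WORKERS) is not reported by this endpoint, so the caller must supply it. `estimate_wait_seconds` and `queue_is_full` are hypothetical names.

```python
def estimate_wait_seconds(stats: dict, workers: int = 1) -> float:
    """Rough wait for a newly queued job: jobs ahead x average job time / workers."""
    ahead = stats["queue_size"] + stats["jobs"]["running"]  # assumption: disjoint counts
    return ahead * stats["avg_job_seconds"] / max(workers, 1)


def queue_is_full(stats: dict) -> bool:
    """True if a new submission would likely be rejected with HTTP 429."""
    return stats["queue_size"] >= stats["queue_maxsize"]
```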

10.3 Usage Example

curl http://localhost:8001/v1/stats

11. Download Audio Files

11.1 API Definition

  • URL: /v1/audio
  • Method: GET

Download generated audio files by path.

11.2 Request Parameters

| Parameter Name | Type | Description |
|---|---|---|
| path | string | URL-encoded path to the audio file |

11.3 Usage Example

# Download using the URL from task result
curl "http://localhost:8001/v1/audio?path=%2Ftmp%2Fapi_audio%2Fabc123.mp3" -o output.mp3
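The `file` value in task results already contains the encoded path, so a client usually only needs to prefix the server address. The hypothetical `audio_url` helper below assumes the server runs at http://localhost:8001 and also accepts a raw server path, URL-encoding it as in the cURL example; `download` streams the file to disk with the standard library.

```python
import shutil
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8001"  # assumption: default host/port


def audio_url(file_or_path: str) -> str:
    """Build a full download URL from a result `file` value or a raw server path."""
    if file_or_path.startswith("/v1/audio?"):
        return BASE_URL + file_or_path  # already encoded by the server
    return f"{BASE_URL}/v1/audio?path={urllib.parse.quote(file_or_path, safe='')}"


def download(file_or_path: str, dest: str) -> None:
    """Stream the audio file to `dest`."""
    with urllib.request.urlopen(audio_url(file_or_path)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```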

12. Health Check

12.1 API Definition

  • URL: /health
  • Method: GET

Returns service health status.

12.2 Response Example

{
  "data": {
    "status": "ok",
    "service": "ACE-Step API",
    "version": "1.0"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
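Model loading can take a while after startup, so scripts often poll /health before submitting work. `wait_until_healthy` is a hypothetical helper; the `get` parameter exists only so the loop can be tested without a server, and treating any connection error or unexpected body as "not ready yet" is a design choice.

```python
import json
import time
import urllib.request
from typing import Callable


def default_get(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(base_url: str = "http://localhost:8001",
                       get: Callable[[str], dict] = default_get,
                       attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll /health until data.status == "ok"; return False if it never gets there."""
    for _ in range(attempts):
        try:
            if get(base_url + "/health")["data"]["status"] == "ok":
                return True
        except (OSError, KeyError, TypeError, ValueError):
            pass  # server not up yet (URLError is an OSError) or unexpected body
        time.sleep(delay)
    return False
```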

13. Environment Variables

The API server can be configured using environment variables:

Server Configuration

| Variable | Default | Description |
|---|---|---|
| ACESTEP_API_HOST | 127.0.0.1 | Server bind host |
| ACESTEP_API_PORT | 8001 | Server bind port |
| ACESTEP_API_KEY | (empty) | API authentication key (empty disables auth) |
| ACESTEP_API_WORKERS | 1 | API worker thread count |

Model Configuration

| Variable | Default | Description |
|---|---|---|
| ACESTEP_CONFIG_PATH | acestep-v15-turbo | Primary DiT model path |
| ACESTEP_CONFIG_PATH2 | (empty) | Secondary DiT model path (optional) |
| ACESTEP_CONFIG_PATH3 | (empty) | Third DiT model path (optional) |
| ACESTEP_DEVICE | auto | Device for model loading |
| ACESTEP_USE_FLASH_ATTENTION | true | Enable flash attention |
| ACESTEP_OFFLOAD_TO_CPU | false | Offload models to CPU when idle |
| ACESTEP_OFFLOAD_DIT_TO_CPU | false | Offload DiT specifically to CPU |

LM Configuration

| Variable | Default | Description |
|---|---|---|
| ACESTEP_INIT_LLM | auto | Whether to initialize the LM at startup (auto decides based on the GPU) |
| ACESTEP_LM_MODEL_PATH | acestep-5Hz-lm-0.6B | Default 5Hz LM model |
| ACESTEP_LM_BACKEND | vllm | LM backend (vllm or pt) |
| ACESTEP_LM_DEVICE | (same as ACESTEP_DEVICE) | Device for the LM |
| ACESTEP_LM_OFFLOAD_TO_CPU | false | Offload the LM to CPU |

Queue Configuration

| Variable | Default | Description |
|---|---|---|
| ACESTEP_QUEUE_MAXSIZE | 200 | Maximum queue size |
| ACESTEP_QUEUE_WORKERS | 1 | Number of queue workers |
| ACESTEP_AVG_JOB_SECONDS | 5.0 | Initial average job duration estimate (seconds) |
| ACESTEP_AVG_WINDOW | 50 | Window for averaging job duration |

Cache Configuration

| Variable | Default | Description |
|---|---|---|
| ACESTEP_TMPDIR | .cache/acestep/tmp | Temporary file directory |
| TRITON_CACHE_DIR | .cache/acestep/triton | Triton cache directory |
| TORCHINDUCTOR_CACHE_DIR | .cache/acestep/torchinductor | TorchInductor cache directory |
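When the server is launched from a Python script, these variables can be passed through the child process environment. The `server_env` helper below is a hypothetical sketch that starts from a copy of the current environment and sets only variables listed in the tables above.

```python
import os


def server_env(api_key: str, extra_models: list[str], port: int = 8001) -> dict[str, str]:
    """Copy the current environment and add ACE-Step server settings."""
    if len(extra_models) > 2:
        raise ValueError("only ACESTEP_CONFIG_PATH2 and ACESTEP_CONFIG_PATH3 are available")
    env = dict(os.environ)
    env["ACESTEP_API_PORT"] = str(port)
    env["ACESTEP_API_KEY"] = api_key  # empty string disables auth
    for slot, model in zip((2, 3), extra_models):
        env[f"ACESTEP_CONFIG_PATH{slot}"] = model
    return env
```

It could then be used as `subprocess.Popen([sys.executable, "-m", "acestep.api_server"], env=server_env("your-secret-key", ["acestep-v15-base"]))`.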

Training API

The API server exposes endpoints for fine-tuning adapters from preprocessed tensor datasets. Training runs asynchronously in the background; use the status and stop endpoints to monitor and control training.

LoRA Training

  • URL: /v1/training/start
  • Method: POST

Starts a LoRA training run. See the LoRA Training Tutorial for parameter details.

LoKr Training

  • URL: /v1/training/start_lokr
  • Method: POST

Starts a LoKr (Kronecker) training run. LoKr is a faster alternative to LoRA that uses Kronecker decomposition.

LoKr-specific parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| tensor_dir | string | (required) | Directory with preprocessed tensors |
| output_dir | string | "./lokr_output" | Output directory for checkpoints |
| lokr_linear_dim | int | 64 | Linear dimension (1-256) |
| lokr_linear_alpha | int | 128 | Linear alpha (1-512) |
| lokr_factor | int | -1 | Kronecker factor (-1 = auto, otherwise 1-8) |
| lokr_decompose_both | bool | false | Decompose both matrices |
| lokr_use_tucker | bool | false | Use Tucker decomposition |
| lokr_use_scalar | bool | false | Use scalar calibration |
| lokr_weight_decompose | bool | true | Enable DoRA mode |
| learning_rate | float | 0.03 | Learning rate |
| train_epochs | int | 500 | Training epochs |
| train_batch_size | int | 1 | Batch size |
| gradient_accumulation | int | 4 | Gradient accumulation steps |
| save_every_n_epochs | int | 5 | Checkpoint save frequency (epochs) |
| training_shift | float | 3.0 | Timestep shift |
| training_seed | int | 42 | Random seed |
| gradient_checkpointing | bool | false | Trade speed for lower VRAM |

Usage example:

curl -X POST http://localhost:8001/v1/training/start_lokr \
  -H 'Content-Type: application/json' \
  -d '{
    "tensor_dir": "/path/to/tensors",
    "output_dir": "./lokr_output",
    "lokr_linear_dim": 64,
    "lokr_linear_alpha": 128,
    "learning_rate": 0.03,
    "train_epochs": 500
  }'
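The documented ranges can be checked before a long training run is submitted. `check_lokr_params` is a hypothetical helper; it validates only the constraints stated in the table and applies the table's defaults for omitted fields.

```python
def check_lokr_params(params: dict) -> list[str]:
    """Return problems with a /v1/training/start_lokr body; empty means it looks valid."""
    problems = []
    if not params.get("tensor_dir"):
        problems.append("tensor_dir is required")
    dim = params.get("lokr_linear_dim", 64)
    if not 1 <= dim <= 256:
        problems.append("lokr_linear_dim must be 1-256")
    alpha = params.get("lokr_linear_alpha", 128)
    if not 1 <= alpha <= 512:
        problems.append("lokr_linear_alpha must be 1-512")
    factor = params.get("lokr_factor", -1)
    if factor != -1 and not 1 <= factor <= 8:
        problems.append("lokr_factor must be -1 (auto) or 1-8")
    return problems
```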

Error Handling

HTTP Status Codes:

  • 200: Success
  • 400: Invalid request (bad JSON, missing fields)
  • 401: Unauthorized (missing or invalid API key)
  • 404: Resource not found
  • 415: Unsupported Content-Type
  • 429: Server busy (queue is full)
  • 500: Internal server error

Error Response Format:

{
  "detail": "Error message describing the issue"
}
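A client typically treats 429 (queue full) as retryable and the other error codes as permanent. The sketch below assumes that policy, which is a judgment call rather than documented behavior. `error_detail`, `backoff_delays`, and `call_with_retry` are hypothetical names, and `send` stands for any function that performs the HTTP request and returns `(status_code, body_bytes)`.

```python
import json
import time
from typing import Callable

RETRYABLE_STATUS = {429}  # assumption: only "queue is full" is worth retrying


def error_detail(body: bytes) -> str:
    """Extract `detail` from an error body, falling back to the raw text."""
    try:
        return str(json.loads(body)["detail"])
    except (ValueError, KeyError, TypeError):
        return body.decode("utf-8", errors="replace")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(send: Callable[[], tuple[int, bytes]], attempts: int = 4,
                    base: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send(); retry 429s with backoff, raise on any other non-200 status."""
    for i, delay in enumerate(backoff_delays(attempts, base)):
        status, body = send()
        if status == 200:
            return json.loads(body)
        if status not in RETRYABLE_STATUS:
            raise RuntimeError(f"HTTP {status}: {error_detail(body)}")
        if i < attempts - 1:
            sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"HTTP 429: server still busy after {attempts} attempts")
```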

Best Practices

  1. Use thinking=true for the best-quality results (LM-enhanced generation).

  2. Use sample_query/description for quick generation from natural language descriptions.

  3. Use use_format=true when you have caption/lyrics but want LM to enhance them.

  4. Use the /query_result endpoint to check the status of several tasks in a single request.

  5. Check /v1/stats to understand server load and average job time.

  6. To serve multiple models, set the ACESTEP_CONFIG_PATH2 and ACESTEP_CONFIG_PATH3 environment variables, then select a model per request with the model parameter.

  7. For production, set ACESTEP_API_KEY to enable authentication and secure your API.

  8. For low VRAM environments, enable ACESTEP_OFFLOAD_TO_CPU=true to support longer audio generation.