FL AceStep Training

March 1, 2026

LoRA training nodes for ComfyUI powered by ACE-Step 1.5, the open-source music generation foundation model. Train custom LoRAs to personalize music generation with your own style, voice, or genre — entirely within ComfyUI's node graph.

Features

  • End-to-End Training — Full LoRA training pipeline inside ComfyUI's node graph
  • Dataset Management — Scan audio directories, auto-label with LLM, load sidecar metadata
  • Tiled VAE Encoding — Handles long audio via 30-second chunks with 2-second overlap
  • Real-Time Training UI — Live loss chart, progress bar, and stats via WebSocket widget
  • Auto Model Download — LLM models download automatically from HuggingFace on first use
  • Native ComfyUI Types — Uses MODEL, VAE, and CLIP from ComfyUI's built-in checkpoint loader
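The tiled VAE encoding listed above splits long audio into 30-second chunks with a 2-second overlap. As a rough sketch of what that means for chunk boundaries, the helper below computes the start and end time of each tile. The function name `tile_spans` is hypothetical, and how the node blends the overlapping regions is not described here, so this covers the boundary arithmetic only.

```python
def tile_spans(total_s, chunk_s=30.0, overlap_s=2.0):
    """Return (start, end) spans in seconds that cover [0, total_s].

    Hypothetical helper: consecutive spans share `overlap_s` seconds.
    """
    if total_s <= 0:
        return []
    stride = chunk_s - overlap_s  # 28 s between chunk starts with the defaults
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        start += stride
    return spans


spans = tile_spans(70.0)
for start, end in spans:
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Under these assumptions a 70-second clip yields three tiles, the last one shorter than 30 seconds.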

Nodes

| Node | Category | Description |
| --- | --- | --- |
| FL AceStep LLM Loader | Loaders | Load 5Hz causal LM (0.6B / 1.7B / 4B) for auto-labeling |
| FL AceStep Scan Audio Directory | Dataset | Recursively scan folders for audio files with sidecar metadata |
| FL AceStep Auto-Label Samples | Dataset | Generate captions, BPM, key, genre, and lyrics via LLM |
| FL AceStep Preprocess Dataset | Dataset | VAE-encode audio and CLIP-encode text, save as .pt tensors |
| FL AceStep Training Configuration | Training | Configure LoRA rank/alpha/dropout and training hyperparameters |
| FL AceStep Train LoRA | Training | Run flow matching training loop with real-time progress widget |

Installation

Manual

cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI-FL-AceStep-Training.git
cd ComfyUI-FL-AceStep-Training
pip install -r requirements.txt

Frontend (optional rebuild)

npm install
npm run build

The pre-built JS is included in js/, so a rebuild is needed only if you modify the training widget UI.

Quick Start

Training Pipeline

  1. Load Checkpoint — Use ComfyUI's native Load Checkpoint node with an ACE-Step model to get MODEL, VAE, and CLIP
  2. Load LLM (optional) — Add FL AceStep LLM Loader if you want auto-labeling
  3. Scan Dataset — Use FL AceStep Scan Audio Directory to find your audio files
  4. Label — Connect MODEL, VAE, and LLM to FL AceStep Auto-Label Samples for LLM-generated metadata
  5. Preprocess — Run FL AceStep Preprocess Dataset with MODEL, VAE, and CLIP to encode audio/text to tensors
  6. Configure — Set LoRA rank, learning rate, epochs in FL AceStep Training Configuration
  7. Train — Connect MODEL and config to FL AceStep Train LoRA and execute

Using Trained LoRAs

Use ComfyUI's native LoRA loading nodes to apply your trained LoRA for inference with the built-in ACE-Step nodes.

Node Details

FL AceStep LLM Loader

Loads one of three 5Hz causal language models for auto-labeling audio samples.

| Input | Type | Default | Notes |
| --- | --- | --- | --- |
| model_name | Dropdown | acestep-5Hz-lm-1.7B | Also: 0.6B, 4B |
| device | Dropdown | auto | auto / cuda / cpu |
| backend | Dropdown | pt | pt / vllm |
| checkpoint_path | STRING | (empty) | Optional, leave empty for auto-download |

Output: ACESTEP_LLM

FL AceStep Scan Audio Directory

Recursively scans a directory for audio files and loads accompanying metadata.

| Input | Type | Default | Notes |
| --- | --- | --- | --- |
| directory | STRING | | Path to audio folder |
| all_instrumental | BOOLEAN | True | Mark all samples as instrumental |
| custom_tag | STRING | (empty) | LoRA activation tag (e.g., my_style) |
| tag_position | Dropdown | prepend | prepend / append / replace |

Outputs: ACESTEP_DATASET, sample count, status
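The `custom_tag` and `tag_position` inputs control how an activation tag is combined with each sample's caption. The sketch below illustrates the three modes. The function name `apply_custom_tag` and the `", "` separator are assumptions made for illustration, not taken from the node's source.

```python
def apply_custom_tag(caption, tag, position="prepend"):
    """Combine an activation tag with a caption (illustrative only)."""
    tag = tag.strip()
    if not tag:
        return caption  # empty tag: leave the caption untouched
    if position == "replace" or not caption.strip():
        return tag  # the tag stands in for the whole caption
    if position == "prepend":
        return f"{tag}, {caption}"  # separator is an assumption
    if position == "append":
        return f"{caption}, {tag}"
    raise ValueError(f"unknown tag_position: {position!r}")


for mode in ("prepend", "append", "replace"):
    print(mode, "->", apply_custom_tag("warm lo-fi piano", "my_style", mode))
```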

FL AceStep Auto-Label Samples

Uses the loaded LLM to generate metadata for each audio sample.

| Input | Type | Default | Notes |
| --- | --- | --- | --- |
| dataset | ACESTEP_DATASET | | From Scan Directory |
| model | MODEL | | ACE-Step model (purple) |
| vae | VAE | | ACE-Step VAE (red), used for audio-to-codes |
| llm | ACESTEP_LLM | | From LLM Loader |
| skip_metas | BOOLEAN | False | Skip BPM/key/time signature |
| only_unlabeled | BOOLEAN | False | Process only unlabeled samples |
| format_lyrics | BOOLEAN | False | Format user-provided lyrics with LLM |
| transcribe_lyrics | BOOLEAN | False | Transcribe lyrics from audio |

Outputs: ACESTEP_DATASET, labeled count, status

FL AceStep Preprocess Dataset

VAE-encodes audio and CLIP-encodes text to .pt tensor files for training.

| Input | Type | Default | Notes |
| --- | --- | --- | --- |
| dataset | ACESTEP_DATASET | | From label or scan node |
| model | MODEL | | ACE-Step model (purple) |
| vae | VAE | | ACE-Step VAE (red) |
| clip | CLIP | | ACE-Step CLIP (yellow) |
| output_dir | STRING | ./output/acestep/datasets | |
| max_duration | FLOAT | 240.0 | 10–600 seconds |
| genre_ratio | INT | 0 | 0–100% chance to use genre instead of caption |

Outputs: output path, sample count, status
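The `genre_ratio` input is described as a percentage chance of using the genre in place of the caption. One plausible reading, sketched below, is an independent random draw per sample. The function name `pick_text_prompt` and the fallback to the caption when no genre is available are assumptions.

```python
import random


def pick_text_prompt(caption, genre, genre_ratio, rng):
    """Return the genre with probability genre_ratio percent, else the caption."""
    # rng.random() is in [0, 1), so 0 never selects the genre and 100 always does.
    if genre and rng.random() * 100 < genre_ratio:
        return genre
    return caption


rng = random.Random(42)
picks = [pick_text_prompt("dreamy synth ballad", "synthpop", 30, rng) for _ in range(1000)]
print("genre used in", picks.count("synthpop"), "of 1000 draws")
```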

FL AceStep Training Configuration

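| Parameter | Default | Range |
| --- | --- | --- |
| LoRA Rank | 8 | 4–256 (step 4) |
| LoRA Alpha | 16 | 4–512 (step 4) |
| LoRA Dropout | 0.1 | 0–0.5 |
| Learning Rate | 1e-4 | 1e-6 – 1e-2 |
| Max Epochs | 100 | 10–10000 |
| Batch Size | 1 | 1–8 |
| Gradient Accumulation | 4 | 1–16 |
| Save Every N Epochs | 10 | 5–1000 |
| Seed | 42 | |

Optional parameters:

| Parameter | Default | Range |
| --- | --- | --- |
| Warmup Steps | 100 | 0–1000 |
| Weight Decay | 0.01 | 0–0.1 |
| Max Grad Norm | 1.0 | 0.1–10.0 |
| Target Modules | q_proj,k_proj,v_proj,o_proj | Comma-separated |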

Output: ACESTEP_TRAINING_CONFIG
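To see how these hyperparameters interact, the sketch below mirrors a few of the defaults in a plain dataclass and derives three quantities: the LoRA scaling factor (alpha / rank, the standard LoRA convention), the effective batch size, and the total number of optimizer steps. The class `LoraTrainingConfig` and its field names are hypothetical stand-ins for the node's internal config. The step count assumes a trailing partial accumulation window still triggers an optimizer step.

```python
import math
from dataclasses import dataclass


@dataclass
class LoraTrainingConfig:
    # Hypothetical mirror of the node's defaults, for illustration only.
    rank: int = 8
    alpha: int = 16
    max_epochs: int = 100
    batch_size: int = 1
    gradient_accumulation: int = 4
    warmup_steps: int = 100

    @property
    def lora_scale(self):
        # Standard LoRA scaling applied to the low-rank update.
        return self.alpha / self.rank

    @property
    def effective_batch_size(self):
        return self.batch_size * self.gradient_accumulation

    def optimizer_steps(self, num_samples):
        batches_per_epoch = math.ceil(num_samples / self.batch_size)
        steps_per_epoch = math.ceil(batches_per_epoch / self.gradient_accumulation)
        return steps_per_epoch * self.max_epochs


cfg = LoraTrainingConfig()
total = cfg.optimizer_steps(num_samples=20)
print("LoRA scale:", cfg.lora_scale)
print("effective batch size:", cfg.effective_batch_size)
print("optimizer steps:", total, "of which warmup:", cfg.warmup_steps)
```

With a 20-sample dataset and these defaults, warmup would cover the first fifth of training, which is worth keeping in mind when shortening runs.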

Mixed precision is fixed to bf16. The turbo model uses 8-step discrete timesteps with shift=3.0.
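The exact timestep schedule is not spelled out here. A common way to apply a shift in flow-matching models is the mapping t' = shift * t / (1 + (shift - 1) * t) over evenly spaced timesteps. The sketch below assumes that mapping and an even grid from 1 down toward 0, and the function name `shifted_timesteps` is hypothetical. Treat it as an illustration of what "8 steps with shift=3.0" could look like, not as the node's schedule.

```python
def shifted_timesteps(num_steps=8, shift=3.0):
    """Evenly spaced timesteps in (0, 1], warped toward the noisy end by `shift`."""
    steps = []
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # 1.0, 0.875, ..., 0.125
        steps.append(shift * t / (1.0 + (shift - 1.0) * t))
    return steps


timesteps = shifted_timesteps()
print([round(t, 4) for t in timesteps])
```

A shift above 1 concentrates the discrete timesteps near t = 1, so under this assumption more of the training signal comes from high-noise inputs.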

FL AceStep Train LoRA

Runs the training loop with flow matching loss: MSE(predicted_v, x1 - x0).

| Input | Type | Default | Notes |
| --- | --- | --- | --- |
| model | MODEL | | ACE-Step model (purple) |
| config | ACESTEP_TRAINING_CONFIG | | From config node |
| tensor_dir | STRING | ./output/acestep/datasets | Directory of .pt files |
| lora_name | STRING | my_lora | Name for the trained LoRA (used as subfolder) |
| resume_from | STRING | (empty) | Path to checkpoint to resume from |

Outputs: MODEL (with LoRA), final LoRA path, status

The training widget displays a live loss chart, progress bar, and per-epoch stats via WebSocket (acestep.training.progress).
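The flow matching loss used by this node, MSE(predicted_v, x1 - x0), can be illustrated with a dependency-free sketch. It assumes the usual linear interpolation x_t = t * x1 + (1 - t) * x0 between the two endpoints. Which endpoint is noise and which is the audio latent is not stated here, so the code stays neutral on that point. `predict_v` is a hypothetical stand-in for the model, and plain lists stand in for tensors.

```python
def flow_matching_loss(x0, x1, t, predict_v):
    """MSE between the predicted velocity and the target x1 - x0."""
    # Point on the straight path between the endpoints (interpolation is assumed).
    xt = [t * b + (1.0 - t) * a for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    pred = predict_v(xt, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


x0 = [0.0, 1.0, 2.0]
x1 = [1.0, 1.0, 0.0]

# A predictor that already outputs the true velocity gives zero loss.
perfect = flow_matching_loss(x0, x1, 0.5, lambda xt, t: [1.0, 0.0, -2.0])
# A predictor that outputs zeros is penalised by the mean squared target.
zero = flow_matching_loss(x0, x1, 0.5, lambda xt, t: [0.0, 0.0, 0.0])
print(perfect, zero)
```

Because the target x1 - x0 does not depend on t along a straight path, the model only has to learn a constant velocity per sample pair, with t telling it where on the path the input lies.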

Supported Audio Formats

.wav, .mp3, .flac, .ogg, .opus, .m4a

Sidecar files for metadata:

  • .txt files alongside audio for lyrics
  • key_bpm.csv or metadata.csv for BPM, key, and caption data
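As a rough sketch of how a recursive scan could pair audio files with lyric sidecars, the snippet below walks a directory, keeps files with the supported extensions, and attaches the text of a same-named .txt file when one exists. The function name `scan_audio` and the same-stem pairing rule are assumptions. CSV parsing is left out because the column layout is not specified here.

```python
import tempfile
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".opus", ".m4a"}


def scan_audio(root):
    """Recursively collect audio files and any same-stem .txt lyrics."""
    samples = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTS:
            sidecar = path.with_suffix(".txt")  # assumed pairing rule
            lyrics = sidecar.read_text(encoding="utf-8").strip() if sidecar.is_file() else None
            samples.append({"audio": path.name, "lyrics": lyrics})
    return samples


# Small self-contained demo using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "album").mkdir()
    (root / "album" / "song.wav").write_bytes(b"")
    (root / "album" / "song.txt").write_text("la la la\n", encoding="utf-8")
    (root / "loop.MP3").write_bytes(b"")
    (root / "notes.md").write_text("not audio", encoding="utf-8")
    found = scan_audio(root)

print(found)
```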

Requirements

  • Python 3.10+
  • NVIDIA GPU with 8GB+ VRAM (bf16 training)
  • PyTorch 2.0+
  • PEFT (Parameter-Efficient Fine-Tuning)
  • Transformers, Diffusers, Accelerate

License

MIT