# InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization
May 11, 2026 · View on GitHub
Status: Repository under active development; more data and features are coming soon.
## Overview
InfiniteDance is a comprehensive framework for scalable 3D music-to-dance generation, designed for high-quality generalization in the wild.
## Repository Structure
```
InfiniteDance
├── All_LargeDanceAR/       # Main generation module
├── DanceVQVAE/             # VQ-VAE for motion quantization (follows MoMask)
└── InfiniteDanceData/      # Dataset directory (should be placed at the repo root)
    ├── dance/              # Motion tokens (.npy)
    ├── music/              # Music features (.npy)
    ├── partition/          # Data splits (train/val/test)
    └── styles/             # Style metadata
```
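If you want a quick look at the data before wiring anything up, a minimal sketch like the following (not part of the repo; paths assume the root placement described above) prints the shape of a few `.npy` files:

```python
# Peek at a few dataset files to confirm the layout (illustrative only).
from pathlib import Path

import numpy as np

DATA_ROOT = Path("InfiniteDanceData")  # adjust if you keep the data elsewhere

for sub in ("dance", "music"):
    for f in sorted((DATA_ROOT / sub).rglob("*.npy"))[:3]:  # first few files
        arr = np.load(f)
        print(f"{f}: shape={arr.shape}, dtype={arr.dtype}")
```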
## Installation
```bash
# Clone the repository
git clone git@github.com:MotrixLab/InfiniteDance.git
cd InfiniteDance

# Install dependencies
pip install -r requirements.txt
```
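Before downloading multi-gigabyte checkpoints, it can be worth confirming that PyTorch sees your GPU. A quick optional check (assuming `requirements.txt` installs PyTorch, which training and inference rely on):

```python
# Optional environment check: verify PyTorch and CUDA before fetching weights.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```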
## Downloads (Data & Weights)
All datasets and pre-trained checkpoints are hosted on Hugging Face. After download, place them in the following locations (relative to the repo root unless you use absolute paths):
Hugging Face Checkpoints: InfiniteDance
1. Data Setup
Download the InfiniteDanceData folder and place it in the repo root:
```
# Path: <your path>/InfiniteDance_opensource/InfiniteDanceData
```
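You can also fetch the folder programmatically with `huggingface_hub` (a sketch; the `repo_id` below is an assumption, substitute the id from the link above):

```python
# Fetch only the InfiniteDanceData folder into the repo root.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MotrixLab/InfiniteDance",       # assumption: use the repo id linked above
    # repo_type="dataset",                   # uncomment if the data is a dataset repo
    local_dir=".",
    allow_patterns=["InfiniteDanceData/*"],
)
```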
2. Model Weights Setup
Please place the downloaded weights in their respective directories:

- VQ-VAE Weights: `All_LargeDanceAR/models/checkpoints/dance_vqvae.pth`
- InfiniteDance Fine-tuned Weights: `All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt`
- Base LLM: The released checkpoint already contains the full LLaMA-3.2-1B backbone weights, so you do not need to download anything from Meta. We ship the architecture `config.json` in `All_LargeDanceAR/models/Llama3.2-1B/`.
After placement, the expected structure looks like this:
```
InfiniteDance
├── InfiniteDanceData/
│   ├── dance/
│   ├── music/
│   ├── partition/
│   └── styles/
└── All_LargeDanceAR/
    ├── models/
    │   ├── checkpoints/
    │   └── Llama3.2-1B/
    ├── RetrievalNet/
    │   └── checkpoints/
    └── output/
        └── exp_m2d_infinitedance/
```
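A small sketch (not repo code) to confirm everything is in place before running inference:

```python
# Verify the expected layout described above; run from the repo root.
from pathlib import Path

EXPECTED = [
    "InfiniteDanceData/dance",
    "InfiniteDanceData/music",
    "InfiniteDanceData/partition",
    "InfiniteDanceData/styles",
    "All_LargeDanceAR/models/checkpoints/dance_vqvae.pth",
    "All_LargeDanceAR/models/Llama3.2-1B/config.json",
    "All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
print("All files in place." if not missing else f"Missing: {missing}")
```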
## Usage
1. Inference & Reproduction
The model takes per-frame MuQ embeddings as input ((T, 1024) float32 `.npy` arrays, ~30 frames per second; a quick shape check is sketched below). Two ways to provide them:

- Use the released test set: download `muq_features_test_infinitedance.tar.gz` from Hugging Face and extract it; this is what `infer.sh` defaults to.
- Use your own audio: convert wav/mp3 to MuQ embeddings first:

```bash
cd All_LargeDanceAR
python utils/extract_muq.py \
    --in_dir /path/to/your_audio_dir \
    --out_dir ../InfiniteDanceData/music/muq_features/my_songs
```

Then point `infer.sh` at the new directory:

```bash
MUSIC_PATH=../InfiniteDanceData/music/muq_features/my_songs bash infer.sh
```
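A minimal sketch of that shape check, assuming the (T, 1024) float32 format described above (the file path is illustrative):

```python
# Sanity-check one MuQ feature file before inference.
import numpy as np

feat = np.load("../InfiniteDanceData/music/muq_features/my_songs/song.npy")  # illustrative path
assert feat.ndim == 2 and feat.shape[1] == 1024, f"unexpected shape {feat.shape}"
assert feat.dtype == np.float32, f"unexpected dtype {feat.dtype}"
print(f"{feat.shape[0]} frames ~= {feat.shape[0] / 30:.1f} s of audio at ~30 fps")
```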
You can run the full inference pipeline (Generation → Post-processing → Visualization) using the provided shell script or by running the Python scripts manually.
Option A: Quick Start (Recommended)
`infer.sh` runs Inference → tokens-to-SMPL → optional rendering, with anti-collapse decoding enabled by default.
```bash
cd All_LargeDanceAR
DATA_ROOT=../InfiniteDanceData \
CHECKPOINT_PATH=./output/exp_m2d_infinitedance/best_model_stage2.pt \
bash infer.sh
```
Common overrides: `GPU_ID`, `PROCESSES_PER_GPU`, `STYLE`, `MUSIC_LENGTH`, `DANCE_LENGTH`, `TEMPERATURE`, `TOP_K`, `TOP_P`, `SEED`. See the comments at the top of `infer.sh` to tune the anti-collapse decoding.
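For intuition on what `TEMPERATURE`, `TOP_K`, and `TOP_P` control, here is a generic sketch of temperature/top-k/top-p sampling (not the repo's sampler, which additionally applies anti-collapse decoding):

```python
# Illustrative token sampler: temperature rescales logits, top-k keeps the
# k most likely tokens, top-p keeps the smallest prefix of that set whose
# cumulative probability stays within p.
import numpy as np

def sample_token(logits, temperature=0.8, top_k=15, top_p=0.95, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    order = np.argsort(logits)[::-1][:top_k]            # top-k filter
    probs = np.exp(logits[order] - logits[order].max())
    probs /= probs.sum()
    keep = np.cumsum(probs) <= top_p                    # top-p (nucleus) filter
    keep[0] = True                                      # always keep the best token
    probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(order[keep], p=probs))
```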
Option B: Manual Execution
```bash
cd All_LargeDanceAR
python infer_llama_infinitedance.py \
    --music_path ../InfiniteDanceData/music/muq_features/test_infinitedance \
    --checkpoint_path ./output/exp_m2d_infinitedance/best_model_stage2.pt \
    --vqvae_checkpoint_path ./models/checkpoints/dance_vqvae.pth \
    --output_dir ./infer_results \
    --style Popular --music_length 320 --dance_length 288 \
    --temperature 0.8 --top_k 15 --top_p 0.95 --seed 42
```
Visualization Pipeline: If you ran the manual inference above, proceed to visualize the results:
```bash
# 1. Convert tokens to SMPL joints (.npy)
python ./utils/tokens2smpl.py --npy_dir ./infer_results/dance

# 2. Render joints to video (.mp4)
python ./visualization/render_plot_npy.py --joints_dir ./infer_results/dance/npy/joints
```
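If you just want to eyeball one frame without the full renderer, a minimal matplotlib sketch works (assuming joints are saved as (T, J, 3) arrays; the file name is illustrative):

```python
# Scatter-plot the first frame of a generated joint sequence.
import matplotlib.pyplot as plt
import numpy as np

joints = np.load("./infer_results/dance/npy/joints/sample.npy")  # (T, J, 3), name illustrative
frame = joints[0]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(frame[:, 0], frame[:, 1], frame[:, 2])
ax.set_title("Generated dance, frame 0")
plt.show()
```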
1.1 Metrics
metrics.sh runs FID-k / FID-m / Div-k / Div-m and the official Beat-Align score.
```bash
cd All_LargeDanceAR
bash metrics.sh <pred_root> [device_id]
# pred_root e.g. ./infer/dance_<TS>/dance/npy/joints
```
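For reference, the FID-style metrics follow the standard Fréchet distance between Gaussian fits of real and generated feature sets; a generic sketch (not the repo's implementation):

```python
# Frechet distance: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real        # drop tiny imaginary parts from sqrtm
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```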
2. Training
Two-stage training (stage 1: bridges + adapters, LLM frozen; stage 2: full fine-tune) is run via DDP. Edit `train.sh` (or pass env vars) and launch:
```bash
cd All_LargeDanceAR

# Default: 4 GPUs, bf16, with regularization (weight_decay=0.10,
# llama_dropout=0.15, cond_drop_prob=0.15)
DATA_ROOT=../InfiniteDanceData bash train.sh

# Other GPU counts
GPUS=0,1 WS=2 DATA_ROOT=../InfiniteDanceData bash train.sh

# Warm-start from a previous stage-2 checkpoint
PREV_CKPT=./output/m2d_llama/<run>/epoch_X_stage2.pt bash train.sh
```
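The gist of the stage switch, as a PyTorch sketch with hypothetical module names (`bridge`, `adapter`), not the repo's trainer:

```python
# Stage 1: freeze the LLM, train only bridges + adapters.
# Stage 2: unfreeze everything for a full fine-tune.
def set_stage(model, stage: int) -> None:
    for p in model.parameters():
        p.requires_grad = (stage == 2)
    if stage == 1:
        for name, p in model.named_parameters():
            if "bridge" in name or "adapter" in name:  # hypothetical names
                p.requires_grad = True
```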
## Citation
If you use this code or dataset in your research, please cite our work:
```bibtex
@misc{li2026infinitedancescalable3ddance,
    title={InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization},
    author={Ronghui Li and Zhongyuan Hu and Li Siyao and Youliang Zhang and Haozhe Xie and Mingyuan Zhang and Jie Guo and Xiu Li and Ziwei Liu},
    year={2026},
    eprint={2603.13375},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2603.13375},
}
```