DiffSinger Dataset Tools
April 26, 2026
Tools for preparing DiffSinger singing voice synthesis datasets: audio slicing, labeling, forced alignment, and audio-to-MIDI transcription.
Applications
| Application | Description |
|---|---|
| MinLabel | Audio labeling tool with G2P conversion (Mandarin/Cantonese/Japanese) |
| SlurCutter | DiffSinger sentence/MIDI editor with piano roll F0 visualization |
| AudioSlicer | RMS-based automatic audio slicing with Audacity CSV marker support |
| LyricFA | Lyric forced alignment using FunASR Paraformer (Chinese) |
| HubertFA | HuBERT phoneme forced alignment with Praat TextGrid output |
| GameInfer | GAME audio-to-MIDI transcription (4-model ONNX pipeline) |
Supported Platforms
- Microsoft Windows (10 ~ 11): primary platform, with DirectML GPU acceleration
- Apple macOS (11+)
- Linux (Tested on Ubuntu)
Models
AsrModel
Used by LyricFA. Only Chinese is supported; a Japanese and English version is in beta.
SomeModel
FblModel
FoxBreatheLabeler currently annotates breaths only on TextGrid files produced by SOFA: it overlays new "AP" annotations on intervals already marked as "SP".
GAME Model
Required for GameInfer. Place the model directory (containing config.json, encoder.onnx, segmenter.onnx, bd2dur.onnx, dur2bd.onnx, estimator.onnx) under <app_dir>/model/.
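Before launching GameInfer, it can help to confirm that the model directory is complete. The sketch below is not part of this repository: `check_game_model` is a hypothetical helper, and it assumes the six files listed above sit directly inside the directory you pass to it.

```sh
# Hypothetical helper: report which of the required GAME model files
# are missing from the given directory. Returns 0 when all are present.
check_game_model() {
    model_dir=$1
    missing=0
    for f in config.json encoder.onnx segmenter.onnx \
             bd2dur.onnx dur2bd.onnx estimator.onnx; do
        if [ ! -f "$model_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it with the path to the model directory, for example `check_game_model "<app_dir>/model"`, and adjust the path if your model lives in a subdirectory.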
Build from Source
Requirements
| Component | Requirement | Details |
|---|---|---|
| Qt | >=6.8.0 | Core, Gui, Widgets, Svg, Network |
| Compiler | >=C++17 | MSVC 2022, GCC, Clang |
| CMake | >=3.17 | >=3.20 is recommended |
Tested with Qt 6.8.3 and Qt 6.9.3. CI builds use Qt 6.9.3.
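To check an installed tool against the minimum versions in the table, a dotted-version comparison is enough. The following is a minimal sketch, not a script shipped with this project; `version_ge` is a hypothetical helper that only handles purely numeric components such as `3.20` or `6.8.3`.

```sh
# Hypothetical helper: succeed (exit status 0) when version $1 >= version $2.
# Components are compared numerically, left to right; missing ones count as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # Drop the component just read; clear the string when it was the last one.
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0
}
```

For example, `version_ge "$(cmake --version | awk 'NR==1 {print $3}')" 3.17` succeeds when the installed CMake meets the minimum, provided the reported version has no suffix such as `-rc1`.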
Setup Environment
You need to install Qt libraries first.
Windows
cd /D src/libs
cmake -Dep=dml -P ../../scripts/setup-onnxruntime.cmake
cd ../../
rem Set QT_DIR to the directory that contains Qt6Config.cmake
set QT_DIR=<dir>
set Qt6_DIR=%QT_DIR%
set VCPKG_KEEP_ENV_VARS=QT_DIR;Qt6_DIR
git clone https://github.com/microsoft/vcpkg.git
cd /D vcpkg
bootstrap-vcpkg.bat
vcpkg install ^
--x-manifest-root=../scripts/vcpkg-manifest ^
--x-install-root=./installed ^
--triplet=x64-windows
Unix
cd src/libs
cmake -Dep=cpu -P ../../scripts/setup-onnxruntime.cmake
cd ../../
export QT_DIR=<dir> # directory containing `Qt6Config.cmake`
export Qt6_DIR=$QT_DIR
export VCPKG_KEEP_ENV_VARS="QT_DIR;Qt6_DIR"
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install \
--x-manifest-root=../scripts/vcpkg-manifest \
--x-install-root=./installed \
--triplet=<triplet>
# triplet:
# macOS: `x64-osx` or `arm64-osx`
# Linux: `x64-linux` or `arm64-linux`
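The triplet can also be derived from `uname` instead of being typed by hand. This is a sketch under the assumption that `uname -s` reports `Darwin` or `Linux` and `uname -m` reports one of the common architecture names; `pick_triplet` is a hypothetical helper, not something provided by vcpkg or this repository.

```sh
# Hypothetical helper: map an OS name and machine architecture
# (as printed by `uname -s` and `uname -m`) to one of the triplets above.
pick_triplet() {
    os=$1
    arch=$2
    case "$arch" in
        x86_64|amd64)  cpu=x64 ;;
        arm64|aarch64) cpu=arm64 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    case "$os" in
        Darwin) echo "$cpu-osx" ;;
        Linux)  echo "$cpu-linux" ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}
```

It could then be used as `--triplet="$(pick_triplet "$(uname -s)" "$(uname -m)")"` in the `vcpkg install` command.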
Build & Install
cmake -B build -G Ninja \
-DCMAKE_INSTALL_PREFIX=<dir> \
-DCMAKE_PREFIX_PATH=<dir> \
-DCMAKE_TOOLCHAIN_FILE=vcpkg/scripts/buildsystems/vcpkg.cmake \
-DCMAKE_BUILD_TYPE=Release
cmake --build build --target all
cmake --build build --target install
CMake Build Options
| Option | Default | Description |
|---|---|---|
| BUILD_TESTS | ON | Build src/tests/ subdirectory (currently empty placeholder) |
| AUDIO_UTIL_BUILD_TESTS | ON | Build TestAudioUtil |
| GAME_INFER_BUILD_TESTS | ON | Build TestGame |
| SOME_INFER_BUILD_TESTS | ON | Build TestSome |
| RMVPE_INFER_BUILD_TESTS | ON | Build TestRmvpe |
| ONNXRUNTIME_ENABLE_DML | ON (Windows) | Enable DirectML GPU acceleration |
| ONNXRUNTIME_ENABLE_CUDA | OFF | Enable CUDA GPU acceleration |
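These options are passed at configure time as `-D<option>=<value>`. As an illustration, the sketch below builds the flags that switch all five test options from the table on or off together; `test_flags` and `test_options` are hypothetical names introduced here, and the snippet relies on ordinary word splitting as in `sh` or `bash`.

```sh
# Test-related options from the table above.
test_options="BUILD_TESTS AUDIO_UTIL_BUILD_TESTS GAME_INFER_BUILD_TESTS SOME_INFER_BUILD_TESTS RMVPE_INFER_BUILD_TESTS"

# Hypothetical helper: print one -D flag per test option, all set to $1 (ON or OFF).
test_flags() {
    value=$1
    flags=
    for opt in $test_options; do
        flags="$flags -D$opt=$value"
    done
    # Strip the leading space left by the first concatenation.
    echo "${flags# }"
}
```

A configure step without any test executables could then look like `cmake -B build -G Ninja $(test_flags OFF) -DCMAKE_BUILD_TYPE=Release`, combined with the other arguments from the Build & Install section.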
Build Outputs
| Type | Files |
|---|---|
| Applications | MinLabel.exe, SlurCutter.exe, AudioSlicer.exe, LyricFA.exe, HubertFA.exe, GameInfer.exe |
| Test executables | TestGame.exe, TestRmvpe.exe, TestSome.exe, TestAudioUtil.exe |
| Shared libraries | game-infer.dll, rmvpe-infer.dll, some-infer.dll, audio-util.dll |
Libraries
Related Projects
-
- Apache 2.0 License
-
- Apache 2.0 License
Dependencies
- Qt 6 (6.8+)
  - GNU LGPL v2.1 or later
- ONNX Runtime
  - MIT License
- FFmpeg
  - GNU LGPL v2.1 or later
- LAME
  - GNU LGPL v2.0
- SDL
  - Zlib License
- SndFile
  - GNU LGPL v2.1 or later
- vcpkg
  - MIT License
- r8brain-free-src
  - MIT License
- FunASR
  - MIT License
- fftw3
  - GNU GPL v2.0
- yaml-cpp
  - MIT License
- wolf-midi
  - MIT License
- nlohmann/json
  - MIT License
- FoxBreatheLabeler
  - GNU AGPL v3.0
- textgrid.hpp
  - MIT License
- soxr
  - GNU LGPL v2.1
- mpg123
  - GNU LGPL v2.1
License
This repository is licensed under the Apache 2.0 License.