SingingVocoders
March 22, 2025 · View on GitHub
A collection of neural vocoders suitable for singing voice synthesis tasks.
If you have any questions, please open an issue.
Quick Start
Preprocessing
Run the following preprocessing script
python process.py --config (your config path) --num_cpu (Number of cpu threads used in preprocessing) --strx (1 for a forced absolute path 0 for a relative path)
The following configuration items are what you need to change during preprocessing
data_input_path: [] # the path for your data
data_out_path: [] # the path for the preprocessed output
val_num: 10 # the number of validation audio
An example
data_input_path: ['wav/in1','wav/in2']
data_out_path: ['wav/out1','wav/out2']
val_num: 5 # This is the number of valves you want.
# (The files are automatically extracted during preprocessing.)
# (The paths in the two lists are one-to-one, so the number should be the same.)
# (Then, the preprocessor scans all .wav and .flac files, including subfolders.)
# (Normally, there are only these three to change.)
Training
Adjust config according to your GPU memory (mini_nsf and pc_aug is enabled by default)
For 24GB memory (default)
crop_mel_frames: 48
batch_size: 10
pc_aug_rate: 0.5
For 16GB memory (need manual editing)
crop_mel_frames: 32
batch_size: 10
pc_aug_rate: 0.4
Run the following training script
python train.py --config (your config path) --exp_name (your ckpt name) --work_dir (working directory, optional)
Configuration items under test
use_stftloss: false (Whether to use stft loss)
lab_aux_melloss: 45
lab_aux_stftloss: 2.5 (The mixing ratio of the two losses)
If you have other needs, you can modify the stftloss related parameters
Export the checkpoint
if you finish training you can use this script to export the diffsinger vocoder checkpoint
python export_ckpt.py --ckpt_path (your ckpt path) --save_path (output ckpt path) --work_dir (working directory, optional)
Online data augmentation (recommend)
add config
key_aug: true (Do augment during training)
key_aug_prob: 0.5 (Data augmentation probability)
aug_min: 0.9 (Minimum f0 adjustment multiplier)
aug_max: 1.4 (Maximum f0 adjustment multiplier)
Note that data augmentation may damage the sound quality!
Note
Because of some problems the actual number of steps is half of what it shows
To fine-tune the nsf-hifigan vocoder, please download and unzip the weights in releases, and modify the 'finetune_ckpt_path' item in ft_hifigan.yaml to the checkpoint file.
For fine-tuning please use 44100 Hz samplerate audio and do not modify other mel parameters unless you know exactly what you are doing
How to use the fine-tuning function is recommended to refer to the openvpi/DiffSinger project documentation
The exported weights can be used in project such as DDSP-SVC, Diffusion-SVC, so-vits-svc and DiffSinger (openvpi)
If you want to further export them to onnx format weights for use in OpenUtau, please use this script
The inheritance relationship of configuration items in the configuration file is: base.yaml -> base_hifi.yaml -> ft_hifigan.yaml
A small number of steps of fine-tuning can freeze the mpd module.
It is not recommended to use bf16, as it may cause sound quality problems.
Almost 2k steps is enough for fine-tuning of small dataset.
Other models
HiFivae.yaml hifivae.yaml training vae model
base_hifi_chroma.yaml training ignore 8th degree nsf hifigan
base_hifi.yaml Training nsf hifigan
base_ddspgan.yaml Training ddsp model with discriminator
ddsp_univnet.yaml Training ddsp mixed univnet model
nsf_univnet.yaml Training univnet with nsf (recommended)
univnet.yaml Training original univnet
lvc_base_ddspgan.yaml Training ddsp model with lvc filters
Special Statements
We regret to publish a verified Registry of Hostile Conduct (shown as below). This registry documents individuals/entities who have engaged in long-term destructive activities against the development team.
We solemnly declare:
- Strongly recommend all users review this registry before downloading and using this vocoder
- No technical or legal restrictions are currently imposed on listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0
- Reserve the right to apply further restrictions in case of persistent malicious acts
Registry of Hostile Conduct
| Name | Identifiers | Reason |
|---|---|---|
| 旋转_turning_point | QQ: 2673587414; Bilibili UID: 285801087; Discord username: colstone233 | Engaging in long-term hostile and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development process of the vocoder and other projects in the community |