HPMDubbing_Vocoder
April 3, 2023
This repository contains the vocoder for our model (HPMDubbing). It converts the mel-spectrogram generated by the model into a time-domain waveform.

Pretrained Models
We provide pretrained models. Download the generator checkpoint (e.g., g_05000000) from the folders listed below.
| Folder Name | Sampling Rate | Hop Length | Segment Size | Win Length | Params. | Dataset | Fine-Tuned |
|---|---|---|---|---|---|---|---|
| HPM_Chem | 16000 Hz | 160 | 8000 | 640 | 55M | LibriTTS | No |
| HPM_V2C | 22050 Hz | 220 | 9900 | 880 | 58M | LibriTTS | No |
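The hop length and segment size in the table fix the frame rate of the mel-spectrogram and the number of frames in each training segment. The following is a minimal sketch of that arithmetic, using only the values from the table. The dictionary layout and the `summarize` function are illustrative and are not part of the repository.

```python
# Values copied from the table above; the dictionary layout is illustrative.
CONFIGS = {
    "HPM_Chem": {"sampling_rate": 16000, "hop_length": 160,
                 "segment_size": 8000, "win_length": 640},
    "HPM_V2C": {"sampling_rate": 22050, "hop_length": 220,
                "segment_size": 9900, "win_length": 880},
}


def summarize(cfg):
    """Derive frame-level quantities from one row of the table."""
    return {
        # Time covered by one mel frame, in milliseconds.
        "hop_ms": 1000.0 * cfg["hop_length"] / cfg["sampling_rate"],
        # Number of mel frames in one training segment.
        "frames_per_segment": cfg["segment_size"] // cfg["hop_length"],
        # Duration of one training segment, in seconds.
        "segment_seconds": cfg["segment_size"] / cfg["sampling_rate"],
        # Ratio of analysis window to hop.
        "win_over_hop": cfg["win_length"] / cfg["hop_length"],
    }


for name, cfg in CONFIGS.items():
    print(name, summarize(cfg))
```

With these values, HPM_Chem uses a 10 ms hop and 50 frames per 0.5 s segment, while HPM_V2C uses a hop of about 9.98 ms and 45 frames per segment of roughly 0.45 s. In both cases the window is four times the hop.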
Training
- Please run
    python train_V2C_HiFiGAN.py --config config_V2C_22050Hz.json
  or
    python train_hifigan_16KHz.py --config config_Chem_16KHz.json
Inference
- inference.py : wav -> mel -> wav
    python inference.py --checkpoint_file [Your path of checkpoint_file]
- inference_e2e.py : mel -> wav
    python inference_e2e.py --checkpoint_file [Your path of checkpoint_file]
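Both scripts take the path of a generator checkpoint. If a folder holds several checkpoints, a small helper can pick the one with the highest step count. This is a sketch only: it assumes checkpoints are named `g_` followed by digits (as in g_05000000), and `latest_generator_checkpoint` is a hypothetical helper, not a function provided by this repository.

```python
import re
from pathlib import Path


def latest_generator_checkpoint(folder):
    """Return the Path of the generator checkpoint with the highest step.

    Assumes files are named 'g_' followed only by digits, e.g. g_05000000.
    Returns None if the folder contains no such file.
    """
    pattern = re.compile(r"^g_(\d+)$")
    best = None  # (step, path) of the best candidate seen so far
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match and path.is_file():
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, path)
    return None if best is None else best[1]
```

The returned path can then be passed as the value of `--checkpoint_file`, for example after calling `latest_generator_checkpoint("HPM_V2C")` on a downloaded folder.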
TensorBoard
- Please run
    tensorboard --logdir HifiGAN_16/logs/ --port=[Your port]
  or
    tensorboard --logdir My_vocoder_V2C/logs/ --port=[Your port]
References
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, J. Kong, et al.