Apollo Training Data Preprocessing

March 7, 2025

This repository contains code for preprocessing training data for the Apollo audio processing model. It includes two scripts for handling different datasets: moisesdb_preprocess.py and musdb_preprocess.py.

Overview

The preprocessing code performs the following steps:

  1. Loads audio files
  2. Applies Voice Activity Detection (VAD) to filter out silent segments
  3. Splits the audio into fixed-length sessions
  4. Stores the processed data in HDF5 format for efficient training
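
As an illustration of step 3, the sketch below computes the sample ranges of fixed-length sessions. It is not taken from the repository's scripts: the function name `session_bounds` and the choice to drop a trailing partial session are assumptions made for this example. Only the 44.1 kHz sample rate and 6-second session length come from the defaults documented in this README.

```python
def session_bounds(num_samples, sample_rate=44100, session_seconds=6.0):
    """Return (start, end) sample indices of consecutive fixed-length sessions.

    Hypothetical helper for illustration only. A trailing remainder shorter
    than one full session is dropped (an assumption, not confirmed behavior).
    """
    session_len = int(round(sample_rate * session_seconds))
    if session_len <= 0:
        raise ValueError("session length must be positive")
    # Step through the signal in non-overlapping windows of session_len samples.
    last_start = num_samples - session_len
    return [(start, start + session_len)
            for start in range(0, last_start + 1, session_len)]


if __name__ == "__main__":
    # A 13-second track at 44.1 kHz yields two full 6-second sessions
    # of 264600 samples each; the final second is dropped.
    print(session_bounds(13 * 44100))
```

With the default settings, each session spans 6 × 44100 = 264600 samples.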

Usage

MoisesDB Preprocessing

To preprocess the MoisesDB dataset:

python moisesdb_preprocess.py

By default, this will:

  • Read data from moisesdb/moisesdb_v0.1
  • Output processed data to musdb18hq-moises-hdf5
  • Use a sample rate of 44.1 kHz
  • Use a session length of 6 seconds

You can modify the default paths and parameters in the script as needed.

MUSDB Preprocessing

To preprocess the MUSDB18-HQ dataset:

python musdb_preprocess.py

By default, this will:

  • Read training data from musdb18hq/train
  • Read test data from musdb18hq/test
  • Output processed data to musdb18hq-moises-hdf5
  • Use a sample rate of 44.1 kHz
  • Use a session length of 6 seconds

You can modify the default paths and parameters in the script as needed.

Implementation Details

Both scripts implement a VAD (Voice Activity Detection) function that:

  • Takes in audio data
  • Calculates power thresholds
  • Identifies segments where audio is present (not silent)
  • Extracts valid audio segments
  • Returns segments that have significant audio content
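
The steps above can be sketched as a simple energy-based detector. This is a minimal illustration, not the code in moisesdb_preprocess.py or musdb_preprocess.py: the function name `vad_segments`, the 0.1-second frame size, and the threshold of 40 dB below the loudest frame are all assumptions chosen for the example, and the scripts may compute their power thresholds differently.

```python
import numpy as np


def vad_segments(audio, sample_rate=44100, frame_seconds=0.1,
                 rel_threshold_db=-40.0):
    """Return (start, end) sample ranges of non-silent audio.

    Hypothetical sketch: `audio` is a 1-D mono float array. A frame counts as
    active when its mean power exceeds a threshold set relative to the
    loudest frame (assumed rule, not the repository's exact criterion).
    """
    audio = np.asarray(audio, dtype=np.float64)
    frame_len = int(round(sample_rate * frame_seconds))
    num_frames = len(audio) // frame_len if frame_len > 0 else 0
    if num_frames == 0:
        return []

    # Mean power per non-overlapping frame.
    frames = audio[: num_frames * frame_len].reshape(num_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)

    peak = power.max()
    if peak == 0.0:
        return []  # the whole signal is digital silence
    threshold = peak * 10.0 ** (rel_threshold_db / 10.0)
    active = power > threshold

    # Merge runs of consecutive active frames into sample ranges.
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:
        segments.append((start * frame_len, num_frames * frame_len))
    return segments
```

Each returned range can then be sliced out of the waveform (`audio[start:end]`) and passed on to the session-splitting step.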

The processed data is stored in HDF5 format, with each track organized in a separate directory.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

If you have any questions or suggestions, please feel free to contact me at

Citation

If you find this code useful, please consider citing the following paper:

@inproceedings{li2025apollo,
  title={Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio},
  author={Li, Kai and Luo, Yi},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
  organization={IEEE}
}