Conv-TasNet

March 31, 2023

New: The modified training and testing code can now separate speech properly.

New: Updated the model code and added the skip-connection path.

Notice: Set the training batch size to 8 or 16.

Notice: An implementation of another paper that optimizes Conv-TasNet has been open-sourced as "Deep-Encoder-Decoder-Conv-TasNet".

Demo Pages: Results of pure speech separation model

A PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation"

Luo Y, Mesgarani N. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.


Requirements

  • PyTorch 1.3.0
  • TorchAudio 0.3.1
  • PyYAML 5.1.2

Accomplished goals

  • Supports multi-GPU training (see train.yml)
  • Uses PyTorch's built-in DataLoader
  • Provides pre-trained models

Files to prepare before training

  1. Generate the dataset from WSJ0 or TIMIT using create-speaker-mixtures.zip
  2. Generate the scp files using the create_scp.py script
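
The repository's create_scp.py is not reproduced here. As a rough sketch of what step 2 produces, assuming the scp file is a Kaldi-style text file with one `<file name> <path>` line per utterance, a listing could be generated as follows. The function name `write_scp` and the directory names in the usage comment are hypothetical and only serve as illustration.

```python
import os


def write_scp(wav_dir, scp_path):
    """Write one '<file name> <absolute path>' line per .wav file under wav_dir.

    Assumed format: Kaldi-style scp (key, space, path). Returns the entry count.
    """
    entries = []
    for root, _dirs, files in os.walk(wav_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                entries.append((name, os.path.abspath(os.path.join(root, name))))
    # Sort by file name so the mixture and source lists stay in the same order.
    entries.sort()
    with open(scp_path, "w", encoding="utf-8") as f:
        for key, path in entries:
            f.write(f"{key} {path}\n")
    return len(entries)


# Hypothetical usage, one scp file per subset and signal type:
# write_scp("./data/tr/mix", "tr_mix.scp")
# write_scp("./data/tr/s1", "tr_s1.scp")
# write_scp("./data/tr/s2", "tr_s2.scp")
```

Sorting by file name is a deliberate choice in this sketch: if a mixture and its source signals share a file name, the separate lists line up entry by entry.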

Training the model

  • To adjust the network parameters or the paths to the training files, edit option/train/train.yml.
  • Training command
    python train.py ./option/train/train.yml
    

Inference with the model

  • Inference command (use this to separate a large number of audio files)

    python Separation.py -mix_scp 1.scp -yaml ./config/train/train.yml -model best.pt -gpuid [0,1,2,3,4,5,6,7] -save_path ./checkpoint
    
  • Inference command (use this to separate a single audio file)

    python Separation_wav.py -mix_wav 1.wav -yaml ./config/train/train.yml -model best.pt -gpuid [0,1,2,3,4,5,6,7] -save_path ./checkpoint
    

Results

  • Training is still in progress; the results will be posted once it finishes.
  • The following table lists the experimental results for different hyperparameter settings reported in the paper.
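
| N | L | B | H | Sc | P | X | R | Normalization | Causal | Receptive field (s) | Model size | SI-SNRi (dB) | SDRi (dB) |
|---|---|---|---|----|---|---|---|---------------|--------|---------------------|------------|--------------|-----------|
| 128 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.5M | 13.0 | 13.3 |
| 256 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.5M | 13.1 | 13.4 |
| 512 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.7M | 13.3 | 13.6 |
| 512 | 40 | 128 | 256 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 2.4M | 13.0 | 13.3 |
| 512 | 40 | 128 | 512 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 3.1M | 13.3 | 13.6 |
| 512 | 40 | 128 | 512 | 512 | 3 | 7 | 2 | gLN | x | 1.28 | 6.2M | 13.5 | 13.8 |
| 512 | 40 | 256 | 256 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 3.2M | 13.0 | 13.3 |
| 512 | 40 | 256 | 512 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 6.0M | 13.4 | 13.7 |
| 512 | 40 | 256 | 512 | 512 | 3 | 7 | 2 | gLN | x | 1.28 | 8.1M | 13.2 | 13.5 |
| 512 | 40 | 128 | 512 | 128 | 3 | 6 | 4 | gLN | x | 1.27 | 5.1M | 14.1 | 14.4 |
| 512 | 40 | 128 | 512 | 128 | 3 | 4 | 6 | gLN | x | 0.46 | 5.1M | 13.9 | 14.2 |
| 512 | 40 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 3.83 | 5.1M | 14.5 | 14.8 |
| 512 | 32 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 3.06 | 5.1M | 14.7 | 15.0 |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 1.53 | 5.1M | 15.3 | 15.6 |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | cLN | √ | 1.53 | 5.1M | 10.6 | 11.0 |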
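
Two columns of the table above can be illustrated with a short sketch. The first function estimates the receptive field from L, P, X and R, assuming 8 kHz audio, a 50% overlap between encoder frames, and dilations of 1, 2, ..., 2^(X-1) within each of the R repeats. The second computes SI-SNR in its usual scale-invariant form, from which SI-SNRi is the gain of the separated signal over the unprocessed mixture. The function names are hypothetical and this is not the repository's own evaluation code.

```python
import math


def receptive_field_seconds(L, P, X, R, sample_rate=8000):
    """Approximate receptive field in seconds (assumes hop = L / 2 and 8 kHz audio)."""
    # A conv block with kernel size P and dilation d adds (P - 1) * d frames;
    # the dilations 1, 2, ..., 2**(X - 1) sum to 2**X - 1, repeated R times.
    frames = 1 + R * (P - 1) * (2 ** X - 1)
    hop = L // 2
    samples = (frames - 1) * hop + L
    return samples / sample_rate


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]   # remove DC offset
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the separated signal minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Under these assumptions, `receptive_field_seconds(40, 3, 8, 3)` gives 3.83, and the other rows agree with the receptive field column to within rounding.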

Pre-Trained Model

New: Pre-trained models are available on Hugging Face and Google Drive.

Our Results Image

Reference