TAU1
April 4, 2018
Naoya Takahashi¹, Stefan Uhlich², Franck Giron², Michael Enenkl², Thomas Kemp², Nabarun Goswami³, Yuki Mitsufuji¹
¹Sony Corporation, Audio Technology Development Department, Tokyo, Japan
²Sony European Technology Center (EuTEC), Stuttgart, Germany
³Sony India Software Center, Bangalore, India
Naoya.Takahashi [at] sony.com
Additional Info
- is_blind: no
- additional_training_data: yes
Supplemental Material
- Code: not available
- Demos: not available
Method
This submission blends two systems as described in [1]. The first system is TAK3 (MMDenseNets [2]) and the second is UHL3 (LSTM). We linearly blend the raw outputs of the two systems using

$$\hat{s}_{\text{TAU1}} = \lambda\,\hat{s}_{\text{TAK3}} + (1 - \lambda)\,\hat{s}_{\text{UHL3}},$$

where $\lambda \in [0, 1]$. From these blended estimates, we then compute the power spectral densities and spatial covariance matrices for the multichannel Wiener filter (MWF) [3].
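The two-stage procedure above (linear blending followed by an MWF refinement) can be sketched as follows. This is a minimal NumPy illustration, not the submission's actual code: the blending weight `lam` is a hypothetical placeholder (the submission does not state its value), and the PSD/spatial-covariance estimators are one common choice in the style of [3].

```python
import numpy as np

def blend(s_tak3, s_uhl3, lam=0.5):
    """Linear blend of the raw outputs of the two systems.
    lam is a hypothetical blending weight in [0, 1]."""
    return lam * s_tak3 + (1.0 - lam) * s_uhl3

def multichannel_wiener_filter(x, estimates, eps=1e-10):
    """Refine blended source estimates with a multichannel Wiener filter.

    x         : mixture STFT, complex array of shape (F, T, C)
    estimates : list of J blended source STFTs, each of shape (F, T, C)
    Returns a list of J refined source STFTs, each of shape (F, T, C).
    """
    F, T, C = x.shape
    # Power spectral densities: channel-averaged power per (f, t) bin.
    v = [np.mean(np.abs(s) ** 2, axis=2) for s in estimates]          # (F, T)
    # Spatial covariance matrices: time-averaged, PSD-normalised.
    R = []
    for s, vj in zip(estimates, v):
        cov = np.einsum('ftc,ftd->fcd', s, s.conj())                  # (F, C, C)
        R.append(cov / (np.sum(vj, axis=1)[:, None, None] + eps))
    # Mixture covariance per (f, t): sum_j v_j(f, t) * R_j(f).
    Cx = sum(vj[..., None, None] * Rj[:, None] for vj, Rj in zip(v, R))
    Cx = Cx + eps * np.eye(C)                                         # regularise
    Cx_inv = np.linalg.inv(Cx)                                        # (F, T, C, C)
    # Wiener gain W_j = v_j R_j Cx^{-1}, applied to the mixture.
    out = []
    for vj, Rj in zip(v, R):
        W = vj[..., None, None] * np.einsum('fab,ftbc->ftac', Rj, Cx_inv)
        out.append(np.einsum('ftab,ftb->fta', W, x))
    return out
```

By construction the refined estimates sum (up to the regularisation term) back to the mixture, which is a useful sanity check when experimenting with different blending weights.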
References
- S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi and Y. Mitsufuji: Improving music source separation based on deep neural networks through data augmentation and network blending, Proc. ICASSP, 2017
- N. Takahashi and Y. Mitsufuji: Multi-scale multi-band DenseNets for audio source separation, Proc. WASPAA, 2017
- A. A. Nugraha, A. Liutkus and E. Vincent: Multichannel music separation with deep neural networks, Proc. EUSIPCO, 2016