MWF
April 4, 2018
Antoine Liutkus, Fabian-Robert Stöter; Inria and LIRMM, University of Montpellier, France; antoine.liutkus@inria.fr
Additional Info
- is_blind: no
- additional_training_data: no
Supplementary Material
- Code: https://github.com/sigsep/sigsep-mus-oracle
- Demos: Not available
Method
Introduction
The Multichannel Wiener Filter (MWF) exploiting the Local Gaussian Model (LGM) was originally proposed in the following paper:
Duong, Ngoc QK, Emmanuel Vincent, and Rémi Gribonval. "Under-determined reverberant audio source separation using a full-rank spatial covariance model." IEEE Transactions on Audio, Speech, and Language Processing 18.7 (2010): 1830-1840.
Its core idea is to extend the single-channel Wiener filter by exploiting the interchannel correlations of the sources.
Notations
We write $x$ for the 3-dimensional complex array obtained by stacking the Short-Time Fourier Transforms (STFT) of the left and right channels of the mixture. Its dimensions are $F\times T\times 2$, where $F$ and $T$ stand for the number of frequency bands and time frames, respectively. Its value at Time-Frequency (TF) bin $(f,t)$ is written $x(f,t)\in\mathbb{C}^2$. The mixture is taken as the sum of the source images: $x(f,t)=\sum_j y_j(f,t)$, where the images $y_j$ correspond to the isolated instruments and are also stereo.
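As an illustration of this notation, the following minimal sketch builds a mixture array and the corresponding source image arrays with NumPy. The `stft` helper, its window and hop sizes, and the random signals standing in for audio are all assumptions made for the example; they are not taken from the submission's code.

```python
import numpy as np


def stft(signal, n_fft=1024, hop=256):
    """Naive STFT of a (n_samples, n_channels) signal, returned as (F, T, n_channels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (signal.shape[0] - n_fft) // hop
    # Cut the signal into overlapping windowed frames: (T, n_fft, n_channels)
    frames = np.stack(
        [signal[t * hop:t * hop + n_fft] * window[:, None] for t in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=1)      # (T, F, n_channels)
    return np.transpose(spec, (1, 0, 2))    # (F, T, n_channels)


rng = np.random.default_rng(0)
# Two hypothetical stereo source images (random noise stands in for audio).
images = [rng.standard_normal((8192, 2)) for _ in range(2)]
mixture = sum(images)

x = stft(mixture)                          # mixture, shape (F, T, 2)
y = np.stack([stft(s) for s in images])    # source images, shape (J, F, T, 2)
```

Because the STFT is linear, the mixture array equals the sum of the source image arrays at every TF bin, which is the additive model used in the rest of this description.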
The local Gaussian model
The local Gaussian model assumes that each source image $y_j(f,t)$ is a circularly-symmetric Gaussian random vector, as described in:
Gallager, Robert G. "Circularly-symmetric Gaussian random vectors." Technical report, MIT (2008).
This is written: $y_j(f,t)\sim\mathcal{N}_c\left(0, v_j(f,t)R_j(f)\right)$, where:
- $v_j(f,t)\geq 0$ is the Power Spectral Density (PSD) of source $j$ at TF bin $(f,t)$. It can be understood as the energy of the source at that TF bin.
- $R_j(f)$ is the Spatial Covariance Matrix (SCM) of source $j$ at frequency $f$. It is a $2\times 2$ matrix that encodes the correlations between the left and right channels for this source at that frequency. The SCM can be understood as encoding how much information one channel gives about the other through correlations. Note that in the LGM, the SCM is assumed to be constant over time, which means we expect each source to keep a consistent spatial configuration throughout the song.
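To make the model concrete, here is a small sketch that draws samples from a circularly-symmetric complex Gaussian whose covariance is a PSD value times an SCM, then checks the empirical statistics. The function name `sample_lgm` and the numerical values of the PSD and SCM are hypothetical, chosen only for illustration.

```python
import numpy as np


def sample_lgm(v, R, n, rng):
    """Draw n circularly-symmetric complex Gaussian vectors with covariance v * R."""
    L = np.linalg.cholesky(v * R)   # requires v * R to be positive definite
    dim = R.shape[0]
    # Standard circular complex Gaussian: real and imaginary parts have variance 1/2 each
    z = (rng.standard_normal((n, dim)) + 1j * rng.standard_normal((n, dim))) / np.sqrt(2)
    return z @ L.T                  # each row is L z, so its covariance is L L^H = v * R


rng = np.random.default_rng(0)
v = 2.0                                          # hypothetical PSD at one TF bin
R = np.array([[1.0, 0.6 + 0.3j],
              [0.6 - 0.3j, 1.0]])                # hypothetical Hermitian SCM
samples = sample_lgm(v, R, 200_000, rng)

n = samples.shape[0]
empirical_cov = samples.T @ samples.conj() / n   # estimates E[y y^H]
pseudo_cov = samples.T @ samples / n             # estimates E[y y^T]
```

The empirical covariance should approach `v * R`, while the pseudo-covariance should approach zero, which is what circular symmetry means for a zero-mean complex Gaussian vector.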
The LGM can be shown to generalize several previously proposed models, such as the linear instantaneous and convolutive mixing models, which assume a deterministic relationship between the left and right channels. Its strength is to relax such assumptions by introducing some stochasticity: the channels are only assumed to be correlated, and are neither necessarily independent nor deterministically related.
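One way to see this generalization is through the rank of the SCM. If a source is a single signal panned with fixed gains, its image covariance is proportional to the outer product of the gain vector, which has rank 1 and makes the two channels perfectly correlated. A full-rank SCM allows any intermediate degree of correlation. The sketch below illustrates this; the gain vector, the matrices, and the `interchannel_coherence` helper are hypothetical.

```python
import numpy as np


def interchannel_coherence(R):
    """Magnitude of the normalized left/right correlation encoded by a 2x2 SCM."""
    return np.abs(R[0, 1]) / np.sqrt(R[0, 0].real * R[1, 1].real)


# Linear instantaneous model: y = a * s, so the SCM is a a^H (rank 1).
a = np.array([0.8, 0.6])                       # hypothetical panning gains (left, right)
R_instantaneous = np.outer(a, a.conj())

# Full-rank SCM: channels are only partially correlated.
R_full = np.array([[1.0, 0.3],
                   [0.3, 1.0]])

# Identity SCM: channels are uncorrelated.
R_uncorrelated = np.eye(2)

rank_instantaneous = np.linalg.matrix_rank(R_instantaneous)
rank_full = np.linalg.matrix_rank(R_full)
```

Here the instantaneous model gives a coherence of 1 (up to rounding), the identity SCM gives 0, and the full-rank example sits in between.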
Separation
One advantage of the LGM is that it allows for straightforward separation if the true parameters $v_j$ and $R_j$ are known. In short, each source is estimated as:
$$\hat{y}_j(f,t)=v_j(f,t)R_j(f)\left[\sum_{j'}v_{j'}(f,t)R_{j'}(f)\right]^{-1}x(f,t),$$
where $\cdot^{-1}$ denotes pseudo-inversion. This operation is called Multichannel Wiener Filtering, and is the one implemented here.
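The filtering step can be sketched in a few lines of NumPy. This is a minimal illustration under assumed array layouts (mixture of shape `(F, T, I)`, PSDs of shape `(J, F, T)`, SCMs of shape `(J, F, I, I)`), not the code of the linked repository; the function name and the random test data are hypothetical.

```python
import numpy as np


def multichannel_wiener_filter(x, v, R):
    """
    x: (F, T, I) complex mixture STFT
    v: (J, F, T) nonnegative PSDs
    R: (J, F, I, I) spatial covariance matrices
    Returns the estimated source images, shape (J, F, T, I).
    """
    # Covariance of each source at each TF bin, v_j(f,t) R_j(f): (J, F, T, I, I)
    C = v[..., None, None] * R[:, :, None, :, :]
    # Pseudo-inverse of the mixture covariance (sum over sources): (F, T, I, I)
    C_mix_inv = np.linalg.pinv(C.sum(axis=0))
    # Wiener gain of each source: (J, F, T, I, I)
    W = C @ C_mix_inv
    # Apply each gain matrix to the mixture vector at every TF bin
    return np.einsum('jftab,ftb->jfta', W, x)


rng = np.random.default_rng(0)
J, F, T, I = 3, 5, 7, 2
x = rng.standard_normal((F, T, I)) + 1j * rng.standard_normal((F, T, I))
v = rng.uniform(0.1, 1.0, size=(J, F, T))
A = rng.standard_normal((J, F, I, I)) + 1j * rng.standard_normal((J, F, I, I))
R = A @ np.conj(np.swapaxes(A, -1, -2)) + 1e-3 * np.eye(I)   # Hermitian, positive definite

y_hat = multichannel_wiener_filter(x, v, R)
```

When the mixture covariance is invertible, the gains of all sources sum to the identity, so the estimates add up exactly to the mixture. This conservativity is a convenient sanity check for any implementation.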
Parameter estimation
This submission is an oracle, meaning that it uses the true sources to compute the optimal parameters $v_j$ and $R_j$. It is therefore intended as an upper bound on the performance attainable by methods based on multichannel filtering.
Given the true sources $y_j$, the parameters are estimated through the method discussed in the aforementioned paper by Duong et al., replacing the estimated sources by their true values.
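A possible sketch of such an oracle estimation is given below. It follows the general form of the updates in the referenced paper (time-averaged normalized covariance for the SCM, normalized trace for the PSD), with the true source images in place of estimated ones. The function name, the array layout, the `eps` regularization, and the choice of a single pass rather than several iterations are assumptions, and the linked code may differ.

```python
import numpy as np


def oracle_parameters(y, eps=1e-10):
    """
    y: (J, F, T, I) true source image STFTs
    Returns PSDs v of shape (J, F, T) and SCMs R of shape (J, F, I, I).
    """
    n_channels = y.shape[-1]
    # Outer product y y^H at each TF bin: (J, F, T, I, I)
    C = y[..., :, None] * np.conj(y[..., None, :])
    # Initial PSD guess: power averaged over channels
    v = np.mean(np.abs(y) ** 2, axis=-1)
    # SCM: time average of the covariance normalized by the PSD
    R = np.mean(C / (eps + v[..., None, None]), axis=2)
    R = R + eps * np.eye(n_channels)        # small regularization (assumption)
    # Refined PSD: v = trace(R^{-1} y y^H) / I
    R_inv = np.linalg.pinv(R)
    v = eps + np.real(np.einsum('jfab,jftba->jft', R_inv, C)) / n_channels
    return v, R


rng = np.random.default_rng(0)
# Hypothetical true source images: J=2 sources, F=4 bands, T=50 frames, stereo.
y_true = rng.standard_normal((2, 4, 50, 2)) + 1j * rng.standard_normal((2, 4, 50, 2))
v_est, R_est = oracle_parameters(y_true)
```

Note that the PSD and the SCM are only defined up to a common scale, since multiplying one and dividing the other by the same constant leaves their product unchanged. In this sketch the normalization by the channel-averaged power fixes the trace of each SCM close to the number of channels.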
References
- A. Liutkus, F.-R. Stöter, and N. Ito, The 2018 Signal Separation Evaluation Campaign, Proceedings of LVA/ICA, 2018
@inproceedings{sisec2018,
  title={The 2018 signal separation evaluation campaign},
  author={A. Liutkus and F.-R. St{\"o}ter and N. Ito},
  booktitle={International Conference on Latent Variable Analysis and Signal Separation},
  year={2018}
}