Semi-supervised Multichannel Speech Enhancement With Variational Autoencoders And Non-negative Matrix Factorization
2018 · Simon Leglaive, Laurent Girin, Radu Horaud
Abstract
In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned within the variational autoencoder (VAE) framework. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and experimentally show that the proposed approach outperforms its NMF-based counterpart, in which speech is modeled using supervised NMF.
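The abstract combines three ingredients: a multichannel local Gaussian model for the mixture, a VAE decoder that provides the speech variance, and NMF for the noise variance. The sketch below assumes NumPy and the common local Gaussian formulation x_fn = s_fn + b_fn, with s_fn ~ N_c(0, v_s,fn R_s,f) and b_fn ~ N_c(0, v_b,fn R_b,f). It shows how these pieces could combine into a multichannel Wiener filter for one given latent sample. The decoder (decoder_variance), the per-frame gain g, the spatial covariances R_s and R_b, and the mixture x are hypothetical random placeholders, not trained models or estimates from the paper, and the paper's exact parametrization may differ. In a Monte Carlo EM scheme these quantities would be estimated from the data and the latent variables sampled from their posterior, so a Wiener-type estimate would be averaged over such samples rather than computed from one fixed draw.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, I, K, L = 4, 5, 2, 3, 2  # frequency bins, frames, microphones, NMF rank, latent dim

# Noise: unsupervised NMF variance v_b[f, n] = (W H)[f, n], with W, H >= 0.
W = rng.random((F, K))
H = rng.random((K, N))
v_b = W @ H

# Speech: HYPOTHETICAL stand-in for a trained VAE decoder. A real decoder is a
# neural network learned on clean speech; here a random linear map followed by
# exp() only produces positive per-frequency variances from a latent vector.
A = rng.standard_normal((F, L))
c = rng.standard_normal(F)

def decoder_variance(z_n):
    """Map one latent vector z_n (shape (L,)) to speech variances (shape (F,))."""
    return np.exp(A @ z_n + c)

z = rng.standard_normal((N, L))  # one latent vector per frame (placeholder for a sample)
g = np.ones(N)                   # per-frame gain (assumed parametrization)
v_s = np.stack([g[n] * decoder_variance(z[n]) for n in range(N)], axis=1)  # (F, N)

def random_spatial_covariance(I, rng):
    """Random Hermitian positive-definite I x I matrix (placeholder for an estimate)."""
    M = rng.standard_normal((I, I)) + 1j * rng.standard_normal((I, I))
    return M @ M.conj().T + np.eye(I)

R_s = np.stack([random_spatial_covariance(I, rng) for _ in range(F)])  # (F, I, I)
R_b = np.stack([random_spatial_covariance(I, rng) for _ in range(F)])  # (F, I, I)

# Placeholder multichannel STFT of the noisy mixture, shape (F, N, I).
x = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))

def wiener_speech_estimate(x, v_s, v_b, R_s, R_b):
    """Multichannel Wiener filter: s_hat = Sigma_s (Sigma_s + Sigma_b)^{-1} x in each bin."""
    s_hat = np.empty_like(x)
    for f in range(x.shape[0]):
        for n in range(x.shape[1]):
            sigma_s = v_s[f, n] * R_s[f]                 # speech image covariance
            sigma_x = sigma_s + v_b[f, n] * R_b[f]       # mixture covariance
            s_hat[f, n] = sigma_s @ np.linalg.solve(sigma_x, x[f, n])
    return s_hat

s_hat = wiener_speech_estimate(x, v_s, v_b, R_s, R_b)
print(s_hat.shape)  # (4, 5, 2)
```

Because the speech and noise filters share the same mixture covariance, calling the same function with the speech and noise roles swapped gives a noise estimate, and the two estimates add up exactly to the mixture.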
Related papers
- Statistical Speech Enhancement Based On Probabilistic Integration Of Variational Autoencoder And Non-negative Matrix Factorization (2017)
- Supervised And Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization (2017)
- Generalized Multichannel Variational Autoencoder For Underdetermined Source Separation (2018)
- Robust Unsupervised Audio-visual Speech Enhancement Using A Mixture Of Variational Autoencoders (2019)
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019)
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019)
- A Speech Enhancement Algorithm Based On Non-negative Hidden Markov Model And Kullback-Leibler Divergence (2020)
- Deep Variational Generative Models For Audio-visual Speech Separation (2020)