A Weighted-variance Variational Autoencoder Model For Speech Enhancement
2022 · Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, et al.
Abstract
We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted-variance generative model, where the contribution of each spectrogram time-frame to parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted-variance model.
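The link between a Gamma-weighted variance and a Student's t-distribution mentioned in the abstract can be checked numerically. The sketch below is illustrative only and is not the paper's code. It assumes one particular parameterization that the abstract does not spell out: a weight w drawn from a Gamma distribution with shape `alpha` and rate `beta`, and a single complex coefficient s drawn from a circular complex Gaussian with variance `sigma2 / w`. It also uses one weight per coefficient rather than one per time-frame, to keep the example scalar. All function and parameter names are hypothetical. Under these assumptions the marginal tail probability is P(|s|^2 > r) = (1 + r / (beta * sigma2))^(-alpha), which is heavier than the Gaussian tail exp(-r / sigma2).

```python
import math
import random


def sample_weighted_variance(sigma2, alpha, beta, rng):
    """Draw one complex coefficient from the assumed weighted-variance model.

    Assumption (not from the paper): w ~ Gamma(shape=alpha, rate=beta)
    and s | w ~ CN(0, sigma2 / w).
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    w = rng.gammavariate(alpha, 1.0 / beta)
    # Circular complex Gaussian: real and imaginary parts are independent,
    # each with variance sigma2 / (2 * w).
    std = math.sqrt(sigma2 / (2.0 * w))
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def student_t_tail(r, sigma2, alpha, beta):
    """Closed-form P(|s|^2 > r) after integrating out the Gamma weight."""
    return (1.0 + r / (beta * sigma2)) ** (-alpha)


def gaussian_tail(r, sigma2):
    """P(|s|^2 > r) for the unweighted model s ~ CN(0, sigma2)."""
    return math.exp(-r / sigma2)


def empirical_tail(r, sigma2, alpha, beta, n=200_000, seed=0):
    """Monte Carlo estimate of P(|s|^2 > r) under the weighted model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s = sample_weighted_variance(sigma2, alpha, beta, rng)
        if abs(s) ** 2 > r:
            hits += 1
    return hits / n


if __name__ == "__main__":
    sigma2, alpha, beta, r = 1.0, 2.0, 2.0, 4.0
    print("empirical  :", empirical_tail(r, sigma2, alpha, beta))
    print("Student's t:", student_t_tail(r, sigma2, alpha, beta))
    print("Gaussian   :", gaussian_tail(r, sigma2))
```

With `alpha == beta` the weight has mean one, so the comparison with the unweighted Gaussian is on the same nominal variance scale; the heavier tail is what makes the weighted model more tolerant of outlying frames.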
Related papers
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019) 9.23
- Statistical Speech Enhancement Based On Probabilistic Integration Of Variational Autoencoder And Non-negative Matrix Factorization (2017) 15.00
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019) 13.65
- A Recurrent Variational Autoencoder For Speech Enhancement (2019) 13.97
- Guided Variational Autoencoder For Speech Enhancement With A Supervised Classifier (2021) 8.60
- Unsupervised Speech Enhancement Using Dynamical Variational Auto-encoders (2021) 13.28
- Complex Recurrent Variational Autoencoder With Application To Speech Enhancement (2022) 0.00
- Audio-visual Speech Enhancement With A Deep Kalman Filter Generative Model (2022) 6.34