Investigation Of Speech And Noise Latent Representations In Single-channel VAE-based Speech Enhancement
2025 · Jiatong Li, Simon Doclo
Abstract
Recently, a variational autoencoder (VAE)-based single-channel speech enhancement system using Bayesian permutation training has been proposed, which uses two pretrained VAEs to obtain latent representations for speech and noise. Based on these pretrained VAEs, a noisy VAE learns to generate speech and noise latent representations from noisy speech for speech enhancement. Modifying the pretrained VAE loss terms affects the pretrained speech and noise latent representations. In this paper, we investigate how these different representations affect speech enhancement performance. Experiments on the DNS3, WSJ0-QUT, and VoiceBank-DEMAND datasets show that a latent space where speech and noise representations are clearly separated significantly improves performance over standard VAEs, which produce overlapping speech and noise representations.
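The abstract does not state which loss terms are modified, so the following is only a minimal sketch under stated assumptions. It shows a generic VAE objective (a reconstruction error plus a KL term toward a standard normal prior) with a hypothetical weight `beta` on the KL term as one example of how a loss term could be changed. It also uses the KL divergence between two diagonal Gaussians as an illustrative measure of how far apart a speech latent posterior and a noise latent posterior lie. The function names, the `beta` parameter, and the toy numbers are assumptions for illustration, not the authors' implementation.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians given as per-dimension lists."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


def vae_loss(recon_error, mu, var, beta=1.0):
    """Reconstruction error plus a beta-weighted KL term to a standard normal prior.

    `beta` is a hypothetical knob: changing it is one way a loss term can be
    modified, which in turn changes where the latent posteriors end up.
    """
    zeros = [0.0] * len(mu)
    ones = [1.0] * len(mu)
    return recon_error + beta * kl_diag_gaussians(mu, var, zeros, ones)


# Toy 2-D latent posteriors (made-up values, not from the paper).
speech_mu, speech_var = [2.0, 0.0], [1.0, 1.0]
noise_mu, noise_var = [-2.0, 0.0], [1.0, 1.0]

# Identical posteriors overlap completely; shifted means increase the divergence.
overlap_kl = kl_diag_gaussians(speech_mu, speech_var, speech_mu, speech_var)
separated_kl = kl_diag_gaussians(speech_mu, speech_var, noise_mu, noise_var)
```

In this toy setting, a larger divergence between the two posteriors corresponds to the "clearly separated" case described in the abstract, and a divergence near zero corresponds to overlapping representations.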
Related papers
- A Bayesian Permutation Training Deep Representation Learning Method For Speech Enhancement With Variational Autoencoder (2022)
- I-DCCRN-VAE: An Improved Deep Representation Learning Framework For Complex VAE-based Single-channel Speech Enhancement (2025)
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019)
- Statistical Speech Enhancement Based On Probabilistic Integration Of Variational Autoencoder And Non-negative Matrix Factorization (2017)
- Mixture Of Inference Networks For VAE-based Audio-visual Speech Enhancement (2019)
- Unsupervised Speech Enhancement Using Dynamical Variational Auto-encoders (2021)
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019)
- Learning Latent Representations For Speech Generation And Transformation (2017)