Towards Speech Enhancement Using A Variational U-Net Architecture
2020 · Eike J. Nustede, Jörn Anemüller
Abstract
We investigate the viability of a variational U-Net architecture for denoising of single-channel audio data. Deep network speech enhancement systems commonly aim to estimate filter masks, or opt to work on the waveform signal, potentially neglecting relationships across higher dimensional spectro-temporal features. We study the adoption of a probabilistic bottleneck into the classic U-Net architecture for direct spectral reconstruction. Evaluation of several ablation network variants is carried out using signal-to-distortion ratio and perceptual measures, on audio data that includes known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are a prerequisite for successful spectral reconstruction, i.e., without filter mask estimation. Results show, on average, an advantage of the proposed variational U-Net architecture over its classic, non-variational version in signal enhancement performance under reverber
Related papers
- Single-channel Speech Enhancement With Deep Complex U-Networks And Probabilistic Latent Space Models (2023)
- A Comparative Evaluation Of Deep Learning Models For Speech Enhancement In Real-world Noisy Environments (2025)
- Improved Speech Enhancement With The Wave-U-Net (2018)
- Investigation Of Speech And Noise Latent Representations In Single-channel VAE-based Speech Enhancement (2025)
- Speech Denoising By Parametric Resynthesis (2019)
- A Wavenet For Speech Denoising (2017)
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019)
- Analysis Of DNN Speech Signal Enhancement For Robust Speaker Recognition (2018)