I-DCCRN-VAE: An Improved Deep Representation Learning Framework For Complex VAE-based Single-channel Speech Enhancement
2025 · Jiatong Li, Simon Doclo
Abstract
Recently, a complex variational autoencoder (VAE)-based single-channel speech enhancement system built on the DCCRN architecture has been proposed. In this system, a noise suppression VAE (NSVAE) learns to extract clean speech representations from noisy speech using pretrained clean speech and noise VAEs with skip connections. In this paper, we improve DCCRN-VAE with three key modifications: 1) removing the skip connections in the pretrained VAEs to encourage more informative speech and noise latent representations; 2) using \(\beta\)-VAE in pretraining to better balance reconstruction and latent space regularization; and 3) having the NSVAE generate both speech and noise latent representations. Experiments show that the proposed system achieves performance comparable to the DCCRN and DCCRN-VAE baselines on the matched DNS3 dataset but outperforms the baselines on mismatched datasets (WSJ0-QUT, Voicebank-DEMAND), demonstrating improved generalization ability. In addition, an ablat
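The \(\beta\)-VAE weighting mentioned in modification 2 can be illustrated with a minimal sketch. The abstract does not give the paper's actual loss, so everything here is an assumption made for illustration: a real-valued diagonal Gaussian posterior, a standard normal prior, a squared-error reconstruction term, and the hypothetical function names `kl_to_standard_normal` and `beta_vae_loss`. The complex-valued formulation used by DCCRN-VAE may differ.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a real-valued diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL regularizer.

    beta = 1 gives the standard VAE objective under these assumptions;
    other values shift the balance between reconstruction quality and
    latent space regularization.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, logvar)


# Toy usage with made-up numbers (not taken from the paper).
loss = beta_vae_loss(x=[1.0], x_hat=[0.0], mu=[1.0], logvar=[0.0], beta=2.0)
```

In this sketch, a larger `beta` penalizes posteriors that stray from the prior more heavily, while a smaller `beta` favors reconstruction accuracy.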
Related papers
- A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder (2023)
- Investigation Of Speech And Noise Latent Representations In Single-channel VAE-based Speech Enhancement (2025)
- DCCRN+: Channel-wise Subband DCCRN With SNR Estimation For Speech Enhancement (2021)
- Unsupervised Speech Enhancement Using Dynamical Variational Auto-encoders (2021)
- A Bayesian Permutation Training Deep Representation Learning Method For Speech Enhancement With Variational Autoencoder (2022)
- Complex Recurrent Variational Autoencoder With Application To Speech Enhancement (2022)
- Multi-channel End-to-end Neural Network For Speech Enhancement, Source Localization, And Voice Activity Detection (2022)
- Conditional Deep Hierarchical Variational Autoencoder For Voice Conversion (2021)