Abstract

Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement (SE) applications. Specifically, our initial SE algorithm employed a gated recurrent unit variational autoencoder (VAE) with a Gaussian distribution to enhance the performance of certain existing SE systems. Building upon our preliminary framework, this paper introduces a novel approach for SE using deep complex convolutional recurrent networks with a VAE (DCCRN-VAE). DCCRN-VAE assumes that the latent variables of signals follow complex Gaussian distributions that are modeled by DCCRN, as these distributions can better capture the behaviors of complex signals. Additionally, we propose the application of a residual loss in DCCRN-VAE to further improve the quality of the enhanced speech. Compared to our preliminary work, DCCRN-VAE introduce
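
The complex Gaussian latent assumption mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single circularly symmetric complex latent with a scalar standard deviation, and the function names `sample_complex_gaussian` and `kl_to_standard` are hypothetical. It shows the usual VAE reparameterized draw and the closed-form KL divergence to a standard complex Gaussian prior under those assumptions.

```python
import math
import random


def sample_complex_gaussian(mu: complex, sigma: float, rng: random.Random) -> complex:
    """Draw z = mu + sigma * eps with eps ~ CN(0, 1) (reparameterized form)."""
    # Assumption: circular symmetry, so the real and imaginary noise parts are
    # independent and each has variance 1/2, which gives E[|eps|^2] = 1.
    half_std = math.sqrt(0.5)
    eps = complex(rng.gauss(0.0, half_std), rng.gauss(0.0, half_std))
    return mu + sigma * eps


def kl_to_standard(mu: complex, sigma: float) -> float:
    """KL(CN(mu, sigma^2) || CN(0, 1)) for one circular complex latent."""
    var = sigma ** 2
    # Unlike the real-valued case, there is no factor of 1/2 here.
    return abs(mu) ** 2 + var - 1.0 - math.log(var)
```

In a VAE of this kind, an encoder would predict `mu` and `sigma` per latent dimension, and the KL term would be summed over dimensions and added to the reconstruction loss. A non-circular complex Gaussian would additionally need a pseudo-variance term, which this sketch omits.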

Authors

(none)

Tags

  • Speech Enhancement
  • Text-to-Speech

Stats

  • citations: 8
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 7.16
  • arxiv key: xiang2023a

Related papers