Time-graph Frequency Representation With Singular Value Decomposition For Neural Speech Enhancement
2024 · Tingting Wang, Tianrui Wang, Meng Ge, et al.
Abstract
Time-frequency (T-F) domain methods for monaural speech enhancement have benefited from the success of deep learning. Recent work has focused on two-stream network models that either predict the amplitude mask and the phase separately or couple amplitude and phase in Cartesian coordinates as real and imaginary pairs. However, most of these methods struggle to model the alignment between amplitude and phase (or between the real and imaginary parts) within a two-stream framework, which inevitably limits performance. In this paper, we introduce a graph Fourier transform defined with the singular value decomposition (GFT-SVD), which yields a real-valued time-graph representation for neural speech enhancement. Because the GFT-SVD representation is real-valued, amplitude and phase are modeled jointly, and the target speech phase does not need to be recovered separately. Our findings demonstrate the effects of real-valued time-graph representation based on GFT-SVD
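The idea of a real-valued transform built from an SVD can be illustrated with a minimal sketch. It is not the authors' implementation: the graph (`example_adjacency`), the frame length, the hop size, and all function names are assumptions made for illustration, and the abstract does not specify how the paper constructs its graph. The sketch takes the left singular vectors of a real adjacency matrix as an orthogonal basis and projects each signal frame onto it, which gives real coefficients and an exact inverse.

```python
import numpy as np


def example_adjacency(n: int, reach: int = 3) -> np.ndarray:
    """Hypothetical directed graph over the n samples of a frame.

    Sample i points to the next `reach` samples with weights 1/d, where d is
    the distance. This graph is an assumption, not the one used in the paper.
    """
    a = np.zeros((n, n))
    for i in range(n):
        for d in range(1, reach + 1):
            a[i, (i + d) % n] = 1.0 / d
    return a


def gft_svd_basis(adjacency: np.ndarray) -> np.ndarray:
    """Return the left singular vectors U of A = U S V^T as the transform basis."""
    u, _s, _vt = np.linalg.svd(adjacency)
    return u  # real and orthogonal because the adjacency matrix is real


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def time_graph_representation(x: np.ndarray, basis: np.ndarray, hop: int) -> np.ndarray:
    """Project every frame onto the basis: row t holds U^T applied to frame t."""
    frames = frame_signal(x, basis.shape[0], hop)
    return frames @ basis


def inverse_frames(coeffs: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Recover the frames from their coefficients using U U^T = I."""
    return coeffs @ basis.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)
    U = gft_svd_basis(example_adjacency(8))
    C = time_graph_representation(x, U, hop=4)
    print(C.shape, C.dtype)
```

Unlike a short-time Fourier transform, the coefficients here are a single real-valued array, so a network operating on them has no separate magnitude and phase streams to keep aligned.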
Related papers
- ForkNet: Simultaneous Time And Time-frequency Domain Modeling For Speech Enhancement (2023) 0.00
- PHASEN: A Phase-and-harmonics-aware Speech Enhancement Network (2019) 18.20
- Invertible DNN-based Nonlinear Time-frequency Transform For Speech Enhancement (2019) 7.16
- Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform (2021) 0.00
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020) 0.00
- FB-MSTCN: A Full-band Single-channel Speech Enhancement Method Based On Multi-scale Temporal Convolutional Network (2022) 6.77
- Spectral Masking With Explicit Time-context Windowing For Neural Network-based Monaural Speech Enhancement (2024) 3.58
- TSTNN: Two-stage Transformer Based Neural Network For Speech Enhancement In The Time Domain (2021) 16.73