Invertible DNN-based Nonlinear Time-frequency Transform For Speech Enhancement
2019 · Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, et al.
Abstract
We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods have considered end-to-end networks that directly estimate the enhanced signals without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to interpret. Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack properties that are important for ordinary signal processing. In this paper, we consider perfect reconstruction as the important property of the T-F transform. An invertible nonlinear T-F transform is constructed by DNNs and learned from data s
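The perfect-reconstruction property mentioned in the abstract can be illustrated with an additive coupling layer, a common building block of invertible networks. The following is a minimal sketch under stated assumptions, not the authors' architecture: the function names (`make_mlp`, `coupling_forward`, `coupling_inverse`), the random untrained weights, and the 8-sample frame are all hypothetical choices for illustration. The point it shows is that the inverse exists by construction, no matter how nonlinear the inner map `f` is.

```python
import math
import random


def make_mlp(in_dim, out_dim, hidden=8, seed=0):
    """Build a small random nonlinear map f: R^in_dim -> R^out_dim.

    Hypothetical stand-in for a trained DNN; the weights are random.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(out_dim)]

    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return f


def coupling_forward(frame, f):
    """Analysis step: keep the first half, shift the second half by f(first half)."""
    half = len(frame) // 2
    x1, x2 = frame[:half], frame[half:]
    y2 = [b + s for b, s in zip(x2, f(x1))]
    return x1 + y2


def coupling_inverse(coeffs, f):
    """Synthesis step: undo the shift; f itself never has to be inverted."""
    half = len(coeffs) // 2
    y1, y2 = coeffs[:half], coeffs[half:]
    x2 = [b - s for b, s in zip(y2, f(y1))]
    return y1 + x2


if __name__ == "__main__":
    frame = [0.1 * i for i in range(8)]   # toy signal frame (assumed even length)
    f = make_mlp(in_dim=4, out_dim=4)
    coeffs = coupling_forward(frame, f)
    rebuilt = coupling_inverse(coeffs, f)
    err = max(abs(a - b) for a, b in zip(frame, rebuilt))
    print("max reconstruction error:", err)
```

In an enhancement setting one would modify the coefficients (for example, with a mask) between the forward and inverse steps; with no modification, the frame is recovered up to floating-point rounding.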
Related papers
- TSTNN: Two-stage Transformer Based Neural Network For Speech Enhancement In The Time Domain (2021) · 16.73
- End-to-end Speech Enhancement Based On Discrete Cosine Transform (2019) · 8.09
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020) · 0.00
- Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform (2021) · 0.00
- On The Role Of Spatial, Spectral, And Temporal Processing For DNN-based Non-linear Multi-channel Speech Enhancement (2022) · 7.81
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks (2020) · 5.84
- Trainable Adaptive Window Switching For Speech Enhancement (2018) · 6.77
- Time-graph Frequency Representation With Singular Value Decomposition For Neural Speech Enhancement (2024) · 2.26