Trainable Adaptive Window Switching For Speech Enhancement
2018 · Yuma Koizumi, Noboru Harada, Yoichi Haneda
Abstract
This study proposes a trainable adaptive window switching (AWS) method and applies it to a deep neural network (DNN) for speech enhancement in the modified discrete cosine transform domain. Time-frequency (T-F) mask processing in the short-time Fourier transform (STFT) domain is a typical speech enhancement method. To recover the target signal precisely, DNN-based short-time frequency transforms have recently been investigated and used instead of the STFT. However, since such a fixed-resolution short-time frequency transform method has a T-F resolution problem based on the uncertainty principle, not only the short-time frequency transform but also the length of the windowing function should be optimized. To overcome this problem, we incorporate AWS into the speech enhancement procedure, and the windowing function of each time-frame is manipulated using a DNN depending on the input signal. We confirmed that the proposed method achieved a higher signal-to-distortion ratio than conventional
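As a rough illustration of the fixed-resolution baseline the abstract contrasts against, the sketch below applies a T-F mask in the modified discrete cosine transform (MDCT) domain using one fixed window length. It is not the authors' method: the sine window, the hop size of N samples, and the names `sine_window`, `mdct`, `imdct`, `enhance`, and `mask_fn` are all assumptions made for this example, and `mask_fn` is a placeholder for whatever mask estimator (for instance a DNN) would be used in practice.

```python
import math

def sine_window(N):
    # Length-2N sine window; it satisfies w[n]^2 + w[n+N]^2 = 1,
    # the condition needed for alias cancellation in overlap-add.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, N):
    # Forward MDCT: 2N windowed samples -> N coefficients.
    n0 = 0.5 + N / 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    # Inverse MDCT: N coefficients -> 2N time-aliased samples.
    # The 2/N scaling pairs with the analysis/synthesis sine windows.
    n0 = 0.5 + N / 2
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def enhance(signal, N, mask_fn):
    # Fixed-window MDCT analysis, per-frame masking, and overlap-add synthesis.
    # mask_fn (hypothetical) maps N coefficients to N mask gains.
    w = sine_window(N)
    tail = (-len(signal)) % N
    # Pad N zeros on each side so every input sample is covered by two frames.
    x = [0.0] * N + list(signal) + [0.0] * (tail + N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [w[n] * x[start + n] for n in range(2 * N)]
        coeffs = mdct(frame, N)
        gains = mask_fn(coeffs)
        y = imdct([g * c for g, c in zip(gains, coeffs)], N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out[N:N + len(signal)]
```

With an all-ones mask, for example `enhance(x, 8, lambda c: [1.0] * len(c))`, the overlap-add output should match the input up to floating-point error, which is a convenient sanity check. Because the window length `2 * N` is the same for every frame, the T-F resolution is fixed; the abstract's proposal is to let a DNN choose the window per frame instead.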
Related papers
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020)
- Invertible DNN-based Nonlinear Time-frequency Transform For Speech Enhancement (2019)
- Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform (2021)
- DeFT-AN: Dense Frequency-time Attentive Network For Multichannel Speech Enhancement (2022)
- End-to-end Speech Enhancement Based On Discrete Cosine Transform (2019)
- On The Importance Of Neural Wiener Filter For Resource Efficient Multichannel Speech Enhancement (2024)
- PHASEN: A Phase-and-harmonics-aware Speech Enhancement Network (2019)
- Spectral Masking With Explicit Time-context Windowing For Neural Network-based Monaural Speech Enhancement (2024)