Using Recurrences In Time And Frequency Within U-net Architecture For Speech Enhancement
2018 · Tomasz Grzywalski, Szymon Drgas
Abstract
When designing a fully convolutional neural network, there is a trade-off between receptive field size, number of parameters and spatial resolution of features in deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves these dilemmas. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined with noise segments extracted from the NOISEX-92 database and show a clear advantage of the proposed solution over the current state of the art in terms of SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio) and STOI (short-time objective intelligibility).
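The receptive-field side of this trade-off can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' network. A stack of 3-tap convolutions without downsampling only sees a fixed window that grows with depth. By contrast, a recurrent sweep along frequency followed by one along time lets the last time-frequency bin depend on the very first one, and it does so without pooling away resolution. The plain tanh RNN, the (freq, time, channels) layout, the weight scales and the names `conv_receptive_field`, `rnn_sweep` and `freq_then_time` are all hypothetical choices made for this example. The paper's recurrent layers may differ, for instance by using gated or bidirectional units.

```python
import numpy as np


def conv_receptive_field(num_layers: int, kernel: int = 3, stride: int = 1) -> int:
    """Receptive field, in input bins, of a stack of identical 1-D convolutions."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump  # each layer adds (kernel - 1) taps at the current input spacing
        jump *= stride             # striding spreads the next layer's taps further apart
    return rf


def rnn_sweep(x: np.ndarray, w_in: np.ndarray, w_rec: np.ndarray, axis: int) -> np.ndarray:
    """Run a plain tanh RNN forward along one axis of a (freq, time, channels) map.

    Each line along the other axis keeps its own hidden state, so axis=1
    recurs over time per frequency bin and axis=0 recurs over frequency
    per time frame. (Hypothetical layer, for illustration only.)
    """
    seq = np.moveaxis(x, axis, 0)                 # (steps, other_axis, channels)
    h = np.zeros((seq.shape[1], w_rec.shape[0]))  # one hidden state per line
    outputs = []
    for step in seq:                              # step: (other_axis, channels)
        h = np.tanh(step @ w_in + h @ w_rec)
        outputs.append(h)
    return np.moveaxis(np.stack(outputs), 0, axis)  # back to (freq, time, hidden)


def freq_then_time(x: np.ndarray, params: dict) -> np.ndarray:
    """Hypothetical block: a frequency sweep followed by a time sweep."""
    y = rnn_sweep(x, params["w_in_f"], params["w_rec_f"], axis=0)
    return rnn_sweep(y, params["w_in_t"], params["w_rec_t"], axis=1)


rng = np.random.default_rng(0)
F, T, C, H = 8, 16, 4, 6  # assumed toy sizes: freq bins, frames, input channels, hidden units
params = {
    "w_in_f": 0.5 * rng.standard_normal((C, H)),
    "w_rec_f": 0.3 * rng.standard_normal((H, H)),
    "w_in_t": 0.5 * rng.standard_normal((H, H)),
    "w_rec_t": 0.3 * rng.standard_normal((H, H)),
}
x = rng.standard_normal((F, T, C))
x_perturbed = x.copy()
x_perturbed[0, 0, :] += 1.0  # nudge only the lowest bin of the first frame

delta = np.abs(freq_then_time(x_perturbed, params) - freq_then_time(x, params))
print("conv stack, 4 layers of k=3:", conv_receptive_field(4), "bins")
print("change at last freq bin, last frame:", delta[-1, -1].max())
```

The four-layer convolution stack covers only 9 bins along one axis, whereas two recurrent sweeps already connect opposite corners of the whole 8 × 16 map while keeping its full resolution. Replacing the tanh cell with a GRU or LSTM, or running each sweep in both directions, would keep the same axis-wise structure.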
Related papers
- Dilated U-net Based Approach For Multichannel Speech Enhancement From First-order Ambisonics Recordings (2020)
- Towards Speech Enhancement Using A Variational U-net Architecture (2020)
- Improved Speech Enhancement With The Wave-u-net (2018)
- A Fully Recurrent Feature Extraction For Single Channel Speech Enhancement (2020)
- UL-UNAS: Ultra-lightweight U-nets For Real-time Speech Enhancement Via Network Architecture Search (2025)
- Relunet: Relative Channel Fusion U-net For Multichannel Speech Enhancement (2024)
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks (2020)
- Real-time Streaming Wave-u-net With Temporal Convolutions For Multichannel Speech Enhancement (2021)