End-to-end Waveform Utterance Enhancement For Direct Evaluation Metrics Optimization By Fully Convolutional Neural Networks
2017 · Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, et al.
Abstract
A speech enhancement model maps noisy speech to clean speech. In the training stage, an objective function is adopted to optimize the model parameters. However, in most studies, the model optimization criterion is inconsistent with the criterion used to evaluate the enhanced speech. For example, speech intelligibility is most often evaluated with the short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between estimated and clean speech is widely used to optimize the model. Because of this inconsistency, there is no guarantee that the trained model performs optimally in applications. In this study, we propose an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) to reduce the gap between the model optimization and evaluation criteria. Because of the utterance-based optimization, temporal correlation information o
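The mismatch the abstract describes can be illustrated with a small sketch. It is not the authors' method and does not compute STOI: `utterance_correlation` is a hypothetical toy proxy (plain Pearson correlation over the whole signal) standing in for a correlation-based intelligibility measure, and the signals, frame length, and function names are assumptions chosen for illustration. It shows that a frame-based mean square error and a correlation-style utterance score can rank two estimates in opposite order.

```python
import math

def frame_mse(clean, est, frame_len=4):
    """Average of per-frame mean squared errors over non-overlapping frames."""
    errors = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = est[start:start + frame_len]
        errors.append(sum((a - b) ** 2 for a, b in zip(c, e)) / frame_len)
    return sum(errors) / len(errors)

def utterance_correlation(clean, est):
    """Pearson correlation over the whole utterance (toy proxy, NOT STOI)."""
    n = len(clean)
    mean_c = sum(clean) / n
    mean_e = sum(est) / n
    cov = sum((a - mean_c) * (b - mean_e) for a, b in zip(clean, est))
    var_c = sum((a - mean_c) ** 2 for a in clean)
    var_e = sum((b - mean_e) ** 2 for b in est)
    return cov / math.sqrt(var_c * var_e)

# Hypothetical "clean" signal: four periods of a sine wave.
clean = [math.sin(2 * math.pi * k / 16) for k in range(64)]

# Estimate A: same shape at half amplitude (large sample error, shape preserved).
scaled = [0.5 * x for x in clean]

# Estimate B: small alternating perturbation (small sample error, shape altered).
distorted = [x + 0.1 * (-1) ** k for k, x in enumerate(clean)]

for name, est in (("scaled", scaled), ("distorted", distorted)):
    print(name, frame_mse(clean, est), utterance_correlation(clean, est))
```

Under these assumptions, the perturbed estimate has the lower frame-based error while the scaled estimate has the higher correlation, so a model trained only on the first criterion is not guaranteed to be best under the second.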
Related papers
- Raw Waveform-based Speech Enhancement By Fully Convolutional Networks (2017) · 16.63
- TFCN: Temporal-frequential Convolutional Network For Single-channel Speech Enhancement (2022) · 0.00
- Multichannel Speech Enhancement By Raw Waveform-mapping Using Fully Convolutional Networks (2019) · 12.25
- Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement (2021) · 8.60
- Multi-modal Hybrid Deep Neural Network For Speech Enhancement (2016) · 0.00
- Improved Speech Enhancement With The Wave-u-net (2018) · 0.00
- A Dual-staged Context Aggregation Method Towards Efficient End-to-end Speech Enhancement (2019) · 0.00
- Multi-cmgan+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023) · 0.00