Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement
2021 · Haoyu Li, Junichi Yamagishi
Abstract
The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep learning based system for modifying the speech signal to increase its intelligibility under the equal-power constraint, i.e., signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system can not only work in non-realtime mode for offline audio playback but also support practical real-time speech applications. Experimental results using both objective measure
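The equal-power constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `apply_gains_equal_power`, the toy spectrogram, and the fixed gain values are all hypothetical, and in the paper the amplification factors would come from the trained GAN generator rather than being set by hand. The sketch also assumes that energy summed over time-frequency bins is an adequate stand-in for signal power.

```python
import math


def total_energy(spec):
    """Sum of squared magnitudes over all time-frequency bins."""
    return sum(abs(x) ** 2 for frame in spec for x in frame)


def apply_gains_equal_power(spec, gains):
    """Multiply each bin by its gain, then rescale so total energy is unchanged.

    spec  : list of frames, each a list of complex (or real) bin values
    gains : same shape as spec, non-negative amplification factors
            (hypothetical here; a model would predict them)
    """
    if len(spec) != len(gains) or any(
        len(f) != len(g) for f, g in zip(spec, gains)
    ):
        raise ValueError("spec and gains must have the same shape")

    energy_in = total_energy(spec)
    boosted = [
        [g * x for x, g in zip(frame, gain_frame)]
        for frame, gain_frame in zip(spec, gains)
    ]
    energy_out = total_energy(boosted)
    if energy_out == 0.0:
        # Nothing to rescale (silent input or all-zero gains).
        return boosted

    # One global factor restores the original energy while keeping the
    # relative reallocation between bins introduced by the gains.
    scale = math.sqrt(energy_in / energy_out)
    return [[scale * x for x in frame] for frame in boosted]


# Toy 2-frame, 2-bin "spectrogram" and hand-picked gains (assumptions).
spec = [[1 + 0j, 2 + 0j], [0.5j, 1 - 1j]]
gains = [[2.0, 0.5], [1.0, 3.0]]
modified = apply_gains_equal_power(spec, gains)
```

The global rescaling step is what makes this a reallocation of energy rather than a plain amplification: bins with larger gains gain energy only at the expense of the others.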
Related papers
- iMetricGAN: Intelligibility Enhancement For Speech-in-noise Using Generative Adversarial Network-based Metric Learning (2020) · 9.41
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019) · 10.35
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) · 21.85
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018) · 9.23
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017) · 16.14
- Robust Speech Recognition Using Generative Adversarial Networks (2017) · 11.29
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) · 16.03
- MetricGAN: Generative Adversarial Networks Based Black-box Metric Scores Optimization For Speech Enhancement (2019) · 0.00