Investigating Cross-domain Losses For Speech Enhancement
2020 · Sherif Abdulatif, Karim Armanious, Jayasankar T. Sajeev, et al.
Abstract
Recent years have seen a surge in the number of available frameworks for speech enhancement (SE) and recognition. Whether model-based or constructed via deep learning, these frameworks often rely in isolation on either time-domain signals or time-frequency (TF) representations of speech data. In this study, we investigate the advantages of each set of approaches by separately examining their impact on speech intelligibility and quality. Furthermore, we combine the fragmented benefits of time-domain and TF speech representations by introducing two new cross-domain SE frameworks. A quantitative comparative analysis against recent model-based and deep learning SE approaches is performed to illustrate the merit of the proposed frameworks.
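The abstract does not state the loss formulations used by the proposed frameworks. Purely as an illustration of what "cross-domain" can mean in practice, the sketch below combines a time-domain waveform distance with a distance between TF magnitude spectrograms. Every name here (`stft_magnitude`, `cross_domain_loss`), the L1 distances, the weight `alpha`, and the frame and hop sizes are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed, framed real FFT.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def cross_domain_loss(enhanced, clean, alpha=0.5):
    """Weighted sum of a time-domain L1 term and a TF-magnitude L1 term.

    alpha is a hypothetical mixing weight; the L1 choice is also an assumption.
    """
    time_loss = np.mean(np.abs(enhanced - clean))
    tf_loss = np.mean(np.abs(stft_magnitude(enhanced) - stft_magnitude(clean)))
    return alpha * time_loss + (1.0 - alpha) * tf_loss


# Toy signals: a 440 Hz tone at 16 kHz and a noisy copy of it.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

loss_noisy = cross_domain_loss(noisy, clean)
loss_clean = cross_domain_loss(clean, clean)
```

In this toy setup the loss is zero when the estimate equals the clean signal and positive otherwise. Setting `alpha` to 1 or 0 recovers a purely time-domain or purely TF objective, which is the kind of isolated setting the abstract contrasts with the cross-domain case.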
Related papers
- Cross-domain Single-channel Speech Enhancement Model With Bi-projection Fusion Module For Noise-robust ASR (2021) · 8.09
- Time-domain Speech Enhancement Assisted By Multi-resolution Frequency Encoder And Decoder (2023) · 9.76
- Toward Universal Speech Enhancement For Diverse Input Conditions (2023) · 0.00
- Beyond Performance Plateaus: A Comprehensive Study On Scalability In Speech Enhancement (2024) · 7.81
- A Consolidated View Of Loss Functions For Supervised Deep Learning-based Speech Enhancement (2020) · 13.93
- Improved Speech Separation With Time-and-frequency Cross-domain Joint Embedding And Clustering (2019) · 10.74
- Closing The Gap Between Time-domain Multi-channel Speech Enhancement On Real And Simulation Conditions (2021) · 8.82
- A Modulation-domain Loss For Neural-network-based Real-time Speech Enhancement (2021) · 8.09