A Consolidated View Of Loss Functions For Supervised Deep Learning-based Speech Enhancement
2020 · Sebastian Braun, Ivan Tashev
Abstract
Deep learning-based speech enhancement for real-time applications has recently made large advancements. Due to the lack of a tractable perceptual optimization target, many myths around training losses have emerged, whereas in many cases the contribution of the loss function to success has not been investigated in isolation from other factors such as network architecture, features, or training procedures. In this work, we investigate a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing. We relate magnitude-only with phase-aware losses, ratios, correlation metrics, and compressed metrics. Our results reveal that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced. Furthermore, using compressed spectral values also yields a significant improvement. On the other hand, phase-sensitive improvement is best achieved by linear domain losses such as mean
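As a rough illustration of what combining a magnitude-only objective with a phase-aware one on compressed spectral values could look like, here is a minimal sketch in plain Python. The abstract does not state a formula, so the blend below is an assumption: the function names, the compression exponent `c`, the weight `lam`, and their default values are all hypothetical and chosen for illustration only.

```python
import cmath


def compress(z, c):
    """Power-law compress the magnitude of a complex bin, keeping its phase."""
    mag, phase = cmath.polar(z)
    return cmath.rect(mag ** c, phase)


def combined_loss(clean, estimate, c=0.3, lam=0.5):
    """Blend a magnitude-only and a phase-aware squared error.

    clean, estimate: equal-length sequences of complex spectral bins.
    c:   compression exponent (assumed value, not from the abstract).
    lam: weight of the phase-aware term (assumed value, not from the abstract).
    """
    if len(clean) != len(estimate) or len(clean) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mag_term = 0.0
    complex_term = 0.0
    for s, s_hat in zip(clean, estimate):
        # Magnitude-only term: ignores phase entirely.
        mag_term += (abs(s) ** c - abs(s_hat) ** c) ** 2
        # Phase-aware term: distance between compressed complex values.
        complex_term += abs(compress(s, c) - compress(s_hat, c)) ** 2

    n = len(clean)
    return ((1.0 - lam) * mag_term + lam * complex_term) / n
```

With `lam=0.0` the sketch reduces to a purely magnitude-based loss, so an estimate with correct magnitudes but wrong phase is not penalized; any `lam > 0` makes the phase error visible to the loss. Setting `c=1.0` gives the uncompressed (linear domain) variant.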
Related papers
- Effect Of Noise Suppression Losses On Speech Distortion And ASR Performance (2021) [10.74]
- Perceive And Predict: Self-supervised Speech Representation Based Loss Functions For Speech Enhancement (2023) [7.16]
- Weighted Speech Distortion Losses For Neural-network-based Real-time Speech Enhancement (2020) [14.51]
- A Modulation-domain Loss For Neural-network-based Real-time Speech Enhancement (2021) [8.09]
- An Explicit Consistency-preserving Loss Function For Phase Reconstruction And Speech Enhancement (2024) [2.26]
- Cheapnet: Improving Light-weight Speech Enhancement Network By Projected Loss Function (2023) [0.00]
- Unsupervised Speech Enhancement With Speech Recognition Embedding And Disentanglement Losses (2021) [8.35]
- Phase Continuity: Learning Derivatives Of Phase Spectrum For Speech Enhancement (2022) [6.77]