A Modulation-domain Loss For Neural-network-based Real-time Speech Enhancement
2021 · Tyler Vuong, Yangyang Xia, Richard M. Stern
Abstract
We describe a modulation-domain loss function for deep-learning-based speech enhancement systems. Learnable spectro-temporal receptive fields (STRFs) were adapted to optimize for a speaker identification task. The learned STRFs were then used to calculate a weighted mean-squared error (MSE) in the modulation domain for training a speech enhancement system. Experiments showed that adding the modulation-domain MSE to the MSE in the spectro-temporal domain substantially improved the objective prediction of speech quality and intelligibility for real-time speech enhancement systems without incurring additional computation during inference.
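The combined loss described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the STRFs here are random 2-D kernels standing in for the learned ones, the per-kernel weights are uniform, and the names `conv2d_same`, `modulation_mse`, `combined_loss`, and the mixing factor `alpha` are hypothetical. The exact weighting scheme used in the paper is not given in the abstract.

```python
import numpy as np

def conv2d_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation of x (time, freq) with kernel k."""
    kt, kf = k.shape
    pt, pf = kt // 2, kf // 2
    xp = np.pad(x, ((pt, kt - 1 - pt), (pf, kf - 1 - pf)))
    # windows has shape (time, freq, kt, kf)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (kt, kf))
    return np.einsum("tfij,ij->tf", windows, k)

def modulation_mse(clean, enhanced, strfs, weights=None):
    """Weighted MSE between STRF-filtered clean and enhanced spectrograms.

    Assumption: uniform weights unless given; the paper's weighting may differ.
    """
    if weights is None:
        weights = np.ones(len(strfs)) / len(strfs)
    total = 0.0
    for w, k in zip(weights, strfs):
        diff = conv2d_same(clean, k) - conv2d_same(enhanced, k)
        total += w * np.mean(diff ** 2)
    return total

def combined_loss(clean, enhanced, strfs, alpha=1.0):
    """Spectro-temporal MSE plus alpha times the modulation-domain MSE."""
    spectral = np.mean((clean - enhanced) ** 2)
    return spectral + alpha * modulation_mse(clean, enhanced, strfs)

# Toy data: 100 frames x 64 frequency bins, 4 stand-in STRF kernels.
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 64))
enhanced = clean + 0.1 * rng.standard_normal((100, 64))
strfs = [rng.standard_normal((9, 9)) for _ in range(4)]

loss = combined_loss(clean, enhanced, strfs)
```

Because the STRF filtering appears only inside the loss, a system trained this way would run the same enhancement network at inference time, which is consistent with the abstract's claim of no additional inference cost.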
Related papers
- A Consolidated View Of Loss Functions For Supervised Deep Learning-based Speech Enhancement (2020)
- Weighted Speech Distortion Losses For Neural-network-based Real-time Speech Enhancement (2020)
- Effect Of Noise Suppression Losses On Speech Distortion And ASR Performance (2021)
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018)
- Perceive And Predict: Self-supervised Speech Representation Based Loss Functions For Speech Enhancement (2023)
- Investigating Cross-domain Losses For Speech Enhancement (2020)
- Cheapnet: Improving Light-weight Speech Enhancement Network By Projected Loss Function (2023)
- Time-domain Speech Enhancement Assisted By Multi-resolution Frequency Encoder And Decoder (2023)