A Hybrid Continuity Loss To Reduce Over-suppression For Time-domain Target Speaker Extraction
2022 · Zexu Pan, Meng Ge, Haizhou Li
Abstract
A speaker extraction algorithm extracts the target speech from a speech mixture containing interfering speech and background noise. The extraction process sometimes over-suppresses the extracted target speech, which not only creates audible artifacts but also harms the performance of downstream automatic speech recognition algorithms. We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to address the over-suppression problem. On top of the waveform-level loss used for superior signal quality, i.e., SI-SDR, we introduce a multi-resolution delta spectrum loss in the frequency domain to ensure the continuity of the extracted speech signal, thus alleviating over-suppression. We examine the hybrid continuity loss function using a time-domain audio-visual speaker extraction algorithm on the YouTube LRS2-BBC dataset. Experimental results show that the proposed loss function reduces the over-suppression and improves the word error rate of s
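The combination described in the abstract, a waveform-level SI-SDR term plus a multi-resolution delta spectrum term, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the abstract does not give the exact formulation, so the first-order frame difference, the L1 distance, the three STFT resolutions, and the `weight` parameter are all assumptions made for this example, as are the function names.

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def magnitude_stft(x, n_fft, hop):
    """Hann-windowed magnitude spectrogram, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def delta_spectrum_loss(est, ref, n_fft, hop):
    """L1 distance between frame-to-frame differences of the magnitude spectra.

    Assumption: "delta" is taken as the first-order difference along time.
    """
    d_est = np.diff(magnitude_stft(est, n_fft, hop), axis=0)
    d_ref = np.diff(magnitude_stft(ref, n_fft, hop), axis=0)
    return np.mean(np.abs(d_est - d_ref))

def hybrid_continuity_loss(est, ref,
                           resolutions=((256, 64), (512, 128), (1024, 256)),
                           weight=1.0):
    """Negative SI-SDR plus a weighted multi-resolution delta spectrum term.

    Assumption: the resolutions and the weight are illustrative values only.
    """
    multi_res = np.mean([delta_spectrum_loss(est, ref, n, h) for n, h in resolutions])
    return -si_sdr(est, ref) + weight * multi_res

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(16000)      # stand-in for 1 s of target speech
    gapped = ref.copy()
    gapped[4000:6000] = 0.0               # crude imitation of an over-suppressed segment
    print("loss, perfect estimate:", hybrid_continuity_loss(ref, ref))
    print("loss, gapped estimate: ", hybrid_continuity_loss(gapped, ref))
```

In this sketch the abrupt drop to silence produces frame-to-frame spectral changes that the reference does not have, so the delta term penalizes the gap in addition to the SI-SDR penalty.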
Related papers
- Auxiliary Interference Speaker Loss For Target-speaker Speech Recognition (2019)
- A Consolidated View Of Loss Functions For Supervised Deep Learning-based Speech Enhancement (2020)
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020)
- Target Speaker Extraction For Overlapped Multi-talker Speaker Verification (2019)
- Optimization Of Speaker Extraction Neural Network With Magnitude And Temporal Spectrum Approximation Loss (2019)
- The Sound Of My Voice: Speaker Representation Loss For Target Voice Separation (2019)
- Target Confusion In End-to-end Speaker Extraction: Analysis And Approaches (2022)
- A Two-stage Speaker Extraction Algorithm Under Adverse Acoustic Conditions Using A Single-microphone (2023)