TripleC Learning And Lightweight Speech Enhancement For Multi-condition Target Speech Extraction
2025 · Ziling Huang
Abstract
In our recent work, we proposed Lightweight Speech Enhancement Guided Target Speech Extraction (LGTSE) and demonstrated its effectiveness in multi-speaker-plus-noise scenarios. However, real-world applications often involve more diverse and complex conditions, such as one-speaker-plus-noise or two-speaker-without-noise. To address this challenge, we extend LGTSE with a Cross-Condition Consistency learning strategy, termed TripleC Learning. This strategy is first validated under the multi-speaker-plus-noise condition and then evaluated for its generalization across diverse scenarios. Moreover, we build on the lightweight front-end denoiser in LGTSE, which can flexibly process both noisy and clean mixtures and generalizes well to unseen conditions: we integrate TripleC Learning with a proposed parallel universal training scheme that organizes batches containing multiple scenarios for the same target speaker. By enforcing consistent extraction across different conditions, easier
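The idea of enforcing consistent extraction across conditions for the same target speaker can be illustrated with a minimal sketch. The abstract does not give the actual TripleC objective, so everything below is an assumption made for illustration: the function names `si_sdr` and `cross_condition_loss`, the use of negative SI-SDR as the extraction term, the pairwise mean-squared-difference consistency term, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D waveforms."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def cross_condition_loss(estimates, reference, weight=0.1):
    """Hypothetical loss over one target speaker seen under several conditions.

    estimates: dict mapping a condition name to the waveform extracted
               under that condition (all for the same target utterance).
    reference: clean target waveform.
    Returns (total, extraction_term, consistency_term).
    """
    names = sorted(estimates)
    # Extraction term: average negative SI-SDR over all conditions.
    extraction = -float(np.mean([si_sdr(estimates[n], reference) for n in names]))
    # Consistency term (assumed form): mean squared difference between
    # the outputs of every pair of conditions.
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if pairs:
        consistency = float(
            np.mean([np.mean((estimates[a] - estimates[b]) ** 2) for a, b in pairs])
        )
    else:
        consistency = 0.0
    return extraction + weight * consistency, extraction, consistency
```

In a parallel batch, `estimates` would hold the extractor outputs for keys such as `"two_speaker_plus_noise"`, `"one_speaker_plus_noise"`, and `"two_speaker_without_noise"`, all sharing the same clean target; the consistency term is zero only when every condition yields the same output.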
Related papers
- Lightweight Speech Enhancement Guided Target Speech Extraction In Noisy Multi-speaker Scenarios (2025)
- LiMuSE: Lightweight Multi-modal Speaker Extraction (2021)
- Toward Universal Speech Enhancement For Diverse Input Conditions (2023)
- 3S-TSE: Efficient Three-stage Target Speaker Extraction For Real-time And Low-resource Applications (2023)
- Spectron: Target Speaker Extraction Using Conditional Transformer With Adversarial Refinement (2024)
- Incorporating Multi-target In Multi-stage Speech Enhancement Model For Better Generalization (2021)
- Triplet Entropy Loss: Improving The Generalisation Of Short Speech Language Identification Systems (2020)
- Improving Curriculum Learning For Target Speaker Extraction With Synthetic Speakers (2024)