A Lightweight Dual-Stage Framework for Personalized Speech Enhancement Based on DeepFilterNet2
2024 · Thomas Serre, Mathieu Fontaine, Éric Benhaim, et al.
Abstract
Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research efforts have yielded promising PSE models, albeit often accompanied by computationally intensive architectures, unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage Speech Enhancement (SE) model and implement it within DeepFilterNet2, an SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model, exploring different positions for the integration of the speaker embeddings within the dual-stage enhancement architecture. We also investigate a tailored training strategy when adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performance of DeepFilterNet2 wh
Related papers
- Personalized Speech Enhancement Without a Separate Speaker Embedding Model (2024) · 5.24
- SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation (2025) · 2.26
- Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation (2022) · 4.52
- Personalized PercepNet: Real-Time, Low-Complexity Target Voice Separation and Enhancement (2021) · 10.97
- Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement (2022) · 4.52
- LiSenNet: Lightweight Sub-Band and Dual-Path Modeling for Real-Time Speech Enhancement (2024) · 9.03
- Cross-Attention Is All You Need: Real-Time Streaming Transformers for Personalised Speech Enhancement (2022) · 0.00
- The Potential of Neural Speech Synthesis-Based Data Augmentation for Personalized Speech Enhancement (2022) · 6.77