Efficient Personalized Speech Enhancement Through Self-supervised Learning
2021 · Aswin Sivaraman, Minje Kim
Abstract
This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement function to a particular speaker's voice and thereby solve a narrower problem. Hence, specialists can achieve better performance while also reducing computational complexity. However, naive personalization methods can require clean speech from the target user, which is inconvenient to acquire, e.g., due to subpar recording conditions. To this end, we pose personalization as either a zero-shot task, in which no additional clean speech of the target speaker is used for training, or a few-shot learning task, in which the goal is to minimize the duration of the clean speech used for transfer learning. In this paper, we propose self-supervised learning methods as a solution to both zero- and few-shot personalization.
Related papers
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020) · 0.00
- Personalized Speech Enhancement Through Self-supervised Data Augmentation And Purification (2021) · 9.92
- Zero-shot Personalized Speech Enhancement Through Speaker-informed Model Selection (2021) · 7.16
- The Universal Personalizer: Few-shot Dysarthric Speech Recognition Via Meta-learning (2025) · 0.00
- Self-supervised Learning Based Monaural Speech Enhancement With Multi-task Pre-training (2021) · 0.00
- Self-supervised Speaker Recognition Training Using Human-machine Dialogues (2022) · 5.84
- Automatic Data Augmentation Selection And Parametrization In Contrastive Self-supervised Speech Representation Learning (2022) · 5.24
- Learning Problem-agnostic Speech Representations From Multiple Self-supervised Tasks (2019) · 15.54