Personalized Speech Enhancement Through Self-supervised Data Augmentation And Purification
2021 · Aswin Sivaraman, Sunwoo Kim, Minje Kim
Abstract
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user. If there is an abundance of unlabeled noisy speech from the test-time user, a personalized speech enhancement model can be trained using self-supervised learning. One straightforward approach to model personalization is to use the target speaker's noisy recordings as pseudo-sources. Then, a pseudo denoising model learns to remove injected training noises and recover the pseudo-sources. However, this approach is volatile as it depends on the quality of the pseudo-sources, which may be too noisy. As a remedy, we propose an improvement to the self-supervised approach through data purification. We first train an SNR predictor model to estimate the frame-by-frame SNR of the pseudo-sources. Then, the predictor's estimates are converted into weights which adjust the frame-by-frame contribution of the pseudo-sources towa
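The frame-weighting idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The sigmoid mapping from SNR to weight, the squared-error loss, and the names `snr_to_weights`, `weighted_frame_loss`, `midpoint_db`, and `slope` are all assumptions made for illustration; the abstract does not specify how the predictor's estimates are converted into weights.

```python
import numpy as np

def snr_to_weights(snr_db, midpoint_db=0.0, slope=0.5):
    """Map per-frame SNR estimates (in dB) to weights in (0, 1).

    Assumed mapping: a sigmoid, so frames with higher estimated SNR
    (cleaner pseudo-source frames) receive weights closer to 1.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint_db)))

def weighted_frame_loss(estimate, pseudo_source, weights):
    """Weighted average of the per-frame mean squared error.

    estimate, pseudo_source: arrays of shape (frames, frame_len)
    weights: array of shape (frames,)
    """
    estimate = np.asarray(estimate, dtype=float)
    pseudo_source = np.asarray(pseudo_source, dtype=float)
    weights = np.asarray(weights, dtype=float)
    per_frame = np.mean((estimate - pseudo_source) ** 2, axis=1)
    # Normalize by the total weight; the epsilon guards against all-zero weights.
    return float(np.sum(weights * per_frame) / (np.sum(weights) + 1e-8))

# Hypothetical usage: two frames, the second judged much noisier than the first.
pseudo_source = np.array([[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]])
estimate = np.zeros_like(pseudo_source)          # stand-in for a denoiser output
weights = snr_to_weights([20.0, -20.0])          # hypothetical SNR predictor output
loss = weighted_frame_loss(estimate, pseudo_source, weights)
```

In this sketch the low-SNR frame contributes almost nothing to the loss, which is the intended effect of purification: unreliable pseudo-source frames are down-weighted rather than discarded outright.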
Related papers
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021) 10.21
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020) 0.00
- Zero-shot Personalized Speech Enhancement Through Speaker-informed Model Selection (2021) 7.16
- The Potential Of Neural Speech Synthesis-based Data Augmentation For Personalized Speech Enhancement (2022) 6.77
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) 0.00
- Personalized Speech Enhancement Without A Separate Speaker Embedding Model (2024) 5.24
- Self-supervised Pretraining For Robust Personalized Voice Activity Detection In Adverse Conditions (2023) 6.34
- Automatic Data Augmentation For Domain Adapted Fine-tuning Of Self-supervised Speech Representations (2023) 0.00