Zero-shot Personalized Speech Enhancement Through Speaker-informed Model Selection
2021 · Aswin Sivaraman, Minje Kim
Abstract
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depen
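The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each specialist is represented by a centroid in the embedding space and that the gating module picks the specialist whose centroid has the highest cosine similarity to the test embedding. Neither assumption is stated in the text above. The names `select_specialist`, `enhance`, `embed`, `centroids`, and `specialists` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Signal = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_specialist(embedding: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid most similar to the embedding.

    Assumption: one centroid per specialist, compared by cosine similarity.
    """
    scores = [cosine_similarity(embedding, c) for c in centroids]
    return max(range(len(scores)), key=scores.__getitem__)


def enhance(
    noisy: Signal,
    embed: Callable[[Signal], Vector],
    centroids: Sequence[Vector],
    specialists: Sequence[Callable[[Signal], Signal]],
) -> Tuple[int, Signal]:
    """Route the noisy signal to a single specialist (sparse activation)."""
    k = select_specialist(embed(noisy), centroids)
    # Only the chosen specialist runs; the others stay inactive.
    return k, specialists[k](noisy)


# Toy stand-ins (hypothetical): a two-number "embedding" and two trivial
# "denoisers". A real system would use trained networks for both.
def toy_embed(signal: Signal) -> Vector:
    return [sum(signal) / len(signal), max(signal) - min(signal)]


toy_centroids = [[1.0, 0.0], [0.0, 1.0]]
toy_specialists = [
    lambda s: list(s),               # specialist 0: identity
    lambda s: [0.5 * x for x in s],  # specialist 1: attenuate by half
]

chosen, output = enhance(
    [0.0, 1.0, 0.0, 1.0], toy_embed, toy_centroids, toy_specialists
)
```

Because only one specialist is evaluated per utterance, the run-time cost in this sketch is one embedding pass plus one specialist pass, regardless of how many specialists the ensemble holds.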
Related papers
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021) 10.21
- Speech Enhancement With Zero-shot Model Selection (2020) 7.81
- Personalized Speech Enhancement Through Self-supervised Data Augmentation And Purification (2021) 9.92
- Personalized Speech Enhancement Without A Separate Speaker Embedding Model (2024) 5.24
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020) 0.00
- Enhancing Zero-shot Multi-speaker TTS With Negated Speaker Representations (2024) 3.58
- Generalizable Zero-shot Speaker Adaptive Speech Synthesis With Disentangled Representations (2023) 6.34
- Zero-shot Multi-speaker Text-to-speech With State-of-the-art Neural Speaker Embeddings (2019) 15.67