Robust Speaker Extraction Network Based On Iterative Refined Adaptation
2020 · Chengyun Deng, Shiqian Ma, Yi Zhang, et al.
Abstract
Speaker extraction aims to extract the target speech signal from a multi-talker environment with interfering speakers and surrounding noise, given reference information about the target speaker. Most speaker extraction systems achieve satisfactory performance on the premise that the test speakers were encountered during training. Such systems suffer performance degradation when the target speakers are unseen and/or the reference voiceprint information is mismatched. In this paper, we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in these scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network obtains a latent representation of the target speaker, which is fed back to the auxiliary network to produce a refined embedding that provides more accurate guidance for the extraction network. Experiments on the WSJ0-2mix-extr and WHAM! datasets conf…
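The feedback loop described in the abstract (initial embedding from an auxiliary network, latent target representation from the extraction network, refined embedding fed back to guide extraction again) can be illustrated with a toy, pure-Python sketch of the control flow only. Everything here is a hypothetical stand-in, not the paper's architecture: `auxiliary_network` is assumed to mean-pool the reference frames, `extraction_network` is assumed to apply a sigmoid gate per mixture frame, the refinement is assumed to be a simple 50/50 blend of the reference embedding and the latent representation, and the number of refinement iterations is an assumed parameter.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_vector(frames: Sequence[Vector]) -> Vector:
    """Average equal-length frames into a single vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def auxiliary_network(reference: Sequence[Vector],
                      latent: Optional[Vector] = None,
                      mix_weight: float = 0.5) -> Vector:
    """Hypothetical speaker encoder.

    Without feedback it returns the mean reference frame (initial embedding).
    With feedback it blends that with the extractor's latent target
    representation (an assumed fusion rule, not the paper's).
    """
    base = mean_vector(reference)
    if latent is None:
        return base
    return [(1 - mix_weight) * b + mix_weight * z for b, z in zip(base, latent)]


def extraction_network(mixture: Sequence[Vector],
                       embedding: Vector) -> Tuple[List[Vector], Vector]:
    """Hypothetical extractor: gates each mixture frame by its similarity to
    the embedding; returns (estimate, latent representation of the target)."""
    gates = []
    for frame in mixture:
        score = sum(f * e for f, e in zip(frame, embedding))
        score = max(min(score, 50.0), -50.0)  # clamp to avoid exp overflow
        gates.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid gate in (0, 1)
    estimate = [[g * x for x in frame] for g, frame in zip(gates, mixture)]
    total = sum(gates)
    # Latent target representation: gate-weighted mean of the mixture frames.
    latent = [sum(g * frame[d] for g, frame in zip(gates, mixture)) / total
              for d in range(len(embedding))]
    return estimate, latent


def iterative_refined_adaptation(mixture: Sequence[Vector],
                                 reference: Sequence[Vector],
                                 iterations: int = 2) -> Tuple[List[Vector], List[Vector]]:
    """Run extraction, feed the latent back to refine the embedding, repeat."""
    embedding = auxiliary_network(reference)  # initial embedding
    history = [embedding]
    estimate, latent = extraction_network(mixture, embedding)
    for _ in range(iterations):
        embedding = auxiliary_network(reference, latent)  # refined embedding
        history.append(embedding)
        estimate, latent = extraction_network(mixture, embedding)
    return estimate, history


# Usage with made-up 2-dimensional "frames".
mixture = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
reference = [[1.0, 0.0], [0.8, 0.2]]
estimate, history = iterative_refined_adaptation(mixture, reference, iterations=2)
print(len(history))  # 3: initial embedding plus 2 refinements
```

The point of the structure is that each pass conditions the extractor on an embedding that has already "seen" the mixture, which is how the abstract motivates robustness to mismatched reference voiceprints; in a real system both networks would be trained models rather than these fixed rules.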
Related papers
- A Two-stage Speaker Extraction Algorithm Under Adverse Acoustic Conditions Using A Single-microphone (2023) · 0.00
- Multi-stage Speaker Extraction With Utterance And Frame-level Reference Signals (2020) · 12.54
- Robust Speaker Recognition Using Unsupervised Adversarial Invariance (2019) · 9.76
- Target Speech Extraction Based On Blind Source Separation And X-vector-based Speaker Selection Trained With Data Augmentation (2020) · 0.00
- DEAAN: Disentangled Embedding And Adversarial Adaptation Network For Robust Speaker Representation Learning (2020) · 9.59
- Time-domain Speech Extraction With Spatial Information And Multi Speaker Conditioning Mechanism (2021) · 7.81
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022) · 7.50
- Audio-visual Active Speaker Extraction For Sparsely Overlapped Multi-talker Speech (2023) · 7.50