Target Confusion In End-to-end Speaker Extraction: Analysis And Approaches
2022 · Zifeng Zhao, Dongchao Yang, Rongzhi Gu, et al.
Abstract
Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings. Such ambiguous guidance can confuse the separation network and lead to incorrect extraction results, which degrades overall performance. We refer to this as the target confusion problem. In this paper, we analyze this issue and address it in two stages. In the training phase, we propose integrating metric learning methods to improve the distinguishability of the embeddings produced by the speaker encoder. For inference, a novel post-filtering strategy is designed to revise incorrect results. Specifically, we first identify these confusion samples by measuring the similarities between output estimates and enrollment utterances…
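The abstract is cut off before the post-filtering rule is fully described, so the following is only a minimal sketch of the identification step it names. It assumes that each output estimate and its enrollment utterance have already been mapped to fixed-dimensional speaker embeddings (for example, by a pretrained speaker encoder) and that a sample is flagged as a confusion case when the cosine similarity between the two falls below a threshold. The function names and the threshold value of 0.5 are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length speaker embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Degenerate (all-zero) embedding: treat as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def is_confusion_sample(
    estimate_emb: Sequence[float],
    enrollment_emb: Sequence[float],
    threshold: float = 0.5,  # hypothetical value; would need tuning on held-out data
) -> bool:
    """Flag an output as a likely target-confusion case when the embedding of
    the extracted estimate is not similar enough to the enrollment embedding."""
    return cosine_similarity(estimate_emb, enrollment_emb) < threshold


def find_confusion_indices(
    estimate_embs: Sequence[Sequence[float]],
    enrollment_embs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical, as above
) -> List[int]:
    """Return the indices of (estimate, enrollment) pairs flagged as confusions."""
    return [
        i
        for i, (est, enr) in enumerate(zip(estimate_embs, enrollment_embs))
        if is_confusion_sample(est, enr, threshold)
    ]


if __name__ == "__main__":
    enrollment = [1.0, 0.0, 0.0]
    correct_estimate = [0.9, 0.1, 0.0]  # close to the enrolled speaker
    confused_estimate = [0.1, 0.95, 0.0]  # close to a different speaker
    print(find_confusion_indices([correct_estimate, confused_estimate],
                                 [enrollment, enrollment]))
```

Running the script prints `[1]`, flagging only the second estimate. A fixed threshold is the simplest possible decision rule. How the flagged samples are then revised is not described in the visible part of the abstract, so that step is left out here.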
Related papers
- X-sepformer: End-to-end Speaker Extraction Network With Explicit Optimization On Speaker Confusion (2023)
- Target Speech Extraction Based On Blind Source Separation And X-vector-based Speaker Selection Trained With Data Augmentation (2020)
- New Insights On Target Speaker Extraction (2022)
- Speaker-conditioning Single-channel Target Speaker Extraction Using Conformer-based Architectures (2022)
- Quantitative Evidence On Overlooked Aspects Of Enrollment Speaker Embeddings For Target Speaker Separation (2022)
- X-crossnet: A Complex Spectral Mapping Approach To Target Speaker Extraction With Cross Attention Speaker Embedding Fusion (2024)
- Target Speaker Extraction By Directly Exploiting Contextual Information In The Time-frequency Domain (2024)
- Focus On The Sound Around You: Monaural Target Speaker Extraction Via Distance And Speaker Information (2023)