Supervised Speaker Embedding De-mixing In Two-speaker Environment
2020 · Yanpei Shi, Thomas Hain
Abstract
Separating different speaker properties from a multi-speaker environment is challenging. Instead of separating a two-speaker signal in signal space, as speech source separation does, a speaker embedding de-mixing approach is proposed. The proposed approach separates different speaker properties from a two-speaker signal in embedding space, and it consists of two steps. In step one, clean speaker embeddings are learned and collected by a residual TDNN-based network. In step two, the two-speaker signal and the embedding of one of the speakers are both input to a speaker embedding de-mixing network, which is trained with a reconstruction loss to generate the embedding of the other speaker. Speaker identification accuracy and the cosine similarity score between the clean embeddings and the de-mixed embeddings are used to evaluate the quality of the obtained embeddings. Experiments are conducted on two kinds of data: artificially augmented two-speaker data (TIMIT) and real w
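To make step two and the evaluation concrete, here is a minimal NumPy sketch under strong simplifying assumptions; it is not the authors' model. The "clean" embeddings are random vectors standing in for the residual TDNN outputs of step one, and the two-speaker signal is replaced by the sum of the two speakers' embeddings plus small noise, so a single linear layer can play the role of the de-mixing network. The reconstruction loss is assumed to be mean squared error. All function names (`make_pairs`, `train_demixer`, `demix`, `cosine_similarity`, `identify`) are hypothetical.

```python
import numpy as np

def make_pairs(clean, n_pairs, rng, noise=0.01):
    """Draw speaker pairs (a, b) with a != b and build a toy 'mixture'.
    ASSUMPTION: the mixture is the sum of the two clean embeddings plus
    noise, standing in for the real two-speaker signal."""
    n_speakers, dim = clean.shape
    a = rng.integers(0, n_speakers, size=n_pairs)
    b = (a + rng.integers(1, n_speakers, size=n_pairs)) % n_speakers  # b != a
    mix = clean[a] + clean[b] + noise * rng.normal(size=(n_pairs, dim))
    return a, b, mix

def train_demixer(mix, known, target, lr=0.05, epochs=2000):
    """Hypothetical step-two stand-in: one linear layer maps
    [mixture, known speaker embedding] to the other speaker's embedding,
    trained by full-batch gradient descent on a squared-error loss."""
    x = np.concatenate([mix, known], axis=1)      # (n, 2*dim)
    w = np.zeros((x.shape[1], target.shape[1]))   # (2*dim, dim)
    n = x.shape[0]
    for _ in range(epochs):
        residual = x @ w - target
        grad = 2.0 / n * x.T @ residual           # gradient of mean ||xW - t||^2
        w -= lr * grad
    return w

def demix(w, mix, known):
    """Predict the other speaker's embedding from the mixture and the known one."""
    return np.concatenate([mix, known], axis=1) @ w

def cosine_similarity(u, v):
    """Row-wise cosine similarity between two (n, dim) arrays."""
    num = np.sum(u * v, axis=1)
    return num / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))

def identify(embeddings, clean):
    """Closed-set speaker identification: index of the most cosine-similar clean embedding."""
    en = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cn = clean / np.linalg.norm(clean, axis=1, keepdims=True)
    return np.argmax(en @ cn.T, axis=1)

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 8))  # hypothetical step-one embeddings: 50 speakers, dim 8

a_tr, b_tr, mix_tr = make_pairs(clean, 2000, rng)
w = train_demixer(mix_tr, clean[a_tr], clean[b_tr])

# Evaluate on fresh pairs of the same (closed) speaker set.
a_te, b_te, mix_te = make_pairs(clean, 500, rng)
demixed = demix(w, mix_te, clean[a_te])
mean_cos = float(np.mean(cosine_similarity(demixed, clean[b_te])))
accuracy = float(np.mean(identify(demixed, clean) == b_te))
print(f"mean cosine similarity: {mean_cos:.3f}, identification accuracy: {accuracy:.3f}")
```

Because the toy mixture is additive by construction, the learned weights approach the stacked matrix [I; -I], i.e. the layer simply subtracts the known embedding. A real two-speaker recording offers no such shortcut, which is why the approach feeds the two-speaker signal itself into a learned de-mixing network; the two metrics in the sketch mirror the cosine similarity and speaker identification accuracy used for evaluation.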
Related papers
- Single-channel Speech Separation With Auxiliary Speaker Embeddings (2019)
- Quantitative Evidence On Overlooked Aspects Of Enrollment Speaker Embeddings For Target Speaker Separation (2022)
- Real-time Speech Enhancement And Separation With A Unified Deep Neural Network For Single/dual Talker Scenarios (2023)
- Individualized Conditioning And Negative Distances For Speaker Separation (2022)
- TS-SEP: Joint Diarization And Separation Conditioned On Estimated Speaker Embeddings (2023)
- EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings (2023)
- Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor (2024)
- Separate And Reconstruct: Asymmetric Encoder-decoder For Speech Separation (2024)