MIRNet: Learning Multiple Identities Representations in Overlapped Speech
2020 · Hyewon Han, Soo-Whan Chung, Hong-Goo Kang
Abstract
Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when multiple speakers talk concurrently in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that extracts, from a given mixture, a high-level embedding containing information about each speaker's identity. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm requires only the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and in a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.
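The abstract does not state the exact training objective or verification scoring rule, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that "only the speaker identity labels" can be turned into a multi-hot target over the training speakers and optimized with a multi-label binary cross-entropy, and that verification compares an enrollment embedding with a mixture embedding by cosine similarity. The helper names (`multi_hot`, `multi_label_bce`, `cosine`, `verify`) and the default `threshold` are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def multi_hot(speaker_ids: Iterable[int], num_speakers: int) -> List[float]:
    # Hypothetical helper: build a training target from identity labels alone.
    # One slot per training speaker, set to 1.0 for every speaker present in
    # the overlapped segment. No clean reference signal is needed.
    target = [0.0] * num_speakers
    for sid in speaker_ids:
        if not 0 <= sid < num_speakers:
            raise ValueError(f"speaker id {sid} out of range")
        target[sid] = 1.0
    return target


def multi_label_bce(logits: Sequence[float], target: Sequence[float]) -> float:
    # Assumed objective: mean binary cross-entropy computed from raw logits in
    # the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    if len(logits) != len(target):
        raise ValueError("logits and target must have the same length")
    total = 0.0
    for z, y in zip(logits, target):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(target)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def verify(enrollment: Sequence[float],
           mixture_embedding: Sequence[float],
           threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: accept "the enrolled speaker is present in
    # this mixture" when the cosine score reaches a tunable threshold.
    return cosine(enrollment, mixture_embedding) >= threshold


if __name__ == "__main__":
    y = multi_hot({0, 2}, num_speakers=4)
    print(y)
    print(multi_label_bce([3.0, -3.0, 2.5, -4.0], y))
    print(verify([0.9, 0.1, 0.4], [0.8, 0.2, 0.5]))
```

One reason a set-valued target like this suits overlapped speech is that it does not depend on the order in which speakers appear in the mixture, so no permutation matching between outputs and references is required.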
Related papers
- Speaker Verification In Multi-speaker Environments Using Temporal Feature Fusion (2022) · 0.00
- End-to-end Multi-speaker Speech Recognition Using Speaker Embeddings And Transfer Learning (2019) · 9.41
- Learning Speaker Representations With Mutual Information (2018) · 11.76
- Joint Speaker Counting, Speech Recognition, And Speaker Identification For Overlapped Speech Of Any Number Of Speakers (2020) · 12.54
- Leveraging Speaker Attribute Information Using Multi Task Learning For Speaker Verification And Diarization (2020) · 6.34
- Speaker Verification Using Convolutional Neural Networks (2018) · 0.00
- Supervised Speaker Embedding De-mixing In Two-speaker Environment (2020) · 0.00
- Compositional Embedding Models For Speaker Identification And Diarization With Simultaneous Speech From 2+ Speakers (2020) · 3.58