Triplet Network With Attention For Speaker Diarization
2018 · Huan Song, Megan Willi, Jayaraman J. Thiagarajan, et al.
Abstract
In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet loss-based architectures have been successfully used for this problem. However, existing work utilizes conventional i-vectors as the input representation and builds simple fully connected networks for metric learning, thus not fully leveraging the modeling power of DNN architectures. This paper investigates the importance of learning effective representations from the sequences directly in metric learning pipelines for speaker diarization. More specifically, we propose to employ attention models to learn embeddings and the metric jointly in an end-to-end fashion. Experiments are conducted on the CALLHOME conversational speech corpus. The diarization results demonstrate that, besides providing a unified model, the proposed approach achieves improved perform
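The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. It uses the common hinge form over Euclidean distances; the exact loss, margin, and distance used in the paper are not given in this excerpt, so the function names, the `margin` value, and the toy 2-D embeddings below are illustrative assumptions rather than the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values, for illustration only).
anchor = [0.0, 0.0]
same_speaker = [0.0, 1.0]   # distance 1 from the anchor
other_speaker = [3.0, 4.0]  # distance 5 from the anchor

# Well-separated triplet: the margin is already satisfied.
loss_easy = triplet_loss(anchor, same_speaker, other_speaker)
# Swapped roles: the "positive" is farther away than the "negative".
loss_hard = triplet_loss(anchor, other_speaker, same_speaker)
```

In a metric learning pipeline this quantity would be averaged over many sampled triplets and minimized with respect to the network that produces the embeddings; here the embeddings are fixed only to show how the loss behaves.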
Related papers
- Speaker Diarization With LSTM (2017)
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017)
- Latent Space Representation For Multi-target Speaker Detection And Identification With A Sparse Dataset Using Triplet Neural Networks (2019)
- Designing An Effective Metric Learning Pipeline For Speaker Diarization (2018)
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024)
- Multi-scale Speaker Embedding-based Graph Attention Networks For Speaker Diarisation (2021)
- End-to-end Diarization For Variable Number Of Speakers With Local-global Networks And Discriminative Speaker Embeddings (2021)
- Combination Of Deep Speaker Embeddings For Diarisation (2020)