Latent Space Representation For Multi-target Speaker Detection And Identification With A Sparse Dataset Using Triplet Neural Networks
2019 · Kin Wai Cheuk, Balamurali B. T., Gemma Roig, et al.
Abstract
We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the \(i\)-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to its high classification accuracy and relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space for different classifiers to solve the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. The training set contains \(i\)-vectors from 3,631 speakers, with only 3 samples for each speaker, thus making speaker recognition a challenging task. When using the training and development sets for training both the TNN and the baseline model (i.e., similarity evaluation directly on the \(i\)-vector representation), our proposed model outperforms the baseline by 23%. When reducing the training data to only using the tra
Related papers
- Triplet Network With Attention For Speaker Diarization (2018)
- Triplet Based Embedding Distance And Similarity Learning For Text-independent Speaker Verification (2019)
- TitaNet: Neural Model For Speaker Representation With 1D Depth-wise Separable Convolutions And Global Context (2021)
- Deep Neural Network Based I-vector Mapping For Speaker Verification Using Short Utterances (2018)
- End-to-end DNN Based Speaker Recognition Inspired By I-vector And PLDA (2017)
- Triplet Entropy Loss: Improving The Generalisation Of Short Speech Language Identification Systems (2020)
- Multi-target Extractor And Detector For Unknown-number Speaker Diarization (2022)
- Triplet Loss Based Embeddings For Forensic Speaker Identification In Spanish (2021)