Speaker Embedding Extraction With Phonetic Information
2018 · Yi Liu, Liang He, Jia Liu, et al.
Abstract
Speaker embeddings achieve promising results on many speaker verification tasks. Phonetic information, although an important component of speech, is rarely considered when extracting speaker embeddings. In this paper, we introduce phonetic information into speaker embedding extraction based on the x-vector architecture. We propose two methods, one using phonetic vectors and one using multi-task learning. On the Fisher dataset, our best system outperforms the original x-vector approach by 20% in EER and by 15% in both minDCF08 and minDCF10. Experiments conducted on NIST SRE10 further demonstrate the effectiveness of the proposed methods.
Related papers
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Speech Rhythm-based Speaker Embeddings Extraction From Phonemes And Phoneme Duration For Multi-speaker Speech Synthesis (2024)
- Multi-task Learning With High-order Statistics For X-vector Based Text-independent Speaker Verification (2019)
- Y-vector: Multiscale Waveform Encoder For Speaker Embedding (2020)
- Probing The Information Encoded In X-vectors (2019)
- A Comparative Re-assessment Of Feature Extractors For Deep Speaker Embeddings (2020)
- Gaussian Speaker Embedding Learning For Text-independent Speaker Verification (2020)
- Triplet Based Embedding Distance And Similarity Learning For Text-independent Speaker Verification (2019)