Probing The Information Encoded In X-vectors
2019 · Desh Raj, David Snyder, Daniel Povey, et al.
Abstract
Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.
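The probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' setup: the vectors below are synthetic stand-ins rather than real x-vectors or i-vectors, a nearest-centroid classifier is swapped in for the unspecified "simple classifiers", and every name (`make_embeddings`, `probe_accuracy`, the `shift` parameter) is hypothetical. The point is only the logic of a probe: if a simple classifier trained on the embeddings predicts a property (say, augmentation type) well above chance, the embeddings encode that property.

```python
import random


def make_embeddings(n, dim, n_classes, shift, rng):
    """Synthetic stand-ins for embeddings: unit Gaussian noise plus a
    class-dependent offset. With shift=0 the label leaves no trace."""
    data = []
    for _ in range(n):
        label = rng.randrange(n_classes)
        vec = [rng.gauss(0.0, 1.0) + label * shift for _ in range(dim)]
        data.append((vec, label))
    return data


def fit_centroids(train):
    """Compute the mean vector of each class in the training split."""
    sums, counts = {}, {}
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, vec):
    """Assign the label of the closest class centroid (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: sqdist(centroids[c], vec))


def probe_accuracy(data, train_fraction=0.5):
    """Fit the probe on the first part of the data, score it on the rest."""
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    centroids = fit_centroids(train)
    correct = sum(predict(centroids, vec) == label for vec, label in test)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Embeddings that encode the property versus a control that does not.
informative = make_embeddings(400, 16, 2, shift=1.5, rng=rng)
control = make_embeddings(400, 16, 2, shift=0.0, rng=rng)

acc_informative = probe_accuracy(informative)
acc_control = probe_accuracy(control)
print(f"probe accuracy, property encoded:     {acc_informative:.2f}")
print(f"probe accuracy, property not encoded: {acc_control:.2f}")
```

The control condition matters: a probe's accuracy is only interpretable relative to chance level, which here is about 0.5 for two roughly balanced classes.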
Related papers
- Generative X-vectors For Text-independent Speaker Verification (2018) · 7.16
- Multi-task Learning With High-order Statistics For X-vector Based Text-independent Speaker Verification (2019) · 8.35
- Speaker Embedding Extraction With Phonetic Information (2018) · 11.85
- X-vectors Meet Emotions: A Study On Dependencies Between Emotion And Speaker Recognition (2020) · 14.23
- VAE-based Domain Adaptation For Speaker Verification (2019) · 7.50
- Gaussian Speaker Embedding Learning For Text-independent Speaker Verification (2020) · 0.00
- On Bottleneck Features For Text-dependent Speaker Verification Using X-vectors (2020) · 0.00
- Unleashing The Unused Potential Of I-vectors Enabled By GPU Acceleration (2019) · 2.26