Unleashing The Unused Potential Of I-vectors Enabled By GPU Acceleration
2019 · Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, et al.
Abstract
Speaker embeddings are continuous-value vector representations that allow easy comparison between voices of speakers with simple geometric operations. Among others, i-vectors and x-vectors have emerged as the mainstream methods for speaker embedding. In this paper, we illustrate the use of a modern computation platform to harness the benefit of GPU acceleration for i-vector extraction. In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and 25 times in training the i-vector extractor compared to the CPU baseline from the Kaldi toolkit. This significant speed-up allows the exploration of ideas that were hitherto impossible. In particular, we show that it is beneficial to update the universal background model (UBM) and re-compute frame alignments while training the i-vector extractor. Additionally, we are able to study different variations of i-vector extractors more rigorously than before. In this process, we reveal some undocumented deta
Related papers
- Factorization Of Discriminatively Trained I-vector Extractor For Speaker Recognition (2019) 0.00
- Discriminatively Re-trained I-vector Extractor For Speaker Recognition (2018) 5.84
- Probing The Information Encoded In X-vectors (2019) 13.23
- Generative X-vectors For Text-independent Speaker Verification (2018) 7.16
- Investigation Of Using VAE For I-vector Speaker Verification (2017) 0.00
- Y-vector: Multiscale Waveform Encoder For Speaker Embedding (2020) 8.60
- Supervector Compression Strategies To Speed Up I-vector System Development (2018) 5.24
- Memory-efficient Training For Deep Speaker Embedding Learning In Speaker Verification (2024) 2.26