Time-contrastive Learning Based Deep Bottleneck Features For Text-dependent Speaker Verification
2019 · Achintya Kr. Sarkar, Zheng-Hua Tan, Hao Tang, et al.
Abstract
A number of studies have extracted bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states in order to improve text-dependent speaker verification (TD-SV). However, only moderate success has been achieved. A recent study [1] presented a time-contrastive learning (TCL) concept that exploits the non-stationarity of brain signals to classify brain states. Speech signals have a similar non-stationarity property, and TCL has the further advantage of requiring no labeled data. We therefore present a TCL-based BN feature extraction method. The method uniformly partitions each speech utterance in a training dataset into a predefined number of multi-frame segments. Each segment in an utterance corresponds to one class, and class labels are shared across utterances. DNNs are then trained to discriminate all speech frames among the classes to exploit the temporal structure of speech. In additio
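The labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `tcl_frame_labels` and `label_dataset` are hypothetical, and the rule used to split frames that do not divide evenly among the segments is an assumption, since the abstract only says the partition is uniform.

```python
from typing import List


def tcl_frame_labels(num_frames: int, num_classes: int) -> List[int]:
    """Assign a TCL class label to every frame of one utterance.

    The utterance is cut into `num_classes` contiguous segments of
    (nearly) equal length; every frame in segment k gets label k.
    Assumption: leftover frames are spread by integer division, which
    the abstract does not specify.
    """
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    if num_frames < num_classes:
        raise ValueError("utterance has fewer frames than classes")
    return [(i * num_classes) // num_frames for i in range(num_frames)]


def label_dataset(utterance_lengths: List[int], num_classes: int) -> List[List[int]]:
    """Label every utterance with the same class set.

    Labels are shared across utterances: segment k of any utterance is
    class k, so no speaker or text annotation is needed.
    """
    return [tcl_frame_labels(n, num_classes) for n in utterance_lengths]


if __name__ == "__main__":
    # Two utterances of different lengths, three TCL classes each.
    for labels in label_dataset([10, 7], num_classes=3):
        print(labels)
```

The resulting frame labels would serve as targets for a frame-level DNN classifier, with BN features read from a narrow hidden layer; the network itself is outside the scope of this sketch.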
Related papers
- Time-contrastive Learning Based DNN Bottleneck Features For Text-dependent Speaker Verification (2017) 9.92
- On Bottleneck Features For Text-dependent Speaker Verification Using X-vectors (2020) 0.00
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017) 12.54
- Language Identification With Deep Bottleneck Features (2018) 0.00
- DNN-based Cross-lingual Voice Conversion Using Bottleneck Features (2019) 3.58
- Vocal Tract Length Perturbation For Text-dependent Speaker Verification With Autoregressive Prediction Coding (2020) 8.09
- On Training Targets And Activation Functions For Deep Representation Learning In Text-dependent Speaker Verification (2022) 4.52
- DS-TDNN: Dual-stream Time-delay Neural Network With Global-aware Filter For Speaker Verification (2023) 8.60