End-to-end Attention Based Text-dependent Speaker Verification
2017 · Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, et al.
Abstract
This paper presents a new type of end-to-end system for text-dependent speaker verification. Previous work has shown promising results using phonetically discriminative or speaker-discriminative DNNs as feature extractors for speaker verification. In those systems, the extracted frame-level features (DNN bottleneck, posterior, or d-vector) are equally weighted and aggregated to compute an utterance-level speaker representation (d-vector or i-vector). In this work we use speaker-discriminative CNNs to extract noise-robust frame-level features. These features are combined into an utterance-level speaker vector through an attention mechanism, rather than by equal weighting. The proposed attention model uses both speaker-discriminative information and phonetic information to learn the weights. The whole system, including the CNN and the attention model, is jointly optimized using an end-to-end criterion. The training algorithm exactly imitates the evaluation process, directly mapping a test utterance and a few target
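The contrast the abstract draws between equal weighting and attention-based weighting of frame-level features can be illustrated with a minimal NumPy sketch. This is not the paper's model: the linear scoring function, the random stand-in features, and all names (`attention_pool`, `w_spk`, `w_phn`, the dimensions `T`, `D`, `P`) are assumptions made only for illustration. In a real system the features would come from trained networks and the weights would be learned.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

def mean_pool(frames):
    """Baseline: every frame contributes equally to the utterance vector."""
    return frames.mean(axis=0)

def attention_pool(frames, phonetic, w_spk, w_phn):
    """Weighted pooling: one scalar score per frame, normalized over time.

    Assumption: the score is a simple linear function of the speaker
    features and the phonetic features; the paper's attention model
    may be parameterized differently.
    """
    scores = frames @ w_spk + phonetic @ w_phn   # shape (T,)
    weights = softmax(scores)                    # non-negative, sums to 1
    return weights @ frames, weights             # utterance vector, shape (D,)

# Hypothetical sizes and random stand-ins for real features.
rng = np.random.default_rng(0)
T, D, P = 50, 16, 8                    # frames, speaker dim, phonetic dim
frames = rng.normal(size=(T, D))       # stand-in for CNN frame-level features
phonetic = rng.normal(size=(T, P))     # stand-in for phonetic features
w_spk = rng.normal(size=D)             # would be learned in practice
w_phn = rng.normal(size=P)             # would be learned in practice

utt_vec, weights = attention_pool(frames, phonetic, w_spk, w_phn)
```

When every frame receives the same score, the softmax weights become uniform and `attention_pool` reduces to `mean_pool`, so equal weighting is a special case of the attention form.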
Related papers
- End-to-end DNN Based Speaker Recognition Inspired By I-vector And PLDA (2017)
- Attention Back-end For Automatic Speaker Verification With Multiple Enrollment Utterances (2021)
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020)
- Phonetic-attention Scoring For Deep Speaker Features In Speaker Verification (2018)
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022)
- Self Multi-head Attention For Speaker Recognition (2019)
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019)
- Graph Attention Networks For Speaker Verification (2020)