S-vectors And TESA: Speaker Embeddings And A Speaker Authenticator Based On Transformer Encoder
2020 · N J Metilda Sagaya Mary, S Umesh, Sandesh V Katta
Abstract
One of the most popular speaker embeddings is the x-vector, which is obtained from an architecture that gradually builds a larger temporal context with successive layers. In this paper, we propose to derive speaker embeddings from the Transformer encoder trained for speaker classification. Self-attention, on which the Transformer encoder is built, attends to all the features over the entire utterance and might be more suitable for capturing the speaker characteristics in an utterance. We refer to the speaker embeddings obtained from the proposed speaker classification model as s-vectors to emphasize that they are obtained from an architecture that relies heavily on self-attention. Through experiments, we demonstrate that s-vectors perform better than x-vectors. In addition to the s-vectors, we also propose a new architecture based on the Transformer encoder for speaker verification as a replacement for speaker verification based on conventional probabilistic linear discriminant analysis (PLDA). This archi
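To illustrate the idea in the abstract that self-attention lets every frame attend to the whole utterance, here is a minimal NumPy sketch. It is not the authors' model: the single attention head, the random projection matrices, the mean pooling step, and all sizes (`T`, `d_in`, `d_model`) are assumptions chosen only for illustration.

```python
import numpy as np

def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all frames."""
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (T, T): each frame scored against every frame
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the entire utterance
    return weights @ v, weights

def utterance_embedding(frames, w_q, w_k, w_v):
    """Mean-pool the attended frames into one fixed-size vector (assumed pooling)."""
    attended, _ = self_attention(frames, w_q, w_k, w_v)
    return attended.mean(axis=0)

# Hypothetical sizes: 200 frames of 40-dim features projected to 64 dims.
rng = np.random.default_rng(0)
T, d_in, d_model = 200, 40, 64
frames = rng.standard_normal((T, d_in))
w_q, w_k, w_v = (rng.standard_normal((d_in, d_model)) * 0.1 for _ in range(3))

embedding = utterance_embedding(frames, w_q, w_k, w_v)
_, attn = self_attention(frames, w_q, w_k, w_v)
```

Each row of `attn` spans all `T` frames, which contrasts with the layer-by-layer widening of temporal context that the abstract attributes to x-vectors. A trained model would learn the projections from a speaker classification objective rather than sample them at random.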
Related papers
- Investigation Of Speaker-adaptation Methods In Transformer Based ASR (2020)
- Investigation Of Speaker Representation For Target-speaker Speech Processing (2024)
- P-vectors: A Parallel-coupled TDNN/Transformer Network For Speaker Verification (2023)
- Y-vector: Multiscale Waveform Encoder For Speaker Embedding (2020)
- T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model (2020)
- Improving Transformer-based Networks With Locality For Automatic Speaker Verification (2023)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)
- Quantitative Evidence On Overlooked Aspects Of Enrollment Speaker Embeddings For Target Speaker Separation (2022)