Embedding-based Speaker Adaptive Training Of Deep Neural Networks
2017 · Xiaodong Cui, Vaibhava Goel, George Saon
Abstract
An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are constant for a given speaker, are mapped through a control network to layer-dependent element-wise affine transformations that canonicalize the internal feature representations at the output of the hidden layers of a main network. The control network, which generates the speaker-dependent mappings, is estimated jointly with the main network for the overall speaker adaptive acoustic modeling. Experiments on large vocabulary continuous speech recognition (LVCSR) tasks show that the proposed SAT scheme can yield superior performance over the widely used speaker-aware training using i-vectors with speaker-adapted input features.
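The data flow described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class names `ControlNet` and `MainNet`, the single linear head per layer, the sigmoid activation, the layer sizes, and the random initialization are all assumptions made for illustration, and joint training of the two networks is not shown. The sketch only demonstrates how one fixed embedding per speaker could be turned into a per-layer scale and shift that is applied element-wise to each hidden-layer output.

```python
import math
import random


def linear(x, W, b):
    """Affine map W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_linear(rng, n_out, n_in, scale=0.1):
    """Small random weights, zero biases (an arbitrary choice for this sketch)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class ControlNet:
    """Hypothetical control network: speaker embedding -> (scale, shift) per hidden layer."""

    def __init__(self, rng, emb_dim, hidden_dims):
        # One pair of linear heads per hidden layer of the main network (assumed design).
        self.heads = [
            (init_linear(rng, h, emb_dim), init_linear(rng, h, emb_dim))
            for h in hidden_dims
        ]

    def __call__(self, emb):
        transforms = []
        for (Ws, bs), (Wt, bt) in self.heads:
            # Scale is parameterized around 1 so a zero embedding gives the identity.
            scale = [1.0 + s for s in linear(emb, Ws, bs)]
            shift = linear(emb, Wt, bt)
            transforms.append((scale, shift))
        return transforms


class MainNet:
    """Hypothetical main acoustic network with speaker-dependent hidden outputs."""

    def __init__(self, rng, in_dim, hidden_dims, out_dim):
        dims = [in_dim] + list(hidden_dims)
        self.hidden = [init_linear(rng, dims[i + 1], dims[i]) for i in range(len(hidden_dims))]
        self.out = init_linear(rng, out_dim, dims[-1])

    def forward(self, x, transforms):
        h = x
        for (W, b), (scale, shift) in zip(self.hidden, transforms):
            h = [1.0 / (1.0 + math.exp(-a)) for a in linear(h, W, b)]  # sigmoid
            # Element-wise affine transform of the hidden-layer output.
            h = [s * hi + t for s, hi, t in zip(scale, h, shift)]
        return linear(h, *self.out)


rng = random.Random(0)
control = ControlNet(rng, emb_dim=4, hidden_dims=[8, 8])
main = MainNet(rng, in_dim=5, hidden_dims=[8, 8], out_dim=3)

frame = [0.1, -0.2, 0.3, 0.0, 0.5]   # one acoustic feature frame (made-up values)
spk_a = [1.0, 0.0, 0.0, 0.0]         # made-up embedding for speaker A
spk_b = [0.0, 1.0, 0.0, 0.0]         # made-up embedding for speaker B

logits_a = main.forward(frame, control(spk_a))
logits_b = main.forward(frame, control(spk_b))
identity = control([0.0, 0.0, 0.0, 0.0])
```

Because the embedding is fixed per speaker, `control(spk_a)` needs to be evaluated only once per speaker and can be reused for every frame from that speaker. In this sketch the same frame produces different outputs for the two speakers, and an all-zero embedding leaves the main network unmodified.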