Speaker Adaptation For End-to-end CTC Models
2019 · Ke Li, Jinyu Li, Yong Zhao, et al.
Abstract
We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems: Kullback-Leibler divergence (KLD) regularization and multi-task learning (MTL). Both approaches aim to address the data sparsity issue of speaker adaptation in E2E systems, in particular the sparsity of output targets. KLD regularization adapts a model by forcing the output distribution of the adapted model to stay close to that of the unadapted model. MTL uses a jointly trained auxiliary task to improve the performance of the main task. We investigated both approaches on E2E connectionist temporal classification (CTC) models with three different types of output units. Experiments on the Microsoft short message dictation task demonstrated that MTL outperforms KLD regularization. In particular, the MTL adaptation obtained 8.8% and 4.0% relative word error rate reductions (WERRs) for supervised and unsupervised adaptations for the word CTC model, and 9.6% and 3.8% relative
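The two adaptation objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation weights `rho` and `aux_weight`, the frame-averaged form of the KL term, and the direction KL(unadapted || adapted) are all assumptions chosen for illustration. The main and auxiliary losses (for example, CTC losses) are taken as already-computed numbers.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def kld_regularized_loss(main_loss, unadapted_posteriors, adapted_posteriors, rho=0.5):
    """Interpolate an adaptation loss with a KL penalty toward the unadapted model.

    Assumption: the penalty is the per-frame KL divergence between the
    unadapted and adapted output distributions, averaged over frames.
    rho = 0 ignores the unadapted model; rho = 1 ignores the adaptation loss.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    frame_kl = [
        kl_divergence(p, q)
        for p, q in zip(unadapted_posteriors, adapted_posteriors)
    ]
    mean_kl = sum(frame_kl) / len(frame_kl)
    return (1.0 - rho) * main_loss + rho * mean_kl


def mtl_loss(main_loss, aux_loss, aux_weight=0.5):
    """Combine the main-task loss with a jointly trained auxiliary-task loss.

    Assumption: a simple linear interpolation between the two losses.
    """
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must lie in [0, 1]")
    return (1.0 - aux_weight) * main_loss + aux_weight * aux_loss


# Hypothetical per-frame posteriors over three output units, two frames.
unadapted = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
adapted = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]

regularized = kld_regularized_loss(2.0, unadapted, adapted, rho=0.5)
multitask = mtl_loss(2.0, 1.0, aux_weight=0.25)
```

In this sketch, a larger `rho` keeps the adapted model closer to the unadapted one, which is the intended safeguard when only a small amount of adaptation data is available per speaker.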