Advancing Multi-accented LSTM-CTC Speech Recognition Using A Domain Specific Student-teacher Learning Paradigm
2018 · Shahram Ghorbani, Ahmet E. Bulut, John H. L. Hansen
Abstract
Non-native speech degrades the performance of automatic speech recognition systems. Past strategies to address this challenge include model adaptation, accent classification with model selection, and alternate pronunciation lexicons. In this study, we consider a recurrent neural network (RNN) with a connectionist temporal classification (CTC) cost function, trained on multi-accent English data including US (native), Indian, and Hispanic accents. We exploit dark knowledge from a model trained on the multi-accent data to train student models under the guidance of both a teacher model and the CTC cost of the target transcription. We show that transferring knowledge from a single trained RNN-CTC model to a student model yields better performance than the stand-alone teacher model. Since the outputs of different trained CTC models are not necessarily aligned, it is not possible to simply use an ensemble of CTC teacher models. To address this problem, we train accent specific model
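The training signal described in the abstract, a student guided by both a teacher model and the CTC cost of the target transcription, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the frame-level KL divergence as the teacher term, the interpolation weight `lam`, and all function names are assumptions introduced here, and the teacher and student are assumed to emit posteriors over the same label set for the same frames.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def logsumexp(*vals):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def ctc_nll(log_probs, target, blank=0):
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs: list of T frames, each a list of per-label log-probabilities.
    target:    list of label indices (without blanks).
    """
    # Interleave blanks: [b, y1, b, y2, ..., b]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_frames, num_states = len(log_probs), len(ext)

    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        new = [-math.inf] * num_states
        for s in range(num_states):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    total = logsumexp(alpha[-1], alpha[-2]) if num_states > 1 else alpha[-1]
    return -total

def frame_kl(teacher_log_probs, student_log_probs):
    """Mean per-frame KL(teacher || student). Assumed form of the teacher term."""
    total = 0.0
    for t_row, s_row in zip(teacher_log_probs, student_log_probs):
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(t_row, s_row))
    return total / len(teacher_log_probs)

def student_loss(student_logits, teacher_logits, target, lam=0.5):
    """Hypothetical combined objective: (1 - lam) * CTC + lam * teacher KL."""
    s_lp = [log_softmax(r) for r in student_logits]
    t_lp = [log_softmax(r) for r in teacher_logits]
    return (1.0 - lam) * ctc_nll(s_lp, target) + lam * frame_kl(t_lp, s_lp)
```

A frame-level teacher term like this only makes sense when the teacher's output spikes line up with the frames the student sees, which is consistent with the abstract's remark that outputs of independently trained CTC models are not necessarily aligned and so cannot simply be ensembled.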
Related papers
- Multilingual Training And Cross-lingual Adaptation On CTC-based Acoustic Model (2017) · 0.00
- Large-scale Domain Adaptation Via Teacher-student Learning (2017) · 13.93
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016) · 20.43
- Teach An All-rounder With Experts In Different Domains (2019) · 2.26
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017) · 0.00
- Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition (2022) · 7.50
- Decoupling And Interacting Multi-task Learning Network For Joint Speech And Accent Recognition (2023) · 9.03
- Distilling Knowledge From Ensembles Of Acoustic Models For Joint CTC-attention End-to-end Speech Recognition (2020) · 8.09