Abstract

Non-native speech degrades the performance of automatic speech recognition systems. Past strategies to address this challenge include model adaptation, accent classification with model selection, and alternate pronunciation lexicons. In this study, we consider a recurrent neural network (RNN) with a connectionist temporal classification (CTC) cost function, trained on multi-accent English data including US (native), Indian, and Hispanic accents. We exploit dark knowledge from a model trained on the multi-accent data to train student models under the guidance of both a teacher model and the CTC cost of the target transcription. We show that transferring knowledge from a single RNN-CTC trained model to a student model yields better performance than the stand-alone teacher model. Since the outputs of different trained CTC models are not necessarily aligned, it is not possible to simply use an ensemble of CTC teacher models. To address this problem, we train accent specific model
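
The idea of training a student under the guidance of both a teacher model and the CTC cost can be illustrated with a minimal sketch. The abstract does not state the exact form of the objective, so everything here is an assumption: the frame-level KL divergence between teacher and student posteriors, the interpolation weight `alpha`, and the function names are all hypothetical, and the CTC cost is taken as a precomputed number rather than computed here.

```python
import math

def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_kl(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for a single frame."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )

def distillation_loss(teacher_logits, student_logits):
    """Mean per-frame KL between teacher and student posteriors.

    Assumes both models emit one distribution per frame over the same
    label set, so frames can be compared one to one.
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        total += frame_kl(softmax(t_frame), softmax(s_frame))
    return total / len(teacher_logits)

def combined_loss(ctc_loss_value, teacher_logits, student_logits, alpha=0.5):
    """Interpolate a precomputed CTC cost with the distillation term.

    alpha is a hypothetical weight; the abstract does not specify one.
    """
    kd = distillation_loss(teacher_logits, student_logits)
    return (1.0 - alpha) * ctc_loss_value + alpha * kd
```

When the student reproduces the teacher's posteriors exactly, the distillation term vanishes and only the weighted CTC cost remains. Comparing posteriors frame by frame is straightforward with a single teacher; as the abstract notes, the outputs of different trained CTC models are not necessarily aligned, which is why a plain ensemble of teachers does not fit this scheme directly.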

Authors

(none)

Tags

  • Speech Recognition
  • Speech Translation
