Low Resource German ASR With Untranscribed Data Spoken By Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System
2021 · Jinhan Wang, Yunzheng Zhu, Ruchao Fan, et al.
Abstract
This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge: Shared Task on Automatic Speech Recognition for Non-Native Children's Speech in German. About 5 hours of transcribed data and about 60 hours of untranscribed data are provided for developing a German ASR system for children. For training on the transcribed data, we propose a non-speech state discriminative loss (NSDL) to mitigate the influence of long-duration non-speech segments within speech utterances. To exploit the untranscribed data, several approaches are implemented and combined to incrementally improve system performance. First, bidirectional autoregressive predictive coding (Bi-APC) is used to learn initial parameters for acoustic modelling from the untranscribed data. Second, incremental semi-supervised learning is used to iteratively generate pseudo-transcribed data. Third, different data augmentation schemes are used at different training stages to increase
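The Bi-APC pre-training step can be illustrated with a minimal sketch of its objective. It assumes the standard autoregressive predictive coding (APC) formulation: from the frames seen so far, predict the acoustic frame `n` steps ahead and score the prediction with an L1 loss. The bidirectional variant adds a backward predictor that sees future frames and predicts the frame `n` steps in the past. The predictors here are hypothetical callables standing in for the paper's neural encoders. Summing the two directional losses with equal weight is also an assumption, so the paper's exact architecture and weighting may differ.

```python
from typing import Callable, List, Sequence

Frame = List[float]
# Hypothetical predictor: maps the visible context (a list of frames,
# nearest frame last) to a predicted frame of the same dimension.
Predictor = Callable[[Sequence[Frame]], Frame]


def l1(a: Frame, b: Frame) -> float:
    """Mean absolute error between two frames of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def apc_forward_loss(frames: Sequence[Frame], predict: Predictor, n: int) -> float:
    """Forward APC: from frames[0..t], predict frames[t + n]; average the L1 loss."""
    losses = [l1(predict(frames[: t + 1]), frames[t + n])
              for t in range(len(frames) - n)]
    return sum(losses) / len(losses)


def apc_backward_loss(frames: Sequence[Frame], predict: Predictor, n: int) -> float:
    """Backward APC: from frames[t..T-1], predict frames[t - n].

    Reversing the sequence lets the forward routine handle this direction;
    the predictor then sees future frames with the nearest one last.
    """
    return apc_forward_loss(list(reversed(frames)), predict, n)


def bi_apc_loss(frames: Sequence[Frame], fwd: Predictor, bwd: Predictor,
                n: int = 3) -> float:
    """Assumed Bi-APC objective: unweighted sum of both directional losses."""
    return apc_forward_loss(frames, fwd, n) + apc_backward_loss(frames, bwd, n)


if __name__ == "__main__":
    # Toy 1-D "features" whose value equals the frame index.
    frames = [[float(t)] for t in range(10)]
    # Trivial baseline predictor: copy the nearest visible frame.
    copy_last = lambda ctx: list(ctx[-1])
    print(bi_apc_loss(frames, copy_last, copy_last, n=3))  # 6.0
```

With the copy-last baseline, every prediction is off by exactly `n = 3` in each direction, so the combined loss is 6.0. A trained encoder would drive this value down by learning temporal structure from the untranscribed audio, which is what makes the learned parameters a useful initialization for acoustic modelling.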
Related papers
- The NTNU System At The Interspeech 2020 Non-native Children's Speech ASR Challenge (2020)
- Data Augmentation Using Prosody And False Starts To Recognize Non-native Children's Speech (2020)
- Bi-APC: Bidirectional Autoregressive Predictive Coding For Unsupervised Pre-training And Its Application To Children's ASR (2021)
- Generative Adversarial Training Data Adaptation For Very Low-resource Automatic Speech Recognition (2020)
- From Weak Labels To Strong Results: Utilizing 5,000 Hours Of Noisy Classroom Transcripts With Minimal Accurate Data (2025)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition (2022)
- Almost-unsupervised Speech Recognition With Close-to-zero Resource Based On Phonetic Structures Learned From Very Small Unpaired Speech And Text Data (2018)
- Pretraining By Backtranslation For End-to-end ASR In Low-resource Settings (2018)