Low-resource Speech Recognition And Dialect Identification Of Irish In A Multi-task Framework
2024 · Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, et al.
Abstract
This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech recognition (ASR) and dialect identification (DID). Results are compared to the current best performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is initially established using a Conformer encoder. This setting is then used to train a model with an E-branchformer encoder, and the performance of the two architectures is compared. A multi-task fine-tuning approach is adopted for language model (LM) shallow fusion. The experiments yielded an improvement in DID accuracy of 10.8% relative to a baseline ECAPA-TDNN, and WER performance approaching that of the TDNN-HMM model. This multi-task approach emerges as a promising strategy for Irish low-resource ASR and DID.
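The abstract names three ingredients that are easier to follow with the arithmetic written out: a hybrid CTC/attention objective, an intermediate CTC term, and LM shallow fusion at decoding time. The sketch below shows one common way these are combined. It is an illustration only: the function names, the weight values, and the choice to average the intermediate losses are assumptions, and the abstract does not state the settings the authors used or how the DID objective is attached.

```python
from typing import Sequence


def hybrid_interctc_loss(
    ctc_final: float,
    ctc_intermediate: Sequence[float],
    attention: float,
    ctc_weight: float = 0.3,       # assumed value, not taken from the paper
    interctc_weight: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Combine final-layer CTC, intermediate-layer CTC, and attention losses."""
    if not ctc_intermediate:
        raise ValueError("need at least one intermediate CTC loss")
    # Average the CTC losses computed at the chosen intermediate encoder layers.
    inter = sum(ctc_intermediate) / len(ctc_intermediate)
    # Interpolate between the final-layer CTC loss and the intermediate term.
    ctc = (1.0 - interctc_weight) * ctc_final + interctc_weight * inter
    # Hybrid objective: weighted sum of the CTC branch and the attention decoder.
    return ctc_weight * ctc + (1.0 - ctc_weight) * attention


def shallow_fusion_score(
    asr_logprob: float,
    lm_logprob: float,
    lm_weight: float = 0.3,  # assumed value, not taken from the paper
) -> float:
    """Score a hypothesis by adding a weighted external LM log-probability."""
    return asr_logprob + lm_weight * lm_logprob
```

In a real training loop the three loss inputs would be tensors produced by the model rather than plain floats; the weighting logic stays the same.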
Related papers
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018) · 0.00
- Linguistic-enhanced Transformer With CTC Embedding For Speech Recognition (2022) · 2.26
- Multilingual Training And Cross-lingual Adaptation On CTC-based Acoustic Model (2017) · 0.00
- Decoupling And Interacting Multi-task Learning Network For Joint Speech And Accent Recognition (2023) · 9.03
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016) · 20.43
- Hierarchical Multitask Learning For CTC-based Speech Recognition (2018) · 0.00
- Multi-encoder Multi-resolution Framework For End-to-end Speech Recognition (2018) · 0.00
- Improving LSTM-CTC Based ASR Performance In Domains With Limited Training Data (2017) · 0.00