Decoupling And Interacting Multi-task Learning Network For Joint Speech And Accent Recognition
2023 · Qijie Shao, Pengcheng Guo, Jinghao Yan, et al.
Abstract
Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling multi-accent scenarios, current multi-task ASR-AR approaches overlook the granularity differences between tasks. Fine-grained units capture pronunciation-related accent characteristics, while coarse-grained units are better for learning linguistic information. Moreover, an explicit interaction of two tasks can also provide complementary information and improve the performance of each other, but it is rarely used by existing approaches. In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. Specifically, AR and ASR are first decoupled by separated
Related papers
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- Multilingual Approach To Joint Speech And Accent Recognition With DNN-HMM Framework (2020)
- 4D ASR: Joint Modeling Of CTC, Attention, Transducer, And Mask-predict Decoders (2022)
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024)
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016)
- Accent And Speaker Disentanglement In Many-to-many Voice Conversion (2020)
- Advanced Accent/dialect Identification And Accentedness Assessment With Multi-embedding Models And Automatic Speech Recognition (2023)
- Multi-encoder Multi-resolution Framework For End-to-end Speech Recognition (2018)