Factorised Speaker-environment Adaptive Training Of Conformer Speech Recognition Systems
2023 · Jiajun Deng, Guinan Li, Xurong Xie, et al.
Abstract
Rich sources of variability in natural speech present significant challenges to current data-intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test-time adaptation approach for Conformer ASR models. Speaker and environment level characteristics are separately modeled using compact hidden output transforms, which are then linearly or hierarchically combined to represent any speaker-environment combination. Bayesian learning is further utilized to model the adaptation parameter uncertainty. Experiments on the 300-hour Switchboard data corrupted with WHAM noise suggest that factorised adaptation consistently outperforms the baseline and speaker-label-only adapted Conformers, with word error rate reductions of up to 3.1% absolute (10.4% relative). Further analysis shows the proposed method offers potential for rapid adaptation to unseen speaker-environment conditions.
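The two combination schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the "compact hidden output transforms", so everything here is an assumption: the transforms are taken to be element-wise scaling vectors (in the style of LHUC), "linear" is read as a weighted sum of the two scaling vectors, and "hierarchical" as applying one transform after the other. The function names, the interpolation weights `w_spk` and `w_env`, and the `2 * sigmoid` activation are hypothetical and are not taken from the paper. The Bayesian treatment of the parameters is not shown.

```python
import math


def scale(r):
    """Map free adaptation parameters to scaling factors in (0, 2).

    Assumption: a 2*sigmoid activation, so r = 0 leaves the hidden output unchanged.
    """
    return [2.0 / (1.0 + math.exp(-x)) for x in r]


def combine_linear(h, r_spk, r_env, w_spk=0.5, w_env=0.5):
    """Assumed 'linear' combination: scale h by a weighted sum of both factors."""
    s, e = scale(r_spk), scale(r_env)
    return [hi * (w_spk * si + w_env * ei) for hi, si, ei in zip(h, s, e)]


def combine_hierarchical(h, r_spk, r_env):
    """Assumed 'hierarchical' combination: environment scaling, then speaker scaling."""
    s, e = scale(r_spk), scale(r_env)
    return [hi * ei * si for hi, si, ei in zip(h, s, e)]


# Hypothetical hidden-layer output and per-factor parameters.
hidden = [1.0, -2.0, 0.5]
speaker_params = [0.3, 0.0, -0.4]
environment_params = [-0.1, 0.2, 0.0]

adapted_linear = combine_linear(hidden, speaker_params, environment_params)
adapted_hierarchical = combine_hierarchical(hidden, speaker_params, environment_params)
```

Under these assumptions, a new speaker-environment pair only needs one speaker vector and one environment vector, each of which can be estimated and reused independently of the other.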
Related papers
- Confidence Score Based Conformer Speaker Adaptation For Speech Recognition (2022)
- Confidence Score Based Speaker Adaptation Of Conformer Speech Recognition Systems (2023)
- Homogeneous Speaker Features For On-the-fly Dysarthric And Elderly Speaker Adaptation (2024)
- On-the-fly Feature Based Rapid Speaker Adaptation For Dysarthric And Elderly Speech Recognition (2022)
- Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR (2024)
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021)
- Bayesian Learning For Deep Neural Network Adaptation (2020)
- Hyper-parameter Adaptation Of Conformer ASR Systems For Elderly And Dysarthric Speech Recognition (2023)