Layer-wise Fast Adaptation For End-to-end Multi-accent Speech Recognition
2022 · Xun Gong, Yizhou Lu, Zhikai Zhou, et al.
Abstract
Accent variability has posed a huge challenge to automatic speech recognition (ASR) modeling. Although one-hot accent vector based adaptation systems are commonly used, they require prior knowledge about the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which has limited improvements. In this work, we aim to tackle these problems with a novel layer-wise adaptation structure injected into the E2E ASR model encoder. The adapter layer encodes an arbitrary accent in the accent space and assists the ASR model in recognizing accented speech. Given an utterance, the adaptation structure extracts the corresponding accent information and transforms the input acoustic feature into an accent-related feature through the linear combination of all accent bases. We further explore the injection position of the adaptation layer, the number of accent bases, and different types of accent bases to achieve be
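The "linear combination of all accent bases" described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the bases or how the combination weights are produced. The sketch assumes each basis is a per-frame linear map, and that the weights come from a softmax over a linear projection of an utterance-level accent embedding. The names `accent_adapt`, `base_weights`, `base_biases`, and `gate_matrix` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def accent_adapt(features, accent_embedding, base_weights, base_biases, gate_matrix):
    """Mix K accent bases into one accent-related feature (illustrative only).

    features:         (T, D)    acoustic features for T frames
    accent_embedding: (E,)      utterance-level accent embedding
    base_weights:     (K, D, D) one assumed linear map per accent basis
    base_biases:      (K, D)    one assumed bias per accent basis
    gate_matrix:      (K, E)    assumed projection from embedding to basis scores
    """
    # Interpolation coefficients: one per basis, non-negative, summing to 1.
    alpha = softmax(gate_matrix @ accent_embedding)            # (K,)
    # Apply every basis to every frame.
    per_basis = np.einsum("td,kde->kte", features, base_weights)
    per_basis = per_basis + base_biases[:, None, :]            # (K, T, D)
    # Weighted sum over the K bases gives the adapted feature.
    adapted = np.tensordot(alpha, per_basis, axes=1)           # (T, D)
    return adapted, alpha
```

Because the coefficients are computed from the embedding rather than looked up from a one-hot accent label, an accent that was not seen in training still maps to some mixture of the bases, which is the property the abstract contrasts with one-hot adaptation.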
Related papers
- Multi-accent Adaptation Based On Gate Mechanism (2020) 8.35
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021) 10.74
- E2e-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021) 0.00
- Qifusion-net: Layer-adapted Stream/non-stream Model For End-to-end Multi-accent Speech Recognition (2024) 3.58
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021) 0.00
- Multi-scale Accent Modeling And Disentangling For Multi-speaker Multi-accent Text-to-speech Synthesis (2024) 2.26
- A Highly Adaptive Acoustic Model For Accurate Multi-dialect Speech Recognition (2022) 10.85
- Best Of Both Worlds: Robust Accented Speech Recognition With Adversarial Transfer Learning (2021) 9.23