Research On An Improved Conformer End-to-end Speech Recognition Model With R-drop Structure
2023 · Weidong Ji, Shijie Zan, Guohui Zhou, et al.
Abstract
To address the poor generalization of end-to-end deep-learning speech recognition models, this study proposes "Conformer-R", a Conformer-based speech recognition model that incorporates the R-drop structure. The Conformer, which has shown promising results in speech recognition, models both local and global speech information, while the R-drop structure reduces overfitting. Together these strengthen the model's ability to generalize and improve overall recognition efficiency. The model was first pre-trained on the Aishell1 and Wenetspeech datasets for general-domain adaptation and then fine-tuned on computer-related audio data. Comparison tests against classic models such as LAS and Wenet on the same test set show that Conformer-R effectively improves generalization.
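The R-drop idea mentioned in the abstract can be sketched as follows. This is a minimal illustration of the general R-Drop objective, not the authors' Conformer-R implementation. The same input is passed through the network twice with dropout active, and the training loss adds a symmetric KL term that penalizes disagreement between the two output distributions. The function names, the NumPy setting, and the weight `alpha` are assumptions made for illustration; the abstract does not state how or where the term is applied in Conformer-R.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def r_drop_loss(logits_a, logits_b, targets, alpha=1.0):
    """R-Drop style loss for two forward passes over the same batch.

    logits_a, logits_b: arrays of shape (batch, classes), assumed to come
        from two passes of one model with different dropout masks.
    targets: integer class indices of shape (batch,).
    alpha: weight of the consistency term (hypothetical default).
    """
    logp_a = log_softmax(logits_a)
    logp_b = log_softmax(logits_b)
    rows = np.arange(targets.shape[0])

    # Negative log-likelihood summed over both passes, averaged over the batch.
    nll = -(logp_a[rows, targets] + logp_b[rows, targets]).mean()

    # Symmetric KL divergence between the two predicted distributions.
    p_a, p_b = np.exp(logp_a), np.exp(logp_b)
    kl_ab = (p_a * (logp_a - logp_b)).sum(axis=-1).mean()
    kl_ba = (p_b * (logp_b - logp_a)).sum(axis=-1).mean()

    return nll + alpha * 0.5 * (kl_ab + kl_ba)
```

When both passes produce identical logits, the KL term vanishes and the loss reduces to twice the ordinary cross-entropy. The regularizer therefore only acts where the two dropout masks lead to different predictions.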
Related papers
- Nextformer: A Convnext Augmented Conformer For End-to-end Speech Recognition (2022)
- Self-consistent Context Aware Conformer Transducer For Speech Recognition (2024)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)
- Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021)
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019)
- Advancing CTC-CRF Based End-to-end Speech Recognition With Wordpieces And Conformers (2021)
- Efficient Conformer With Prob-sparse Attention Mechanism For End-to-end Speech Recognition (2021)
- A Comparative Analysis Between Conformer-transducer, Whisper, And Wav2vec2 For Improving The Child Speech Recognition (2023)