E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition
2022 · Kwangyoun Kim, Felix Wu, Yifan Peng, et al.
Abstract
Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state of the art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches of convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets, respectively, without using any external training data.
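For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring pipeline; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: it splits on whitespace and
    applies no text normalization, unlike most real scoring tools.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Corpus-level figures such as those in the abstract are usually obtained by summing errors over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.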
Related papers
- A Comparative Study On E-Branchformer Vs Conformer In Speech Recognition, Translation, And Understanding Tasks (2023)
- Branchformer: Parallel MLP-Attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022)
- Efficient Conformer With Prob-Sparse Attention Mechanism For End-to-End Speech Recognition (2021)
- Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021)
- Nextformer: A ConvNeXt Augmented Conformer For End-to-End Speech Recognition (2022)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)
- Self-Consistent Context Aware Conformer Transducer For Speech Recognition (2024)
- Late Fusion Ensembles For Speech Recognition On Diverse Input Audio Representations (2024)