A Comparative Study On E-Branchformer Vs Conformer In Speech Recognition, Translation, And Understanding Tasks
2023 · Yifan Peng, Kwangyoun Kim, Felix Wu, et al.
Abstract
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer in the LibriSpeech ASR benchmark, making it promising for more general speech applications. This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. Results demonstrate that E-Branchformer achieves comparable or better performance than Conformer in almost all evaluation sets across 15 ASR, 2 ST, and 3 SLU benchmarks, while being more stable during training. We will release our training configurations and pre-trained models for reproducibility, which can benefit the speech community.
Related papers
- E-Branchformer: Branchformer With Enhanced Merging For Speech Recognition (2022)
- Branchformer: Parallel MLP-Attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022)
- Recent Developments On ESPnet Toolkit Boosted By Conformer (2020)
- Conformer-based Hybrid ASR System For Switchboard Dataset (2021)
- Nextformer: A ConvNeXt Augmented Conformer For End-to-end Speech Recognition (2022)
- Efficient Conformer With Prob-sparse Attention Mechanism For End-to-end Speech Recognition (2021)
- Towards A Unified Conformer Structure: From ASR To ASV Task (2022)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)