Branchformer: Parallel MLP-attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding
2022 · Yifan Peng, Siddharth Dalmia, Ian Lane, et al.
Abstract
Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling dependencies at various ranges in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to h
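The two-branch layer described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: single-head attention, ReLU as the activation, a pre-norm residual layout, merging the branches by concatenation followed by a linear projection, and every function and parameter name (`init_params`, `w_merge`, and so on).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each frame over its channel dimension (no learned scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_branch(x, w_q, w_k, w_v):
    # Single-head self-attention: every frame attends to every other frame.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def depthwise_conv1d(x, kernel):
    # Depthwise convolution along time with zero padding.
    # x: (T, C), kernel: (K, C); output: (T, C).
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    return np.stack([(padded[t:t + k] * kernel).sum(axis=0)
                     for t in range(x.shape[0])])

def cgmlp_branch(x, w_up, conv_kernel, w_down):
    # MLP with convolutional gating: one half of the hidden channels is
    # gated by a depthwise convolution of the other half (local context).
    h = np.maximum(x @ w_up, 0.0)  # ReLU is an assumption of this sketch
    a, b = np.split(h, 2, axis=-1)
    gated = a * depthwise_conv1d(layer_norm(b), conv_kernel)
    return gated @ w_down

def branchformer_layer(x, params):
    # Both branches read the same normalized input and run in parallel.
    normed = layer_norm(x)
    global_out = attention_branch(normed, params["w_q"], params["w_k"], params["w_v"])
    local_out = cgmlp_branch(normed, params["w_up"], params["conv_kernel"], params["w_down"])
    # Merge by concatenation + projection (assumed), then add the residual.
    merged = np.concatenate([global_out, local_out], axis=-1) @ params["w_merge"]
    return x + merged

def init_params(rng, d_model, d_hidden, kernel_size):
    # Hypothetical helper: small random weights; d_hidden must be even.
    def w(*shape):
        return 0.1 * rng.standard_normal(shape)
    return {
        "w_q": w(d_model, d_model), "w_k": w(d_model, d_model), "w_v": w(d_model, d_model),
        "w_up": w(d_model, d_hidden),
        "conv_kernel": w(kernel_size, d_hidden // 2),
        "w_down": w(d_hidden // 2, d_model),
        "w_merge": w(2 * d_model, d_model),
    }

rng = np.random.default_rng(0)
params = init_params(rng, d_model=8, d_hidden=16, kernel_size=3)
frames = rng.standard_normal((6, 8))  # 6 time steps, 8 features
print(branchformer_layer(frames, params).shape)  # (6, 8)
```

With a kernel size of 3, perturbing one input frame changes the cgMLP branch output only at that frame and its immediate neighbors, whereas the attention branch output can change at every frame. That contrast is the local versus global split the abstract describes.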
Related papers
- E-Branchformer: Branchformer With Enhanced Merging For Speech Recognition (2022) · 14.66
- A Comparative Study On E-Branchformer Vs Conformer In Speech Recognition, Translation, And Understanding Tasks (2023) · 7.81
- Tailored Design Of Audio-visual Speech Recognition Models Using Branchformers (2024) · 2.35
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023) · 14.47
- Efficient Conformer With Prob-sparse Attention Mechanism For End-to-end Speech Recognition (2021) · 8.09
- PCNN: A Lightweight Parallel Conformer Neural Network For Efficient Monaural Speech Enhancement (2023) · 6.77
- Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021) · 13.79
- Multiformer: A Head-configurable Transformer-based Model For Direct Speech Translation (2022) · 0.00