InterFormer: Interactive Local And Global Features Fusion For Automatic Speech Recognition
2023 · Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, et al.
Abstract
Both local and global features are essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can further improve ASR performance. However, these methods pay less attention to the interaction between local and global features, and their serial architectures are too rigid to reflect local and global relationships. To address these issues, this paper proposes InterFormer, which performs interactive local and global feature fusion to learn a better representation for ASR. Specifically, we combine the convolution block with the transformer block in a parallel design. In addition, we propose a bidirectional feature interaction module (BFIM) and a selective fusion module (SFM) to implement the interaction and fusion of local and global features, respectively. Extensive experiments on public ASR datasets demonstrate the effectiveness of the proposed InterFormer and its superior performance over other Transformer and Conformer models.
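The parallel design described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the internals of BFIM or SFM, so everything here is an assumption made for illustration: `interact` is a hypothetical stand-in for bidirectional interaction (each branch is gated by the other), `selective_fuse` is a hypothetical stand-in for selective fusion (a per-channel gate mixes the two branches), and the names, shapes, and the parameters `kernel` and `w` do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def local_branch(x, kernel):
    # Local features: depthwise 1-D convolution over time with zero "same" padding.
    # x: (T, D) frames by channels, kernel: (k, D) with k odd.
    T = x.shape[0]
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel).sum(axis=0) for t in range(T)])


def global_branch(x):
    # Global features: single-head self-attention without learned projections.
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x


def interact(local, glob):
    # Hypothetical bidirectional interaction: each branch is modulated by a
    # gate computed from the other branch. Not the paper's BFIM.
    return local * sigmoid(glob), glob * sigmoid(local)


def selective_fuse(local, glob, w):
    # Hypothetical selective fusion: a per-channel gate, computed from
    # time-pooled statistics of both branches, mixes them. Not the paper's SFM.
    pooled = (local + glob).mean(axis=0)   # (D,)
    gate = sigmoid(pooled @ w)             # (D,), values in (0, 1)
    return gate * local + (1.0 - gate) * glob


def parallel_block(x, kernel, w):
    # Both branches read the same input in parallel, exchange information,
    # and are fused; a residual connection adds the input back.
    local = local_branch(x, kernel)
    glob = global_branch(x)
    local, glob = interact(local, glob)
    return x + selective_fuse(local, glob, w)
```

Calling `parallel_block(x, kernel, w)` with `x` of shape `(T, D)`, `kernel` of shape `(k, D)`, and `w` of shape `(D, D)` returns an array of shape `(T, D)`. The contrast with a serial design is that the convolution and attention branches both see the same input, rather than one consuming the other's output.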
Related papers
- An Enhanced Res2Net With Local And Global Feature Fusion For Speaker Verification (2023) 19.74
- Multi-dimensional And Multi-scale Modeling For Speech Separation Optimized By Discriminative Learning (2023) 0.00
- A Multi-level Acoustic Feature Extraction Framework For Transformer Based End-to-end Speech Recognition (2021) 0.00
- Improving Transformer-based Conversational ASR By Inter-sentential Attention Mechanism (2022) 7.50
- Attentive Fusion Enhanced Audio-visual Encoding For Transformer Based Robust Speech Recognition (2020) 0.00
- TF-Locoformer: Transformer With Local Modeling By Convolution For Speech Separation And Enhancement (2024) 10.35
- SpeechFormer++: A Hierarchical Efficient Framework For Paralinguistic Speech Processing (2023) 14.43
- Interactive Feature Fusion For End-to-end Noise-robust Speech Recognition (2021) 12.10