Speechformer: A Hierarchical Efficient Framework Incorporating The Characteristics Of Speech
2022 · Weidong Chen, Xiaofen Xing, Xiangmin Xu, et al.
Abstract
Transformers have obtained promising results in cognitive speech signal processing, a field of interest in applications ranging from emotion to neurocognitive disorder analysis. However, most works treat the speech signal as a whole, neglecting the pronunciation structure that is unique to speech and reflects the cognitive process. Meanwhile, the Transformer carries a heavy computational burden due to its full attention operation. In this paper, a hierarchical efficient framework called SpeechFormer, which considers the structural characteristics of speech, is proposed and can serve as a general-purpose backbone for cognitive speech signal processing. The proposed SpeechFormer consists of frame, phoneme, word and utterance stages in succession, each performing a neighboring attention according to the structural pattern of speech with high computational efficiency. SpeechFormer is evaluated on speech emotion recognition (IEMOCAP & MELD) and neurocognitive disorder det
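The idea of neighboring attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `neighboring_attention`, the window sizes, and the omission of learned query/key/value projections and of any token merging between stages are all assumptions made to keep the example short.

```python
import numpy as np

def neighboring_attention(x, window):
    """Single-head self-attention in which token i attends only to
    tokens j with |i - j| <= window // 2.

    Simplifying assumption: queries, keys and values are all x itself
    (no learned projections).
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # (T, T) similarities
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window // 2
    scores = np.where(mask, scores, -np.inf)            # block distant tokens
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))   # 200 hypothetical frames, 16-dim features

# Hypothetical window sizes standing in for the frame, phoneme and word
# stages; the final (utterance) stage here attends over the full sequence.
for stage, window in [("frame", 5), ("phoneme", 25), ("word", 75), ("utterance", 2 * len(x))]:
    x, weights = neighboring_attention(x, window)
    print(stage, "max tokens attended per position:", int((weights > 0).sum(axis=1).max()))
```

Note that this sketch masks a dense score matrix, so it only shows which tokens interact. An efficient version would compute scores for the in-window pairs alone, which is where the computational saving over full attention would come from.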
Related papers
- Speechformer++: A Hierarchical Efficient Framework For Paralinguistic Speech Processing (2023)
- Speechformer: Reducing Information Loss In Direct Speech Translation (2021)
- Efficient Transformer-based Speech Enhancement Using Long Frames And STFT Magnitudes (2022)
- Attention Is All You Need In Speech Separation (2020)
- Exploring Self-attention Mechanisms For Speech Separation (2022)
- Multi-microphone Speech Emotion Recognition Using The Hierarchical Token-semantic Audio Transformer Architecture (2024)
- Resource-efficient Separation Transformer (2022)
- Mossformer: Pushing The Performance Limit Of Monaural Speech Separation Using Gated Single-head Transformer With Convolution-augmented Joint Self-attentions (2023)