SpeechFormer++: A Hierarchical Efficient Framework For Paralinguistic Speech Processing
2023 · Weidong Chen, Xiaofen Xing, Xiangmin Xu, et al.
Abstract
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word
Related papers
- SpeechFormer: A Hierarchical Efficient Framework Incorporating The Characteristics Of Speech (2022) · 12.99
- Speechformer: Reducing Information Loss In Direct Speech Translation (2021) · 7.16
- Generative Pre-trained Speech Language Model With Efficient Hierarchical Transformer (2024) · 5.96
- Efficient Transformer-based Speech Enhancement Using Long Frames And STFT Magnitudes (2022) · 9.59
- Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022) · 15.10
- Exploring Self-attention Mechanisms For Speech Separation (2022) · 12.54
- Attention Is All You Need In Speech Separation (2020) · 20.59
- A Multi-level Acoustic Feature Extraction Framework For Transformer Based End-to-end Speech Recognition (2021) · 0.00