SSHR: Leveraging Self-supervised Hierarchical Representations For Multilingual Automatic Speech Recognition
2023 · Hongfei Xue, Qijie Shao, Kaixun Huang, et al.
Abstract
Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. Self-supervised learning (SSL) models such as MMS have proven effective for multilingual ASR, but the representations of their different layers may carry distinct information that has not been fully exploited. In this study, we propose a novel method that leverages self-supervised hierarchical representations (SSHR) to fine-tune the MMS model. We first analyze the layers of MMS and show that the middle layers capture language-related information, while the higher layers encode content-related information, which gradually decreases in the final layers. We then extract a language-related frame from correlated middle layers and guide specific language extraction through self-attention mechanisms. Additionally, we steer the model toward acquiring more content-related information in the final layers using our pro
Related papers
- Fusion Of Discrete Representations And Self-augmented Representations For Multilingual Automatic Speech Recognition (2024) 2.26
- Multi-variant Consistency Based Self-supervised Learning For Robust Automatic Speech Recognition (2021) 0.00
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022) 0.00
- End-to-end Integration Of Speech Recognition, Dereverberation, Beamforming, And Self-supervised Learning Representation (2022) 8.60
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024) 0.00
- An Empirical Analysis Of Speech Self-supervised Learning At Multiple Resolutions (2024) 0.00
- Large Language Model Guided Decoding For Self-supervised Speech Recognition (2025) 0.00
- Improved Self-supervised Multilingual Speech Representation Learning Combined With Auxiliary Language Information (2022) 0.00