Fusion Of Discrete Representations And Self-augmented Representations For Multilingual Automatic Speech Recognition
2024 · Shih-Heng Wang, Jiatong Shi, Chien-Yu Huang, et al.
Abstract
Self-supervised learning (SSL) models have shown exceptional capabilities across various speech-processing tasks. Continuous SSL representations are effective but incur high computational and storage costs. Discrete SSL representations, although they degrade performance, reduce transmission and storage costs and shorten input sequences through de-duplication and subword modeling. To boost the performance of discrete representations for ASR, we introduce a novel fusion mechanism that integrates two discrete representations. The fusion mechanism preserves all the benefits of discrete representations while enhancing the model's performance by integrating complementary information. Additionally, we explore "self-augmented" discrete representations, which apply transformations to a single continuous SSL representation, eliminating the fusion mechanism's dependency on multiple SSL models and further decreasing its inference costs. Experimental results o
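The de-duplication step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that discrete units are integer cluster IDs assigned per frame and that de-duplication means collapsing runs of identical consecutive IDs; the function name and the sample IDs are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def deduplicate(units):
    """Collapse each run of identical consecutive unit IDs into one ID."""
    return [unit for unit, _ in groupby(units)]


# Hypothetical frame-level unit IDs (illustrative values only).
frame_units = [7, 7, 7, 12, 12, 3, 7, 7]
print(deduplicate(frame_units))  # [7, 12, 3, 7]
```

Only adjacent repeats are merged, so a unit that reappears later in the sequence is kept; the shortened sequence is what a subword model would then consume.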
Related papers
- Exploring Effective Fusion Algorithms For Speech Based Self-supervised Learning Models (2022) 0.00
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024) 0.00
- EFFUSE: Efficient Self-supervised Feature Fusion For E2E ASR In Low Resource And Multilingual Scenarios (2023) 6.34
- SSHR: Leveraging Self-supervised Hierarchical Representations For Multilingual Automatic Speech Recognition (2023) 0.00
- Exploration Of Efficient End-to-end ASR Using Discretized Input From Self-supervised Learning (2023) 12.02
- Combining Spectral And Self-supervised Features For Low Resource Speech Recognition And Translation (2022) 8.82
- MMM: Multi-layer Multi-residual Multi-stream Discrete Speech Representation From Self-supervised Learning Model (2024) 6.77
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022) 0.00