Exploration Of Efficient End-to-end ASR Using Discretized Input From Self-supervised Learning
2023 Β· Xuankai Chang, Brian Yan, Yuya Fujita, et al.
Abstract
Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued features for downstream tasks, there is potential in exploring alternative approaches that use discretized token sequences. This approach offers benefits such as lower storage requirements and the ability to apply techniques from natural language processing. In this paper, we propose a new protocol that utilizes discretized token sequences in ASR tasks, which includes de-duplication and sub-word modeling to enhance the input sequence. It reduces computational cost by decreasing the length of the sequence. Our experiments on the LibriSpeech dataset demonstrate that our proposed protocol performs competitively with conventional ASR systems using continuous input features, while reducing computational and storage costs.
Authors
(none)
Tags
Stats
Related papers
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)8.35
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022)0.00
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)0.00
- Fusion Of Discrete Representations And Self-augmented Representations For Multilingual Automatic Speech Recognition (2024)2.26
- Efficient Extraction Of Noise-robust Discrete Units From Self-supervised Speech Models (2024)0.00
- How To Learn A New Language? An Efficient Solution For Self-supervised Learning Models Unseen Languages Adaption In Low-resource Scenario (2024)0.00
- Investigating Self-supervised Learning For Speech Enhancement And Separation (2022)13.44
- Downstream Task Agnostic Speech Enhancement With Self-supervised Representation Loss (2023)6.77