Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition
2022 · Mostafa Karimi, Changliang Liu, Kenichi Kumatani, et al.
Abstract
Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These improvements have mostly been reported on highly curated datasets such as LibriSpeech for non-streaming End-to-End ASR models. However, the pivotal characteristic of SSL is that it can be utilized for any untranscribed audio data. In this paper, we provide a full exploration of how to utilize uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. More specifically, we present (1) the effect of an Audio Event Detection (AED) model in the data pre-processing pipeline, (2) an analysis of the choice of optimizer and learning rate scheduling, (3) a comparison of recently developed contrastive losses, (4) a comparison of various pre-training strategies such as utilization of in-domain versus out-domain pre-training data, monolingual versus multilingual pre-training data, multi-head multilingual SSL versus single-head multilingual SSL and supervised pr
Related papers
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022)
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)
- Boosting Cross-domain Speech Recognition With Self-supervision (2022)
- Multi-variant Consistency Based Self-supervised Learning For Robust Automatic Speech Recognition (2021)
- Investigating Self-supervised Learning For Speech Enhancement And Separation (2022)
- Exploration Of Efficient End-to-end ASR Using Discretized Input From Self-supervised Learning (2023)
- Comparing Self-supervised Learning Models Pre-trained On Human Speech And Animal Vocalizations For Bioacoustics Processing (2025)