Unsupervised Automatic Speech Recognition: A Review
2021 · Hanan Aldarmaki, Asad Ullah, Nazar Zaki
Abstract
Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised segmentation of the speech signal, unsupervised mapping from speech segments to text, and semi-supervised models with nominal amounts of labeled examples. The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. Identifying these limitations would help optimize the resources and efforts in ASR development for low-resource languages.
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models (2024)
- Towards Unsupervised Automatic Speech Recognition Trained By Unaligned Speech And Text Only (2018)
- BigSSL: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021)
- Unsupervised Active Learning: Optimizing Labeling Cost-effectiveness For Automatic Speech Recognition (2023)
- REBORN: Reinforcement-learned Boundary Segmentation With Iterative Training For Unsupervised ASR (2024)
- Visualizing Automatic Speech Recognition -- Means For A Better Understanding? (2022)
- Leveraging Data Collection And Unsupervised Learning For Code-switched Tunisian Arabic Automatic Speech Recognition (2023)
- Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019)