What Shall We Do With An Hour Of Data? Speech Recognition For The Un- And Under-served Languages Of Common Voice
2021 · Francis M. Tyers, Josh Meyer
Abstract
This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. We outline the preprocessing steps, hyperparameter selection, and resulting accuracy on the official test sets. We also evaluate the models on four tasks: closed-vocabulary speech recognition, pre-transcription, forced alignment, and keyword spotting. The experiments use Coqui STT, a toolkit for training and deploying neural speech-to-text models.
Related papers
- Speech2phone: A Novel And Efficient Method For Training Speaker Recognition Models (2020)
- Less Is More: Accurate Speech Recognition & Translation Without Web-scale Data (2024)
- Spgispeech: 5,000 Hours Of Transcribed Financial Audio For Fully Formatted End-to-end Speech Recognition (2021)
- Leveraging Translations For Speech Transcription In Low-resource Settings (2018)
- Pretraining Approaches For Spoken Language Recognition: Taltech Submission To The OLR 2021 Challenge (2022)
- Automatic Speech Recognition Advancements For Indigenous Languages Of The Americas (2024)
- Google Crowdsourced Speech Corpora And Related Open-source Resources For Low-resource Languages And Dialects: An Overview (2020)
- Squid: Measuring Speech Naturalness In Many Languages (2022)