A Baseline Model For Computationally Inexpensive Speech Recognition For Kazakh Using The Coqui STT Framework
2021 · Ilnar Salimzianov
Abstract
Mobile devices are transforming the way people interact with computers, and speech interfaces to applications are ever more important. Recently published Automatic Speech Recognition systems are very accurate, but often require powerful machinery (specialised Graphics Processing Units) for inference, which makes them impractical to run on commodity devices, especially in streaming mode. Impressed by the accuracy of the baseline Kazakh ASR model of Khassanov et al. (2021), but dissatisfied with its inference times when not using a GPU, we trained a new baseline acoustic model (on the same dataset as the aforementioned paper) and three language models for use with the Coqui STT framework. Results look promising, but further epochs of training and parameter sweeping or, alternatively, limiting the vocabulary that the ASR system must support, are needed to reach production-level accuracy.
Related papers
- A Crowdsourced Open-source Kazakh Speech Corpus And Initial Speech Recognition Baseline (2020)
- A Study Of Multilingual End-to-end Speech Recognition For Kazakh, Russian, And English (2021)
- Gated Low-rank Adaptation For Personalized Code-switching Automatic Speech Recognition On The Low-spec Devices (2024)
- What Shall We Do With An Hour Of Data? Speech Recognition For The Un- And Under-served Languages Of Common Voice (2021)
- Personalized Speech Recognition On Mobile Devices (2016)
- Strategies For Improving Low Resource Speech To Text Translation Relying On Pre-trained ASR Models (2023)
- AlloST: Low-resource Speech Translation Without Source Transcription (2021)
- Dyn-ASR: Compact, Multilingual Speech Recognition Via Spoken Language And Accent Identification (2021)