A Toolbox For Construction And Analysis Of Speech Datasets
2021 · Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
Abstract
Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.
Related papers
- An Automated End-to-end Open-source Software For High-quality Text-to-speech Dataset Generation (2024)
- Crowdspeech And Voxdiy: Benchmark Datasets For Crowdsourced Audio Transcription (2021)
- Framework For Curating Speech Datasets And Evaluating ASR Systems: A Case Study For Polish (2024)
- Sd-eval: A Benchmark Dataset For Spoken Dialogue Understanding Beyond Words (2024)
- The People's Speech: A Large-scale Diverse English Speech Recognition Dataset For Commercial Usage (2021)
- SOMOS: The Samsung Open MOS Dataset For The Evaluation Of Neural Text-to-speech Synthesis (2022)
- Speechcraft: A Fine-grained Expressive Speech Dataset With Natural Language Description (2024)
- Speech Commands: A Dataset For Limited-vocabulary Speech Recognition (2018)