KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
2019 · Egor Lakomkin, Sven Magg, Cornelius Weber, et al.
Abstract
In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, containing an estimated 3.5% word error rate in the transcriptions. Automatically collected samples contain reading and spontaneous speech recorded in various conditions including background noise and music, distant microphone recordings, and a variety of accents and reverberation. When training a deep neural network on speech recognition, we observed around 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me/) and the crawler code
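The abstract states transcription quality as a word error rate (WER) and the training benefit as a relative WER reduction. Below is a minimal sketch of how these two metrics are conventionally computed. It assumes whitespace tokenization and no text normalization (no lowercasing, no punctuation stripping). The function names are hypothetical, this is not the authors' evaluation code, and the example figures are invented for illustration rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance (hypothetical helper)."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.40 means the error rate fell by 40%."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words gives a WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Invented numbers: a drop from 20% to 12% WER is a 40% relative reduction.
    print(round(relative_reduction(0.20, 0.12), 2))
```

A relative reduction is measured against the baseline error rate, not in absolute percentage points. Under the invented numbers above, going from 20% to 12% WER is an 8-point absolute drop but a 40% relative one.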
Related papers
- JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification (2021)
- VoxLingua107: A Dataset for Spoken Language Recognition (2020)
- GigaSpeech 2: An Evolving, Large-Scale and Multi-Domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement (2024)
- CrowdSpeech and VoxDIY: Benchmark Datasets for Crowdsourced Audio Transcription (2021)
- Spot the Conversation: Speaker Diarisation in the Wild (2020)
- Recurrent Neural Network Transducer for Audio-Visual Speech Recognition (2019)
- MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research (2024)
- Transcription and Translation of Videos Using Fine-Tuned XLSR Wav2Vec2 on Custom Dataset and mBART (2024)