JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification
2021 · Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, et al.
Abstract
In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large speech corpora, open-source corpora of this kind have not yet been established for languages other than English. We describe how the corpus is built from YouTube videos and their subtitles for speech recognition and speaker verification. Our method can automatically filter the videos and subtitles with almost no language-dependent processing. We consistently employ Connectionist Temporal Classification (CTC)-based techniques for automatic speech recognition (ASR) and a speaker variation-based method for automatic speaker verification (ASV). We build 1) a large-scale Japanese ASR benchmark with more than 1,300 hours of data and 2) 900 hours of data for Japanese ASV.
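The abstract names the two filtering criteria without spelling them out. The following is a minimal Python sketch of both ideas, assuming that per-frame log-posteriors from some CTC acoustic model and per-utterance speaker embeddings (e.g. x-vectors) have already been computed. The ASR side scores how well a subtitle matches its audio with the standard CTC forward algorithm, normalized by the number of frames; this is a simplified stand-in, not necessarily the exact CTC-based confidence score used in the JTubeSpeech pipeline. The ASV side measures within-video speaker variation as the mean cosine distance of utterance embeddings to their centroid. All function names and both thresholds are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs: List[List[float]], labels: List[int], blank: int = 0) -> float:
    """log P(labels | audio) via the CTC forward algorithm.

    log_probs: T x V per-frame log-posteriors from a CTC model (T >= 1).
    labels: subtitle token ids (must not contain the blank id).
    """
    ext = [blank]
    for token in labels:
        ext += [token, blank]  # interleave blanks: _ a _ b _ ...
    alpha = [NEG_INF] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = []
        for s, symbol in enumerate(ext):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            if s >= 2 and symbol != blank and symbol != ext[s - 2]:
                terms.append(alpha[s - 2])  # skip the blank between distinct tokens
            new_alpha.append(logsumexp(terms) + frame[symbol])
        alpha = new_alpha
    # A valid path ends on the last token or on the trailing blank.
    return logsumexp(alpha[-2:])


def ctc_score(log_probs: List[List[float]], labels: List[int], blank: int = 0) -> float:
    """Per-frame normalized CTC log-likelihood; higher means a better audio/subtitle match."""
    return ctc_log_likelihood(log_probs, labels, blank) / len(log_probs)


def speaker_variation(embeddings: List[List[float]]) -> float:
    """Mean cosine distance of utterance embeddings to their centroid (0 = identical)."""
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    return sum(1.0 - cosine(e, centroid) for e in embeddings) / len(embeddings)


# Hypothetical thresholds, chosen only for illustration.
CTC_THRESHOLD = -1.0
VARIATION_THRESHOLD = 0.1

if __name__ == "__main__":
    half = math.log(0.5)
    frames = [[half, half], [half, half]]  # 2 frames, vocabulary {blank=0, "a"=1}
    # Valid alignments of "a" over 2 frames: "aa", "a_", "_a", each with probability 0.25.
    print(math.exp(ctc_log_likelihood(frames, [1])))
    print(ctc_score(frames, [1]) > CTC_THRESHOLD)  # keep this subtitle?
    print(speaker_variation([[1.0, 0.0], [0.9, 0.1]]) < VARIATION_THRESHOLD)  # single speaker?
```

Normalizing by the number of frames keeps long and short utterances comparable under a single threshold. The centroid-distance measure is only one plausible way to quantify "speaker variation"; pairwise similarity or embedding variance would serve the same purpose.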
Related papers
- JVS Corpus: Free Japanese Multi-Speaker Voice Corpus (2019)
- KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos (2019)
- GigaSpeech 2: An Evolving, Large-Scale and Multi-Domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement (2024)
- A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition (2021)
- MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research (2024)
- Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM (2017)
- ANIM-400K: A Large-Scale Dataset for Automated End-to-End Dubbing of Video (2024)
- End-to-End Multimodal Speech Recognition (2018)