Challenging The Boundaries Of Speech Recognition: The MALACH Corpus
2019 · Michael Picheny, Zoltán Tüske, Brian Kingsbury, et al.
Abstract
Speech recognition has made huge progress over the last several years. On tasks once thought extremely difficult, such as SWITCHBOARD, systems now approach human-level performance. The MALACH corpus (LDC catalog LDC2012S05), a 375-hour subset of a large archive of Holocaust testimonies collected by the Survivors of the Shoah Visual History Foundation, presents significant challenges to the speech community. The collection consists of unconstrained, natural speech filled with disfluencies, heavy accents, age-related coarticulations, un-cued speaker and language switching, and emotional speech, all of which remain open problems for speech recognition systems. Transcription is challenging even for skilled human annotators. This paper proposes that the community focus on the MALACH corpus to develop speech recognition systems that are more robust to accents, disfluencies, and emotional speech. To lower the barrier to entry, a lexicon and training and testing setups have been c
Related papers
- TALCS: An Open-source Mandarin-English Code-switching Corpus And A Speech Recognition Baseline (2022)
- Merlion CCS Challenge: A English-Mandarin Code-switching Child-directed Speech Corpus For Language Identification And Diarization (2023)
- Transformer-based Automatic Speech Recognition Of Formal And Colloquial Czech In MALACH Project (2022)
- MSR-86K: An Evolving, Multilingual Corpus With 86,300 Hours Of Transcribed Audio For Speech Recognition Research (2024)
- DISPLACE Challenge: Diarization Of Speaker And Language In Conversational Environments (2023)
- Multi-staged Cross-lingual Acoustic Model Adaption For Robust Speech Recognition In Real-world Applications -- A Case Study On German Oral History Interviews (2020)
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021)
- Libriheavymix: A 20,000-hour Dataset For Single-channel Reverberant Multi-talker Speech Separation, ASR And Speaker Diarization (2024)