Human And Automatic Speech Recognition Performance On German Oral History Interviews
2022 · Michael Gref, Nike Matthiesen, Christoph Schmidt, et al.
Abstract
Automatic speech recognition systems have made remarkable improvements in transcription accuracy in recent years. In some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large the gap between human and machine transcription still is. For this purpose, we analyze and compare the transcriptions of three humans on a new oral history data set. We estimate a human word error rate (WER) of 8.7% for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model that achieves near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history interviews. We optimize our acoustic models by 5 to 8% relative for this task and achieve 23.9% WER on no
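The abstract reports results as word error rates and as relative improvements. As a minimal sketch of how these two quantities are commonly computed (not the authors' scoring pipeline, which may apply text normalization or use a different tool), the WER is the word-level edit distance between a reference and a hypothesis transcript divided by the number of reference words. The function names and the toy sentences below are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of reference words.

    Assumption: words are separated by whitespace and no normalization
    (casing, punctuation, number formats) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction of an adapted model over a baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer


# Toy example (made-up sentences, not data from the paper):
# one deleted word out of four reference words.
toy_wer = word_error_rate("das ist ein test", "das ist test")
```

Under this definition, two human transcripts of the same recording can be scored against each other in the same way as a machine transcript against a reference, which is how a human error rate can be placed on the same scale as an ASR error rate.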
Related papers
- Multi-staged Cross-lingual Acoustic Model Adaption For Robust Speech Recognition In Real-world Applications -- A Case Study On German Oral History Interviews (2020)
- A Comparative Analysis Of Bilingual And Trilingual Wav2vec Models For Automatic Speech Recognition In Multilingual Oral History Archives (2024)
- Exploring Methods For The Automatic Detection Of Errors In Manual Transcription (2019)
- Generating Human Readable Transcript For Automatic Speech Recognition With Pre-trained Language Model (2021)
- Towards The Evaluation Of Automatic Simultaneous Speech Translation From A Communicative Perspective (2021)
- Robustness Of End-to-end Automatic Speech Recognition Models -- A Case Study Using Mozilla Deepspeech (2021)
- English Accent Accuracy Analysis In A State-of-the-art Automatic Speech Recognition System (2021)
- Performance Improvements Of Probabilistic Transcript-adapted ASR With Recurrent Neural Network And Language-specific Constraints (2016)