Toward Practical Automatic Speech Recognition And Post-processing: A Call For Explainable Error Benchmark Guideline
2024 · Seonmin Koo, Chanjun Park, Jinsung Kim, et al.
Abstract
Automatic speech recognition (ASR) outcomes serve as input for downstream tasks, substantially impacting the satisfaction level of end-users. Hence, the diagnosis and enhancement of the vulnerabilities present in the ASR model bear significant importance. However, traditional evaluation methodologies of ASR systems generate a singular, composite quantitative metric, which fails to provide comprehensive insight into specific vulnerabilities. This lack of detail extends to the post-processing stage, resulting in further obfuscation of potential weaknesses. Despite an ASR model's ability to recognize utterances accurately, subpar readability can negatively affect user satisfaction, giving rise to a trade-off between recognition accuracy and user-friendliness. To effectively address this, it is imperative to consider both the speech-level, crucial for recognition accuracy, and the text-level, critical for user-friendliness. Consequently, we propose the development of an Error Explainable B
Related papers
- Speechcolab Leaderboard: An Open-source Platform For Automatic Speech Recognition Evaluation (2024) 9.05
- Open ASR Leaderboard: Towards Reproducible And Transparent Multilingual And Long-form Speech Recognition Evaluation (2025) 0.00
- Acoustics-guided Evaluation (AGE): A New Measure For Estimating Performance Of Speech Enhancement Algorithms For Robust ASR (2018) 0.00
- How Bad Are Artifacts?: Analyzing The Impact Of Speech Enhancement Errors On ASR (2022) 13.17
- Cross-modal ASR Post-processing System For Error Correction And Utterance Rejection (2022) 0.00
- ASR-GLUE: A New Multi-task Benchmark For ASR-robust Natural Language Understanding (2021) 0.00
- ASR Error Management For Improving Spoken Language Understanding (2017) 9.92
- Rethinking Processing Distortions: Disentangling The Impact Of Speech Enhancement Errors On Speech Recognition Performance (2024) 8.35