Automatic Speech Recognition System-independent Word Error Rate Estimation
2024 · Chanho Park, Mingjie Chen, Thomas Hain
Abstract
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-