A Reference-less Quality Metric For Automatic Speech Recognition Via Contrastive-learning Of A Multi-language Model With Self-supervision
2023 · Kamer Ali Yuksel, Thiago Ferreira, Ahmet Gunduz, et al.
Abstract
The common standard for evaluating the quality of automatic speech recognition (ASR) systems is reference-based metrics such as the Word Error Rate (WER), which are computed against manual ground-truth transcriptions that are time-consuming and expensive to obtain. This work proposes a multi-language referenceless quality metric that allows the performance of different ASR models to be compared on a speech dataset without ground-truth transcriptions. To estimate the quality of ASR hypotheses, a pre-trained language model (LM) is fine-tuned with contrastive learning in a self-supervised manner. In experiments conducted on several unseen test datasets consisting of outputs from top commercial ASR engines in various languages, the proposed referenceless metric obtains a much higher correlation with WER scores and their ranks than the perplexity metric from the state-of-the-art multilingual LM in all experiments, and also reduces WER by more than 7% when used for ensembling hypotheses. The fine-tun
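To make the evaluation setup concrete, here is a minimal Python sketch of the reference-based WER that the abstract uses as the baseline, plus one simple way to check whether a referenceless quality score orders hypotheses the same way WER does. The function names `word_error_rate` and `pairwise_agreement` are hypothetical, and the pairwise agreement check is an illustrative assumption, not the correlation measure or the fine-tuned model from the paper.

```python
from itertools import combinations


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pairwise_agreement(quality_scores, wers) -> float:
    """Fraction of hypothesis pairs that a quality score ranks like WER does.

    Assumes a higher quality score should correspond to a lower WER.
    Pairs with identical WER are skipped because they have no defined order.
    """
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(wers)), 2):
        if wers[i] == wers[j]:
            continue
        comparable += 1
        # Concordant when the better-scored hypothesis also has the lower WER.
        if (quality_scores[i] - quality_scores[j]) * (wers[j] - wers[i]) > 0:
            concordant += 1
    if comparable == 0:
        raise ValueError("no pairs with distinct WER values")
    return concordant / comparable
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words. A referenceless metric is useful for comparing ASR systems to the extent that its scores, which need no reference, reproduce the ordering that WER would give.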
Related papers
- NoRefER: A Referenceless Quality Metric For Automatic Speech Recognition Via Semi-supervised Language Model Fine-tuning With Contrastive Learning (2023) 0.00
- Evaluating User Perception Of Speech Recognition System Quality With Semantic Distance Metric (2021) 6.77
- ML-LMCL: Mutual Learning And Large-margin Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2023) 0.00
- MetricNet: Towards Improved Modeling For Non-intrusive Speech Quality Assessment (2021) 0.00
- Semantic-WER: A Unified Metric For The Evaluation Of ASR Transcript For End Usability (2021) 0.00
- Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2022) 6.34
- Automatic Quality Assessment For Speech Translation Using Joint ASR And MT Features (2016) 4.52
- H_eval: A New Hybrid Evaluation Metric For Automatic Speech Recognition Tasks (2022) 6.34