Teles: Temporal Lexeme Similarity Score To Estimate Confidence In End-to-end ASR
2024 · Nagarathna Ravi, Thishyan Raj T, Vipul Arora
Abstract
Confidence estimation of predictions from an End-to-End (E2E) Automatic Speech Recognition (ASR) model benefits ASR's downstream and upstream tasks. Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions. An ancillary Confidence Estimation Model (CEM) calibrates the predictions. State-of-the-art (SOTA) solutions use binary target scores for CEM training. However, the binary labels do not reveal the granular information of predicted words, such as temporal alignment between reference and hypothesis and whether the predicted word is entirely incorrect or contains spelling errors. Addressing this issue, we propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train CEM. To address the data imbalance of target scores while training CEM, we use shrinkage loss to focus on hard-to-learn data points and minimise the impact of easily learned data points. We conduct experiments with ASR models trained in three languages
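The abstract names shrinkage loss but does not give its form. As a minimal sketch, assuming the commonly used regression formulation l^2 / (1 + exp(a * (c - l))) with l = |prediction - target|, the snippet below shows how small (easily learned) errors are damped while large (hard) errors are kept close to plain squared error. The function names and the values a = 10.0 and c = 0.2 are illustrative assumptions, not settings taken from the paper.

```python
import math


def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Shrinkage-modulated squared error for a single confidence score.

    Assumed form: l^2 / (1 + exp(a * (c - l))), where l = |pred - target|.
    `a` (shrinkage speed) and `c` (localisation point) are hypothetical values.
    """
    l = abs(pred - target)
    return (l * l) / (1.0 + math.exp(a * (c - l)))


def mean_shrinkage_loss(preds, targets, a=10.0, c=0.2):
    """Average the per-word loss over a batch of predicted and target scores."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    total = sum(shrinkage_loss(p, t, a, c) for p, t in zip(preds, targets))
    return total / len(preds)


if __name__ == "__main__":
    # An easy example (error 0.05) versus a hard example (error 0.80).
    for pred, target in [(0.95, 1.00), (0.20, 1.00)]:
        squared = (pred - target) ** 2
        shrunk = shrinkage_loss(pred, target)
        print(f"error={abs(pred - target):.2f} "
              f"squared={squared:.4f} shrinkage={shrunk:.4f}")
```

Under these assumed settings, the sigmoid-like factor is near zero for errors well below c and near one for errors well above it, which is the sense in which the loss concentrates training on hard-to-learn data points.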
Related papers
- Confidence Estimation For Attention-based Sequence-to-sequence Models For Speech Recognition (2020) 11.49
- Accurate And Reliable Confidence Estimation Based On Non-autoregressive End-to-end Speech Recognition System (2023) 4.52
- An Evaluation Of Word-level Confidence Estimation For End-to-end Automatic Speech Recognition (2021) 0.00
- Multi-task Learning For End-to-end ASR Word And Utterance Confidence With Deletion Prediction (2021) 7.50
- Sequence-level Confidence Classifier For ASR Utterance Accuracy And Application To Acoustic Models (2021) 5.24
- Fast Entropy-based Methods Of Word-level Confidence Estimation For End-to-end Automatic Speech Recognition (2022) 7.16
- Improving Tail Performance Of A Deliberation E2E ASR Model Using A Large Text Corpus (2020) 10.21
- Semantic-aware Confidence Calibration For Automated Audio Captioning (2025) 0.00