ASR Performance Prediction On Unseen Broadcast Programs Using Convolutional Neural Networks
2018 · Zied Elloumi, Laurent Besacier, Olivier Galibert, et al.
Abstract
In this paper, we address a relatively new task: predicting ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction method based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We focus in particular on combining textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, combining both inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction predicts the WER distribution on a collection of speech recordings remarkably well.
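For readers unfamiliar with the quantity being predicted, the following is a minimal sketch of how word error rate (WER) is conventionally computed from a reference transcript and an ASR hypothesis, using word-level edit distance. The function names `word_error_rate` and `mean_absolute_error` are illustrative and not taken from the paper, and scoring predictions with mean absolute error is an assumption made here for illustration; the abstract does not say which metric the authors use.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed scoring metric: average absolute gap between predicted and true WER."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

A performance predictor of the kind the abstract describes would estimate the output of `word_error_rate` without access to the reference transcript, relying only on the ASR output and the audio signal.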
Related papers
- Analyzing Learned Representations Of A Deep ASR Performance Prediction Model (2018)
- Predicting Word Error Rate For Reverberant Speech (2019)
- ASAPP-ASR: Multistream CNN And Self-attentive SRU For SOTA Speech Recognition (2020)
- Improving RNN Transducer Based ASR With Auxiliary Tasks (2020)
- A Comparison Of Semi-supervised Learning Techniques For Streaming ASR At Scale (2023)
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022)
- Multilingual Audio-visual Speech Recognition With Hybrid CTC/RNN-T Fast Conformer (2024)
- Analyzing Large Receptive Field Convolutional Networks For Distant Speech Recognition (2019)