Phonetic And Graphemic Systems For Multi-genre Broadcast Transcription
2018 · Yu Wang, Xie Chen, Mark Gales, et al.
Abstract
State-of-the-art English automatic speech recognition systems typically use phonetic rather than graphemic lexicons. Graphemic systems are known to perform less well for English, as the mapping from the written form to the spoken form is complicated. However, in recent years the representational power of deep-learning-based acoustic models has improved, raising interest in graphemic acoustic models for English, due to the simplicity of generating the lexicon. In this paper, phonetic and graphemic models are compared for an English Multi-Genre Broadcast transcription task. A range of acoustic models based on lattice-free MMI training are constructed using phonetic and graphemic lexicons. For this task, it is found that having a long-span temporal history reduces the difference in performance between the two forms of models. In addition, system combination is examined, using parameter smoothing and hypothesis combination. As the combination approaches become more complicated the differenc
Related papers
- Analyzing Phonetic And Graphemic Representations In End-to-end Automatic Speech Recognition (2019)
- G2G: TTS-driven Pronunciation Learning For Graphemic Hybrid ASR (2019)
- On The Choice Of Modeling Unit For Sequence-to-sequence Speech Recognition (2019)
- A Systematic Comparison Of Grapheme-based Vs. Phoneme-based Label Units For Encoder-decoder-attention Models (2020)
- From Senones To Chenones: Tied Context-dependent Graphemes For Hybrid Speech Recognition (2019)
- A Two-stage Transliteration Approach To Improve Performance Of A Multilingual ASR (2024)
- English Accent Accuracy Analysis In A State-of-the-art Automatic Speech Recognition System (2021)
- Combining Frame-synchronous And Label-synchronous Systems For Speech Recognition (2021)