Analyzing Phonetic And Graphemic Representations In End-to-end Automatic Speech Recognition
2019 · Yonatan Belinkov, Ahmed Ali, James Glass
Abstract
End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of
Related papers
- Analyzing Hidden Representations In End-to-end Automatic Speech Recognition Systems (2017)
- What Does A Network Layer Hear? Analyzing Hidden Representations Of End-to-end ASR Through Speech Synthesis (2019)
- Visualizing Automatic Speech Recognition -- Means For A Better Understanding? (2022)
- Phonetic And Graphemic Systems For Multi-genre Broadcast Transcription (2018)
- A Systematic Comparison Of Grapheme-based Vs. Phoneme-based Label Units For Encoder-decoder-attention Models (2020)
- A Comparison Of End-to-end Models For Long-form Speech Recognition (2019)
- A Two-stage Transliteration Approach To Improve Performance Of A Multilingual ASR (2024)
- Analyzing Analytical Methods: The Case Of Phonology In Neural Models Of Spoken Language (2020)