Transformer Based Grapheme-to-phoneme Conversion
2020 · Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth
Abstract
The attention mechanism is one of the most successful techniques in deep learning-based Natural Language Processing (NLP). The transformer network architecture is based entirely on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without using recurrent or convolutional layers. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) to their pronunciation (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network-based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and on the NetTalk dataset. The results show that transformer-based G2P outperforms the convolutional-based approach in terms of word error rate and our results significantly exceed
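The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in G2P work: phoneme error rate (PER) is the total edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and word error rate (WER) is the share of words whose predicted pronunciation differs from the reference. The function names and the toy ARPAbet-style data are hypothetical, and the sketch assumes a single reference pronunciation per word, which may differ from the paper's exact evaluation protocol.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Return (PER, WER) in percent for (reference, prediction) pairs."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_phonemes = sum(len(ref) for ref, _ in pairs)
    wrong_words = sum(1 for ref, hyp in pairs if list(ref) != list(hyp))
    per = 100.0 * total_edits / total_phonemes
    wer = 100.0 * wrong_words / len(pairs)
    return per, wer


# Hypothetical toy data: (reference phonemes, predicted phonemes).
pairs = [
    (["K", "AE", "T"], ["K", "AE", "T"]),         # correct
    (["D", "AO", "G"], ["D", "AA", "G"]),         # one substitution
    (["F", "IH", "SH"], ["F", "IH", "SH", "T"]),  # one insertion
]
per, wer = g2p_error_rates(pairs)
print(f"PER {per:.1f}%  WER {wer:.1f}%")  # PER 22.2%  WER 66.7%
```

In this toy case two edits over nine reference phonemes give a PER of about 22.2%, while two of three words being wrong gives a WER of about 66.7%, which shows why WER is the stricter of the two measures.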
Related papers
- One Model To Pronounce Them All: Multilingual Grapheme-to-phoneme Conversion With A Transformer Ensemble (2020)
- Massively Multilingual Neural Grapheme-to-phoneme Conversion (2017)
- LiteG2P: A Fast, Light And High Accuracy Model For Grapheme-to-phoneme Conversion (2023)
- GraphSpeech: Syntax-aware Graph Attention Network For Neural Speech Synthesis (2020)
- R-g2p: Evaluating And Enhancing Robustness Of Grapheme To Phoneme Conversion By Controlled Noise Introducing And Contextual Information Incorporation (2022)
- Token-level Ensemble Distillation For Grapheme-to-phoneme Conversion (2019)
- Improving Grapheme-to-phoneme Conversion Through In-context Knowledge Retrieval With Large Language Models (2024)
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019)