One Model To Pronounce Them All: Multilingual Grapheme-to-phoneme Conversion With A Transformer Ensemble
2020 · Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg
Abstract
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis. As with other speech and language processing tasks, learning G2P models is challenging when only small training datasets are available. We describe a simple approach that exploits model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models were developed as part of our participation in SIGMORPHON 2020 Shared Task 1, which focuses on G2P. Our best models achieve a word error rate (WER) of 14.99 and a phoneme error rate (PER) of 3.30, a sizeable improvement over the competitive shared-task baselines.
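The abstract reports results as WER and PER and combines the predictions of several models. The sketch below assumes the common definitions of these metrics: WER is the percentage of words whose predicted pronunciation differs from the reference, and PER is the total phoneme-level Levenshtein distance divided by the total number of reference phonemes, times 100. As an illustrative ensembling rule it uses plain majority voting over whole predicted sequences. This is an assumption, not necessarily the combination scheme the authors used. The function names (`edit_distance`, `wer_per`, `majority_vote`) and the toy transcriptions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hyp prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # drop the hyp symbol h
                curr[j - 1] + 1,         # insert the ref symbol r
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]


def wer_per(hyps: List[List[str]], refs: List[List[str]]) -> Tuple[float, float]:
    """Return (WER, PER) as percentages, under the assumed definitions above."""
    word_errors = sum(h != r for h, r in zip(hyps, refs))
    phone_edits = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    total_phones = sum(len(r) for r in refs)
    return 100 * word_errors / len(refs), 100 * phone_edits / total_phones


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Hypothetical ensemble rule: most frequent whole-sequence prediction.

    Ties go to the prediction encountered first.
    """
    counts = Counter(tuple(p) for p in predictions)
    best, _ = counts.most_common(1)[0]
    return list(best)


# Toy example: three hypothetical models, two words.
refs = [["k", "æ", "t"], ["d", "ɒ", "ɡ"]]
ensemble_outputs = [
    [["k", "æ", "t"], ["k", "a", "t"], ["k", "æ", "t"]],
    [["d", "o", "ɡ"], ["d", "o", "ɡ"], ["d", "ɒ", "ɡ"]],
]
hyps = [majority_vote(outs) for outs in ensemble_outputs]
wer, per = wer_per(hyps, refs)
print(f"WER={wer:.2f} PER={per:.2f}")  # WER=50.00 PER=16.67
```

Voting over whole sequences keeps the sketch short. A token-level or confidence-weighted combination would be a different design choice.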
Related papers
- Massively Multilingual Neural Grapheme-to-phoneme Conversion (2017) · 9.76
- Transformer Based Grapheme-to-phoneme Conversion (2020) · 11.39
- Token-level Ensemble Distillation For Grapheme-to-phoneme Conversion (2019) · 10.35
- LiteG2P: A Fast, Light And High Accuracy Model For Grapheme-to-phoneme Conversion (2023) · 5.84
- Data-driven Grapheme-to-phoneme Representations For A Lexicon-free Text-to-speech (2024) · 4.52
- Improving Grapheme-to-phoneme Conversion Through In-context Knowledge Retrieval With Large Language Models (2024) · 2.26
- R-G2P: Evaluating And Enhancing Robustness Of Grapheme To Phoneme Conversion By Controlled Noise Introducing And Contextual Information Incorporation (2022) · 7.50
- G2G: TTS-driven Pronunciation Learning For Graphemic Hybrid ASR (2019) · 8.35