Data-driven Grapheme-to-phoneme Representations For A Lexicon-free Text-to-speech
2024 · Abhinav Garg, Jiyeon Kim, Sushil Khyalia, et al.
Abstract
Grapheme-to-Phoneme (G2P) conversion is an essential first step in any modern, high-quality Text-to-Speech (TTS) system. Most current G2P systems rely on carefully hand-crafted lexicons developed by experts. This poses a two-fold problem. First, the lexicons are built on a fixed phoneme set, usually ARPABET or IPA, which may not be the optimal way to represent phonemes for every language. Second, producing such an expert lexicon takes a very large number of man-hours. In this paper, we eliminate both issues by using recent advances in self-supervised learning to obtain data-driven phoneme representations instead of fixed ones. We compare our lexicon-free approach against strong baselines that use a well-crafted lexicon. Furthermore, we show that our data-driven, lexicon-free method performs as well as or marginally better than conventional rule-based or lexicon-based neural G2Ps in terms of Mean Opinion Score (MOS) while using no prior language
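The abstract does not say how the data-driven phoneme representations are built. One common recipe for turning self-supervised speech features into discrete, phoneme-like units is to cluster frame-level encoder outputs with k-means and collapse consecutive repeats of the same cluster ID. The sketch below assumes that recipe purely for illustration; it is not presented as the authors' pipeline. The helpers `nearest`, `kmeans`, and `frames_to_units` are hypothetical, and the 2-D toy vectors stand in for real self-supervised features.

```python
from itertools import groupby


def nearest(frame, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(frame, centroids[c])),
    )


def kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means over equal-length feature vectors (assumed setup).

    Centroids start from the first k frames so the result is deterministic;
    a real system would use k-means++ and far more data.
    """
    centroids = [list(f) for f in frames[:k]]
    for _ in range(iters):
        # Assignment step: nearest centroid for every frame.
        labels = [nearest(f, centroids) for f in frames]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def frames_to_units(frames, centroids):
    """Quantise frames to cluster IDs and collapse consecutive repeats,
    yielding a phoneme-like unit sequence for one utterance."""
    labels = [nearest(f, centroids) for f in frames]
    return [label for label, _ in groupby(labels)]


if __name__ == "__main__":
    # Hypothetical toy "features": two well-separated groups of frames.
    train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
    centroids = kmeans(train, k=2)
    utterance = [[0.0, 0.0], [0.05, 0.0], [5.0, 5.0], [5.0, 5.05], [0.0, 0.02]]
    print(frames_to_units(utterance, centroids))  # [0, 1, 0]
```

Under this assumption, a lexicon-free G2P model would be trained to predict such unit sequences from text in place of ARPABET or IPA symbols, so no hand-built phoneme inventory is required.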
Related papers
- Massively Multilingual Neural Grapheme-to-phoneme Conversion (2017) · 9.76
- LiteG2P: A Fast, Light And High Accuracy Model For Grapheme-to-phoneme Conversion (2023) · 5.84
- One Model To Pronounce Them All: Multilingual Grapheme-to-phoneme Conversion With A Transformer Ensemble (2020) · 0.00
- G2G: TTS-driven Pronunciation Learning For Graphemic Hybrid ASR (2019) · 8.35
- Improving Grapheme-to-phoneme Conversion Through In-context Knowledge Retrieval With Large Language Models (2024) · 2.26
- R-G2P: Evaluating And Enhancing Robustness Of Grapheme To Phoneme Conversion By Controlled Noise Introducing And Contextual Information Incorporation (2022) · 7.50
- Token-level Ensemble Distillation For Grapheme-to-phoneme Conversion (2019) · 10.35
- Transformer Based Grapheme-to-phoneme Conversion (2020) · 11.39