Improving Grapheme-to-phoneme Conversion Through In-context Knowledge Retrieval With Large Language Models
2024 · Dongrui Han, Mingyu Cui, Jiawen Kang, et al.
Abstract
Grapheme-to-phoneme (G2P) conversion is a crucial step in Text-to-Speech (TTS) systems, responsible for mapping graphemes to their corresponding phonetic representations. However, it suffers from ambiguity: the same grapheme can represent multiple phonemes depending on context, which makes conversion challenging. Inspired by the remarkable success of Large Language Models (LLMs) in handling context-aware scenarios, contextual G2P conversion systems that use LLMs' in-context knowledge retrieval (ICKR) capabilities are proposed to improve disambiguation. The efficacy of incorporating ICKR into G2P conversion systems is demonstrated thoroughly on the Librig2p dataset. In particular, the best contextual G2P conversion system using ICKR outperforms the baseline with weighted average phoneme error rate (PER) reductions of 2.0% absolute (28.9% relative). Using GPT-4 in the ICKR system yields an increase of 3.5% absolute (3.8% relative) on the Librig2p dataset.
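The abstract reports results as phoneme error rate (PER), in both absolute and relative terms. The following is a minimal sketch of how such figures are commonly computed, assuming PER is the total Levenshtein edit distance between reference and predicted phoneme sequences divided by the total number of reference phonemes. The function names and the ARPAbet-style example are illustrative assumptions, not the authors' evaluation code, and the paper's weighted averaging across test subsets is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit errors divided by total reference phonemes (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(baseline, system):
    """Relative reduction: the absolute reduction as a fraction of the baseline."""
    return (baseline - system) / baseline


# Hypothetical homograph example: "read" as present tense vs. past tense.
refs = [["R", "IY", "D"], ["R", "EH", "D"]]
hyps = [["R", "EH", "D"], ["R", "EH", "D"]]  # first prediction picks the wrong vowel
per = phoneme_error_rate(refs, hyps)         # one substitution over six phonemes
```

Under this definition, an absolute reduction is the difference between two PER values, and the relative reduction divides that difference by the baseline PER, which is why the two figures quoted in the abstract differ so much in magnitude.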
Related papers
- r-G2P: Evaluating And Enhancing Robustness Of Grapheme To Phoneme Conversion By Controlled Noise Introducing And Contextual Information Incorporation (2022)
- LiteG2P: A Fast, Light And High Accuracy Model For Grapheme-to-phoneme Conversion (2023)
- Massively Multilingual Neural Grapheme-to-phoneme Conversion (2017)
- Data-driven Grapheme-to-phoneme Representations For A Lexicon-free Text-to-speech (2024)
- One Model To Pronounce Them All: Multilingual Grapheme-to-phoneme Conversion With A Transformer Ensemble (2020)
- External Knowledge Augmented Polyphone Disambiguation Using Large Language Model (2023)
- G2G: TTS-driven Pronunciation Learning For Graphemic Hybrid ASR (2019)
- Transformer Based Grapheme-to-phoneme Conversion (2020)