Data Processing For Optimizing Naturalness Of Vietnamese Text-to-speech System
2020 · Viet Lam Phung, Phan Huy Kinh, Anh Tuan Dinh, et al.
Abstract
End-to-end text-to-speech (TTS) systems have proved highly successful when a large amount of high-quality training data, recorded in an anechoic room with a high-quality microphone, is available. An alternative is to use readily available found data, such as radio broadcast news. We aim to optimize the naturalness of a TTS system trained on found data using a novel data processing method. The method consists of 1) utterance selection and 2) prosodic punctuation insertion, which together prepare training data that optimizes the naturalness of TTS systems. We showed that, with this data processing method, an end-to-end TTS system achieved a mean opinion score (MOS) of 4.1, compared to 4.3 for natural speech. We also showed that punctuation insertion contributed the most to this result. To facilitate the research and development of TTS systems, we distributed the processed data of one speaker at https://forms.gle/6Hk5YkqgDxAaC2BU6.
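The abstract does not spell out how prosodic punctuation insertion works, so the following is only a minimal sketch of one plausible reading: given word-level timestamps from a forced aligner, insert a comma or period wherever the silence between two words is long enough. The function name `insert_prosodic_punctuation`, the `(token, start, end)` alignment format, and both pause thresholds are assumptions for illustration, not the authors' method or values.

```python
from typing import List, Tuple

# Hypothetical alignment format: (token, start_sec, end_sec) per word.
Word = Tuple[str, float, float]


def insert_prosodic_punctuation(
    words: List[Word],
    comma_pause: float = 0.15,   # assumed threshold, not from the paper
    period_pause: float = 0.40,  # assumed threshold, not from the paper
) -> str:
    """Add punctuation after a word when the following silence is long."""
    marks = tuple(",.;:?!")
    pieces = []
    for i, (token, _start, end) in enumerate(words):
        piece = token
        if i + 1 < len(words):
            # Silence between the end of this word and the start of the next.
            pause = words[i + 1][1] - end
            # Leave tokens that already carry punctuation untouched.
            if not token.endswith(marks):
                if pause >= period_pause:
                    piece += "."
                elif pause >= comma_pause:
                    piece += ","
        pieces.append(piece)
    return " ".join(pieces)
```

The intent of such a step is that the punctuation in the transcript mirrors the pauses the speaker actually produced, so the TTS model can learn a consistent mapping between punctuation marks and pausing. In practice the thresholds would need tuning per speaker and recording style.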
Related papers
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)
- NaturalSpeech: End-to-end Text To Speech Synthesis With Human-level Quality (2022)
- Text Enhancement For Paragraph Processing In End-to-end Code-switching TTS (2022)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- An Automated End-to-end Open-source Software For High-quality Text-to-speech Dataset Generation (2024)
- Low-resource Mongolian Speech Synthesis Based On Automatic Prosody Annotation (2022)
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)