An Objective Evaluation Of The Effects Of Recording Conditions And Speaker Characteristics In Multi-speaker Deep Neural Speech Synthesis
2021 Β· Beata Lorincz, Adriana Stan, Mircea Giurgiu
Abstract
Multi-speaker spoken datasets enable the creation of text-to-speech synthesis (TTS) systems which can output several voice identities. The multi-speaker (MSPK) scenario also enables the use of fewer training samples per speaker. However, in the resulting acoustic model, not all speakers exhibit the same synthetic quality, and some of the voice identities cannot be used at all. In this paper we evaluate the influence of the recording conditions, speaker gender, and speaker particularities over the quality of the synthesised output of a deep neural TTS architecture, namely Tacotron2. The evaluation is possible due to the use of a large Romanian parallel spoken corpus containing over 81 hours of data. Within this setup, we also evaluate the influence of different types of text representations: orthographic, phonetic, and phonetic extended with syllable boundaries and lexical stress markings. We evaluate the results of the MSPK system using the objective measures of equal error rate (E
Authors
(none)
Tags
Stats
Related papers
- Comparison Of Speech Representations For Automatic Quality Estimation In Multi-speaker Text-to-speech Synthesis (2020)0.00
- Deep Voice 2: Multi-speaker Neural Text-to-speech (2017)0.00
- Modeling Multi-speaker Latent Space To Improve Neural TTS: Quick Enrolling New Speaker And Enhancing Premium Voice (2018)0.00
- Training Multi-speaker Neural Text-to-speech Systems Using Speaker-imbalanced Speech Corpora (2019)8.09
- Improving The Quality Of Neural TTS Using Long-form Content And Multi-speaker Multi-style Modeling (2022)3.58
- Combining Speakers Of Multiple Languages To Improve Quality Of Neural Voices (2021)5.24
- Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios (2021)6.77
- Msdtron: A High-capability Multi-speaker Speech Synthesis System For Diverse Data Using Characteristic Information (2021)4.52