Applying Syntax–prosody Mapping Hypothesis And Prosodic Well-formedness Constraints To Neural Sequence-to-sequence Speech Synthesis
2022 · Kei Furukawa, Takeshi Kishiyama, Satoshi Nakamura
Abstract
End-to-end text-to-speech synthesis (TTS), which generates speech sounds directly from strings of text or phonemes, has improved the quality of speech synthesis over conventional TTS. However, most previous studies have relied on subjective naturalness for evaluation and have not objectively examined whether their models can reproduce the pitch patterns of Japanese phonological phenomena that reflect syntactic structure, such as downstep, rhythmic boost, and initial lowering. These phenomena can be explained linguistically by phonological constraints and the syntax–prosody mapping hypothesis (SPMH), which assumes projections from syntactic structures to the phonological hierarchy. Although some psycholinguistic experiments have verified the validity of the SPMH, it is crucial to investigate whether it can be implemented in TTS. To synthesize linguistic phenomena involving syntactic or phonological constraints, we propose a model using phonological symbols based on the SP
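To make concrete why a pitch pattern such as downstep carries structural information, the following toy Python sketch models downstep alone. It is not the authors' model, and it ignores rhythmic boost and initial lowering. It assumes that an accented accentual phrase lowers the F0 peak of the next phrase by a fixed ratio and that the pitch register resets at the start of a larger prosodic phrase, the kind of boundary the SPMH derives from a syntactic phrase edge. The names `AccentualPhrase` and `downstep_peaks` and the values `base_peak=200.0` Hz and `ratio=0.7` are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccentualPhrase:
    text: str
    accented: bool          # does this phrase carry a lexical pitch accent?
    new_major_phrase: bool  # hypothetical flag: a larger prosodic phrase starts here


def downstep_peaks(phrases, base_peak=200.0, ratio=0.7):
    """Toy downstep model (hypothetical parameters): return a nominal F0 peak (Hz) per phrase.

    - An accented phrase lowers the next phrase's peak by `ratio` (downstep).
    - An unaccented phrase does not trigger downstep.
    - The register resets to `base_peak` at the start of a new major phrase.
    """
    peaks = []
    current = base_peak
    prev_accented = False
    for i, phrase in enumerate(phrases):
        if i == 0 or phrase.new_major_phrase:
            current = base_peak  # register reset at a major prosodic boundary
        elif prev_accented:
            current *= ratio     # downstep after an accented phrase
        peaks.append(round(current, 1))
        prev_accented = phrase.accented
    return peaks


if __name__ == "__main__":
    # Three accented phrases inside one major phrase: each peak is stepped down.
    flat = [AccentualPhrase("a", True, False),
            AccentualPhrase("b", True, False),
            AccentualPhrase("c", True, False)]
    print(downstep_peaks(flat))   # [200.0, 140.0, 98.0]

    # Same phrases, but a syntactically induced boundary before "c" resets the register.
    reset = [AccentualPhrase("a", True, False),
             AccentualPhrase("b", True, False),
             AccentualPhrase("c", True, True)]
    print(downstep_peaks(reset))  # [200.0, 140.0, 200.0]
```

In this sketch, the same accent sequence yields different peak contours depending only on where the major-phrase boundaries fall. That is the sense in which a TTS model that reproduces downstep must, implicitly or explicitly, have access to syntactic or prosodic structure.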
Related papers
- Sequence To Sequence Neural Speech Synthesis With Prosody Modification Capabilities (2019) · 9.59
- Hierarchical Prosody Modeling For Non-autoregressive Speech Synthesis (2020) · 10.07
- Hierarchical Prosody Modeling And Control In Non-autoregressive Parallel Neural TTS (2021) · 8.35
- Investigation Of Learning Abilities On Linguistic Features In Sequence-to-sequence Text-to-speech Synthesis (2020) · 8.82
- Prosody-controllable Spontaneous TTS With Neural HMMs (2022) · 8.09
- Prosody Learning Mechanism For Speech Synthesis System Without Text Length Limit (2020) · 5.84
- Controllable Neural Text-to-speech Synthesis Using Intuitive Prosodic Features (2020) · 11.76
- PauseSpeech: Natural Speech Synthesis Via Pre-trained Language Model And Pause-based Prosody Modeling (2023) · 2.26