Prosody Learning Mechanism For Speech Synthesis System Without Text Length Limit
2020 · Zhen Zeng, Jianzong Wang, Ning Cheng, et al.
Abstract
Recent neural speech synthesis systems have increasingly focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together. In this paper, a prosody learning mechanism is proposed to model the prosody of speech in a TTS system, where the prosody information of speech is extracted from the mel-spectrum by a prosody learner and combined with the phoneme sequence to reconstruct the mel-spectrum. Meanwhile, semantic features of the text from a pre-trained language model are introduced to improve the prosody prediction results. In addition, a novel self-attention structure, named local attention, is proposed to lift the restriction on input text length; the relative position information of the sequence is modeled by relative position matrices, so position encodings are no longer needed. Experiments on English and Mandarin show that speech with mor
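The abstract does not give the formulation of local attention, so the following is only a minimal sketch of one common way to combine a local attention window with relative position embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function name `local_relative_attention`, the `window` parameter, the `rel_k` embedding table, and the use of identity query/key/value projections are all hypothetical choices made to keep the example short.

```python
import numpy as np

def local_relative_attention(x, rel_k, window):
    """Single-head self-attention restricted to a +/- `window` neighborhood.

    Assumed shapes (illustrative, not taken from the paper):
      x:      (T, d) input sequence
      rel_k:  (2 * window + 1, d) embeddings, one per relative offset
      window: largest distance |i - j| a position may attend over
    """
    T, d = x.shape
    # Relative offset j - i for every query/key pair.
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]    # (T, T)
    mask = np.abs(offsets) <= window
    # Map offsets to rows of rel_k; out-of-window pairs are clipped
    # here and masked out below, so their values never matter.
    idx = np.clip(offsets, -window, window) + window            # (T, T)
    content = x @ x.T                                           # query-key term
    position = np.einsum("id,ijd->ij", x, rel_k[idx])           # query-offset term
    scores = (content + position) / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

Because the position term depends only on the offset between two positions, the same `rel_k` table can be applied to a sequence of any length, which is the property the abstract attributes to replacing absolute position encodings with relative position matrices.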
Related papers
- Applying Syntax–prosody Mapping Hypothesis And Prosodic Well-formedness Constraints To Neural Sequence-to-sequence Speech Synthesis (2022) 0.00
- Controllable Neural Text-to-speech Synthesis Using Intuitive Prosodic Features (2020) 11.76
- Hierarchical Prosody Modeling For Non-autoregressive Speech Synthesis (2020) 10.07
- Prosody-controllable Spontaneous TTS With Neural Hmms (2022) 8.09
- Hierarchical Prosody Modeling And Control In Non-autoregressive Parallel Neural TTS (2021) 8.35
- Sequence To Sequence Neural Speech Synthesis With Prosody Modification Capabilities (2019) 9.59
- Adversarial Learning Of Intermediate Acoustic Feature For End-to-end Lightweight Text-to-speech (2022) 0.00
- Spontaneous Style Text-to-speech Synthesis With Controllable Spontaneous Behaviors Based On Language Models (2024) 7.81