PauseSpeech: Natural Speech Synthesis Via Pre-trained Language Model And Pause-based Prosody Modeling
2023 · Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee
Abstract
Although text-to-speech (TTS) systems have improved significantly, most still struggle to synthesize speech with appropriate phrasing. Natural speech synthesis requires a phrasing structure that groups words into phrases based on semantic information. In this paper, we propose PauseSpeech, a speech synthesis system with a pre-trained language model and pause-based prosody modeling. First, we introduce a phrasing structure encoder that uses a context representation from the pre-trained language model. In the phrasing structure encoder, we extract a speaker-dependent syntactic representation from the context representation and then predict a pause sequence that separates the input text into phrases. Second, we introduce a pause-based word encoder that models word-level prosody based on the pause sequence. Experimental results show that PauseSpeech outperforms previous models in terms of naturalness. Furthermore, in terms of obj
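To make the idea of a pause sequence concrete, the following is a minimal sketch of how a per-word pause label could split text into phrases. It is an illustration only, not the authors' implementation: the function name `split_into_phrases`, the binary label format (1 meaning a pause follows the word), and the example labels are all assumptions. In PauseSpeech the pause sequence is predicted by the phrasing structure encoder; here it is simply given as input.

```python
from typing import List


def split_into_phrases(words: List[str], pauses: List[int]) -> List[List[str]]:
    """Group words into phrases using a per-word pause label.

    Assumed convention (hypothetical): pauses[i] == 1 means a pause
    follows words[i]; 0 means the phrase continues.
    """
    if len(words) != len(pauses):
        raise ValueError("words and pauses must have the same length")

    phrases: List[List[str]] = []
    current: List[str] = []
    for word, pause in zip(words, pauses):
        current.append(word)
        if pause:
            # A pause closes the current phrase.
            phrases.append(current)
            current = []
    if current:
        # Words after the last pause form the final phrase.
        phrases.append(current)
    return phrases


words = ["the", "cat", "sat", "on", "the", "mat"]
pauses = [0, 1, 0, 0, 0, 0]  # hand-written labels, not model output
phrases = split_into_phrases(words, pauses)
```

With these hand-written labels the six words fall into two phrases, "the cat" and "sat on the mat". A word-level prosody model could then condition on which phrase each word belongs to.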
Related papers
- Duration-aware Pause Insertion Using Pre-trained Language Model For Multi-speaker Text-to-speech (2023)
- Leveraging The Interplay Between Syntactic And Acoustic Cues For Optimizing Korean TTS Pause Formation (2024)
- ProsodyFM: Unsupervised Phrasing And Intonation Control For Intelligible Speech Synthesis (2024)
- Applying Syntax–prosody Mapping Hypothesis And Prosodic Well-formedness Constraints To Neural Sequence-to-sequence Speech Synthesis (2022)
- Sequence To Sequence Neural Speech Synthesis With Prosody Modification Capabilities (2019)
- Spontaneous Style Text-to-speech Synthesis With Controllable Spontaneous Behaviors Based On Language Models (2024)
- Hierarchical Prosody Modeling For Non-autoregressive Speech Synthesis (2020)
- Modeling Prosodic Phrasing With Multi-task Learning In Tacotron-based TTS (2020)