Period VITS: Variational Inference With Explicit Pitch Modeling For End-to-end Emotional Speech Synthesis
2022 · Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, et al.
Abstract
Several fully end-to-end text-to-speech (TTS) models have been proposed that outperform cascade models (i.e., models whose acoustic model and vocoder are trained separately). However, they often generate unstable pitch contours with audible artifacts when the dataset contains emotional attributes, i.e., a large diversity in pronunciation and prosody. To address this problem, we propose Period VITS, a novel end-to-end TTS model that incorporates an explicit periodicity generator. In the proposed method, we introduce a frame pitch predictor that predicts prosodic features, such as pitch and voicing flags, from the input text. From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch. Finally, the entire model is jointly optimized in an end-to-end manner with variational inference and adversarial objectives. As a result, the decoder becomes capable of generating more st
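The abstract only outlines how the periodicity generator turns frame-level prosodic features into a sample-level sinusoidal source. The sketch below shows one common way to do this, in the style of neural source-filter (NSF) excitation: upsample F0 and voicing flags to the sample rate, integrate the instantaneous frequency into a phase, emit a sine where voiced and noise where unvoiced. It is an assumption, not the paper's implementation; the function name `sinusoidal_source`, the nearest-neighbour upsampling, and the hop size, sample rate, amplitude and noise levels are all hypothetical.

```python
import numpy as np


def sinusoidal_source(f0_frames, vuv_frames, hop_size=256, sample_rate=24000,
                      amplitude=0.1, noise_std=0.003, seed=0):
    """Hypothetical NSF-style periodic source built from frame-level pitch.

    f0_frames:  frame-level F0 in Hz, shape [T].
    vuv_frames: frame-level voicing flags (1 = voiced, 0 = unvoiced), shape [T].
    Returns a sample-level excitation of length T * hop_size.
    """
    rng = np.random.default_rng(seed)
    f0 = np.asarray(f0_frames, dtype=np.float64)
    vuv = np.asarray(vuv_frames, dtype=np.float64)

    # Upsample frame features to sample resolution by simple repetition.
    f0_up = np.repeat(f0 * vuv, hop_size)
    vuv_up = np.repeat(vuv, hop_size)

    # Integrate instantaneous frequency (cycles per sample) into a running phase,
    # so the sine stays continuous across frame boundaries.
    phase = 2.0 * np.pi * np.cumsum(f0_up / sample_rate)
    sine = amplitude * np.sin(phase)

    # Voiced samples: sine plus a small amount of Gaussian noise.
    voiced_noise = rng.normal(0.0, noise_std, size=sine.shape)
    # Unvoiced samples: Gaussian noise only, with std amplitude / 3 (as in NSF).
    unvoiced_noise = rng.normal(0.0, amplitude / 3.0, size=sine.shape)

    return vuv_up * (sine + voiced_noise) + (1.0 - vuv_up) * unvoiced_noise


# Usage: a 100 Hz voiced segment followed by an unvoiced one.
excitation = sinusoidal_source([100.0] * 5 + [0.0] * 5, [1] * 5 + [0] * 5,
                               hop_size=240, sample_rate=24000)
```

Accumulating phase with a cumulative sum, rather than computing `sin(2 * pi * f0 * t)` per frame, avoids phase jumps when F0 changes between frames, which is the kind of discontinuity that would otherwise show up as audible artifacts.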
Related papers
- PITS: Variational Pitch Inference Without Fundamental Frequency For End-to-end Pitch-controllable TTS (2023) · 4.90
- Period Singer: Integrating Periodic And Aperiodic Variational Autoencoders For Natural-sounding End-to-end Singing Voice Synthesis (2024) · 2.26
- PAVITS: Exploring Prosody-aware VITS For End-to-end Emotional Voice Conversion (2024) · 8.35
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021) · 0.00
- VISinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021) · 12.99
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024) · 0.00
- Using Generative Modelling To Produce Varied Intonation For Speech Synthesis (2019) · 7.81
- End-to-end Text-to-speech Using Latent Duration Based On VQ-VAE (2020) · 6.77