Controllable Prosody Generation With Partial Inputs
2023 · Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, et al.
Abstract
We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).
Related papers
- Semi-supervised Generative Modeling For Controllable Speech Synthesis (2019) · 0.00
- Controllable Speech Synthesis By Learning Discrete Phoneme-level Prosodic Representations (2022) · 6.34
- Prosodic Parameter Manipulation In TTS Generated Speech For Controlled Speech Generation (2024) · 0.00
- Controllable Neural Text-to-speech Synthesis Using Intuitive Prosodic Features (2020) · 11.76
- Hierarchical Prosody Modeling And Control In Non-autoregressive Parallel Neural TTS (2021) · 8.35
- Prosody-controllable Spontaneous TTS With Neural HMMs (2022) · 8.09
- Dynamic Prosody Generation For Speech Synthesis Using Linguistics-driven Acoustic Embedding Selection (2019) · 7.81
- Adversarial Learning Of Intermediate Acoustic Feature For End-to-end Lightweight Text-to-speech (2022) · 0.00