Analysis And Assessment Of Controllability Of An Expressive Deep Learning-based TTS System
2021 · Noé Tits, Kevin El Haddad, Thierry Dutoit
Abstract
In this paper, we study the controllability of an expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker and containing great variability in style and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and the dimensions of the latent space representing expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown an interface for controllable expressive TTS and asked to retrieve a synthetic utterance whose expressiveness subjectively corresponds to that of a reference utterance.
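The objective assessment described in the abstract can be illustrated with a minimal sketch. The abstract only says "a measure of correlation", so the use of Pearson correlation here is an assumption, as are the function name `latent_feature_correlation`, the two example acoustic features (mean F0 and speech rate), and the synthetic data; none of these come from the paper.

```python
import numpy as np


def latent_feature_correlation(latents, features):
    """Pearson correlation between latent dimensions and acoustic features.

    latents:  array of shape (n_utterances, n_latent_dims)
    features: array of shape (n_utterances, n_features)
    returns:  array of shape (n_latent_dims, n_features), values in [-1, 1]

    Assumes no column is constant (a constant column has zero variance
    and would cause a division by zero).
    """
    latents = np.asarray(latents, dtype=float)
    features = np.asarray(features, dtype=float)
    if latents.shape[0] != features.shape[0]:
        raise ValueError("latents and features must have the same number of rows")
    # Center each column, then scale it to unit Euclidean norm.
    zl = latents - latents.mean(axis=0)
    zf = features - features.mean(axis=0)
    zl = zl / np.linalg.norm(zl, axis=0)
    zf = zf / np.linalg.norm(zf, axis=0)
    # Dot products of centered unit vectors are Pearson correlations.
    return zl.T @ zf


# Synthetic stand-in data (hypothetical, not from the paper):
# a 2-D latent space and two acoustic features per utterance.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 2))
mean_f0 = 3.0 * latents[:, 0] + 0.1 * rng.normal(size=200)      # driven by dim 0
speech_rate = -2.0 * latents[:, 1] + 0.1 * rng.normal(size=200)  # driven by dim 1
features = np.column_stack([mean_f0, speech_rate])

corr = latent_feature_correlation(latents, features)
print(np.round(corr, 2))
```

In this toy setup, a large absolute value in row i, column j would suggest that moving along latent dimension i changes acoustic feature j in a predictable direction, which is the kind of relationship an objective controllability measure looks for.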
Related papers
- A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach (2019)
- Towards Controllable Speech Synthesis In The Era Of Large Language Models: A Systematic Survey (2024)
- Visualization And Interpretation Of Latent Spaces For Controlling Expressive Speech Synthesis Through Audio Analysis (2019)
- Fine-grained Emotional Control Of Text-to-speech: Learning To Rank Inter- And Intra-class Emotion Intensities (2023)
- Description-based Controllable Text-to-speech With Cross-lingual Voice Control (2024)
- Semi-supervised Learning For Continuous Emotional Intensity Controllable Speech Synthesis With Disentangled Representations (2022)
- Text-driven Emotional Style Control And Cross-speaker Style Transfer In Neural TTS (2022)
- Prosody-controllable Spontaneous TTS With Neural HMMs (2022)