The Theory Behind Controllable Expressive Speech Synthesis: A Cross-disciplinary Approach
2019 · Noé Tits, Kevin El Haddad, Thierry Dutoit
Abstract
As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field, through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric, and statistical parametric speech synthesis. Finally, we focus on the last of these, and in particular on the most recent techniques, which model Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of Deep Learning blocks such as Convolutional and Recurrent Neural Networks, as well as attention mechanisms. The last part of the chapter intends to assemble the different aspects of the
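Treating Text-to-Speech as a sequence-to-sequence problem means that, at each output step, the decoder must decide which input symbols to "look at". The sketch below illustrates that idea with plain dot-product attention in pure Python. It is a generic, minimal example under stated assumptions, not the attention variant used by any specific system covered in the chapter: the scoring function (a dot product), the function names, and the toy encoder and decoder vectors are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(
    query: List[float], encoder_states: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Compute attention weights and a context vector for one decoder step.

    query: hypothetical decoder state for the current output frame (dimension d).
    encoder_states: one vector per input symbol (e.g. character), each of dimension d.
    """
    # Alignment score of each input position: dot product with the query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # Normalize scores into a probability distribution over input positions.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy encoder outputs for a 3-symbol input (made-up values).
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [0.0, 2.0]  # hypothetical decoder state
    weights, context = dot_product_attention(query, encoder_states)
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

In a real sequence-to-sequence TTS model, the encoder states and the decoder query would be learned, and the context vector would feed the prediction of the next acoustic frame; the weights over successive decoder steps then trace the text-to-audio alignment.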
Related papers
- An Overview Of Affective Speech Synthesis And Conversion In The Deep Learning Era (2022) 14.11
- Visualization And Interpretation Of Latent Spaces For Controlling Expressive Speech Synthesis Through Audio Analysis (2019) 10.07
- A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach (2019) 5.84
- Towards Controllable Speech Synthesis In The Era Of Large Language Models: A Systematic Survey (2024) 4.75
- Deep Encoder-decoder Models For Unsupervised Learning Of Controllable Speech Synthesis (2018) 0.00
- MsEmoTTS: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022) 13.97
- PROEMO: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025) 0.00
- GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis (2024) 0.00