A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach
2019 Β· NoΓ© Tits
Abstract
In this project, we aim to build a Text-to-Speech system able to produce speech with a controllable emotional expressiveness. We propose a methodology for solving this problem in three main steps. The first is the collection of emotional speech data. We discuss the various formats of existing datasets and their usability in speech generation. The second step is the development of a system to automatically annotate data with emotion/expressiveness features. We compare several techniques using transfer learning to extract such a representation through other tasks and propose a method to visualize and interpret the correlation between vocal and emotional features. The third step is the development of a deep learning-based system taking text and emotion/expressiveness as input and producing speech as output. We study the impact of fine tuning from a neutral TTS towards an emotional TTS in terms of intelligibility and perception of the emotion.
Authors
(none)
Tags
Stats
Related papers
- PROEMO: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025)0.00
- Fine-grained Emotional Control Of Text-to-speech: Learning To Rank Inter- And Intra-class Emotion Intensities (2023)6.77
- Emospeech: A Corpus Of Emotionally Rich And Contextually Detailed Speech Annotations (2024)0.00
- Msemotts: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022)13.97
- Emotional Dimension Control In Language Model-based Text-to-speech: Spanning A Broad Spectrum Of Human Emotions (2024)0.00
- Exploring Transfer Learning For Low Resource Emotional TTS (2019)0.00
- Generative Emotional AI For Speech Emotion Recognition: The Case For Synthetic Emotional Speech Augmentation (2023)11.19
- Semi-supervised Learning For Continuous Emotional Intensity Controllable Speech Synthesis With Disentangled Representations (2022)0.00