Emospeech: A Corpus Of Emotionally Rich And Contextually Detailed Speech Annotations
2024 · Weizhen Bian, Yubo Zhou, Kaitai Zhang, et al.
Abstract
Advances in text-to-speech (TTS) technology have significantly improved the quality of generated speech, closely matching the timbre and intonation of the target speaker. However, due to the inherent complexity of human emotional expression, the development of TTS systems capable of controlling subtle emotional differences remains a formidable challenge. Existing emotional speech databases often suffer from overly simplistic labelling schemes that fail to capture a wide range of emotional states, thus limiting the effectiveness of emotion synthesis in TTS applications. To this end, recent efforts have focussed on building databases that use natural language annotations to describe speech emotions. However, these approaches are costly and require more emotional depth to train robust systems. In this paper, we propose a novel process aimed at building databases by systematically extracting emotion-rich speech segments and annotating them with detailed natural language descriptions throug
Related papers
- A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach (2019)
- EMOVIE: A Mandarin Emotion Speech Dataset With A Simple Emotional Text-to-speech Model (2021)
- EmoSphere-TTS: Emotional Style And Intensity Modeling Via Spherical Emotion Vector For Controllable Emotional Text-to-speech (2024)
- Emotional Dimension Control In Language Model-based Text-to-speech: Spanning A Broad Spectrum Of Human Emotions (2024)
- MsEmoTTS: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022)
- Generative Emotional AI For Speech Emotion Recognition: The Case For Synthetic Emotional Speech Augmentation (2023)
- Making Social Platforms Accessible: Emotion-aware Speech Generation With Integrated Text Analysis (2024)
- EMNS /imz/ Corpus: An Emotive Single-speaker Dataset For Narrative Storytelling In Games, Television And Graphic Novels (2023)