Expressive Prompting: Improving Emotion Intensity And Speaker Consistency In Zero-shot TTS
2024 · Haoyu Wang, Chunyu Qiang, Tianrui Wang, et al.
Abstract
Recent advancements in speech synthesis have enabled large language model (LLM)-based systems to perform zero-shot generation with controllable content, timbre, speaker identity, and emotion through input prompts. As a result, these models heavily rely on prompt design to guide the generation process. However, existing prompt selection methods often fail to ensure that prompts contain sufficiently stable speaker identity cues and appropriate emotional intensity indicators, which are crucial for expressive speech synthesis. To address this challenge, we propose a two-stage prompt selection strategy specifically designed for expressive speech synthesis. In the static stage (before synthesis), we first evaluate prompt candidates using pitch-based prosodic features, perceptual audio quality, and text-emotion coherence scores evaluated by an LLM. We further assess the candidates under a specific TTS model by measuring character error rate, speaker similarity, and emotional similarity between …
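The abstract lists the signals used to score prompt candidates but does not specify how they are combined. The sketch below is a hypothetical illustration, not the authors' method: it assumes each candidate already has its scores computed, min-max normalizes each metric across candidates, sums them with equal weights (inverting CER, since lower is better), shortlists on the prompt-level features, and then re-ranks the shortlist on the model-specific metrics. The field names, the use of F0 standard deviation as the pitch feature (treated as "higher is better"), the weights, and the `keep`/`top` parameters are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromptCandidate:
    """One reference prompt with pre-computed scores (all fields hypothetical)."""
    name: str
    pitch_std: float   # assumed pitch feature: F0 std. dev. in Hz (higher = more expressive)
    quality: float     # perceptual audio quality score (higher is better)
    coherence: float   # LLM-rated text-emotion coherence (higher is better)
    cer: float         # character error rate of speech synthesized with this prompt (lower is better)
    spk_sim: float     # speaker similarity, prompt vs. synthesized speech (higher is better)
    emo_sim: float     # emotion similarity, prompt vs. synthesized speech (higher is better)


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(cands: List[PromptCandidate], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of normalized metrics; CER is inverted so that higher totals are better."""
    totals = [0.0] * len(cands)
    for field, w in weights.items():
        col = min_max([getattr(c, field) for c in cands])
        if field == "cer":
            col = [1.0 - v for v in col]
        totals = [t + w * v for t, v in zip(totals, col)]
    return totals


def select_prompts(cands: List[PromptCandidate], keep: int = 3, top: int = 1) -> List[str]:
    # Step 1 (assumed): rank on prompt-level features only and keep the best `keep`.
    static = weighted_scores(cands, {"pitch_std": 1.0, "quality": 1.0, "coherence": 1.0})
    ranked = sorted(zip(static, cands), key=lambda p: p[0], reverse=True)
    shortlist = [c for _, c in ranked[:keep]]
    # Step 2 (assumed): re-rank the shortlist on TTS-model-specific metrics,
    # re-normalizing over the shortlist only.
    model = weighted_scores(shortlist, {"cer": 1.0, "spk_sim": 1.0, "emo_sim": 1.0})
    reranked = sorted(zip(model, shortlist), key=lambda p: p[0], reverse=True)
    return [c.name for _, c in reranked[:top]]


# Toy, made-up scores for four candidate prompts.
candidates = [
    PromptCandidate("a", 40.0, 4.0, 0.9, 0.05, 0.80, 0.70),
    PromptCandidate("b", 20.0, 3.0, 0.5, 0.02, 0.85, 0.60),
    PromptCandidate("c", 35.0, 4.2, 0.8, 0.10, 0.70, 0.90),
    PromptCandidate("d", 30.0, 3.8, 0.7, 0.03, 0.90, 0.80),
]
print(select_prompts(candidates, keep=3, top=2))  # ['d', 'a']
```

In this toy run, candidate `b` has the lowest CER but is dropped in the first step because its prompt-level scores are weakest. With `keep=4` it survives and ranks second. Filtering on cheap prompt-level features before running the TTS model limits how many candidates must be synthesized; the actual ordering and weighting used in the paper may differ.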
Related papers
- PROEMO: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025)
- Generating Speakers By Prompting Listener Impressions For Pre-trained Multi-speaker Text-to-speech Systems (2024)
- Improving Language Model-based Zero-shot Text-to-speech Synthesis With Multi-scale Acoustic Prompts (2023)
- An Empirical Study Of Speech Language Models For Prompt-conditioned Speech Synthesis (2024)
- Wav2Prompt: End-to-end Speech Prompt Generation And Tuning For LLM In Zero And Few-shot Learning (2024)
- Plug-and-play Emotion Graphs For Compositional Prompting In Zero-shot Speech Emotion Recognition (2025)
- PromptTTS++: Controlling Speaker Identity In Prompt-based Text-to-speech Using Natural Language Descriptions (2023)
- UMETTS: A Unified Framework For Emotional Text-to-speech Synthesis With Multimodal Prompts (2024)