Emphasis Rendering For Conversational Text-to-speech With Multi-modal Multi-scale Context Modeling
2024 · Rui Liu, Zhenqi Jia, Jie Yang, et al.
Abstract
Conversational Text-to-Speech (CTTS) aims to accurately render an utterance in a style appropriate to its conversational setting, a task that has attracted growing attention. Although prior studies recognize the significance of CTTS, they have not thoroughly investigated speech emphasis, which is essential for conveying the underlying intention and attitude in human-machine interaction scenarios. Two obstacles explain this gap: conversational emphasis datasets are scarce, and understanding the conversational context is difficult. In this paper, we propose a novel Emphasis Rendering scheme for CTTS, termed ER-CTTS, with two main components: 1) it considers both textual and acoustic context, modeling each at global and local semantic scales, to understand the conversational context comprehensively; 2) it deeply integrates this multi-modal, multi-scale context to learn how the context influences the emphasis expression of the current utterance. Finally, the inferred emphasis feat
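The abstract does not describe ER-CTTS's layers, so the Python below is only a minimal, hypothetical sketch of the general idea of fusing multi-modal (textual and acoustic) and multi-scale (global utterance-level and local word-level) context to predict emphasis for each word of the current utterance. Everything in it is an assumption rather than the paper's model: the function names (`attend`, `fuse_context`, `emphasis_probs`), the use of plain scaled dot-product attention with context vectors serving as both keys and values, fusion by concatenation, and the linear-plus-sigmoid scorer.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (shifts every score by the maximum)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention; memory vectors act as both keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    dim = len(memory[0])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]


def fuse_context(word: Vector,
                 text_global: List[Vector], audio_global: List[Vector],
                 text_local: List[Vector], audio_local: List[Vector]) -> Vector:
    """Hypothetical fusion: concatenate one attention summary per
    (modality, scale) pair, giving 4 * dim values."""
    fused: Vector = []
    for memory in (text_global, audio_global, text_local, audio_local):
        fused.extend(attend(word, memory))
    return fused


def emphasis_probs(words: List[Vector],
                   context: Tuple[List[Vector], List[Vector], List[Vector], List[Vector]],
                   weight: Vector, bias: float = 0.0) -> List[float]:
    """Hypothetical scorer: [word ; fused context] -> linear -> sigmoid.
    `weight` must have length 5 * dim (word vector plus four summaries)."""
    probs = []
    for word in words:
        features = word + fuse_context(word, *context)
        logit = dot(weight, features) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


if __name__ == "__main__":
    # Toy 2-d embeddings; a real system would obtain these from learned encoders.
    words = [[1.0, 0.0], [0.0, 1.0]]                    # current utterance, word level
    text_global = [[0.5, 0.5], [1.0, -1.0]]             # one vector per previous turn
    audio_global = [[0.2, 0.8], [0.9, 0.1]]
    text_local = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # word-level context vectors
    audio_local = [[0.3, 0.7], [0.6, 0.4]]
    weight = [0.4, -0.2] * 5                            # length 5 * dim = 10
    context = (text_global, audio_global, text_local, audio_local)
    print(emphasis_probs(words, context, weight))
```

Keeping one summary per (modality, scale) pair, instead of averaging them, lets the scorer weight textual versus acoustic and global versus local context independently. A trained model would replace the hand-set `weight` and toy embeddings with learned parameters and encoder outputs.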
Related papers
- M2-CTTS: End-to-end Multi-scale Multi-modal Conversational Text-to-speech Synthesis (2023) · 8.35
- EE-TTS: Emphatic Expressive TTS With Linguistic Information (2023) · 2.26
- FCTalker: Fine And Coarse Grained Context Modeling For Expressive Conversational Speech Synthesis (2022) · 2.86
- Emotion Rendering For Conversational Speech Synthesis With Heterogeneous Graph-based Context Modeling (2023) · 13.15
- Enhancing Speaking Styles In Conversational Text-to-speech Synthesis With Graph-based Multi-modal Context Modeling (2021) · 0.00
- MsEmoTTS: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022) · 13.97
- MSStyleTTS: Multi-scale Style Modeling With Hierarchical Context Information For Expressive Speech Synthesis (2023) · 6.77
- Multi-scale Accent Modeling And Disentangling For Multi-speaker Multi-accent Text-to-speech Synthesis (2024) · 2.26