M2-CTTS: End-to-end Multi-scale Multi-modal Conversational Text-to-speech Synthesis
2023 · Jinlong Xue, Yayue Deng, Fengping Wang, et al.
Abstract
Conversational text-to-speech (TTS) aims to synthesize a reply with prosody appropriate to the conversation history. However, modeling the conversation comprehensively remains a challenge: most conversational TTS systems extract only global information and omit local prosody features, which carry important fine-grained information such as keywords and emphasis. Moreover, textual features alone are insufficient, since acoustic features also carry prosodic information. Hence, we propose M2-CTTS, an end-to-end multi-scale multi-modal conversational text-to-speech system that aims to comprehensively utilize the historical conversation and enhance prosodic expression. More specifically, we design a textual context module and an acoustic context module with both coarse-grained and fine-grained modeling. Experimental results demonstrate that our model mixed with fine-grained context information and additionally considering acoustic fea
Related papers
- Emphasis Rendering For Conversational Text-to-speech With Multi-modal Multi-scale Context Modeling (2024) · 0.00
- FCTalker: Fine And Coarse Grained Context Modeling For Expressive Conversational Speech Synthesis (2022) · 2.86
- MHTTS: Fast Multi-head Text-to-speech For Spontaneous Speech With Imperfect Transcription (2022) · 0.00
- MM-TTS: Multi-modal Prompt Based Style Transfer For Expressive Text-to-speech Synthesis (2023) · 8.60
- CopyCat2: A Single Model For Multi-speaker TTS And Many-to-many Fine-grained Prosody Transfer (2022) · 5.24
- Enhancing Speaking Styles In Conversational Text-to-speech Synthesis With Graph-based Multi-modal Context Modeling (2021) · 0.00
- eCat: An End-to-end Model For Multi-speaker TTS & Many-to-many Fine-grained Prosody Transfer (2023) · 0.00
- HMM-based Data Augmentation For E2E Systems For Building Conversational Speech Synthesis Systems (2022) · 0.00