Joint Multi-scale Cross-lingual Speaking Style Transfer With Bidirectional Attention Mechanism For Automatic Dubbing
2023 · Jingbei Li, Sipan Li, Ping Chen, et al.
Abstract
Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely used in real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to transfer the speaking style of the original language to the dubbed speech, to give audiences the impression that the characters are speaking in their native tongue. However, state-of-the-art automatic dubbing systems only model the transfer of duration and speaking rate, neglecting other aspects of speaking style such as emotion, intonation and emphasis, which are also crucial for fully portraying the characters and for speech understanding. In this paper, we propose a joint multi-scale cross-lingual speaking style transfer framework to simultaneously model the bidirectional speaking style transfer between languages at both global (i.e. utterance level) and local (i.e. word level) scales. The global and local speakin
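The abstract is cut off before it describes the model, so the following is only a rough sketch of what bidirectional attention over word-level styles could look like, not the authors' architecture. It assumes (hypothetically) that each word in both languages has a text embedding and a local style vector, that a single shared affinity matrix is normalised in each direction to carry style both ways, and that mean pooling stands in for the utterance-level (global) style. All function and variable names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def bidirectional_attention(src_text, tgt_text, src_style, tgt_style):
    """Transfer word-level style in both directions (assumed formulation).

    One affinity matrix between source and target words is computed once,
    then normalised per target word (source -> target transfer) and per
    source word (target -> source transfer).
    """
    scale = math.sqrt(len(src_text[0]))
    # scores[i][j]: scaled dot-product affinity of source word i and target word j
    scores = [[dot(s, t) / scale for t in tgt_text] for s in src_text]

    # Source -> target: each target word attends over all source words.
    src_to_tgt = []
    for j in range(len(tgt_text)):
        weights = softmax([scores[i][j] for i in range(len(src_text))])
        src_to_tgt.append(weighted_sum(weights, src_style))

    # Target -> source: each source word attends over all target words.
    tgt_to_src = []
    for i in range(len(src_text)):
        weights = softmax(scores[i])
        tgt_to_src.append(weighted_sum(weights, tgt_style))

    return src_to_tgt, tgt_to_src


def global_style(local_styles):
    """Utterance-level style as the mean of word-level styles (a stand-in)."""
    n = len(local_styles)
    dim = len(local_styles[0])
    return [sum(v[d] for v in local_styles) / n for d in range(dim)]


# Toy inputs: 2 source words, 3 target words, 1-dimensional style vectors.
src_text = [[1.0, 0.0], [0.0, 1.0]]
tgt_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
src_style = [[1.0], [3.0]]
tgt_style = [[2.0], [4.0], [6.0]]

src_to_tgt, tgt_to_src = bidirectional_attention(
    src_text, tgt_text, src_style, tgt_style
)
```

In this toy setup the third target word is equally similar to both source words, so it receives the average of their style values. A trained system would learn the embeddings and style vectors rather than take them as fixed inputs.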
Related papers
- Towards Expressive Video Dubbing With Multiscale Multimodal Context Interaction (2024) · 4.52
- Large-scale Multilingual Audio Visual Dubbing (2020) · 0.00
- DubWise: Video-guided Speech Duration Control In Multimodal LLM-based Text-to-speech For Dubbing (2024) · 3.58
- StyleS2ST: Zero-shot Style Transfer For Direct Speech-to-speech Translation (2023) · 0.00
- Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios (2021) · 6.77
- Improving Prosody For Cross-speaker Style Transfer By Semi-supervised Style Extractor And Hierarchical Modeling In Speech Synthesis (2023) · 7.50
- Transplantation Of Conversational Speaking Style With Interjections In Sequence-to-sequence Speech Synthesis (2022) · 0.00
- Speech-to-speech Translation With Discrete-unit-based Style Transfer (2023) · 0.00