FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
2023 · Rui Liu, Jiatian Xi, Ziyue Jiang, et al.
Abstract
Text-based speech editing (TSE) techniques let users edit output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE, current techniques focus on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring the segment's local and global fluency with respect to its context and the original utterance. To maintain speech fluency, we propose a fluency-aware speech editing model, termed FluentEditor, which adds fluency-aware training criteria to TSE training. Specifically, the acoustic consistency constraint aims to make the transition between the edited region and its neighboring acoustic segments as smooth as in the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited region remain consistent with the overall style of the original utterance.
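The two constraints named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the use of adjacent-frame differences at the region boundaries, and the mean-pooled frame vector standing in for a learned prosody representation are all assumptions made here for illustration. Spectrogram frames are plain lists of floats, and the edited region is the half-open frame range [start, end).

```python
from typing import List, Sequence

Frame = Sequence[float]


def frame_delta(a: Frame, b: Frame) -> List[float]:
    """Element-wise change from frame a to the next frame b."""
    return [y - x for x, y in zip(a, b)]


def mse(u: Sequence[float], v: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Average the frames over time (hypothetical stand-in for a prosody encoder)."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def acoustic_consistency_loss(pred: Sequence[Frame], target: Sequence[Frame],
                              start: int, end: int) -> float:
    """Assumed form: match the frame-to-frame transition at both boundaries
    of the edited region [start, end) to the ground-truth transition."""
    if len(pred) != len(target) or not (1 <= start < end < len(pred)):
        raise ValueError("edited region needs at least one context frame on each side")
    left = mse(frame_delta(pred[start - 1], pred[start]),
               frame_delta(target[start - 1], target[start]))
    right = mse(frame_delta(pred[end - 1], pred[end]),
                frame_delta(target[end - 1], target[end]))
    return left + right


def prosody_consistency_loss(pred: Sequence[Frame], original: Sequence[Frame],
                             start: int, end: int) -> float:
    """Assumed form: pooled features of the edited region should match
    the pooled features of the whole original utterance."""
    return mse(mean_pool(pred[start:end]), mean_pool(original))


# Toy data: four 2-dimensional frames; frames 1 and 2 form the edited region.
target = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
edited = [[0.0, 0.0], [2.0, 2.0], [2.0, 2.0], [3.0, 3.0]]
ac_loss = acoustic_consistency_loss(edited, target, 1, 3)
pc_loss = prosody_consistency_loss(edited, target, 1, 3)
```

In a real training setup both terms would presumably be added, with weights, to the usual reconstruction loss on the edited region; the weighting and the actual feature extractors are not specified in the text above.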
Related papers
- FluentEditor2: Text-based Speech Editing by Modeling Multi-scale Acoustic and Prosody Consistency (2024) · 3.95
- DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency (2024) · 0.00
- Towards Zero-shot Text-based Voice Editing Using Acoustic Context Conditioning, Utterance Embeddings, and Reference Encoders (2022) · 0.00
- FluentSpeech: Stutter-oriented Automatic Speech Editing with Context-aware Diffusion Models (2023) · 12.13
- Improving Multi-speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows (2021) · 0.00
- EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion (2021) · 9.92
- Dynamic Prosody Generation for Speech Synthesis Using Linguistics-driven Acoustic Embedding Selection (2019) · 7.81
- EdiTTS: Score-based Editing for Controllable Text-to-speech (2021) · 10.07