EditSpeech: A Text Based Speech Editing System Using Partial Inference And Bidirectional Fusion
2021 · Daxin Tan, Liqun Deng, Yu Ting Yeung, et al.
Abstract
This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance without causing audible degradation in speech quality and naturalness. The EditSpeech system is built on a neural text-to-speech (NTTS) synthesis framework. Partial inference and bidirectional fusion are proposed to effectively incorporate the contextual information related to the edited region and achieve smooth transitions at both the left and right boundaries. Distortion introduced to the unmodified parts of the utterance is alleviated. The EditSpeech system is developed and evaluated on English and Chinese in multi-speaker scenarios. Objective and subjective evaluations demonstrate that EditSpeech outperforms several baseline systems, with lower spectral distortion and speech quality that listeners prefer. Audio samples are available online for demonstration at https://daxintan-cuhk.gi
Related papers
- DiffEditor: Enhancing Speech Editing With Semantic Enrichment And Acoustic Consistency (2024)
- EdiTTS: Score-based Editing For Controllable Text-to-speech (2021)
- FluentSpeech: Stutter-oriented Automatic Speech Editing With Context-aware Diffusion Models (2023)
- VoiceShop: A Unified Speech-to-speech Framework For Identity-preserving Zero-shot Voice Editing (2024)
- PartialEdit: Identifying Partial Deepfakes In The Era Of Neural Speech Editing (2025)
- Detecting The Undetectable: Assessing The Efficacy Of Current Spoof Detection Methods Against Seamless Speech Edits (2025)
- SSR-Speech: Towards Stable, Safe And Robust Zero-shot Text-based Speech Editing And Synthesis (2024)
- FluentEditor: Text-based Speech Editing By Considering Acoustic And Prosody Consistency (2023)