FluentSpeech: Stutter-oriented Automatic Speech Editing With Context-aware Diffusion Models
2023 · Ziyue Jiang, Qian Yang, Jialong Zuo, et al.
Abstract
Stutter removal is an essential scenario in the field of speech editing. However, when the speech recording contains stutters, existing text-based speech editing approaches still suffer from: 1) over-smoothing in the edited speech; 2) a lack of robustness due to the noise introduced by stutters; 3) the need for users to manually determine the edited region in order to remove the stutters. To tackle these challenges in stutter removal, we propose FluentSpeech, a stutter-oriented automatic speech editing model. Specifically, 1) we propose a context-aware diffusion model that iteratively refines the modified mel-spectrogram with the guidance of context features; 2) we introduce a stutter predictor module to inject the stutter information into the hidden sequence; 3) we also propose a stutter-oriented automatic speech editing (SASE) dataset that contains spontaneous speech recordings with time-aligned stutter labels to train the automatic stutter localization model. Experimental results
Related papers
- Diffeditor: Enhancing Speech Editing With Semantic Enrichment And Acoustic Consistency (2024) 0.00
- Stutter-solver: End-to-end Multi-lingual Dysfluency Detection (2024) 5.24
- Rfm-editing: Rectified Flow Matching For Text-guided Audio Editing (2025) 0.00
- Editspeech: A Text Based Speech Editing System Using Partial Inference And Bidirectional Fusion (2021) 9.92
- Storm: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) 15.43
- Language Translation, And Change Of Accent For Speech-to-speech Task Using Diffusion Model (2025) 0.00
- Cold Diffusion For Speech Enhancement (2022) 11.85
- Fluenteditor: Text-based Speech Editing By Considering Acoustic And Prosody Consistency (2023) 7.18