RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
2025 · Liting Gao, Yi Yuan, Yaru Chen, et al.
Abstract
Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, and therefore demands precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods, which rely on full captions or costly optimization, often struggle with complex edits or lack practicality. In this work, we propose a novel, efficient, end-to-end diffusion framework based on rectified flow matching for audio editing, and construct a dataset featuring overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.
Related papers
- VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching (2023)
- FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation (2024)
- FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models (2023)
- DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency (2024)
- AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion (2023)
- Real-Time Streamable Generative Speech Restoration with Flow Matching (2025)
- ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling (2025)
- FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching (2024)