AnCoGen: Analysis, Control And Generation Of Speech With A Masked Autoencoder
2025 · Samir Sadok, Simon Leglaive, Laurent Girin, et al.
Abstract
This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement.
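The abstract says a single masked model covers analysis, generation, and control. A common way for a masked autoencoder to do this is to pick which part of its input is hidden. The sketch below illustrates that idea only. It assumes a two-stream input (speech tokens plus per-frame attribute tokens), and `make_masked_input`, `control`, and the `"<mask>"` placeholder are all hypothetical names. It is not the authors' implementation, and the neural network that fills in masked positions is left out.

```python
# Hypothetical sketch: one masked model, with the task chosen by the masking pattern.
# Assumes a speech-token stream and per-frame attribute streams (pitch, loudness, ...).
from typing import Dict, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for a masked position

Attributes = Dict[str, List[float]]


def make_masked_input(
    speech: List[int], attributes: Attributes, task: str
) -> Tuple[list, dict]:
    """Build the (speech, attributes) input for one task.

    "analysis":   keep speech visible, mask every attribute stream
                  (the model would then predict the attributes).
    "generation": mask the speech stream, keep attributes visible
                  (the model would then predict the speech tokens).
    """
    if task == "analysis":
        masked_attrs = {name: [MASK] * len(vals) for name, vals in attributes.items()}
        return list(speech), masked_attrs
    if task == "generation":
        visible_attrs = {name: list(vals) for name, vals in attributes.items()}
        return [MASK] * len(speech), visible_attrs
    raise ValueError(f"unknown task: {task!r}")


def control(attributes: Attributes, name: str, new_values: List[float]) -> Attributes:
    """Return a copy of the attributes with one stream replaced (e.g. a pitch shift).

    The edited attributes would then be fed to the "generation" task.
    """
    if name not in attributes:
        raise KeyError(name)
    edited = {k: list(v) for k, v in attributes.items()}
    edited[name] = list(new_values)
    return edited
```

For example, doubling every pitch value with `control(attrs, "pitch", [2 * p for p in attrs["pitch"]])` and then calling `make_masked_input(speech, edited, "generation")` mirrors the analysis, edit, and resynthesis loop behind pitch modification. The design choice this illustrates is that the same model serves every task, and only the masking pattern changes.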
Related papers
- AudioGen: Textually Guided Audio Generation (2022)
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- MaskGCT: Zero-Shot Text-to-Speech With Masked Generative Codec Transformer (2024)
- A Post Auto-regressive GAN Vocoder Focused On Spectrum Fracture (2022)
- MaskedSpeech: Context-aware Speech Synthesis With Masking Strategy (2022)
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)
- Voice Command Generation Using Progressive WaveGANs (2019)
- Channel-Aware Domain-Adaptive Generative Adversarial Network For Robust Speech Recognition (2024)