Amphion: An Open-source Audio, Music And Speech Generation Toolkit
2023 · Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al.
Abstract
Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to ease the way for junior researchers and engineers into these fields. It presents a unified framework that covers diverse generation tasks and models and is easy to extend with new ones. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. The initial release, Amphion v0.1, supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components such as data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion. Amphion is open-sourced at https://github.com/open-mmlab/Amphion.
Related papers
- An Automated End-to-end Open-source Software For High-quality Text-to-speech Dataset Generation (2024)
- Audio-agent: Leveraging LLMs For Audio Generation, Editing And Composition (2024)
- AudioX: A Unified Framework For Anything-to-audio Generation (2025)
- AudioLM: A Language Modeling Approach To Audio Generation (2022)
- Audio-omni: Extending Multi-modal Understanding To Versatile Audio Generation And Editing (2026)
- MnTTS: An Open-source Mongolian Text-to-speech Synthesis Dataset And Accompanied Baseline (2022)
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025)
- FireRedTTS: A Foundation Text-to-speech Framework For Industry-level Generative Speech Applications (2024)