SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
2025 · Minghan Wang, Ye Bai, Yuxia Wang, et al.
Abstract
High-quality speech dialogue datasets are crucial for Speech-LLM development, yet existing acquisition methods face significant limitations. Human recordings incur high costs and raise privacy concerns, while synthetic approaches often lack conversational authenticity. To address these challenges, we introduce SpeechDialogueFactory, a production-ready framework for generating natural speech dialogues efficiently. Our solution employs a comprehensive pipeline including metadata generation, dialogue scripting, paralinguistic-enriched utterance simulation, and natural speech synthesis with voice cloning. Additionally, the system provides an interactive UI for detailed sample inspection and a high-throughput batch synthesis mode. Evaluations show that dialogues generated by our system achieve quality comparable to human recordings while significantly reducing production costs. We release our work as an open-source toolkit, alongside example datasets available in English and Chinese.
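The abstract names four chained stages: metadata generation, dialogue scripting, paralinguistic-enriched utterance simulation, and speech synthesis with voice cloning. Below is a minimal Python sketch of how such stages might be wired together. It assumes each stage is a pluggable callable. Every class, field, function and file name is hypothetical. The toy stubs stand in for the LLM and voice-cloning TTS models a real system would call. This is not the SpeechDialogueFactory API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Metadata:                    # hypothetical schema for the stage-1 output
    topic: str
    language: str
    speakers: List[str]

@dataclass
class Utterance:                   # one scripted turn, enriched by stages 3 and 4
    speaker: str
    text: str
    emotion: str = "neutral"       # assumed paralinguistic tag
    speaking_rate: float = 1.0     # assumed relative speaking rate
    audio: bytes = b""

# Each stage is an injected callable so real LLM / TTS backends can be swapped in.
MetadataGen = Callable[[str], Metadata]
ScriptWriter = Callable[[Metadata], List[Utterance]]
Annotator = Callable[[Utterance], Utterance]
Synthesizer = Callable[[Utterance, str], bytes]   # (turn, reference voice) -> audio

def run_pipeline(seed: str, gen_meta: MetadataGen, write_script: ScriptWriter,
                 annotate: Annotator, synthesize: Synthesizer,
                 voices: Dict[str, str]) -> List[Utterance]:
    meta = gen_meta(seed)                                  # 1. metadata generation
    turns = [annotate(t) for t in write_script(meta)]      # 2. scripting, 3. paralinguistics
    for turn in turns:
        turn.audio = synthesize(turn, voices[turn.speaker])  # 4. per-speaker cloned voice
    return turns

# --- Toy stubs (hypothetical) standing in for model calls ---
def toy_meta(seed: str) -> Metadata:
    return Metadata(topic=seed, language="en", speakers=["Alice", "Bob"])

def toy_script(meta: Metadata) -> List[Utterance]:
    a, b = meta.speakers
    return [Utterance(a, f"Have you tried {meta.topic}?"),
            Utterance(b, "Yes! It saved me hours.")]

def toy_annotate(turn: Utterance) -> Utterance:
    if "!" in turn.text:           # crude rule: exclamations read as excited, slightly faster
        turn.emotion = "excited"
        turn.speaking_rate = 1.1
    return turn

def toy_tts(turn: Utterance, voice: str) -> bytes:
    # Placeholder "audio": tags the text with voice and emotion instead of real waveforms.
    return f"<{voice}|{turn.emotion}>{turn.text}".encode("utf-8")

out = run_pipeline("speech synthesis", toy_meta, toy_script, toy_annotate, toy_tts,
                   {"Alice": "alice_ref.wav", "Bob": "bob_ref.wav"})
for t in out:
    print(t.speaker, t.emotion, t.speaking_rate, t.audio[:24])
```

Injecting the stages as callables keeps the orchestration independent of any particular model. A batch mode could map `run_pipeline` over many seeds, and an inspection UI could show the intermediate `Utterance` objects before synthesis.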
Related papers
- DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue (2025)
- A Framework for Synthetic Audio Conversations Generation Using Large Language Models (2024)
- Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition (2024)
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words (2024)
- SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation (2025)
- VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models (2025)
- DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech (2022)
- SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation (2025)