SpeechRole: A Large-scale Dataset and Benchmark for Evaluating Speech Role-playing Agents
2025 · Changhao Jiang, Jiajun Sun, Yifei Cao, et al.
Abstract
Speech is essential for realistic role-playing, yet existing work on role-playing agents largely centers on text, leaving Speech Role-Playing Agents (SRPAs) underexplored and without systematic evaluation. We introduce SpeechRole, a unified framework for developing and assessing SRPAs. SpeechRole-Data contains 98 roles and 111k speech-to-speech conversations with rich timbre and prosodic variation, providing large-scale resources for training SRPAs. SpeechRole-Eval offers a multidimensional benchmark that directly evaluates generated speech, preserving paralinguistic cues and measuring interaction ability, speech expressiveness, and role-playing fidelity. Experiments show that end-to-end SRPAs such as GPT-4o Audio achieve strong fluency and naturalness, but remain limited in prosody consistency and emotion appropriateness. In contrast, current open-source end-to-end models exhibit substantial performance gaps across multiple evaluation dimensions. Cascaded and end-to-end systems achiev…
Related papers
- AudioRole: An Audio Dataset for Character Role-playing in Large Language Models (2025)
- S2S-Arena, Evaluating Speech2speech Protocols on Instruction Following with Paralinguistic Information (2025)
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words (2024)
- ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-speech Interaction (2025)
- VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models (2025)
- SpeechColab Leaderboard: An Open-source Platform for Automatic Speech Recognition Evaluation (2024)
- Speaker Verification in Agent-generated Conversations (2024)
- DialogueAgents: A Hybrid Agent-based Speech Synthesis Framework for Multi-party Dialogue (2025)