MPCEval: A Benchmark for Multi-party Conversation Generation
2026 · Minxing Zhang, Yi Yang, Zhuofan Jia, et al.
Abstract
Multi-party conversation generation, such as smart reply and collaborative assistants, is an increasingly important capability of generative AI, yet its evaluation remains a critical bottleneck. Compared to two-party dialogue, multi-party settings introduce distinct challenges, including complex turn-taking, role-dependent speaker behavior, long-range conversational structure, and multiple equally valid continuations. Accordingly, we introduce MPCEval, a task-aware evaluation and benchmarking suite for multi-party conversation generation. MPCEval decomposes generation quality into speaker modeling, content quality, and speaker-content consistency, and explicitly distinguishes local next-turn prediction from global full-conversation generation. It provides novel, quantitative, reference-free, and reproducible metrics that scale across datasets and models. We apply MPCEval to diverse public and real-world datasets and evaluate modern generation methods alongside human-authored conversat
Related papers
- MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-talker Dialogue-centric Audio-video Generation (2026) 0.00
- VCB Bench: An Evaluation Benchmark for Audio-grounded Large Language Model Conversational Agents (2025) 0.00
- DialogueAgents: A Hybrid Agent-based Speech Synthesis Framework for Multi-party Dialogue (2025) 1.69
- VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models (2025) 0.00
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words (2024) 11.32
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks (2024) 4.61
- SpeechRole: A Large-scale Dataset and Benchmark for Evaluating Speech Role-playing Agents (2025) 1.91
- Multimodal Large Language Models for End-to-end Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting (2025) 0.00