SD-Eval: A Benchmark Dataset For Spoken Dialogue Understanding Beyond Words
2024 · Junyi Ao, Yuancheng Wang, Xiaohai Tian, et al.
Abstract
Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and environmental information and includes 7,303 utterances, amounting to 8.76 hours of speech data. The da
Related papers
- VocalBench: Benchmarking The Vocal Conversational Abilities For Speech Interaction Models (2025) · 0.00
- E-chat: Emotion-sensitive Spoken Dialogue System With Large Language Models (2023) · 7.50
- SpokenWOZ: A Large-scale Speech-text Benchmark For Spoken Task-oriented Dialogue Agents (2023) · 2.26
- MMSU: A Massive Multi-task Spoken Language Understanding And Reasoning Benchmark (2025) · 2.29
- AudioBench: A Universal Benchmark For Audio Large Language Models (2024) · 10.21
- SpeechDialogueFactory: Generating High-quality Speech Dialogue Data To Accelerate Your Speech-LLM Development (2025) · 0.00
- SpeechRole: A Large-scale Dataset And Benchmark For Evaluating Speech Role-playing Agents (2025) · 1.91
- DailyTalk: Spoken Dialogue Dataset For Conversational Text-to-speech (2022) · 0.00