S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
2025 · Feng Jiang, Zhiyu Lin, Fan Bu, et al.
Abstract
The rapid development of large language models (LLMs) has brought significant attention to speech models, particularly recent progress in speech2speech protocols supporting speech input and output. However, existing benchmarks that adopt automatic text-based evaluators to assess the instruction-following ability of these models lack consideration for paralinguistic information in both speech understanding and generation. To address these issues, we introduce S2S-Arena, a novel arena-style S2S benchmark that evaluates instruction-following capabilities with paralinguistic information in both speech-in and speech-out across real-world tasks. We design 154 samples that fuse TTS and live recordings in four domains with 21 tasks and manually evaluate existing popular speech models in an arena-style manner. The experimental results show that: (1) in addition to the superior performance of GPT-4o, the speech model of cascaded ASR, LLM, and TTS outperforms the jointly trained model after
Related papers
- ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-Aware Speech-to-Speech Interaction (2025) · 0.00
- SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents (2025) · 1.91
- DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data (2024) · 8.82
- VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models (2025) · 0.00
- LAraBench: Benchmarking Arabic AI with Large Language Models (2023) · 6.77
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks (2024) · 4.61
- AudioBench: A Universal Benchmark for Audio Large Language Models (2024) · 10.21
- Evaluating Text-to-Speech Synthesis from a Large Discrete Token-Based Speech Language Model (2024) · 0.00