Paras2s: Benchmarking And Aligning Spoken Language Models For Paralinguistic-aware Speech-to-speech Interaction
2025 Β· Shu-Wen Yang, Ming Tu, Andy T. Liu, et al.
Abstract
Speech-to-Speech (S2S) models have shown promising dialogue capabilities, but their ability to handle paralinguistic cues - such as emotion, tone, and speaker attributes - and to respond appropriately in both content and style remains under-explored. Progress is further hindered by the scarcity of high-quality and expressive demonstrations. To address this, we introduce a new reinforcement learning (RL) framework for paralinguistic-aware S2S, ParaS2S, which evaluates and optimizes both response content and speaking style directly at the waveform level. We first construct ParaS2SBench, a benchmark that evaluates the naturalness of input-output pairs in terms of content and speaking style using expressive and challenging queries. For the automatic judge, we propose a PolyTone training strategy and a multi-stage framework, preventing the style hallucination of end-to-end audio LLM judging. Our judge correlates well with human preferences and is scalable, enabling the model to interact and
Authors
(none)
Tags
Stats
Related papers
- S2s-arena, Evaluating Speech2speech Protocols On Instruction Following With Paralinguistic Information (2025)0.00
- Vocalbench: Benchmarking The Vocal Conversational Abilities For Speech Interaction Models (2025)0.00
- A Unit-based System And Dataset For Expressive Direct Speech-to-speech Translation (2025)2.26
- SLM-S2ST: A Multimodal Language Model For Direct Speech-to-speech Translation (2025)0.00
- Desta2: Developing Instruction-following Speech Language Model Without Speech Instruction-tuning Data (2024)8.82
- Speechrole: A Large-scale Dataset And Benchmark For Evaluating Speech Role-playing Agents (2025)1.91
- Align-slm: Textless Spoken Language Models With Reinforcement Learning From AI Feedback (2024)7.16
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, And Datasets (2024)4.52