AudioRole: An Audio Dataset For Character Role-playing In Large Language Models
2025 · Wenyu Li, Xiaoqi Jiao, Yi Chang, et al.
Abstract
The creation of high-quality multimodal datasets remains fundamental to advancing role-playing capabilities in large language models (LLMs). While existing work predominantly focuses on text-based persona simulation, Audio Role-Playing (ARP) presents unique challenges because it requires synchronized alignment of semantic content and vocal characteristics. To address this gap, we propose AudioRole, a meticulously curated dataset drawn from 13 TV series, spanning 1K+ hours and 1M+ character-grounded dialogues, that provides synchronized audio-text pairs annotated with speaker identities and contextual metadata. In addition, to demonstrate the effectiveness of the dataset, we introduce ARP-Eval, a dual-aspect evaluation framework that assesses both response quality and role fidelity. Empirical validation shows that GLM-4-Voice trained on AudioRole (which we call ARP-Model) achieves an average Acoustic Personalization score of 0.31, significantly outperforming the original GLM-4-Voice and the more p…
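To make the dataset description concrete, here is a minimal sketch of how a single AudioRole record might be represented, assuming a per-utterance layout. The abstract only states that each synchronized audio-text pair carries a speaker identity and contextual metadata; the field names (`series`, `episode`, `audio_path`, `start_s`, `end_s`, `context`) and the helper functions below are hypothetical illustrations, not the dataset's actual file format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DialogueTurn:
    """One synchronized audio-text pair (hypothetical schema, not AudioRole's real format)."""
    series: str        # source TV series the utterance comes from
    episode: str       # hypothetical episode identifier
    speaker: str       # character identity of the speaker
    text: str          # transcript of the utterance
    audio_path: str    # hypothetical path to the aligned audio clip
    start_s: float     # clip start time within the episode, in seconds
    end_s: float       # clip end time within the episode, in seconds
    context: list[str] = field(default_factory=list)  # preceding turns as contextual metadata

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def group_by_character(turns):
    """Collect each character's utterances, keyed by (series, speaker)."""
    by_char = defaultdict(list)
    for t in turns:
        by_char[(t.series, t.speaker)].append(t)
    return dict(by_char)


def total_hours(turns):
    """Sum clip durations, e.g. to check a corpus-size figure such as '1K+ hours'."""
    return sum(t.duration_s for t in turns) / 3600.0


def to_training_example(turn):
    """Hypothetical role-play pair: the dialogue context plus target role is the input,
    and the character's text and voice clip form the target response."""
    return {
        "role": turn.speaker,
        "prompt": "\n".join(turn.context),
        "target_text": turn.text,
        "target_audio": turn.audio_path,
    }


if __name__ == "__main__":
    # Toy records for illustration only; not drawn from the dataset.
    turns = [
        DialogueTurn("Series X", "ep01", "A", "Hello.", "a.wav", 0.0, 1.5),
        DialogueTurn("Series X", "ep01", "B", "Hi.", "b.wav", 1.5, 2.5, ["Hello."]),
        DialogueTurn("Series X", "ep01", "A", "How are you?", "c.wav", 2.5, 4.0, ["Hello.", "Hi."]),
    ]
    groups = group_by_character(turns)
    print({k: len(v) for k, v in groups.items()})
    print(to_training_example(turns[2]))
```

Keying characters by `(series, speaker)` rather than by name alone avoids merging two different characters who happen to share a name across series; grouping this way is one plausible route to the speaker-conditioned training pairs that role-playing fine-tuning needs.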
Related papers
- SpeechRole: A Large-scale Dataset And Benchmark For Evaluating Speech Role-playing Agents (2025) · 1.91
- Auto-ACD: A Large-scale Dataset For Audio-language Representation Learning (2023) · 10.74
- Towards Holistic Evaluation Of Large Audio-language Models: A Comprehensive Survey (2026) · 9.75
- AudioToolAgent: An Agentic Framework For Audio-language Models (2025) · 2.60
- AudioLM: A Language Modeling Approach To Audio Generation (2022) · 18.91
- Speaker Verification In Agent-generated Conversations (2024) · 0.00
- AudioSetCaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024) · 13.44
- AudioBench: A Universal Benchmark For Audio Large Language Models (2024) · 10.21