SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
2024 · Dong Zhang, Zhaowei Li, Pengyu Wang, et al.
Abstract
Human communication is a complex and diverse process that involves not only multiple factors, such as language, commonsense, and cultural background, but also multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have shown promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? Current LLM-based multi-agent systems, however, rely mainly on text as their medium. In this paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication. SpeechAgents uses a multi-modal LLM as the control center for each individual agent and employs multi-modal signals as the medium for the messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of the LLM without compromising its general abilities. To strengthen and evaluate the effectiveness of human com…
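The abstract describes each agent as a multi-modal LLM "control center" that exchanges speech-modality messages with other agents. The following is a minimal, hypothetical Python sketch of that message-passing loop only, not the paper's implementation. `SpeechMessage`, `encode_speech`, `decode_speech`, `Agent.policy`, and the round-robin turn order are illustrative assumptions; the toy character-code "speech units" merely stand in for a real speech tokenizer, and the `policy` callable stands in for the multi-modal LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpeechMessage:
    """Message exchanged between agents. `units` is a hypothetical stand-in
    for a speech signal (e.g. a sequence of discrete speech-unit IDs)."""
    sender: str
    units: List[int]


def encode_speech(text: str) -> List[int]:
    # Toy "speech tokenizer" (assumption): one unit per character code, NOT a real codec.
    return [ord(c) for c in text]


def decode_speech(units: List[int]) -> str:
    # Inverse of the toy tokenizer above.
    return "".join(chr(u) for u in units)


@dataclass
class Agent:
    name: str
    # Hypothetical placeholder for the multi-modal LLM control center:
    # maps (agent name, dialogue history) to the next utterance.
    policy: Callable[[str, List[str]], str]
    memory: List[str] = field(default_factory=list)

    def hear(self, msg: SpeechMessage) -> None:
        # "Perceive" an incoming speech message and store it in the history.
        self.memory.append(f"{msg.sender}: {decode_speech(msg.units)}")

    def speak(self) -> SpeechMessage:
        # Decide on a reply, record it, and emit it in the speech modality.
        reply = self.policy(self.name, self.memory)
        self.memory.append(f"{self.name}: {reply}")
        return SpeechMessage(self.name, encode_speech(reply))


def simulate(agents: List[Agent], turns: int) -> List[str]:
    """Round-robin dialogue (assumed turn order); each utterance is broadcast."""
    transcript: List[str] = []
    for t in range(turns):
        speaker = agents[t % len(agents)]
        msg = speaker.speak()
        transcript.append(f"{msg.sender}: {decode_speech(msg.units)}")
        for listener in agents:
            if listener is not speaker:
                listener.hear(msg)
    return transcript


if __name__ == "__main__":
    toy = lambda name, mem: f"turn {len(mem)} from {name}"
    for line in simulate([Agent("A", toy), Agent("B", toy)], 3):
        print(line)
    # prints:
    # A: turn 0 from A
    # B: turn 1 from B
    # A: turn 2 from A
```

Broadcasting every utterance to all other agents keeps the sketch simple; a fuller simulation would likely let the controller choose the next speaker and addressee, and would carry real audio or speech tokens rather than character codes.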
Related papers
- Dialogueagents: A Hybrid Agent-based Speech Synthesis Framework For Multi-party Dialogue (2025) · 1.69
- Recent Advances In Speech Language Models: A Survey (2024) · 14.64
- Audiotoolagent: An Agentic Framework For Audio-language Models (2025) · 2.60
- Towards Achieving Human Parity On End-to-end Simultaneous Speech Translation Via LLM Agent (2024) · 0.00
- Speaker Verification In Agent-generated Conversations (2024) · 0.00
- Agent-based Modular Learning For Multimodal Emotion Recognition In Human-agent Systems (2025) · 0.00
- Spoken Conversational Agents With Large Language Models (2025) · 0.00
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024) · 11.23