End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning
2025 · Guanzhong Chen, Shaoxiong Yang, Chao Li, et al.
Abstract
Large language models (LLMs) are versatile, yet their deployment in complex real-world settings is limited by static knowledge cutoffs and the difficulty of producing controllable behavior within a single inference. Multi-agent search systems (MASS), which coordinate specialized LLM agents equipped with search tools, mitigate these issues via task decomposition and retrieval-augmented problem solving. However, optimizing LLMs for agent-specific roles remains labor-intensive with prompt engineering or supervised fine-tuning, motivating automated end-to-end training. Existing multi-agent reinforcement learning (MARL) methods such as Multi-Agent Proximal Policy Optimization (MAPPO) typically depend on large critic networks to evaluate joint actions, leading to instability and high memory costs. We introduce Multi-Agent Heterogeneous Group Policy Optimization (MHGPO), which updates policies by estimating relative advantages across heterogeneous groups of multi-agent rollouts, shifting the
Related papers
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025) 0.00
- Heterogeneous Multi-agent Reinforcement Learning For Zero-shot Scalable Collaboration (2024) 6.34
- Representation Learning For Efficient Deep Multi-agent Reinforcement Learning (2024) 0.00
- Tompo: Training LLM Strategic Decision Making From A Multi-agent Perspective (2025) 0.00
- Heterogeneous Multi-robot Reinforcement Learning (2023) 6.77
- Multi-agent Constrained Policy Optimisation (2021) 0.00
- Agent-pro: Learning To Evolve Via Policy-level Reflection And Optimization (2024) 9.59
- LERO: LLM-driven Evolutionary Framework With Hybrid Rewards And Enhanced Observation For Multi-agent Reinforcement Learning (2025) 3.58