LERO: Llm-driven Evolutionary Framework With Hybrid Rewards And Enhanced Observation For Multi-agent Reinforcement Learning
2025 Β· Yuan Wei, Xiaohan Shan, Jianmin Li
Abstract
Multi-agent reinforcement learning (MARL) faces two critical bottlenecks distinct from single-agent RL: credit assignment in cooperative tasks and partial observability of environmental states. We propose LERO, a framework integrating Large language models (LLMs) with evolutionary optimization to address these MARL-specific challenges. The solution centers on two LLM-generated components: a hybrid reward function that dynamically allocates individual credit through reward decomposition, and an observation enhancement function that augments partial observations with inferred environmental context. An evolutionary algorithm optimizes these components through iterative MARL training cycles, where top-performing candidates guide subsequent LLM generations. Evaluations in Multi-Agent Particle Environments (MPE) demonstrate LERO's superiority over baseline methods, with improved task performance and training efficiency.
Authors
(none)
Tags
Stats
Related papers
- End-to-end Optimization Of Llm-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025)0.00
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025)0.00
- Evolutionary Reinforcement Learning For Sample-efficient Multiagent Coordination (2019)0.00
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)0.00
- LIGS: Learnable Intrinsic-reward Generation Selection For Multi-agent Learning (2021)0.00
- YOLO-MARL: You Only LLM Once For Multi-agent Reinforcement Learning (2024)0.00
- Bierl: A Meta Evolutionary Reinforcement Learning Framework Via Bilevel Optimization (2023)2.26
- MARSHAL: Incentivizing Multi-agent Reasoning Via Self-play With Strategic Llms (2025)0.00