Coevolving With The Other You: Fine-tuning LLM With Sequential Cooperative Multi-agent Reinforcement Learning
2024 · Hao Ma, Tianyi Hu, Zhiqiang Pu, et al.
Abstract
Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs. In this paper, we propose CORY, extending the RL fine-tuning of LLMs to a sequential cooperative multi-agent reinforcement learning framework, to leverage the inherent coevolution and emergent capabilities of multi-agent systems. In CORY, the LLM to be fine-tuned is initially duplicated into two autonomous agents: a pioneer and an observer. The pioneer generates responses based on queries, while the observer generates responses using both the queries and the pioneer's responses. The two agents are trained together. During training, the agents exchange roles periodically, fostering cooperatio
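The pioneer/observer loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Agent` class, its placeholder `generate` and `update` methods, the `swap_every` parameter, and the choice to give both agents the same summed reward are all assumptions made for the example. A real setup would replace `generate` with LLM sampling and `update` with an RL step such as PPO.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """Hypothetical stand-in for one copy of the LLM being fine-tuned."""
    name: str
    log: List[Tuple[str, str, float]] = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        # Placeholder: a real agent would sample a response from the model.
        return f"{self.name}-answer({len(prompt)})"

    def update(self, prompt: str, response: str, reward: float) -> None:
        # Placeholder for an RL update; here we only record the transition.
        self.log.append((prompt, response, reward))


def train(
    queries: List[str],
    reward_fn: Callable[[str, str], float],
    swap_every: int = 2,
) -> List[Tuple[str, str]]:
    """Run the sequential two-agent loop and return (pioneer, observer) per step."""
    # Both agents start as copies of the same model.
    pioneer, observer = Agent("A"), Agent("B")
    roles = []
    for step, query in enumerate(queries):
        # Periodic role exchange (the interval is an assumed hyperparameter).
        if step > 0 and step % swap_every == 0:
            pioneer, observer = observer, pioneer
        # The pioneer sees only the query.
        first = pioneer.generate(query)
        # The observer sees the query plus the pioneer's response.
        observer_prompt = query + "\n" + first
        second = observer.generate(observer_prompt)
        # Assumption: both agents receive the same summed reward.
        shared = reward_fn(query, first) + reward_fn(query, second)
        pioneer.update(query, first, shared)
        observer.update(observer_prompt, second, shared)
        roles.append((pioneer.name, observer.name))
    return roles
```

Calling `train(["q1", "q2", "q3"], lambda q, r: 1.0)` runs three steps with a constant reward and swaps the two roles before the third step.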
Related papers
- LERO: LLM-driven Evolutionary Framework With Hybrid Rewards And Enhanced Observation For Multi-agent Reinforcement Learning (2025) · 3.58
- CoMAS: Co-evolving Multi-agent Systems Via Interaction Rewards (2025) · 0.00
- YOLO-MARL: You Only LLM Once For Multi-agent Reinforcement Learning (2024) · 0.00
- Discovering Multiagent Learning Algorithms With Large Language Models (2026) · 2.05
- End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025) · 0.00
- ProAgent: Building Proactive Cooperative Agents With Large Language Models (2023) · 12.74
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025) · 0.00
- Training Agents With Weakly Supervised Feedback From Large Language Models (2024) · 0.00