Training Agents With Weakly Supervised Feedback From Large Language Models
2024 · Dihong Gong, Pu Lu, Zelong Wang, et al.
Abstract
Large Language Models (LLMs) offer a promising basis for creating agents that can tackle complex tasks through iterative environmental interaction. Existing methods either require these agents to mimic expert-provided trajectories or rely on definitive environmental feedback for reinforcement learning, which limits their application to specific scenarios such as gaming or code generation. This paper introduces a novel training method for LLM-based agents that uses weakly supervised signals from a critic LLM, bypassing the need for expert trajectories or definitive feedback. Our agents are trained in an iterative manner: they first generate trajectories through environmental interaction, and a critic LLM then selects a subset of good trajectories, which are used to update the agents so that they generate improved trajectories in the next iteration. Extensive tests on the API-Bank dataset show consistent improvement in our agents' capabilities and comparable performance to
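The generate / critique / update loop described in the abstract can be sketched as a toy program. This is an illustration under stated assumptions, not the authors' implementation: `ToyAgent`, `toy_env`, `toy_critic`, and `train` are hypothetical names; a keyword-matching rule stands in for the critic LLM; and increasing per-action sampling weights stands in for fine-tuning the agent on the selected trajectories.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    actions: list
    observation: str


class ToyAgent:
    """Hypothetical stand-in for an LLM agent: samples one API call per task
    from per-(task, API) preference weights."""

    def __init__(self, apis, seed=0):
        self.apis = apis
        self.weights = {}  # (task, api) -> unnormalized sampling weight
        self.rng = random.Random(seed)

    def act(self, task):
        w = [self.weights.get((task, a), 1.0) for a in self.apis]
        return self.rng.choices(self.apis, weights=w, k=1)[0]

    def update(self, trajectories):
        # Stand-in for supervised fine-tuning on critic-selected trajectories:
        # every action in an accepted trajectory becomes more likely next time.
        for t in trajectories:
            for a in t.actions:
                self.weights[(t.task, a)] = self.weights.get((t.task, a), 1.0) + 5.0


def toy_env(task, api):
    # Hypothetical environment: simply reports which API was called.
    return f"{api} called for: {task}"


def toy_critic(traj):
    # Hypothetical critic. In the paper this role is played by an LLM judging
    # the trajectory; here a trajectory is accepted if each API name matches the task.
    return all(a.split("_")[0] in traj.task for a in traj.actions)


def train(agent, tasks, env, critic, iterations=5, samples_per_task=8):
    accept_rates = []
    for iteration in range(iterations):
        # 1. Generate trajectories by interacting with the environment.
        trajs = []
        for task in tasks:
            for _ in range(samples_per_task):
                api = agent.act(task)
                trajs.append(Trajectory(task, [api], env(task, api)))
        # 2. The critic keeps only the trajectories it judges to be good.
        good = [t for t in trajs if critic(t)]
        accept_rates.append(len(good) / len(trajs))
        # 3. Update the agent on the selected subset only.
        agent.update(good)
    return accept_rates


tasks = ["check the weather in Paris", "set an alarm for 7am", "translate hello to French"]
apis = ["weather_api", "alarm_api", "translate_api", "search_api"]
rates = train(ToyAgent(apis, seed=0), tasks, toy_env, toy_critic)
print([round(r, 2) for r in rates])
```

Because only critic-approved trajectories feed the update, the printed acceptance rates should tend to rise across iterations, a miniature analogue of the iterative improvement the abstract describes.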
Related papers
- Language Agents With Reinforcement Learning For Strategic Play In The Werewolf Game (2023)
- Agentevolver: Towards Efficient Self-evolving Agent System (2025)
- Guiding Reinforcement Learning Using Uncertainty-aware Large Language Models (2024)
- Reinforcement Learning From LLM Feedback To Counteract Goal Misgeneralization (2024)
- Proagent: Building Proactive Cooperative Agents With Large Language Models (2023)
- Agent-pro: Learning To Evolve Via Policy-level Reflection And Optimization (2024)
- From Laws To Motivation: Guiding Exploration Through Law-based Reasoning And Rewards (2024)
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024)