True Knowledge Comes From Practice: Aligning Llms With Embodied Environments Via Reinforcement Learning
2024 Β· Weihao Tan, Wentao Zhang, Shanqi Liu, et al.
Abstract
Despite the impressive performance across numerous tasks, large language models (LLMs) often fail in solving simple decision-making tasks due to the misalignment of the knowledge in LLMs with environments. On the contrary, reinforcement learning (RL) agents learn policies from scratch, which makes them always align with environments but difficult to incorporate prior knowledge for efficient explorations. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL without requiring any prepared datasets or prior knowledge of the environments. Firstly, we query the joint probabilities of each valid action with LLMs to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture where the acto
Authors
(none)
Tags
Stats
Related papers
- Think In Games: Learning To Reason In Games Via Reinforcement Learning With Large Language Models (2025)0.00
- Tompo: Training LLM Strategic Decision Making From A Multi-agent Perspective (2025)0.00
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024)0.00
- A Survey On Enhancing Reinforcement Learning In Complex Environments: Insights From Human And LLM Feedback (2024)0.00
- Language Agents With Reinforcement Learning For Strategic Play In The Werewolf Game (2023)0.00
- Mental Modeling Of Reinforcement Learning Agents By Language Models (2024)0.00
- From Laws To Motivation: Guiding Exploration Through Law-based Reasoning And Rewards (2024)0.00
- Training Agents With Weakly Supervised Feedback From Large Language Models (2024)0.00