Guiding Reinforcement Learning Using Uncertainty-aware Large Language Models
2024 · Maryam Shoaeinaeini, Brent Harrison
Abstract
Human guidance in reinforcement learning (RL) is often impractical for large-scale applications due to high costs and time constraints. Large Language Models (LLMs) offer a promising alternative to mitigate RL sample inefficiency and potentially replace human trainers. However, applying LLMs as RL trainers is challenging due to their overconfidence and less reliable solutions in sequential tasks. We address this limitation by introducing a calibrated guidance system that uses Monte Carlo Dropout to enhance LLM advice reliability by assessing prediction variances from multiple forward passes. Additionally, we develop a novel RL policy shaping method based on dynamic model average entropy to adjust the LLM's influence on RL policies according to guidance uncertainty. This approach ensures robust RL training by relying on reliable LLM guidance. To validate our contributions, we conduct extensive experiments in a Minigrid environment with three goals in varying environment sizes. The resul
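The two mechanisms described above are (1) Monte Carlo Dropout, which keeps dropout active at inference and averages several stochastic forward passes to estimate predictive spread, and (2) scaling the advisor's influence on the RL policy by the entropy of its averaged prediction. The following is a minimal illustrative sketch under stated assumptions, not the authors' implementation. A tiny hypothetical linear "advisor" stands in for the LLM. Two further choices are assumptions: "model average entropy" is read as the entropy of the mean predictive distribution, and the shaping step is a plain convex mixture of the RL and advisor distributions. All function names and parameters (`dropout_forward`, `mc_dropout_predict`, `shape_policy`, `p`, `passes`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout_forward(features: Sequence[float], weights: Sequence[Sequence[float]],
                    p: float, rng: random.Random) -> List[float]:
    """One stochastic pass of a toy (hypothetical) linear advisor with dropout left ON.

    weights[a][i] is the weight of feature i for action a. Inverted dropout:
    each feature is zeroed with probability p, survivors are rescaled by 1/(1-p).
    """
    kept = [0.0 if rng.random() < p else x / (1.0 - p) for x in features]
    logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features: Sequence[float], weights: Sequence[Sequence[float]],
                       p: float = 0.2, passes: int = 30,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Average `passes` dropout samples; return per-action mean and variance."""
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, p, rng) for _ in range(passes)]
    n_actions = len(weights)
    mean = [sum(s[a] for s in samples) / passes for a in range(n_actions)]
    var = [sum((s[a] - mean[a]) ** 2 for s in samples) / passes
           for a in range(n_actions)]
    return mean, var


def entropy(probs: Sequence[float]) -> float:
    return -sum(q * math.log(q) for q in probs if q > 0)


def shape_policy(rl_probs: Sequence[float],
                 advice_mean: Sequence[float]) -> Tuple[List[float], float]:
    """Blend RL policy and averaged advice, trusting advice less as its entropy grows.

    w = 1 - H(advice)/log(|A|): w = 1 for one-hot advice, w = 0 for uniform advice.
    The convex mixture is an assumed shaping rule, chosen for simplicity.
    """
    h_max = math.log(len(advice_mean))
    w = 1.0 - entropy(advice_mean) / h_max
    w = max(0.0, min(1.0, w))  # clamp tiny floating-point overshoot
    mixed = [(1.0 - w) * r + w * a for r, a in zip(rl_probs, advice_mean)]
    return mixed, w


if __name__ == "__main__":
    features = [1.0, 0.5, -0.3]
    weights = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]  # toy 3-action advisor
    mean, var = mc_dropout_predict(features, weights)
    shaped, w = shape_policy([1 / 3, 1 / 3, 1 / 3], mean)
    print("advice mean:", mean, "variance:", var)
    print("advice weight:", w, "shaped policy:", shaped)
```

Because the weight is derived from the averaged distribution rather than from a single pass, an advisor whose dropout samples disagree produces a flatter mean, a higher entropy, and therefore less influence on the agent. A product-of-experts rule (multiplying the two distributions and renormalizing) would be an equally plausible shaping choice.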
Related papers
- GHPO: Adaptive Guidance For Stable And Efficient LLM Reinforcement Learning (2025)
- Reinforcement Learning From LLM Feedback To Counteract Goal Misgeneralization (2024)
- DYSTIL: Dynamic Strategy Induction With Large Language Models For Reinforcement Learning (2025)
- Training Agents With Weakly Supervised Feedback From Large Language Models (2024)
- Llm-explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven By Large Language Models (2025)
- Zero-shot Model-based Reinforcement Learning Using Large Language Models (2024)
- A Survey On Enhancing Reinforcement Learning In Complex Environments: Insights From Human And LLM Feedback (2024)
- Remax: A Simple, Effective, And Efficient Reinforcement Learning Method For Aligning Large Language Models (2023)