CAMEL: Continuous Action Masking Enabled By Large Language Models For Reinforcement Learning
2025 · Yanxiao Zhao, Yangge Qian, Jingyang Shan, et al.
Abstract
Reinforcement learning (RL) in continuous action spaces encounters persistent challenges, such as inefficient exploration and convergence to suboptimal solutions. To address these limitations, we propose CAMEL, a novel framework integrating LLM-generated suboptimal policies into the RL training pipeline. CAMEL leverages dynamic action masking and an adaptive epsilon-masking mechanism to guide exploration during early training stages while gradually enabling agents to optimize policies independently. At the core of CAMEL lies the integration of Python-executable suboptimal policies generated by LLMs based on environment descriptions and task objectives. Although simplistic and hard-coded, these policies offer valuable initial guidance for RL agents. To effectively utilize these priors, CAMEL employs masking-aware optimization to dynamically constrain the action space based on LLM outputs. Additionally, epsilon-masking gradually reduces reliance on LLM-generated guidance, enabling agents
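The abstract describes two mechanisms: constraining the action space around the output of an LLM-generated policy, and an epsilon-masking schedule that phases this constraint out over training. A minimal Python sketch of that idea follows. The abstract does not give the exact masking rule or decay schedule, so the interval mask, the linear decay, and every name here (`linear_epsilon`, `mask_bounds`, `masked_action`, `width`) are illustrative assumptions rather than the authors' implementation.

```python
import random


def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.0):
    """Probability of applying the mask at a training step.

    Assumption: a linear decay from eps_start to eps_end; the abstract
    only says that reliance on the LLM guidance is reduced gradually.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def mask_bounds(prior_action, low, high, width):
    """Allowed interval around the prior action, clipped to the env bounds.

    Assumption: the mask is an interval of half-width `width` centred on
    the action suggested by the hard-coded prior policy.
    """
    lo = max(low, prior_action - width)
    hi = min(high, prior_action + width)
    return lo, hi


def masked_action(agent_action, prior_action, low, high, width, epsilon, rng=random):
    """With probability epsilon, clip the agent's action into the mask.

    Otherwise the agent's action is only clipped to the environment bounds,
    so the agent acts independently once epsilon has decayed to zero.
    """
    agent_action = min(max(agent_action, low), high)
    if rng.random() < epsilon:
        lo, hi = mask_bounds(prior_action, low, high, width)
        return min(max(agent_action, lo), hi)
    return agent_action


if __name__ == "__main__":
    # Hypothetical 1-D task with actions in [-1, 1]; the prior suggests 0.5.
    for step in (0, 500, 1000):
        eps = linear_epsilon(step, total_steps=1000)
        action = masked_action(-1.0, 0.5, -1.0, 1.0, width=0.25, epsilon=eps)
        print(step, eps, action)
```

Early in training (epsilon near 1) the agent's actions are almost always pulled into the interval around the prior's suggestion; late in training (epsilon near 0) they pass through unchanged, which is one way to read the "gradually reduces reliance" behaviour the abstract describes.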
Related papers
- Excluding The Irrelevant: Focusing Reinforcement Learning Through Continuous Action Masking (2024)
- Lifelong Reinforcement Learning With Modulating Masks (2022)
- CAMMARL: Conformal Action Modeling In Multi Agent Reinforcement Learning (2023)
- Action Mapping For Reinforcement Learning In Continuous Environments With Constraints (2024)
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024)
- Zero-shot Model-based Reinforcement Learning Using Large Language Models (2024)
- LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven By Large Language Models (2025)
- Centralized Cooperative Exploration Policy For Continuous Control Tasks (2023)