Learn A Flexible Exploration Model For Parameterized Action Markov Decision Processes
2025 · Zijian Wang, Bin Wang, Mingwen Shao, et al.
Abstract
Hybrid action models are widely considered an effective approach to reinforcement learning (RL) modeling. The current mainstream method is to train agents under Parameterized Action Markov Decision Processes (PAMDPs), which performs well in specific environments. Unfortunately, these models either exhibit drastically low learning efficiency in complex PAMDPs or lose crucial information when converting between the raw space and the latent space. To enhance the learning efficiency and asymptotic performance of the agent, we propose a model-based RL (MBRL) algorithm, FLEXplore. FLEXplore learns a parameterized-action-conditioned dynamics model and employs a modified Model Predictive Path Integral (MPPI) control method. Unlike conventional MBRL algorithms, we carefully design the dynamics loss function and the reward smoothing process to learn a loose yet flexible model. Additionally, we use a variational lower bound to maximize the mutual information between the state and the hybrid action, enhancing the explorati
Related papers
- Active Exploration In Markov Decision Processes (2019)
- Model-based Exploration In Monitored Markov Decision Processes (2025)
- Centralized Model And Exploration Policy For Multi-agent RL (2021)
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)
- Optimal Horizon-free Reward-free Exploration For Linear Mixture MDPs (2023)
- Planning With Exploration: Addressing Dynamics Bottleneck In Model-based Reinforcement Learning (2020)
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)
- Lazy-MDPs: Towards Interpretable Reinforcement Learning By Learning When To Act (2022)