Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based On A Latent Space Objective
2024 · Sindre Benjamin Remman, Bjørn Andreas Kristiansen, Anastasios M. Lekkas
Abstract
In this work, we use optimal control to change the behavior of a deep reinforcement learning policy by optimizing directly in the policy's latent space. We hypothesize that distinct behavioral patterns, termed behavioral modes, can be identified within certain regions of a deep reinforcement learning policy's latent space, meaning that specific actions or strategies are preferred within these regions. We identify these behavioral modes using latent space dimension-reduction with PaCMAP (Pairwise Controlled Manifold Approximation). Using the actions generated by the optimal control procedure, we move the system from one behavioral mode to another. We subsequently utilize these actions as a filter for interpreting the neural network policy. The results show that this approach can impose desired behavioral modes in the policy, demonstrated by showing how a failed episode can be made successful and vice versa using the Lunar Lander reinforcement learning environment.
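The idea of optimizing actions against a latent space objective can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' setup: the encoder weights `W` and `B` stand in for a trained policy's hidden layer, the dynamics are a simple single integrator rather than Lunar Lander, and random-search hill climbing is swapped in for a proper optimal control solver. The sketch only shows the shape of the objective: choose an action sequence so that the latent representation of the visited states approaches a target latent vector assumed to lie in the desired behavioral mode.

```python
import math
import random

# Hypothetical stand-in for a trained policy's hidden layer: z = tanh(W x + b).
W = [[0.8, -0.3], [0.1, 0.9], [-0.5, 0.4]]
B = [0.0, 0.1, -0.2]

def latent(x):
    """Map a 2-D state to a 3-D latent vector (toy encoder, not the paper's network)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W, B)]

def rollout(x0, actions, dt=0.1):
    """Simulate toy dynamics x_{t+1} = x_t + dt * u_t with actions clipped to [-1, 1]."""
    x = list(x0)
    states = []
    for u in actions:
        x = [xi + dt * max(-1.0, min(1.0, ui)) for xi, ui in zip(x, u)]
        states.append(x)
    return states

def latent_cost(x0, actions, z_target):
    """Sum of squared latent-space distances to the target along the trajectory."""
    total = 0.0
    for x in rollout(x0, actions):
        z = latent(x)
        total += sum((zi - ti) ** 2 for zi, ti in zip(z, z_target))
    return total

def optimize_actions(x0, z_target, horizon=20, iters=300, sigma=0.3, seed=0):
    """Random-search hill climbing over the action sequence.

    This is a simple placeholder for an optimal control solver: a perturbed
    candidate replaces the incumbent only if it lowers the latent cost.
    """
    rng = random.Random(seed)
    best = [[0.0, 0.0] for _ in range(horizon)]
    best_cost = latent_cost(x0, best, z_target)
    for _ in range(iters):
        cand = [[u + rng.gauss(0.0, sigma) for u in step] for step in best]
        cost = latent_cost(x0, cand, z_target)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

x0 = [0.0, 0.0]
# Latent vector of a state assumed (for illustration) to lie in the desired mode.
z_target = latent([1.0, -1.0])
initial_cost = latent_cost(x0, [[0.0, 0.0] for _ in range(20)], z_target)
actions, final_cost = optimize_actions(x0, z_target)
print(f"latent cost: {initial_cost:.3f} -> {final_cost:.3f}")
```

Because candidates are accepted only when they reduce the cost, the final latent cost can never exceed the cost of the all-zero action sequence. A gradient-based or direct-collocation solver would be the more natural choice when the dynamics and the policy network are differentiable.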
Related papers
- Learnable Behavior Control: Breaking Atari Human World Records Via Sample-efficient Behavior Selection (2023)
- Categorical Policies: Multimodal Policy Learning And Exploration In Continuous Control (2025)
- Latent Space Policies For Hierarchical Reinforcement Learning (2018)
- Specialized Deep Residual Policy Safe Reinforcement Learning-based Controller For Complex And Continuous State-action Spaces (2023)
- Policy Optimization In A Noisy Neighborhood: On Return Landscapes In Continuous Control (2023)
- Reinforcement Learning With A Focus On Adjusting Policies To Reach Targets (2024)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)