Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via Metagradient-based Hyperparameter Tuning
2024 · Homayoun Honari, Amir Mehdi Soufi Enayati, Mehran Ghafarian Tamizi, et al.
Abstract
Safe Reinforcement Learning (Safe RL) is one of the prevalently studied subcategories of trial-and-error-based methods with the intention to be deployed on real-world systems. In safe RL, the goal is to maximize reward performance while minimizing constraints, often achieved by setting bounds on constraint functions and utilizing the Lagrangian method. However, deploying Lagrangian-based safe RL in real-world scenarios is challenging due to the necessity of threshold fine-tuning, as imprecise adjustments may lead to suboptimal policy convergence. To mitigate this challenge, we propose a unified Lagrangian-based model-free architecture called Meta Soft Actor-Critic Lagrangian (Meta SAC-Lag). Meta SAC-Lag uses meta-gradient optimization to automatically update the safety-related hyperparameters. The proposed method is designed to address safe exploration and threshold adjustment with minimal hyperparameter tuning requirement. In our pipeline, the inner parameters are updated through the
Related papers
- Metatrace Actor-Critic: Online Step-size Tuning By Meta-gradient Descent For Reinforcement Learning Control (2018)
- Evolving Pareto-optimal Actor-Critic Algorithms For Generalizability And Stability (2022)
- Context-aware Safe Reinforcement Learning For Non-stationary Environments (2021)
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning (2021)
- OmniSafe: An Infrastructure For Accelerating Safe Reinforcement Learning Research (2023)