Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm
2022 · Ashish Kumar Jayant, Shalabh Bhatnagar
Abstract
During the initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms, as it can lead to potentially dangerous behavior. Hence, safe exploration is a critical issue in applying RL algorithms in the real world. This problem has recently been well studied under the Constrained Markov Decision Process (CMDP) framework, where, in addition to single-stage rewards, an agent also receives single-stage costs or penalties that depend on the state transitions. The prescribed cost functions map undesirable behavior at any given time-step to a scalar value. The goal is then to find a feasible policy that maximizes reward returns while constraining the cost returns to stay below a prescribed threshold during training as well as deployment. We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transiti
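The CMDP goal stated in the abstract (maximize reward returns while keeping cost returns below a prescribed threshold) can be illustrated with a minimal sketch. It assumes returns are discounted sums over a single trajectory. The discount factor `gamma`, the `threshold` value, and the function names `discounted_return` and `is_feasible` are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Sequence


def discounted_return(values: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum_t gamma**t * values[t] for one trajectory (assumed definition)."""
    total = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for v in reversed(values):
        total = v + gamma * total
    return total


def is_feasible(costs: Sequence[float], threshold: float, gamma: float = 0.99) -> bool:
    """Check the CMDP constraint: the cost return must not exceed the threshold."""
    return discounted_return(costs, gamma) <= threshold


# Hypothetical per-step rewards and costs from one rollout.
rewards = [1.0, 1.0, 1.0]
costs = [0.0, 1.0, 0.0]

reward_return = discounted_return(rewards, gamma=0.5)
cost_return = discounted_return(costs, gamma=0.5)
feasible = is_feasible(costs, threshold=0.5, gamma=0.5)
```

In this framing, a safe RL method searches for the policy with the highest reward return among those whose cost return passes the feasibility check, rather than trading the two off inside a single scalar reward.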
Related papers
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning (2021)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Multi-agent Constrained Policy Optimisation (2021)
- CRPO: A New Approach For Safe Reinforcement Learning With Convergence Guarantee (2020)
- Safe Reinforcement Learning For Constrained Markov Decision Processes With Stochastic Stopping Time (2024)