Optimal Transport-guided Safety In Temporal Difference Reinforcement Learning
2025 · Zahra Shahrooei, Ali Baheri
Abstract
The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance, frequently without considering safety. In contrast, safe reinforcement learning seeks to reduce or avoid unsafe behavior. This paper views safety as taking actions with more predictable consequences under environment stochasticity and introduces a temporal difference algorithm that uses optimal transport theory to quantify the uncertainty associated with actions. By integrating this uncertainty score into the decision-making objective, the agent is encouraged to favor actions with more predictable outcomes. We theoretically prove that our algorithm reduces the probability of visiting unsafe states. We evaluate the proposed algorithm on several case studies in the presence of various forms of environment uncertainty. The results demonstrate that our method not only provides safer behavior but also maintains performance. A Python implementation of our
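The abstract does not state the exact form of the uncertainty score or the update rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes, hypothetically, that an action's uncertainty is the Wasserstein-1 distance between the empirical distribution of its sampled outcomes and a point mass at their mean. That distance is zero for an action whose outcome is deterministic and grows with outcome spread. The names `ot_uncertainty`, `select_action`, `td_update` and the penalty weight `lam` are illustrative. In one dimension, the optimal coupling between two equal-size empirical distributions pairs sorted samples, so W1 reduces to the mean absolute difference of the sorted arrays.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """W1 between two 1-D empirical distributions with equal sample counts.

    In 1-D the optimal transport plan matches sorted samples, so W1 is the
    mean absolute difference between the sorted arrays.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return float(np.mean(np.abs(x - y)))


def ot_uncertainty(outcomes):
    """Hypothetical score: W1 from the outcome distribution to a point mass
    at its mean (0.0 when every sampled outcome is identical)."""
    outcomes = np.asarray(outcomes, dtype=float)
    dirac = np.full_like(outcomes, outcomes.mean())
    return wasserstein1_1d(outcomes, dirac)


def select_action(q_values, uncertainties, lam):
    """Greedy choice on the penalized objective Q(s, a) - lam * U(s, a)."""
    scores = np.asarray(q_values, dtype=float) - lam * np.asarray(uncertainties, dtype=float)
    return int(np.argmax(scores))


def td_update(Q, s, a, r, s_next, alpha, gamma, U, lam):
    """One tabular TD(0) step (assumed rule, not the paper's).

    The next action is chosen greedily on the penalized objective; the
    target then bootstraps from that action's unpenalized Q-value.
    """
    a_next = select_action(Q[s_next], U[s_next], lam)
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Action 0 always yields 1.0; action 1 has a higher mean (1.5) but is noisy.
samples = {0: [1.0, 1.0, 1.0, 1.0], 1: [0.0, 0.0, 3.0, 3.0]}
U = [ot_uncertainty(samples[a]) for a in (0, 1)]
q = [1.0, 1.5]
print(U)                          # [0.0, 1.5]
print(select_action(q, U, 0.0))   # 1  (no penalty: higher value wins)
print(select_action(q, U, 1.0))   # 0  (penalty favors the predictable action)
```

With `lam = 0` the agent picks the noisier action with the higher value. With `lam = 1` the penalty flips the choice toward the action with a predictable outcome, which is the qualitative behavior the abstract describes. The sorting shortcut only holds in one dimension; multi-dimensional outcomes would need a general optimal transport solver.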
Related papers
- Optimal Transport Perturbations For Safe Reinforcement Learning With Robustness Guarantees (2023)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Joint Learning Of Policy With Unknown Temporal Constraints For Safe Reinforcement Learning (2023)
- Context-aware Safe Reinforcement Learning For Non-stationary Environments (2021)
- Actsafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Safe Reinforcement Learning For Constrained Markov Decision Processes With Stochastic Stopping Time (2024)
- Safe Continual Reinforcement Learning In Non-stationary Environments (2026)