Safe Reinforcement Learning In Tensor Reproducing Kernel Hilbert Space
2023 · Xiaoyuan Cheng, Boli Chen, Liz Varga, et al.
Abstract
This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states from observations in a continuous state space poses a significant challenge, largely due to the intractable likelihood. To tackle this issue, we propose a stochastic model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics and partial observation environments. We leveraged the Predictive State Representation (PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically, and the results in this context are provable. Furthermore, we derived essential operators from the kernel Bayes' rule, enabling the recurs
Related papers
- On The Robustness Of Safe Reinforcement Learning Under Observational Perturbations (2022)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- Safe Reinforcement Learning Via Projection On A Safe Set: How To Achieve Optimality? (2020)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)
- Representation Of Reinforcement Learning Policies In Reproducing Kernel Hilbert Spaces (2020)
- Safe Continual Reinforcement Learning In Non-stationary Environments (2026)
- Hierarchical Framework For Interpretable And Probabilistic Model-based Safe Reinforcement Learning (2023)