Enforcing Almost-sure Reachability In Pomdps
2020 Β· Sebastian Junges, Nils Jansen, Sanjit A. Seshia
Abstract
Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
Authors
(none)
Tags
Stats
Related papers
- Sparse Tree Search Optimality Guarantees In Pomdps With Continuous Observation Spaces (2019)5.84
- Statistical Tractability Of Off-policy Evaluation Of History-dependent Policies In Pomdps (2025)0.00
- Sequential Monte Carlo For Policy Optimization In Continuous Pomdps (2025)0.00
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)0.00
- Finite-state Controllers For (hidden-model) Pomdps Using Deep Reinforcement Learning (2026)0.00
- Agent-state Based Policies In Pomdps: Beyond Belief-state Mdps (2024)0.00
- Model-based Learning Of Near-optimal Finite-window Policies In Pomdps (2026)0.00
- A Spectral Approach To Off-policy Evaluation For Pomdps (2021)0.00