Inferring Probabilistic Reward Machines From Non-Markovian Reward Processes For Reinforcement Learning
2021 · Taylor Dohmen, Noah Topper, George Atia, et al.
Abstract
The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process and prove results around its correctness and convergence.
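To make the idea of augmenting the state space concrete, here is a minimal Python sketch of a probabilistic reward machine. It is an illustration under stated assumptions, not the paper's formal definition or learning algorithm: the class `ProbabilisticRewardMachine`, the helpers `augmented_state` and `make_coffee_prm`, the label names, and the probabilities are all hypothetical. The sketch assumes each (machine state, label) pair maps to a distribution over (next machine state, reward) outcomes, and that a label with no listed transition leaves the machine state unchanged with zero reward.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProbabilisticRewardMachine:
    """Hypothetical PRM encoding: (state, label) -> [(prob, next_state, reward), ...]."""
    initial_state: str
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, label, rng=random):
        """Sample a (next_state, reward) outcome for the observed label."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            # Assumption: unlisted labels self-loop with zero reward.
            return 0.0
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        self.state = next_state
        return reward


def augmented_state(env_state, prm):
    """Pair the environment state with the machine state.

    The reward depends on the history only through the machine state,
    so the pair can be treated as a Markovian state.
    """
    return (env_state, prm.state)


def make_coffee_prm():
    """Toy example: fetching coffee succeeds 90% of the time, then delivery pays 1."""
    return ProbabilisticRewardMachine(
        initial_state="u0",
        transitions={
            ("u0", "coffee"): [(0.9, "u1", 0.0), (0.1, "u0", 0.0)],
            ("u1", "office"): [(1.0, "u0", 1.0)],
        },
    )
```

An agent would call `step` with the label observed after each environment transition and learn over the pairs returned by `augmented_state` rather than over the raw environment states.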
Related papers
- Learning Reward Machines: A Study In Partially Observable Reinforcement Learning (2021)
- Reinforcement Learning With Reward Machines In Stochastic Games (2023)
- Learning Non-Markovian Reward Models In MDPs (2020)
- Decentralized Graph-based Multi-agent Reinforcement Learning Using Reward Machines (2021)
- Learning Robust Reward Machines From Noisy Labels (2024)
- Joint Learning Of Reward Machines And Policies In Environments With Partially Known Semantics (2022)
- Markov Abstractions For PAC Reinforcement Learning In Non-Markov Decision Processes (2022)
- A Hierarchical Bayesian Approach To Inverse Reinforcement Learning With Symbolic Reward Machines (2022)