Joint Learning Of Reward Machines And Policies In Environments With Partially Known Semantics
2022 · Christos Verginis, Cevahir Koprulu, Sandeep Chinchali, et al.
Abstract
We study the problem of reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties in the environment, called atomic propositions, and represented by Boolean variables. One unrealistic assumption commonly used in the literature is that the truth values of these propositions are accurately known. In real situations, however, these truth values are uncertain since they come from sensors that suffer from imperfections. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine that encodes the underlying task while learning how to execute it, despite the uncertainties of the propositions' truth values. In order to address such uncertainties, the algorithm maintains a probabilistic estimate about the truth value of the atomic propositions; it updates this estimate according to new sensory measurements that ar
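As a rough illustration of the belief maintenance the abstract describes, the sketch below applies Bayes' rule to a single atomic proposition observed through a noisy binary sensor. It is not the authors' algorithm: the function name `update_belief` and the sensor parameters `tpr` (true-positive rate) and `fpr` (false-positive rate) are hypothetical, and the numeric values are chosen only for the example.

```python
def update_belief(prior: float, observation: bool,
                  tpr: float = 0.9, fpr: float = 0.2) -> float:
    """Return P(proposition is true | observation) via Bayes' rule.

    tpr and fpr are assumed sensor characteristics (hypothetical values):
    tpr = P(sensor reports True | proposition is true)
    fpr = P(sensor reports True | proposition is false)
    """
    # Likelihood of this reading under each hypothesis about the proposition.
    like_true = tpr if observation else 1.0 - tpr
    like_false = fpr if observation else 1.0 - fpr

    # Total probability of the reading (the normalizing constant).
    evidence = like_true * prior + like_false * (1.0 - prior)
    if evidence == 0.0:
        # The reading is impossible under the model; keep the prior unchanged.
        return prior
    return like_true * prior / evidence


# Start from an uninformative prior and fold in a short sequence of readings.
belief = 0.5
for reading in [True, True, False, True]:
    belief = update_belief(belief, reading)
print(f"belief that the proposition holds: {belief:.3f}")
```

Under these assumed rates, each `True` reading raises the belief and each `False` reading lowers it, so a single contradictory reading does not erase the accumulated evidence.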
Related papers
- Learning Reward Machines: A Study In Partially Observable Reinforcement Learning (2021)
- Inferring Probabilistic Reward Machines From Non-Markovian Reward Processes For Reinforcement Learning (2021)
- Learning Task Automata For Reinforcement Learning Using Hidden Markov Models (2022)
- Reinforcement Learning With Reward Machines In Stochastic Games (2023)
- A Hierarchical Bayesian Approach To Inverse Reinforcement Learning With Symbolic Reward Machines (2022)
- Learning Interpretable Policies In Hindsight-Observable POMDPs Through Partially Supervised Reinforcement Learning (2024)
- Learning Symbolic Representations For Reinforcement Learning Of Non-Markovian Behavior (2023)
- Deep Decentralized Multi-task Multi-agent Reinforcement Learning Under Partial Observability (2017)