Learning Task Automata For Reinforcement Learning Using Hidden Markov Models
2022 · Alessandro Abate, Yousif Almulla, James Fox, et al.
Abstract
Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state 'task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the dec
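To make the notion of a task automaton and a product state concrete, here is a minimal Python sketch. It is not the paper's learning pipeline (no Baum-Welch step is involved); it only illustrates, under assumed names, how a hand-written DFA paired with environment states turns a history-dependent reward into one that depends only on the current product state. The task ("see label a, then label b"), the labelling `LABELS`, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task: "observe label 'a', then label 'b'".
# DFA states: 0 (start), 1 (seen 'a'), 2 (accepting).
DFA_TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING = {2}

# Hypothetical labelling of environment states.
LABELS: Dict[int, str] = {0: "", 1: "a", 2: "b"}


def dfa_step(q: int, label: str) -> int:
    """Advance the automaton; unlisted (state, label) pairs self-loop."""
    return DFA_TRANSITIONS.get((q, label), q)


def product_step(env_state: int, q: int,
                 label_of: Callable[[int], str]) -> Tuple[int, int]:
    """Return the product state (environment state, next automaton state)."""
    return env_state, dfa_step(q, label_of(env_state))


def run_episode(env_states: List[int],
                label_of: Callable[[int], str]) -> Tuple[int, List[float]]:
    """Track the automaton along a trajectory; reward 1.0 on first acceptance."""
    q = 0
    rewards: List[float] = []
    for s in env_states:
        was_accepting = q in ACCEPTING
        _, q = product_step(s, q, label_of)
        rewards.append(1.0 if q in ACCEPTING and not was_accepting else 0.0)
    return q, rewards


def label_of(s: int) -> str:
    return LABELS[s]


final_q, rewards = run_episode([0, 1, 0, 2], label_of)
early_q, early_rewards = run_episode([2, 1], label_of)
```

In this sketch, environment state 2 earns a reward in the first trajectory but not in the second, because the automaton state differs when it is reached. Viewed from the environment state alone the reward is non-Markovian; viewed from the pair (environment state, automaton state) it is Markovian.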
Related papers
- Learning Reward Machines: A Study In Partially Observable Reinforcement Learning (2021) 0.00
- Learning Symbolic Representations For Reinforcement Learning Of Non-Markovian Behavior (2023) 0.00
- Estimating Disentangled Belief About Hidden State And Hidden Task For Meta-RL (2021) 0.00
- Learning Non-Markovian Reward Models In MDPs (2020) 0.00
- Unifying Task Specification In Reinforcement Learning (2016) 0.00
- Joint Learning Of Reward Machines And Policies In Environments With Partially Known Semantics (2022) 3.58
- Provable Multi-task Reinforcement Learning: A Representation Learning Framework With Low Rank Rewards (2026) 0.00
- Autonomous Extraction Of A Hierarchical Structure Of Tasks In Reinforcement Learning, A Sequential Associate Rule Mining Approach (2018) 0.00