Efficient PAC Reinforcement Learning In Regular Decision Processes
2021 · Alessandro Ronca, Giuseppe De Giacomo
Abstract
Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
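To make the "finite transducer" reading concrete, here is a minimal sketch of a history-dependent reward function encoded as a Mealy-style transducer. It is an illustration only, not the construction or the learning algorithm from the paper: the `Transducer` class, the `no_key`/`has_key` states, the symbols `key`, `open` and `wait`, and the helper `last_reward` are all hypothetical, and the history is simplified to a sequence of actions.

```python
from dataclasses import dataclass


@dataclass
class Transducer:
    """Finite-state (Mealy-style) transducer: reads one symbol per step
    and emits one output per step. Hypothetical helper for illustration."""
    initial: str
    delta: dict   # (state, symbol) -> next state
    output: dict  # (state, symbol) -> emitted value

    def run(self, history):
        """Process a whole history; return the final state and all outputs."""
        state, outputs = self.initial, []
        for symbol in history:
            outputs.append(self.output[(state, symbol)])
            state = self.delta[(state, symbol)]
        return state, outputs


SYMBOLS = ("key", "open", "wait")

# Toy assumption: "open" pays reward 1 only if "key" occurred at some
# earlier step. The reward depends on the whole history, but only through
# a two-state summary of it, which is what makes the dependence regular.
reward_fn = Transducer(
    initial="no_key",
    delta={
        **{("no_key", s): ("has_key" if s == "key" else "no_key") for s in SYMBOLS},
        **{("has_key", s): "has_key" for s in SYMBOLS},
    },
    output={
        **{("no_key", s): 0.0 for s in SYMBOLS},
        **{("has_key", s): (1.0 if s == "open" else 0.0) for s in SYMBOLS},
    },
)


def last_reward(history):
    """Reward emitted for the most recent step of a non-empty history."""
    _, outputs = reward_fn.run(history)
    return outputs[-1]
```

In this toy, the same final symbol `open` yields different rewards depending on what came before it, so the reward is not Markov in the last symbol alone. It becomes Markov once the transducer state is tracked alongside it.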
Related papers
- Markov Abstractions For PAC Reinforcement Learning In Non-Markov Decision Processes (2022) 0.00
- Omega-regular Decision Processes (2023) 0.00
- Contextual Decision Processes With Low Bellman Rank Are PAC-learnable (2016) 0.00
- Beyond No Regret: Instance-dependent PAC Reinforcement Learning (2021) 0.00
- Regular Decision Processes For Grid Worlds (2021) 0.00
- Performative Reinforcement Learning With Linear Markov Decision Process (2024) 0.00
- A PAC Learning Algorithm For LTL And Omega-regular Objectives In MDPs (2023) 3.58
- Temporal Regularization In Markov Decision Process (2018) 0.00