A Method For The Online Construction Of The Set Of States Of A Markov Decision Process Using Answer Set Programming
2017 · Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, et al.
Abstract
Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
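The idea of building the state set while learning can be illustrated with a minimal sketch. This is not the authors' oASP(MDP) implementation: a plain Python set and a per-state dictionary of allowed actions stand in for the ASP program and its answer sets, and tabular Q-learning is assumed as the RL method. The class `OnlineStateQLearner` and its methods `observe_state`, `forbid`, `choose`, and `update` are hypothetical names introduced only for this illustration.

```python
import random
from collections import defaultdict


class OnlineStateQLearner:
    """Tabular Q-learning over a state set that is discovered while acting.

    Assumption: a Python set and dict replace the ASP program used in the paper.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.states = set()          # states observed so far
        self.allowed = {}            # state -> actions not yet ruled out
        self.q = defaultdict(float)  # (state, action) -> value estimate

    def observe_state(self, s):
        # Add a newly seen state; initially every action is assumed possible.
        if s not in self.states:
            self.states.add(s)
            self.allowed[s] = set(self.actions)

    def forbid(self, s, a):
        # Record an observed domain constraint: action a is unavailable in s.
        self.observe_state(s)
        self.allowed[s].discard(a)

    def choose(self, s):
        # Epsilon-greedy choice restricted to the actions still allowed in s.
        self.observe_state(s)
        options = sorted(self.allowed[s]) or self.actions
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # Standard Q-learning update; both endpoints join the state set.
        self.observe_state(s)
        self.observe_state(s_next)
        nxt = self.allowed[s_next] or self.actions
        best_next = max(self.q[(s_next, b)] for b in nxt)
        target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

In this sketch, a change in the environment is handled by calling `forbid` (or by observing a new state) without resetting the `q` table, which mirrors the abstract's claim that constraints narrow the search space without interfering with value estimation.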
Related papers
- Act As You Learn: Adaptive Decision-making In Non-stationary Markov Decision Processes (2024)
- Dynamic Regret Of Online Markov Decision Processes (2022)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)
- Configurable Markov Decision Processes (2018)
- Agent-state Based Policies In POMDPs: Beyond Belief-state MDPs (2024)
- Decision Making In Non-stationary Environments With Policy-augmented Search (2024)
- OCMDP: Observation-constrained Markov Decision Process (2024)