Non-stationary Markov Decision Processes, A Worst-case Approach Using Model-based Reinforcement Learning, Extended Version
2019 · Erwan Lecarpentier, Emmanuel Rachelson
Abstract
This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model is known at each decision epoch, but its evolution is not. Our contribution can be presented in four points. 1) We define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs), and introduce the notion of regular evolution by assuming Lipschitz continuity of the transition and reward functions with respect to time; 2) we consider a planning agent that uses the current model of the environment but is unaware of its future evolution, which leads us to a worst-case method in which the environment is seen as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a zero-shot Model-
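The abstract combines two ideas: a Lipschitz bound on how fast the transition and reward functions may change with time, and a planner that treats the environment as an adversary. The Python sketch below illustrates how they can combine in a depth-limited max-min lookahead from a snapshot model. It is not the RATS algorithm. The names (`worst_case_expectation`, `worst_case_value`, `plan`), the (state, action) reward signature, unit-length decision epochs, and the use of total-variation distance to bound transition drift are all hypothetical choices made for illustration; the paper's own distance and details may differ.

```python
"""Hypothetical sketch: depth-limited worst-case lookahead in a drifting MDP.

Not the RATS algorithm. Assumptions (ours, not the paper's):
- rewards depend on (state, action) and may drop by at most lip_r per epoch;
- next-state distributions may drift by at most lip_p per epoch in
  total-variation distance (a simplification of the Lipschitz hypothesis);
- the snapshot model (P, R) is exact at the current epoch (elapsed == 0).
"""
from typing import Dict

Dist = Dict[str, float]
Transitions = Dict[str, Dict[str, Dist]]  # P[s][a] -> {s_next: prob}
Rewards = Dict[str, Dict[str, float]]     # R[s][a] -> reward


def worst_case_expectation(p: Dist, values: Dict[str, float], radius: float) -> float:
    """Minimum of E_q[values] over all q within TV distance `radius` of p:
    move up to `radius` probability mass from the highest-valued states
    onto the lowest-valued one."""
    worst = min(values, key=values.get)
    q = {s: p.get(s, 0.0) for s in values}
    budget = min(radius, 1.0 - q[worst])
    for s in sorted(values, key=values.get, reverse=True):
        if budget <= 0.0:
            break
        if s == worst:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        q[worst] += moved
        budget -= moved
    return sum(q[s] * values[s] for s in values)


def worst_case_q(P: Transitions, R: Rewards, state: str, depth: int, elapsed: int,
                 gamma: float, lip_p: float, lip_r: float) -> Dict[str, float]:
    """Worst-case action values, `elapsed` epochs after the snapshot."""
    # Successor values do not depend on the action, so compute them once.
    values = {s2: worst_case_value(P, R, s2, depth - 1, elapsed + 1, gamma, lip_p, lip_r)
              for s2 in P}
    return {
        # Nature lowers the reward as far as the drift bound allows and
        # shifts transition mass towards the worst successor.
        a: R[state][a] - lip_r * elapsed
        + gamma * worst_case_expectation(dist, values, lip_p * elapsed)
        for a, dist in P[state].items()
    }


def worst_case_value(P: Transitions, R: Rewards, state: str, depth: int, elapsed: int,
                     gamma: float, lip_p: float, lip_r: float) -> float:
    """Agent maximises over actions; leaf value is assumed to be zero."""
    if depth == 0:
        return 0.0
    return max(worst_case_q(P, R, state, depth, elapsed, gamma, lip_p, lip_r).values())


def plan(P: Transitions, R: Rewards, state: str, depth: int,
         gamma: float = 0.9, lip_p: float = 0.0, lip_r: float = 0.0) -> str:
    """Return the root action maximising the worst-case value (depth >= 1)."""
    q = worst_case_q(P, R, state, depth, 0, gamma, lip_p, lip_r)
    return max(q, key=q.get)


if __name__ == "__main__":
    # Toy two-state model, purely illustrative.
    P = {"s0": {"safe": {"s0": 1.0}, "bold": {"s0": 0.5, "s1": 0.5}},
         "s1": {"safe": {"s1": 1.0}, "bold": {"s1": 1.0}}}
    R = {"s0": {"safe": 1.0, "bold": 1.2}, "s1": {"safe": 0.0, "bold": 0.0}}
    print(plan(P, R, "s0", depth=3, lip_p=0.2, lip_r=0.1))
```

In this sketch nature may choose a different worst-case model at each node, which is more pessimistic than committing to a single evolving environment, and the exhaustive recursion is only practical for very small models.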
Related papers
- Act As You Learn: Adaptive Decision-making In Non-stationary Markov Decision Processes (2024)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Model-based Exploration In Monitored Markov Decision Processes (2025)
- Solving Robust MDPs Through No-regret Dynamics (2023)
- Dynamic Regret Of Online Markov Decision Processes (2022)
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)
- Efficient Learning In Non-stationary Linear Markov Decision Processes (2020)
- Decision Making In Non-stationary Environments With Policy-augmented Monte Carlo Tree Search (2022)