Learning And Planning For Time-varying MDPs Using Maximum Likelihood Estimation
2019 · Melkior Ornik, Ufuk Topcu
Abstract
This paper proposes a formal approach to online learning and planning for agents operating in a priori unknown, time-varying environments. The proposed method computes the maximally likely model of the environment, given the observations about the environment made by an agent earlier in the system run and assuming knowledge of a bound on the maximal rate of change of system dynamics. Such an approach generalizes the estimation method commonly used in learning algorithms for unknown Markov decision processes with time-invariant transition probabilities, but is also able to quickly and correctly identify the system dynamics following a change. Based on the proposed method, we generalize the exploration bonuses used in learning for time-invariant Markov decision processes by introducing a notion of uncertainty in a learned time-varying model, and develop a control policy for time-varying Markov decision processes based on the exploitation and exploration trade-off. We demonstrate the prop
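As a rough illustration of the estimation idea in the abstract, the sketch below finds the most likely time-varying model for a single Bernoulli parameter whose per-step change is bounded by a known rate. It is not the authors' algorithm: the reduction to one probability, the discretized grid, the dynamic-programming search, and the names `log_lik`, `ml_path`, `eps`, and `K` are all assumptions made for this example. Setting `eps=0` recovers the usual time-invariant estimate (the empirical frequency, when it lies on the grid).

```python
import math


def log_lik(x, p):
    """Log-likelihood of one Bernoulli observation x (0 or 1) under probability p."""
    q = p if x == 1 else 1.0 - p
    return math.log(q) if q > 0.0 else -math.inf


def ml_path(observations, eps, K=100):
    """Most likely path p_1..p_T on the grid {0, 1/K, ..., 1} with |p_{t+1} - p_t| <= eps.

    Hypothetical helper: exact only over the grid, and assumes a non-empty
    list of 0/1 observations.
    """
    grid = [k / K for k in range(K + 1)]
    max_step = int(eps * K + 1e-9)  # largest allowed move, in grid cells
    score = [log_lik(observations[0], p) for p in grid]
    back = []  # back[t][k]: best predecessor of cell k at step t + 1
    for x in observations[1:]:
        new_score, pointers = [], []
        for k, p in enumerate(grid):
            lo, hi = max(0, k - max_step), min(K, k + max_step)
            j = max(range(lo, hi + 1), key=lambda i: score[i])
            new_score.append(score[j] + log_lik(x, p))
            pointers.append(j)
        score = new_score
        back.append(pointers)
    # Trace back from the best final cell to recover the whole path.
    k = max(range(K + 1), key=lambda i: score[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [grid[k] for k in path]


# The observed outcome flips from always 0 to always 1 halfway through the run.
data = [0] * 20 + [1] * 20
print(ml_path(data, eps=0.0, K=10)[-1])  # time-invariant estimate: 0.5
print(ml_path(data, eps=0.1, K=10)[-1])  # bounded-change estimate: 1.0
```

With a zero change bound the final estimate averages over the whole history, whereas allowing a bounded rate of change lets the estimate follow the switch, which is the behaviour the abstract attributes to the proposed method.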
Related papers
- Provably Efficient Maximum Entropy Exploration (2018)
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)
- Non-stationary Markov Decision Processes, A Worst-case Approach Using Model-based Reinforcement Learning, Extended Version (2019)
- Planning And Learning In Average Risk-aware MDPs (2025)
- Information-theoretic Methods For Planning And Learning In Partially Observable Markov Decision Processes (2016)
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear MDPs (2023)
- Learning And Planning In Average-reward Markov Decision Processes (2020)
- Minimum-delay Adaptation In Non-stationary Reinforcement Learning Via Online High-confidence Change-point Detection (2021)