Markov Decision Processes With Continuous Side Information
2017 · Aditya Modi, Nan Jiang, Satinder Singh, et al.
Abstract
We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode, the agent has access to some side information, or context, that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare, where baseline measurements of a patient at the start of a treatment episode form the context and may provide information about how the patient will respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under the assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm
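The linear setting mentioned in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the context is a vector of non-negative weights summing to 1, so that the mixture of base transition matrices is itself a valid transition matrix. The names `mix_transitions` and `base_mdps`, and the two-state numbers, are hypothetical and chosen only for illustration; this is not the paper's algorithm, only the model it learns in.

```python
from typing import List

# Matrix[s][t] = probability of moving from state s to state t
# under one fixed action (a single action keeps the sketch small).
Matrix = List[List[float]]


def mix_transitions(context: List[float], base: List[Matrix]) -> Matrix:
    """Return sum_i context[i] * base[i].

    Assumption (not from the source): the context lies on the probability
    simplex, so the result is again a row-stochastic matrix.
    """
    if len(context) != len(base):
        raise ValueError("need one weight per base MDP")
    if any(w < 0 for w in context) or abs(sum(context) - 1.0) > 1e-9:
        raise ValueError("context must be non-negative and sum to 1")
    n = len(base[0])
    return [
        [sum(w * P[s][t] for w, P in zip(context, base)) for t in range(n)]
        for s in range(n)
    ]


# Two hypothetical base MDPs over two states.
base_mdps = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.6, 0.4]],
]

# The context observed at the start of an episode fixes that episode's dynamics.
mixed = mix_transitions([0.25, 0.75], base_mdps)
for row in mixed:
    print([round(p, 3) for p in row])
```

In this picture the learner never observes the base matrices directly; it sees only contexts and sampled transitions, which is why the base MDPs have to be estimated across episodes.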
Related papers
- No-regret Exploration In Contextual Reinforcement Learning (2019)
- PAC Bounds For Imitation And Model-based Batch Learning Of Contextual Markov Decision Processes (2020)
- Reinforcement Learning In Presence Of Discrete Markovian Context Evolution (2022)
- Inverse Reinforcement Learning In Contextual MDPs (2019)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Square-root Regret Bounds For Continuous-time Episodic Markov Decision Processes (2022)
- Causal Markov Decision Processes: Learning Good Interventions Efficiently (2021)
- Contextual Decision Processes With Low Bellman Rank Are PAC-learnable (2016)