Causal Markov Decision Processes: Learning Good Interventions Efficiently
2021 · Yangyi Lu, Amirhossein Meisami, Ambuj Tewari
Abstract
We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an \(\tilde{O}(HS\sqrt{ZT})\) regret bound, where \(T\) is the total number of time steps, \(H\) is the episodic horizon, and \(S\) is the cardinality of the state space. Notably, our regret bound does not scale with the size of actions/interventions (\(A\)), but only sca
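To make the shape of the stated bound concrete, the following minimal Python sketch evaluates only the leading term \(HS\sqrt{ZT}\), dropping constants and the logarithmic factors hidden by \(\tilde{O}\). The function name `regret_bound_scale` and all numeric values are hypothetical and chosen purely for illustration. \(Z\) is treated as a plain numeric input, since its definition is not given in the abstract text shown here.

```python
import math


def regret_bound_scale(H: int, S: int, Z: int, T: int) -> float:
    """Leading term H * S * sqrt(Z * T) of the stated bound.

    Constants and logarithmic factors are ignored, so the result is only
    useful for comparing how the bound scales, not as an actual regret value.
    Note that the number of actions A is not an argument at all.
    """
    return H * S * math.sqrt(Z * T)


# Hypothetical problem sizes, chosen only to illustrate the scaling.
base = regret_bound_scale(H=10, S=5, Z=4, T=10_000)
longer = regret_bound_scale(H=10, S=5, Z=4, T=40_000)

# Quadrupling T doubles the leading term (square-root dependence on T).
ratio = longer / base
print(base, longer, ratio)
```

Because \(A\) never enters the expression, enlarging the action set leaves this leading term unchanged, which is the point the abstract emphasizes.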
Related papers
- Towards Causal Model-based Policy Optimization (2025)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations (2023)
- Learning Causal State Representations Of Partially Observable Environments (2019)
- Markov Decision Processes With Continuous Side Information (2017)
- Towards Intervention-centric Causal Reasoning In Learning Agents (2020)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Bayesian Learning Of The Optimal Action-value Function In A Markov Decision Process (2025)
- A Provably-efficient Model-free Algorithm For Constrained Markov Decision Processes (2021)