Towards Causal Model-based Policy Optimization
2025 · Alberto Caron, Vasilios Mavroudis, Chris Hicks
Abstract
Real-world decision-making problems are often marked by complex, uncertain dynamics that can shift or break under changing conditions. Traditional Model-Based Reinforcement Learning (MBRL) approaches learn predictive models of environment dynamics from queried trajectories and then use these models to simulate rollouts for policy optimization. However, such methods do not account for the underlying causal mechanisms that govern the environment, and thus inadvertently capture spurious correlations, making them sensitive to distributional shifts and limiting their ability to generalize. The same naturally holds for model-free approaches. In this work, we introduce Causal Model-Based Policy Optimization (C-MBPO), a novel framework that integrates causal learning into the MBRL pipeline to achieve more robust, explainable, and generalizable policy learning algorithms. Our approach centers on first inferring a Causal Markov Decision Process (C-MDP) by learning a local Structural Causal Mod
Related papers
- Learning By Doing: An Online Causal Reinforcement Learning Framework With Causal-aware Policy (2024) · 1.56
- Learning Nonlinear Causal Reductions To Explain Reinforcement Learning Policies (2025) · 0.00
- Conservative Dual Policy Optimization For Efficient Model-based Reinforcement Learning (2022) · 0.00
- Towards Empowerment Gain Through Causal Structure Learning In Model-based RL (2025) · 0.00
- Policy Optimization With Model-based Explorations (2018) · 5.84
- How To Fine-tune The Model: Unified Model Shift And Model Bias Policy Optimization (2023) · 0.00
- Causal Reinforcement Learning Using Observational And Interventional Data (2021) · 0.00
- When To Trust Your Model: Model-based Policy Optimization (2019) · 0.00