A Relative Ignorability Framework For Decision-relevant Observability In Control Theory And Reinforcement Learning
2025 · Marylena Bleile, Minh-Nhat Phung, Minh-Binh Tran
Abstract
Sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, an assumption that may fail under partial observability. Causal inference paradigms formalise ignorability of missingness. We show that these views can be unified and generalized to guarantee Q-learning convergence even when the Markov property fails. To do so, we introduce the concept of relative ignorability, a graphical-causal criterion that refines the requirements for accurate decision-making based on incomplete data. Theoretical results and simulations both reveal that non-Markovian stochastic processes whose missingness is relatively ignorable with respect to causal estimands can still be optimized using standard reinforcement learning algorithms. These results expand the theoretical foundations of safe, data-efficient AI to real-world envi
Related papers
- Learning Causal States Under Partial Observability And Perturbation (2025) 0.00
- Reinforcement Learning Under Partial Observability Guided By Learned Environment Models (2022) 6.34
- Quantifying First-order Markov Violations In Noisy Reinforcement Learning: A Causal Discovery Approach (2025) 0.00
- Active Inference And Reinforcement Learning: A Unified Inference On Continuous State And Action Spaces Under Partial Observability (2022) 5.84
- Probabilistic Inverse Optimal Control For Non-linear Partially Observable Systems Disentangles Perceptual Uncertainty And Behavioral Costs (2023) 0.00
- Modeling The Effects Of Environmental And Perceptual Uncertainty Using Deterministic Reinforcement Learning Dynamics With Partial Observability (2021) 9.59
- Benchmarking Partial Observability In Reinforcement Learning With A Suite Of Memory-improvable Domains (2025) 0.00
- Mitigating Partial Observability In Sequential Decision Processes Via The Lambda Discrepancy (2024) 0.00