Quantifying First-order Markov Violations In Noisy Reinforcement Learning: A Causal Discovery Approach
2025 · Naveen Mysore
Abstract
Reinforcement learning (RL) methods frequently assume that each new observation completely reflects the environment's state, thereby guaranteeing Markovian (one-step) transitions. In practice, partial observability or sensor/actuator noise often invalidates this assumption. This paper proposes a systematic methodology for detecting such violations, combining a partial correlation-based causal discovery process (PCMCI) with a novel Markov Violation score (MVS). The MVS measures multi-step dependencies that emerge when noise or incomplete state information disrupts the Markov property. Classic control tasks (CartPole, Pendulum, Acrobot) serve as examples to illustrate how targeted noise and dimension omissions affect both RL performance and measured Markov consistency. Surprisingly, even substantial observation noise sometimes fails to induce strong multi-lag dependencies in certain domains (e.g., Acrobot). In contrast, dimension-dropping investigations show that excluding some state v
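The core idea, that observation noise can create multi-step dependencies in a process whose underlying state is first-order Markov, can be illustrated with a small sketch. This is not the paper's MVS or its PCMCI pipeline; it is a hand-rolled lag-2 partial correlation test on a hypothetical scalar AR(1) process, and the function names, coefficient, and noise scale are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_ar1(n, a=0.9, seed=0):
    """Simulate a scalar first-order Markov process x_t = a * x_{t-1} + e_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + rng.normal()
    return x


def lag2_partial_corr(series):
    """Partial correlation of x_t and x_{t-2} given x_{t-1}.

    A value near zero is consistent with first-order Markov dynamics;
    a clearly nonzero value indicates a dependency beyond one step.
    """
    x_t, x_t1, x_t2 = series[2:], series[1:-1], series[:-2]
    # Regress x_{t-1} (plus an intercept) out of both x_t and x_{t-2}.
    design = np.column_stack([np.ones_like(x_t1), x_t1])
    res_t = x_t - design @ np.linalg.lstsq(design, x_t, rcond=None)[0]
    res_t2 = x_t2 - design @ np.linalg.lstsq(design, x_t2, rcond=None)[0]
    return float(np.corrcoef(res_t, res_t2)[0, 1])


# Fully observed state: the lag-2 partial correlation should be near zero.
latent = simulate_ar1(20000)
clean_score = lag2_partial_corr(latent)

# Noisy observation of the same state (noise scale is an assumption).
noise_rng = np.random.default_rng(1)
noisy = latent + noise_rng.normal(scale=2.0, size=latent.shape)
noisy_score = lag2_partial_corr(noisy)

print(f"clean lag-2 partial correlation: {clean_score:.3f}")
print(f"noisy lag-2 partial correlation: {noisy_score:.3f}")
```

In this toy setting the noisy series behaves like an ARMA(1,1) process, so the previous observation alone no longer screens off the past. A full analysis along the lines described in the abstract would instead run a causal discovery method over all state dimensions and several lags.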
Related papers
- Learning Causal States Under Partial Observability And Perturbation (2025)
- Reinforcement Learning With Perturbed Rewards (2018)
- Learning Nonlinear Causal Reductions To Explain Reinforcement Learning Policies (2025)
- Mutual Information Tracks Policy Coherence In Reinforcement Learning (2025)
- Exploring The Training Robustness Of Distributional Reinforcement Learning Against Noisy State Observations (2021)
- Rate Or Fate? RLV\(^\varepsilon\)R: Reinforcement Learning With Verifiable Noisy Rewards (2026)
- ReCCoVER: Detecting Causal Confusion For Explainable Reinforcement Learning (2022)
- A Relative Ignorability Framework For Decision-relevant Observability In Control Theory And Reinforcement Learning (2025)