Mitigating Partial Observability In Sequential Decision Processes Via The Lambda Discrepancy
2024 · Cameron Allen, Aaron Kirtland, Ruo Yu Tao, et al.
Abstract
Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to, or knowledge of, an underlying, unobservable state space. Our metric, the \(\lambda\)-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD(\(\lambda\)) with a different value of \(\lambda\). Since TD(\(\lambda = 0\)) makes an implicit Markov assumption and TD(\(\lambda = 1\)) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the \(\lambda\)-discrepancy is exactly zero for all Markov decision processes and almost
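To make the idea concrete, here is a minimal sketch rather than the authors' implementation. It compares the TD(\(\lambda = 0\)) fixed point with the Monte Carlo value, which is what TD(\(\lambda = 1\)) estimates, on a hypothetical three-state chain. Several simplifications are assumptions of this sketch: the policy is folded into the transition matrix `P`, state values are used in place of action values, observations are weighted by undiscounted expected visit counts, and every observation is assumed to be visited. The function name `lambda_discrepancy` and the example chain are illustrative only.

```python
import numpy as np

def lambda_discrepancy(P, r, p0, Phi, gamma):
    """Return (v_td0, v_mc, |v_td0 - v_mc|) per observation.

    P     : (S, S) transition matrix under a fixed policy; rows may sum to
            less than 1, with the missing mass meaning termination.
    r     : (S,) expected reward received on leaving each state.
    p0    : (S,) start-state distribution.
    Phi   : (S, O) observation function, Phi[s, o] = Pr(o | s).
    gamma : discount factor.
    """
    n_states, n_obs = Phi.shape
    eye_s = np.eye(n_states)

    # True values of the hidden states: v = (I - gamma P)^{-1} r.
    v_state = np.linalg.solve(eye_s - gamma * P, r)

    # Expected visit counts d^T = p0^T (I - P)^{-1} (assumed weighting).
    d = np.linalg.solve((eye_s - P).T, p0)

    # W[o, s] = Pr(s | o), obtained by normalising the joint over states.
    joint = d[:, None] * Phi
    W = (joint / joint.sum(axis=0)).T

    # Monte Carlo / TD(1): average the true returns of the aliased states.
    v_mc = W @ v_state

    # TD(0): treat the observations as if they were Markov states.
    P_obs = W @ P @ Phi
    r_obs = W @ r
    v_td0 = np.linalg.solve(np.eye(n_obs) - gamma * P_obs, r_obs)

    return v_td0, v_mc, np.abs(v_td0 - v_mc)

# Hypothetical chain s0 -> s1 -> s2 -> terminal, reward 1 on leaving s2.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])
p0 = np.array([1.0, 0.0, 0.0])
gamma = 0.9

# Aliased observations: s0 and s1 both emit observation A, s2 emits B.
Phi_aliased = np.array([[1.0, 0.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])
# Fully observable case: each state emits its own observation.
Phi_markov = np.eye(3)

print("aliased:", lambda_discrepancy(P, r, p0, Phi_aliased, gamma)[2])
print("markov: ", lambda_discrepancy(P, r, p0, Phi_markov, gamma)[2])
```

In this toy chain the two estimates differ at the aliased observation and coincide when the observations identify the state, which is the behaviour the abstract describes.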
Related papers
- Learning Causal States Under Partial Observability And Perturbation (2025)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)
- A Relative Ignorability Framework For Decision-relevant Observability In Control Theory And Reinforcement Learning (2025)
- Modeling The Effects Of Environmental And Perceptual Uncertainty Using Deterministic Reinforcement Learning Dynamics With Partial Observability (2021)
- Benchmarking Partial Observability In Reinforcement Learning With A Suite Of Memory-improvable Domains (2025)
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)
- Act-then-measure: Reinforcement Learning For Partially Observable Environments With Active Measuring (2023)
- Reinforcement Learning Under Partial Observability Guided By Learned Environment Models (2022)