On Overfitting And Asymptotic Bias In Batch Reinforcement Learning With Partial Observability
2017 · Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, et al.
Abstract
This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally shows that a smaller state representation decreases the risk of overfitting, while potentially increasing the asymptotic bias. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.
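To make the "truncated history of observations" state representation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names and the parameter `h` (history length) are hypothetical, and actions are left out of the history for simplicity. It shows only that a longer history yields more distinct states, so the same batch of data is spread over more states, which is the overfitting side of the tradeoff described in the abstract.

```python
from collections import deque
from typing import Hashable, Iterable, List, Tuple


def truncated_history_states(
    observations: Iterable[Hashable], h: int
) -> List[Tuple[Hashable, ...]]:
    """Map each time step to the tuple of (at most) the last h observations.

    `h` is a hypothetical name for the truncation length. Early time steps,
    where fewer than h observations exist, get a shorter tuple.
    """
    if h < 1:
        raise ValueError("history length h must be at least 1")
    window = deque(maxlen=h)  # automatically drops the oldest observation
    states = []
    for obs in observations:
        window.append(obs)
        states.append(tuple(window))
    return states


def count_distinct_states(observations: Iterable[Hashable], h: int) -> int:
    """Number of distinct truncated-history states seen in one trajectory."""
    return len(set(truncated_history_states(observations, h)))


if __name__ == "__main__":
    trajectory = [0, 1, 0, 1, 1, 0]  # toy observation sequence (made up)
    for h in (1, 2, 3):
        print(h, count_distinct_states(trajectory, h))
```

On a fixed batch, raising `h` increases the number of distinct states, so each state is backed by fewer samples. A small `h` keeps the representation compact at the cost of discarding information that may matter for acting optimally.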
Related papers
- Unbiased Asymmetric Reinforcement Learning Under Partial Observability (2021)
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)
- Benchmarking Partial Observability In Reinforcement Learning With A Suite Of Memory-improvable Domains (2025)
- Belief States For Cooperative Multi-agent Reinforcement Learning Under Partial Observability (2025)
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- Provable Partially Observable Reinforcement Learning With Privileged Information (2024)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)