Partially Observable Reinforcement Learning With Memory Traces
2025 · Onno Eberhard, Michael Muehlebach, Claire Vernade
Abstract
Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the return errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
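To make the idea of a memory trace concrete, the following is a minimal sketch of an exponential moving average over one-hot encoded observations. The abstract only states that traces are exponential moving averages of the observation history, so the function name `memory_trace`, the one-hot encoding, the zero initialization, and the decay parameter `lam` are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np


def memory_trace(observations, num_obs, lam=0.9):
    """Return the trace after each step for a sequence of discrete observations.

    Assumed update (hypothetical, for illustration only):
        z_t = lam * z_{t-1} + (1 - lam) * one_hot(y_t),  with z_0 = 0.
    """
    z = np.zeros(num_obs)
    traces = []
    for y in observations:
        one_hot = np.zeros(num_obs)
        one_hot[y] = 1.0
        # Older observations decay geometrically; the newest gets weight 1 - lam.
        z = lam * z + (1.0 - lam) * one_hot
        traces.append(z.copy())
    return np.array(traces)


# Usage: three observations drawn from an alphabet of size two.
traces = memory_trace([0, 1, 1], num_obs=2, lam=0.5)
print(traces)
```

Under these assumptions the trace has a fixed dimension equal to the number of distinct observations, whereas the number of distinct length-m observation windows grows as `num_obs ** m`. This is one way to read the abstract's contrast between traces and the window approach.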
Related papers
- The Act Of Remembering: A Study In Partially Observable Reinforcement Learning (2020)
- Trajectory-aware Eligibility Traces For Off-policy Reinforcement Learning (2023)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)
- Recall Traces: Backtracking Models For Efficient Reinforcement Learning (2018)
- Benchmarking Partial Observability In Reinforcement Learning With A Suite Of Memory-improvable Domains (2025)
- Improving The Efficiency Of Off-policy Reinforcement Learning By Accounting For Past Decisions (2021)
- Reinforcement Learning Under Partial Observability Guided By Learned Environment Models (2022)
- Prioritized Trajectory Replay: A Replay Memory For Data-driven Reinforcement Learning (2023)