Memoryless Policy Iteration for Episodic POMDPs
2025 · Roy van Zuijlen, Duarte Antunes
Abstract
Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather than in the high-dimensional belief space. However, extending classical methods such as policy iteration to this setting remains difficult: the output process is non-Markovian, making policy-improvement steps interdependent across stages. We introduce a new family of monotonically improving policy-iteration algorithms that alternate between single-stage output-based policy improvements and policy evaluations according to a prescribed periodic pattern. We show that this family admits optimal patterns that maximize a natural computational-efficiency index, and we identify the simplest pattern with minimal period. Building on this structure, we further develop a model-free variant that estimates values from data and learns memoryless policies directly. Across several POMDP examples, our method achieves
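The alternation between single-stage improvement and policy evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a small tabular finite-horizon POMDP with costs to be minimized, a stage-dependent memoryless policy `policy[k][o]`, and the naive pattern "improve one stage, then fully re-evaluate". All names (`evaluate`, `improve_stage`, `memoryless_pi`) and the toy model numbers are hypothetical.

```python
def q_value(P, C, v_next, s, a):
    """One-step cost plus expected cost-to-go from state s under action a."""
    return C[s][a] + sum(P[a][s][t] * v_next[t] for t in range(len(v_next)))

def evaluate(P, O, C, b0, policy):
    """Evaluate a stage-wise memoryless policy.
    P[a][s][t]: transition prob, O[s][o]: observation prob, C[s][a]: stage cost.
    Returns per-stage state distributions, per-stage state values, total cost."""
    H, nS, nO = len(policy), len(b0), len(O[0])
    dists = [list(b0)]                      # forward pass over stages
    for k in range(H - 1):
        nxt = [0.0] * nS
        for s in range(nS):
            for o in range(nO):
                w = dists[k][s] * O[s][o]
                a = policy[k][o]
                for t in range(nS):
                    nxt[t] += w * P[a][s][t]
        dists.append(nxt)
    values = [[0.0] * nS for _ in range(H + 1)]   # backward pass
    for k in reversed(range(H)):
        for s in range(nS):
            values[k][s] = sum(O[s][o] * q_value(P, C, values[k + 1], s, policy[k][o])
                               for o in range(nO))
    total = sum(b0[s] * values[0][s] for s in range(nS))
    return dists, values, total

def improve_stage(P, O, C, dists, values, policy, k):
    """Improve only stage k; the stage-k state distribution and the
    stage-(k+1) values do not depend on policy[k], so the cost cannot increase."""
    nS, nO, nA = len(O), len(O[0]), len(P)
    new = list(policy[k])
    for o in range(nO):
        def score(a):
            return sum(dists[k][s] * O[s][o] * q_value(P, C, values[k + 1], s, a)
                       for s in range(nS))
        best = min(range(nA), key=score)
        if score(best) < score(policy[k][o]) - 1e-12:   # change only on strict gain
            new[o] = best
    return new

def memoryless_pi(P, O, C, b0, horizon, max_sweeps=50):
    policy = [[0] * len(O[0]) for _ in range(horizon)]
    dists, values, total = evaluate(P, O, C, b0, policy)
    history = [total]
    for _ in range(max_sweeps):
        changed = False
        for k in range(horizon):
            new = improve_stage(P, O, C, dists, values, policy, k)
            if new != policy[k]:
                policy[k] = new
                dists, values, total = evaluate(P, O, C, b0, policy)
                history.append(total)
                changed = True
        if not changed:
            break
    return policy, history

# Toy model (hypothetical numbers): 2 states, 2 observations, 2 actions.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
O = [[0.8, 0.2], [0.3, 0.7]]
C = [[0.0, 1.0], [2.0, 0.5]]
b0 = [0.5, 0.5]
policy, history = memoryless_pi(P, O, C, b0, horizon=3)
print(policy, history)
```

Re-evaluating after every single-stage change is the simplest correct choice here, since each improvement at stage k alters the state distributions of later stages and the values of earlier ones. The sketch makes no attempt to reproduce the periodic patterns or the efficiency index mentioned in the abstract.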
Related papers
- Scaling Internal-state Policy-gradient Methods For POMDPs (2025)
- Sequential Monte Carlo For Policy Optimization In Continuous POMDPs (2025)
- Posterior Sampling-based Online Learning For Episodic POMDPs (2023)
- Recurrent Natural Policy Gradient For POMDPs (2024)
- Statistical Tractability Of Off-policy Evaluation Of History-dependent Policies In POMDPs (2025)
- How Memory Architecture Affects Learning In A Simple POMDP: The Two-hypothesis Testing Problem (2021)
- Optimistic Policy Optimization Is Provably Efficient In Non-stationary MDPs (2021)
- Model-based Learning Of Near-optimal Finite-window Policies In POMDPs (2026)