Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems

Abstract

We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new \textit{Partially Observable Bilinear Actor-Critic framework} that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), a newly introduced model called Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. Given a policy class that consists of memory-based policies (which look at a fixed-length window of recent observations) and a value function class that consists of functions taking both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy in the given policy class. For certain examples such as un
