Model-free Low-rank Reinforcement Learning Via Leveraged Entry-wise Matrix Estimation
2024 · Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
Abstract
We consider the problem of learning an \(\epsilon\)-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the leverage scores of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method. For this leveraged matrix estimation procedure, we establish entry-wise guarantees that, remarkably, do not depend on the coherence of the matrix but only on its spikiness. These
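The two-phase estimation procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`leverage_scores`, `cur_complete`, `demo`), the top-k selection rule, the known rank, and the noiseless second phase are all assumptions made for illustration, and the policy-iteration loop around the estimator is omitted.

```python
import numpy as np

def leverage_scores(M_obs, mask, rank):
    """Phase 1 (sketch): spectral estimate of row/column leverage scores
    from entries observed uniformly at random (mask == True)."""
    p = mask.mean()                               # empirical sampling rate
    M_fill = np.where(mask, M_obs, 0.0) / p       # zero-fill and rescale
    U, _, Vt = np.linalg.svd(M_fill, full_matrices=False)
    row_scores = np.sum(U[:, :rank] ** 2, axis=1)   # squared row norms of top-r left vectors
    col_scores = np.sum(Vt[:rank, :] ** 2, axis=0)  # same for right vectors
    return row_scores, col_scores

def cur_complete(C, R, rows, rank):
    """Phase 2 (sketch): CUR-like completion from the selected columns C,
    the selected rows R, and their intersection block W = C[rows]."""
    W = C[rows, :]
    Uw, sw, Vwt = np.linalg.svd(W, full_matrices=False)
    # Rank-truncated pseudo-inverse of the intersection block.
    W_pinv = Vwt[:rank].T @ np.diag(1.0 / sw[:rank]) @ Uw[:, :rank].T
    return C @ W_pinv @ R

def demo(seed=0, n=60, m=50, rank=3, k=6, p=0.5):
    """Hypothetical end-to-end run on a synthetic low-rank matrix."""
    rng = np.random.default_rng(seed)
    Q = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, m))
    mask = rng.random((n, m)) < p                 # uniform sampling pattern
    row_s, col_s = leverage_scores(Q, mask, rank)
    # Assumption: keep the k highest-scoring rows and columns.
    rows = np.argsort(row_s)[-k:]
    cols = np.argsort(col_s)[-k:]
    # Assumption: the selected rows/columns are then observed without noise.
    C, R = Q[:, cols], Q[rows, :]
    Q_hat = cur_complete(C, R, rows, rank)
    return np.max(np.abs(Q_hat - Q))              # entry-wise (max-norm) error

if __name__ == "__main__":
    print("max entry-wise error:", demo())
```

In this noiseless toy setting the CUR step recovers the matrix up to floating-point error whenever the intersection block has the same rank as the full matrix. With noisy samples, the error would depend on how many times each selected entry is resampled.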
Related papers
- Matrix Estimation For Offline Reinforcement Learning With Low-rank Structure (2023)
- Multilinear Tensor Low-rank Approximation For Policy-gradient Methods In Reinforcement Learning (2025)
- Improved Sample Complexity For Reward-free Reinforcement Learning Under Low-rank MDPs (2023)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)
- Uncertainty-aware Low-rank Q-matrix Estimation For Deep Reinforcement Learning (2021)
- Learning Adversarial Low-rank Markov Decision Processes With Unknown Transition And Full-information Feedback (2023)
- Efficient Learning In Non-stationary Linear Markov Decision Processes (2020)
- Model-free Representation Learning And Exploration In Low-rank MDPs (2021)