Multilinear Tensor Low-rank Approximation for Policy-gradient Methods in Reinforcement Learning
2025 · Sergio Rozada, Hoi-To Wai, Antonio G. Marques
Abstract
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state, with the goal of maximizing a cumulative reward function. Predominantly, there are two families of algorithms to solve RL problems: value-based and policy-based methods, with the latter designed to learn a probabilistic parametric policy from states to actions. Most contemporary approaches implement this policy using a neural network (NN). However, NNs usually face issues related to convergence, architectural suitability, hyper-parameter selection, and underutilization of the redundancies of the state-action representations (e.g. locally similar states). This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy. More precisely, we leverage the PARAFAC decomposition to design tensor low-rank policies. The key idea involves collecting the policy parameters into a tensor and leveraging tensor-completion techniques to enforce low rank. We establish theoret
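The idea of collecting policy parameters into a tensor and constraining it to low rank can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: a discretized state space with one tensor mode per state dimension, one extra mode for a discrete action set, and a softmax over that action mode. The names `init_factors`, `action_logits`, `policy`, and `full_logit_tensor` are hypothetical and are not taken from the paper.

```python
import numpy as np


def init_factors(state_dims, n_actions, rank, seed=0):
    """One PARAFAC factor matrix per tensor mode (state dims, then actions)."""
    rng = np.random.default_rng(seed)
    dims = list(state_dims) + [n_actions]
    return [rng.normal(scale=0.5, size=(n, rank)) for n in dims]


def action_logits(factors, state):
    """Logits for every action at one state, without building the full tensor.

    Entry (i_1, ..., i_D, a) of a rank-K PARAFAC tensor equals
    sum_k F_1[i_1, k] * ... * F_D[i_D, k] * F_A[a, k].
    """
    weights = np.ones(factors[0].shape[1])
    for factor, idx in zip(factors[:-1], state):
        weights = weights * factor[idx]      # Hadamard product of selected rows
    return factors[-1] @ weights             # shape: (n_actions,)


def policy(factors, state):
    """Softmax policy over actions (an assumed parameterization)."""
    logits = action_logits(factors, state)
    logits = logits - logits.max()           # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def full_logit_tensor(factors):
    """Dense reconstruction as a sum of rank-1 terms; used only as a check."""
    rank = factors[0].shape[1]
    tensor = np.zeros(tuple(f.shape[0] for f in factors))
    for k in range(rank):
        component = factors[0][:, k]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, k])
        tensor += component
    return tensor


factors = init_factors(state_dims=(10, 10, 10), n_actions=4, rank=3)
probs = policy(factors, state=(2, 7, 5))

n_low_rank = sum(f.size for f in factors)    # (10 + 10 + 10 + 4) * 3 = 102
n_full = 10 * 10 * 10 * 4                    # 4000 entries in the dense tensor
```

In this toy setting the factor matrices hold 102 parameters, against 4000 entries for the dense tensor; the factors would be the quantities updated by a policy-gradient step.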
Related papers
- Tensor and Matrix Low-rank Value-function Approximation in Reinforcement Learning (2022)
- Matrix Estimation for Offline Reinforcement Learning with Low-rank Structure (2023)
- Improving Policy Gradient by Exploring Under-appreciated Rewards (2016)
- Model-free Low-rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation (2024)
- Policy Gradient for Reinforcement Learning with General Utilities (2022)
- Variational Policy Gradient Method for Reinforcement Learning with General Utilities (2020)
- Batch Reinforcement Learning with a Nonparametric Off-policy Policy Gradient (2020)
- Model-free Policy Learning with Reward Gradients (2021)