Linear Function Approximation As A Computationally Efficient Method To Solve Classical Reinforcement Learning Challenges
2024 · Hari Srikanth
Abstract
Neural-network approximations of the value function are at the core of leading policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). While this adds significant value in very complex environments, we note that in environments with sufficiently small state and action spaces, a computationally expensive neural network architecture offers only marginal improvement over simpler value approximation methods. We present an implementation of Natural Actor-Critic algorithms with actor updates through Natural Policy Gradient (NPG) methods. This paper proposes that NPG methods, with linear function approximation as the paradigm for value approximation, may surpass the performance and speed of neural-network-based models such as TRPO and PPO within these environments. On the reinforcement learning benchmarks Cart Pole and Acrobot, we observe that our algorithm trains much faster than complex neural network architecture
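To make "linear function approximation" of the value function concrete, here is a minimal sketch, not the paper's implementation. It assumes the value estimate is V(s) = w · φ(s) and updates w with a semi-gradient TD(0) step. The feature map `features`, the function name `td0_update`, and the step size and discount values are all hypothetical choices made for illustration.

```python
import numpy as np

def features(state):
    """Hypothetical feature map: the raw state vector plus a constant bias term."""
    return np.append(np.asarray(state, dtype=float), 1.0)

def td0_update(w, state, reward, next_state, done, alpha=0.05, gamma=0.99):
    """One semi-gradient TD(0) step for a linear value estimate V(s) = w . phi(s).

    alpha (step size) and gamma (discount) are illustrative defaults,
    not values taken from the paper.
    """
    phi = features(state)
    v = w @ phi
    # A terminal transition has no bootstrapped next-state value.
    v_next = 0.0 if done else w @ features(next_state)
    td_error = reward + gamma * v_next - v
    # The gradient of a linear V with respect to w is just phi.
    return w + alpha * td_error * phi, td_error

# Usage on a made-up two-dimensional state.
w = np.zeros(3)
w, delta = td0_update(w, state=[1.0, 2.0], reward=1.0, next_state=[0.0, 0.0], done=True)
```

Each update costs one dot product and one vector addition per step, which is the kind of per-step cost the abstract contrasts with a neural-network critic.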