Recursive Least Squares Advantage Actor-critic Algorithms
2022 · Yuan Wang, Chunyuan Zhang, Tianzong Yu, et al.
Abstract
As an important algorithm in deep reinforcement learning, advantage actor-critic (A2C) has been widely successful in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency still needs improvement. In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) technique to update the parameters of linear function approximators and thereby accelerate convergence. However, A2C algorithms seldom use this technique to train deep neural networks (DNNs) to improve their sample efficiency. In this paper, we propose two novel RLS-based A2C algorithms and investigate their performance. Both proposed algorithms, called RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network. The main difference between them lies in the policy learning step. RLSSA2C uses an ordinary first-order gradient descent algorithm and the standard policy gradient to learn the
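For readers unfamiliar with the RLS technique mentioned in the abstract, the following is a minimal sketch of the textbook RLS update for a linear function approximator. It is not the RLSSA2C or RLSNA2C algorithm from the paper; the function name `rls_update`, the forgetting factor `lam`, and the initial scale `delta` are illustrative assumptions.

```python
def rls_update(w, P, x, y, lam=1.0):
    """One textbook RLS step for a linear model y ~ w . x.

    w   : current weight vector (list of floats)
    P   : current inverse-correlation matrix estimate (list of lists)
    x   : feature vector of the new sample
    y   : scalar target of the new sample
    lam : forgetting factor in (0, 1]; 1.0 means no forgetting (assumed default)
    """
    n = len(x)
    # P x, reused for both the gain vector and the denominator
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    # Gain vector k = P x / (lam + x^T P x)
    k = [v / denom for v in Px]
    # Prediction error under the current weights
    err = y - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + k[i] * err for i in range(n)]
    # P <- (P - k x^T P) / lam
    xP = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    P_new = [[(P[i][j] - k[i] * xP[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new


# Hypothetical usage: recover y = 2*x0 - 3*x1 + 1 from six noiseless samples.
delta = 1e4  # assumed initial scale for P; larger means a weaker prior on w
w = [0.0, 0.0, 0.0]
P = [[delta if i == j else 0.0 for j in range(3)] for i in range(3)]
samples = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3), (-1, 2)]
for x0, x1 in samples:
    x = [float(x0), float(x1), 1.0]  # last feature acts as a bias term
    w, P = rls_update(w, P, x, 2.0 * x0 - 3.0 * x1 + 1.0)
```

Each sample is processed once and the weights are updated in closed form with no learning rate to tune, which is the property usually credited for the faster convergence of RLS compared with first-order gradient descent.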