Zeroth-order Actor-critic: An Evolutionary Framework For Sequential Decision Problems
2022 · Yuheng Lei, Yao Lyu, Guojian Zhan, et al.
Abstract
Evolutionary algorithms (EAs) have shown promise in solving sequential decision problems (SDPs) by simplifying them to static optimization problems and searching for the optimal policy parameters in a zeroth-order way. While these methods are highly versatile, they often suffer from high sample complexity because they ignore the underlying temporal structure. In contrast, reinforcement learning (RL) methods typically formulate SDPs as Markov decision processes (MDPs). Although more sample efficient than EAs, RL methods are restricted to differentiable policies and prone to getting stuck in local optima. To address these issues, we propose a novel evolutionary framework, Zeroth-Order Actor-Critic (ZOAC). We propose to use step-wise exploration in parameter space and theoretically derive the zeroth-order policy gradient. We further utilize the actor-critic architecture to effectively leverage the Markov property of SDPs and reduce the variance of gradient estimators. In each iteration
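The two ideas the abstract combines, estimating a gradient from function values alone and reducing its variance with a baseline, can be illustrated with a generic Gaussian-smoothing estimator. This is a minimal sketch under stated assumptions, not ZOAC's actual algorithm: it perturbs the whole parameter vector once per sample rather than step-wise, it uses the unperturbed return `f(theta)` as a stand-in for a learned critic, and the names `zo_gradient`, `objective`, and `first_component_variance` are hypothetical. The toy objective is a quadratic with an offset of 100, chosen only so that the effect of the baseline is easy to see.

```python
import random
import statistics

def zo_gradient(f, theta, sigma, n_samples, rng, use_baseline=True):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed objective
    J(theta) = E_eps[f(theta + sigma * eps)],  eps ~ N(0, I),
    via grad J = E[(f(theta + sigma * eps) - b) * eps] / sigma.
    Subtracting a constant baseline b keeps the estimate unbiased since E[eps] = 0.
    """
    # Assumption: the unperturbed value f(theta) stands in for a learned critic.
    b = f(theta) if use_baseline else 0.0
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        perturbed = [t + sigma * e for t, e in zip(theta, eps)]
        advantage = f(perturbed) - b  # only function values are queried
        for i, e in enumerate(eps):
            grad[i] += advantage * e
    return [g / (n_samples * sigma) for g in grad]

def objective(theta):
    # Hypothetical toy "return" to maximize, optimum at theta = (3, 3).
    # The +100 offset inflates the estimator's variance when no baseline is used.
    return 100.0 - sum((t - 3.0) ** 2 for t in theta)

def first_component_variance(use_baseline, trials=100, seed=0):
    # Empirical variance of the first gradient component over repeated estimates.
    rng = random.Random(seed)
    samples = [zo_gradient(objective, [0.0, 0.0], 0.1, 50, rng, use_baseline)[0]
               for _ in range(trials)]
    return statistics.variance(samples)

# Zeroth-order gradient ascent: no derivatives of `objective` are ever computed.
rng = random.Random(42)
theta = [0.0, 0.0]
for _ in range(200):
    g = zo_gradient(objective, theta, sigma=0.1, n_samples=50, rng=rng)
    theta = [t + 0.05 * gi for t, gi in zip(theta, g)]

print("theta:", [round(t, 2) for t in theta])
print("variance without baseline:", first_component_variance(False))
print("variance with baseline:   ", first_component_variance(True))
```

With the baseline, the large constant offset cancels inside `advantage`, so the estimate's variance at the starting point is orders of magnitude smaller than without it. This mirrors, in a much simplified form, the variance-reduction role the abstract attributes to the critic.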
Related papers
- Actor-critic Reinforcement Learning With Phased Actor (2024)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- A Single-loop Deep Actor-critic Algorithm For Constrained Reinforcement Learning With Provable Convergence (2023)
- Learning Sampling Policy For Faster Derivative Free Optimization (2021)
- Ancestral Reinforcement Learning: Unifying Zeroth-order Optimization And Genetic Algorithms For Reinforcement Learning (2024)
- Actor-dual-critic Dynamics For Zero-sum And Identical-interest Stochastic Games (2026)
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)