Mean Actor Critic
2017 · Cameron Allen, Kavosh Asadi, Melrose Roderick, et al.
Abstract
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
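The difference between the two gradient estimates described in the abstract can be illustrated with a minimal sketch. It assumes a single state, a softmax policy parameterized directly by per-action logits, and a fixed vector of action values; the function names (`mac_gradient`, `sampled_gradient`) and the numbers are hypothetical and are not taken from the paper. The first function sums over all actions, weighting the gradient of each action's probability by its value. The second uses only one executed action, as in a conventional actor-critic update.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def mac_gradient(logits, q_values):
    """All-action estimate: sum_a grad pi(a|s) * Q(s, a), w.r.t. the logits."""
    pi = softmax(logits)
    # Softmax Jacobian: d pi_a / d logit_k = pi_a * (delta_ak - pi_k)
    jac = np.diag(pi) - np.outer(pi, pi)
    return jac.T @ q_values

def sampled_gradient(logits, q_values, action):
    """Single-action estimate: grad log pi(a|s) * Q(s, a) for one executed action."""
    pi = softmax(logits)
    grad_log_pi = -pi            # new array: d log pi_a / d logit_k = delta_ak - pi_k
    grad_log_pi[action] += 1.0
    return grad_log_pi * q_values[action]

# Illustrative values (assumptions, not from the paper).
logits = np.array([0.5, -0.2, 1.0])
q_values = np.array([1.0, 2.0, 0.5])

pi = softmax(logits)
# Averaging the single-action estimate over the policy recovers the all-action estimate.
expected_sampled = sum(pi[a] * sampled_gradient(logits, q_values, a) for a in range(3))
print(np.allclose(mac_gradient(logits, q_values), expected_sampled))
```

In this toy setting both estimates have the same expectation for a given state, but the all-action version does not depend on which action happened to be sampled. This is the intuition behind the variance reduction claimed above; the sketch does not reproduce the paper's proof or its experiments.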
Related papers
- How To Learn A Useful Critic? Model-based Action-gradient-estimator Policy Optimization (2020) 0.00
- FACMAC: Factored Multi-agent Centralised Policy Gradients (2020) 0.00
- Multi-preference Actor Critic (2019) 0.00
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021) 0.00
- Guide Actor-critic For Continuous Control (2017) 0.00
- Actor-critic Reinforcement Learning With Phased Actor (2024) 0.00
- Actor Critic Learning Algorithms For Mean-field Control With Moment Neural Networks (2023) 0.00
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) 7.81