A2C Is A Special Case Of PPO
2022 · Shengyi Huang, Anssi Kanervisto, Antonin Raffin, et al.
Abstract
Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-baselines3, showing that A2C and PPO produce the exact same models when other settings are controlled.
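The intuition behind the claim can be checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes a single categorical policy over three actions, one sampled action, and an update taken on the same policy that collected the data, so that PPO's probability ratio equals 1. The function names, logits, advantage values, and clip range of 0.2 are all illustrative assumptions. Under those assumptions the clipping is inactive and the gradient of PPO's clipped loss coincides with the gradient of A2C's loss.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_loss(logits, action, advantage):
    # A2C policy loss for one sample: -log pi(a) * A.
    return -math.log(softmax(logits)[action]) * advantage

def ppo_loss(logits, old_logits, action, advantage, clip=0.2):
    # PPO clipped surrogate loss for one sample (clip value is an assumption).
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return -min(ratio * advantage, clipped * advantage)

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences; adequate for a sanity check.
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad

# Hypothetical values chosen for illustration only.
old_logits = [0.1, -0.3, 0.5]
action = 2

max_diffs = []
for advantage in (1.7, -0.9):
    g_a2c = numerical_grad(lambda z: a2c_loss(z, action, advantage), old_logits)
    g_ppo = numerical_grad(
        lambda z: ppo_loss(z, old_logits, action, advantage), old_logits
    )
    max_diffs.append(max(abs(a - b) for a, b in zip(g_a2c, g_ppo)))

# Both gradients agree up to finite-difference error.
gradients_match = all(d < 1e-6 for d in max_diffs)
print(gradients_match)
```

The check only covers the policy-gradient term at the point where the ratio is 1. Once a second update is taken on the same batch, the ratio moves away from 1 and the two losses generally differ.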
Related papers
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- A Comparative Study Of Deep Reinforcement Learning Models: DQN Vs PPO Vs A2C (2024)
- AM-PPO: (advantage) Alpha-modulation With Proximal Policy Optimization (2025)
- Revisiting Design Choices In Proximal Policy Optimization (2020)
- Truly Proximal Policy Optimization (2019)
- What's Behind PPO's Collapse In Long-CoT? Value Optimization Holds The Secret (2025)
- Proximal Policy Optimization Algorithms (2017)
- PPO-CMA: Proximal Policy Optimization With Covariance Matrix Adaptation (2018)