Overestimation, Overfitting, And Plasticity In Actor-critic: The Bitter Lesson Of Reinforcement Learning
2024 · Michal Nauman, Michał Bortkiewicz, Piotr Miłoś, et al.
Abstract
Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms rather than against a range of regularization approaches. This limits our understanding of the specific mechanisms driving RL improvements. To address this, we implemented over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms. We tested these agents across 14 diverse tasks from 2 simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss, the issues that motivate the examined regularization techniques. Our findings reveal that while the effectiveness of a specific regularization
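The abstract mentions tracking training metrics related to overestimation but does not say how they are computed. A common diagnostic compares the critic's estimate Q(s_t, a_t) with the discounted Monte Carlo return G_t actually observed from that step. The sketch below is an illustration under stated assumptions, not necessarily the metric used in the paper. It assumes one complete episode that ends in a true terminal state, so no bootstrapping is needed. It also assumes the Q values were recorded for the actions taken in that episode. The function names `discounted_returns` and `overestimation_bias` are hypothetical.

```python
from typing import Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> list[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    Assumes (hypothetically) that the episode ended in a true terminal state,
    so the return after the last step is zero.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def overestimation_bias(
    q_estimates: Sequence[float], rewards: Sequence[float], gamma: float = 0.99
) -> float:
    """Mean of Q(s_t, a_t) - G_t over one episode.

    Positive values mean the critic predicts more return than was observed
    (overestimation); negative values mean underestimation.
    """
    if len(q_estimates) != len(rewards) or not rewards:
        raise ValueError("need one Q estimate per reward and a non-empty episode")
    returns = discounted_returns(rewards, gamma)
    gaps = [q - g for q, g in zip(q_estimates, returns)]
    return sum(gaps) / len(gaps)
```

For example, with rewards `[1.0, 1.0, 1.0]` and `gamma=0.5`, the returns are `[1.75, 1.5, 1.0]`. A critic that predicted `[2.0, 2.0, 1.0]` therefore has a mean bias of `0.25`. Averaging over many episodes collected by the current policy gives a smoother signal than a single episode.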
Related papers
- Regularization Matters In Policy Optimization (2019)
- B3C: A Minimalist Approach To Offline Multi-agent Reinforcement Learning (2025)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- ACE: Off-policy Actor-critic With Causality-aware Entropy Regularization (2024)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- What Matters In On-policy Reinforcement Learning? A Large-scale Empirical Study (2020)
- Mitigating Planner Overfitting In Model-based Reinforcement Learning (2018)
- Discriminator-actor-critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)