Good Actions Succeed, Bad Actions Generalize: A Case Study On Why RL Generalizes Better
2025 · Meng Song
Abstract
Supervised learning (SL) and reinforcement learning (RL) are both widely used to train general-purpose agents for complex tasks, yet their generalization capabilities and underlying mechanisms are not yet fully understood. In this paper, we provide a direct comparison between SL and RL in terms of zero-shot generalization. Using the Habitat visual navigation task as a testbed, we evaluate Proximal Policy Optimization (PPO) and Behavior Cloning (BC) agents across two levels of generalization: state-goal pair generalization within seen environments and generalization to unseen environments. Our experiments show that PPO consistently outperforms BC across both zero-shot settings and both performance metrics (success rate and SPL). Interestingly, even though additional optimal training data enables BC to match PPO's zero-shot performance in SPL, it still falls significantly behind in success rate. We attribute this to a fundamental difference in how models trained by these algorithms generalize: