Can Agents Run Relay Race With Strangers? Generalization Of RL To Out-of-distribution Trajectories
2023 · Li-Cheng Lan, Huan Zhang, Cho-Jui Hsieh
Abstract
In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on out-of-distribution "controllable" states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of trajectories. For example, a self-driving system should be able to take over control from a human in the middle of driving and must continue to drive the car safely. To practically evaluate this type of generalization, we start the test agent from the middle of other independently well-trained *stranger* agents' trajectories. With extensive experimental evaluation, we show the prevalence of *generalization failure* on controllable states from stranger agents. For example, in the Humanoid environment, we observed that a well-trained Proximal Policy Optimization (PPO) agent, with only a 3.9% failure rate during regular testing, failed on 81.6% of t
Related papers
- Good Actions Succeed, Bad Actions Generalize: A Case Study On Why RL Generalizes Better (2025) · 0.00
- Bad Habits: Policy Confounding And Out-of-trajectory Generalization In RL (2023) · 0.00
- On The Power Of Pre-training For Generalization In RL: Provable Benefits And Hardness (2022) · 0.00
- The Role Of Pretrained Representations For The OOD Generalization Of Reinforcement Learning Agents (2021) · 0.00
- Assessing Generalization In Deep Reinforcement Learning (2018) · 0.00
- Rethinking Out-of-distribution Detection For Reinforcement Learning: Advancing Methods For Evaluation And Detection (2024) · 2.26
- Goal Misgeneralization In Deep Reinforcement Learning (2021) · 0.00
- Measuring And Characterizing Generalization In Deep Reinforcement Learning (2018) · 9.76