Implicit Under-parameterization Inhibits Data-efficient Deep Reinforcement Learning
2020 · Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, et al.
Abstract
We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity via a drop in the rank of the learned value network features, and show that this typically corresponds to a performance drop. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse can improve performance.
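The abstract measures expressivity through the rank of the learned features. Below is a minimal NumPy sketch of one such measure. It assumes the effective-rank definition srank_δ(Φ) = min{k : Σ_{i≤k} σ_i(Φ) / Σ_i σ_i(Φ) ≥ 1 − δ} with δ = 0.01, where Φ is the matrix of penultimate-layer features evaluated on a batch of states. The names `srank` and `rank_collapse_penalty`, the synthetic feature matrices, and the specific penalty form (σ_max² − σ_min²) are illustrative assumptions, not the paper's code.

```python
import numpy as np


def srank(features: np.ndarray, delta: float = 0.01) -> int:
    """Effective rank of a feature matrix (rows = states, columns = feature dims).

    Returns the smallest k such that the top-k singular values account for at
    least a (1 - delta) fraction of the sum of all singular values.
    """
    sv = np.linalg.svd(features, compute_uv=False)  # sorted in descending order
    total = sv.sum()
    if total == 0.0:
        return 0  # an all-zero feature matrix has rank 0
    cumulative = np.cumsum(sv) / total
    # First index whose cumulative share reaches 1 - delta; +1 turns it into a count.
    k = int(np.searchsorted(cumulative, 1.0 - delta)) + 1
    return min(k, len(sv))  # guard against floating-point round-off when delta is ~0


def rank_collapse_penalty(features: np.ndarray) -> float:
    """Hypothetical regularizer: gap between the largest and smallest squared singular values.

    Driving this toward zero spreads energy across feature directions.
    This NumPy version only monitors the value; training would need an autodiff framework.
    """
    sv = np.linalg.svd(features, compute_uv=False)
    return float(sv[0] ** 2 - sv[-1] ** 2)


rng = np.random.default_rng(0)
# Well-spread features: 256 states x 32 feature dimensions.
full = rng.standard_normal((256, 32))
# Collapsed features: same shape, but spanned by only 4 directions.
low = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 32))

print("srank(full) =", srank(full), " penalty =", round(rank_collapse_penalty(full), 1))
print("srank(low)  =", srank(low), " penalty =", round(rank_collapse_penalty(low), 1))
```

The comparison shows what rank collapse looks like numerically. The collapsed matrix has 32 columns but reports an effective rank of at most 4. In practice one would log `srank` on features from a fixed batch of states over the course of training.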
Related papers
- DR3: Value-based Deep Reinforcement Learning Requires Explicit Regularization (2021)
- Dissecting Deep RL With High Update Ratios: Combatting Value Divergence (2024)
- Target Networks And Over-parameterization Stabilize Off-policy Bootstrapping With Function Approximation (2024)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- Improving Deep Reinforcement Learning By Reducing The Chain Effect Of Value And Policy Churn (2024)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)