On The Expressivity Of Neural Networks For Deep Reinforcement Learning
2019 · Kefan Dong, Yuping Luo, Tengyu Ma
Abstract
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, \(Q\)-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal \(Q\)-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak \(Q\)-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the t
Related papers
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020) · 0.00
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022) · 0.00
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021) · 0.00
- Rethinking Model-based, Policy-based, And Value-based Reinforcement Learning Via The Lens Of Representation Complexity (2023) · 2.26
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024) · 6.34
- DQN With Model-based Exploration: Efficient Learning On Environments With Sparse Rewards (2019) · 0.00
- Three Pathways To Neurosymbolic Reinforcement Learning With Interpretable Model And Policy Networks (2024) · 0.00
- Sample Complexity Of Nonparametric Off-policy Evaluation On Low-dimensional Manifolds Using Deep Networks (2022) · 0.00