Diminishing Return Of Value Expansion Methods In Model-based Reinforcement Learning
2023 · Daniel Palenicek, Michael Lutter, Joao Carvalho, et al.
Abstract
Model-based reinforcement learning is one approach to increase sample efficiency. However, the accuracy of the dynamics model and the resulting compounding error over modelled trajectories are commonly regarded as key limitations. A natural question to ask is: How much more sample efficiency can be gained by improving the learned dynamics models? Our paper empirically answers this question for the class of model-based value expansion methods in continuous control problems. Value expansion methods should benefit from increased model accuracy by enabling longer rollout horizons and better value function approximations. Our empirical study, which leverages oracle dynamics models to avoid compounding model errors, shows that (1) longer horizons increase sample efficiency, but the gain in improvement decreases with each additional expansion step, and (2) the increased model accuracy only marginally increases the sample efficiency compared to learned models with identical horizons. Therefore
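To make the notion of a rollout horizon concrete, the following is a minimal sketch of an H-step value expansion target: H model steps of discounted reward, followed by a bootstrap from the value function at the final state. It is an illustration under stated assumptions, not the authors' implementation. The names `value_expansion_target`, `policy`, `model`, and `value_fn` are hypothetical, the state is a single float, and episode termination and any entropy terms are omitted.

```python
from typing import Callable, Tuple

State = float  # assumption: scalar state, purely for illustration


def value_expansion_target(
    state: State,
    policy: Callable[[State], float],
    model: Callable[[State, float], Tuple[State, float]],
    value_fn: Callable[[State], float],
    horizon: int,
    gamma: float = 0.99,
) -> float:
    """Return sum_{t<H} gamma^t * r_t + gamma^H * V(s_H).

    `model` may be a learned or an oracle dynamics model (hypothetical
    interface: it maps (state, action) to (next_state, reward)).
    """
    target = 0.0
    discount = 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model(state, action)  # one model step
        target += discount * reward
        discount *= gamma
    # Bootstrap with the value estimate at the end of the rollout.
    return target + discount * value_fn(state)


# Toy components (assumptions): constant reward, constant value estimate.
def toy_policy(state: State) -> float:
    return 0.0


def toy_model(state: State, action: float) -> Tuple[State, float]:
    return state + action, 1.0


def toy_value(state: State) -> float:
    return 10.0


target_h0 = value_expansion_target(0.0, toy_policy, toy_model, toy_value, horizon=0, gamma=0.5)
target_h3 = value_expansion_target(0.0, toy_policy, toy_model, toy_value, horizon=3, gamma=0.5)
```

With `horizon=0` the target reduces to the plain value estimate. Each additional step replaces part of that estimate with model-predicted reward, which is the quantity whose benefit the study reports as diminishing with each expansion step.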
Related papers
- Model-based Value Estimation For Efficient Model-free Reinforcement Learning (2018)
- Sample-efficient Reinforcement Learning With Stochastic Ensemble Value Expansion (2018)
- Efficient And Robust Reinforcement Learning With Uncertainty-based Value Expansion (2019)
- Planning With Exploration: Addressing Dynamics Bottleneck In Model-based Reinforcement Learning (2020)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)
- Efficient Exploration In Continuous-time Model-based Reinforcement Learning (2023)
- Learning To Combat Compounding-error In Model-based Reinforcement Learning (2019)
- Disentangling Dynamics And Returns: Value Function Decomposition With Future Prediction (2019)