Dynamic Update-to-data Ratio: Minimizing World Model Overfitting
2023 · Nicolai Dorka, Tim Welschehold, Wolfram Burgard
Abstract
Early stopping based on validation-set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update-to-data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to the default setting in DreamerV2 and that it is competitive with an extensive hyperparameter searc
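To make the idea in the abstract concrete, the following is a minimal sketch of a UTD controller driven by a held-out loss comparison. It is not the authors' implementation: the function name `adjust_utd`, the multiplicative step `factor`, the clamping bounds, and the simple rule "validation loss above training loss means overfitting" are all assumptions made for illustration.

```python
def adjust_utd(utd, train_loss, val_loss, factor=1.3, utd_min=0.01, utd_max=10.0):
    """Return an updated update-to-data (UTD) ratio.

    Hypothetical rule (an assumption, not taken from the paper):
    - val_loss > train_loss  -> treat as overfitting, do fewer updates per sample
    - otherwise              -> treat as underfitting, do more updates per sample
    """
    if val_loss > train_loss:
        utd = utd / factor  # overfitting signal: lower the UTD ratio
    else:
        utd = utd * factor  # underfitting signal: raise the UTD ratio
    # Keep the ratio inside illustrative bounds so it cannot run away.
    return min(max(utd, utd_min), utd_max)


# Usage: the held-out loss is worse than the training loss, so the ratio shrinks.
utd = adjust_utd(1.0, train_loss=0.5, val_loss=0.8)
```

In a training loop, `val_loss` would be computed on the small subset of collected experience that is withheld from training, and the check would run periodically rather than at every step.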
Related papers
- MAD-TD: Model-augmented Data Stabilizes High Update Ratio RL (2024)
- The Effectiveness Of World Models For Continual Reinforcement Learning (2022)
- Dissecting Deep RL With High Update Ratios: Combatting Value Divergence (2024)
- Meta-reinforcement Learning With Discrete World Models For Adaptive Load Balancing (2025)
- Dynamic Learning Rate For Deep Reinforcement Learning: A Bandit Approach (2024)
- World Model Agents With Change-based Intrinsic Motivation (2025)
- Dreamerv3-xp: Optimizing Exploration Through Uncertainty Estimation (2025)
- DODT: Enhanced Online Decision Transformer Learning Through Dreamer's Actor-critic Trajectory Forecasting (2024)