Stealing That Free Lunch: Exposing The Limits Of Dyna-style Reinforcement Learning
2024 · Brett Barkley, David Fridovich-Keil
Abstract
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process -- the backbone of Dyna-style algorithms -- significantly degrades performance across most DMC environments. Our findings contribute to a deep
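To make the idea of "adding synthetic rollouts to the training process" concrete, here is a minimal sketch of the classic tabular Dyna-Q loop. It is an illustration only, not the deep DMBRL methods evaluated in the paper: the chain environment, the function name `dyna_q`, and all hyperparameter values are assumptions chosen for brevity. Each real transition is stored in a learned model, and the agent then performs extra Q-learning updates on transitions replayed from that model.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain; goal is the last state."""
    rng = random.Random(seed)
    actions = (0, 1)            # 0 = move left, 1 = move right
    q = defaultdict(float)      # Q-values keyed by (state, action)
    model = {}                  # learned model: (s, a) -> (r, s_next, done)

    def step(s, a):
        # Toy environment (an assumption): reward 1 only on reaching the goal.
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == n_states - 1
        return (1.0 if done else 0.0), s_next, done

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s_next, done):
        # Standard one-step Q-learning update.
        bootstrap = 0.0 if done else gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + bootstrap - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s_next, done = step(s, a)
            update(s, a, r, s_next, done)        # learn from the real transition
            model[(s, a)] = (r, s_next, done)    # record it in the model
            for _ in range(planning_steps):      # learn from synthetic transitions
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s_next
    return q


if __name__ == "__main__":
    q = dyna_q()
    for s in range(5):
        print(s, round(q[(s, 0)], 3), round(q[(s, 1)], 3))
```

In this toy setting the model is exact, so the synthetic updates can only help. The paper's concern is the deep, learned-model case, where model error in the synthetic data can hurt the policy; setting `planning_steps=0` recovers plain Q-learning as the model-free baseline for comparison.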
Related papers
- A Forensic Analysis Of Synthetic Data In RL: Diagnosing And Solving Algorithmic Failures In Model-based Policy Optimization (2025)
- On The Mistaken Assumption Of Interchangeable Deep Reinforcement Learning Implementations (2025)
- Deep Reinforcement Learning In A Handful Of Trials Using Probabilistic Dynamics Models (2018)
- Behavioral Priors And Dynamics Models: Improving Performance And Domain Transfer In Offline RL (2021)
- Trade-off On Sim2real Learning: Real-world Learning Faster Than Simulations (2020)
- The Ladder In Chaos: A Simple And Effective Improvement To General DRL Algorithms By Policy Path Trimming And Boosting (2023)
- Understanding The Performance Gap In Preference Learning: A Dichotomy Of RLHF And DPO (2025)
- Control-optimized Deep Reinforcement Learning For Artificially Intelligent Autonomous Systems (2025)