A Forensic Analysis Of Synthetic Data In RL: Diagnosing And Solving Algorithmic Failures In Model-based Policy Optimization
2025 · Brett Barkley, David Fridovich-Keil
Abstract
Synthetic data is a core component of data-efficient Dyna-style model-based reinforcement learning, yet it can also degrade performance. We study when it helps, where it fails, and why, and we show that addressing the resulting failure modes enables policy improvement that was previously unattainable. We focus on Model-Based Policy Optimization (MBPO), which performs actor and critic updates using synthetic action counterfactuals. Despite reports of strong and generalizable sample-efficiency gains in OpenAI Gym, recent work shows that MBPO often underperforms its model-free counterpart, Soft Actor-Critic (SAC), in the DeepMind Control Suite (DMC). Although both suites involve continuous control with proprioceptive robots, this shift leads to sharp performance losses across seven challenging DMC tasks, with MBPO failing in cases where claims of generalization from Gym would imply success. This reveals how environment-specific assumptions can become implicitly encoded into algorithm desi
Related papers
- Stealing That Free Lunch: Exposing The Limits Of Dyna-style Reinforcement Learning (2024)
- On Effective Scheduling Of Model-based Reinforcement Learning (2021)
- Towards Causal Model-based Policy Optimization (2025)
- Model-free \(\mu\) Synthesis Via Adversarial Reinforcement Learning (2021)
- Multi-objective Model-based Policy Search For Data-efficient Learning With Sparse Rewards (2018)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- Co-adaptation Of Algorithmic And Implementational Innovations In Inference-based Deep Reinforcement Learning (2021)
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020)