The Surprising Difficulty of Search in Model-Based Reinforcement Learning

Wei-Di Chang, Mikael Henaff, Brandon Amos, Gregory Dudek, Scott Fujimoto (2026)

arXiv:2601.21306

Abstract

This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-term predictions and compounding errors are the primary obstacles for model-based RL. We challenge this view, showing that search is not a drop-in replacement for a learned policy. Surprisingly, we find that search can harm performance even when the model is highly accurate. Instead, we show that mitigating overestimation bias matters more than improving model or value function accuracy. Building on this insight, we identify that taking the minimum over an ensemble of value functions effectively addresses this bias and enables effective search, achieving state-of-the-art performance across multiple popular benchmark domains.
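
The overestimation effect described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under assumed conditions, not the paper's algorithm or experimental setup: every action has a true value of 0, each ensemble member adds independent Gaussian noise, and "search" is reduced to picking the action with the highest estimate. The function name `selected_value_estimates` and all parameters are hypothetical.

```python
import random
import statistics


def selected_value_estimates(num_actions=10, ensemble_size=5, noise_std=1.0,
                             trials=2000, seed=0):
    """Toy setup (an assumption, not the paper's benchmark): every action has
    true value 0, so any positive estimate of the chosen action's value is
    pure overestimation."""
    rng = random.Random(seed)
    single, pessimistic = [], []
    for _ in range(trials):
        # q[i][a]: noisy value estimate of action a from ensemble member i.
        q = [[rng.gauss(0.0, noise_std) for _ in range(num_actions)]
             for _ in range(ensemble_size)]
        # Search against one value function: maximize a single noisy estimate.
        single.append(max(q[0]))
        # Search against the ensemble minimum: maximize the per-action minimum.
        min_q = [min(member[a] for member in q) for a in range(num_actions)]
        pessimistic.append(max(min_q))
    # Mean estimated value of the selected action; the true value is 0.
    return statistics.mean(single), statistics.mean(pessimistic)


single_bias, min_bias = selected_value_estimates()
print(f"single value function: {single_bias:+.3f}")
print(f"ensemble minimum:      {min_bias:+.3f}")
```

In this setup, maximizing over a single noisy estimate yields a positive average value even though no action is actually better than another, while maximizing over the ensemble minimum yields a lower estimate. The sketch only shows the direction of the effect; it says nothing about the learned models, value functions, or benchmarks used in the paper.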
