Maximum Likelihood Reinforcement Learning
2026 · Fahim Tajwar, Guanning Zeng, Yueer Zhou, et al.
Abstract
Reinforcement learning is the method of choice to train models in sampling-based setups with binary outcome feedback, such as navigation, code generation, and mathematical problem solving. In such settings, models implicitly induce a likelihood over correct rollouts. However, we observe that reinforcement learning does not maximize this likelihood, and instead optimizes only a lower-order approximation. Inspired by this observation, we introduce Maximum Likelihood Reinforcement Learning (MaxRL), a sampling-based framework to approximate maximum likelihood using reinforcement learning techniques. MaxRL addresses the challenges of non-differentiable sampling by defining a compute-indexed family of sample-based objectives that interpolate between standard reinforcement learning and exact maximum likelihood as additional sampling compute is allocated. The resulting objectives admit a simple, unbiased policy-gradient estimator and converge to maximum likelihood optimization in the infinite-
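As a rough illustration of what a "lower-order approximation" can mean here (an assumption made for exposition; the abstract does not state the paper's exact objective): write p for the probability that a sampled rollout is correct. Maximum likelihood maximizes log p, whose Maclaurin expansion in the failure rate 1 - p is log p = -sum over k >= 1 of (1 - p)^k / k. Keeping only the first term gives p - 1, which is the expected-reward objective up to a constant. Keeping more terms moves the objective toward log p. The sketch below, with hypothetical function names, evaluates these truncations and their derivative with respect to p.

```python
import math


def truncated_log_objective(p: float, order: int) -> float:
    """Maclaurin truncation of log(p) in the failure rate (1 - p).

    order=1 gives p - 1 (expected reward up to a constant);
    order -> infinity recovers log(p) for 0 < p <= 1.
    """
    q = 1.0 - p
    return -sum(q**k / k for k in range(1, order + 1))


def gradient_weight(p: float, order: int) -> float:
    """Derivative of the truncated objective with respect to p.

    Equals sum_{k=1}^{order} (1 - p)^(k - 1): it is 1 at order 1
    and approaches 1 / p as the order grows.
    """
    q = 1.0 - p
    return sum(q ** (k - 1) for k in range(1, order + 1))


if __name__ == "__main__":
    p = 0.1  # illustrative success probability, not a value from the paper
    for order in (1, 2, 8, 64, 512):
        obj = truncated_log_objective(p, order)
        weight = gradient_weight(p, order)
        print(f"order={order:4d}  objective={obj:.6f}  d/dp={weight:.6f}")
    print(f"log(p)={math.log(p):.6f}  1/p={1 / p:.6f}")
```

For small p, the higher-order truncations weight the gradient far more heavily than the order-1 objective does. That difference in weighting is the qualitative gap between maximizing expected reward and maximizing log-likelihood.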
Related papers
- Harnessing The Power Of Reinforcement Learning For Adaptive MCMC (2025)
- Algorithmic Framework For Model-based Deep Reinforcement Learning With Theoretical Guarantees (2018)
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)
- Control-oriented Model-based Reinforcement Learning With Implicit Differentiation (2021)
- MaxInfoRL: Boosting Exploration In Reinforcement Learning Through Information Gain Maximization (2024)
- Sampling Attacks On Meta Reinforcement Learning: A Minimax Formulation And Complexity Analysis (2022)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)