Awesome Reinforcement Learning

Awesome Offline RL

Offline RL is one of the most active areas in Awesome Reinforcement Learning, with 623 papers in this collection evaluated on datasets such as BeyondAIME, BrowseComp, and IMO-AnswerBench. A strong starting point is "Single-Rollout Asynchronous Optimization for Agentic Reinforcement Learning".

Datasets & benchmarks

BeyondAIME · 1 paper · 🤗

BrowseComp · 1 paper · 🤗

IMO-AnswerBench · 1 paper · 🤗

Humanoid Motion Imagery (HMI) · 1 paper

Jira REST v-3 · 1 paper

MIMIC-IV · 1 paper

Key papers

60 papers · trending (default) · numbers = 🔥 heat

Single-Rollout Asynchronous Optimization for Agentic Reinforcement Learning (2026)
Zhenyu Hou et al.
10.35
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents (2026)
Changdae Oh et al.
8.17
PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data (2025)
Aishwarya Mandyam et al.
4.93
Learning Generalizable Skill Policy with Data-Efficient Unsupervised RL (2026)
Jongchan Park et al.
4.39
Loss Smoothing for Stable Adaptation Under Distribution Shift (2026)
Darshan Patil et al.
4.39
Beyond Next-Token Prediction: An RLVR Proof of Concept for Tool-Use Agents on Atlassian Workflows (2026)
Karthikeya Aditya Vissa et al.
4.39
Selective Timestep Weighting and Advantage-Based Replay for Sample-Efficient Diffusion RLHF (2026)
Eric Zhu et al.
4.39
When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning? (2026)
Xuanfei Ren et al.
4.33
Reinforcement Learning Foundation Models Should Already Be A Thing (2026)
Abdelrahman Zighem et al.
4.33
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning (2026)
Xiaoyue Xu et al.
4.33
Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times (2026)
Giuseppe Gabriele et al.
4.33
Weight-Space Geometry of Offline Reasoning Training (2026)
Aleksandr Nikolich et al.
4.33
Offline Reinforcement Learning for Warehouse SLAM Throughput Control (2026)
Tina Dongxu Li et al.
4.33
Towards Scalable Multi-Task Reinforcement Learning with Large Decision Models (2026)
Thibaut Kulak
4.33
Beyond One-Size-Fits-All: Diagnosis-Driven Online Reinforcement Learning with Offline Priors (2026)
Guozheng Ma et al.
4.33
RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization (2025)
Kai Fukazawa et al.
2.56
Task-Guided IRL in POMDPs That Scales (2022)
Franck Djeumou, Christian Ellis, Murat Cubuktepe, et al.
2.26
Beyond Pessimism: Offline Learning in KL-regularized Games (2026)
Yuheng Zhang et al.
1.83
Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024)
Hanye Zhao, Xiaoshen Han, Zhengbang Zhu, et al.
1.81
StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation (2026)
Heajun An et al.
1.72
On the Role of Computation in Reinforcement Learning (2026)
Raj Ghugare et al.
1.72
Resource-Conscious RL Algorithms for Deep Brain Stimulation (2026)
Arkaprava Gupta et al.
1.67
Safe Exploration via Policy Priors (2026)
Manuel Wendl et al.
1.67
Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach (2025)
Sebastian Reboul, Hélène Halconruy, Randal Douc
1.50
Revisiting Actor-Critic Methods in Discrete Action Off-Policy Reinforcement Learning (2025)
Reza Asad et al.
1.44
Failure Modes of Maximum Entropy RLHF (2025)
Ömer Veysel Çağatan, Barış Akgün
1.44
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction (2025)
Lujie Yang et al.
1.44
Offline Behavioral Data Selection (2025)
Shiye Lei, Zhihao Cheng, Dacheng Tao
1.28
Offline Critic-guided Diffusion Policy For Multi-user Delay-constrained Scheduling (2025)
Zhuoran Li, Ruishuo Chen, Hai Zhong, et al.
1.28
Horizon Reduction As Information Loss In Offline Reinforcement Learning (2025)
Uday Kumar Nidadala, Venkata Bhumika Guthi
1.28
One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion (2025)
Yahao Fan et al.
1.22
Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data (2025)
Yi Zhao et al.
1.06
Federated Offline Policy Learning (2023)
Aldo Gael Carranza, Susan Athey
0.00
Beyond Conservatism: Diffusion Policies In Offline Multi-agent Reinforcement Learning (2023)
Zhuoran Li, Ling Pan, Longbo Huang
0.00
The Curse Of Passive Data Collection In Batch Reinforcement Learning (2021)
Chenjun Xiao, Ilbin Lee, Bo Dai, et al.
0.00
Q(λ) with Off-Policy Corrections (2016)
Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos
—
Investigating practical linear temporal difference learning (2016)
Adam White et al.
—
Algorithms for Batch Hierarchical Reinforcement Learning (2016)
Tiancheng Zhao et al.
—
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning (2016)
Philip S. Thomas and Emma Brunskill
—
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics (2016)
Michael Herman et al.
—
Learning Purposeful Behaviour in the Absence of Rewards (2016)
Marlos C. Machado and Michael Bowling
—
Model-Free Imitation Learning with Policy Optimization (2016)
Jonathan Ho et al.
—
Difference of Convex Functions Programming Applied to Control with Expert Data (2016)
Bilal Piot et al.
—
Safe and Efficient Off-Policy Reinforcement Learning (2016)
Rémi Munos et al.
—
Policy Networks with Two-Stage Training for Dialogue Systems (2016)
Mehdi Fatemi et al.
—
Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation (2016)
Josiah P. Hanna et al.
—
Learning from Conditional Distributions via Dual Embeddings (2016)
Bo Dai et al.
—
Guided Policy Search as Approximate Mirror Descent (2016)
William Montgomery et al.
—
A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward (2016)
S.A. Murphy et al.
—
Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay (2016)
Ionel-Alexandru Hosu et al.
—
Neuroevolution-Based Inverse Reinforcement Learning (2016)
Karan K. Budhraja and Tim Oates
—
Density Matching Reward Learning (2016)
Sungjoon Choi et al.
—
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems (2016)
Zachary C. Lipton et al.
—
Modelling Stock-market Investors as Reinforcement Learning Agents [Correction] (2016)
Alvin Pastore et al.
—
Decentralized Non-communicating Multiagent Collision Avoidance with Deep Reinforcement Learning (2016)
Yu Fan Chen et al.
—
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates (2016)
Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine
—
Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies (2016)
Daniel Hein et al.
—
Learning Runtime Parameters in Computer Systems with Delayed Experience Injection (2016)
Michael Schaarschmidt et al.
—
Sample Efficient Actor-Critic with Experience Replay (2016)
Ziyu Wang et al.
—
Combining policy gradient and Q-learning (2016)
Brendan O'Donoghue et al.
—