Reinforcement Learning Roadmap

From Foundations to Frontiers in Reinforcement Learning

Reinforcement Learning (RL) studies how agents learn to make decisions by interacting with an environment. This roadmap moves from foundational concepts to advanced topics and applications, pairing each topic with a short summary and a list of papers. Along the way it touches on the intersections of AI, statistics, and multi-agent systems.

Foundations of Reinforcement Learning

What is Reinforcement Learning?

Reinforcement Learning is a paradigm in which an agent learns to make decisions by acting in an environment and receiving rewards or penalties for its actions, with the goal of maximizing cumulative reward. It is the starting point for understanding how agents can autonomously improve their performance in complex environments.
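
The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a standard API: the `Corridor` environment, its `reset`/`step` methods, and the reward values are all assumptions made up for the example.

```python
import random

class Corridor:
    """Hypothetical 5-cell corridor: reaching the right end pays +1, every other step costs 0.01."""
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: 0 = move left, 1 = move right (walls clamp the position)
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.01
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total

rng = random.Random(0)
random_return = run_episode(Corridor(), lambda s: rng.randrange(2))
always_right_return = run_episode(Corridor(), lambda s: 1)
```

In this toy task the always-right policy collects a return of about 0.97, and no other policy can do better. An RL algorithm has to discover such a policy from the reward signal alone.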

Papers (6 top-engaged this week)

An Introduction to Deep Reinforcement Learning

Vincent Francois-Lavet et al. · 2018 · 1464 citations

Meta-Gradient Reinforcement Learning

Zhongwen Xu et al. · 2018 · 96 citations

Reinforcement Learning Algorithms: An Overview And Classification

Fadi Almahamid, Katarina Grolinger · 2022 · 91 citations

Model-based Reinforcement Learning: A Survey

Thomas M. Moerland et al. · 2020 · 82 citations

Reinforcement Learning And Its Connections With Neuroscience And Psychology

Ajay Subramanian, Sharad Chitlangia, Veeky Baths · 2020 · 42 citations

Learning Offline: Memory Replay In Biological And Artificial Reinforcement Learning

Emma L. Roscow, Raymond Chua, Rui Ponte Costa, et al. · 2021 · 35 citations

Markov Decision Processes (MDPs)

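MDPs provide a mathematical framework for modeling sequential decision-making in situations where outcomes are partly random and partly under the control of a decision maker. An MDP specifies the states, the actions, the transition probabilities, and the rewards (usually together with a discount factor), and so formalizes the environment in which an RL agent learns. Understanding MDPs is essential for grasping how RL algorithms operate.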
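
As a rough illustration, a small MDP can be written down as plain data and solved with value iteration, a standard dynamic-programming method that the summary above does not name. The two states, the action names, and all probabilities and rewards below are invented for the example.

```python
# Transition model: P[state][action] = list of (probability, next_state, reward).
# Hypothetical 2-state MDP: "move" from A is risky (it only succeeds 80% of the time).
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "move": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "move": [(1.0, "A", 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed)

def q_value(V, state, action):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])

def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

V = value_iteration()
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
```

Here the greedy policy is to move from A and stay in B. The randomness lives in the transition list, and the control lives in the choice of action.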

Papers (6 top-engaged this week)

Model-based Reinforcement Learning: A Survey

Thomas M. Moerland et al. · 2020 · 82 citations

A unified view of entropy-regularized Markov decision processes

Gergely Neu, Anders Jonsson, Vicenç Gómez · 2017 · 98 citations

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Nathan Kallus et al. · 2019 · 49 citations

Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Jianzhun Du et al. · 2020 · 25 citations

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Chen-Yu Wei et al. · 2019 · 19 citations

Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

René Carmona, Mathieu Laurière, Zongjun Tan · 2019 · 39 citations

Value-Based Reinforcement Learning

Value-based methods estimate the value (expected return) of states or state-action pairs and derive a policy by choosing high-value actions. Q-learning is the best-known example, and algorithms of this family form the backbone of many RL applications.
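
A minimal sketch of tabular Q-learning makes the idea concrete. The corridor task, the constants, and the function names are assumptions chosen for the example; the update rule itself is the standard Q-learning rule.

```python
import random

N_STATES, GOAL = 5, 4                    # hypothetical corridor: states 0..4, goal at the right end
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # step size, discount, exploration rate (assumed)

def step(state, action):
    # action: 0 = left, 1 = right; reward 1 only on reaching the goal
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action choice, with random tie-breaking.
            if rng.random() < EPSILON or Q[state][0] == Q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted value of the best next action.
            target = reward if done else reward + GAMMA * max(Q[nxt])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = nxt
    return Q

Q = train()
greedy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]   # greedy action in each non-goal state
```

After training, the greedy policy read off the table moves right in every state, and the learned value of each step toward the goal shrinks by the discount factor with distance.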

Papers (6 top-engaged this week)

Meta-Gradient Reinforcement Learning

Zhongwen Xu et al. · 2018 · 96 citations

Information-Theoretic Considerations in Batch Reinforcement Learning

Jinglin Chen et al. · 2019 · 49 citations

Value Prediction Network

Junhyuk Oh et al. · 2017 · 42 citations

Statistical Inference Of The Value Function For Reinforcement Learning In Infinite Horizon Settings

C. Shi, S. Zhang, W. Lu, et al. · 2020 · 34 citations

CAQL: Continuous Action Q-Learning

Moonkyung Ryu et al. · 2019 · 14 citations

Self Punishment And Reward Backfill For Deep Q-learning

Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei · 2020 · 8 citations

Policy Gradient Methods

Policy gradient methods optimize a parameterized policy directly by following the gradient of expected return with respect to the policy parameters. They handle continuous action spaces and stochastic policies naturally, where taking a maximum over action values is impractical, and so provide a different perspective on how agents can learn to act optimally.
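
The sketch below shows a REINFORCE-style (score-function) update for a one-parameter Gaussian policy over a continuous action. The reward function, the fixed noise level, the learning rate, and the running-average baseline are all assumptions for illustration; the papers listed below use considerably more refined estimators.

```python
import random

TARGET = 2.0   # hypothetical task: reward is highest when the action equals 2.0
SIGMA = 0.5    # fixed standard deviation of the Gaussian policy (assumed)

def reward_fn(action):
    return -(action - TARGET) ** 2

def reinforce(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    mu = 0.0          # the only policy parameter: the mean action
    baseline = 0.0    # running average of reward, used to reduce variance
    for _ in range(steps):
        action = rng.gauss(mu, SIGMA)                 # sample a ~ N(mu, SIGMA^2)
        reward = reward_fn(action)
        grad_log_prob = (action - mu) / SIGMA ** 2    # d/dmu of log N(a; mu, SIGMA^2)
        # Score-function gradient step: raise the probability of better-than-baseline actions.
        mu += lr * (reward - baseline) * grad_log_prob
        baseline += 0.1 * (reward - baseline)
    return mu

mu = reinforce()
```

No maximization over actions appears anywhere: the policy mean drifts toward the rewarding region because sampled actions that beat the baseline pull it in their direction.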

Papers (6 top-engaged this week)

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Cathy Wu et al. · 2018 · 73 citations

Stein Variational Policy Gradient

Yang Liu et al. · 2017 · 65 citations

Policy Search In Continuous Action Domains: An Overview

Olivier Sigaud, Freek Stulp · 2018 · 49 citations

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

Philip S. Thomas, Emma Brunskill · 2017 · 43 citations

Phasic Policy Gradient

Karl Cobbe et al. · 2020 · 49 citations

Communication-efficient Policy Gradient Methods For Distributed Reinforcement Learning

Tianyi Chen, Kaiqing Zhang, Georgios B. Giannakis, et al. · 2018 · 54 citations

Intermediate Concepts in RL

Exploration vs. Exploitation

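The exploration-exploitation dilemma is a fundamental challenge in RL: an agent must balance exploring unfamiliar actions to learn what they yield against exploiting the actions it currently believes are best. Too little exploration can lock the agent into a suboptimal choice, while too much wastes experience on actions already known to be poor. Understanding this balance is key to developing effective RL strategies.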
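
Epsilon-greedy action selection on a multi-armed bandit is one of the simplest ways to see the trade-off. The arm probabilities, step count, and epsilon below are invented for the example, and epsilon-greedy is only one of many exploration strategies.

```python
import random

ARM_PROBS = [0.2, 0.5, 0.8]   # hypothetical Bernoulli arms; hidden from the agent

def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(ARM_PROBS)
    estimates = [0.0] * n_arms   # running mean reward per arm
    counts = [0] * n_arms        # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore: pick any arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a]) # exploit: best estimate so far
        reward = 1.0 if rng.random() < ARM_PROBS[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean
    return estimates, counts

estimates, counts = epsilon_greedy()
best_arm = estimates.index(max(estimates))
```

With `epsilon=0.0` the agent would keep pulling whichever arm first looked good. The small exploration rate is what lets it find the arm with the highest payout.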

Papers (6 top-engaged this week)

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Haoran Tang et al. · 2016 · 197 citations

EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Justin Fu, John D. Co-Reyes, et al. · 2017 · 61 citations

Exploration Versus Exploitation In Reinforcement Learning: A Stochastic Control Approach

Haoran Wang, Thaleia Zariphopoulou, Xunyu Zhou · 2018 · 19 citations

Conservative Safety Critics for Exploration

Homanga Bharadhwaj et al. · 2020 · 32 citations

Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin et al. · 2019 · 30 citations

Reward-Free Exploration for Reinforcement Learning

Chi Jin et al. · 2020 · 26 citations

Model-Based Reinforcement Learning

Model-based RL learns (or is given) a model of the environment's dynamics and rewards and uses it to predict outcomes and plan. This approach can significantly improve sample efficiency and is particularly useful in environments where real experience is scarce or expensive to collect.
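
One simple way to sketch the idea is to estimate a tabular model from logged transitions and then plan on that estimate. The noisy corridor, the counting model, and the use of value iteration as the planner are assumptions for illustration, not the method of any paper listed below.

```python
import random
from collections import defaultdict

N_STATES, GOAL, GAMMA = 5, 4, 0.9   # hypothetical corridor, goal at the right end

def true_step(state, action, rng):
    # The real environment: the chosen direction is reversed 10% of the time.
    direction = 1 if action == 1 else -1
    if rng.random() < 0.1:
        direction = -direction
    nxt = min(max(state + direction, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def learn_model(steps=2000, seed=0):
    # Count observed outcomes under a uniformly random behaviour policy.
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {(s', r): n}
    state = 0
    for _ in range(steps):
        action = rng.randrange(2)
        nxt, reward = true_step(state, action, rng)
        counts[(state, action)][(nxt, reward)] += 1
        state = 0 if nxt == GOAL else nxt            # restart after reaching the goal
    return counts

def plan(counts, sweeps=200):
    # Value iteration on the estimated model; the goal is terminal (value 0).
    V = [0.0] * N_STATES
    def q(s, a):
        outcomes = counts[(s, a)]
        n = sum(outcomes.values())
        if n == 0:
            return 0.0   # never tried: no information, assume zero value
        return sum(c / n * (r + GAMMA * V[s2]) for (s2, r), c in outcomes.items())
    for _ in range(sweeps):
        for s in range(GOAL):
            V[s] = max(q(s, 0), q(s, 1))
    return [max((0, 1), key=lambda a: q(s, a)) for s in range(GOAL)]

policy = plan(learn_model())
```

All planning sweeps run against the learned counts, so the environment is queried only during the initial 2000 logged steps. That reuse of data is the source of the sample-efficiency gain.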

Papers (6 top-engaged this week)

Benchmarking Model-Based Reinforcement Learning

Tingwu Wang et al. · 2019 · 242 citations

Model-Ensemble Trust-Region Policy Optimization

Thanard Kurutach et al. · 2018 · 218 citations

Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

Sanket Kamthe, Marc Peter Deisenroth · 2017 · 110 citations

Model-based Reinforcement Learning: A Survey

Thomas M. Moerland et al. · 2020 · 82 citations

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Marvin Zhang et al. · 2018 · 131 citations

Model-Based Reinforcement Learning via Meta-Policy Optimization

Ignasi Clavera et al. · 2018 · 117 citations

Multi-Agent Reinforcement Learning

In multi-agent settings, multiple agents interact and learn simultaneously, leading to complex dynamics and strategies. Because every agent's behavior changes as it learns, each agent effectively faces a non-stationary environment. These settings mirror real-world scenarios such as competitive games and collaborative tasks.
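
A small sketch of two independent learners in a repeated coordination game shows how the agents shape each other's learning problem. The payoff matrix, learning rate, and exploration rate are invented for the example, and independent learning is only the simplest baseline in this area.

```python
import random

# Hypothetical common-payoff game: both agents are rewarded only when their actions match.
PAYOFF = [[1.0, 0.0],
          [0.0, 2.0]]

def choose(q, rng, epsilon):
    if rng.random() < epsilon:
        return rng.randrange(2)        # explore
    return 0 if q[0] >= q[1] else 1    # exploit (ties go to action 0)

def independent_learners(rounds=5000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]   # each agent keeps its own action values
    for _ in range(rounds):
        a1, a2 = choose(q1, rng, epsilon), choose(q2, rng, epsilon)
        reward = PAYOFF[a1][a2]
        # Each agent updates as if it were alone; the other agent is part of its environment.
        q1[a1] += alpha * (reward - q1[a1])
        q2[a2] += alpha * (reward - q2[a2])
    return q1, q2

q1, q2 = independent_learners()
greedy1 = 0 if q1[0] >= q1[1] else 1
greedy2 = 0 if q2[0] >= q2[1] else 1
```

The value an agent assigns to an action depends on what the other agent is currently doing, so the two tend to lock into a matching pair of actions. That pair is not necessarily the higher-paying one, which hints at why coordination is a central topic in the surveys below.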

Papers (6 top-engaged this week)

Deep Reinforcement Learning For Multi-agent Systems: A Review Of Challenges, Solutions And Applications

Thanh Thi Nguyen, Ngoc Duy Nguyen, Saeid Nahavandi · 2018 · 1022 citations

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Thanh Thi Nguyen et al. · 2018 · 1001 citations

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Jakob N. Foerster et al. · 2016 · 869 citations

Multi-agent Reinforcement Learning: A Selective Overview Of Theories And Algorithms

Kaiqing Zhang, Zhuoran Yang, Tamer Başar · 2019 · 817 citations

A Survey And Critique Of Multiagent Deep Reinforcement Learning

Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor · 2018 · 473 citations

A Review Of Cooperative Multi-agent Deep Reinforcement Learning

Afshin Oroojlooyjadid, Davood Hajinezhad · 2019 · 349 citations

Offline Reinforcement Learning

Offline RL, or batch RL, allows agents to learn from previously collected data without further interaction with the environment. This is crucial in situations where online interaction is costly or dangerous, such as healthcare applications.
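
In its most basic form, learning from a fixed batch looks like the sketch below: transitions are logged once and then replayed, with no further environment calls. The corridor task and the random behaviour policy are assumptions for the example. This plain replay also leaves out the regularization toward the logged behaviour that the methods listed below add to cope with poorly covered actions.

```python
import random

N_STATES, GOAL, GAMMA = 5, 4, 0.9   # hypothetical corridor, goal at the right end

def collect_dataset(steps=1000, seed=0):
    # Logged once by a random behaviour policy; the learner never calls this again.
    rng = random.Random(seed)
    data, state = [], 0
    for _ in range(steps):
        action = rng.randrange(2)
        nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
        done = nxt == GOAL
        data.append((state, action, 1.0 if done else 0.0, nxt, done))
        state = 0 if done else nxt
    return data

def offline_q_learning(data, sweeps=50, alpha=0.5):
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in data:       # replay the fixed dataset only
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
    return Q

Q = offline_q_learning(collect_dataset())
greedy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]   # greedy action per non-goal state
```

This works here because the random log happens to cover every state-action pair. When the data cover only part of the space, the `max` in the target can latch onto actions whose values were never checked against real outcomes.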

Papers (6 top-engaged this week)

Behavior Regularized Offline Reinforcement Learning

Yifan Wu et al. · 2019 · 249 citations

A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto et al. · 2021 · 164 citations

Critic Regularized Regression

Ziyu Wang et al. · 2020 · 90 citations

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Tatsuya Matsushima et al. · 2020 · 50 citations

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Noah Y. Siegel et al. · 2020 · 48 citations

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

Aviral Kumar et al. · 2022 · 17 citations

Advanced Topics in Reinforcement Learning

Transfer Learning in Reinforcement Learning

Transfer learning in RL enables agents to leverage knowledge gained from one task to improve performance in another, related task. This is particularly valuable in environments where training from scratch is impractical.
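
One very simple form of transfer is to initialize the learner on a new task with values learned on a related one. The sketch below assumes two corridor tasks that differ only in the size of the goal reward; the tasks, constants, and helper names are invented for the example, and the surveyed methods transfer far richer knowledge than a value table.

```python
import random

N_STATES, GOAL, ALPHA, GAMMA = 5, 4, 0.5, 0.9   # hypothetical corridor tasks

def run_episode(Q, goal_reward, rng, epsilon=0.1, max_steps=1000):
    """One Q-learning episode; updates Q in place and returns the number of steps taken."""
    state = 0
    for t in range(1, max_steps + 1):
        if rng.random() < epsilon or Q[state][0] == Q[state][1]:
            action = rng.randrange(2)                      # explore or break a tie
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
        done = nxt == GOAL
        target = goal_reward if done else GAMMA * max(Q[nxt])
        Q[state][action] += ALPHA * (target - Q[state][action])
        if done:
            return t
        state = nxt
    return max_steps

rng = random.Random(0)
source_Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(300):
    run_episode(source_Q, goal_reward=1.0, rng=rng)        # source task: goal pays 1

# Target task: same corridor, but the goal pays 5. Compare first-episode length.
warm_Q = [row[:] for row in source_Q]                      # transferred initialization
cold_Q = [[0.0, 0.0] for _ in range(N_STATES)]             # learning from scratch
warm_steps = run_episode(warm_Q, goal_reward=5.0, rng=random.Random(1), epsilon=0.0)
cold_steps = run_episode(cold_Q, goal_reward=5.0, rng=random.Random(1), epsilon=0.0)
```

The warm-started agent walks straight to the goal in its first episode on the new task, while the agent starting from zeros has to stumble onto the goal by chance before it has anything to exploit.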

Papers (6 top-engaged this week)

Transfer Learning In Deep Reinforcement Learning: A Survey

Zhuangdi Zhu, Kaixiang Lin, Anil K. Jain, et al. · 2020 · 616 citations

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

Yan Duan et al. · 2016 · 501 citations

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Junhyuk Oh et al. · 2017 · 67 citations

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

André Barreto et al. · 2019 · 38 citations

Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation

Shani Gamrian et al. · 2018 · 40 citations

Meta reinforcement learning as task inference

Jan Humplik et al. · 2019 · 63 citations

Distributional Reinforcement Learning

Distributional RL focuses on modeling the distribution of returns rather than just the expected value. This approach can lead to more robust learning and better performance in uncertain environments.
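
To see what modeling the distribution adds, the sketch below learns a handful of quantiles of a random return with plain quantile regression. It is a single-state toy with an invented two-outcome return, without bootstrapping or function approximation, so it only hints at the quantile-based methods listed below.

```python
import random

N_QUANTILES = 4
# Quantile midpoints: 0.125, 0.375, 0.625, 0.875
TAUS = [(2 * i + 1) / (2 * N_QUANTILES) for i in range(N_QUANTILES)]

def sample_return(rng):
    # Hypothetical risky outcome: 0 or 10 with equal probability (mean 5).
    return 10.0 if rng.random() < 0.5 else 0.0

def learn_quantiles(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_QUANTILES      # one estimate per quantile level
    for _ in range(steps):
        g = sample_return(rng)
        for i, tau in enumerate(TAUS):
            # Subgradient step on the pinball (quantile) loss for level tau.
            theta[i] += lr * (tau - (1.0 if g < theta[i] else 0.0))
    return theta

quantiles = learn_quantiles()
mean_estimate = sum(quantiles) / N_QUANTILES
```

The lower two quantile estimates settle near 0 and the upper two near 10, so the learner captures that the outcome is all-or-nothing. An expected-value learner would report only a number near 5, which is a return that never actually occurs.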

Papers (6 top-engaged this week)

Distributional Reinforcement Learning With Quantile Regression

Will Dabney, Mark Rowland, Marc G. Bellemare, et al. · 2017 · 362 citations

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

Jingliang Duan et al. · 2020 · 296 citations

Implicit Quantile Networks for Distributional Reinforcement Learning

Will Dabney et al. · 2018 · 197 citations

An Analysis of Categorical Distributional Reinforcement Learning

Mark Rowland et al. · 2018 · 41 citations

A Comparative Analysis of Expected and Distributional Reinforcement Learning

Clare Lyle et al. · 2019 · 21 citations

Conservative Offline Distributional Reinforcement Learning

Yecheng Jason Ma et al. · 2021 · 15 citations

Applications of Deep Reinforcement Learning

Deep RL has been successfully applied in various fields, from game AI to robotics. Understanding these applications can inspire innovative solutions and demonstrate the real-world impact of RL research.

Papers (6 top-engaged this week)

An Introduction to Deep Reinforcement Learning

Vincent Francois-Lavet et al. · 2018 · 1464 citations

Deep Reinforcement Learning For Multi-agent Systems: A Review Of Challenges, Solutions And Applications

Thanh Thi Nguyen, Ngoc Duy Nguyen, Saeid Nahavandi · 2018 · 1022 citations

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Mel Vecerik et al. · 2017 · 510 citations

Deep Reinforcement Learning that Matters

Peter Henderson et al. · 2017 · 365 citations

Deep Reinforcement Learning and its Neuroscientific Implications

Matthew Botvinick et al. · 2020 · 215 citations

Relational Deep Reinforcement Learning

Vinicius Zambaldi et al. · 2018 · 159 citations

Challenges in Reinforcement Learning

Despite its successes, RL faces numerous challenges, including sample inefficiency, training instability, and limited scalability. Addressing these challenges is crucial for advancing the field and making RL more applicable in real-world scenarios.

Papers (6 top-engaged this week)

Deep Reinforcement Learning For Multi-agent Systems: A Review Of Challenges, Solutions And Applications

Thanh Thi Nguyen, Ngoc Duy Nguyen, Saeid Nahavandi · 2018 · 1022 citations

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Thanh Thi Nguyen et al. · 2018 · 1001 citations

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

Yan Duan et al. · 2016 · 501 citations

Robust Adversarial Reinforcement Learning

Lerrel Pinto et al. · 2017 · 383 citations

Challenges of Real-World Reinforcement Learning

Gabriel Dulac-Arnold et al. · 2019 · 255 citations

Behavior Regularized Offline Reinforcement Learning

Yifan Wu et al. · 2019 · 249 citations

Paper lists auto-refresh weekly from live engagement data. Last refresh: 2026-06-08.