Awesome Reinforcement Learning

© 2026 Awesome Papers.

Awesome Exploration

Exploration is one of the most active areas in Awesome Reinforcement Learning, with 1,310 papers in this collection. Benchmarks used by papers in this topic include ALFWorld, ContextVul, and DeepMind Control Suite. A strong starting point is "CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning".

Datasets & benchmarks

ALFWorld · 1 paper · 🤗

ContextVul · 1 paper

DeepMind Control Suite · 1 paper

LIBERO-Spatial · 1 paper

LiveCodeBench · 1 paper

Meta-World · 1 paper

nuScenes · 1 paper

Robotic Manipulation tasks · 1 paper

Key papers

60 papers, sorted by trending. The number under each entry is its 🔥 heat score.

CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning (2018)
Cédric Colas et al.
12.25
Parallel Exploration Via Negatively Correlated Search (2019)
Peng Yang, Qi Yang, Ke Tang, et al.
8.60
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model (2020)
Gen Li et al.
7.83
CCLF: A Contrastive-curiosity-driven Learning Framework For Sample-efficient Reinforcement Learning (2022)
Chenyu Sun, Hangwei Qian, Chunyan Miao
7.16
SimKO: Simple Pass@K Policy Optimization (2025)
Ruotian Peng et al.
6.80
Design Space Exploration Of Approximate Computing Techniques With A Reinforcement Learning Approach (2023)
Sepide Saeedi, Alessandro Savino, Stefano di Carlo
5.84
Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training (2026)
Ruiqi Lai et al.
5.44
Playing 20 Question Game with Policy-Based Reinforcement Learning (2018)
Huang Hu et al.
5.24
Full Bayesian Reinforcement Learning via LF-IBIS (2026)
Stefano Masini et al.
5.01
Mathematical methods of reinforcement learning (2026)
Denis Belomestny et al.
5.01
Entropy Pacing Policy Optimization for Multi-Task Agentic Reinforcement Learning (2026)
Zetian Hu et al.
5.01
Diverse Exploration Via Conjugate Policies For Policy Gradient Methods (2019)
Andrew Cohen, Xingye Qiao, Lei Yu, et al.
4.52
Reinforcement Learning With Success Induced Task Prioritization (2022)
Maria Nesterova, Alexey Skrynnik, Aleksandr Panov
4.52
Learning Generalizable Skill Policy with Data-Efficient Unsupervised RL (2026)
Jongchan Park et al.
4.39
Task-Relevant Representation Decoupling for Visual Reinforcement Learning Generalization (2026)
Jinwen Wang et al.
4.39
Local Motion Matters: A Deconstruct-Recompose Paradigm for Reinforcement Learning Pre-training from Videos (2026)
Jinwen Wang et al.
4.39
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training (2026)
Jinwen Wang et al.
4.39
Don't Let Gains FADE: Breaking Down Policy Gradient Weights in RL (2026)
Juliette Decugis et al.
4.39
Inertia-1: An Open Exploration of Wearable Motion Foundation Models (2026)
Zongzhe Xu et al.
4.39
UP: Unbounded Positive Asymmetric Optimization for Breaking the Exploration-Stability Dilemma (2026)
Chongyu Fan et al.
4.39
Progressive Crystallization: Turning Agent Exploration into Deterministic, Lower-Cost Workflows in Production (2026)
Arun Malik
4.39
HPG-Diff: Hierarchical physics-guided diffusion with differentiable connectivity constraints for topology optimization (2026)
Jinbo Yang et al.
4.39
RLVP: Penalize the Path, Reward the Outcome (2026)
Bojie Li et al.
4.39
Creativity from Friction: Human-AI Interaction for Exploratory Structural Design (2026)
Ricardo Maia Avelino et al.
4.39
Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier (2026)
Lorenz Wolf et al.
4.33
UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning (2026)
Mohamed Nabail et al.
4.33
ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning (2026)
Anurag Akula et al.
4.33
Uncertainty-aware reinforcement learning for chemical language models (2026)
Borja Medina et al.
4.33
ExTra: Exploratory Trajectory Optimization for Language Model Reinforcement Learning (2026)
Wenyang Hu et al.
4.33
Beyond One-Size-Fits-All: Diagnosis-Driven Online Reinforcement Learning with Offline Priors (2026)
Guozheng Ma et al.
4.33
FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation (2026)
Shuyi Zhang et al.
4.33
Ornstein-Uhlenbeck Adaptation As A Mechanism For Learning In Brains And Machines (2024)
Jesus Garcia Fernandez, Nasir Ahmad, Marcel van Gerven
3.58
Lightweight Safe Reinforcement Learning for End-to-End UAV Navigation (2026)
Shenghui Zhang et al.
3.51
Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling (2026)
Yiran Guo et al.
3.23
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving (2025)
Xiaoji Zheng et al.
3.10
How Log-barrier Helps Exploration In Policy Optimization (2026)
Leonardo Cesani, Matteo Papini, Marcello Restelli
1.94
Regret-aware Policy Optimization: Environment-level Memory For Replay Suppression Under Delayed Harm (2026)
Prakul Sunil Hiremath
1.83
Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation (2026)
Zixuan Liu et al.
1.78
AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models (2026)
Chengxuan Lu et al.
1.78
Dynamic Dual-Granularity Skill Bank for Agentic RL (2026)
Songjun Tu et al.
1.78
Segment to Focus: Guiding Latent Action Models in the Presence of Distractors (2026)
Hamza Adnan et al.
1.72
GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning (2026)
Yuyang Bai et al.
1.72
On the Role of Computation in Reinforcement Learning (2026)
Raj Ghugare et al.
1.72
ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation (2026)
Jingqi Zhou et al.
1.72
Online Learning In MDPs With Partially Adversarial Transitions And Losses (2026)
Ofir Schlisselberg, Tal Lancewicki, Yishay Mansour
1.72
Safe Exploration via Policy Priors (2026)
Manuel Wendl et al.
1.67
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals (2026)
Octavio Pappalardo
1.67
Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+(λ,λ))-GA (2025)
Tai Nguyen et al.
1.61
VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization (2025)
Youpeng Li et al.
1.56
Finding Kissing Numbers with Game-theoretic Reinforcement Learning (2025)
Chengdong Ma et al.
1.56
Interpretable By Design: Query-specific Neural Modules For Explainable Reinforcement Learning (2025)
Mehrdad Zakershahrak
1.56
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning (2025)
Rui Liu et al.
1.50
Q-Learning with Fine-Grained Gap-Dependent Regret (2025)
Haochen Zhang et al.
1.50
Using Reinforcement Learning to Optimize the Global and Local Crossing Number (2025)
Timo Brand et al.
1.44
How LLMs Learn to Reason: A Complex Network Perspective (2025)
Sihan Hu et al.
1.44
Exploring Large Action Sets With Hyperspherical Embeddings Using von Mises-Fisher Sampling (2025)
Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, et al.
1.33
Learning Safe, Constrained Policies Via Imitation Learning: Connection To Probabilistic Inference And A Naive Algorithm (2025)
George Papadopoulos, George A. Vouros
1.33
Q-learning With Posterior Sampling (2025)
Priyank Agrawal, Shipra Agrawal, Azmat Azati
1.28
Adaptable Hindsight Experience Replay For Search-based Learning (2025)
Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, et al.
1.28
Beyond-expert Performance With Limited Demonstrations: Efficient Imitation Learning With Double Exploration (2025)
Heyang Zhao, Xingrui Yu, David M. Bossens, et al.
1.28