Awesome Reinforcement Learning


Uncategorized


© 2026 Awesome Papers.

Awesome Uncategorized — curated papers, datasets & benchmarks · Awesome Reinforcement Learning


Awesome Uncategorized

Uncategorized is one of the most active areas in Awesome Reinforcement Learning, with 60 papers in this collection and evaluations on datasets such as CIFAR-10, MATH-500, and GPQA. A strong starting point is "First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training".

Datasets & benchmarks

CIFAR-10 · 2 papers · 🤗

MATH-500 · 1 paper · 🤗

GPQA · 1 paper · 🤗

MathVista · 1 paper · 🤗

ProcessBench · 1 paper · 🤗

GPQA-Diamond · 1 paper · 🤗

PutnamBench · 1 paper · 🤗

WeMath · 1 paper · 🤗

CARLA · 1 paper · 🤗

6 industrial circuits · 1 paper

2D Brusselator · 1 paper

Key papers

60 papers · trending (default) · numbers = 🔥 heat

First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training (2025)
Lai Wei et al.
8.00
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? (2025)
Michael Doherty et al.
6.12
Reinforcement Learning for Search Tree Size Minimization in Constraint Programming: New Results on Scheduling Benchmarks (2025)
Vilém Heinz et al.
4.58
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning (2025)
Xiao Hu et al.
3.67
2 OLMo 2 Furious (2025)
Team OLMo et al.
3.53
Apriel-Nemotron-15B-Thinker (2025)
Shruthan Radhakrishna et al.
2.99
Process Reward Models That Think (2025)
Muhammad Khalifa et al.
2.76
PINN-DT: Optimizing Energy Consumption in Smart Building Using Hybrid Physics-Informed Neural Networks and Digital Twin Framework with Blockchain Security (2025)
Hajar Kazemi Naeini et al.
2.71
Edge AI-Powered Real-Time Decision-Making for Autonomous Vehicles in Adverse Weather Conditions (2025)
Milad Rahmati
2.71
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2025)
Tushar Pandey et al.
2.65
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction (2025)
Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, and Chi Jin
2.45
Optimized Renewable Energy Planning MDP for Socially-Equitable Electricity Coverage in the US (2025)
Riya Kinnarkar et al.
1.44
Real-World Receptivity to Adaptive Mental Health Interventions: Findings from an In-the-Wild Study (2025)
Nilesh Kumar Sahu et al.
1.39
Machine Learning Algorithms for Improving Exact Classical Solvers in Mixed Integer Continuous Optimization (2025)
Morteza Kimiaei et al.
1.39
Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving (2025)
Tianyun Yang et al.
1.39
A Robust Pipeline for Differentially Private Federated Learning on Imbalanced Clinical Data using SMOTETomek and FedProx (2025)
Rodrigo Tertulino
1.39
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph (2025)
Duzhen Zhang et al.
1.39
RewardRank: Optimizing True Learning-to-Rank Utility (2025)
Gaurav Bhatt et al.
1.39
TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting (2025)
Jiaming Leng et al.
1.39
Curated Collaborative AI Edge with Network Data Analytics for B5G/6G Radio Access Networks (2025)
Sardar Jaffar Ali et al.
1.33
BlueLM-2.5-3B Technical Report (2025)
Baojiao Xiong et al.
1.33
The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs (2025)
Ole-Christoffer Granmo, Youmna Abdelwahab, Per-Arne Andersen, Paul F. A. Clarke, Kunal Dumbre, Ylva Grønninsæter, Vojtech Halenka, Runar Helin, Lei Jiao, Ahmed Khalid, Rebekka Omslandseter, Rupsa Saha, Mayur Shende, and Xuan Zhang
1.33
Optimization of Activity Batching Policies in Business Processes (2025)
Orlenys López-Pintado et al.
1.33
Directly Learning Stock Trading Strategies Through Profit Guided Loss Functions (2025)
Devroop Kar et al.
1.33
MoDeSuite: Robot Learning Task Suite for Benchmarking Mobile Manipulation with Deformable Objects (2025)
Yuying Zhang et al.
1.33
Interpretable reinforcement learning for heat pump control through asymmetric differentiable decision trees (2025)
Toon Van Puyvelde et al.
1.28
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem (2025)
Yubo Wang et al.
1.28
A Risk-Aware Reinforcement Learning Reward for Financial Trading (2025)
Uditansh Srivastava et al.
1.28
Online Job Assignment (2025)
Farbod Ekbatani et al.
1.28
MiniCPM4: Ultra-Efficient LLMs on End Devices (2025)
MiniCPM Team: Chaojun Xiao et al.
1.28
Noise tolerance via reinforcement: Learning a reinforced quantum dynamics (2025)
Abolfazl Ramezanpour
1.28
A Survey of State Representation Learning for Deep Reinforcement Learning (2025)
Ayoub Echchahed and Pablo Samuel Castro
1.28
Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage (2025)
Gavin Lee Goodship et al.
1.28
Spatially-Enhanced Temporal Fusion Transformer: Interpretable Multi-Output Prediction for Parametric Dynamical Systems with Time-Varying Inputs (2025)
Shuwen Sun et al.
1.22
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction (2025)
Xiaoliang Chen et al.
1.22
From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent (2025)
Minjie Shen et al.
1.22
PAPN: Proximity Attention Encoder and Pointer Network Decoder for Parcel Pickup Route Prediction (2025)
Hansi Denis et al.
1.22
Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints (2025)
Abhishek Sinha et al.
1.22
FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots (2025)
Botian Xu et al.
1.22
An Identifiable Cost-Aware Causal Decision-Making Framework Using Counterfactual Reasoning (2025)
Ruichu Cai et al.
1.22
PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework (2025)
Abhineet Agarwal et al.
1.22
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction (2025)
Jing Zou et al.
1.22
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning (2025)
Minwu Kim et al.
1.22
Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization (2025)
Robbert Reijnen et al.
1.22
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features (2025)
Zixuan Xie et al.
1.22
Lifted Forward Planning in Relational Factored Markov Decision Processes with Concurrent Actions (2025)
Florian Andreas Marwitz et al.
1.22
When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? (2025)
Eleni Nisioti et al.
1.22
Statistical mechanics of extensive-width Bayesian neural networks near interpolation (2025)
Jean Barbier et al.
1.22
FastFlow: Early Yet Robust Network Flow Classification using the Minimal Number of Time-Series Packets (2025)
Rushi Jayeshkumar Babaria, Minzhao Lyu, Gustavo Batista, and Vijay Sivaraman
1.17
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval (2025)
Kidist Amde Mekonnen et al.
1.17
GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network (2025)
Wenjing Xiao et al.
1.17
Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond (2025)
Francesco Bacchiocchi et al.
1.11
MTS: A Deep Reinforcement Learning Portfolio Management Framework with Time-Awareness and Short-Selling (2025)
Fengchen Gu et al.
1.11
Censoring-Aware Tree-Based Reinforcement Learning for Estimating Dynamic Treatment Regimes with Censored Outcomes (2025)
Animesh Kumar Paul and Russell Greiner
1.11
Group Fairness in Multi-Task Reinforcement Learning (2025)
Kefan Song et al.
1.11
Carpe Diem: Critical Learning Period-Aware Contract-Based Incentives for Federated Learning (2025)
Thanh Linh Nguyen et al.
1.11
On-Off Systems with Strategic Customers (2025)
Yanwei Sun et al.
1.11
Robust Probabilistic Model Checking with Continuous Reward Domains (2025)
Xiaotong Ji et al.
1.06
Bag of Tricks for Inference-time Computation of LLM Reasoning (2025)
Fan Liu et al.
1.06
Analyzing the Ethical Logic of Eight Large Language Models (2025)
W. Russell Neuman et al.
1.00