Awesome Papers

Papers

Distributional Reinforcement Learning With Quantile Regression (2017)
Will Dabney, Mark Rowland, Marc G. Bellemare, et al.
19.20
Statistical Inference Of The Value Function For Reinforcement Learning In Infinite Horizon Settings (2020)
C. Shi, S. Zhang, W. Lu, et al.
13.14
Adaptive Trust Region Policy Optimization: Global Convergence And Faster Rates For Regularized MDPs (2019)
Lior Shani, Yonathan Efroni, Shie Mannor
12.10
Convergence Proof For Actor-critic Methods Applied To PPO And RUDDER (2020)
Markus Holzleitner, Lukas Gruber, José Arjona-Medina, et al.
11.67
Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)
Bo Pang, Zhong-Ping Jiang
11.19
Action Candidate Driven Clipped Double Q-learning For Discrete And Continuous Action Tasks (2022)
Haobo Jiang, Jin Xie, Jian Yang
10.61
Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)
Zaiwei Chen, Sheng Zhang, Thinh T. Doan, et al.
10.35
Revisiting State Augmentation Methods For Reinforcement Learning With Stochastic Delays (2021)
Somjit Nath, Mayank Baranwal, Harshad Khadilkar
10.35
Efficiently Breaking The Curse Of Horizon In Off-policy Evaluation With Double Reinforcement Learning (2019)
Nathan Kallus, Masatoshi Uehara
10.21
Achieving Zero Constraint Violation For Constrained Reinforcement Learning Via Primal-dual Approach (2021)
Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, et al.
9.59
Learning And Information In Stochastic Networks And Queues (2021)
Neil Walton, Kuang Xu
9.03
A reinforcement learning agent for maintenance of deteriorating systems with increasingly imperfect repairs (2025)
Alberto Pliego Marugán et al.
8.69
Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach (2019)
Silviu Pitis
8.60
Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)
Amber Srivastava, Srinivasa M Salapaka
8.60
Inexact Iterative Numerical Linear Algebra For Neural Network-based Spectral Estimation And Rare-event Prediction (2023)
John Strahan, Spencer C. Guo, Chatipat Lorpaiboon, et al.
8.35
Minimax Optimal Q Learning With Nearest Neighbors (2023)
Puning Zhao, Lifeng Lai
8.09
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model (2020)
Gen Li et al.
7.83
Logarithmic Regret For Episodic Continuous-time Linear-quadratic Reinforcement Learning Over A Finite-time Horizon (2020)
Matteo Basei, Xin Guo, Anran Hu, et al.
7.81
Computably Continuous Reinforcement-learning Objectives Are PAC-learnable (2023)
Cambridge Yang, Michael Littman, Michael Carbin
7.81
Approximating Euclidean By Imprecise Markov Decision Processes (2020)
Manfred Jaeger, Giorgio Bacci, Giovanni Bacci, et al.
7.50
Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning (2018)
Jayakumar Subramanian, Aditya Mahajan
7.50
Deep Embedded Multiplicative DMD for Algebra-Preserving Koopman Learning (2026)
Kelan Gray et al.
7.38
Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning (2025)
Xinyu Liu et al.
7.35
An Online Prediction Algorithm For Reinforcement Learning With Linear Function Approximation Using Cross Entropy Method (2018)
Ajin George Joseph, Shalabh Bhatnagar
7.16
Entropic Regularization Of Markov Decision Processes (2019)
Boris Belousov, Jan Peters
6.77
Nonparametric Bellman Mappings For Reinforcement Learning: Application To Robust Adaptive Filtering (2024)
Yuki Akiyama, Minh Vu, Konstantinos Slavakis
6.34
$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation (2025)
Siddharth Chandak
6.28
ParetoPilot: Zero-Surrogate Offline Multi-Objective Optimization via Infer-Perturb-Guide Diffusion (2026)
Ruiqing Sun et al.
6.23
Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication (2026)
Yuyang Du et al.
6.23
Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent (2026)
Ahanaf Hasan Ariq
5.89
PE-MHL: Physics-Encoded Modular Hybrid Layers for Scalable Learning of Complex Systems (2026)
Ismail Hassaballa et al.
5.89
A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks (2026)
Tian Ding et al.
5.89
When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks (2026)
Berk Tinaz et al.
5.89
Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control (2026)
Jihoon Hong et al.
5.89
Performance Dynamics And Termination Errors In Reinforcement Learning: A Unifying Perspective (2019)
Nikki Lijing Kuang, Clement H. C. Leung
5.84
On Generalized Bellman Equations And Temporal-difference Learning (2017)
Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
5.84
Semantic Constraint Synthesis for Adaptive Trajectory Optimization via Large Language Models (2026)
Eleanor Brosius et al.
5.49
Reinforcement Learning With Non-cumulative Objective (2023)
Wei Cui, Wei Yu
5.24
Differential Temporal Difference Learning (2018)
Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn
5.24
Logarithmic Regret Bounds For Continuous-time Average-reward Markov Decision Processes (2022)
Xuefeng Gao, Xun Yu Zhou
5.24
Assumed Density Filtering Q-learning (2017)
Heejin Jeong, Clark Zhang, George J. Pappas, et al.
5.24
Stochastic Reinforcement Learning (2019)
Nikki Lijing Kuang, Clement H. C. Leung, Vienne W. K. Sung
5.24
Zeroth-order Actor-critic: An Evolutionary Framework For Sequential Decision Problems (2022)
Yuheng Lei, Yao Lyu, Guojian Zhan, et al.
5.24
Online Apprenticeship Learning (2021)
Lior Shani, Tom Zahavy, Shie Mannor
5.24
Nonlocal Mean Field Schrödinger Bridge with Learned Interactions (2026)
Daisuke Inoue et al.
5.01
When Freshness Is Not Enough: Distribution-Aware Age of Information for Networked LQR Control (2026)
Abdullah Y. Etcibasi et al.
5.01
Near-Optimal Decentralized Stochastic Convex Optimization over Networks (2026)
Nitai Kluger et al.
5.01
RA-DCA: A Randomized Active-Set DCA for Directional Stationarity in Max-Structured DC Programs (2026)
Yi-Shuai Niu
4.95
Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions (2026)
Willem Diepeveen et al.
4.95
Error estimates for tamed Euler and Randomized Euler schemes for SDEs with locally Lipschitz drift with applications to non-logconcave sampling and optimization (2026)
Iosif Lytras et al.
4.95