Awesome Statistics Theory
Statistics Theory is one of the most active areas in Awesome Reinforcement Learning, with 60 papers in this collection, evaluated on datasets like the German Family Panel. A strong starting point is "Learning In Restless Bandits Under Exogenous Global Markov Process".
Datasets & benchmarks
Key papers
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021) · Tomer Gafni, Michal Yemini, Kobi Cohen · 6.34
- Anytime-valid Off-policy Inference For Contextual Bandits (2022) · Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, et al. · 2.26
- Dynamic Decision-making Under Model Misspecification: A Stochastic Stability Approach (2026) · Xinyu Dai, Daniel Chen, Yian Qian · 2.00
- Sparsity Is Necessary: Polynomial-time Stability For Agentic LLMs In Large Action Spaces (2026) · Angshul Majumdar · 2.00
- Notes On The Reward Representation Of Posterior Updates (2026) · Pedro A. Ortega · 2.00
- Autocorrelation Effects In A Stochastic-process Model For Decision Making Via Time Series (2026) · Tomoki Yamagami, Mikio Hasegawa, Takatomo Mihana, et al. · 2.00
- Asymptotically Optimal Regret In Communicating Markov Decision Processes (2025) · Victor Boone · 1.33
- Scalable Policy Maximization Under Network Interference (2025) · Aidan Gleich, Eric Laber, Alexander Volfovsky · 1.33
- Stability And Generalization For Bellman Residuals (2025) · Enoch H. Kang, Kyoungseok Jang · 1.33
- Reinforcement Learning With Continuous Actions Under Unmeasured Confounding (2025) · Yuhan Li, Eugene Han, Yifan Hu, et al. · 1.33
- LITE: Efficiently Estimating Gaussian Probability Of Maximality (2025) · Nicolas Menet, Jonas Hübotter, Parnian Kassraie, et al. · 1.33
- Bayesian Decision Making Around Experts (2025) · Daniel Jarne Ornia, Joel Dyer, Nicholas Bishop, et al. · 1.33
- Gap-dependent Bounds For Federated \(q\)-learning (2025) · Haochen Zhang, Zhong Zheng, Lingzhou Xue · 1.33
- Sampling Complexity Of TD And PPO In RKHS (2025) · Lu Zou, Wendi Ren, Weizhong Zhang, et al. · 1.33
- Batched Nonparametric Bandits via k-Nearest Neighbor UCB (2025) · Sakshi Arya · 1.28
- Markov Decision Processes With Noisy State Observation (2023) · Amirhossein Afsharrad, Sanjay Lall · 0.00
- Mitigating Partial Observability In Sequential Decision Processes Via The Lambda Discrepancy (2024) · Cameron Allen, Aaron Kirtland, Ruo Yu Tao, et al. · 0.00
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021) · Andrew Bennett, Nathan Kallus · 0.00
- Contextual Bandits And Optimistically Universal Learning (2022) · Moise Blanchard, Steve Hanneke, Patrick Jaillet · 0.00
- Mechanisms For A No-regret Agent: Beyond The Common Prior (2020) · Modibo Camara, Jason Hartline, Aleck Johnsen · 0.00
- Optimal Cooperative Multiplayer Learning Bandits With Noisy Rewards And No Communication (2023) · William Chang, Yuanhao Lu · 0.00
- Debiasing Samples From Online Learning Using Bootstrap (2021) · Ningyuan Chen, Xuefeng Gao, Yi Xiong · 0.00
- Optimal Policies For Observing Time Series And Related Restless Bandit Problems (2017) · Christopher R. Dance, Tomi Silander · 0.00
- Solving Non-rectangular Reward-robust MDPs Via Frequency Regularization (2023) · Uri Gadot, Esther Derman, Navdeep Kumar, et al. · 0.00
- A Relaxed Technical Assumption For Posterior Sampling-based Reinforcement Learning For Control Of Unknown Linear Systems (2021) · Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, et al. · 0.00
- Restless Bandit Problem With Rewards Generated By A Linear Gaussian Dynamical System (2024) · Jonathan Gornet, Bruno Sinopoli · 0.00
- Learning For Bandits Under Action Erasures (2024) · Osama Hanna, Merve Karakas, Lin F. Yang, et al. · 0.00
- Computing The Performance Of A New Adaptive Sampling Algorithm Based On The Gittins Index In Experiments With Exponential Rewards (2023) · James K. He, Sofía S. Villar, Lida Mavrogonatou · 0.00
- On The Convergence Rate Of Off-policy Policy Optimization Methods With Density-ratio Correction (2021) · Jiawei Huang, Nan Jiang · 0.00
- Optimal Convergence Rate For Exact Policy Mirror Descent In Discounted Markov Decision Processes (2023) · Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini · 0.00
- Convergence Of Finite Memory Q-learning For POMDPs And Near Optimality Of Learned Policies Under Filter Stability (2021) · Ali Devran Kara, Serdar Yuksel · 0.00
- Delegative Reinforcement Learning: Learning To Avoid Traps With A Little Help (2019) · Vanessa Kosoy · 0.00
- Bayesian Policy Optimization For Model Uncertainty (2018) · Gilwoo Lee, Brian Hou, Aditya Mandalika, et al. · 0.00
- Improved Algorithm For Adversarial Linear Mixture MDPs With Bandit Feedback And Unknown Transition (2024) · Long-Fei Li, Peng Zhao, Zhi-Hua Zhou · 0.00
- Regret Minimization Experience Replay In Off-policy Reinforcement Learning (2021) · Xu-Hui Liu, Zhenghai Xue, Jing-Cheng Pang, et al. · 0.00
- Provable General Function Class Representation Learning In Multitask Bandits And MDPs (2022) · Rui Lu, Andrew Zhao, Simon S. Du, et al. · 0.00
- Uncertainty Representations In State-space Layers For Deep Reinforcement Learning Under Partial Observability (2024) · Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, et al. · 0.00
- Online Learning In MDPs With Linear Function Approximation And Bandit Feedback (2020) · Gergely Neu, Julia Olkhovskaya · 0.00
- On State Variables, Bandit Problems And POMDPs (2020) · Warren B Powell · 0.00
- Logarithmic Smoothing For Pessimistic Off-policy Evaluation, Selection And Learning (2024) · Otmane Sakhi, Imad Aouali, Pierre Alquier, et al. · 0.00
- Rate-optimal Policy Optimization For Linear Markov Decision Processes (2023) · Uri Sherman, Alon Cohen, Tomer Koren, et al. · 0.00
- Quantifying The Sensitivity Of Inverse Reinforcement Learning To Misspecification (2024) · Joar Skalse, Alessandro Abate · 0.00
- Provably Efficient Imitation Learning From Observation Alone (2019) · Wen Sun, Anirudh Vemula, Byron Boots, et al. · 0.00
- You May Not Need Ratio Clipping In PPO (2022) · Mingfei Sun, Vitaly Kurin, Guoqing Liu, et al. · 0.00
- Policy Optimization Through Approximate Importance Sampling (2019) · Marcin B. Tomczak, Dongho Kim, Peter Vrancx, et al. · 0.00
- Off-policy Evaluation And Learning From Logged Bandit Feedback: Error Reduction Via Surrogate Policy (2018) · Yuan Xie, Boyi Liu, Qiang Liu, et al. · 0.00
- Deceptive Kernel Function On Observations Of Discrete POMDP (2020) · Zhili Zhang, Quanyan Zhu · 0.00
- Learning Adversarial Low-rank Markov Decision Processes With Unknown Transition And Full-information Feedback (2023) · Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, et al. · 0.00
- Optimistic Policy Optimization Is Provably Efficient In Non-stationary MDPs (2021) · Han Zhong, Zhongren Chen, Zhuoran Yang, et al. · 0.00
- GEC: A Unified Framework For Interactive Decision Making In MDP, POMDP, And Beyond (2022) · Han Zhong, Wei Xiong, Sirui Zheng, et al. · 0.00
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023) · Han Zhong, Tong Zhang · 0.00
- Finite-sample Analysis For SARSA With Linear Function Approximation (2019) · Shaofeng Zou, Tengyu Xu, Yingbin Liang · 0.00
- Static Pricing: Universal Guarantees for Reusable Resources (2019) · Omar Besbes et al.
- How to Hire Secretaries with Stochastic Departures (2019) · Thomas Kesselheim et al.
- On Thompson Sampling for Smoother-than-Lipschitz Bandits (2020) · James A. Grant and David S. Leslie
- Adaptive Estimator Selection for Off-Policy Evaluation (2020) · Yi Su et al.
- Probability Learning based Tabu Search for the Budgeted Maximum Coverage Problem (2020) · Liwen Li et al.
- Simple and optimal methods for stochastic variational inequalities, I: operator extrapolation (2020) · Georgios Kotsalis et al.
- A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees (2020) · Haoran Zhu et al.
- Online Stochastic Optimization with Wasserstein Based Non-stationarity (2020) · Jiashuo Jiang et al.