Information-theoretic Minimax Regret Bounds For Reinforcement Learning Based On Duality
2024 · Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, et al.
Abstract
We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the context of MDPs, finding information-theoretic bounds for it, and applying these bounds in various scenario
Authors
(none)
Tags
Stats
Related papers
- Regret Analysis In Deterministic Reinforcement Learning (2021)0.00
- Solving Robust Mdps Through No-regret Dynamics (2023)0.00
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In Mdps (2018)0.00
- Local Differential Privacy For Regret Minimization In Reinforcement Learning (2020)0.00
- Sharp Variance-dependent Bounds In Reinforcement Learning: Best Of Both Worlds In Stochastic And Deterministic Environments (2023)0.00
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021)0.00
- Regret Bounds For Decentralized Learning In Cooperative Multi-agent Dynamical Systems (2020)0.00
- Refining Minimax Regret For Unsupervised Environment Design (2024)0.00