Convergence Guarantees For Deep Epsilon Greedy Policy Learning
2021 · Michael Rawson, Radu Balan
Abstract
Policy learning is a quickly growing area. As robotics and computers control day-to-day life, their error rates need to be minimized and controlled. Many policy learning and bandit methods come with provable error rates. We prove a regret (error) bound and convergence for the Deep Epsilon Greedy method, which chooses actions using a neural network's prediction. We also show that the regret upper bound of the Epsilon Greedy method is minimized with cube root exploration. In experiments with the real-world dataset MNIST, we construct a nonlinear reinforcement learning problem. We observe that, under both high and low noise, some methods converge and some do not, which agrees with our proof of convergence.
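As a rough illustration of epsilon greedy action selection with cube root exploration, the sketch below runs a small non-contextual bandit. It is not the paper's algorithm: the neural network predictor is swapped for a per-arm running mean, and the schedule `eps_t = min(1, t^(-1/3))` is an assumed reading of "cube root exploration", since the abstract does not state the exact form. The names `epsilon_schedule`, `epsilon_greedy_bandit`, `true_means`, and `noise_std` are hypothetical.

```python
import random


def epsilon_schedule(t: int) -> float:
    """Assumed cube root decay: eps_t = min(1, t^(-1/3)) for t >= 1."""
    return min(1.0, t ** (-1.0 / 3.0))


def epsilon_greedy_bandit(true_means, horizon, noise_std=0.1, seed=0):
    """Run epsilon greedy on a Gaussian bandit; return estimates, pull counts, regret.

    A running mean per arm stands in for the neural network predictor
    (an assumption made to keep the sketch self-contained).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if rng.random() < epsilon_schedule(t):
            arm = rng.randrange(k)  # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: greedy arm
        reward = true_means[arm] + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - true_means[arm]  # expected (pseudo) regret of this pull
    return estimates, counts, regret


if __name__ == "__main__":
    est, cnt, reg = epsilon_greedy_bandit([0.2, 0.8], horizon=2000)
    print("estimates:", est)
    print("pull counts:", cnt)
    print("cumulative regret:", reg)
```

Under this assumed schedule the exploration probability is 1 at the first step and about 0.5 at step 8, so early rounds are mostly exploratory and later rounds mostly follow the current estimates.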
Related papers
- Guarantees For Epsilon-greedy Reinforcement Learning With Function Approximation (2022)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With \(\epsilon\)-greedy Exploration (2023)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Never Worse, Mostly Better: Stable Policy Improvement In Deep Reinforcement Learning (2019)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)