Efficient Reinforcement Learning Via Initial Pure Exploration
2017 · Sudeep Raja Putta, Theja Tulabandhula
Abstract
In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent's goal must be to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi-Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE
Related papers
- Fast Active Learning For Pure Exploration In Reinforcement Learning (2020)
- Active Exploration In Markov Decision Processes (2019)
- Conservative Exploration In Reinforcement Learning (2020)
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)
- Sample Efficient Reinforcement Learning Via Model-ensemble Exploration And Exploitation (2021)
- Provable Cooperative Multi-agent Exploration For Reward-free MDPs (2026)
- Improved Bounds For Reward-agnostic And Reward-free Exploration (2026)
- Never Give Up: Learning Directed Exploration Strategies (2020)