A Provably Efficient Sample Collection Strategy For Reinforcement Learning
2020 · Jean Tarbouriech, Matteo Pirotta, Michal Valko, et al.
Abstract
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that (adaptively) prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., a simulator of the environment); 2) An "objective-agnostic" sample collection exploration strategy responsible for generating the prescribed samples as fast as possible. Building on recent methods for exploration in the stochastic shortest path problem, we first provide an algorithm that, given as input the number of samples \(b(s,a)\) needed in each state-action pair
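The decoupled interface described above can be illustrated with a minimal Python sketch. It is a toy built on stated assumptions and is not the paper's algorithm. An "objective-specific" routine returns the demand b(s,a); here it uses uniform demand, as a model-estimation objective might. An "objective-agnostic" collector then gathers those samples. The names `make_random_mdp`, `uniform_demand` and `collect_samples` are hypothetical. The toy MDP has full-support random transitions. The collector is a naive greedy/random-walk baseline, not the stochastic-shortest-path-based strategy the abstract refers to. It uses `P` only to simulate the environment's response, and its action choices never read `P`.

```python
import random
from collections import defaultdict


def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: P[s][a] is a list of next-state probabilities."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        row = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            row.append([x / total for x in w])
        P.append(row)
    return P


def uniform_demand(n_states, n_actions, n_per_pair):
    """Objective-specific part (illustrative assumption): ask for the same
    number of samples b(s, a) in every state-action pair."""
    return {(s, a): n_per_pair for s in range(n_states) for a in range(n_actions)}


def collect_samples(P, b, s0=0, max_steps=100_000, seed=0):
    """Objective-agnostic part (naive baseline, NOT the paper's method):
    in the current state, play an action whose demand is still unmet if one
    exists, otherwise play a uniformly random action; stop once every
    b(s, a) is met or max_steps is reached."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    counts = defaultdict(int)
    samples = []
    remaining = sum(b.values())  # total unmet demand across all pairs
    s, steps = s0, 0
    while remaining > 0 and steps < max_steps:
        unmet = [a for a in range(n_actions) if counts[(s, a)] < b.get((s, a), 0)]
        a = rng.choice(unmet) if unmet else rng.randrange(n_actions)
        # The environment (simulated from P) returns the next state.
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        samples.append((s, a, s_next))
        if counts[(s, a)] < b.get((s, a), 0):
            remaining -= 1  # this sample counts toward the prescribed b(s, a)
        counts[(s, a)] += 1
        s = s_next
        steps += 1
    return samples, counts, steps


# Usage: 5 states, 2 actions, 3 prescribed samples per pair.
P = make_random_mdp(5, 2, seed=1)
b = uniform_demand(5, 2, 3)
samples, counts, steps = collect_samples(P, b, seed=1)
print(steps, "steps to collect", sum(b.values()), "prescribed samples")
```

The ratio between `steps` and the total demand `sum(b.values())` is the quantity a sample collection strategy tries to keep small. The greedy baseline has no guarantee on it, which is the gap the abstract's provably efficient strategy is meant to close.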
Related papers
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- On Sample-efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, And Beyond (2024)
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021)
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022)
- When Simple Exploration Is Sample Efficient: Identifying Sufficient Conditions For Random Exploration To Yield PAC RL Algorithms (2018)
- Off-policy RL Algorithms Can Be Sample-efficient For Continuous Control Via Sample Multiple Reuse (2023)