On Reward-free RL With Kernel And Neural Function Approximations: Single-agent MDP And Markov Game
2021 · Shuang Qiu, Jieping Ye, Zhaoran Wang, et al.
Abstract
Achieving sample efficiency in reinforcement learning (RL) requires efficiently exploring the underlying environment. In the offline setting, addressing the exploration challenge amounts to collecting an offline dataset with sufficient coverage. Motivated by this challenge, we study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. Then, given any extrinsic reward, the agent computes the policy via a planning algorithm using the offline data collected in the exploration phase. We tackle this problem in the context of function approximation, leveraging powerful function approximators. Specifically, we propose to explore via an optimistic variant of the value-iteration algorithm incorporating kernel and neural function approximations, where we adopt the associated exploration bonus as the exploration reward. Moreover, we design exploration and planning algorithms for both single-agent MDPs
Related papers
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- Optimal Horizon-free Reward-free Exploration For Linear Mixture MDPs (2023)
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)
- Distributionally Robust Online Markov Game With Linear Function Approximation (2025)
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)
- Improved Sample Complexity For Reward-free Reinforcement Learning Under Low-rank MDPs (2023)
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)