Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
2023 · Junkai Zhang, Weitong Zhang, Quanquan Gu
Abstract
We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the environment but cannot access the reward; and (2) in the planning phase, the agent is given a reward function and is expected to find a near-optimal policy based on samples collected in the exploration phase. The sample complexities of existing reward-free algorithms have a polynomial dependence on the planning horizon, which makes them intractable for long planning horizon RL problems. In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings. At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward and a high-order moment estimator for the aleatoric and epistemic uncertainties. When the t
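The regression step named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it shows only a generic variance-weighted ridge regression of the kind used in value-targeted regression, assuming a linear mixture MDP with P(s'|s,a) = <phi(s'|s,a), theta*>. The function names (`value_feature`, `weighted_vtr_update`, `uncertainty_bonus`), the regularizer `lam`, and the use of user-supplied variances as weights are all assumptions made for illustration; the paper's pseudo-reward and high-order moment estimator are not reproduced here.

```python
import numpy as np


def value_feature(phi_next, value):
    """Aggregate feature phi_V(s, a) = sum_{s'} phi(s'|s, a) * V(s').

    phi_next: (num_next_states, d) array of features for one fixed (s, a).
    value:    (num_next_states,) array holding V(s') for each next state.
    """
    return np.asarray(phi_next, dtype=float).T @ np.asarray(value, dtype=float)


def weighted_vtr_update(features, targets, variances, lam=1.0):
    """Variance-weighted ridge regression (hypothetical helper).

    features:  (n, d) array, row i is phi_V(s_i, a_i).
    targets:   (n,) array, entry i is the observed V(s'_i).
    variances: (n,) positive array; sample i is weighted by 1 / variances[i].
    Returns the estimate theta_hat and the weighted covariance matrix.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    d = features.shape[1]
    # Sigma = lam * I + sum_i phi_i phi_i^T / sigma_i^2
    cov = lam * np.eye(d) + (features * weights[:, None]).T @ features
    # b = sum_i phi_i * y_i / sigma_i^2
    b = features.T @ (weights * targets)
    theta_hat = np.linalg.solve(cov, b)
    return theta_hat, cov


def uncertainty_bonus(cov, x):
    """Elliptical norm ||x||_{Sigma^{-1}}, a standard uncertainty measure."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(x @ np.linalg.solve(cov, x)))
```

In this sketch, directions of feature space that have been observed often (or with low variance) receive a small `uncertainty_bonus`, so a quantity of this kind could serve as an exploration signal when no reward is available. How the paper actually builds its pseudo-reward from such quantities is not stated in the text above.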
Related papers
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)
- Reinforcement Learning In Reward-mixing MDPs (2021)
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)
- Reinforcement Learning For Infinite-horizon Average-reward Linear MDPs Via Approximation By Discounted-reward MDPs (2024)
- Provable Cooperative Multi-agent Exploration For Reward-free MDPs (2026)
- Reward-free RL Is No Harder Than Reward-aware RL In Linear Markov Decision Processes (2022)
- Near-optimal Policy Optimization Algorithms For Learning Adversarial Linear Mixture MDPs (2021)
- On Reward-free RL With Kernel And Neural Function Approximations: Single-agent MDP And Markov Game (2021)