Randomized Exploration For Reinforcement Learning With Multinomial Logistic Function Approximation
2024 · Wooseong Cho, Taehyun Hwang, Joongkyu Lee, et al.
Abstract
We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized exploration having frequentist regret guarantees. For our first algorithm, \(\texttt{RRL-MNL}\), we adapt optimistic sampling to ensure the optimism of the estimated value function with sufficient frequency. We establish that \(\texttt{RRL-MNL}\) achieves a \(\tilde{O}(\kappa^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})\) frequentist regret bound with constant-time computational cost per episode. Here, \(d\) is the dimension of the transition core, \(H\) is the horizon length, \(T\) is the total number of steps, and \(\kappa\) is a problem-dependent constant. Despite the simp