Provably Efficient Reinforcement Learning With Multinomial Logit Function Approximation
2024 Β· Long-Fei Li, Yu-Jie Zhang, Peng Zhao, et al.
Abstract
We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both statistical and computational efficiency. The best-known result of Hwang and Oh [2023] has achieved an \(\widetilde\{\mathcal\{O\}\}(\kappa^\{-1\}dH^2\sqrt\{K\})\) regret upper bound, where \(\kappa\) is a problem-dependent quantity, \(d\) is the feature dimension, \(H\) is the episode length, and \(K\) is the number of episodes. However, we observe that \(\kappa^\{-1\}\) exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data and the time complexity scales linearly with the episode count, which is computationally expensive. In th
Authors
(none)
Tags
Stats
Related papers
- Randomized Exploration For Reinforcement Learning With Multinomial Logistic Function Approximation (2024)0.00
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)2.26
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)0.00
- Prior-dependent Analysis Of Posterior Sampling Reinforcement Learning With Function Approximation (2024)0.00
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)0.00
- Refined Regret For Adversarial Mdps With Linear Function Approximation (2023)0.00
- Cooperative Multi-agent Reinforcement Learning: Asynchronous Communication And Linear Function Approximation (2023)0.00