Quantum Algorithms For Reinforcement Learning With A Generative Model
2021 Β· Daochen Wang, Aarthi Sundaram, Robin Kothari, et al.
Abstract
Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a \(\gamma\)-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy (\(\pi^*\)), the optimal value function (\(v^*\)), and the optimal \(Q\)-function (\(q^*\)), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy (\(\epsilon\)) and two main parameters of the MDP: the effective time horizon (\(\frac\{1\}\{1-\gamma\}\)
Authors
(none)
Tags
Stats
Related papers
- A Bit Of Freedom Goes A Long Way: Classical And Quantum Algorithms For Reinforcement Learning Under A Generative Model (2025)0.00
- Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning (2022)0.00
- Quantum Natural Policy Gradients: Towards Sample-efficient Reinforcement Learning (2023)7.16
- Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach (2025)0.00
- On The Convergence Of Projective-simulation-based Reinforcement Learning In Markov Decision Processes (2019)8.35
- Exponential Improvements For Quantum-accessible Reinforcement Learning (2017)0.00
- Multi-agent Quantum Reinforcement Learning Using Evolutionary Optimization (2023)5.24
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In Pomdps (2025)0.00