Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning
2022 · Simon Wiedemann, Daniel Hein, Steffen Udluft, et al.
Abstract
We present a full implementation and simulation of a novel quantum reinforcement learning method. Our work is a detailed and formal proof of concept for how quantum algorithms can be used to solve reinforcement learning problems and shows that, given access to error-free, efficient quantum realizations of the agent and environment, quantum methods can yield provable improvements over classical Monte Carlo-based methods in terms of sample complexity. Our approach shows in detail how to combine amplitude estimation and Grover search into a policy evaluation and improvement scheme. We first develop quantum policy evaluation (QPE), which is quadratically more efficient than an analogous classical Monte Carlo estimation and is based on a quantum mechanical realization of a finite Markov decision process (MDP). Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimum is reached. Finally, we present an impleme…
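The policy-improvement loop described in the abstract can be illustrated with a small classical simulation. This is a sketch under stated assumptions, not the authors' implementation: it uses a hypothetical toy MDP (4 states, 2 actions, discount 0.9, uniform start distribution), replaces quantum policy evaluation with exact classical evaluation of every deterministic policy, and simulates Grover search as a real-valued statevector over policy indices. Because the number of better policies is unknown, it draws random Grover iteration counts from a growing range, a schedule borrowed from Boyer, Brassard, Høyer and Tapp rather than taken from the paper. The function names `policy_values`, `grover_state` and `grover_policy_iteration` are hypothetical, and the quadratic speedup of amplitude-estimation-based evaluation is not modeled here.

```python
import numpy as np
from itertools import product


def policy_values(P, R, gamma, d0):
    """Exact value J(pi) = d0 . V_pi of every deterministic policy.
    Classical stand-in (assumption) for quantum policy evaluation."""
    n_states, n_actions, _ = P.shape
    policies = [np.array(p) for p in product(range(n_actions), repeat=n_states)]
    values = np.empty(len(policies))
    for i, pi in enumerate(policies):
        P_pi = P[np.arange(n_states), pi]  # (S, S) transition matrix under pi
        R_pi = R[np.arange(n_states), pi]  # (S,) expected rewards under pi
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        values[i] = d0 @ v
    return policies, values


def grover_state(marked, iterations):
    """Statevector after `iterations` Grover steps, starting from the
    uniform superposition over len(marked) basis states."""
    psi = np.full(len(marked), 1 / np.sqrt(len(marked)))
    for _ in range(iterations):
        psi = np.where(marked, -psi, psi)  # oracle: phase-flip marked states
        psi = 2 * psi.mean() - psi         # diffusion: inversion about the mean
    return psi


def grover_policy_iteration(values, rng, max_attempts=30):
    """Repeatedly Grover-search for a policy better than the current one;
    stop when max_attempts searches in a row find no improvement."""
    n = len(values)
    current = 0  # arbitrary initial policy
    while True:
        marked = values > values[current]  # hypothetical oracle predicate
        m, improved = 1.0, False
        for _ in range(max_attempts):
            r = int(rng.integers(0, int(np.ceil(m))))  # random iteration count (BBHT-style)
            probs = grover_state(marked, r) ** 2
            candidate = int(rng.choice(n, p=probs / probs.sum()))  # "measure"
            if marked[candidate]:
                current, improved = candidate, True
                break
            m = min(1.2 * m, np.sqrt(n))  # grow the range, capped at sqrt(N)
        if not improved:
            return current


rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9  # hypothetical toy MDP: 4 states, 2 actions -> 16 policies
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalise rows into probability distributions
R = rng.random((S, A))
policies, values = policy_values(P, R, gamma, np.full(S, 1 / S))
best = grover_policy_iteration(values, rng)
print("found policy", policies[best], "value", round(values[best], 4))
print("brute-force optimum", round(values.max(), 4))
```

Every accepted candidate strictly increases the value, so the loop terminates. Stopping after a fixed number of unsuccessful searches means optimality is only declared with high probability, which reflects the probabilistic nature of Grover search; in the paper's setting the oracle would compare values obtained by amplitude estimation rather than exact classical values.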
Related papers
- From Classical Data To Quantum Advantage -- Quantum Policy Evaluation On Quantum Hardware (2025)
- Quantum Algorithms For Reinforcement Learning With A Generative Model (2021)
- Quantum Natural Policy Gradients: Towards Sample-efficient Reinforcement Learning (2023)
- Quantum Framework For Reinforcement Learning: Integrating Markov Decision Process, Quantum Arithmetic, And Trajectory Search (2024)
- A Bit Of Freedom Goes A Long Way: Classical And Quantum Algorithms For Reinforcement Learning Under A Generative Model (2025)
- Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach (2025)
- Quantum Policy Gradient Algorithm With Optimized Action Decoding (2022)
- On The Convergence Of Projective-simulation-based Reinforcement Learning In Markov Decision Processes (2019)