From Classical Data To Quantum Advantage -- Quantum Policy Evaluation On Quantum Hardware
2025 · Daniel Hein, Simon Wiedemann, Markus Baumann, et al.
Abstract
Quantum policy evaluation (QPE) is a reinforcement learning (RL) algorithm that is quadratically more efficient than an analogous classical Monte Carlo estimation. It makes use of a direct quantum mechanical realization of a finite Markov decision process, in which the agent and the environment are modeled by unitary operators and exchange states, actions, and rewards in superposition. Previously, the quantum environment has been implemented and parametrized manually for an illustrative benchmark using a quantum simulator. In this paper, we demonstrate how these environment parameters can be learned from a batch of classical observational data through quantum machine learning (QML) on quantum hardware. The learned quantum environment is then applied in QPE to compute policy evaluations on quantum hardware as well. Our experiments reveal that, despite challenges such as noise and short coherence times, the integration of QML and QPE shows promising potential for achieving quantum advantage
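For orientation, the classical baseline named in the abstract, Monte Carlo policy evaluation, can be sketched as follows. This is a minimal illustration only: the two-state MDP, its transition probabilities, the policy, the discount factor, and the horizon are all hypothetical and not taken from the paper. The error of such a sample average typically shrinks roughly as 1/sqrt(N) in the number of sampled trajectories N; a quadratic improvement would correspond to roughly 1/N scaling in the number of queries.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative values, not from the paper).
# P[s][a] is the probability of moving to state 1; the reward is 1 when the
# next state is 1 and 0 otherwise.
P = {0: {0: 0.2, 1: 0.7}, 1: {0: 0.5, 1: 0.9}}
POLICY = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # pi(a | s)
GAMMA, HORIZON = 0.9, 3


def exact_value(state, steps=HORIZON):
    """Exact finite-horizon discounted value, by recursion over all branches."""
    if steps == 0:
        return 0.0
    value = 0.0
    for action, p_action in POLICY[state].items():
        p1 = P[state][action]
        value += p_action * (
            p1 * (1.0 + GAMMA * exact_value(1, steps - 1))
            + (1.0 - p1) * GAMMA * exact_value(0, steps - 1)
        )
    return value


def sample_return(state, rng):
    """Discounted return of one trajectory sampled under POLICY."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        action = 1 if rng.random() < POLICY[state][1] else 0
        state = 1 if rng.random() < P[state][action] else 0
        total += discount * state  # reward equals the next state (0 or 1)
        discount *= GAMMA
    return total


def mc_value(state, n_samples, seed=0):
    """Monte Carlo estimate: average return over n_samples trajectories."""
    rng = random.Random(seed)
    return sum(sample_return(state, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    print("exact:", exact_value(0))
    for n in (100, 10_000):
        print(n, "samples:", mc_value(0, n))
```

Comparing `mc_value(0, n)` with `exact_value(0)` for growing `n` shows the slow statistical convergence that the quantum approach aims to improve on.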
Related papers
- Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning (2022)
- Hybrid Quantum-classical Policy Gradient For Adaptive Control Of Cyber-physical Systems: A Comparative Study Of VQC Vs. MLP (2025)
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In POMDPs (2025)
- Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach (2025)
- An Introduction To Quantum Reinforcement Learning (QRL) (2024)
- Exponential Improvements For Quantum-accessible Reinforcement Learning (2017)
- On Quantum Natural Policy Gradients (2024)
- Quantum Policy Gradient Algorithm With Optimized Action Decoding (2022)