Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach
2025 · Yang Xu, Vaneet Aggarwal
Abstract
We address the problem of quantum reinforcement learning (QRL) under model-free settings with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. While this modification introduces a bounded bias in the estimator, the bias decays exponentially with increasing truncation levels. This paper demonstrates that the proposed QNPG algorithm achieves a sample complexity of \(\tilde{\mathcal{O}}(\epsilon^{-1.5})\) for queries to the quantum oracle, significantly improving the classical lower bound of \(\tilde{\mathcal{O}}(\epsilon^{-2})\) for queries to the MDP.
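The two quantitative claims in the abstract can be illustrated with a small numerical sketch. It is not the paper's estimator. It assumes the truncation acts on a discounted reward sum with discount factor `gamma` and rewards bounded by `r_max`, in which case the mass dropped after `horizon` steps is at most `r_max * gamma**horizon / (1 - gamma)`. The function names and parameters are hypothetical, and the query-count comparison ignores constants and logarithmic factors.

```python
import math


def truncation_bias_bound(gamma: float, horizon: int, r_max: float = 1.0) -> float:
    """Upper bound on the discounted reward mass dropped after `horizon` steps.

    Assumption (not from the paper): rewards lie in [0, r_max] and the
    estimator truncates a geometric discounted sum.
    """
    return r_max * gamma ** horizon / (1.0 - gamma)


def horizon_for_bias(gamma: float, target: float, r_max: float = 1.0) -> int:
    """Smallest truncation level whose bias bound is at most `target`."""
    steps = math.log(target * (1.0 - gamma) / r_max) / math.log(gamma)
    return max(0, math.ceil(steps))


def query_scaling(eps: float, exponent: float) -> float:
    """Leading-order query count eps**(-exponent), without constants or logs."""
    return eps ** (-exponent)


if __name__ == "__main__":
    gamma = 0.9
    # The bound shrinks geometrically as the truncation level grows.
    for horizon in (10, 20, 40, 80):
        print(horizon, truncation_bias_bound(gamma, horizon))

    # Ratio of the classical eps^-2 rate to the quantum eps^-1.5 rate.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, query_scaling(eps, 2.0) / query_scaling(eps, 1.5))
```

Under these assumptions the required truncation level grows only logarithmically in the inverse target bias, and the ratio between the two rates is \(\epsilon^{-0.5}\), so the gap widens as the accuracy target tightens.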
Related papers
- Quantum Natural Policy Gradients: Towards Sample-efficient Reinforcement Learning (2023) 7.16
- Quantum Algorithms For Reinforcement Learning With A Generative Model (2021) 0.00
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In POMDPs (2025) 0.00
- Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning (2022) 0.00
- From Classical Data To Quantum Advantage -- Quantum Policy Evaluation On Quantum Hardware (2025) 0.00
- A Bit Of Freedom Goes A Long Way: Classical And Quantum Algorithms For Reinforcement Learning Under A Generative Model (2025) 0.00
- Quantum Speedups In Regret Analysis Of Infinite Horizon Average-reward Markov Decision Processes (2023) 0.00
- On Quantum Natural Policy Gradients (2024) 5.24