Quantum Speedups In Regret Analysis Of Infinite Horizon Average-reward Markov Decision Processes
2023 Β· Bhargav Ganguly, Yang Xu, Vaneet Aggarwal
Abstract
This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent's engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of \(\tilde\{\mathcal\{O\}\}(1)\), a significant improvement over the \(\tilde\{\mathcal\{O\}\}(\sqrt\{T\})\) bound exhibited by classical counterparts.
Authors
(none)
Tags
Stats
Related papers
- A Bit Of Freedom Goes A Long Way: Classical And Quantum Algorithms For Reinforcement Learning Under A Generative Model (2025)0.00
- Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach (2025)0.00
- Quantum Algorithms For Reinforcement Learning With A Generative Model (2021)0.00
- Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning (2022)0.00
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In Pomdps (2025)0.00
- On The Convergence Of Projective-simulation-based Reinforcement Learning In Markov Decision Processes (2019)8.35
- Logarithmic Regret Bounds For Continuous-time Average-reward Markov Decision Processes (2022)5.24
- A Model-free Learning Algorithm For Infinite-horizon Average-reward Mdps With Near-optimal Regret (2020)0.00