A Bit Of Freedom Goes A Long Way: Classical And Quantum Algorithms For Reinforcement Learning Under A Generative Model
2025 Β· Andris Ambainis, Joao F. Doriguello, Debbie Lim
Abstract
We propose novel classical and quantum online algorithms for learning finite-horizon and infinite-horizon average-reward Markov Decision Processes (MDPs). Our algorithms are based on a hybrid exploration-generative reinforcement learning (RL) model wherein the agent can, from time to time, freely interact with the environment in a generative sampling fashion, i.e., by having access to a "simulator". By employing known classical and new quantum algorithms for approximating optimal policies under a generative model within our learning algorithms, we show that it is possible to avoid several paradigms from RL like "optimism in the face of uncertainty" and "posterior sampling" and instead compute and use optimal policies directly, which yields better regret bounds compared to previous works. For finite-horizon MDPs, our quantum algorithms obtain regret bounds which only depend logarithmically on the number of time steps \(T\), thus breaking the \(O(\sqrt\{T\})\) classical barrier. This mat
Authors
(none)
Tags
Stats
Related papers
- Quantum Algorithms For Reinforcement Learning With A Generative Model (2021)0.00
- Quantum Speedups In Regret Analysis Of Infinite Horizon Average-reward Markov Decision Processes (2023)0.00
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In Pomdps (2025)0.00
- Quantum Policy Iteration Via Amplitude Estimation And Grover Search -- Towards Quantum Advantage For Reinforcement Learning (2022)0.00
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023)0.00
- Accelerating Quantum Reinforcement Learning With A Quantum Natural Policy Gradient Based Approach (2025)0.00
- Model-free Reinforcement Learning In Infinite-horizon Average-reward Markov Decision Processes (2019)0.00
- Quantum Framework For Reinforcement Learning: Integrating Markov Decision Process, Quantum Arithmetic, And Trajectory Search (2024)0.00