Hybrid Quantum-classical Policy Gradient For Adaptive Control Of Cyber-physical Systems: A Comparative Study Of VQC Vs. MLP
2025 · Aueaphum Aueawatthanaphisut, Nyi Wunna Tun
Abstract
A comparative evaluation of classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as the classical baseline and a parameterized variational quantum circuit (VQC) as its quantum counterpart, both trained on the CartPole-v1 environment for 500 episodes. Empirically, the classical MLP converged to a near-optimal policy with a mean return of 498.7 ± 3.2, maintaining a stable equilibrium throughout training. In contrast, the VQC showed limited learning capability, with an average return of 14.6 ± 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent n
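To make the evaluation setup in the abstract concrete, the following is a minimal NumPy sketch of three ingredients it mentions: a softmax MLP policy, Gaussian perturbation of observations, and the discounted returns used by a policy-gradient update. It is not the authors' implementation. The hidden width (16), the noise level `sigma`, the discount factor `gamma`, and all function names are assumptions chosen for illustration; only the 4-dimensional observation and 2 discrete actions come from CartPole-v1 itself.

```python
import numpy as np

def add_observation_noise(obs, sigma, rng):
    """Return obs plus zero-mean Gaussian noise with standard deviation sigma."""
    return obs + rng.normal(0.0, sigma, size=obs.shape)

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def mlp_action_probs(params, obs):
    """Softmax policy from a one-hidden-layer MLP with tanh activation."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(obs @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# CartPole-v1: 4-dimensional observation, 2 discrete actions.
# Hidden width 16 is an assumed value, not taken from the paper.
params = (
    rng.normal(0.0, 0.1, (4, 16)), np.zeros(16),
    rng.normal(0.0, 0.1, (16, 2)), np.zeros(2),
)

clean_obs = np.array([0.01, -0.02, 0.03, 0.04])
noisy_obs = add_observation_noise(clean_obs, sigma=0.1, rng=rng)  # assumed sigma
probs = mlp_action_probs(params, noisy_obs)

# Three unit rewards with an assumed gamma of 0.5.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

In a robustness sweep of the kind the abstract describes, `sigma` would be varied while the trained policy is held fixed, and the mean episode return recorded at each noise level.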
Related papers
- From Classical Data To Quantum Advantage -- Quantum Policy Evaluation On Quantum Hardware (2025)
- Quantum Reinforcement Learning By Adaptive Non-local Observables (2025)
- Quantum Policy Gradient Algorithm With Optimized Action Decoding (2022)
- Robustness And Generalization In Quantum Reinforcement Learning Via Lipschitz Regularization (2024)
- Quantum Natural Policy Gradients: Towards Sample-efficient Reinforcement Learning (2023)
- On Quantum Natural Policy Gradients (2024)
- Hybrid Quantum-classical Algorithm For Near-optimal Planning In POMDPs (2025)
- Hybrid-quantum Neural Architecture Search For The Proximal Policy Optimization Algorithm (2025)