Continuous-time Q-learning For Mean-field Control Problems
2023 · Xiaoli Wei, Xiang Yu
Abstract
This paper studies q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction of agents makes the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by \(q\)), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt via a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by \(q_e\)), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed searching method of test policies, some
Related papers
- Unified Continuous-time Q-learning For Mean-field Game And Mean-field Control Problems (2024)
- Unified Reinforcement Q-learning For Mean Field Game And Control Problems (2020)
- Q-learning In Continuous Time (2022)
- Global Convergence Of Policy Gradient For Linear-quadratic Mean-field Control/game In Continuous Time (2020)
- Actor-critic Learning For Mean-field Control In Continuous Time (2023)
- Model-free Mean-field Reinforcement Learning: Mean-field MDP And Mean-field Q-learning (2019)
- Analysis Of Multiscale Reinforcement Q-learning Algorithms For Mean Field Control Games (2024)
- MFC-EQ: Mean-field Control With Envelope Q-learning For Moving Decentralized Agents In Formation (2024)