Reinforcement Learning With Reward Machines In Stochastic Games
2023 · Jueming Hu, Jean-Raphael Gaglione, Yanze Wang, et al.
Abstract
We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG), to learn the best-response strategy at Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium in augmented state space. The augmented state space integrates the state of the stochastic game and the state of reward machines. Each agent learns the Q-functions of all agents in the system. We prove that Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given current Q-func
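The augmented state space described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RewardMachine`, the helper `q_update`, the labels "a" and "b", and the two-step task are all hypothetical, chosen only to show why pairing the game state with the reward machine state makes a non-Markovian reward Markovian. The equilibrium value of the next stage game is passed in as a plain number, so the stage-game solver is deliberately left out.

```python
from collections import defaultdict


class RewardMachine:
    """Finite-state machine that maps high-level event labels to rewards."""

    def __init__(self, initial_state, transitions):
        # transitions: {(rm_state, label): (next_rm_state, reward)}
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, rm_state, label):
        # Unlisted (state, label) pairs are treated as zero-reward self-loops.
        return self.transitions.get((rm_state, label), (rm_state, 0.0))


# Hypothetical task: observe "a" first, then "b"; reward 1 only on completion.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u_done", 1.0),
    },
)


def q_update(q, aug_state, joint_action, reward, next_value,
             alpha=0.5, gamma=0.9):
    """One tabular update keyed by the augmented state (env_state, rm_state).

    next_value is an assumed input: the agent's payoff at the equilibrium of
    the next stage game. Computing it is outside the scope of this sketch.
    """
    key = (aug_state, joint_action)
    q[key] = (1 - alpha) * q[key] + alpha * (reward + gamma * next_value)
    return q[key]


# Replay a short label sequence and record how the reward machine moves.
rm_state = rm.initial_state
trace = []
for label in ["b", "a", "b"]:
    next_rm_state, reward = rm.step(rm_state, label)
    trace.append((rm_state, label, next_rm_state, reward))
    rm_state = next_rm_state

# Single update for the final transition, taken from augmented state (2, "u1").
q = defaultdict(float)
updated = q_update(q, (2, "u1"), ("left", "right"), reward=1.0, next_value=0.0)
```

In this sketch the label "b" earns no reward the first time it is seen and a reward of 1 the second time, so the reward depends on history. Once the reward machine state is part of the key, the same table lookup distinguishes the two cases.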
Related papers
- Decentralized Graph-based Multi-agent Reinforcement Learning Using Reward Machines (2021) 0.00
- Inferring Probabilistic Reward Machines From Non-markovian Reward Processes For Reinforcement Learning (2021) 0.00
- Learning Reward Machines: A Study In Partially Observable Reinforcement Learning (2021) 0.00
- Independent Learning In Stochastic Games (2021) 6.77
- Joint Learning Of Reward Machines And Policies In Environments With Partially Known Semantics (2022) 3.58
- Decentralized Multi-agent Reinforcement Learning For Continuous-space Stochastic Games (2023) 5.24
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026) 0.00
- Balancing Two-player Stochastic Games With Soft Q-learning (2018) 0.00