Efficient Use Of Heuristics For Accelerating XCS-based Policy Learning In Markov Games
2020 · Hao Chen, Chang Wang, Jian Huang, et al.
Abstract
In Markov games, playing against non-stationary opponents that can learn remains challenging for reinforcement learning (RL) agents, because the opponents evolve their policies concurrently. This increases the complexity of the learning task and slows the RL agents' learning. This paper proposes an efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm that efficiently learns explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model with an eXtended classifier system (XCS) in zero-sum Markov games. A neural network models the opponent from its behavior, and the corresponding policy is inferred for action selection and rule evolution. When there are multiple heuristic policies, we introduce the concept of Pareto optimality for action selection. Besides, taking advantage of the co
Related papers
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Efficient Competitive Self-play Policy Optimization (2020)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)
- Provably Efficient Fictitious Play Policy Optimization For Zero-sum Markov Games With Structured Transitions (2022)