Can Meta-interpretive Learning Outperform Deep Reinforcement Learning Of Evaluable Game Strategies?
2019 · Céline Hocquette, Stephen H. Muggleton
Abstract
World-class human players have been outperformed in a number of complex two-person games (Go, Chess, Checkers) by Deep Reinforcement Learning systems. However, owing to tractability considerations, the minimax regret of a learning system cannot be evaluated in such games. In this paper, we consider simple games (Noughts-and-Crosses and Hexapawn) in which minimax regret can be efficiently evaluated. We use these games to compare Cumulative Minimax Regret for variants of both standard and deep reinforcement learning against two variants of a new Meta-Interpretive Learning system called MIGO. In our experiments, all tested variants of both standard and deep reinforcement learning perform worse (higher cumulative minimax regret) than both variants of MIGO on Noughts-and-Crosses and Hexapawn. Additionally, MIGO's learned rules are relatively easy to comprehend, and are demonstrated to achieve significant transfer learning in both directions between Noughts-and-Crosses and Hexapawn.
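To illustrate why minimax regret is cheap to evaluate in a game as small as Noughts-and-Crosses, the following is a minimal sketch, not the authors' implementation. It assumes the learner plays X, that outcomes are scored +1 (win), 0 (draw), and -1 (loss), and that cumulative minimax regret is the running sum of the gap between the minimax value of the starting position and the outcome actually achieved. The function names `minimax_value` and `cumulative_regret` and the 9-character board encoding are hypothetical choices made for this example.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax_value(board, player):
    """Value of `board` for X (+1 win, 0 draw, -1 loss) with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0  # full board, no winner: draw
    nxt = 'O' if player == 'X' else 'X'
    values = [minimax_value(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == ' ']
    # X maximises the value, O minimises it.
    return max(values) if player == 'X' else min(values)


def cumulative_regret(start_board, first_player, outcomes):
    """Running sum of (minimax value - achieved outcome), assumed definition."""
    optimal = minimax_value(start_board, first_player)
    total, curve = 0, []
    for outcome in outcomes:
        total += optimal - outcome
        curve.append(total)
    return curve


if __name__ == "__main__":
    empty = ' ' * 9
    print(minimax_value(empty, 'X'))
    print(cumulative_regret(empty, 'X', [-1, 0, 0, -1]))
```

Memoising on the (board, player) pair keeps the full game-tree search to a few thousand distinct positions, which is what makes exact regret evaluation feasible here and infeasible in Go or Chess.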