Provably Efficient Information-directed Sampling Algorithms For Multi-agent Reinforcement Learning

Abstract

This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure: the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of $\tilde{O}(\sqrt{K})$ for $K$ episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAI
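The information ratio that MAIDS optimizes generalizes the single-agent IDS quantity $\Psi(\pi) = \Delta(\pi)^2 / g(\pi)$, the squared expected regret of a policy divided by the information it yields about the optimal action. The sketch below illustrates only that single-agent form, assuming a Bernoulli bandit with Beta posteriors and Monte Carlo estimates of $\Delta$ and $g$. It is not MAIDS, which works with joint and marginal ratios in a two-player game, and the function names `kl_bernoulli` and `ids_action_distribution` are hypothetical.

```python
import math
import random


def kl_bernoulli(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence D(Bern(p) || Bern(q)), clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def ids_action_distribution(alpha, beta, n_samples=2000, grid=101, seed=0):
    """Single-agent IDS sketch for a Bernoulli bandit (hypothetical helper, not MAIDS).

    Returns (policy, information ratio, expected regrets Delta, information gains g).
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Posterior samples of the arm means theta, one row per sample.
    samples = [[rng.betavariate(alpha[a], beta[a]) for a in range(k)]
               for _ in range(n_samples)]
    # Optimal arm A* under each posterior sample.
    opt = [max(range(k), key=lambda a: s[a]) for s in samples]
    p_star = [opt.count(a) / n_samples for a in range(k)]  # P(A* = a)
    mean = [sum(s[a] for s in samples) / n_samples for a in range(k)]
    rho_star = sum(s[o] for s, o in zip(samples, opt)) / n_samples  # E[max_a theta_a]
    delta = [rho_star - mean[a] for a in range(k)]  # expected one-step regret

    # g(a) = I(A*; Y_a) = sum_{a*} P(A*=a*) KL(Bern(E[theta_a | A*=a*]) || Bern(E[theta_a])).
    gain = [0.0] * k
    for a_star in range(k):
        idx = [m for m in range(n_samples) if opt[m] == a_star]
        if not idx:
            continue
        for a in range(k):
            cond = sum(samples[m][a] for m in idx) / len(idx)
            gain[a] += p_star[a_star] * kl_bernoulli(cond, mean[a])

    # Minimize Delta(pi)^2 / g(pi) over mixtures of two arms, with a grid on the weight.
    best, best_ratio = None, math.inf
    for i in range(k):
        for j in range(i, k):
            for t in range(grid):
                q = t / (grid - 1)
                d = q * delta[i] + (1 - q) * delta[j]
                g = q * gain[i] + (1 - q) * gain[j]
                if g <= 1e-12:
                    continue  # no information: ratio undefined for this mixture
                ratio = d * d / g
                if ratio < best_ratio:
                    pi = [0.0] * k
                    pi[i] += q
                    pi[j] += 1 - q
                    best, best_ratio = pi, ratio
    if best is None:
        # Posterior is (numerically) certain about A*: act greedily.
        best = [0.0] * k
        best[min(range(k), key=lambda a: delta[a])] = 1.0
    return best, best_ratio, delta, gain


policy, ratio, delta, gain = ids_action_distribution([1, 2, 3], [3, 2, 1], n_samples=500)
print(policy, ratio)
```

The search is restricted to mixtures of at most two arms because single-agent IDS is known to admit a minimizing distribution with that support; a finer grid or a closed-form line search over the mixing weight would refine this sketch.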
