Exploiting Expertise Of Non-expert And Diverse Agents In Social Bandit Learning: A Free Energy Approach
2026 · Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, et al.
Abstract
Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent integrates its direct experiences in the environment and others' estimated policies. The theoretical converge
Related papers
- Bandit Social Learning: Exploration Under Myopic Behavior (2023)
- Bandit Social Learning With Exploration Episodes (2026)
- Emergent Social Learning Via Multi-agent Reinforcement Learning (2020)
- Near-optimal Collaborative Learning In Bandits (2022)
- Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents (2024)
- An Efficient Open World Environment For Multi-agent Social Learning (2025)
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Social Learning Spontaneously Emerges By Searching Optimal Heuristics With Deep Reinforcement Learning (2022)