Vision-based Generic Potential Function For Policy Alignment In Multi-agent Reinforcement Learning
2025 · Hao Ma, Shijie Wang, Zhiqiang Pu, et al.
Abstract
Guiding the policy of multi-agent reinforcement learning to align with human common sense is a difficult problem, largely due to the complexity of modeling common sense as a reward, especially in complex and long-horizon multi-agent tasks. Recent works have shown the effectiveness of reward shaping, such as potential-based rewards, in enhancing policy alignment. Existing works, however, primarily rely on experts to design rule-based rewards, which are often labor-intensive to build and lack a high-level semantic understanding of common sense. To solve this problem, we propose a hierarchical vision-based reward shaping method. At the bottom layer, a visual-language model (VLM) serves as a generic potential function, guiding the policy to align with human common sense through its intrinsic semantic understanding. To help the policy adapt to uncertainty and changes in long-horizon tasks, the top layer features an adaptive skill selection module based on a visual large language model (vLLM). The
Related papers
- Subgoal-based Reward Shaping To Improve Efficiency In Reinforcement Learning (2021)
- Scalable Agent Alignment Via Reward Modeling: A Research Direction (2018)
- Training Value-aligned Reinforcement Learning Agents Using A Normative Prior (2021)
- Co-evolution Of Policy And Internal Reward For Language Agents (2026)
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019)
- Guiding Multi-agent Multi-task Reinforcement Learning By A Hierarchical Framework With Logical Reward Shaping (2024)
- Enhancing Vision-language Model Training With Reinforcement Learning In Synthetic Worlds For Real-world Success (2025)
- Influencing Reinforcement Learning Through Natural Language Guidance (2021)