Balancing Act: Prioritization Strategies For LLM-designed Restless Bandit Rewards
2024 · Shresth Verma, Niclas Boehmer, Lingkai Kong, et al.
Abstract
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected socia
Related papers
- IRL For Restless Multi-armed Bandits With Applications In Maternal And Child Health (2024) 0.00
- Comparing Exploration-exploitation Strategies Of LLMs And Humans: Insights From Standard Multi-armed Bandit Experiments (2026) 0.00
- Reinforcement Learning Agent Design And Optimization With Bandwidth Allocation Model (2022) 0.00
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024) 0.00
- Restless Bandit Problem With Rewards Generated By A Linear Gaussian Dynamical System (2024) 0.00
- Response-level Rewards Are All You Need For Online Reinforcement Learning In LLMs: A Mathematical Perspective (2025) 0.00
- Adaptive Reward Design For Reinforcement Learning (2024) 0.00
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021) 6.34