Toward Negotiable Reinforcement Learning: Shifting Priorities In Pareto Optimal Sequential Decision-making
2017 · Andrew Critch
Abstract
Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine's policy will prioritize each player's interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player's own beliefs in evaluating how well an action will serve that player's utility function, and (2) shift the relative priority it assigns to each player's expected utilities over time, by a factor proportional to how well that player's beliefs predict the machine's inputs. Observation (2) represents a substantial divergence from naïve linear
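The two observations in the abstract can be illustrated with a minimal sketch. This is not the paper's recursion, only an illustration under stated assumptions: each player's priority weight is multiplied by the probability that player's beliefs assigned to the machine's latest input, and the machine then picks the action with the highest weighted sum of expected utilities, each computed under that player's own beliefs. The function names `update_priorities` and `choose_action`, the renormalization step, and all numbers are hypothetical.

```python
def update_priorities(weights, likelihoods):
    """Shift priorities toward players whose beliefs predicted the input well.

    weights[i]     -- current priority of player i (assumed non-negative)
    likelihoods[i] -- probability player i's beliefs assigned to the
                      observation the machine actually received

    Renormalizing to sum to 1 is an assumption made for readability;
    it does not change which action maximizes the weighted sum.
    """
    if len(weights) != len(likelihoods):
        raise ValueError("one likelihood is required per player")
    unnormalized = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("no player assigned positive probability to the input")
    return [u / total for u in unnormalized]


def choose_action(weights, expected_utilities):
    """Return the action with the highest priority-weighted expected utility.

    expected_utilities[action][i] is player i's expected utility for
    `action`, evaluated under player i's OWN beliefs (observation 1).
    """
    def weighted_value(action):
        return sum(w * u for w, u in zip(weights, expected_utilities[action]))

    return max(expected_utilities, key=weighted_value)
```

For example, starting from equal weights `[0.5, 0.5]`, an input that player 1 predicted with probability 0.8 and player 2 with probability 0.2 moves the weights to roughly `[0.8, 0.2]`, so later action choices favor player 1's expected utility.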
Related papers
- Sample-efficient Multi-objective Learning Via Generalized Policy Improvement Prioritization (2023)
- A Generalized Algorithm For Multi-objective Reinforcement Learning And Policy Adaptation (2019)
- Addressing The Issue Of Stochastic Environments And Local Decision-making In Multi-objective Reinforcement Learning (2022)
- Navigating Trade-offs: Policy Summarization For Multi-objective Reinforcement Learning (2024)
- Issues With Value-based Multi-objective Reinforcement Learning: Value Function Interference And Overestimation Sensitivity (2024)
- Interpretability By Design For Efficient Multi-objective Reinforcement Learning (2025)
- Using Logical Specifications Of Objectives In Multi-objective Reinforcement Learning (2019)
- Utility-based Reinforcement Learning: Unifying Single-objective And Multi-objective Reinforcement Learning (2024)