Reinforcement Learning Beyond Expectation
2021 · Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, et al.
Abstract
The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quanti
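To make the gain/loss asymmetry concrete, here is a minimal sketch of how a CPT value can be computed for a discrete random outcome. It is not taken from the paper: the function names `weight` and `cpt_value` are hypothetical, the utility and probability-weighting forms and their default parameters (0.88, 2.25, 0.61, 0.69) are the commonly cited Tversky-Kahneman choices, and outcomes are treated as signed gains and losses rather than the costs the abstract refers to.

```python
def weight(p, gamma):
    """Inverse-S-shaped probability weighting (assumed Tversky-Kahneman form)."""
    p = min(max(p, 0.0), 1.0)  # guard against floating-point drift outside [0, 1]
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)


def cpt_value(outcomes, probs, alpha=0.88, lam=2.25,
              gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of a discrete random variable (illustrative sketch).

    Gains use utility x**alpha; losses use lam * (-x)**alpha, so that
    lam > 1 makes losses weigh more heavily than equal-sized gains.
    """
    pairs = sorted(zip(outcomes, probs))  # ascending by outcome
    value = 0.0

    # Losses: walk from the worst outcome upward, weighting cumulative
    # probabilities P(X <= x).
    cum = 0.0
    for x, p in pairs:
        if x >= 0:
            break
        pi = weight(cum + p, gamma_loss) - weight(cum, gamma_loss)
        value -= lam * (-x) ** alpha * pi
        cum += p

    # Gains: walk from the best outcome downward, weighting
    # decumulative probabilities P(X >= x).
    cum = 0.0
    for x, p in reversed(pairs):
        if x < 0:
            break
        pi = weight(cum + p, gamma_gain) - weight(cum, gamma_gain)
        value += x ** alpha * pi
        cum += p

    return value


# A fair coin flip between losing 10 and gaining 10 has expected value 0,
# but its CPT value under these assumed parameters is negative.
fair_gamble = cpt_value([-10.0, 10.0], [0.5, 0.5])
```

Under these assumptions, a gamble with zero expected value is scored as unattractive, which is the kind of human-like asymmetry an expected-utility objective cannot express.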
Related papers
- Privacy-preserving Reinforcement Learning Beyond Expectation (2022)
- Policy Gradients For Cumulative Prospect Theory In Reinforcement Learning (2024)
- Reinforcement Learning From Diverse Human Preferences (2023)
- Human-level Reinforcement Learning Through Theory-based Modeling, Exploration, And Planning (2021)
- Tiered Reinforcement Learning: Pessimism In The Face Of Uncertainty And Constant Regret (2022)
- Bounded Risk-sensitive Markov Games: Forward Policy Design And Inverse Reward Learning With Iterative Reasoning And Cumulative Prospect Theory (2020)
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Reinforcement Learning With Human Feedback: Learning Dynamic Choices Via Pessimism (2023)