Incorporating Behavioral Constraints In Online AI Systems
2018 · Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, et al.
Abstract
AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose to adopt a novel extension to the classical contextual multi-armed bandit setting and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this
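The two-stage idea in the abstract (first learn a constraint policy from a teacher's demonstrations, then act online while staying reactive to rewards) can be illustrated with a minimal sketch. This is not the authors' BCTS algorithm: it assumes a small set of discrete contexts and swaps in Beta-Bernoulli Thompson sampling for simplicity. The class `BetaThompson`, the function `run`, the mixing parameter `weight`, the teacher mapping, and the reward probabilities are all hypothetical names and values chosen for illustration.

```python
import random


class BetaThompson:
    """Beta-Bernoulli Thompson sampling over (context, arm) pairs."""

    def __init__(self, n_contexts, n_arms, rng):
        self.rng = rng
        # Uniform Beta(1, 1) prior for every (context, arm) pair.
        self.a = [[1.0] * n_arms for _ in range(n_contexts)]
        self.b = [[1.0] * n_arms for _ in range(n_contexts)]

    def sample(self, ctx):
        # Draw one posterior sample per arm for the given context.
        return [self.rng.betavariate(a, b)
                for a, b in zip(self.a[ctx], self.b[ctx])]

    def update(self, ctx, arm, outcome):
        # outcome is 1 (success) or 0 (failure).
        self.a[ctx][arm] += outcome
        self.b[ctx][arm] += 1 - outcome


# Hypothetical teacher: the arm it demonstrates in each context.
TEACHER = {0: 1, 1: 0}
# Hypothetical environment: reward probability per (context, arm).
# It deliberately favors the arm the teacher avoids.
TRUE_REWARD = [[0.9, 0.1], [0.1, 0.9]]


def run(weight, rounds, seed=0, demos=200):
    """Return the list of (context, arm) choices made online.

    weight = 1.0 follows only the learned constraint model;
    weight = 0.0 follows only the online reward model (assumed blend).
    """
    rng = random.Random(seed)
    n_contexts, n_arms = 2, 2
    constraint = BetaThompson(n_contexts, n_arms, rng)
    reward_model = BetaThompson(n_contexts, n_arms, rng)

    # Phase 1: learn the constraint model from teacher demonstrations.
    # Assumption: the demonstrated arm counts as a success, others as failures.
    for _ in range(demos):
        for ctx, chosen in TEACHER.items():
            for arm in range(n_arms):
                constraint.update(ctx, arm, 1 if arm == chosen else 0)

    # Phase 2: act online, blending constraint and reward samples.
    picks = []
    for _ in range(rounds):
        ctx = rng.randrange(n_contexts)
        c = constraint.sample(ctx)
        r = reward_model.sample(ctx)
        scores = [weight * ci + (1 - weight) * ri for ci, ri in zip(c, r)]
        arm = max(range(n_arms), key=scores.__getitem__)
        reward = 1 if rng.random() < TRUE_REWARD[ctx][arm] else 0
        reward_model.update(ctx, arm, reward)
        picks.append((ctx, arm))
    return picks
```

Calling `run(weight=1.0, rounds=100)` makes the agent follow the demonstrated behavior almost exclusively, while `run(weight=0.0, rounds=600)` reduces to plain Thompson sampling on the observed rewards. Intermediate values trade one off against the other.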
Related papers
- Learning To Influence Human Behavior With Offline Reinforcement Learning (2023) 0.00
- Improving TD3-BC: Relaxed Policy Constraint For Offline Learning And Stable Online Fine-tuning (2022) 0.00
- Bandit Social Learning: Exploration Under Myopic Behavior (2023) 0.00
- Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents (2024) 0.00
- Learning To Coordinate Under Threshold Rewards: A Cooperative Multi-agent Bandit Framework (2025) 0.00
- Unified Models Of Human Behavioral Agents In Bandits, Contextual Bandits And RL (2020) 8.35
- Behaviour-conditioned Policies For Cooperative Reinforcement Learning Tasks (2021) 2.26
- A New Bandit Setting Balancing Information From State Evolution And Corrupted Context (2020) 0.00