Contrastive Explanations For Comparing Preferences Of Reinforcement Learning Agents
2021 · Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, et al.
Abstract
In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting the impact of individual objectives on the reward function. Understanding the differences in strategies between policies is necessary to enable users to choose between the offered policies, and can help developers understand the different behaviors that emerge from various reward functions and training hyperparameters in RL systems. In this work, we compare the behavior of two policies trained on the same task but with different preferences over objectives. We propose a method for separating differences in behavior that stem from different abilities from those that are a consequence of opposing preferences of the two RL agents. Furthermore, we use only data on preference-based differences to generate contrasting explanations about the agents' preferences.
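As a rough illustration of how adjusting the impact of individual objectives can yield different reward signals for the same behavior, the sketch below combines per-objective rewards with a weighted sum. Linear scalarization is an assumption made here for illustration; the abstract does not state how the objectives are combined. The function name `scalarized_reward`, the three objectives, and both weight profiles are hypothetical.

```python
from typing import Sequence


def scalarized_reward(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-objective rewards into one scalar via a weighted sum.

    Assumption: linear scalarization. The source does not specify
    how objectives enter the reward function.
    """
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    return sum(w * o for w, o in zip(weights, objectives))


# Hypothetical per-step objective values: (progress, safety, comfort).
step_objectives = (1.0, -0.5, 0.2)

# Two hypothetical preference profiles over the same three objectives.
speed_focused = (1.0, 0.2, 0.1)
safety_focused = (0.3, 1.0, 0.1)

# The same transition is scored differently under each profile, which is
# one way policies with different strategies could emerge from one task.
r_speed = scalarized_reward(step_objectives, speed_focused)
r_safety = scalarized_reward(step_objectives, safety_focused)

print(f"speed-focused reward:  {r_speed:.2f}")
print(f"safety-focused reward: {r_safety:.2f}")
```

Under these made-up numbers the same step is rewarded by one profile and penalized by the other, so agents trained on each would plausibly prefer different actions in that state even if both are capable of solving the task.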
Related papers
- (When) Are Contrastive Explanations Of Reinforcement Learning Helpful? (2022)
- Experiential Explanations For Reinforcement Learning (2022)
- Why The Agent Made That Decision: Contrastive Explanation Learning For Reinforcement Learning (2024)
- Contrastive Explanations For Reinforcement Learning In Terms Of Expected Consequences (2018)
- Reward Model Learning Vs. Direct Policy Optimization: A Comparative Analysis Of Learning From Human Preferences (2024)
- Reinforcement Learning From Diverse Human Preferences (2023)
- Understanding The Performance Gap In Preference Learning: A Dichotomy Of RLHF And DPO (2025)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes (2023)