Can You See How I Learn? Human Observers' Inferences About Reinforcement Learning Agents' Learning Processes
2025 · Bernhard Hilpert, Muhan Hou, Kim Baraka, et al.
Abstract
Reinforcement Learning (RL) agents often exhibit learning behaviors that are not intuitively interpretable by human observers, which can result in suboptimal feedback in collaborative teaching settings. Yet how humans perceive and interpret RL agents' learning behavior is largely unknown. In a bottom-up approach with two experiments, this work provides a data-driven understanding of the factors underlying human observers' understanding of the agent's learning process. A novel, observation-based paradigm to directly assess human inferences about agent learning was developed. In an exploratory interview study (N = 9), we identify four core themes in human interpretations: Agent Goals, Knowledge, Decision Making, and Learning Mechanisms. A second confirmatory study (N = 34) applied an expanded version of the paradigm across two tasks (navigation/manipulation) and two RL algorithms (tabular/function approximation). Analyses of 816 responses confirmed the reliability of the parad
Related papers
- Interpretable Learning Dynamics In Unsupervised Reinforcement Learning (2025)
- Accounting For Human Learning When Inferring Human Preferences (2020)
- When Your AIs Deceive You: Challenges Of Partial Observability In Reinforcement Learning From Human Feedback (2024)
- Mapping Out The Space Of Human Feedback For Reinforcement Learning: A Conceptual Framework (2024)
- Experiential Explanations For Reinforcement Learning (2022)
- Improving Multimodal Interactive Agents With Reinforcement Learning From Human Feedback (2022)
- Machine Versus Human Attention In Deep Reinforcement Learning Tasks (2020)
- Perspectives On The Social Impacts Of Reinforcement Learning With Human Feedback (2023)