Towards Reinforcement Learning From Neural Feedback: Mapping fNIRS Signals To Agent Performance
2025 · Julia Santaniello, Matthew Russell, Benson Jiang, et al.
Abstract
Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by integrating user feedback into the agent's training process. This paper introduces a framework that guides agent training through implicit neural signals, with a focus on the neural classification problem. Our work presents and releases a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants across three domains: Pick-and-Place Robot, Lunar Lander, and Flappy Bird. We train multiple classifiers to predict varying levels of agent performance (optimal, suboptimal, or worst-case) from windows of preprocessed fNIRS features, achieving an average F1 score of 67% for binary and 46% for multi-class classification across conditions and domains. We also train multiple regressors to predict the degree of deviation between an agent's chosen action and a set of near-optimal policy actions, providing a continuous measure of
Related papers
- A Survey Of Reinforcement Learning From Human Feedback (2023)
- Mapping Out The Space Of Human Feedback For Reinforcement Learning: A Conceptual Framework (2024)
- Aligning Humans And Robots Via Reinforcement Learning From Implicit Human Feedback (2025)
- Improving Multimodal Interactive Agents With Reinforcement Learning From Human Feedback (2022)
- Accelerating Reinforcement Learning Agent With EEG-based Implicit Human Feedback (2020)
- Can You See How I Learn? Human Observers' Inferences About Reinforcement Learning Agents' Learning Processes (2025)
- The Alignment Ceiling: Objective Mismatch In Reinforcement Learning From Human Feedback (2023)
- Ego-foresight: Self-supervised Learning Of Agent-aware Representations For Improved RL (2024)