Stepscorer: Accelerating Reinforcement Learning With Step-wise Scoring And Psychological Regret Modeling
2026 · Zhe Xu
Abstract
Reinforcement learning algorithms often suffer from slow convergence due to sparse reward signals, particularly in complex environments where feedback is delayed or infrequent. This paper introduces the Psychological Regret Model (PRM), a novel approach that accelerates learning by incorporating regret-based feedback signals after each decision step. Rather than waiting for terminal rewards, PRM computes a regret signal based on the difference between the expected value of the optimal action and the value of the action taken in each state. This transforms sparse rewards into dense feedback signals through a step-wise scoring framework, enabling faster convergence. We demonstrate that PRM achieves stable performance approximately 36% faster than traditional Proximal Policy Optimization (PPO) in benchmark environments such as Lunar Lander. Our results indicate that PRM is particularly effective in continuous control tasks and environments with delayed feedback, making it suitable for rea
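The step-wise regret described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a table of estimated action values per state is available, and the names `step_regret` and `episode_regrets` and the toy numbers are hypothetical. Regret at each step is taken as the best estimated action value minus the value of the action actually chosen.

```python
from typing import List, Sequence


def step_regret(q_values: Sequence[float], action: int) -> float:
    """Regret for one step: best estimated action value minus the chosen action's value."""
    return max(q_values) - q_values[action]


def episode_regrets(q_per_step: Sequence[Sequence[float]],
                    actions: Sequence[int]) -> List[float]:
    """Compute one regret value per decision step, giving a dense signal over the episode."""
    return [step_regret(q, a) for q, a in zip(q_per_step, actions)]


# Toy episode (made-up values): three states with two actions each.
q_per_step = [[1.0, 0.5], [0.25, 0.75], [2.0, 2.0]]
actions = [0, 0, 1]  # the action taken in each state

regrets = episode_regrets(q_per_step, actions)
print(regrets)
```

A regret of zero means the chosen action matched the best estimate for that state. How these per-step values are fed back into the policy update (for example, as a penalty added to the reward) is not specified in the abstract and is left open here.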
Related papers
- Reinforcement Learning Algorithms For Regret Minimization In Structured Markov Decision Processes (2016) · 0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023) · 3.58
- Efficient Deep Reinforcement Learning With Predictive Processing Proximal Policy Optimization (2022) · 0.00
- Logarithmic Regret Of Exploration In Average Reward Markov Decision Processes (2025) · 0.00
- Regret-guided Search Control For Efficient Learning In AlphaZero (2026) · 0.00
- Warm-up Free Policy Optimization: Improved Regret In Linear Markov Decision Processes (2024) · 0.00
- Delay-adapted Policy Optimization And Improved Regret For Adversarial MDP With Delayed Bandit Feedback (2023) · 0.00
- Fast Rates For The Regret Of Offline Reinforcement Learning (2021) · 2.26