Computational Benefits Of Intermediate Rewards For Goal-reaching Policy Learning
2021 · Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, et al.
Abstract
Empirical results on many goal-reaching reinforcement learning (RL) tasks show that rewarding the agent for reaching subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework that quantifies the computational benefits of rewarding the completion of subgoals, measured by the number of synchronous value iterations. In particular, we model subgoals as one-way intermediate states, which can be visited only once per episode, and propose two settings built on these states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both settings, we demonstrate that adding intermediate rewards at subgoals is more computationally efficient than rewarding the agent only once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly
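The intuition behind counting synchronous value iterations can be illustrated with a minimal sketch. It is not the paper's OWSP or OWMP construction: it assumes a toy deterministic chain in which the agent can only stay or step forward, a reward of 1 for entering a rewarded state, and a discount factor of 0.9. The function name `sweeps_until_greedy_moves_forward` and all parameters are hypothetical. The sketch counts how many synchronous sweeps are needed before the greedy policy strictly prefers moving forward in every non-terminal state.

```python
import numpy as np


def sweeps_until_greedy_moves_forward(n_states, rewarded_states, gamma=0.9, max_sweeps=1000):
    """Count synchronous value-iteration sweeps on a toy one-way chain.

    Assumed toy model (not taken from the paper): states 0..n_states-1, the
    last state is terminal, and each non-terminal state has two actions:
    "stay" (reward 0) or "forward" (reward 1 if the next state is rewarded).
    """
    goal = n_states - 1
    reward_on_entry = np.zeros(n_states)
    reward_on_entry[list(rewarded_states)] = 1.0

    values = np.zeros(n_states)  # V_0 = 0; the terminal value stays 0
    for sweep in range(max_sweeps + 1):
        # Action values of every non-terminal state under the current estimate.
        q_stay = gamma * values[:goal]
        q_forward = reward_on_entry[1:] + gamma * values[1:]

        # Stop once the greedy policy strictly prefers "forward" everywhere.
        if np.all(q_forward > q_stay):
            return sweep

        # Synchronous update: every state is refreshed from the old values.
        new_values = values.copy()
        new_values[:goal] = np.maximum(q_stay, q_forward)
        values = new_values

    raise RuntimeError("greedy policy did not stabilize within max_sweeps")


# Reward only at the terminal state versus rewards at evenly spaced subgoals.
sparse_sweeps = sweeps_until_greedy_moves_forward(13, rewarded_states=[12])
subgoal_sweeps = sweeps_until_greedy_moves_forward(13, rewarded_states=[3, 6, 9, 12])
print(sparse_sweeps, subgoal_sweeps)
```

In this toy chain the sweep count tracks the longest distance to the next rewarded state, so placing rewards on intermediate states shortens the horizon that value information has to travel.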
Related papers
- Subgoal-based Reward Shaping To Improve Efficiency In Reinforcement Learning (2021)
- Global Reinforcement Learning: Beyond Linear And Convex Rewards Via Submodular Semi-gradient Methods (2024)
- On Learning Intrinsic Rewards For Policy Gradient Methods (2018)
- Redeeming Intrinsic Rewards Via Constrained Optimization (2022)
- Learning Self-imitating Diverse Policies (2018)
- Reward Constrained Policy Optimization (2018)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Learning User-defined Sub-goals Using Memory Editing In Reinforcement Learning (2022)