Abstract

Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving and financial trading. In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets. However, classic RL methods typically overlook the challenges posed by such sparse-executing actions. They operate under the assumption that all actions can be taken for a unlimited number of times, both in the formulation of the problem and in the development of effective algorithms. To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time. Then, we propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference. ASRE operates through two steps: First, ASRE evaluate

Authors

(none)

Tags

  • Uncategorized

Stats

Related papers