Tempo Adaptation In Non-stationary Reinforcement Learning

Abstract

We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time (\(t\)) rather than episode progress (\(k\)), where wall-clock time denotes the actual elapsed time within the fixed duration \(t \in [0, T]\). In existing works, at episode \(k\), the agent rolls out a trajectory and trains a policy before transitioning to episode \(k+1\). In a time-desynchronized environment, however, the agent at time \(t_k\) allocates \(\Delta t\) for trajectory generation and training, and subsequently moves to the next episode at \(t_{k+1} = t_k + \Delta t\). Even with a fixed total number of episodes (\(K\)), the agent accumulates different trajectories depending on the choice of interaction times (\(t_1, t_2, \ldots, t_K\)), which significantly affects the suboptimality gap of the policy. We propose
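
The time-desynchronized setup can be illustrated with a small sketch. It assumes a hypothetical environment parameter that drifts with wall-clock time as \(\sin(\omega t)\), and it lets the allocation vary per episode as \(\Delta t_k\) (the abstract writes a single \(\Delta t\)) to show how the choice of \(t_1, \ldots, t_K\) matters. The names `env_param`, `interaction_times` and `observed_params` and the drift model are illustrative assumptions, not the method proposed in the paper.

```python
import math
from typing import List


def env_param(t: float, omega: float = 1.0) -> float:
    """Hypothetical environment parameter that drifts with wall-clock time t.

    The sinusoidal drift is an illustrative assumption, not the paper's model.
    """
    return math.sin(omega * t)


def interaction_times(t1: float, deltas: List[float], T: float) -> List[float]:
    """Build t_1, ..., t_K via t_{k+1} = t_k + deltas[k-1], all inside [0, T].

    A per-episode allocation Delta t_k is assumed here; the abstract writes a
    single Delta t.
    """
    times = [t1]
    for dt in deltas:
        if dt <= 0:
            raise ValueError("each time allocation Delta t must be positive")
        times.append(times[-1] + dt)
    if times[0] < 0 or times[-1] > T:
        raise ValueError("interaction times must lie within [0, T]")
    return times


def observed_params(times: List[float]) -> List[float]:
    """The environment state the agent meets at each interaction time t_k."""
    return [env_param(t) for t in times]


if __name__ == "__main__":
    K, T = 3, 10.0
    # Same number of episodes K, two different tempos.
    fast = interaction_times(0.0, [1.0] * (K - 1), T)  # [0.0, 1.0, 2.0]
    slow = interaction_times(0.0, [4.0] * (K - 1), T)  # [0.0, 4.0, 8.0]
    print("fast tempo:", observed_params(fast))
    print("slow tempo:", observed_params(slow))
```

Both schedules use the same \(K = 3\) episodes, yet the agent meets different environment states at each episode. Under these assumptions, this is the effect the abstract attributes to the choice of interaction times.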
