Abstract

We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time (\(t\)) rather than episode progress (\(k\)), where wall-clock time signifies the actual elapsed time within the fixed duration \(t \in [0, T]\). In existing works, at episode \(k\), the agent rolls out a trajectory and trains a policy before transitioning to episode \(k+1\). In a time-desynchronized environment, however, the agent at time \(t_k\) allocates \(\Delta t\) for trajectory generation and training, and subsequently moves to the next episode at \(t_{k+1}=t_k+\Delta t\). Even with a fixed total number of episodes (\(K\)), the agent accumulates different trajectories depending on the choice of interaction times (\(t_1,t_2,\ldots,t_K\)), which significantly impacts the suboptimality gap of the policy. We propose
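The effect of the interaction times can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not the method proposed in this paper: it assumes a constant per-episode budget \(\Delta t\) and a made-up environment parameter `env_param(t, horizon)` that drifts sinusoidally over wall-clock time. The helper `interaction_times` is likewise an illustrative name. Both schedules run the same \(K = 5\) episodes inside \([0, T]\), yet they observe the environment at different times.

```python
import math
from typing import List


def interaction_times(t1: float, delta_t: float, num_episodes: int, horizon: float) -> List[float]:
    """Return wall-clock times t_1, ..., t_K with t_{k+1} = t_k + delta_t.

    Assumption (hypothetical): every episode uses the same budget delta_t
    for trajectory generation and training.
    """
    times = [t1 + k * delta_t for k in range(num_episodes)]
    if times[-1] > horizon:
        # The whole schedule must fit inside the fixed duration [0, T].
        raise ValueError("interaction times exceed the horizon T")
    return times


def env_param(t: float, horizon: float) -> float:
    """Hypothetical non-stationary parameter that drifts with wall-clock time t."""
    return math.sin(2 * math.pi * t / horizon)


T, K = 10.0, 5
fast = interaction_times(0.0, 0.5, K, T)  # small Delta t: [0.0, 0.5, 1.0, 1.5, 2.0]
slow = interaction_times(0.0, 2.5, K, T)  # large Delta t: [0.0, 2.5, 5.0, 7.5, 10.0]

# Same number of episodes K, but each schedule samples the drifting
# environment at different wall-clock times, so the collected data differ.
fast_params = [env_param(t, T) for t in fast]
slow_params = [env_param(t, T) for t in slow]
print("fast:", [round(p, 3) for p in fast_params])
print("slow:", [round(p, 3) for p in slow_params])
```

With the small \(\Delta t\), the agent only sees the early portion of the drift; with the large \(\Delta t\), it spans the whole duration and the environment changes substantially between consecutive episodes.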

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: lee2023tempo
