Abstract

Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. Yet, it poses a critical challenge since the scheduler needs to make real-time decisions to guarantee the delay and resource constraints simultaneously without prior information of system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability issues, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (\(\mathtt{RSD4}\)), which is a data-driven method based on a Partially Observed Markov Decision Process (POMDP) formulation. \(\mathtt{RSD4}\) guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a mem

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 8
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 7.16
  • arxiv key: hu2022effective

Related papers