Average Reward Reinforcement Learning For Wireless Radio Resource Management
2025 · Kun Yang, Jing Yang, Cong Shen
Abstract
In this paper, we address a crucial but often overlooked issue in applying reinforcement learning (RL) to radio resource management (RRM) in wireless communications: the mismatch between the discounted reward RL formulation and the undiscounted goal of wireless network optimization. To the best of our knowledge, we are the first to systematically investigate this discrepancy, starting with a discussion of the problem formulation, followed by simulations that quantify the extent of the gap. To bridge this gap, we introduce the use of average reward RL, a method that aligns more closely with the long-term objectives of RRM. We propose a new method, Average Reward Off-policy Soft Actor-Critic (ARO-SAC), an adaptation of the well-known Soft Actor-Critic algorithm to the average reward framework. This new method achieves a significant performance improvement: our simulation results demonstrate a 15% gain in system performance over the traditional discounted reward RL approach, u
Related papers
- Offline Reinforcement Learning For Wireless Network Optimization With Mixture Datasets (2023)
- Offline And Distributional Reinforcement Learning For Radio Resource Management (2024)
- Resource Management In Wireless Networks Via Multi-agent Deep Reinforcement Learning (2020)
- Dynamics Of Resource Allocation In O-RANs: An In-depth Exploration Of On-policy And Off-policy Deep Reinforcement Learning For Real-time Applications (2024)
- Deep Reinforcement Learning For Distributed Uncoordinated Cognitive Radios Resource Allocation (2019)
- Generalization In Reinforcement Learning For Radio Access Networks (2025)
- Deep Reinforcement Learning For Distributed And Uncoordinated Cognitive Radios Resource Allocation (2022)
- A Policy-driven DRL Framework For System-level Tradeoff Control In NR-U/Wi-Fi Coexistence (2026)