Provable Performance Bounds For Digital Twin-driven Deep Reinforcement Learning In Wireless Networks: A Novel Digital-twin Bisimulation Metric
2025 · Zhenyu Tao, Wei Xu, Xiaohu You
Abstract
Digital twin (DT)-driven deep reinforcement learning (DRL) has emerged as a promising paradigm for wireless network optimization, offering a safe and efficient training environment for policy exploration. However, existing methods cannot, in theory, always guarantee the real-world performance of DT-trained policies before actual deployment, because there is no universal metric for assessing a DT's ability to support reliable DRL training that is transferable to physical networks. In this paper, we propose the DT bisimulation metric (DT-BSM), a novel metric based on the Wasserstein distance, to quantify the discrepancy between the Markov decision processes (MDPs) in the DT and in the corresponding real-world wireless network environment. We prove that for any DT-trained policy, the sub-optimality of its performance (regret) in the real-world deployment is bounded by a weighted sum of the DT-BSM and its sub-optimality within the MDP in the DT. Then, a modified DT-BSM based on the total variation distan
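The abstract names two distances between distributions: the Wasserstein distance and the total variation distance. The sketch below is not the paper's DT-BSM definition, which is not given in this listing; it only illustrates how the two distances behave on toy next-state distributions. The names `wasserstein_1d`, `total_variation`, `p_real`, `p_dt_near`, and `p_dt_far`, and all numeric values, are hypothetical and chosen for illustration.

```python
from itertools import accumulate


def wasserstein_1d(p, q, support):
    """Wasserstein-1 distance between two discrete distributions
    defined on the same sorted one-dimensional support."""
    if not (len(p) == len(q) == len(support)):
        raise ValueError("p, q and support must have the same length")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # In 1-D, W1 equals the area between the two CDFs. On a discrete
    # support that area is a sum over the gaps between adjacent points.
    return sum(
        abs(cdf_p[i] - cdf_q[i]) * (support[i + 1] - support[i])
        for i in range(len(support) - 1)
    )


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical next-state distributions over three ordered states.
support = [0.0, 1.0, 2.0]
p_real = [0.5, 0.3, 0.2]     # assumed "real network" transition
p_dt_near = [0.3, 0.5, 0.2]  # toy DT that moves 0.2 mass to the adjacent state
p_dt_far = [0.3, 0.3, 0.4]   # toy DT that moves 0.2 mass two states away

w_near = wasserstein_1d(p_real, p_dt_near, support)
w_far = wasserstein_1d(p_real, p_dt_far, support)
tv_near = total_variation(p_real, p_dt_near)
tv_far = total_variation(p_real, p_dt_far)

print(f"W1 near={w_near:.2f}, far={w_far:.2f}")
print(f"TV near={tv_near:.2f}, far={tv_far:.2f}")
```

In this toy setup the total variation distance is the same for both DT variants, while the Wasserstein distance is larger when the misplaced probability mass lands farther from where it belongs. This is one reason the choice of underlying distance can change how a DT's discrepancy is scored.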
Related papers
- Towards Robust Bisimulation Metric Learning (2021) 0.00
- Stealing That Free Lunch: Exposing The Limits Of Dyna-style Reinforcement Learning (2024) 0.00
- Trade-off On Sim2real Learning: Real-world Learning Faster Than Simulations (2020) 3.58
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020) 10.21
- Dual-mind World Models: A General Framework For Learning In Dynamic Wireless Networks (2025) 0.00
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023) 0.00
- Koopman-based Generalization Of Deep Reinforcement Learning With Application To Wireless Communications (2025) 0.00
- Leveraging Digital Cousins For Ensemble Q-learning In Large-scale Wireless Networks (2024) 6.77