Learning Mirror Maps In Policy Mirror Descent
2024 · Carlo Alfano, Sebastian Towers, Silvia Sapora, et al.
Abstract
Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms. These algorithms are derived through the selection of a mirror map and enjoy finite-time convergence guarantees. Despite its popularity, the exploration of PMD's full potential is limited, with the majority of research focusing on a particular mirror map -- namely, the negative entropy -- which gives rise to the renowned Natural Policy Gradient (NPG) method. It remains uncertain from existing theoretical studies whether the choice of mirror map significantly influences PMD's efficacy. In our work, we conduct empirical investigations to show that the conventional mirror map choice (NPG) often yields less-than-optimal outcomes across several standard benchmark environments. Using evolutionary strategies, we identify more efficient mirror maps that enhance the performance of PMD. We first focus on a tabular environment, i.e. Grid-World, wher
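To make the role of the mirror map concrete, here is a minimal tabular sketch of a single PMD step under two standard choices. It is an illustration under stated assumptions, not the authors' implementation: the function names, the step size, and the toy Q-values are hypothetical, and the squared Euclidean norm is used only as a well-known contrast to the negative entropy; it is not one of the mirror maps learned in the paper. With the negative entropy, the step reduces to the multiplicative (NPG-style) update; with the squared Euclidean norm, it becomes a projection onto the probability simplex.

```python
import numpy as np


def pmd_step_neg_entropy(policy, q_values, step_size):
    """One PMD step with the negative-entropy mirror map.

    Closed form: new_policy(a|s) is proportional to
    policy(a|s) * exp(step_size * Q(s, a)).
    Assumes every entry of `policy` is strictly positive.
    """
    logits = np.log(policy) + step_size * q_values
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=-1, keepdims=True)


def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    n = v.shape[0]
    u = np.sort(v)[::-1]              # entries in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def pmd_step_euclidean(policy, q_values, step_size):
    """One PMD step with the squared Euclidean norm as mirror map.

    Closed form: project policy + step_size * Q onto the simplex, per state.
    """
    target = policy + step_size * q_values
    return np.apply_along_axis(project_to_simplex, -1, target)


# Hypothetical toy problem: one state, three actions, uniform initial policy.
policy = np.full((1, 3), 1.0 / 3.0)
q_values = np.array([[1.0, 0.5, -1.0]])

p_ent = pmd_step_neg_entropy(policy, q_values, step_size=0.5)
p_euc = pmd_step_euclidean(policy, q_values, step_size=0.5)
```

In this toy case the negative-entropy step keeps every action at positive probability, while the Euclidean step assigns the worst action probability exactly zero. This is one simple way in which the choice of mirror map changes the behaviour of the update.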
Related papers
- Policy Mirror Descent With Temporal Difference Learning: Sample Complexity Under Online Markov Data (2025)
- Optimal Convergence Rate For Exact Policy Mirror Descent In Discounted Markov Decision Processes (2023)
- A Novel Framework For Policy Mirror Descent With General Parameterization And Linear Convergence (2023)
- Mirror Learning: A Unifying Framework Of Policy Optimisation (2022)
- Mirror Descent Policy Optimisation For Robust Constrained Markov Decision Processes (2025)
- Independent Policy Mirror Descent For Markov Potential Games: Scaling To Large Number Of Players (2024)
- Policy Mirror Descent Inherently Explores Action Space (2023)
- Heterogeneous Multi-agent Reinforcement Learning Via Mirror Descent Policy Optimization (2023)