Stackelberg Coupling Of Online Representation Learning And Reinforcement Learning
2025 · Fernando Martinez, Tao Li, Yingdong Lu, et al.
Abstract
Deep Q-learning jointly learns representations and values within monolithic networks, promising beneficial co-adaptation between features and value estimates. Although this architecture has attained substantial success, the coupling between representation and value learning creates instability as representations must constantly adapt to non-stationary value targets, while value estimates depend on these shifting representations. This is compounded by high variance in bootstrapped targets, which causes bias in value estimation in off-policy methods. We introduce Stackelberg Coupled Representation and Reinforcement Learning (SCORER), a framework for value-based RL that views representation and Q-learning as two strategic agents in a hierarchical game. SCORER models the Q-function as the leader, which commits to its strategy by updating less frequently, while the perception network (encoder) acts as the follower, adapting more frequently to learn representations that minimize Bellman error...
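The leader/follower update schedule described in the abstract can be illustrated with a toy two-timescale loop. This is a minimal sketch under stated assumptions, not the SCORER algorithm: the "encoder" and "Q-head" are single scalars, a fixed regression target (y = 3x) stands in for the Bellman target, and the names `two_timescale_fit` and `leader_every` are hypothetical. It only shows the follower updating on every step while the leader updates once every few steps.

```python
import random


def two_timescale_fit(steps=2000, leader_every=4, lr=0.05, seed=0):
    """Toy two-timescale loop: follower updates every step, leader every k-th step.

    Assumption: scalar 'encoder' weight w (follower) and scalar 'Q-head'
    weight v (leader); prediction is v * w * x, target is y = 3 * x.
    """
    rng = random.Random(seed)
    w, v = 0.5, 0.5                      # follower (encoder), leader (Q-head)
    follower_updates = 0
    leader_updates = 0
    for t in range(steps):
        x = rng.uniform(-1.0, 1.0)       # toy observation
        y = 3.0 * x                      # stand-in for a value target

        # Follower step: gradient of 0.5 * err**2 with respect to w,
        # taken with the leader's weight v held fixed.
        err = v * w * x - y
        w -= lr * err * v * x
        follower_updates += 1

        # Leader step: only every `leader_every` iterations, so the leader's
        # weight stays committed between its updates.
        if (t + 1) % leader_every == 0:
            err = v * w * x - y
            v -= lr * err * w * x
            leader_updates += 1
    return w * v, follower_updates, leader_updates
```

With the defaults, the follower performs four updates for each leader update; the returned product `w * v` is the learned slope and can be compared against the target slope of 3.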
Related papers
- Orchestrated Value Mapping For Reinforcement Learning (2022)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Value-consistent Representation Learning For Data-efficient Reinforcement Learning (2022)
- Oracles & Followers: Stackelberg Equilibria In Deep Multi-agent Reinforcement Learning (2022)
- Actions Speak What You Want: Provably Sample-efficient Reinforcement Learning Of The Quantal Stackelberg Equilibrium From Strategic Feedbacks (2023)
- Towards Understanding Cooperative Multi-agent Q-learning With Value Factorization (2020)
- Greedy-based Value Representation For Optimal Coordination In Multi-agent Reinforcement Learning (2021)
- An Empirical Investigation Of Value-based Multi-objective Reinforcement Learning For Stochastic Environments (2024)