No Representation, No Trust: Connecting Representation, Collapse, And Trust Issues In PPO
2024 · Skander Moalla, Andrea Miele, Daniil Pyatko, et al.
Abstract
Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks trained under non-stationarity exhibit an inability to continue learning, termed loss of plasticity, and eventually a collapse in performance. For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss. Although this correlation has generally been attributed to neural network learning under non-stationarity, the connection to representation dynamics has not been carefully studied in on-policy policy optimization methods. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments,
Related papers
- What's Behind PPO's Collapse In Long-CoT? Value Optimization Holds The Secret (2025)
- Truly Proximal Policy Optimization (2019)
- Trust Region Bounds For Decentralized PPO Under Non-stationarity (2022)
- Revisiting Design Choices In Proximal Policy Optimization (2020)
- Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy (2019)
- Rethinking Model-based, Policy-based, And Value-based Reinforcement Learning Via The Lens Of Representation Complexity (2023)
- Preventing Learning Stagnation In PPO By Scaling To 1 Million Parallel Environments (2026)
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023)