Low-precision Reinforcement Learning: Running Soft Actor-critic In Half Precision
2021 · Johan Bjorck, Xiangyu Chen, Christopher de Sa, et al.
Abstract
Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails. We propose a set of six modifications, all straightforward to implement, that leaves the underlying agent and its hyperparameters unchanged but improves the numerical stability dramatically. The resulting modified SAC agent has lower memory and compute requirements while matching full-precision rewards, demonstrating that low-precision training can substantially accelerate state-of-the-art RL without parameter tuning.
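The abstract does not enumerate the six modifications, so the sketch below does not attempt to reproduce them. It only illustrates, under our own assumptions, the kind of numerical fragility that half precision introduces: small products underflow to zero and large ones overflow to infinity. The helper `fp16_product` and the chosen values are hypothetical, and the scale-then-unscale trick shown is the generic loss-scaling idea from supervised mixed-precision training, not the paper's method.

```python
import numpy as np


def fp16_product(a, b, scale=1.0):
    """Multiply a * b in float16 (hypothetical helper for illustration).

    `a` is multiplied by `scale` before the cast to float16, and the
    float16 result is divided by `scale` in float32 afterwards. This mimics
    generic loss scaling; it is NOT one of the paper's six modifications.
    """
    a16 = np.float16(a * scale)
    b16 = np.float16(b)
    prod16 = a16 * b16  # computed and rounded in float16
    return np.float32(prod16) / np.float32(scale)


# float16 can represent magnitudes only up to 65504, and its smallest
# positive (subnormal) value is 2**-24, roughly 6e-8.
fp16_max = float(np.finfo(np.float16).max)

# Underflow: the true product is 1e-8, below float16's resolution near zero.
naive = fp16_product(1e-4, 1e-4)

# Scaling by 1024 keeps the intermediate representable in float16;
# unscaling in float32 recovers a value close to 1e-8.
scaled = fp16_product(1e-4, 1e-4, scale=1024.0)

# Overflow: 300 * 300 = 90000 exceeds the float16 maximum.
with np.errstate(over="ignore"):
    overflowed = np.float16(300.0) * np.float16(300.0)

print("naive:", naive)
print("scaled:", scaled)
print("overflowed:", overflowed)
```

Without scaling the product is lost entirely, while the scaled variant recovers it approximately. The abstract reports that carrying such supervised-learning techniques over to SAC unchanged is not sufficient, which motivates the paper's additional modifications.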
Related papers
- Low Precision Policy Distillation With Application To Low-power, Real-time Sensation-cognition-action Loop With Neuromorphic Computing (2018)
- Adviser-actor-critic: Eliminating Steady-state Error In Reinforcement Learning Control (2025)
- Revisiting Discrete Soft Actor-critic (2022)
- Langevin Soft Actor-critic: Efficient Exploration Through Uncertainty-driven Critic Learning (2025)
- Context-based Soft Actor Critic For Environments With Non-stationary Dynamics (2021)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)
- Discrete And Continuous Action Representation For Practical RL In Video Games (2019)
- Striving For Simplicity And Performance In Off-policy DRL: Output Normalization And Non-uniform Sampling (2019)