Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization
2021 · Shixiang Shane Gu, Manfred Diaz, Daniel C. Freeman, et al.
Abstract
The goal of continuous control is to synthesize desired behaviors. In reinforcement learning (RL)-driven approaches, this is often accomplished through careful task reward engineering for efficient exploration and running an off-the-shelf RL algorithm. While reward maximization is at the core of RL, reward engineering is not the only way, and sometimes not the easiest way, to specify complex behaviors. In this paper, we introduce Braxlines, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization. It includes Composer, a programmatic API for generating continuous control environments, and a set of stable and well-tested baselines for two families of algorithms, mutual information maximization (MiMax) and divergence minimization (DMin), which support unsupervised skill learning and distribution sketching as other modes of behavior specification. In addition, we discuss how to standardize metrics for evaluating these algorithms, which can no lon
Related papers
- BXRL: Behavior-explainable Reinforcement Learning (2026)
- Aligning Agents via Planning: A Benchmark for Trajectory-level Reward Modeling (2026)
- Direct Behavior Specification via Constrained Reinforcement Learning (2021)
- RLeXplore: Accelerating Research in Intrinsically-motivated Reinforcement Learning (2024)
- Designing Rewards for Fast Learning (2022)
- RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control (2023)
- Test-driven Reinforcement Learning in Continuous Control (2025)
- Scilab-RL: A Software Framework for Efficient Reinforcement Learning and Cognitive Modeling Research (2024)