Reward Shaping For Happier Autonomous Cyber Security Agents
2023 · Elizabeth Bates, Vasilios Mavroudis, Chris Hicks
Abstract
As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks, where the agent is regularly rewarded for progress rather than occasionally penalized for failures. We investigate reward shaping techniques that could bridge this gap, enabling agents to train more sample-efficiently and potentially converge to better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of
Related papers
- Beyond Rewards In Reinforcement Learning For Cyber Defence (2026)
- Shaping Sparse Rewards In Reinforcement Learning: A Semi-supervised Approach (2025)
- Scalable Agent Alignment Via Reward Modeling: A Research Direction (2018)
- FRESH: Interactive Reward Shaping In High-dimensional State Spaces Using Human Feedback (2020)
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- Reward Design For Reinforcement Learning Agents (2025)
- Deep Reinforcement Learning For Autonomous Cyber Defence: A Survey (2023)