Privacy-preserving Reinforcement Learning Beyond Expectation
2022 · Arezoo Rajabi, Bhaskar Ramasubramanian, Abdullah Al Maruf, et al.
Abstract
Cyber and cyber-physical systems equipped with machine learning algorithms, such as autonomous cars, share environments with humans. In such a setting, it is important to align system (or agent) behaviors with the preferences of one or more human users. We consider the case when an agent has to learn behaviors in an unknown environment. Our goal is to capture two defining characteristics of humans: i) a tendency to assess and quantify risk, and ii) a desire to keep decision making hidden from external parties. For the former, we incorporate cumulative prospect theory (CPT) into the objective of a reinforcement learning (RL) problem. For the latter, we use differential privacy. We design an algorithm that enables an RL agent to learn policies that maximize a CPT-based objective in a privacy-preserving manner, and we establish guarantees on the privacy of value functions learned by the algorithm when rewards are sufficiently close. This is accomplished through adding a calibrated noise using a Gauss
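To make the CPT-based objective mentioned in the abstract more concrete, the following is a minimal sketch of how a CPT value might be estimated from sampled returns. It assumes the Tversky-Kahneman utility and probability-weighting forms with their commonly cited parameter values, and an order-statistics estimator of the kind used in the CPT literature; the abstract does not confirm that the paper uses these exact choices, and the names `tk_weight` and `cpt_estimate` are hypothetical.

```python
def tk_weight(p, gamma):
    """Tversky-Kahneman probability weighting (assumed form, not taken from the abstract)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)


def cpt_estimate(samples, alpha=0.88, beta=0.88, lam=2.25,
                 gamma_gain=0.61, gamma_loss=0.69):
    """Estimate the CPT value of a random return from i.i.d. samples.

    Gains and losses are measured relative to a reference point of 0.
    Default parameters are the commonly cited Tversky-Kahneman values
    (an assumption made for illustration).
    """
    xs = sorted(samples)
    n = len(xs)
    gains = 0.0
    losses = 0.0
    for i, x in enumerate(xs, start=1):
        u_plus = max(x, 0.0) ** alpha          # utility of a gain
        u_minus = lam * max(-x, 0.0) ** beta   # loss-averse disutility of a loss
        # Gains are weighted by the distorted probability of doing at least this well.
        gains += u_plus * (tk_weight((n + 1 - i) / n, gamma_gain)
                           - tk_weight((n - i) / n, gamma_gain))
        # Losses are weighted by the distorted probability of doing at least this badly.
        losses += u_minus * (tk_weight(i / n, gamma_loss)
                             - tk_weight((i - 1) / n, gamma_loss))
    return gains - losses


if __name__ == "__main__":
    returns = [-1.0, 2.0, 5.0]
    # With linear utilities and undistorted probabilities, CPT reduces to the mean.
    print(cpt_estimate(returns, alpha=1, beta=1, lam=1, gamma_gain=1, gamma_loss=1))
    # With the default parameters, losses loom larger than gains.
    print(cpt_estimate([-1.0, 1.0]))
```

With linear utilities and identity weighting the estimate collapses to the sample mean, which is one way to see how a CPT objective generalizes the usual expected-return objective in RL.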
Related papers
- Reinforcement Learning Beyond Expectation (2021) · 5.84
- Privacy-preserving Reinforcement Learning From Human Feedback Via Decoupled Reward Modeling (2026) · 0.00
- Offline Reinforcement Learning With Differential Privacy (2022) · 0.00
- Privacy Preserving Reinforcement Learning For Population Processes (2024) · 0.00
- Preserving Expert-level Privacy In Offline Reinforcement Learning (2024) · 0.00
- New Challenges In Reinforcement Learning: A Survey Of Security And Privacy (2022) · 10.85
- Beyond Rewards In Reinforcement Learning For Cyber Defence (2026) · 0.00
- Local Differential Privacy For Regret Minimization In Reinforcement Learning (2020) · 0.00