Adversarially-robust TD Learning With Markovian Data: Finite-time Rates And Fundamental Limits
2025 · Sreejeet Maity, Aritra Mitra
Abstract
One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm addresses this problem, and recent work has investigated finite-time convergence guarantees for this algorithm and variants thereof. However, these guarantees hinge on the reward observations being always generated from a well-behaved (e.g., sub-Gaussian) true reward distribution. Motivated by harsh, real-world environments where such an idealistic assumption may no longer hold, we revisit the policy evaluation problem from the perspective of adversarial robustness. In particular, we consider a Huber-contaminated reward model where an adversary can arbitrarily corrupt each reward sample with a small probability \(\epsilon\). Under this observation model, we first show that the adversary can cause the vanilla TD algorithm to converge to any arbitrary val
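The observation model described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm or experimental setup: the two-state chain `P`, the rewards `r`, the discount `gamma`, the step size `alpha`, and the constant adversarial value `adversary_value` are all hypothetical choices made for illustration. It runs plain tabular TD(0) along a single Markovian trajectory, once with clean rewards and once with rewards that are replaced by the adversary's value with probability `eps`. Under this particular constant-value attack the expected observed reward becomes (1 - eps) * r + eps * c, so the estimate should drift away from the true value function by roughly eps * c / (1 - gamma).

```python
import numpy as np


def huber_contaminate(reward, eps, adversary_value, rng):
    """Return the adversary's value with probability eps, else the true reward."""
    return adversary_value if rng.random() < eps else reward


def td0(P, r, gamma, eps, adversary_value, steps, alpha, seed=0):
    """Vanilla tabular TD(0) on one trajectory with Huber-contaminated rewards."""
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])  # Markovian sampling: next state depends on s
        obs = huber_contaminate(r[s], eps, adversary_value, rng)
        V[s] += alpha * (obs + gamma * V[s_next] - V[s])  # standard TD(0) update
        s = s_next
    return V


# Hypothetical two-state Markov reward process (assumed for illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# Exact value function from the Bellman equation V = r + gamma * P V.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

V_clean = td0(P, r, gamma, eps=0.0, adversary_value=0.0, steps=20000, alpha=0.01)
V_attacked = td0(P, r, gamma, eps=0.05, adversary_value=100.0, steps=20000, alpha=0.01)

print("true     :", V_true)
print("clean TD :", V_clean)
print("attacked :", V_attacked)
```

A constant step size is used here only to keep the sketch short; the attacked estimate therefore fluctuates around its shifted fixed point rather than converging exactly.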
Related papers
- Uncertainty Quantification For Markov Chain Induced Martingales With Application To Temporal Difference Learning (2025)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Learning In Markov Games With Adaptive Adversaries: Policy Regret, Fundamental Barriers, And Efficient Algorithms (2024)
- Pseudo-quantized Actor-critic Algorithm For Robustness To Noisy Temporal Difference Error (2026)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- TD Or Not TD: Analyzing The Role Of Temporal Differencing In Deep Reinforcement Learning (2018)