Abstract
Accurate and energy-efficient localization of autonomous underwater vehicles (AUVs) remains a fundamental challenge because underwater acoustic channels are complex, bandwidth-limited, and highly dynamic. This paper proposes a fully adaptive deep reinforcement learning (DRL)-driven localization framework for AUVs operating in underwater acoustic wireless sensor networks (UAWSNs). The localization problem is formulated as a Markov Decision Process (MDP) in which an intelligent agent jointly optimizes beacon selection and transmit power allocation to minimize long-term localization error and energy consumption. A hierarchical learning architecture integrates four actor–critic algorithms, namely (i) Twin Delayed Deep Deterministic Policy Gradient (TD3), (ii) Soft Actor–Critic (SAC), (iii) Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and (iv) Distributed DDPG (D2DPG), to enable robust learning under non-stationary channels, in cooperative multi-AUV scenarios, and in large-scale deployments. A round-trip time (RTT)-based geometric localization model incorporating a depth-dependent sound speed gradient is employed to capture realistic underwater acoustic propagation effects. A multi-objective reward function balances localization accuracy, energy efficiency, and ranging reliability through a risk-aware metric. Furthermore, the Cramér–Rao Lower Bound (CRLB) is derived to characterize the theoretical performance limits, and a complexity analysis is performed to assess the scalability of the proposed framework. Extensive Monte Carlo simulations show that the proposed DRL-based methods achieve significantly lower localization error, lower energy consumption, faster convergence, and higher overall system utility than a classical TD3 baseline. These results support the effectiveness and robustness of DRL for next-generation adaptive underwater localization systems.
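To make the RTT-based ranging idea concrete, the following is a minimal sketch of how a round-trip time could be converted to a range when sound speed varies linearly with depth. It is an illustration only and not the model used in this paper: it assumes a linear profile c(z) = c0 + g·z, a straight-ray path (ray bending is ignored), and illustrative values for `c0` and `gradient`; the function names `sound_speed` and `rtt_to_range` are hypothetical. Under those assumptions, the effective speed along the path is the logarithmic mean of the sound speeds at the two endpoint depths.

```python
import math


def sound_speed(depth_m, c0=1500.0, gradient=0.017):
    """Assumed linear profile c(z) = c0 + g*z (m/s); values are illustrative."""
    return c0 + gradient * depth_m


def rtt_to_range(rtt_s, depth_a_m, depth_b_m, turnaround_s=0.0,
                 c0=1500.0, gradient=0.017):
    """Estimate the slant range (m) between two nodes from a round-trip time.

    Straight-ray approximation: integrating ds / c(z) along a straight path
    between the two depths gives t = d * ln(c_b / c_a) / (c_b - c_a), so the
    effective speed is the logarithmic mean of c_a and c_b.
    """
    # Remove the responder's processing delay and halve to get one-way time.
    one_way_s = (rtt_s - turnaround_s) / 2.0
    if one_way_s < 0:
        raise ValueError("turnaround time exceeds the measured RTT")

    c_a = sound_speed(depth_a_m, c0, gradient)
    c_b = sound_speed(depth_b_m, c0, gradient)

    if math.isclose(c_a, c_b):
        # Same depth (or zero gradient): the profile is constant on the path.
        c_eff = c_a
    else:
        c_eff = (c_b - c_a) / math.log(c_b / c_a)

    return one_way_s * c_eff
```

For example, `rtt_to_range(2.0, 0.0, 1000.0)` uses an effective speed between the assumed surface and 1000 m values, so the estimate falls between the two constant-speed ranges. Ranges obtained this way from several beacons would then feed a geometric position solver.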