Learning When To Switch: Adaptive Policy Selection Via Reinforcement Learning
2025 · Chris Tava
Abstract
Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study, this work demonstrates how an agent can dynamically transition between systematic exploration (coverage) and goal-directed pathfinding (convergence) to improve task performance. Unlike fixed-threshold approaches, the agent uses Q-learning to adapt switching behavior based on coverage percentage and distance to goal, requiring only minimal domain knowledge: maze dimensions and target location. The agent does not require prior knowledge of wall positions, optimal threshold values, or hand-crafted heuristics; instead, it discovers effective switching strategies dynamically during each run. The agent discretizes its state space into coverage and distance buckets, then adapts
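The switching mechanism described in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the author's implementation: the bucket counts, learning rate, discount factor, exploration rate, and the names `discretize` and `SwitchingAgent` are all assumptions made for illustration, and the reward signal is left to the caller because the abstract does not specify one.

```python
import random
from collections import defaultdict

# The two navigation policies the agent switches between (named in the abstract).
ACTIONS = ("coverage", "convergence")


def discretize(coverage_pct, distance, max_distance, n_cov=5, n_dist=5):
    """Map raw observations to a (coverage bucket, distance bucket) state.

    The bucket counts n_cov and n_dist are assumed values, not taken from the paper.
    """
    cov_bucket = min(int(coverage_pct / 100.0 * n_cov), n_cov - 1)
    dist_bucket = min(int(distance / max_distance * n_dist), n_dist - 1)
    return cov_bucket, dist_bucket


class SwitchingAgent:
    """Hypothetical tabular Q-learner that picks which policy to run next."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # alpha, gamma, and epsilon are illustrative hyperparameters.
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # One Q-value per policy for every discretized state, initialized to 0.
        self.q = defaultdict(lambda: [0.0 for _ in ACTIONS])

    def choose(self, state):
        """Epsilon-greedy choice; returns an index into ACTIONS."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a maze loop, the agent would call `discretize` on its current coverage percentage and distance to goal, call `choose` to decide which policy drives the next move, and call `update` once the resulting reward and next state are known.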
Related papers
- Reinforcement Learning With A Focus On Adjusting Policies To Reach Targets (2024)
- Learning To Switch Among Agents In A Team Via 2-layer Markov Decision Processes (2020)
- Continuously Discovering Novel Strategies Via Reward-switching Policy Optimization (2022)
- Never Give Up: Learning Directed Exploration Strategies (2020)
- When Should Agents Explore? (2021)
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019)
- Learning Adaptive Exploration Strategies In Dynamic Environments Through Informed Policy Regularization (2020)
- Post-convergence Sim-to-real Policy Transfer: A Principled Alternative To Cherry-picking (2025)