Monte Carlo Beam Search For Actor-critic Reinforcement Learning In Continuous Control
2025 · Hazim Alzorgan, Abolfazl Razi
Abstract
Actor-critic methods such as Twin Delayed Deep Deterministic Policy Gradient (TD3) rely on simple noise-based exploration, which can lead to suboptimal policy convergence. In this study, we introduce Monte Carlo Beam Search (MCBS), a hybrid method that combines beam search and Monte Carlo rollouts with TD3 to improve exploration and action selection. MCBS generates several candidate actions around the policy's output and evaluates them through short-horizon rollouts, enabling the agent to make better-informed choices. We evaluate MCBS on several continuous-control benchmarks, including HalfCheetah-v4, Walker2d-v5, and Swimmer-v5, and show improved sample efficiency and performance compared with standard TD3 and other baselines such as SAC, PPO, and A2C. Our findings highlight MCBS's ability to enhance policy learning through structured look-ahead search while remaining computationally efficient. Additionally, we offer a detailed analysis of crucial hyperparameters, such as
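The abstract describes MCBS only at a high level, so the Python sketch below illustrates one plausible reading of the action-selection step, not the authors' implementation. It assumes access to a simulator `step_fn(state, action) -> (next_state, reward, done)` that can be queried from an arbitrary state, a deterministic actor `policy`, and a critic `q_fn` that is used both to prune the perturbed candidates down to a beam and to bootstrap the value at the end of each short rollout. The function and parameter names (`mcbs_select_action`, `num_candidates`, `beam_width`, `horizon`, `noise_std`) and the scoring rule are hypothetical, as is the toy one-dimensional environment used in the usage example; the TD3 training loop itself is not shown.

```python
import numpy as np


def mcbs_select_action(state, policy, q_fn, step_fn, *, num_candidates=8,
                       beam_width=3, horizon=5, noise_std=0.1, gamma=0.99,
                       action_low=-1.0, action_high=1.0, rng=None):
    """Hypothetical MCBS-style action selection (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(policy(state), dtype=float)

    # Candidate set: the actor's own action plus Gaussian perturbations around it.
    noise = rng.normal(0.0, noise_std, size=(num_candidates - 1, base.shape[0]))
    candidates = np.clip(np.vstack([base, base + noise]), action_low, action_high)

    # Beam step: keep the beam_width candidates that the critic scores highest.
    q_scores = np.array([q_fn(state, a) for a in candidates])
    beam = candidates[np.argsort(q_scores)[::-1][:beam_width]]

    # Monte Carlo step: roll each beam member out for a short horizon,
    # following the actor after the first action.
    best_action, best_return = beam[0], -np.inf
    for action in beam:
        s, a, ret, discount, done = state, action, 0.0, 1.0, False
        for _ in range(horizon):
            s, r, done = step_fn(s, a)
            ret += discount * r
            discount *= gamma
            if done:
                break
            a = np.clip(policy(s), action_low, action_high)
        if not done:
            # Bootstrap the truncated tail with the critic's estimate.
            ret += discount * q_fn(s, a)
        if ret > best_return:
            best_action, best_return = action, ret
    return best_action


# Hypothetical toy problem: a 1-D point mass whose reward is -(position^2).
def toy_step(state, action):
    next_state = state + action
    return next_state, float(-np.sum(next_state ** 2)), False


def toy_q(state, action):
    # Crude critic: value of the position reached after taking `action`.
    return float(-np.sum((state + action) ** 2))


def lazy_policy(state):
    # A deliberately poor actor that never moves.
    return np.array([0.0])


if __name__ == "__main__":
    chosen = mcbs_select_action(np.array([1.0]), lazy_policy, toy_q, toy_step,
                                noise_std=0.5, rng=np.random.default_rng(0))
    print("selected action:", chosen)
```

In this toy setup the actor always outputs 0, yet the search still returns a non-positive action that moves the point mass toward the origin, which is the kind of improvement over the raw policy output that the look-ahead is meant to provide.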
Related papers
- Guided Exploration In Reinforcement Learning Via Monte Carlo Critic Optimization (2022)
- Wasserstein Barycenter Soft Actor-critic (2025)
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Deep Exploration With Pac-bayes (2024)
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)
- Mean Actor Critic (2017)
- Actor-critic Reinforcement Learning With Phased Actor (2024)
- How To Learn A Useful Critic? Model-based Action-gradient-estimator Policy Optimization (2020)