Active Causal Experimentalist (ACE): Learning Intervention Strategies Via Direct Preference Optimization
2026 · Patrick Cooper, Alvaro Velasquez
Abstract
Discovering causal relationships requires controlled experiments, but experimentalists face a sequential decision problem: each intervention reveals information that should inform what to try next. Traditional approaches such as random sampling, greedy information maximization, and round-robin coverage treat each decision in isolation and cannot learn adaptive strategies from experience. We propose Active Causal Experimentalist (ACE), which learns experimental design as a sequential policy. Our key insight is that while absolute information gains diminish as knowledge accumulates (making value-based RL unstable), relative comparisons between candidate interventions remain meaningful throughout. ACE exploits this via Direct Preference Optimization, learning from pairwise intervention comparisons rather than from non-stationary reward magnitudes. Across synthetic benchmarks, physics simulations, and economic data, ACE achieves 70-71% improvement over baselines at equal intervention budgets (p
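The preference-based idea in the abstract can be illustrated with a minimal sketch of the standard Direct Preference Optimization loss applied to a single pair of candidate interventions. This is not the authors' implementation: the helper `make_pair`, the use of an information-gain score to rank two candidates, the intervention labels, and the value of `beta` are all assumptions made for illustration.

```python
import math


def make_pair(cand_a, cand_b, gain_a, gain_b):
    """Hypothetical helper: order two candidate interventions as (preferred, rejected).

    Only the ordering of the two scores is used, so the pair is unchanged
    when both scores shrink by the same positive factor.
    """
    return (cand_a, cand_b) if gain_a >= gain_b else (cand_b, cand_a)


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_*     : log-probability of each intervention under the policy being trained
    ref_logp_* : log-probability under a frozen reference policy
    beta       : strength of the implicit KL penalty (value assumed here)
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Early in an experiment the scores may be large, later they may be tiny,
# yet the resulting preference pair is the same.
early = make_pair("do(X1)", "do(X2)", 0.8, 0.3)
late = make_pair("do(X1)", "do(X2)", 0.008, 0.003)

# With the policy equal to the reference, the margin is zero.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Shifting probability mass toward the preferred intervention lowers the loss.
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

Because the loss depends only on which intervention was preferred and on log-probability ratios, it never sees the magnitude of the score that produced the preference, which is the property the abstract relies on when absolute information gains shrink over time.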