Dual-gated Epistemic Time-dilation: Autonomous Compute Modulation In Asynchronous MARL
2026 Β· Igor Jankowski
Abstract
While Multi-Agent Reinforcement Learning (MARL) algorithms have achieved notable success across complex continuous domains, they are conventionally deployed under a strictly synchronous paradigm. In this paradigm every agent must run a deep neural-network inference at every micro-frame, whether or not a new decision is actually needed. This dense inference load is a fundamental barrier to deployment on physical edge devices, where thermal and energy budgets are tightly constrained. We propose Epistemic Time-Dilation MAPPO (ETD-MAPPO), augmented with a Dual-Gated Epistemic Trigger. Instead of relying on rigid frame-skipping (macro-actions), agents autonomously modulate their execution frequency by monitoring aleatoric uncertainty (via the Shannon entropy of their policy) and epistemic uncertainty (via state-value divergence in a Twin-Critic architecture). To formalize this, we structure the environment as a Semi-Markov Decision Process (SMDP) and build the SMDP-Aligne
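The dual-gated trigger and SMDP framing described in the abstract can be illustrated with a minimal sketch. It rests on assumptions and is not the ETD-MAPPO implementation: the function names `policy_entropy`, `critic_divergence`, `hold_duration`, and `smdp_target`, the threshold values, the OR-combination of the two gates, and the fixed `max_hold` are all hypothetical. At a decision point, the agent measures the entropy of its action distribution (aleatoric signal) and the gap between its twin critics' value estimates (epistemic signal). If both are low, it commits to its action for several frames instead of re-running inference. Because holds vary in length, the value target discounts the bootstrap term by γ^τ for a hold of τ frames.

```python
import math
from typing import Sequence


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a), in nats (aleatoric signal)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def critic_divergence(v1: float, v2: float) -> float:
    """Absolute disagreement |V1(s) - V2(s)| between twin critics (epistemic signal)."""
    return abs(v1 - v2)


def hold_duration(
    probs: Sequence[float],
    v1: float,
    v2: float,
    entropy_threshold: float = 0.5,     # hypothetical value
    divergence_threshold: float = 0.1,  # hypothetical value
    max_hold: int = 4,                  # hypothetical maximum dilation, in frames
) -> int:
    """Hypothetical dual gate: if EITHER uncertainty signal exceeds its threshold,
    re-decide on the next frame (hold = 1); otherwise commit for max_hold frames.
    The OR-combination and fixed hold length are illustrative assumptions."""
    aleatoric_open = policy_entropy(probs) > entropy_threshold
    epistemic_open = critic_divergence(v1, v2) > divergence_threshold
    return 1 if (aleatoric_open or epistemic_open) else max_hold


def smdp_target(rewards: Sequence[float], gamma: float, bootstrap_value: float) -> float:
    """Standard SMDP return for a decision held for tau = len(rewards) primitive steps:
    sum_{k<tau} gamma^k * r_k + gamma^tau * V(s_{t+tau})."""
    tau = len(rewards)
    discounted = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return discounted + (gamma ** tau) * bootstrap_value


if __name__ == "__main__":
    # Confident policy, agreeing critics: both gates closed, so the agent holds.
    print(hold_duration([0.99, 0.01], 1.0, 1.02))
    # Uniform policy: entropy ln 2 > 0.5 opens the aleatoric gate.
    print(hold_duration([0.5, 0.5], 1.0, 1.0))
    # Two-frame hold with gamma = 0.5 and bootstrap value 4.0.
    print(smdp_target([1.0, 1.0], 0.5, 4.0))
```

A one-step TD target would mis-discount the bootstrap term when a decision spans several frames; the `gamma ** tau` factor is the usual SMDP correction for that.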
Related papers
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- Non-stationary Policy Learning For Multi-timescale Multi-agent Reinforcement Learning (2023)
- Efficient Episodic Memory Utilization Of Cooperative Multi-agent Reinforcement Learning (2024)
- Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability (2022)
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023)
- Characterizing Speed Performance Of Multi-agent Reinforcement Learning (2023)
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018)
- Inducing Stackelberg Equilibrium Through Spatio-temporal Sequential Decision-making In Multi-agent Reinforcement Learning (2023)