Abstract
Conversational AI agents have evolved from text-only systems into tool-using agents that can perform computation and physical actions. This evolution builds on the capabilities of pre-trained language models by integrating them into tool-using systems that perform accurate calculations and support multi-agent operation, sometimes in conjunction with human operators and sometimes independently. These systems follow a similar pattern: they use self-supervised learning so that agents can invoke and control external tools on their own, verify tool execution with reasoning models, apply hierarchical recovery to counteract tool-use failures, and access thousands of real-world APIs through semantic retrieval mechanisms and a wide variety of protocol standards. Compositional emergence, a subset of emergent behavior, is the idea that complex behaviors can emerge in a system composed of simple components combined through pipelining and parallelism. Examples include tree-based deliberative strategies that converge on a solution through efficient exploration, world-model planning, error recovery spanning the micro to the meta level, and graceful degradation in production contexts. Multi-agent orchestration mechanisms allow modular components to communicate via declarative requirements and event-driven messaging systems. Limitations in scalability, hallucination, and explainability need to be addressed to create effective multi-agent architectures. Future work could investigate online reinforcement learning, quantum planning algorithms, and representational robotics that transfer knowledge of tool use from simulation to real-world applications.