VoiceAgentRAG: Solving the RAG Latency Bottleneck in Real-Time Voice Agents Using Dual-Agent Architectures

Jielin Qiu·Jianguo Zhang·Zixiang Chen·Liangwei Yang·Ming Zhu·Juntao Tan·Haolin Chen·Wenting Zhao·Rithesh Murthy·Roshan Ram·Akshara Prabhakar·Shelby Heinecke·Caiming Xiong·Silvio Savarese·Huan Wang·2026

arXiv:2603.02206

cs.SD

Abstract

We present VoiceAgentRAG, an open-source dual-agent memory router that decouples retrieval from response generation. A background Slow Thinker agent continuously monitors the conversation stream, predicts likely follow-up topics using an LLM, and pre-fetches relevant document chunks into a FAISS-backed semantic cache. A foreground Fast Talker agent reads only from this sub-millisecond cache, bypassing the vector database entirely on cache hits.
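The split between the two agents can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `predictor` callback (standing in for the LLM that predicts follow-up topics), the bag-of-words `embed` function, the 0.5 similarity threshold, and the in-memory list (standing in for the FAISS-backed cache and the vector database) are all assumptions chosen so the example runs with only the standard library. The background agent is also called synchronously here, whereas the abstract describes it as running continuously alongside the conversation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vector_db_search(query, corpus):
    """Stand-in for the slow vector database lookup: best-matching chunk."""
    q = embed(query)
    return max(corpus, key=lambda chunk: cosine(q, embed(chunk)))


class SemanticCache:
    """In-memory semantic cache (assumption: a plain list instead of a FAISS index)."""

    def __init__(self, threshold=0.5):
        self.entries = []  # list of (topic embedding, document chunk)
        self.threshold = threshold  # hypothetical similarity cutoff for a hit

    def put(self, topic, chunk):
        self.entries.append((embed(topic), chunk))

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None


class SlowThinker:
    """Background agent: predicts follow-up topics and pre-fetches chunks."""

    def __init__(self, cache, corpus, predictor):
        self.cache = cache
        self.corpus = corpus
        self.predictor = predictor  # hypothetical stand-in for the LLM predictor

    def observe(self, turn):
        for topic in self.predictor(turn):
            self.cache.put(topic, vector_db_search(topic, self.corpus))


class FastTalker:
    """Foreground agent: reads the cache first, falls back to the database on a miss."""

    def __init__(self, cache, corpus):
        self.cache = cache
        self.corpus = corpus

    def retrieve(self, query):
        chunk = self.cache.get(query)
        if chunk is not None:
            return chunk, "cache_hit"
        return vector_db_search(query, self.corpus), "cache_miss"
```

In this sketch, a query that resembles a topic the background agent already predicted is answered from the cache without touching `vector_db_search`; any other query takes the slower fallback path.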
