Abstract
arXiv:2510.01724v2

Mass spectrometry-based metabolomics generates complex, high-dimensional data that holds vast potential for biological discovery but remains difficult to integrate and interpret. Knowledge graphs (KGs) unify this heterogeneous information by representing spectra, annotations, taxa, chemical classes, and biological activities as a single interoperable network; however, their practical use is limited by the steep learning curve of the specialised representation and query languages they require. Here we introduce MetaboT, an open-source multi-agent Large Language Model (LLM) framework that translates natural-language questions into executable SPARQL queries over metabolomics knowledge graphs. MetaboT mitigates the hallucination and schema-compliance limitations of single-model approaches through a modular architecture in which specialised agents handle scope validation, entity resolution against authoritative resources, schema-aware query generation, iterative refinement, and result interpretation. We validated MetaboT on the Experimental Natural Products Knowledge Graph (ENPKG) using an expert-authored benchmark of natural-language questions paired with reference SPARQL queries, and demonstrated its ability to answer complex questions about plant-metabolite relationships and biological activities. MetaboT lowers the technical barrier for metabolomics researchers and enables semantic data mining without specialised programming expertise.
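
The staged design described in the abstract (scope validation, entity resolution, query generation, iterative refinement, result interpretation) can be pictured with a minimal Python sketch. This is not MetaboT's code or API: the names `Agents`, `Result`, `answer_question`, and `toy_agents` are hypothetical, each agent is a plain function standing in for an LLM call, the `ex:` vocabulary is a placeholder rather than the ENPKG schema, and the species name is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Agents:
    """Hypothetical bundle of agent callables, one per stage in the abstract."""
    in_scope: Callable[[str], bool]
    resolve: Callable[[str], Dict[str, str]]
    generate: Callable[[str, Dict[str, str], Optional[str]], str]
    execute: Callable[[str], List[dict]]
    interpret: Callable[[str, List[dict]], str]


@dataclass
class Result:
    answer: str
    query: Optional[str]
    attempts: int


def answer_question(question: str, agents: Agents, max_attempts: int = 3) -> Result:
    # Stage 1: reject questions the knowledge graph cannot answer.
    if not agents.in_scope(question):
        return Result("Question is out of scope.", None, 0)
    # Stage 2: map mentions in the question to identifiers.
    entities = agents.resolve(question)
    feedback: Optional[str] = None
    query: Optional[str] = None
    rows: List[dict] = []
    attempt = 0
    # Stages 3 and 4: generate a query, then retry with feedback on failure.
    for attempt in range(1, max_attempts + 1):
        query = agents.generate(question, entities, feedback)
        try:
            rows = agents.execute(query)
        except Exception as exc:  # an endpoint or syntax error becomes feedback
            feedback = f"execution failed: {exc}"
            continue
        if rows:
            break
        feedback = "query returned no rows"
    # Stage 5: turn raw rows into a readable answer.
    return Result(agents.interpret(question, rows), query, attempt)


def toy_agents() -> Agents:
    """Toy stand-ins so the sketch runs without an LLM or a SPARQL endpoint."""
    taxa = {"Exampleus plantus": "http://example.org/taxon/1"}  # made-up taxon

    def in_scope(q: str) -> bool:
        return "metabolite" in q.lower()

    def resolve(q: str) -> Dict[str, str]:
        return {n: iri for n, iri in taxa.items() if n.lower() in q.lower()}

    def generate(q: str, entities: Dict[str, str], feedback: Optional[str]) -> str:
        # The first attempt uses a misspelt predicate; feedback triggers the fix.
        predicate = "ex:hasFeature" if feedback else "ex:hasFeatur"
        iri = next(iter(entities.values()), None)
        subject = f"<{iri}>" if iri else "?taxon"
        return (
            "PREFIX ex: <http://example.org/>\n"
            f"SELECT ?feature WHERE {{ {subject} {predicate} ?feature }}"
        )

    def execute(query: str) -> List[dict]:
        # Pretend endpoint: only the correctly spelt predicate matches data.
        if "ex:hasFeature ?feature" in query:
            return [{"feature": "http://example.org/feature/1"}]
        return []

    def interpret(q: str, rows: List[dict]) -> str:
        return f"{len(rows)} row(s) found."

    return Agents(in_scope, resolve, generate, execute, interpret)


result = answer_question("Which metabolites occur in Exampleus plantus?", toy_agents())
print(result.attempts, result.answer)
```

Passing the agents in as callables keeps the orchestration loop separate from any particular model or endpoint, so in this sketch each stage can be replaced or tested on its own.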