Legal RAG Bench: An End-to-end Benchmark For Legal RAG
2026 Β· Abdur-Rahman Butler, Umar Butler
Abstract
We introduce Legal RAG Bench, a benchmark and evaluation methodology for assessing the end-to-end performance of legal RAG systems. As a benchmark, Legal RAG Bench consists of 4,876 passages from the Victorian Criminal Charge Book alongside 100 complex, hand-crafted questions demanding expert knowledge of criminal law and procedure. Both long-form answers and supporting passages are provided. As an evaluation methodology, Legal RAG Bench leverages a full factorial design and novel hierarchical error decomposition framework, enabling apples-to-apples comparisons of the contributions of retrieval and reasoning models in RAG. We evaluate three state-of-the-art embedding models (Isaacus' Kanon 2 Embedder, Google's Gemini Embedding 001, and OpenAI's Text Embedding 3 Large) and two frontier LLMs (Gemini 3.1 Pro and GPT-5.2), finding that information retrieval is the primary driver of legal RAG performance, with LLMs exerting a more moderate effect on correctness and groundedness. Kanon 2 Emb
Authors
(none)
Tags
Stats
Related papers
- Ragperf: An End-to-end Benchmarking Framework For Retrieval-augmented Generation Systems (2026)0.00
- Visual-rag: Benchmarking Text-to-image Retrieval Augmented Generation For Visual Knowledge Intensive Queries (2025)0.00
- SRAG: RAG With Structured Data Improves Vector Retrieval (2026)0.00
- RAG Playground: A Framework For Systematic Evaluation Of Retrieval Strategies And Prompt Engineering In RAG Systems (2024)0.00
- Ragsmith: A Framework For Finding The Optimal Composition Of Retrieval-augmented Generation Methods Across Datasets (2025)0.00
- Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025)0.00
- Frustratingly Simple Retrieval Improves Challenging, Reasoning-intensive Benchmarks (2025)0.00
- From BM25 To Corrective RAG: Benchmarking Retrieval Strategies For Text-and-table Documents (2026)0.00