Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study

Abstract

Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. At its core, this task demands fine-grained contextual understanding and precise identification of relevant legislation or precedent. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations, which, to the best of our knowledge, is the first of its scale and scope. We then conduct systematic benchmarking across a range of solutions: (i) standard prompting of both general and law-specialised LLMs, (ii) retrieval-only pipelines with both generic and domain-specific embeddings, (iii) supervised fine-tuning, and (iv) several hybrid strategies that combine LLMs with retrieval augmentation through query expansion, voting ensembles, or re-ranking. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero. Ins
