Evaluating LLM-based Approaches To Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, Or RAG? A Benchmark And An Australian Law Case Study
2024 · Jiuzhou Han, Paul Burgess, Ehsan Shareghi
Abstract
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. At its core, this task demands fine-grained contextual understanding and precise identification of relevant legislation or precedent. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations, which, to the best of our knowledge, is the first of its scale and scope. We then conduct a systematic benchmarking across a range of solutions: (i) standard prompting of both general and law-specialised LLMs, (ii) retrieval-only pipelines with both generic and domain-specific embeddings, (iii) supervised fine-tuning, and (iv) several hybrid strategies that combine LLMs with retrieval augmentation through query expansion, voting ensembles, or re-ranking. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero. Ins
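The abstract names voting ensembles as one of the hybrid strategies but does not say how votes are combined. The following is a minimal sketch of one plausible scheme, not the authors' method: each system (for example an LLM sample or a retriever) contributes a ranked list of candidate citations, the citation proposed by the most systems wins, and ties are broken by best rank. The function name `vote_citation` and the placeholder strings are assumptions made for illustration.

```python
from collections import Counter


def vote_citation(candidate_lists, top_k=1):
    """Return the top_k citations by vote count across candidate lists.

    Each inner list holds one system's predictions, best first.
    Ties are broken by the best (lowest) rank at which a citation
    appeared, then alphabetically for determinism.
    """
    votes = Counter()
    best_rank = {}
    for candidates in candidate_lists:
        seen = set()
        for rank, citation in enumerate(candidates):
            if citation in seen:
                continue  # count each citation at most once per system
            seen.add(citation)
            votes[citation] += 1
            best_rank[citation] = min(rank, best_rank.get(citation, rank))
    ranked = sorted(votes, key=lambda c: (-votes[c], best_rank[c], c))
    return ranked[:top_k]


# Hypothetical placeholder predictions, not data from the benchmark.
predictions = [
    ["Case A", "Case B"],
    ["Case B", "Case C"],
    ["Case B", "Case A"],
]
print(vote_citation(predictions, top_k=2))  # ['Case B', 'Case A']
```

Counting each citation once per system keeps a single verbose system from dominating the vote; a weighted variant could instead score candidates by rank or by retriever similarity.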
Related papers
- LEMUR: A Corpus For Robust Fine-tuning Of Multilingual Law Embedding Models For Retrieval (2026) 0.00
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025) 1.57
- Legal RAG Bench: An End-to-end Benchmark For Legal RAG (2026) 3.00
- Optimizing Legal Document Retrieval In Vietnamese With Semi-hard Negative Mining (2025) 0.00
- A Comparative Study Of Specialized LLMs As Dense Retrievers (2025) 2.26
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024) 0.00
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025) 0.00
- LLM-powered Real-time Patent Citation Recommendation For Financial Technologies (2026) 0.00