Why Do Nearest Neighbor Language Models Work?
2023 · Frank F. Xu, Uri Alon, Graham Neubig
Abstract
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we perform a careful analysis of the various dimensions over which kNN-LM diverges from standard LMs, and investigate these dimensions one by one. Empir
Related papers
- You Can't Pick Your Neighbors, Or Can You? When And How To Rely On Retrieval In The kNN-LM (2022)
- Test-time Training On Nearest Neighbors For Large Language Models (2023)
- Explaining The Success Of Nearest Neighbor Methods In Prediction (2025)
- Neurocache: Efficient Vector Retrieval For Long-range Language Modeling (2024)
- A Theory-based Evaluation Of Nearest Neighbor Models Put Into Practice (2018)
- Feasibility Based Large Margin Nearest Neighbor Metric Learning (2016)
- Interpretable Locally Adaptive Nearest Neighbors (2020)
- Stochastic Learning Of Nonstationary Kernels For Natural Language Modeling (2018)