Test-time Training On Nearest Neighbors For Large Language Models
2023 · Moritz Hardt, Yu Sun
Abstract
Many recent efforts augment language models with retrieval, by adding retrieved data to the input context. For this approach to succeed, the retrieved data must be added at both training and test time. Moreover, as input length grows linearly with the size of retrieved data, cost in computation and memory grows quadratically for modern Transformers. To avoid these complications, we simply fine-tune the model on retrieved data at test time, using its standard training setup. We build a large-scale distributed index based on text embeddings of the Pile dataset. For each test input, our system retrieves its neighbors and fine-tunes the model on their text. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than 20 language modeling tasks in the Pile. For example, test-time training with nearest neighbors significantly narrows the performance gap between a small GPT-2 and a GPT-Neo model more t
Related papers
- Why Do Nearest Neighbor Language Models Work? (2023) 3.56
- Neurocache: Efficient Vector Retrieval For Long-range Language Modeling (2024) 1.91
- Explaining The Success Of Nearest Neighbor Methods In Prediction (2025) 18.63
- You Can't Pick Your Neighbors, Or Can You? When And How To Rely On Retrieval In The kNN-LM (2022) 5.24
- RetrievalAttention: Accelerating Long-context LLM Inference Via Vector Retrieval (2024) 0.00
- One-layer Transformer Provably Learns One-nearest Neighbor In Context (2024) 0.00
- ScalingNote: Scaling Up Retrievers With Large Language Models For Real-world Dense Retrieval (2024) 0.00
- Fast Nearest-neighbor Classification Using RNN In Domains With Large Number Of Classes (2017) 0.00