Text Embeddings For Retrieval From A Large Knowledge Base
2018 · Tolgahan Cakaloglu, Christian Szegedy, Xiaowei Xu
Abstract
Text embeddings that represent natural language documents in a semantic vector space can be used for document retrieval via nearest-neighbor lookup. To study the feasibility of neural models specialized for semantically meaningful retrieval, we propose using the Stanford Question Answering Dataset (SQuAD) in an open-domain question answering setting, where the first task is to find paragraphs useful for answering a given question. First, we compare the retrieval performance of various text-embedding methods and give an extensive empirical comparison of various non-augmented base embeddings with and without IDF weighting. Our main result is that training deep residual neural models specifically for retrieval can yield significant gains when they are used to augment existing embeddings. We also establish that deeper models are superior for this task. The best base baseline embeddings augmented by our learned neural ap
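The retrieval setup described in the abstract (embed paragraphs, embed the question, return the nearest paragraphs, optionally with IDF weighting) can be illustrated with a minimal sketch. It assumes a bag-of-words embedding built by averaging word vectors, a smoothed IDF formula `log((1 + N) / (1 + df)) + 1`, and cosine similarity as the nearest-neighbor metric; these choices, the helper names `compute_idf`, `embed`, and `nearest_paragraphs`, and the toy 3-dimensional vectors are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter

import numpy as np


def compute_idf(tokenized_docs):
    """Smoothed inverse document frequency (assumed form): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(tokenized_docs)
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1.0 for tok, count in df.items()}


def embed(tokens, word_vectors, idf=None):
    """Average of word vectors, optionally IDF-weighted, then L2-normalized."""
    dim = len(next(iter(word_vectors.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        w = idf.get(tok, 1.0) if idf is not None else 1.0
        total += w * np.asarray(word_vectors[tok], dtype=float)
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    vec = total / weight_sum
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def nearest_paragraphs(question_vec, paragraph_matrix, k=1):
    """Indices of the k rows with the highest cosine similarity (rows are unit-norm)."""
    scores = paragraph_matrix @ question_vec
    return np.argsort(-scores)[:k].tolist()


# Toy example with hypothetical 3-dimensional word vectors.
word_vectors = {
    "the": [1.0, 1.0, 1.0],
    "cat": [1.0, 0.0, 0.0],
    "sat": [0.8, 0.2, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "ran": [0.1, 0.9, 0.0],
    "stock": [0.0, 0.0, 1.0],
    "fell": [0.0, 0.1, 0.9],
}
paragraphs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["stock", "fell"]]
idf = compute_idf(paragraphs)
paragraph_matrix = np.vstack([embed(p, word_vectors, idf) for p in paragraphs])

question = ["where", "the", "cat", "sat"]  # "where" is out of vocabulary
ranking = nearest_paragraphs(embed(question, word_vectors, idf), paragraph_matrix, k=3)
print(ranking)  # [0, 1, 2]
```

IDF weighting down-weights tokens that occur in many paragraphs (here "the"), so rarer, more informative words dominate the averaged vector. Passing `idf=None` to `embed` gives the unweighted baseline for comparison. A real system over a large knowledge base would typically replace the brute-force matrix product with an approximate nearest-neighbor index.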
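The abstract says only that deep residual models trained for retrieval are used to augment existing embeddings, and that deeper models work better. One plausible reading is sketched below: a stack of residual blocks `h <- h + W2 relu(W1 h)` applied on top of a base embedding, followed by L2 normalization, with a triplet-style margin loss as a possible training objective. The block structure, the loss, the initialization, and all names (`init_residual_stack`, `augment`, `triplet_margin_loss`) are assumptions for illustration, not the authors' architecture; only the forward pass and loss are shown, not training.

```python
import numpy as np


def init_residual_stack(dim, hidden, depth, seed=0):
    """Hypothetical parameters: `depth` residual blocks, each mapping dim -> hidden -> dim."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.02, size=(hidden, dim)), rng.normal(0.0, 0.02, size=(dim, hidden)))
        for _ in range(depth)
    ]


def augment(base_embeddings, blocks):
    """Apply h <- h + W2 @ relu(W1 @ h) per block (row-vector form), then L2-normalize rows."""
    h = np.asarray(base_embeddings, dtype=float)
    for w1, w2 in blocks:
        h = h + np.maximum(h @ w1.T, 0.0) @ w2.T  # residual (skip) connection
    return h / np.linalg.norm(h, axis=1, keepdims=True)


def triplet_margin_loss(q, pos, neg, margin=0.2):
    """Mean hinge loss pushing cos(q, pos) above cos(q, neg) by `margin` (rows unit-norm)."""
    pos_sim = np.sum(q * pos, axis=1)
    neg_sim = np.sum(q * neg, axis=1)
    return float(np.mean(np.maximum(0.0, margin - pos_sim + neg_sim)))


rng = np.random.default_rng(1)
base = rng.normal(size=(4, 8))  # stand-in for 4 base embeddings of dimension 8
blocks = init_residual_stack(dim=8, hidden=16, depth=3)
out = augment(base, blocks)
print(out.shape)  # (4, 8)
```

Because each block adds a learned correction to its input, a stack initialized with small weights starts out close to the (normalized) base embedding, so training refines the existing embedding rather than replacing it. Depth is controlled by the `depth` argument.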
Related papers
- A Multi-resolution Word Embedding For Document Retrieval From Large Unstructured Knowledge Bases (2019)
- Enhancing Question Answering Precision With Optimized Vector Retrieval And Instructions (2024)
- QAEA-DR: A Unified Text Augmentation Framework For Dense Retrieval (2024)
- Utilizing Embeddings For Ad-hoc Retrieval By Document-to-document Similarity (2017)
- Pre-training Tasks For Embedding-based Large-scale Retrieval (2020)
- Vector Representations Of Text Data In Deep Learning (2019)
- Multi-modal Retrieval Of Tables And Texts Using Tri-encoder Models (2021)
- Progressively Optimized Bi-granular Document Representation For Scalable Embedding Based Retrieval (2022)