Vector Representations Of Text Data In Deep Learning
2019 · Karol Grzegorczyk
Abstract
In this dissertation we report the results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level, while the second model learns word-level representations. For document-level representations we propose Binary Paragraph Vector: neural network models for learning binary representations of text documents, which can be used for fast document retrieval. We provide a thorough evaluation of these models and demonstrate that they outperform the seminal method in the field on the information retrieval task. We also report strong results in transfer learning settings, where our models are trained on a generic text corpus and then used to infer codes for documents from a domain-specific dataset. In contrast to previously proposed approaches, Binary Paragraph Vector models learn embeddings directly from raw text data. For word-level representations we pr
Related papers
- Variational Deep Semantic Hashing For Text Documents (2017) 12.25
- Text Embeddings For Retrieval From A Large Knowledge Base (2018) 4.52
- A Survey On Deep Text Hashing: Efficient Semantic Text Retrieval With Binary Representation (2025) 3.83
- Ultra-high Dimensional Sparse Representations With Binarization For Efficient Text Retrieval (2021) 8.60
- Learning To Match Using Local And Distributed Representations Of Text For Web Search (2016) 18.09
- What Are You Token About? Dense Retrieval As Distributions Over The Vocabulary (2022) 8.09
- Learning Compressed Sentence Representations For On-device Text Processing (2019) 5.84
- Neural Vector Spaces For Unsupervised Information Retrieval (2017) 12.93