DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
2017 · Sheng Chen, Akshay Soni, Aasish Pappu, et al.
Abstract
We refer to the task of labeling news articles or blog posts with relevant tags from a predefined collection as document tagging. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple \(k\)-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec deals directly with raw text instead of provided feature vectors and, in addition, offers advantages such as learning tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we cond
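The prediction step described in the abstract, a \(k\)-nearest neighbor search over tags in the joint vector space, can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are made-up toy values, the function names `cosine` and `predict_tags` are hypothetical, and the use of cosine similarity as the distance measure is an assumption, since the abstract does not state which metric is used. The step that embeds an unseen document into the joint space is also not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose embeddings are most similar to doc_vec."""
    ranked = sorted(
        tag_vecs,
        key=lambda tag: cosine(doc_vec, tag_vecs[tag]),
        reverse=True,
    )
    return ranked[:k]


# Toy, made-up embeddings standing in for learned tag vectors.
tag_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.0, 0.9, 0.2],
    "finance": [0.1, 0.3, 0.9],
}

# Toy vector standing in for the embedding of an unseen document.
doc_vec = [0.8, 0.3, 0.1]

print(predict_tags(doc_vec, tag_vecs, k=2))
```

Because prediction in this sketch is only a lookup over tag vectors, a newly created tag could take part in it as soon as a vector for that tag is added to `tag_vecs`. How that vector would be learned is outside the scope of the sketch.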
Related papers
- Learning To Hash-tag Videos With Tag2vec (2016) 3.58
- Vlm2vec-v2: Advancing Multimodal Embedding For Videos, Images, And Visual Documents (2025) 0.00
- Vector Representations Of Text Data In Deep Learning (2019) 0.00
- Weakly Supervised Deep Image Hashing Through Tag Embeddings (2018) 11.29
- Simpledoc: Multi-modal Document Understanding With Dual-cue Page Retrieval And Iterative Refinement (2025) 5.50
- Tagging Before Alignment: Integrating Multi-modal Tags For Video-text Retrieval (2023) 10.74
- Vlm2vec: Training Vision-language Models For Massive Multimodal Embedding Tasks (2024) 0.00
- Multi-view Document Representation Learning For Open-domain Dense Retrieval (2022) 10.21