Tree-augmented Cross-modal Encoding For Complex-query Video Retrieval
2020 · Xun Yang, Jianfeng Dong, Yixin Cao, et al.
Abstract
The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, and are usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos lie much closer to each other. Despite its simplicity, it forgoes the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method that jointly learns the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the te
Related papers
- Dual Encoding For Video Retrieval By Text (2020) 16.05
- Multimodal Contextualized Support For Enhancing Video Retrieval System (2026) 0.00
- Enhanced Multimodal Video Retrieval System: Integrating Query Expansion And Cross-modal Temporal Event Retrieval (2025) 0.00
- UATVR: Uncertainty-adaptive Text-video Retrieval (2023) 15.46
- Use What You Have: Video Retrieval Using Representations From Collaborative Experts (2019) 0.00
- Multiple Visual-semantic Embedding For Video Retrieval From Query Sentence (2020) 2.26
- Tencent Text-video Retrieval: Hierarchical Cross-modal Interactions With Multi-level Representations (2022) 7.81
- Delving Deeper: Hierarchical Visual Perception For Robust Video-text Retrieval (2026) 1.24