Embedding-based Retrieval In Multimodal Content Moderation
2025 · Hanzhong Liang, Jinghao Shi, Xiang Shen, et al.
Abstract
Video understanding plays a fundamental role in content moderation on short video platforms, enabling the detection of inappropriate content. While classification remains the dominant approach to content moderation, it often struggles in scenarios that require rapid and cost-efficient responses, such as trend adaptation and urgent escalations. To address this issue, we introduce an Embedding-Based Retrieval (EBR) method designed to complement traditional classification approaches. We first leverage a Supervised Contrastive Learning (SCL) framework to train a suite of foundation embedding models, including both single-modal and multi-modal architectures. Our models outperform established contrastive learning methods such as CLIP and MoCo. Building on these embedding models, we design and implement an embedding-based retrieval system that integrates embedding generation and video retrieval to enable efficient and effective trend handling. Comprehensive offline
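The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes embeddings have already been produced by some model and uses brute-force cosine similarity; the function names `l2_normalize` and `retrieve`, the `top_k` and `min_sim` parameters, and the toy vectors are all hypothetical and are not taken from the paper, whose actual indexing and thresholding choices are not stated here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query_emb, index_embs, top_k=3, min_sim=0.8):
    """Return (index, similarity) pairs for the top_k most similar stored
    embeddings whose cosine similarity to the query is at least min_sim.

    Hypothetical helper: brute-force search stands in for whatever index
    a production system would use.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    index = l2_normalize(np.asarray(index_embs, dtype=np.float64))
    sims = index @ q                      # cosine similarity per stored item
    order = np.argsort(-sims)[:top_k]     # highest similarity first
    return [(int(i), float(sims[i])) for i in order if sims[i] >= min_sim]


if __name__ == "__main__":
    # Toy 2-D "video embeddings"; a real system would use model outputs.
    stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    seed = [1.0, 0.1]  # embedding of a seed example for a new trend
    print(retrieve(seed, stored, top_k=3, min_sim=0.7))
```

In a sketch like this, handling a new trend only requires embedding a few seed examples and querying the stored embeddings, with no classifier retraining, which is the kind of rapid response the abstract contrasts with classification.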
Related papers
- Modality-balanced Embedding For Video Retrieval (2022)
- Verve: Versatile Retrieval For Videos Via Unified Embeddings (2026)
- Clamr: Contextualized Late-interaction For Multimodal Content Retrieval (2025)
- Memory Enhanced Embedding Learning For Cross-modal Video-text Retrieval (2021)
- Multimodal Contextualized Support For Enhancing Video Retrieval System (2026)
- Unified Interactive Multimodal Moment Retrieval Via Cascaded Embedding-reranking And Temporal-aware Score Fusion (2025)
- Rzenembed: Towards Comprehensive Multimodal Retrieval (2025)
- Towards Fast Adaptation Of Pretrained Contrastive Models For Multi-channel Video-language Retrieval (2022)