End-to-end Training Of Multimodal Model And Ranking Model
2024 · Xiuqi Deng, Lu Xu, Xiyao Li, et al.
Abstract
Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existing works mainly focus on retrieval models, leaving the multimodal techniques under-utilized. In this paper, we propose an industrial multimodal recommendation framework named EM3: End-to-end training of Multimodal Model and ranking Model, which sufficiently utilizes multimodal information and allows personalized ranking tasks to directly train the core modules in the multimodal model to obtain more task-oriented content features, without overburdening resource consumption. First, we propose Fusion-Q-Former, which consists of transformers and a set of trainable queries, to fuse different modali
Related papers
- Self-supervised Multi-modal Sequential Recommendation (2023) 0.00
- I Know Why You Like This Movie: Interpretable Efficient Multimodal Recommender (2020) 0.00
- Vlm4rec: Multimodal Semantic Representation For Recommendation With Large Vision-language Models (2026) 1.82
- Specializing Joint Representations For The Task Of Product Recommendation (2017) 8.35
- Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) 19.77
- Unified Interactive Multimodal Moment Retrieval Via Cascaded Embedding-reranking And Temporal-aware Score Fusion (2025) 0.00
- Exploiting "quantum-like Interference" In Decision Fusion For Ranking Multimodal Documents (2018) 0.00
- Joint Fusion And Encoding: Advancing Multimodal Retrieval From The Ground Up (2025) 0.00