Simple Dialogue System With AUDITED
2021 · Yusuf Tas, Piotr Koniusz
Abstract
We devise a multimodal conversation system for dialogue utterances composed of text, images, or both modalities. We leverage Auxiliary UnsuperviseD vIsual and TExtual Data (AUDITED). To improve performance on the text-based task, we use translations of target sentences from English to French to form assisted supervision. For the image-based task, we employ the DeepFashion dataset, in which we seek nearest-neighbor images of the positive and negative target images of the MMD data. These nearest neighbors form a nearest-neighbor embedding that provides an external context for target images. We propose two methods for creating neighbor embedding vectors, namely Neighbor Embedding by Hard Assignment (NEHA) and Neighbor Embedding by Soft Assignment (NESA), which generate context subspaces per target image. These subspaces are then learnt by our pipeline as a context for the target data. We also propose a discriminator which switches between the image- and text-based tasks. We show improvem
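The abstract names hard and soft assignment but does not define them here, so the following is only a minimal sketch of the general idea, not the authors' NEHA or NESA formulation. It assumes feature vectors are plain lists of floats, that "hard" means the k nearest auxiliary images receive equal weight, and that "soft" means they are weighted by a softmax over negative squared distances. The names `neighbor_weights`, `context_vector`, `k`, and `temperature` are hypothetical.

```python
import math


def neighbor_weights(target, gallery, k=3, soft=False, temperature=1.0):
    """Pick the k nearest gallery vectors and weight them.

    Illustrative assumption, not the paper's definition:
    hard = equal weights, soft = softmax over negative squared distances.
    """
    # Squared Euclidean distance from the target to every gallery vector.
    d2 = [sum((g - t) ** 2 for g, t in zip(row, target)) for row in gallery]
    # Indices of the k closest gallery vectors (fewer if the gallery is small).
    idx = sorted(range(len(gallery)), key=lambda i: d2[i])[:k]
    if soft:
        # Subtract the smallest distance before exponentiating for stability.
        smallest = min(d2[i] for i in idx)
        raw = [math.exp(-(d2[i] - smallest) / temperature) for i in idx]
        total = sum(raw)
        weights = [r / total for r in raw]
    else:
        weights = [1.0 / len(idx)] * len(idx)
    return idx, weights


def context_vector(target, gallery, **kwargs):
    """Weighted average of the selected neighbors, used as external context."""
    idx, weights = neighbor_weights(target, gallery, **kwargs)
    return [
        sum(w * gallery[i][j] for i, w in zip(idx, weights))
        for j in range(len(target))
    ]
```

Under these assumptions, calling `context_vector(target, gallery, k=2)` averages the two closest auxiliary features, while adding `soft=True` lets the closer one dominate. The abstract speaks of context subspaces per target image, so a single averaged vector is a simplification of whatever the pipeline actually learns.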
Related papers
- Dialog-based Interactive Image Retrieval (2018) 0.00
- A Retrieval-based Dialogue System Utilizing Utterance And Context Embeddings (2017) 9.03
- Bootstrapping Disjoint Datasets For Multilingual Multimodal Representation Learning (2019) 0.00
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) 13.17
- MULE: Multimodal Universal Language Embedding (2019) 9.03
- Photochat: A Human-human Dialogue Dataset With Photo Sharing Behavior For Joint Image-text Modeling (2021) 9.92
- MURAL: Multimodal, Multitask Retrieval Across Languages (2021) 0.00
- Interactive Text-to-image Retrieval With Large Language Models: A Plug-and-play Approach (2024) 10.24