The difference between cross-modal retrieval and multi-modal retrieval

Unlike unimodal retrieval, in cross-modal retrieval the modality of the retrieved results differs from that of the query. For example, a user may use an image to retrieve text, video, or audio. The key to cross-modal retrieval is modelling the relationships between different modalities, and the main difficulty is bridging the semantic gap. However, when the documents to be retrieved themselves contain multiple modalities, ordinary cross-modal methods cannot be applied directly; that is the setting of multi-modal retrieval.
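To make "modelling the relationship between modalities" concrete, here is a minimal sketch (my own illustration, not taken from the cited paper): image and text features are projected into a shared subspace by two linear maps and candidates are ranked by cosine similarity. `W_img` and `W_txt` are hypothetical stand-ins for projections that would normally be learned from paired data.

```python
# Minimal sketch of common-subspace cross-modal retrieval (image -> text).
# W_img / W_txt stand in for projection matrices learned offline from paired data.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_search(query_img_feat, doc_txt_feats, W_img, W_txt, top_k=5):
    """Rank text documents for an image query by cosine similarity in the shared space."""
    q = l2_normalize(query_img_feat @ W_img)   # project the query image
    d = l2_normalize(doc_txt_feats @ W_txt)    # project the candidate texts
    scores = d @ q                             # cosine similarity per document
    return np.argsort(-scores)[:top_k]         # indices of the best matches

# Toy usage with random features (4096-d image feature, 300-d text feature).
rng = np.random.default_rng(0)
W_img = rng.normal(size=(4096, 128))           # assumed to be learned in advance
W_txt = rng.normal(size=(300, 128))
query = rng.normal(size=4096)
docs = rng.normal(size=(1000, 300))
print(cross_modal_search(query, docs, W_img, W_txt))
```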

Multi-modal retrieval methods handle multimedia data that carries several modalities: both the query and the documents to be retrieved may contain more than one modality. Multi-modal retrieval can also be used to improve the accuracy of unimodal retrieval. The main differences from cross-modal retrieval are that, in multi-modal retrieval, the query and the retrieved documents must share at least one modality, and multi-modal methods usually fuse the different modalities for retrieval rather than modelling the relationships between them (a simple fusion sketch follows below). For example, in many multi-modal image retrieval systems the query image comes with associated text, and the images to be retrieved also carry textual information. If the query and the documents share no modality at all, the problem becomes a cross-modal one, and traditional multi-modal methods cannot handle it.
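The sketch below (again an illustrative assumption, not the paper's method) shows what fusing modalities can look like: each item's image and text features are combined by simple weighted early fusion (normalised concatenation) and matching happens in the fused space. The `fuse` helper and its weights are hypothetical.

```python
# Minimal sketch of multi-modal retrieval via early fusion: both the query and
# every document carry an image feature and a text feature.
import numpy as np

def fuse(img_feat, txt_feat, w_img=0.5, w_txt=0.5):
    """Weighted early fusion: concatenate the L2-normalised unimodal features."""
    img = img_feat / (np.linalg.norm(img_feat) + 1e-8)
    txt = txt_feat / (np.linalg.norm(txt_feat) + 1e-8)
    return np.concatenate([w_img * img, w_txt * txt])

def multimodal_search(query, docs, top_k=5):
    """query: (img_feat, txt_feat); docs: list of (img_feat, txt_feat) pairs."""
    q = fuse(*query)
    scores = np.array([fuse(*d) @ q for d in docs])   # dot product in fused space
    return np.argsort(-scores)[:top_k]
```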


Translated from the paper "A Semantic Model for Cross-Modal and Multi-Modal Retrieval".

Adversarial cross-media retrieval (abstract of the ACMR paper):

Cross-modal retrieval aims to enable flexible retrieval experience across different modalities (e.g., texts vs. images). The core of cross-modal retrieval research is to learn a common subspace where the items of different modalities can be directly compared to each other. In this paper, we present a novel Adversarial Cross-Modal Retrieval (ACMR) method, which seeks an effective common subspace based on adversarial learning. Adversarial learning is implemented as an interplay between two processes. The first process, a feature projector, tries to generate a modality-invariant representation in the common subspace and to confuse the other process, a modality classifier, which tries to discriminate between different modalities based on the generated representation. We further impose triplet constraints on the feature projector in order to minimize the gap among the representations of all items from different modalities with the same semantic labels, while maximizing the distances among semantically different images and texts. Through the joint exploitation of the above, the underlying cross-modal semantic structure of multimedia data is better preserved when this data is projected into the common subspace. Comprehensive experimental results on four widely used benchmark datasets show that the proposed ACMR method is superior in learning effective subspace representation and that it significantly outperforms the state-of-the-art cross-modal retrieval methods.
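To make the adversarial interplay described in the abstract more concrete, here is a heavily simplified PyTorch-style sketch. The module names, layer sizes, loss weights, and the negative-sampling shortcut are all illustrative assumptions and do not reproduce the authors' exact ACMR architecture.

```python
# Simplified sketch of the ACMR-style interplay: feature projectors try to make
# image/text representations indistinguishable, a modality classifier tries to
# tell them apart, and a triplet-style term keeps matched pairs close.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureProjector(nn.Module):
    """Maps image or text features into the common subspace."""
    def __init__(self, in_dim, common_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, common_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class ModalityClassifier(nn.Module):
    """Predicts whether a common-space vector came from an image or a text."""
    def __init__(self, common_dim=128):
        super().__init__()
        self.net = nn.Linear(common_dim, 2)
    def forward(self, z):
        return self.net(z)

def acmr_losses(img_z, txt_z, clf, margin=0.2):
    # Adversarial part: the modality classifier tries to tell images from texts,
    # while the feature projectors are trained to confuse it (maximise this loss).
    logits = clf(torch.cat([img_z, txt_z]))
    modality = torch.cat([torch.zeros(len(img_z)), torch.ones(len(txt_z))]).long()
    adv_loss = F.cross_entropy(logits, modality)
    # Triplet-style part: a matched image/text pair should be closer than a
    # mismatched pair by at least `margin`. Rolling the batch is a crude stand-in
    # for sampling negatives with a different semantic label.
    pos = (img_z - txt_z).pow(2).sum(dim=1)
    neg = (img_z - txt_z.roll(1, dims=0)).pow(2).sum(dim=1)
    trip_loss = F.relu(pos - neg + margin).mean()
    return adv_loss, trip_loss

# In training, the classifier would minimise adv_loss while the projectors
# minimise trip_loss - adv_loss, e.g. via alternating updates or gradient reversal.
```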