Cross-modal Retrieval

Cross-modal retrieval aims to retrieve relevant items whose modality differs from that of the query, for example retrieving images with a text query.

Four Challenges:

1. Representation

2. Translation

3. Alignment

4. Co-learning

Challenge: The main challenge is to measure the similarity between data of different modalities.

Approach: Map images and texts into a shared latent space F in which they can be compared.
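As a rough illustration of comparing items in a shared space, the sketch below projects a text query and several images into F with two linear maps and ranks the images by cosine similarity. The projection matrices `W_TXT` and `W_IMG`, the toy feature vectors, and the choice of linear maps with cosine similarity are all assumptions made for the example. In practice the maps would be learned, and nothing here is taken from a specific method.

```python
import math


def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def retrieve(query_text, images, w_txt, w_img):
    """Rank image indices by similarity to the text query inside F."""
    q = project(w_txt, query_text)
    scored = [(cosine(q, project(w_img, img)), idx)
              for idx, img in enumerate(images)]
    return [idx for _, idx in sorted(scored, key=lambda p: p[0], reverse=True)]


# Hypothetical projections: 3-d text features and 2-d image features
# are both mapped into a 2-d shared space F.
W_TXT = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
W_IMG = [[0.0, 1.0],
         [1.0, 0.0]]

IMAGES = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # toy image features
QUERY = [1.0, 0.0, 5.0]                        # toy text features

RANKING = retrieve(QUERY, IMAGES, W_TXT, W_IMG)
print(RANKING)
```

The raw features of the two modalities have different dimensions and cannot be compared directly. Only after projection into F does a single similarity function apply to both.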

Two alignment strategies:

1) Global alignment methods map each modality's manifold into F such that semantically similar regions share the same directions in F.

2) Local metric learning approaches map each modality's manifold such that semantically similar items are a short distance apart in F.
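To make the local metric learning idea concrete, here is a minimal sketch of a triplet margin loss, one common way to pull semantically similar items together in F. The notes above do not name a particular loss, so the triplet formulation, the Euclidean distance, and the `margin` value are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embeddings in F."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative
    by at least `margin`; otherwise grows with the size of the violation."""
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)


# Anchor: an embedded text. Positive: its matching image. Negative: an unrelated image.
anchor = [0.0, 0.0]
satisfied = triplet_loss(anchor, positive=[0.0, 1.0], negative=[3.0, 4.0])
violated = triplet_loss(anchor, positive=[3.0, 4.0], negative=[0.0, 1.0])
print(satisfied, violated)
```

In the first call the matching image is already much closer than the unrelated one, so there is nothing left to correct. In the second call the ordering is reversed and the loss is positive, which during training would push the embeddings toward the desired layout.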

 

 

Multimodal alignment faces a number of difficulties:

1) there are few datasets with explicitly annotated alignments;

2) it is difficult to design similarity metrics between modalities;

3) there may exist multiple possible alignments, and not all elements in one modality have correspondences in another.
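The third difficulty can be seen in a small sketch that thresholds a cross-modal similarity matrix. The matrix values, the `threshold`, and the interpretation of rows as words and columns as image regions are hypothetical and serve only to show that one element may have several matches or none.

```python
def align(similarity, threshold=0.5):
    """Map each row element to every column element whose score meets the threshold."""
    return {
        row: [col for col, score in enumerate(scores) if score >= threshold]
        for row, scores in enumerate(similarity)
    }


# Rows: words in a caption. Columns: regions in an image (toy scores).
SIM = [
    [0.9, 0.1, 0.8],  # word 0 is similar to regions 0 and 2
    [0.2, 0.7, 0.1],  # word 1 is similar to region 1 only
    [0.1, 0.3, 0.2],  # word 2 has no sufficiently similar region
]

ALIGNMENT = align(SIM)
print(ALIGNMENT)
```

A strict one-to-one matching would have to discard one of the two valid matches for word 0 and force a spurious match on word 2, which is why alignment methods usually need to tolerate both cases.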
