Paper Reading: Cross and Learn: Cross-Modal Self-supervision

Contents

Contributions

Method

Cross-Modal Loss

Diversity Loss

Combining Both Loss Contributions

Results

Paper: Cross and Learn: Cross-Modal Self-supervision (GCPR 2018: German Conference on Pattern Recognition)

Authors: Nawid Sayed, Biagio Brattoli, and Björn Ommer

Link: https://link.springer.com/chapter/10.1007/978-3-030-12939-2_17


Contributions

In this paper, we use cross-modal information as an alternative source of supervision and propose a new method to effectively exploit mutual information in order to train powerful feature representations for both modalities. The main motivation of our approach derives from the following observation: information shared across modalities has a much higher semantic meaning than modality-specific information. Our goal is therefore to obtain feature representations that are sensitive to cross-modal information while being invariant to modality-specific content. These conditions are fulfilled by feature representations that are similar within a pair and dissimilar across different pairs. To that end we utilize a trainable two-stream architecture with one network per modality, similar to the classic two-stream network, as the backbone of the proposed framework. To achieve the former we propose a cross-modal loss L_cross, and to achieve the latter we utilize a diversity loss L_div; both act directly in feature space and thus promise better training signals.


Method

Our method requires paired data from two different modalities, x ∈ X and y ∈ Y, which is available in most use cases, e.g. RGB frames and optical flow. We utilize a two-stream architecture with trainable CNNs in order to obtain our feature representations f(x) and g(y). With the exception of the first layer, the networks share the same architecture but do not share weights. To calculate both loss contributions we need a tuple of pairs (x_i, y_i) and (x_j, y_j) from our dataset.
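As a minimal sketch of this setup, assuming PyTorch and an AlexNet-style backbone (the concrete layer stack, channel counts, and feature dimension here are my assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int, feat_dim: int = 4096) -> nn.Sequential:
    # One CNN stream; only the first conv differs between modalities.
    return nn.Sequential(
        nn.Conv2d(in_channels, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(384, feat_dim), nn.ReLU(inplace=True),  # ReLU keeps features non-negative
    )

# f operates on RGB frames (3 channels); g on stacked optical flow
# (e.g. 10 frames of x/y flow -> 20 channels). Same architecture, no weight sharing.
f = make_stream(in_channels=3)
g = make_stream(in_channels=20)
```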


Cross-Modal Loss

In order to enforce cross-modal similarity between f and g, we enforce the feature representations of a pair to be close in feature space with respect to some distance d. Solving this task requires the networks to ignore information that is present only in x or only in y.

We utilize the bounded cosine distance for d, which is given by

$$d(v, w) = \frac{1}{2}\left(1 - \frac{\langle v, w \rangle}{\lVert v \rVert_2 \, \lVert w \rVert_2}\right) \in [0, 1].$$

For a tuple of pairs $(x_i, y_i)$ and $(x_j, y_j)$, the cross-modal loss averages the within-pair distances:

$$\mathcal{L}_{\text{cross}} = \frac{1}{2}\Big[\, d\big(f(x_i), g(y_i)\big) + d\big(f(x_j), g(y_j)\big) \Big].$$
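As a concrete illustration, here is a minimal PyTorch sketch of the distance and the cross-modal loss (the names `bounded_cosine_distance` and `cross_modal_loss` and the batched tensor layout are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def bounded_cosine_distance(v: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # d(v, w) = (1 - cos(v, w)) / 2, bounded in [0, 1]; batched over dim 0.
    return 0.5 * (1.0 - F.cosine_similarity(v, w, dim=1))

def cross_modal_loss(fx_i, gy_i, fx_j, gy_j):
    # Pull the two modalities of each pair together in feature space.
    return 0.5 * (bounded_cosine_distance(fx_i, gy_i)
                  + bounded_cosine_distance(fx_j, gy_j)).mean()
```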

Diversity Loss

We obtain diversity by enforcing the feature representations of both modalities to be distant across pairs, with respect to the same distance d as before. This spreads the features of different pairs apart in feature space. Due to the cross-modal loss, these features mostly encode cross-modal information, which ensures feature representations that are sensitive to this content. The distance across pairs therefore contributes negatively to the loss:

$$\mathcal{L}_{\text{div}} = -\frac{1}{2}\Big[\, d\big(f(x_i), f(x_j)\big) + d\big(g(y_i), g(y_j)\big) \Big].$$
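Under the same assumptions as above, the diversity term is the negated cross-pair distance within each modality (again a sketch, with a hypothetical function name):

```python
def diversity_loss(fx_i, fx_j, gy_i, gy_j):
    # Push features of DIFFERENT pairs apart, separately per modality.
    # The negative sign makes larger cross-pair distances lower the loss.
    return -0.5 * (bounded_cosine_distance(fx_i, fx_j)
                   + bounded_cosine_distance(gy_i, gy_j)).mean()
```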

Combining Both Loss Contributions

Given our observations, we weight both loss contributions equally, which yields our final loss:

$$\mathcal{L} = \mathcal{L}_{\text{cross}} + \mathcal{L}_{\text{div}}.$$
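Reusing `f`, `g`, and the loss helpers from the sketches above, one hypothetical optimization step on a sampled tuple of pairs could look like this (batch size, input resolution, and optimizer settings are illustrative assumptions):

```python
def total_loss(x_i, y_i, x_j, y_j):
    fx_i, fx_j = f(x_i), f(x_j)   # RGB stream features
    gy_i, gy_j = g(y_i), g(y_j)   # optical-flow stream features
    return (cross_modal_loss(fx_i, gy_i, fx_j, gy_j)
            + diversity_loss(fx_i, fx_j, gy_i, gy_j))

opt = torch.optim.SGD(list(f.parameters()) + list(g.parameters()), lr=1e-3)
x_i, x_j = torch.randn(8, 3, 227, 227), torch.randn(8, 3, 227, 227)    # RGB frames
y_i, y_j = torch.randn(8, 20, 227, 227), torch.randn(8, 20, 227, 227)  # flow stacks
loss = total_loss(x_i, y_i, x_j, y_j)
opt.zero_grad(); loss.backward(); opt.step()
```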

Results

The paper evaluates the learned representations by transfer learning: action recognition on UCF-101 and HMDB-51 and classification/detection on PASCAL VOC 2007, where the method compares favorably with prior self-supervised approaches; see the linked paper for the full result tables.
