MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

Motivation:

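Unsupervised representation learning is highly successful in natural language processing, but supervised pre-training is still dominant in computer vision. The reason may stem from differences in their respective signal spaces. Language tasks have discrete signal spaces. In computer vision, in contrast, the raw signal lies in a continuous, high-dimensional space and is not structured.
【CC】Unsupervised learning has been a major success in NLP but has made little headway in CV. The authors suggest this may be because the signal spaces of the two fields differ so much: NLP works with discrete, low-dimensional, structured signals, while CV works with continuous, high-dimensional, unstructured signals.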

Significance:

These results show that MoCo largely closes the gap between unsupervised and supervised representation learning in many computer vision tasks.
【CC】The method in this paper largely closes the gap between supervised and unsupervised learning in CV.

Background: contrastive learning as dictionary look-up
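Though driven by various motivations, these methods can be thought of as building dynamic dictionaries. The "keys" (tokens) in the dictionary are sampled from data (e.g., images or patches) and are represented by an encoder network. Unsupervised learning trains encoders to perform dictionary look-up: an encoded "query" should be similar to its matching key and dissimilar to others. Learning is formulated as minimizing a contrastive loss.
【CC】The authors view contrastive learning as essentially "build a dictionary, then look things up in it". A key, analogous to a token in NLP, is produced by an encoder (a learned neural network) from an image or an image patch. Suppose the encoder is already trained; a dictionary look-up then works as follows: given an encoded "query" (the image to be identified), the query should be closer to its positive sample and farther from the negative samples (much like triplet loss). The whole process amounts to learning an encoder that minimizes this contrastive loss.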
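
To make the dictionary look-up view concrete, the sketch below computes an InfoNCE-style contrastive loss for one query against a small dictionary of keys. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `contrastive_lookup_loss`, the toy vectors, the use of cosine similarity, and the temperature value 0.07 are all chosen here for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_lookup_loss(query, keys, positive_index, temperature=0.07):
    """InfoNCE-style loss: softmax cross-entropy over query-key similarities.

    The loss is low when the query is more similar to keys[positive_index]
    than to every other key in the dictionary.
    """
    q = normalize(query)
    # One logit per dictionary entry: cosine similarity scaled by temperature.
    logits = [dot(q, normalize(k)) / temperature for k in keys]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]


# Toy example (hypothetical vectors): key 0 is close to the query, key 2 is opposite.
query = [1.0, 0.0]
keys = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

loss_match = contrastive_lookup_loss(query, keys, positive_index=0)
loss_mismatch = contrastive_lookup_loss(query, keys, positive_index=2)
print(loss_match, loss_mismatch)
```

The loss is small when the designated positive key is the one most similar to the query and large otherwise, which is the signal that pushes the encoder toward a correct look-up.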

Contrastive learning [29], and its recent developments, can be thought of as training an encoder for a dictionary look-up task. Consider an encoded query q and a set of encoded samples {k0, k1, k2, …} that are the keys of a dictionary. Assume that there is a single key (denot
