Core idea
An earlier version of this work already existed last year, but its results were not yet strong.
MoCo v2 combines MoCo v1 with the ideas of SimCLR v1, taking the best of both, and outperforms SimCLR across the board.
It absorbs two key improvements from SimCLR:
- using an MLP projection head
- more data augmentation
- no SimCLR-style extra-large batch size needed; a typical 8-GPU machine suffices for training
- code is open-sourced
Introduction
Recent progress in unsupervised learning stems mainly from contrastive learning. Compared with SimCLR, MoCo v2 offers:
- a smaller batch size
- less resource consumption
- better results
In contrast to SimCLR’s large 4k∼8k batches, which require TPU support, our “MoCo v2” baselines can run on a typical 8-GPU machine and achieve better results than SimCLR.
2. Background
Contrastive learning
Each image is treated as its own class; augmented views of the same image belong to the same class.
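This instance-discrimination objective is trained with the InfoNCE contrastive loss: pull a query toward its positive key and push it away from negative keys. A minimal NumPy sketch for a single query (function name and toy dimensions are hypothetical; vectors are assumed L2-normalized):

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.2):
    """InfoNCE loss for one query.
    q: (d,) query embedding; k_pos: (d,) positive key;
    k_negs: (N, d) negative keys; tau: temperature."""
    l_pos = q @ k_pos / tau                     # similarity to the positive
    l_neg = k_negs @ q / tau                    # similarities to the negatives
    logits = np.concatenate(([l_pos], l_neg))
    # cross-entropy with the positive at index 0:
    # -log( exp(l_pos) / sum_j exp(logits_j) )
    return np.log(np.sum(np.exp(logits))) - l_pos
```

When the query aligns with its positive key the loss is near zero; when it aligns with a negative instead, the loss grows.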
Contrastive learning methods can be divided into three mechanisms according to how the keys are stored:
- In an end-to-end mechanism (Fig. 1a) [13, 8, 17, 1, 9, 2], the negative keys are from the same batch and updated end-to-end by back-propagation.
- SimCLR [2] is based on this mechanism and requires a large batch to provide a large set of negatives.
- In the MoCo mechanism (Fig. 1b) [6], the negative keys are maintained in a queue, and only the queries and positive keys are encoded in each training batch.
SimCLR [2] improves the end-to-end variant of instance discrimination in three aspects:
(i) a substantially larger batch (4k or 8k) that can provide more negative samples;
(ii) replacing the output fc projection head [16] with an MLP head;
(iii) stronger data augmentation.
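On point (iii), the main augmentation MoCo v2 adds on top of the original recipe is Gaussian blur. A minimal NumPy sketch of a separable Gaussian blur on one image channel (function name, kernel radius, and sigma are illustrative, not the paper's exact settings):

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur on a 2-D array (one image channel)."""
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-xs**2 / (2 * sigma**2))
    kernel /= kernel.sum()                    # normalize so brightness is preserved
    # convolve rows, then columns (separability of the Gaussian kernel)
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode='same'), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode='same'), 0, rows)
```

In practice one would use a library transform (e.g. a torchvision-style `GaussianBlur`) applied with some probability per view; the sketch only shows the operation itself.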
3. Experiments
Settings.
Unsupervised learning is conducted on the 1.28M ImageNet [3] training set.
We follow two common protocols for evaluation.
(i) ImageNet linear classification: features are frozen and a supervised linear classifier is trained; we report 1-crop (224×224), top-1 validation accuracy.
(the feature extraction layers are frozen)
(ii) Transferring to VOC object detection [4]: a Faster R-CNN detector [14] (C4-backbone) is fine-tuned end-to-end on the VOC 07+12 trainval set and evaluated on the VOC 07 test set using the COCO suite of metrics [10].
MLP head: the fc head is replaced by a 2-layer MLP (hidden layer 2048-d, with ReLU). This head is used only during unsupervised pre-training; the linear classification or transferring stage does not use this MLP head.
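The projection head above is just fc → ReLU → fc. A minimal NumPy sketch of its forward pass (weight names and the 128-d output are illustrative; the 2048-d sizes follow the note):

```python
import numpy as np

def mlp_head(x, W1, b1, W2, b2):
    """2-layer MLP projection head: fc -> ReLU -> fc.
    x: (batch, 2048) backbone features; hidden layer is 2048-d."""
    h = np.maximum(x @ W1 + b1, 0.0)   # hidden layer with ReLU
    return h @ W2 + b2                  # projected embedding (e.g. 128-d)
```

At evaluation time the head is discarded and the linear classifier (or detector) sits directly on the frozen 2048-d backbone features.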