A Simple Framework for Contrastive Learning of Visual Representations

1. Framework

T(·): the family of data augmentations. Three simple augmentations are applied sequentially: random cropping followed by resizing back to the original size, random color distortion, and random Gaussian blur.

The combination of random crop and color distortion is crucial for achieving good performance.
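A minimal torchvision sketch of T(·); the color-jitter strength s, the application probabilities, and the blur kernel size below are commonly used defaults and should be treated as assumptions:

```python
from torchvision import transforms

def simclr_augment(image_size=224, s=1.0):
    """Stochastic T(.): random crop + resize, random color distortion, random Gaussian blur."""
    color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return transforms.Compose([
        transforms.RandomResizedCrop(image_size),                    # crop, then resize back
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([color_jitter], p=0.8),               # random color distortion
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),   # random Gaussian blur
        transforms.ToTensor(),
    ])

# Two independent draws from T(.) applied to the same image give one positive pair:
# t = simclr_augment(); x_i, x_j = t(img), t(img)
```

The strength s controls how aggressive the color distortion is, which is the knob referred to in the augmentation conclusions below.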

f(·): the encoder. No constraint is placed on the architecture; ResNet-50 is used.

g(·): the projection head, a learnable nonlinear transformation; an MLP with one hidden layer (W → ReLU → W).
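A minimal PyTorch sketch of f(·) plus g(·), assuming a ResNet-50 backbone (2048-d features) and a 128-d projection output; the dimensions are assumptions:

```python
import torch.nn as nn
from torchvision.models import resnet50

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = resnet50()
        feat_dim = backbone.fc.in_features      # 2048 for ResNet-50
        backbone.fc = nn.Identity()             # drop the supervised classification layer
        self.encoder = backbone                 # f(.)
        self.projection = nn.Sequential(        # g(.): W -> ReLU -> W
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)        # representation kept for downstream tasks
        z = self.projection(h)     # representation fed to the contrastive loss
        return h, z
```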

Loss function: NT-Xent (the normalized temperature-scaled cross-entropy loss), where τ denotes a temperature parameter. The loss decreases as the similarity of the positive pair increases, and increases as the similarity to negative pairs increases.
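For a positive pair (i, j) in a batch of N images (2N augmented views), the loss is

$$\ell_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)}, \qquad \mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert},$$

and the total loss averages ℓ over all positive pairs, both (i, j) and (j, i).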

Evaluation protocol: the linear evaluation protocol, where a linear classifier is trained on top of the frozen base network, and test accuracy is used as a proxy for representation quality.
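A minimal sketch of linear evaluation, assuming the SimCLRModel above; the loader, number of classes, epochs, and learning rate are placeholders:

```python
import torch
import torch.nn as nn

def linear_eval(model, train_loader, num_classes=1000, epochs=90, lr=0.1, device="cuda"):
    """Train a linear classifier on frozen features h = f(x); the encoder is never updated."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                       # freeze the base network
    clf = nn.Linear(2048, num_classes).to(device)     # 2048-d ResNet-50 features assumed
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                h, _ = model(x)                       # use h, i.e. the layer before g(.)
            loss = criterion(clf(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf                                        # test accuracy of clf is the proxy metric
```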

2. Experiments

Data augmentation

 

 

Conclusions:

1. No single transformation suffices to learn good representations.

2. Color histograms alone can suffice to distinguish images; neural networks may exploit this shortcut to solve the predictive task.

3. It is critical to compose cropping with color distortion in order to learn generalizable features.

4. Unsupervised contrastive learning benefits from stronger (color) data augmentation than supervised learning.

 

Architecture of the encoder and head

Conclusion:

The gap between supervised models and linear classifiers trained on unsupervised models shrinks as the model size increases, suggesting that unsupervised learning benefits more from bigger models than its supervised counterpart.

 

Projection head

The hidden layer before the projection head (h) is a better representation than the layer after it (z).

We conjecture that the importance of using the representation before the nonlinear projection is due to loss of information induced by the contrastive loss. In particular, z = g(h) is trained to be invariant to data transformation. Thus, g can remove information that may be useful for the downstream task, such as the color or orientation of objects.

Loss Functions 

 

Conclusions:

1) L2 normalization (i.e., cosine similarity) together with the temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives (see the NT-Xent sketch after this list).

2) Unlike cross-entropy, other objective functions do not weigh the negatives by their relative hardness. (As a result, one must apply semi-hard negative mining (Schroff et al., 2015) for these loss functions: instead of computing the gradient over all loss terms, one computes the gradient using semi-hard negative terms, i.e., those that are within the loss margin and closest in distance, but farther away than the positive examples.)
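A minimal PyTorch sketch of NT-Xent illustrating both points: the L2 normalization and temperature scaling, and the fact that the softmax cross-entropy over all negatives weights them by hardness automatically, so no explicit negative mining is needed. Function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent over a batch of N positive pairs (2N augmented views in total)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # L2 normalize -> cosine similarity
    sim = z @ z.t() / temperature                        # (2N, 2N) scaled similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity
    # view i of z1 is positive with view i of z2 (and vice versa); all others are negatives
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    # softmax cross-entropy weights every negative by its relative hardness
    return F.cross_entropy(sim, targets)
```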

Batch Size

1) We find that, when the number of training epochs is small (e.g., 100 epochs), larger batch sizes have a significant advantage over smaller ones.

2) With more training steps/epochs, the gaps between different batch sizes decrease or disappear, provided the batches are randomly resampled.

Explanation:

1) Larger batch sizes provide more negative examples, facilitating convergence (i.e., fewer epochs and steps are needed to reach a given accuracy).

2) Training longer also provides more negative examples, improving the results.

 

Comparison with State-of-the-art

Linear evaluation.

Semi-supervised learning

Transfer learning
