A Simple Framework for Contrastive Learning of Visual Representations

1. Framework

T(·): the family of data augmentations. Three simple augmentations are applied sequentially: random cropping followed by resizing back to the original size, random color distortion, and random Gaussian blur.

The combination of random crop and color distortion is crucial for achieving good performance.
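A minimal torchvision sketch of T(·); the color-jitter strength s, the application probabilities, and the blur kernel size below are commonly used defaults and should be treated as assumptions:

```python
from torchvision import transforms

def simclr_augment(image_size=224, s=1.0):
    """Stochastic T(.): random crop + resize, random color distortion, random Gaussian blur."""
    color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return transforms.Compose([
        transforms.RandomResizedCrop(image_size),                    # crop, then resize back
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([color_jitter], p=0.8),               # random color distortion
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),   # random Gaussian blur
        transforms.ToTensor(),
    ])

# Two independent draws from T(.) applied to the same image give one positive pair:
# t = simclr_augment(); x_i, x_j = t(img), t(img)
```

The strength s controls how aggressive the color distortion is, which is the knob referred to in the augmentation conclusions below.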

f(·): the encoder. No constraint is placed on the architecture; ResNet-50 is used.

g(·): the projection head, a learnable nonlinear transformation; an MLP with one hidden layer (W → ReLU → W).
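A minimal PyTorch sketch of f(·) plus g(·), assuming a ResNet-50 backbone (2048-d features) and a 128-d projection output; the dimensions are assumptions:

```python
import torch.nn as nn
from torchvision.models import resnet50

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = resnet50()
        feat_dim = backbone.fc.in_features      # 2048 for ResNet-50
        backbone.fc = nn.Identity()             # drop the supervised classification layer
        self.encoder = backbone                 # f(.)
        self.projection = nn.Sequential(        # g(.): W -> ReLU -> W
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)        # representation kept for downstream tasks
        z = self.projection(h)     # representation fed to the contrastive loss
        return h, z
```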

Loss function: NT-Xent (the normalized temperature-scaled cross-entropy loss), where τ denotes a temperature parameter. The loss decreases as the similarity of the positive pair increases, and increases as the similarity to negative pairs increases.
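For a positive pair (i, j) in a batch of N images (2N augmented views), the loss is

$$\ell_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)}, \qquad \mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert},$$

and the total loss averages ℓ over all positive pairs, both (i, j) and (j, i).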

Evaluation protocol: the linear evaluation protocol, where a linear classifier is trained on top of the frozen base network, and test accuracy is used as a proxy for representation quality.
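A minimal sketch of linear evaluation, assuming the SimCLRModel above; the loader, number of classes, epochs, and learning rate are placeholders:

```python
import torch
import torch.nn as nn

def linear_eval(model, train_loader, num_classes=1000, epochs=90, lr=0.1, device="cuda"):
    """Train a linear classifier on frozen features h = f(x); the encoder is never updated."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                       # freeze the base network
    clf = nn.Linear(2048, num_classes).to(device)     # 2048-d ResNet-50 features assumed
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                h, _ = model(x)                       # use h, i.e. the layer before g(.)
            loss = criterion(clf(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf                                        # test accuracy of clf is the proxy metric
```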

2. Experiments

Data augmentation

 

 

Conclusions:

1. No single transformation suffices to learn good representations.

2. Color histograms alone can suffice to distinguish images; neural networks may exploit this shortcut to solve the predictive task.

3. It is critical to compose cropping with color distortion in order to learn generalizable features.

4. Unsupervised contrastive learning benefits from stronger (color) data augmentation than supervised learning.

 

Architecture of the encoder and head

Conclusion:

The gap between supervised models and linear classifiers trained on unsupervised models shrinks as the model size increases, suggesting that unsupervised learning benefits more from bigger models than its supervised counterpart.

 

Projection head

The hidden layer before the projection head (h) is a better representation than the layer after it (z).

We conjecture that the importance of using the representation before the nonlinear projection is due to loss of information induced by the contrastive loss. In particular, z = g(h) is trained to be invariant to data transformation. Thus, g can remove information that may be useful for the downstream task, such as the color or orientation of objects.

Loss Functions 

 

Conclusions:

1) L2 normalization (i.e., cosine similarity) together with the temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives (see the NT-Xent sketch after this list).

2) Unlike cross-entropy, other objective functions do not weigh the negatives by their relative hardness. (As a result, one must apply semi-hard negative mining (Schroff et al., 2015) for these loss functions: instead of computing the gradient over all loss terms, one computes the gradient using semi-hard negative terms, i.e., those that are within the loss margin and closest in distance, but farther away than the positive examples.)
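A minimal PyTorch sketch of NT-Xent illustrating both points: the L2 normalization and temperature scaling, and the fact that the softmax cross-entropy over all negatives weights them by hardness automatically, so no explicit negative mining is needed. Function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent over a batch of N positive pairs (2N augmented views in total)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # L2 normalize -> cosine similarity
    sim = z @ z.t() / temperature                        # (2N, 2N) scaled similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity
    # view i of z1 is positive with view i of z2 (and vice versa); all others are negatives
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    # softmax cross-entropy weights every negative by its relative hardness
    return F.cross_entropy(sim, targets)
```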

Batch Size

1) We find that, when the number of training epochs is small (e.g., 100 epochs), larger batch sizes have a significant advantage over smaller ones.

2) With more training steps/epochs, the gaps between different batch sizes decrease or disappear, provided the batches are randomly resampled.

Explanation:

1) Larger batch sizes provide more negative examples, facilitating convergence (i.e., fewer epochs and steps are needed to reach a given accuracy).

2) Training longer also provides more negative examples, improving the results.

 

Comparison with State-of-the-art

Linear evaluation.

Semi-supervised learning

Transfer learning
