Contrastive Learning Study Notes
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
Google Research, Brain Team
paper; code: TensorFlow
Contributions:
- Introduces SimCLR, which requires neither specialized architectures nor a memory bank;
- The composition of data augmentations is crucial;
- A learnable nonlinear transformation between the representation and the contrastive loss is needed;
- Contrastive cross-entropy learning requires normalized embeddings and an appropriately tuned temperature parameter;
- Compared with supervised learning, contrastive learning needs larger batch sizes and more training steps.
Results: on ImageNet, SimCLR outperforms previous self-supervised and semi-supervised methods, reaching 76.5% top-1 accuracy (a 7% improvement) and matching the performance of a supervised ResNet-50. Fine-tuned on only 1% of the labels, it achieves 85.5% top-5 accuracy, outperforming AlexNet.
Architecture:
The framework comprises the following four major components:
- A stochastic data augmentation module: transforms each image into two correlated views by applying three augmentations in sequence: random cropping with resize back to the original size, random color distortion, and random Gaussian blur.
- A neural network base encoder f(·): extracts representation vectors from the augmented views.
- A small neural network projection head g(·): maps representations to the space where the contrastive loss is applied. Nonlinear!!!
- A contrastive loss function: NT-Xent (the normalized temperature-scaled cross entropy loss), where sim(·,·) denotes cosine similarity between vectors:
$$\ell_{i, j}=-\log \frac{\exp \left(\operatorname{sim}\left(\boldsymbol{z}_{i}, \boldsymbol{z}_{j}\right) / \tau\right)}{\sum_{k=1}^{2 N} \mathbb{1}_{[k \neq i]} \exp \left(\operatorname{sim}\left(\boldsymbol{z}_{i}, \boldsymbol{z}_{k}\right) / \tau\right)}$$

$$\mathcal{L}=\frac{1}{2 N} \sum_{k=1}^{N}\left[\ell(2 k-1,2 k)+\ell(2 k, 2 k-1)\right]$$
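The loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' TensorFlow implementation: it assumes `z` is a `(2N, d)` array in which rows `2k` and `2k+1` are the two augmented views of the same image, and `tau` is the temperature.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over a batch of 2N paired embeddings (rows 2k, 2k+1 are a pair)."""
    # L2-normalize so the dot product equals cosine similarity
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                  # (2N, 2N) temperature-scaled similarities
    n2 = z.shape[0]
    # exclude self-similarity: the 1[k != i] indicator in the denominator
    np.fill_diagonal(sim, -np.inf)
    # each row's positive partner: pairs are (0,1), (2,3), ...
    pos = np.arange(n2) ^ 1
    # cross entropy = -(positive logit - logsumexp over all candidates)
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    losses = -(sim[np.arange(n2), pos] - logsumexp)
    # averaging over all 2N rows matches the (1/2N) * sum of both directions
    return losses.mean()
```

Averaging the per-row losses over all 2N rows is exactly the symmetric sum $\ell(2k-1,2k)+\ell(2k,2k-1)$ divided by $2N$.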
Training tricks:
1. Large batch sizes: 256 to 8192.
2. Use the LARS optimizer, since SGD/Momentum can be unstable at large batch sizes (an explanation).
3. Use Global BN: aggregate batch normalization statistics across all devices during distributed training.
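To make trick 2 concrete, here is a hedged sketch of a single LARS update for one weight tensor, following the layer-wise trust-ratio idea (You et al., 2017) that SimCLR relies on for large-batch training. This is an illustrative NumPy version, not the authors' code; the hyperparameter names (`trust_coef`, `weight_decay`) are my own labeling.

```python
import numpy as np

def lars_step(w, grad, lr=0.1, weight_decay=1e-6, trust_coef=0.001):
    """One LARS step: scale the global lr per layer by ||w|| / ||grad + wd*w||."""
    update = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(update)
    # the trust ratio keeps the step size proportional to the weight scale,
    # which is what stabilizes training when the batch size is very large
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coef * w_norm / g_norm
    else:
        local_lr = 1.0  # fall back to the plain global lr
    return w - lr * local_lr * update
```

In a full optimizer this is applied independently to every layer's weights (and combined with momentum); the per-layer scaling is the point of the method.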