A Simple Framework for Contrastive Learning of Visual Representations

MTandHJ

Article link: https://blog.csdn.net/MTandHJ/article/details/108856246

Chen T., Kornblith S., Norouzi M., Hinton G. A Simple Framework for Contrastive Learning of Visual Representations. arXiv: Learning, 2020.

@article{chen2020a,
title={A Simple Framework for Contrastive Learning of Visual Representations},
author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey E},
journal={arXiv: Learning},
year={2020}}

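Overview

SimCLR mainly relies on augmentation to generate positive and negative sample pairs. The architecture has no fancy components, yet a set of carefully chosen tricks makes it more effective than earlier methods.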

Main content

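Procedure

The procedure is simple. Given a batch of samples $x$, draw two augmentations $t, t^{'}$ at random from the augmentation family $\mathcal{T}$, which yields two batches of views $\tilde{x}_i=t(x), \tilde{x}_j=t'(x)$. An encoder maps them to the representations $h_i,h_j$, and a nonlinear transformation $g$ then maps those to $z_i,z_j$ (this step is where SimCLR departs from earlier methods). Positive and negative pairs are built from $z_i, z_j$: two views that come from the same sample form a positive pair, and every other pairing is a negative pair.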
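
The pairing rule can be made concrete with a small sketch. It assumes a storage layout that the post does not specify: the $2N$ views are kept in one list, with the $N$ views produced by $t$ first and the $N$ views produced by $t'$ second, so view $k$ and view $k+N$ come from the same sample. The function names `positive_index` and `pair_mask` are hypothetical.

```python
def positive_index(i, n):
    """Index of the view that shares its source sample with view i.

    Assumed layout (not from the post): views 0..n-1 come from t,
    views n..2n-1 come from t', and views k and k + n are two
    augmentations of the same sample.
    """
    return (i + n) % (2 * n)


def pair_mask(n):
    """Build a 2n x 2n matrix describing every pair of views.

    Entry [i][k] is 1 for a positive pair, 0 for a negative pair,
    and None on the diagonal (a view is never paired with itself).
    """
    size = 2 * n
    mask = []
    for i in range(size):
        row = []
        for k in range(size):
            if k == i:
                row.append(None)
            elif k == positive_index(i, n):
                row.append(1)
            else:
                row.append(0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # With a batch of 2 samples there are 4 views; each view has
    # exactly one positive partner and 2 negatives.
    for row in pair_mask(2):
        print(row)
```

Under this layout each view has exactly one positive partner and $2N-2$ negatives, all taken from the same batch.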

The more important and distinctive tricks are introduced first, followed by the remaining details.

projection head g

Most methods have only an encoder $f(\cdot)$. SimCLR adds a projection head $g(\cdot)$, which filters the features extracted by the encoder one more time:
$z_i = g(h_i)=W^{(2)} \sigma(W^{(1)}h_i),$
where $\sigma$ is ReLU.
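
As an illustration of the formula above, here is a minimal dependency-free sketch of the projection head. The helper names (`relu`, `matvec`, `projection_head`) and the tiny weight matrices are made up for the example; in practice $W^{(1)}$ and $W^{(2)}$ are learned, and the sketch omits bias terms because the formula shows none.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(w_rc * v_c for w_rc, v_c in zip(row, v)) for row in w]


def projection_head(h, w1, w2):
    """Compute z = W2 * relu(W1 * h), following the formula above."""
    return matvec(w2, relu(matvec(w1, h)))


if __name__ == "__main__":
    # Hypothetical weights, chosen only so the arithmetic is easy to follow.
    h = [1.0, -2.0]
    w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2 -> 3 dimensions
    w2 = [[2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]    # maps 3 -> 2 dimensions
    print(projection_head(h, w1, w2))
```

Here $W^{(1)}h = (1, -2, -1)$, ReLU zeroes the two negative entries, and the second layer then sees only the surviving component.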

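According to the authors, the projection head filters out the extra separability introduced by augmentation. This makes the features $z$ harder to tell apart, which in turn pushes the encoder to learn better features $h$.
Note: the features used for downstream tasks are $h$, not $z$!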

[Table image omitted: accuracy of predicting the applied augmentation from $h$ versus $z$]

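The table above uses the features $h$ or $z$ in a binary classification task that predicts whether the input has gone through a particular augmentation. Classification works better with $h$, which means that $h$ carries more information about the augmentation than $z$ does.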

contrastive loss

$\tag{1} \ell_{ij}=-\log \frac{\exp(\mathrm{sim}(z_i,z_j)/\tau)}{\sum_{k\not=i} \exp(\mathrm{sim}(z_i,z_k)/\tau)},$
where $\mathrm{sim}(u,v)=u^Tv/(\|u\|\|v\|)$ is the cosine similarity and $\tau$ is a temperature parameter.
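
Eq. (1) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation. It assumes `z` holds $2N$ projected vectors with rows $k$ and $k+N$ being the two views of sample $k$; averaging $\ell_{ij}$ over both orderings of every positive pair follows the paper, not this post; and the default `tau=0.5` is an arbitrary choice for the example. All function names are hypothetical.

```python
import math


def cosine_sim(u, v):
    """sim(u, v) = u^T v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(z, i, j, tau):
    """l_ij from Eq. (1); the denominator sums over every k != i."""
    numerator = math.exp(cosine_sim(z[i], z[j]) / tau)
    denominator = sum(
        math.exp(cosine_sim(z[i], z[k]) / tau)
        for k in range(len(z))
        if k != i
    )
    return -math.log(numerator / denominator)


def batch_loss(z, tau=0.5):
    """Average l_ij over both orderings of every positive pair.

    Assumed layout: z has 2n rows, and rows k and k + n are the
    two views of sample k.
    """
    n = len(z) // 2
    total = 0.0
    for k in range(n):
        total += pair_loss(z, k, k + n, tau) + pair_loss(z, k + n, k, tau)
    return total / (2 * n)


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    swapped = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    # Positive pairs that point the same way should give the lower loss.
    print(batch_loss(aligned), batch_loss(swapped))
```

Because the similarity is a cosine, rescaling any $z_k$ leaves the loss unchanged; only the directions of the vectors matter.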