CoDiM: Learning with Noisy Labels via Contrastive Semi-Supervised Learning (Paper Notes)

Noisy label learning, semi-supervised learning, and contrastive learning are three different strategies for designing learning processes that require less annotation cost.

The authors fuse these three approaches into CSSL, a unified Contrastive Semi-Supervised Learning algorithm, and CoDiM (Contrastive DivideMix), a novel algorithm for learning with noisy labels.

However, some current methods perform poorly when the noise ratio is high.

Contrastive learning

Contrastive Learning (CL) approaches (Chen et al. 2020a; He et al. 2020; Chen et al. 2020b,c) have shown great potential for learning good representations by training a feature extractor and a projector such that, in the projection space, similar samples are pulled closer while dissimilar samples are pushed apart.

Contrastive learning is used here in two roles: to initialize the neural network parameters, and to serve as a label corrector built on unsupervised pre-training.

Contrastive self-supervised learning can learn better representations.

In an unsupervised manner, some methods treat different views from the same source as positive pairs, and views from different sources as negative pairs (Chen et al. 2020a). In a supervised way, with label supervision, views from the same class are seen as positive pairs, and views from different classes are regarded as negative pairs (Khosla et al. 2020).

Semi-supervised learning

Typical semi-supervised learning methods perform self-training by pseudo-labeling unlabeled data and designing extra regularization objectives.

Two regularization objectives:

consistency regularization: encourages the model to generate consistent predictions on source data and randomly augmented views.

entropy minimization: encourages the model to output confident, low-entropy predictions.

MixMatch (Berthelot et al. 2019b) incorporates MixUp augmentations (Zhang et al. 2017) and proposes a unified framework containing both of these regularizations. Following its success, UDA (Xie et al. 2020), ReMixMatch (Berthelot et al. 2019a) and FixMatch (Sohn et al. 2020) propose to use weakly augmented images to produce labels and enforce consistent predictions against strongly augmented samples through different designs.
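As a concrete illustration of the two regularizers (a MixMatch-style sketch; the MSE form of the consistency term and the sharpening temperature are illustrative choices of this note, not from any particular paper):

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_weak, logits_strong):
    # Consistency regularization: predictions on two augmented views of the
    # same image should agree; the weakly augmented view serves as the target.
    p_weak = torch.softmax(logits_weak.detach(), dim=1)
    p_strong = torch.softmax(logits_strong, dim=1)
    return F.mse_loss(p_strong, p_weak)

def sharpen(p, T=0.5):
    # Entropy minimization via temperature sharpening (as in MixMatch):
    # raising probabilities to 1/T pushes the distribution toward low entropy.
    p = p.pow(1.0 / T)
    return p / p.sum(dim=1, keepdim=True)
```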

Learning with noisy labels

To deal with label noise, there are also two different families of approaches:

loss correction by estimating the noise transition matrix, reweighting samples with criteria such as small loss and prediction disagreement, or directly applying regularization through an early-stopping strategy.

correcting wrong labels by learning class prototypes (Han, Luo, and Wang 2019), predicting pseudo labels, or treating labels as learnable latent variables. 

DivideMix (Li, Socher, and Hoi 2020) proposes to learn with noisy labels in a semi-supervised learning manner and achieves impressive performance. It detects the noisy samples by fitting a Gaussian Mixture Model (GMM) with the training loss, regards them as unlabeled samples, and applies modified MixMatch. DM-AugDesc (Nishi et al. 2021) further explores augmentation strategies to boost DivideMix.

CSSL

The main difference between SelfCon and SupCon happens during loss calculation, as SelfCon will not use label supervision while SupCon will take categories into consideration.

SelfCon (self-supervised contrastive learning) loss:
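In its standard form (cf. Chen et al. 2020a; Khosla et al. 2020), with L2-normalized projections, $z_i$ the anchor, $z_{j(i)}$ the other view of the same source, $A(i)$ all other projections in the batch, and $\tau$ a temperature:

$$\mathcal{L}_{\text{SelfCon}} = -\sum_{i \in I} \log \frac{\exp(z_i \cdot z_{j(i)} / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$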

SupCon (supervised contrastive learning) loss:
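In the form of Khosla et al. 2020, where $P(i)$ is the set of other views sharing the anchor's label:

$$\mathcal{L}_{\text{SupCon}} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$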

For each anchor vector $z_i$, only the other view generated from the same source, $z_{j(i)}$, is seen as positive in SelfCon, whereas in SupCon all the other views generated from data with the same label are seen as positives.

For semi-supervised learning, where do the processed labels come from? (In MixMatch-style methods, they are the given labels for the labeled data and the pseudo-labels the model produces for the unlabeled data.)

To measure the discrepancy between the processed labels and the model's predictions, Cross-Entropy (CE) loss and L2 loss are commonly used.

The most important point of CSSL is that it:

employs SupCon to utilize label supervision of the labeled set, and uses SelfCon in two ways: 1) to provide self-supervised representation learning (a.k.a. SelfCon pretraining) on the whole dataset before multi-task learning; 2) to keep learning self-supervised features from the unlabeled set during the multi-objective optimization.

In the first stage, labels are ignored and self-supervised contrastive learning is performed; the pre-trained parameters are then used to initialize the second stage.

The second, semi-supervised stage adds a classification head; both labeled and unlabeled samples pass through this head to compute the semi-supervised loss. For the contrastive part, the SelfCon and SupCon losses are computed separately, and hyperparameters weight each loss term.
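A minimal PyTorch sketch (not the authors' implementation) of how the stage-two objective could combine the three terms; `ssl_loss`, the projection tensors, and the weighting hyperparameters stand in for whatever the underlying SSL method produces:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z, positive_mask, temperature=0.1):
    """Shared form of SelfCon/SupCon over projections z of shape (N, d);
    positive_mask[i, j] = True marks z_j as a positive for anchor z_i."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # never contrast with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = positive_mask & ~self_mask
    # average log-probability of each anchor's positives, then average anchors
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

def cssl_stage2_loss(ssl_loss, z_labeled, labels, z_unlabeled,
                     lam_sup=1.0, lam_self=1.0):
    # SupCon on labeled views: positives are views sharing the same class label
    # (labels has one entry per row of z_labeled).
    sup_mask = labels.unsqueeze(0) == labels.unsqueeze(1)
    l_sup = contrastive_loss(z_labeled, sup_mask)
    # SelfCon on unlabeled views stored as [view1; view2] with n images each:
    # positives are only the two views of the same image.
    n = z_unlabeled.shape[0] // 2
    idx = torch.arange(n, device=z_unlabeled.device)
    self_mask = torch.zeros(2 * n, 2 * n, dtype=torch.bool, device=z_unlabeled.device)
    self_mask[idx, idx + n] = True
    self_mask[idx + n, idx] = True
    l_self = contrastive_loss(z_unlabeled, self_mask)
    # lam_sup / lam_self are the hyperparameters weighting the contrastive terms
    return ssl_loss + lam_sup * l_sup + lam_self * l_self
```

Here `ssl_loss` is whatever the semi-supervised objective (e.g. a MixMatch-style loss) returns for the same batch, and `z_labeled` / `z_unlabeled` are the projector outputs of the augmented views.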

CSSL with noisy labels

Given a dataset with label noise, we do not know which samples are noisy, so the standard first step is to design a mechanism that tries to separate the clean set from the noisy set. A common approach is to choose samples with lower training loss based on the SSL classifier. To better leverage this measure, warming up the classifier by training it with the traditional CE loss for a few epochs is also a good choice.
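A small sketch of collecting that measure, assuming the classifier has already been warmed up with CE loss; `model` and `loader` are placeholders, and the returned per-sample losses are what the small-loss criterion (and the GMM step described below) operate on:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_sample_ce_loss(model, loader, device="cpu"):
    """Per-sample cross-entropy losses over the (noisily labeled) training set."""
    model.eval()
    losses = []
    for images, labels in loader:
        logits = model(images.to(device))
        loss = F.cross_entropy(logits, labels.to(device), reduction="none")
        losses.append(loss.cpu())
    return torch.cat(losses)
```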

CoDiM:

Stage one: ignore all labels and run SelfCon pre-training.

Then warm up the classifier head with CE loss.

Stage two: use a Gaussian Mixture Model (GMM) to estimate the distribution of noisy versus clean samples, then train with contrastive semi-supervised learning, using DivideMix as the SSL model. In this step, SupCon or SelfCon is only applied to the possibly clean set, as the authors find that continuing to apply SelfCon to the possibly noisy set degrades performance. Also, when dealing with high-ratio label noise or noise among similar classes, they suggest replacing SupCon with SelfCon when learning from the possibly clean set, to further avoid learning from biased labels. Two networks are trained, and each network uses the partition threshold produced by the other.
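A minimal sketch of the DivideMix-style GMM partition, fed with the per-sample losses from the warm-up step above; the 0.5 posterior threshold is illustrative (in DivideMix it is a tunable hyperparameter):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss, threshold=0.5):
    """Fit a two-component GMM on per-sample losses and flag 'probably clean' samples."""
    losses = np.asarray(per_sample_loss, dtype=np.float64).reshape(-1, 1)
    # min-max normalize so the fit is comparable across epochs
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    clean_component = gmm.means_.argmin()             # low-loss component = clean
    clean_prob = gmm.predict_proba(losses)[:, clean_component]
    return clean_prob > threshold, clean_prob         # boolean mask + posteriors
```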

Following the 'AugDesc-WS' augmentation strategy (Nishi et al. 2021), the so-called 'weak augmentation' (random crop and flip) is used to generate views for querying predictions, and the so-called 'strong augmentation' (AutoAugment) is used to generate views for the gradient descent step.

The pre-training stage uses the augmentations from SimCLR; in the second stage, to reduce computation, the strong augmentation from 'AugDesc-WS' is used.
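For CIFAR-style 32x32 inputs, the weak/strong pair could look like the following torchvision sketch; the crop size and normalization statistics are illustrative CIFAR-10 values, not settings quoted from the paper:

```python
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

CIFAR_MEAN, CIFAR_STD = (0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)

# 'weak' view: random crop + horizontal flip, used to query predictions
weak_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])

# 'strong' view: AutoAugment on top of the weak ops, used for the gradient step
strong_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    AutoAugment(AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])
```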

Experiments

Two types of label noise, symmetric and asymmetric, are tested. Symmetric noise is produced by selecting a percentage of the training data and assigning them uniformly random labels. Asymmetric noise is generated to simulate real-world noise, where labels are flipped only to similar classes.
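For instance, symmetric noise could be injected with a helper like this (an illustrative sketch; whether the random label may coincide with the true class varies between papers):

```python
import numpy as np

def add_symmetric_noise(labels, noise_ratio, num_classes, seed=0):
    """Replace a `noise_ratio` fraction of labels with uniformly random classes."""
    rng = np.random.default_rng(seed)
    noisy = np.array(labels).copy()
    flip_idx = rng.choice(len(noisy), size=int(noise_ratio * len(noisy)), replace=False)
    noisy[flip_idx] = rng.integers(0, num_classes, size=len(flip_idx))
    return noisy
```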

Key findings:

First, SelfCon pre-training improves the performance of SSL, especially when the labeled ratio is low. This also supports the discovery that SelfCon pre-training provides more robust results when dealing with high noise ratios.

Secondly, contrastive learning helps the performance of the classifier, as CSSL always outperforms basic SSL algorithms in both cases. This empirically supports our claim that contrastive learning will further provide consistency regularization.

In addition, when the noise ratio is high, CoDiM-Self shows competitive performance:

When the noise ratio is not extremely high, CoDiM-Sup outperforms other methods. However, CoDiM-Self shows competitive performance under high ratios of symmetric and asymmetric noise.

Weak augmentation involves random crop and horizontal flip, and the strong augmentation used is AutoAugment, following AugDesc-WS.

In summary, SelfCon pre-training is crucial, especially when labeled data is scarce. Second, when most labels are correct, applying supervised contrastive learning to the labeled data works better; in other words, CoDiM-Sup performs well when the noise ratio is low. Conversely, when the noise ratio is high, SelfCon works a bit better, because the GMM may not fully separate the clean set from the noisy set, and applying label-dependent contrastive learning to samples that are actually noisy degrades performance.

The authors also highlight the importance of label correction. After the first stage, the feature extractor's parameters are copied and frozen, and the classifier head is trained on the noisy data with CE loss; the class with the highest predicted probability is then taken as the corrected label, after which the classifier head is randomly re-initialized. The data, with these corrected labels, are then passed through the GMM to separate the clean set from the noisy set.
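A rough PyTorch sketch of that relabeling step; `encoder`, `head`, and `loader` are placeholders, and the sketch assumes `head` has already been trained with CE loss on the noisy labels as described above:

```python
import copy
import torch

@torch.no_grad()
def correct_labels(encoder, head, loader, device="cpu"):
    """Predict a corrected label (argmax class) for every training sample
    using a frozen copy of the feature extractor."""
    frozen_encoder = copy.deepcopy(encoder).to(device).eval()
    head = head.to(device).eval()
    corrected = []
    for images, _ in loader:                 # the original noisy labels are ignored
        logits = head(frozen_encoder(images.to(device)))
        corrected.append(logits.argmax(dim=1).cpu())
    # afterwards the classifier head is randomly re-initialized and the
    # corrected labels are passed through the GMM split, as described above
    return torch.cat(corrected)
```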

Also, when the symmetric noise ratio is low, using different augmentation strategies can be a good choice.
