Jo-SRC: A Contrastive Approach for Combating Noisy Labels


Abstract

Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance.

Existing state-of-the-art methods primarily adopt a sample selection strategy, which selects small-loss samples for subsequent training.

However, prior literature tends to perform sample selection within each mini-batch, neglecting the imbalance of noise ratios in different mini-batches.

Moreover, valuable knowledge within high-loss samples is wasted.

To this end, we propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).

Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its “likelihood” of being clean or out-of-distribution.

Furthermore, we propose a joint loss to advance the model generalization performance by introducing consistency regularization.

Extensive experiments have validated the superiority of our approach over existing state-of-the-art methods.

1. Introduction

DNNs have recently led to tremendous progress in various computer vision tasks [14, 28, 42, 25, 40, 21]. These successes are largely attributed to large-scale datasets with reliable annotations (e.g., ImageNet [4]).

However, collecting well-annotated datasets is extremely labor-intensive and time-consuming, especially in domains where expert knowledge is required (e.g., fine-grained categorization [37, 36]).

The high cost of acquiring large-scale well-labeled data poses a bottleneck in employing DNNs in real-world scenarios.

As an alternative, employing web images to train DNNs has received increasing attention recently [20, 41, 43, 34, 46, 45, 52, 53, 32].

Unfortunately, while web images are cheaper and easier to obtain via image search engines [5, 29, 47, 44], they usually come with inevitable noisy labels due to error-prone automatic tagging systems or non-expert annotations [23, 32, 46, 48].

Recent studies have suggested that samples with noisy labels are unavoidably overfitted by DNNs, consequently causing performance degradation [15, 51].

To alleviate this issue, many methods have been proposed for learning with noisy labels.

Early approaches primarily attempt to correct losses during training. Some methods correct losses by introducing a noise transition matrix [31, 24, 6, 11].

However, estimating the noise transition matrix is challenging, requiring either prior knowledge or a subset of well-labeled data. Other methods design noise-robust loss functions that correct losses according to DNN predictions [26, 55, 34].

However, these methods are prone to fail when the noise ratio is high.

Another active research direction in mitigating the negative effect of noisy labels is training DNNs with selected or reweighted training samples [12, 27, 22, 8, 50, 38, 32].

The challenge is to design a proper criterion for identifying clean samples. It has been recently observed that DNNs have a memorization effect and tend to learn clean and simple patterns before overfitting noisy labels [15, 51].

Thus, state-of-the-art methods (e.g., Co-teaching [8], Co-teaching+ [50], and JoCoR [38]) propose to select a human-defined proportion of small-loss samples as clean ones.

Although promising performance gains have been witnessed by employing the small-loss sample selection strategy, these methods tend to assume that noise ratios are identical among all mini-batches.

Hence, they perform sample selection within each mini-batch based on an estimated noise rate. However, this assumption may not hold true in real-world cases, and the noise rate is also challenging to estimate accurately (e.g., Clothing1M [39]).

Furthermore, existing literature mainly focuses on closed-set scenarios, in which only in-distribution (ID) noisy samples are considered. In open-set cases (i.e., real-world cases), both in-distribution (ID) and out-of-distribution (OOD) noisy samples exist.

Motivated by self-supervised contrastive learning [3, 7], we propose a simple yet effective approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency) to address the aforementioned issues.

Specifically, we first feed two different views of an image into a backbone network and predict two corresponding softmax probabilities accordingly. Then we divide samples based on two likelihood metrics.

We measure the likelihood of a sample being clean using the Jensen-Shannon divergence between its predicted probability distribution and its label distribution.
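For concreteness, a hedged sketch of this clean-likelihood metric using the standard Jensen-Shannon divergence (the exact normalization used by Jo-SRC may differ): given the predicted distribution p and the label distribution q,

D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\!\left(p \,\middle\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\!\left(q \,\middle\|\, \frac{p+q}{2}\right),

which is bounded in [0, 1] when logarithms are taken to base 2, so 1 - D_{JS}(p \,\|\, q) can be read as the likelihood of the sample being clean.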

We measure the likelihood of a sample being OOD based on the prediction disagreement between its two views. Subsequently, clean samples are trained conventionally to fit their given labels.
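A minimal PyTorch-style sketch of these two likelihood metrics follows; the function names, the base-2 normalization, and the use of a JS divergence between the two views as the disagreement measure are our illustrative assumptions rather than the paper's exact implementation:

import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-12):
    # Per-sample Jensen-Shannon divergence with base-2 logs, so values lie in [0, 1].
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps).log2() - (m + eps).log2())).sum(dim=1)
    kl_qm = (q * ((q + eps).log2() - (m + eps).log2())).sum(dim=1)
    return 0.5 * (kl_pm + kl_qm)

def likelihood_scores(logits_view1, logits_view2, labels, num_classes):
    # Softmax predictions from two augmented views of the same images.
    p1 = F.softmax(logits_view1, dim=1)
    p2 = F.softmax(logits_view2, dim=1)
    y = F.one_hot(labels, num_classes).float()
    clean_score = 1.0 - js_divergence(p1, y)   # agreement with the given label
    ood_score = js_divergence(p1, p2)          # disagreement between the two views
    return clean_score, ood_score

Samples scoring high on the first metric would be treated as clean, while the remaining ones would be split into ID and OOD noisy samples according to the second metric and handled as described next.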

ID and OOD noisy samples are re-labeled by a mean-teacher model before their losses are back-propagated to update network parameters. Finally, we propose a joint loss, including a classification term and a consistency regularization term, to further advance model performance.
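Schematically, the joint objective has the following shape, where \lambda is a hypothetical balancing weight and the exact terms are defined in Section 3:

\mathcal{L} = \mathcal{L}_{cls} + \lambda\, \mathcal{L}_{cons},

with \mathcal{L}_{cls} a classification (cross-entropy) term and \mathcal{L}_{cons} a consistency regularization term encouraging the predictions from a sample's two views to agree.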

A comparison between Jo-SRC and existing sample selection methods is provided in Figure 1. The major contributions of this work are:

(1) We propose a simple yet effective contrastive approach named Jo-SRC to alleviate the negative effect of noisy labels. Jo-SRC trains the network with a joint loss, including a cross-entropy term and a consistency term, to obtain higher classification and generalization performance.

(2) Our proposed Jo-SRC selects clean samples globally by adopting the Jensen-Shannon divergence to measure the likelihood of each sample being clean. We also propose to distinguish ID noisy samples from OOD noisy ones based on the prediction consistency between samples’ different views. ID and OOD noisy samples are relabeled by a mean-teacher network before being used for the network update.

(3) By providing comprehensive experimental results, we show that Jo-SRC significantly outperforms state-of-the-art methods on both synthetic and real-world noisy datasets. Furthermore, extensive ablation studies are conducted to validate the effectiveness of our approach.

2. Related Works

Existing works on learning with noisy labels can be roughly categorized into the following two groups [32]: 1) Loss Correction and 2) Sample Selection.

Loss correction. A large proportion of existing literature on training with noisy labels focuses on loss correction approaches. Some methods endeavor to estimate the noise transition matrix [31, 2, 24, 6, 11].

For example, Patrini et al. [24] provided a loss correction method to estimate the noise transition matrix by using a deep network trained on the noisy dataset.
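To make the transition-matrix idea concrete, the standard forward-correction form (paraphrased here, not quoted from [24]) corrects the prediction before applying the loss: with T_{ij} = P(\tilde{y} = j \mid y = i),

\ell^{\rightarrow}(f(x), \tilde{y}) = \ell\big(T^{\top} f(x), \tilde{y}\big),

so the model's clean-class posterior f(x) is mapped into the noisy-label space before being compared with the observed label \tilde{y}.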

However, these methods are limited in that the noise transition matrix is challenging to estimate accurately and may not be obtainable in real-world scenarios. Some methods attempt to design noise-tolerant loss functions [26, 55, 34].

For example, the bootstrapping loss [26] extended the conventional cross-entropy loss with a perceptual term. However, these methods fail to perform well in real-world cases when the noise ratio is high.
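For reference, the soft bootstrapping objective of [26] takes roughly the following form (\beta \in [0, 1] is an interpolation weight; this is the commonly cited formulation rather than a quotation):

\mathcal{L}_{soft} = -\sum_{k} \big[\beta\, q_k + (1 - \beta)\, p_k\big] \log p_k,

where q is the given (possibly noisy) label distribution and p is the model prediction; the second term is the "perceptual" component mentioned above.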

Sample Selection. Another idea for dealing with noisy labels is to select and remove corrupted data. The key problem is to find a proper sample selection criterion.

It has been shown that DNNs tend to learn simple patterns first before memorizing noisy data [15, 51]. Based on this observation, the small-loss sample selection criterion has been widely adopted: samples with lower loss values are more likely to have clean labels.
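As a minimal illustration of this criterion (the per-batch drop_rate and the variable names are our assumptions, not a specific method's code):

import torch
import torch.nn.functional as F

def select_small_loss(logits, labels, drop_rate):
    # Per-sample cross-entropy losses for the current mini-batch.
    losses = F.cross_entropy(logits, labels, reduction='none')
    # Keep the (1 - drop_rate) fraction with the smallest losses and
    # treat them as presumably clean samples.
    num_keep = int((1.0 - drop_rate) * labels.size(0))
    return torch.argsort(losses)[:num_keep]

Methods such as Co-teaching apply this selection per mini-batch with a scheduled drop rate, which is precisely the assumption Jo-SRC relaxes by ranking samples globally.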

For example, Co-teaching [8] proposed to maintain two networks simultaneously during training, with one network learning from the other network’s selected small-loss samples.

JoCoR [38] proposed to use a joint loss, including the conventional cross-entropy loss and a co-regularization loss, to select small-loss samples. However, the above methods select samples within each mini-batch based on a human-defined drop rate.

In real-world scenarios, noise ratios in different mini-batches are not guaranteed to be identical, and the drop rate is challenging to estimate.

3. The Proposed Method
