Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Self-supervised learning, semi-supervised learning, pretraining, self-training, robust representations, etc. are some of the hottest terms right now in the field of Computer Vision and Deep Learning. The recent progress in terms of self-supervised learning is astounding. Towards this end, researchers at FAIR have now come up with this new paper that introduces a new method to learn robust image representations.

Introduction

One of the most important goals of self-supervised learning is to learn robust representations without using labels. Recent works try to achieve this goal by combining two elements: Contrastive loss and Image transformations. Basically, we want our model to learn more robust representations, not just high-level features, and to achieve a certain level of invariance to image transformations.

The contrastive loss explicitly compares pairs of image representations. It pushes away the representations that come from different images while pulling together the representations that come from a different set of transformations or views of the same image.

Computing all the pairwise comparisons on a large dataset is not practical. There are two ways to overcome this constraint. First, instead of comparing all pairs, approximate the loss by reducing the comparison to a fixed number of random images. Second, instead of approximating the loss, we can approximate the task: e.g., instead of discriminating between every pair of images, we discriminate between groups of images with similar features.

Clustering is a good example of this kind of approximation. However, clustering alone can’t solve the problem as it has its own limitations. For example, the objective of clustering doesn’t scale well with the dataset as it requires a pass over the entire dataset to form image codes during training.

Proposal

To overcome the limitations listed above, the authors proposed the following:

  1. Online Clustering Loss: The authors propose a scalable loss function that works on both large and small batch sizes and doesn't require extras like a memory bank or a momentum encoder. Theoretically, it can be scaled to an unlimited amount of data.

  2. Multi-Crop Strategy: A new augmentation technique that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.

  3. Combining the above two into a single model that outperforms all other SSL methods, as well as supervised pretraining, on multiple downstream tasks.

Method

[Figure: overview of the method — unlike contrastive instance learning, features from different views of an image are compared via their cluster (code) assignments]

The ultimate goal of this exercise is to learn visual features in an online manner without supervision. To achieve this, the authors propose an online clustering-based self-supervised learning method.

But how is it different from typical clustering approaches?

Typical clustering methods like DeepCluster are offline as they rely on two steps. In the first step, we cluster the image features of the entire dataset, and in the second step, we predict the clusters or the codes for different image views. The fact that these methods require multiple passes over the dataset makes them unsuitable for online learning. Let us see how the authors tackle these problems step by step.

Online Clustering

  1. We have an image transformation set T. Each image x_n is transformed into an augmented view x_nt by applying a transformation t sampled from T.

  2. The augmented view x_nt is then mapped to a vector representation by applying a non-linear mapping f_θ.

  3. This feature vector is then projected onto the unit sphere, which IMO is just L2 normalization. Let's take a look at the full pipeline again:

$$z_{nt} = \frac{f_\theta(x_{nt})}{\lVert f_\theta(x_{nt}) \rVert_2}$$

4. We then compute a code q_nt for this vector z_nt by mapping it to a set of K trainable prototype vectors {c₁, c₂, …, c_K}. The matrix formed by these vectors is denoted by C (see the sketch below).
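To make steps 1–4 concrete, here is a minimal PyTorch-style sketch. This is my own illustration rather than the authors' code: `encoder` stands for a hypothetical backbone plus projection head, and the feature and prototype dimensions are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SwAVHead(nn.Module):
    """Encoder -> unit-sphere projection -> prototype scores."""
    def __init__(self, encoder, feat_dim=128, num_prototypes=3000):
        super().__init__()
        self.encoder = encoder  # hypothetical non-linear mapping f_theta
        # The rows of this bias-free linear layer are the K trainable
        # prototype vectors {c_1, ..., c_K}, i.e. the matrix C.
        self.prototypes = nn.Linear(feat_dim, num_prototypes, bias=False)

    def forward(self, x):
        z = self.encoder(x)             # steps 1-2: augmented view -> feature vector
        z = F.normalize(z, dim=1, p=2)  # step 3: project onto the unit sphere
        scores = self.prototypes(z)     # step 4: dot products z^T c_k, used for the codes
        return z, scores
```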

Swapped Prediction Problem

We talked about image transformation, the feature vector projection, and code computation (q), but we haven't discussed why we are doing it this way. As said earlier, one of the goals of this whole exercise is to learn visual features online without any supervision. We want our models to learn robust representations that are consistent across different image views.

The authors propose to enforce consistency between codes from different augmentations of the same image. This is inspired by contrastive learning, but the difference is that instead of directly comparing the feature vectors, we compare the cluster assignments for different image views. How?

Once we have computed the features z_t and z_s from two different augmentations of the same image, we compute the codes q_t and q_s by mapping the feature vectors to the K prototypes. The authors then propose a swapped prediction problem with the following loss function:

$$L(z_t, z_s) = \ell(z_t, q_s) + \ell(z_s, q_t) \tag{1}$$

Each term on the right-hand side of this equation is a cross-entropy loss that measures the fit between a feature z and a code q. The intuition behind this is that if the two features capture the same information, it should be possible to predict the code of one view from the feature of the other. It is almost like contrastive learning, but here we compare the codes instead of comparing the features directly. If we expand one of the terms on the right-hand side, it looks like this:

$$\ell(z_t, q_s) = -\sum_{k} q_s^{(k)} \log p_t^{(k)}, \qquad p_t^{(k)} = \frac{\exp\left(z_t^\top c_k / \tau\right)}{\sum_{k'} \exp\left(z_t^\top c_{k'} / \tau\right)} \tag{2}$$

Here the softmax operation is applied to the dot products of z and the prototypes in C. The term τ is a temperature parameter. Taking this loss over all the images and all pairs of data augmentations leads to the following loss function for the swapped prediction problem:

$$-\frac{1}{N} \sum_{n=1}^{N} \sum_{s,t \sim T} \left[ \frac{1}{\tau} z_{nt}^\top C q_{ns} + \frac{1}{\tau} z_{ns}^\top C q_{nt} - \log \sum_{k=1}^{K} \exp\left(\frac{z_{nt}^\top c_k}{\tau}\right) - \log \sum_{k=1}^{K} \exp\left(\frac{z_{ns}^\top c_k}{\tau}\right) \right]$$

This loss function is jointly minimized with respect to the prototypes C and the parameters θ of the image encoder f used to produce the features z_nt.
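In code, the swapped loss could look like the following minimal sketch (an illustration, not the reference implementation; the function name and the temperature value are assumptions):

```python
import torch.nn.functional as F

def swapped_loss(scores_t, scores_s, q_t, q_s, temperature=0.1):
    """scores_*: (B, K) dot products z^T C per view; q_*: (B, K) soft codes.

    The codes q_t, q_s are treated as fixed targets (computed with gradients
    blocked, e.g. via the Sinkhorn step described later).
    """
    log_p_t = F.log_softmax(scores_t / temperature, dim=1)  # log p_t^(k)
    log_p_s = F.log_softmax(scores_s / temperature, dim=1)  # log p_s^(k)
    # l(z_t, q_s): predict the code of view s from the feature of view t, and vice versa.
    loss_ts = -(q_s * log_p_t).sum(dim=1).mean()
    loss_st = -(q_t * log_p_s).sum(dim=1).mean()
    return loss_ts + loss_st
```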

Online Code Computation

When we started this discussion, we talked about offline vs. online clustering, but we haven't yet looked at what makes this method online.

In order to make this method online, the authors propose to compute codes using only the image features within a batch. The codes are computed using the prototypes C such that all the examples in a batch are equally partitioned across the prototypes. The equipartition constraint is very important here as it ensures that the codes for different images in a batch are distinct, thus preventing the trivial solution where every image has the same code.

Given B feature vectors Z = [z₁, …, z_B], we are interested in mapping them to the prototypes C = [c₁, …, c_K]. This mapping, i.e. the codes, is represented by Q = [q₁, …, q_B], and Q is optimized to maximize the similarity between the features and the prototypes, i.e.

$$\max_{Q \in \mathcal{Q}} \; \mathrm{Tr}\left(Q^\top C^\top Z\right) + \varepsilon H(Q) \tag{3}$$

where H(Q) is the entropy function and ε is a parameter that controls the smoothness of the mapping. The above expression represents an optimal transport problem (more about it later). We have the features and the prototypes, and with that, we want to find the optimal codes. The entropy term on the right-hand side helps with the equipartition (please correct me in the comments section if I am wrong).

Also, as we are working with mini-batches, the constraint is imposed on the mini-batch and looks something like this:

$$\mathcal{Q} = \left\{ Q \in \mathbb{R}_{+}^{K \times B} \;\middle|\; Q\mathbf{1}_B = \frac{1}{K}\mathbf{1}_K, \; Q^\top \mathbf{1}_K = \frac{1}{B}\mathbf{1}_B \right\}$$

where 1_K denotes the vector of ones in dimension K (and similarly 1_B). These constraints enforce that on average each prototype is selected at least B/K times in the batch.

Once a solution Q* is found for (3), there are two options we can go with. First, we can directly use the soft codes. Second, we can get discrete codes by rounding the solution. The authors found that discrete codes work well when computing codes in an offline manner on the full dataset. However, in the online setting where we work with mini-batches, using discrete codes performs worse than using continuous codes. An explanation is that the rounding needed to obtain discrete codes is a more aggressive optimization step than a gradient update. While it makes the model converge rapidly, it leads to a worse solution. The soft codes Q* take the form of a normalized exponential matrix:

$$Q^* = \mathrm{Diag}(u) \exp\left(\frac{C^\top Z}{\varepsilon}\right) \mathrm{Diag}(v)$$

Here u and v are renormalization vectors, computed with a small number of matrix multiplications using the iterative Sinkhorn-Knopp algorithm.

Side note: Thanks to Amit Chaudhary for pointing out the relevant resources for the transportation polytope and the Sinkhorn-Knopp algorithm. You can read about these two in detail here and here.

Working with Small Batches

When the number B of batch features is too small compared to the number of prototypes K, it is impossible to equally partition the batch into the K prototypes. Therefore, when working with small batches, the authors use features from the previous batches to augment the size of Z in (3), and only the codes of the batch features are used in the training loss.

The authors propose to store around 3K features, i.e., in the same range as the number of code vectors. This means that they only keep features from the last 15 batches with a batch size of 256, while contrastive methods typically need to store the last 65K instances obtained from the last 250 batches.
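A minimal sketch of such a feature queue is below (illustrative, not the authors' code; the queue length corresponds to the 15 batches of 256 mentioned above):

```python
import torch

class FeatureQueue:
    """FIFO queue of features from recent batches."""
    def __init__(self, queue_len=3840, feat_dim=128):  # 3840 = 15 batches of 256
        self.queue = torch.zeros(queue_len, feat_dim)

    @torch.no_grad()
    def update(self, z):
        # Push the newest batch features to the front, dropping the oldest.
        b = z.shape[0]
        self.queue = torch.roll(self.queue, shifts=b, dims=0)
        self.queue[:b] = z.detach()

    def augmented_features(self, z):
        # Queue features only enlarge Z when computing codes via (3);
        # the training loss still uses only the current batch's codes.
        return torch.cat([self.queue, z], dim=0)
```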

All of the above information is related to online clustering only. Nowhere did you discuss the new augmentation strategy. Trying to keep the blog post short, huh?!

Multi-crop: Augmenting Views with Smaller Images

It is a known fact that random crops always help (in both supervised and self-supervised settings). Comparing random crops of an image plays a central role, as it captures information about relations between parts of a scene or an object.

Perfect. Let's take crops of sizes 4x4, 8x8, 16x16, 32x32, and so on. Enough data to make the bloody network learn, ha!

Well, you can do that, but increasing the number of crops quadratically increases the memory and compute requirements. To address this, the authors proposed a new multi-crop strategy where they use:

  1. Two standard resolution crops.

  2. V additional low-resolution crops that cover only small parts of the image.

The loss function in (1) is then generalized as:

$$L(z_{t_1}, z_{t_2}, \ldots, z_{t_{V+2}}) = \sum_{i \in \{1, 2\}} \; \sum_{v \neq i}^{V+2} \ell(z_{t_v}, q_{t_i})$$

The codes are computed using only the two standard-resolution crops. Intuitively, including all the crops in the code computation would increase the computational time. Also, a crop that covers only a very small area of the image won't add much information, and this very limited, partial information can degrade the overall performance.
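A minimal sketch of the multi-crop augmentation is below (the crop sizes and scale ranges here are my assumptions, loosely following the two-global-plus-V-local recipe described above):

```python
from torchvision import transforms

def multi_crop_views(image, v_small=4):
    """Return 2 standard-resolution views + V low-resolution views of a PIL image."""
    global_t = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),  # standard-resolution crop
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_t = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),  # small, low-resolution crop
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return [global_t(image) for _ in range(2)] + [local_t(image) for _ in range(v_small)]
```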

Results

The authors performed a bunch of experiments. I won't be listing all the training details here; you can read them directly in the paper. One important thing to note is that most of the hyperparameters were taken directly from the SimCLR paper, along with the LARS optimizer, a cosine learning rate schedule, and the MLP projection head.
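As a rough sketch of that training setup (core PyTorch has no LARS optimizer, so plain SGD stands in for it here; all values are illustrative assumptions, not the paper's exact settings):

```python
import torch

model = torch.nn.Linear(128, 3000)  # placeholder for the actual model
optimizer = torch.optim.SGD(model.parameters(), lr=0.6, momentum=0.9, weight_decay=1e-6)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # epochs

for epoch in range(100):
    # ... one training epoch over multi-crop views with the swapped loss ...
    scheduler.step()
```

I am listing some of the results below.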

[Result figures from the paper, including transfer learning on downstream tasks]

Conclusion

I liked this paper a lot. IMHO, this is one of the best papers on SSL to date. Not only does it try to address the problems associated with the instance discrimination task and contrastive learning, but it also proposes a very creative solution to move forward. The biggest strength of this method is that it is online.

Translated from: https://medium.com/@nainaakash012/unsupervised-learning-of-visual-features-by-contrasting-cluster-assignments-fbedc8b9c3db
