DiNO (Knowledge Distillation with No Labels) (Part 2)

Latest recommended article published on 2024-08-08 07:41:21

CL.LIANG



Category: pytorch image processing. Tags: deep learning

Article link: https://blog.csdn.net/Liang_Cailei/article/details/140311896


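After releasing the DiNO model in 2021, the Facebook research team followed up with DiNOv2 in 2023. This post summarizes my notes from studying the DiNOv2 paper; see the original paper for further details.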


Contributions of the paper

Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing general-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2021) with 1B parameters and distill it into a series of smaller models that surpass the best available general-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.

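According to the abstract, the paper proposes a pipeline for preparing an image dataset suitable for self-supervised learning. It also trains a large ViT model and uses knowledge distillation to obtain several smaller models, all of which show very strong feature-extraction ability. In addition, the introduction in the GitHub repository states that DiNOv2 extracts high-quality image features, and that the released pretrained models were trained on as many as 142M images, which makes them robust across domains without fine-tuning.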
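To make the distillation idea concrete, here is a minimal sketch of a generic soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution through a cross-entropy term. This is not the DiNOv2 training code. The function names and the temperature values are assumptions chosen for illustration, and details such as centering of the teacher outputs and multi-crop views are left out.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax computed with the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student distributions.

    The temperatures are illustrative assumptions, not values taken
    from this post. The teacher is treated as a fixed target.
    """
    teacher_probs = softmax(teacher_logits, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(t * ls for t, ls in zip(teacher_probs, student_log_probs))


teacher = [2.0, 0.5, -1.0]
agreeing_student = [2.0, 0.5, -1.0]
disagreeing_student = [-1.0, 0.5, 2.0]

print("agreeing student loss:   ", distillation_loss(agreeing_student, teacher))
print("disagreeing student loss:", distillation_loss(disagreeing_student, teacher))
```

A student whose logits rank the classes the same way as the teacher gets a much lower loss than one that ranks them in reverse, which is the signal that lets a smaller model inherit the behaviour of a larger one without any labels.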

Data preparation

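Because DiNOv2 is trained with self-supervision on as many as 142M images, it needs a high-quality data processing pipeline to prepare that data. Data preparation consists of the following steps: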

Data sources

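The paper uses ImageNet-22k, the training split of ImageNet-1k, Google Landmarks, and several other standard datasets as the curated dataset. In addition, a large number of unfiltered images were collected from the web as the uncurated dataset. The uncurated pool contains about 1.2 billion images.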