A brief read of "Deep transfer learning for image classification: a survey"

I have been studying transfer learning recently, so this post gives a short introduction to a 2022 survey on the topic. Since I have my own areas of interest, I only write about the parts that interest me; for the full details please see the original paper (my notes are rather messy): Deep transfer learning for image classification: a survey

Background

This section explains why transfer learning is needed. The paper summarises the motivations as follows:

  • Insufficient data, because the data is very rare or there are privacy issues, etc. For example, diagnosis tasks for new and rare diseases in the medical domain have limited training data, both because the examples themselves are rare and because of privacy concerns.
  • It is prohibitively expensive to collect and/or label data, for example when labelling can only be done by highly qualified experts in the field.
  • The long-tail distribution, where a small number of objects/words/classes are very frequent and thus easy to model, while many more are rare and thus hard to model. For example, most language generation problems.
  • It is interesting from a cognitive science perspective to attempt to mimic the human ability to learn general concepts from a small number of examples.
  • There may be constraints on compute resources that limit training a large model from random initialisation on large amounts of data, for example due to environmental concerns.

Definition of Deep Transfer Learning

[Figure: the paper's formal definition of deep transfer learning]
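The figure itself is not reproduced here. For reference, a standard formulation of the definition (my paraphrase in the usual domain/task notation, not the paper's exact wording) is:

```latex
% Paraphrase of the standard transfer-learning definition; the notation is assumed,
% not copied from the paper's figure.
\textbf{Transfer learning.} Given a source domain $\mathcal{D}_S$ with learning task
$\mathcal{T}_S$ and a target domain $\mathcal{D}_T$ with learning task $\mathcal{T}_T$,
transfer learning aims to improve the target predictive function $f_T(\cdot)$ using the
knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$
or $\mathcal{T}_S \neq \mathcal{T}_T$. In \emph{deep} transfer learning, $f_T$ is a deep
neural network and the transferred knowledge usually takes the form of pretrained weights.
```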

Definition of Negative Transfer

[Figure: the paper's formal definition of negative transfer]
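Again only as a reference point, negative transfer can be stated informally as follows (my paraphrase, with $\operatorname{err}_T$ denoting error on the target task; this notation is assumed, not taken from the figure):

```latex
% Informal statement of negative transfer; err_T and the f notation are my assumptions.
\textbf{Negative transfer.} Transfer is negative when the model trained with knowledge
from the source performs worse on the target task than the same model trained on the
target data alone:
\[
  \operatorname{err}_T\big(f_{S \rightarrow T}\big) \;>\; \operatorname{err}_T\big(f_{T}\big).
\]
```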

Datasets commonly used in transfer learning for image classification

Source dataset            | Target dataset
ImageNet 1K, 5K, 9K, 21K  | ~
Places365                 | SUN
iNaturalist               | fine-grained plants or animal classes
~                         | CIFAR-10 and CIFAR-100
~                         | PASCAL VOC 2007
~                         | Caltech-101, Caltech-256

Fine-grained image classification datasets

Food-101 (Food), Birdsnap (Birds), Stanford Cars (Cars), FGVC Aircraft (Aircraft), Oxford-IIIT Pets (Pets), Oxford 102 Flowers (Flowers), Caltech-UCSD Birds 200 (CUB), Stanford Dogs (Dogs)

Deep transfer learning progress and areas for improvement

Findings from early research:

  • Deep transfer learning achieves performance comparable to or above the state of the art on many different tasks, particularly when compared to shallow machine learning methods.
  • More pretraining both in terms of the number of training examples and the number of iterations tends to result in better performance.
  • Fine-tuning the weights on the target task tends to result in better performance, particularly when the target dataset is larger and less similar to the source dataset.
  • Transferring more layers tends to result in better performance when the source and target dataset and task are closely matched, but fewer layers are better when they are less related (a sketch of layer transfer is given after this list).
  • Deeper networks result in better performance.
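As a concrete illustration of the "transferring more layers" point above, the sketch below copies only the lower stages of an ImageNet-pretrained ResNet-50 into a target model and leaves the remaining layers randomly initialised. The choice of ResNet-50, the 10-class target, and which stages count as "lower" are my own illustrative assumptions, not prescriptions from the survey.

```python
import torch
import torchvision

# Illustrative sketch: transfer only the lower layers of a pretrained network.
# ResNet-50, 10 target classes, and the "conv1/bn1/layer1/layer2" cut-off are
# placeholder choices for illustration.
pretrained = torchvision.models.resnet50(weights="IMAGENET1K_V1")
target_model = torchvision.models.resnet50(weights=None, num_classes=10)

layers_to_transfer = ("conv1", "bn1", "layer1", "layer2")
src_state = pretrained.state_dict()
dst_state = target_model.state_dict()
for name, tensor in src_state.items():
    if name.startswith(layers_to_transfer):
        dst_state[name] = tensor.clone()  # copy pretrained weights for lower layers
target_model.load_state_dict(dst_state)  # higher layers keep their random init
```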

Insights on best practice

Selecting the best model for the task

Choosing the model.

Choosing the best data for pretraining

Choosing the source dataset.

Finding the best hyperparameters for finetuning

The effects of learning rate, learning rate decay, weight decay, and momentum on transfer performance. A minimal example setup is sketched below.
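As a sketch of what these hyperparameters look like in practice, the snippet below sets up an SGD fine-tuning run in PyTorch. The concrete values (learning rate 0.01, momentum 0.9, etc.) and the ResNet-50/10-class setup are placeholders of mine, not values recommended by the survey.

```python
import torch
import torchvision

# Start from an ImageNet-pretrained backbone and replace the classifier head.
# ResNet-50 and the 10-class head are placeholder choices for illustration.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Placeholder hyperparameter values, not recommendations from the survey.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # learning rate
    momentum=0.9,       # momentum
    weight_decay=1e-4,  # weight decay (L2 penalty towards zero)
)
# Learning rate decay: multiply the learning rate by 0.1 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one epoch of training on the target dataset goes here ...
    scheduler.step()
```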

Whether a multi-step transfer process is better than a single step process

Not of interest to me, so skipped here.

Which type of regularization to use

Methods such as L2-SP, DELTA, BSS, and stochastic normalization. A sketch of the L2-SP idea is given below.
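Of these, L2-SP is the easiest to sketch: instead of decaying weights towards zero, it penalises the distance between the fine-tuned weights and their pretrained starting point. Below is a minimal sketch of that idea in PyTorch; the model choice, the alpha/beta values, and the rule that treats the new fc head separately are my assumptions for illustration, not code from the survey.

```python
import torch
import torchvision

# Minimal sketch of the L2-SP idea: regularise fine-tuned weights towards their
# pretrained values instead of towards zero. The model, the alpha/beta values,
# and the special handling of the new "fc" head are placeholder assumptions.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Frozen copy of the pretrained starting-point parameters.
anchor = {name: p.detach().clone() for name, p in model.named_parameters()}

def l2_sp_penalty(model, anchor, alpha=0.01, beta=0.01):
    # alpha: pull shared layers towards their pretrained values;
    # beta: ordinary L2 penalty on the newly added classifier head.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name.startswith("fc"):
            penalty = penalty + beta / 2 * p.pow(2).sum()
        else:
            penalty = penalty + alpha / 2 * (p - anchor[name]).pow(2).sum()
    return penalty

# During training: loss = task_loss + l2_sp_penalty(model, anchor)
```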

Which loss function to use

Not of interest to me, so skipped here.

Insights on transferability

They found that models trained from pretrained weights make similar mistakes on the target domain, have similar features and are surprisingly close in $l_2$ distance in the parameter space. They are in the same basins of the loss landscape.
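For reference, the parameter-space $l_2$ distance mentioned here is straightforward to compute. The sketch below is my own illustration and assumes two models, model_a and model_b, with identical architectures (e.g. both fine-tuned from the same pretrained checkpoint).

```python
import torch

# Sketch of the parameter-space l2 distance between two models. model_a and
# model_b are assumed to share the same architecture, e.g. two networks
# fine-tuned from the same pretrained checkpoint.
def parameter_l2_distance(model_a, model_b):
    vec_a = torch.cat([p.detach().flatten() for p in model_a.parameters()])
    vec_b = torch.cat([p.detach().flatten() for p in model_b.parameters()])
    return torch.norm(vec_a - vec_b, p=2).item()
```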

Discussion

  • More source data is better in general, but a more closely related source dataset for pretraining will often produce better performance on the target task than a larger source dataset.
  • The size of the target dataset and how closely related it is to the source dataset strongly impact the performance of transfer learning. In particular, using sub-optimal transfer learning hyperparameters can result in negative transfer when the target dataset is less related to the source and large enough to be trained from random initialisation.

Suggestions

Both the learning rate and momentum should be lower during fine-tuning for more similar source and target tasks, and higher for less closely related datasets. The learning rate should also be decayed more quickly the more similar the source and target tasks are, so as not to change the pretrained parameters as much. Similarly, the learning rate should be decayed more quickly with smaller target datasets, where the empirical risk estimate is likely to be less reliable and overfitting more of a problem. However, when the target dataset is small, it must be taken into account that the number of weight updates per epoch will be lower, and it is the number of updates that should be reduced, not necessarily the number of epochs.

When the source and target datasets are less similar, it may be optimal to fine-tune higher layers at a higher learning rate than lower layers (a sketch of this is given below). More work is needed to show how the learning rate, momentum, and number of updates before decaying the learning rate should change when the source and target tasks are very different.
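One way to act on that last suggestion is to use per-parameter-group learning rates, as in the PyTorch sketch below. The layer grouping and the two learning rates are placeholder choices of mine, not values from the survey.

```python
import torch
import torchvision

# Sketch of fine-tuning higher layers at a higher learning rate than lower layers
# via per-parameter-group learning rates. The layer grouping and the two learning
# rates are placeholder choices, not values from the survey.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

lower = [p for n, p in model.named_parameters()
         if n.startswith(("conv1", "bn1", "layer1", "layer2"))]
higher = [p for n, p in model.named_parameters()
          if n.startswith(("layer3", "layer4", "fc"))]

optimizer = torch.optim.SGD(
    [
        {"params": lower, "lr": 0.001},  # lower layers: smaller updates
        {"params": higher, "lr": 0.01},  # higher layers: larger updates
    ],
    lr=0.01,       # default learning rate (overridden by the per-group values)
    momentum=0.9,
)
```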

Recommendations for best practice

The paper gives separate recommendations for four target-dataset scenarios:

1. Larger, similar target datasets
2. Larger, less similar target datasets
3. Smaller, more similar datasets
4. Smaller, less similar datasets
