A brief read of "Deep transfer learning for image classification: a survey"

I have been studying transfer learning recently, so this post gives a short introduction to a 2022 survey on the topic. Since I have my own areas of interest, I only write about the parts that interest me; for the full details please see the original paper (my notes are rather messy): Deep transfer learning for image classification: a survey

Background

This section explains why transfer learning is needed. The paper summarises the motivations as follows:

  • Insufficient data, because the data is very rare or there are privacy issues, etc. For example, diagnosis tasks for new and rare diseases in the medical domain have limited training data, both because the examples themselves are rare and because of privacy concerns.
  • It is prohibitively expensive to collect and/or label data, for example when labelling can only be done by highly qualified experts in the field.
  • The long-tail distribution, where a small number of objects/words/classes are very frequent and thus easy to model, while many more are rare and thus hard to model. For example, most language generation problems.
  • It is interesting from a cognitive science perspective to attempt to mimic the human ability to learn general concepts from a small number of examples.
  • There may be constraints on compute resources that limit training a large model from random initialisation on large amounts of data, for example due to environmental concerns.

Definition of Deep Transfer Learning

[Figure: the paper's formal definition of deep transfer learning]
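The figure itself is not reproduced here. For reference, a standard formulation of the definition (my paraphrase in the usual domain/task notation, not the paper's exact wording) is:

```latex
% Paraphrase of the standard transfer-learning definition; the notation is assumed,
% not copied from the paper's figure.
\textbf{Transfer learning.} Given a source domain $\mathcal{D}_S$ with learning task
$\mathcal{T}_S$ and a target domain $\mathcal{D}_T$ with learning task $\mathcal{T}_T$,
transfer learning aims to improve the target predictive function $f_T(\cdot)$ using the
knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$
or $\mathcal{T}_S \neq \mathcal{T}_T$. In \emph{deep} transfer learning, $f_T$ is a deep
neural network and the transferred knowledge usually takes the form of pretrained weights.
```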

Definition of Negative Transfer

[Figure: the paper's formal definition of negative transfer]
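Again only as a reference point, negative transfer can be stated informally as follows (my paraphrase, with $\operatorname{err}_T$ denoting error on the target task; this notation is assumed, not taken from the figure):

```latex
% Informal statement of negative transfer; err_T and the f notation are my assumptions.
\textbf{Negative transfer.} Transfer is negative when the model trained with knowledge
from the source performs worse on the target task than the same model trained on the
target data alone:
\[
  \operatorname{err}_T\big(f_{S \rightarrow T}\big) \;>\; \operatorname{err}_T\big(f_{T}\big).
\]
```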

Datasets commonly used in transfer learning for image classification

Source dataset            | Target dataset
ImageNet 1K, 5K, 9K, 21K  | ~
Places365                 | SUN
iNaturalist               | fine-grained plants or animal classes
~                         | CIFAR-10 and CIFAR-100
~                         | PASCAL VOC 2007
~                         | Caltech-101, Caltech-256

Fine-grained image classification datasets

Food-101 (Food), Birdsnap (Birds), Stanford Cars (Cars), FGVC Aircraft (Aircraft), Oxford-IIIT Pets (Pets), Oxford 102 Flowers (Flowers), Caltech-UCSD Birds 200 (CUB), Stanford Dogs (Dogs)

Deep transfer learning progress and areas for improvement

Findings from early research:

  • Deep transfer learning achieves performance comparable to or above the state of the art on many different tasks, particularly when compared to shallow machine learning methods.
  • More pretraining both in terms of the number of training examples and the number of iterations tends to result in better performance.
  • Fine-tuning the weights on the target task tends to result in better performance, particularly when the target dataset is larger and less similar to the source dataset.
  • Transferring more layers tends to result in better performance when the source and target dataset and task are closely matched, but fewer layers are better when they are less related (a sketch of layer transfer is given after this list).
  • Deeper networks result in better performance.
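As a concrete illustration of the "transferring more layers" point above, the sketch below copies only the lower stages of an ImageNet-pretrained ResNet-50 into a target model and leaves the remaining layers randomly initialised. The choice of ResNet-50, the 10-class target, and which stages count as "lower" are my own illustrative assumptions, not prescriptions from the survey.

```python
import torch
import torchvision

# Illustrative sketch: transfer only the lower layers of a pretrained network.
# ResNet-50, 10 target classes, and the "conv1/bn1/layer1/layer2" cut-off are
# placeholder choices for illustration.
pretrained = torchvision.models.resnet50(weights="IMAGENET1K_V1")
target_model = torchvision.models.resnet50(weights=None, num_classes=10)

layers_to_transfer = ("conv1", "bn1", "layer1", "layer2")
src_state = pretrained.state_dict()
dst_state = target_model.state_dict()
for name, tensor in src_state.items():
    if name.startswith(layers_to_transfer):
        dst_state[name] = tensor.clone()  # copy pretrained weights for lower layers
target_model.load_state_dict(dst_state)  # higher layers keep their random init
```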

Insights on best practice

Selecting the best model for the task

Choosing the model.

Choosing the best data for pretraining

Choosing the source dataset.

Finding the best hyperparameters for finetuning

The effects of learning rate, learning rate decay, weight decay, and momentum on transfer performance. A minimal example setup is sketched below.
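As a sketch of what these hyperparameters look like in practice, the snippet below sets up an SGD fine-tuning run in PyTorch. The concrete values (learning rate 0.01, momentum 0.9, etc.) and the ResNet-50/10-class setup are placeholders of mine, not values recommended by the survey.

```python
import torch
import torchvision

# Start from an ImageNet-pretrained backbone and replace the classifier head.
# ResNet-50 and the 10-class head are placeholder choices for illustration.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Placeholder hyperparameter values, not recommendations from the survey.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # learning rate
    momentum=0.9,       # momentum
    weight_decay=1e-4,  # weight decay (L2 penalty towards zero)
)
# Learning rate decay: multiply the learning rate by 0.1 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one epoch of training on the target dataset goes here ...
    scheduler.step()
```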

Whether a multi-step transfer process is better than a single step process

Not of interest to me, so skipped here.

Which type of regularization to use

Methods such as L2-SP, DELTA, BSS, and stochastic normalization. A sketch of the L2-SP idea is given below.
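Of these, L2-SP is the easiest to sketch: instead of decaying weights towards zero, it penalises the distance between the fine-tuned weights and their pretrained starting point. Below is a minimal sketch of that idea in PyTorch; the model choice, the alpha/beta values, and the rule that treats the new fc head separately are my assumptions for illustration, not code from the survey.

```python
import torch
import torchvision

# Minimal sketch of the L2-SP idea: regularise fine-tuned weights towards their
# pretrained values instead of towards zero. The model, the alpha/beta values,
# and the special handling of the new "fc" head are placeholder assumptions.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Frozen copy of the pretrained starting-point parameters.
anchor = {name: p.detach().clone() for name, p in model.named_parameters()}

def l2_sp_penalty(model, anchor, alpha=0.01, beta=0.01):
    # alpha: pull shared layers towards their pretrained values;
    # beta: ordinary L2 penalty on the newly added classifier head.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name.startswith("fc"):
            penalty = penalty + beta / 2 * p.pow(2).sum()
        else:
            penalty = penalty + alpha / 2 * (p - anchor[name]).pow(2).sum()
    return penalty

# During training: loss = task_loss + l2_sp_penalty(model, anchor)
```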

Which loss function to use

Not of interest to me, so skipped here.

Insights on transferability

They found that models trained from pretrained weights make similar mistakes on the target domain, have similar features and are surprisingly close in $l_2$ distance in the parameter space. They are in the same basins of the loss landscape.
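For reference, the parameter-space $l_2$ distance mentioned here is straightforward to compute. The sketch below is my own illustration and assumes two models, model_a and model_b, with identical architectures (e.g. both fine-tuned from the same pretrained checkpoint).

```python
import torch

# Sketch of the parameter-space l2 distance between two models. model_a and
# model_b are assumed to share the same architecture, e.g. two networks
# fine-tuned from the same pretrained checkpoint.
def parameter_l2_distance(model_a, model_b):
    vec_a = torch.cat([p.detach().flatten() for p in model_a.parameters()])
    vec_b = torch.cat([p.detach().flatten() for p in model_b.parameters()])
    return torch.norm(vec_a - vec_b, p=2).item()
```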

Discussion

  • More source data is better in general, but a more closely related source dataset for pretraining will often produce better performance on the target task than a larger source dataset.
  • The size of the target dataset and how closely related it is to the source dataset strongly impact the performance of transfer learning. In particular, using sub-optimal transfer learning hyperparameters can result in negative transfer when the target dataset is less related to the source and large enough to be trained from random initialisation.

Suggestions

Both the learning rate and momentum should be lower during fine-tuning for more similar source and target tasks, and higher for less closely related datasets. The learning rate should also be decayed more quickly the more similar the source and target tasks are, so as not to change the pretrained parameters as much. Similarly, the learning rate should be decayed more quickly with smaller target datasets, where the empirical risk estimate is likely to be less reliable and overfitting more of a problem. However, when the target dataset is small, it must be taken into account that the number of weight updates per epoch will be lower, and it is the number of updates that should be reduced, not necessarily the number of epochs.

When the source and target datasets are less similar, it may be optimal to fine-tune higher layers at a higher learning rate than lower layers (a sketch of this is given below). More work is needed to show how the learning rate, momentum, and number of updates before decaying the learning rate should change when the source and target tasks are very different.
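One way to act on that last suggestion is to use per-parameter-group learning rates, as in the PyTorch sketch below. The layer grouping and the two learning rates are placeholder choices of mine, not values from the survey.

```python
import torch
import torchvision

# Sketch of fine-tuning higher layers at a higher learning rate than lower layers
# via per-parameter-group learning rates. The layer grouping and the two learning
# rates are placeholder choices, not values from the survey.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

lower = [p for n, p in model.named_parameters()
         if n.startswith(("conv1", "bn1", "layer1", "layer2"))]
higher = [p for n, p in model.named_parameters()
          if n.startswith(("layer3", "layer4", "fc"))]

optimizer = torch.optim.SGD(
    [
        {"params": lower, "lr": 0.001},  # lower layers: smaller updates
        {"params": higher, "lr": 0.01},  # higher layers: larger updates
    ],
    lr=0.01,       # default learning rate (overridden by the per-group values)
    momentum=0.9,
)
```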

Recommendations for best practice

The paper gives separate recommendations for four target-dataset scenarios:

1. Larger, similar target datasets
2. Larger, less similar target datasets
3. Smaller, more similar datasets
4. Smaller, less similar datasets
