Transfer Learning

Leo_812

Article link: https://blog.csdn.net/Leo_812/article/details/52457334

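This post first summarizes Transfer Learning, then shares some lessons learned from training networks with it.
Below are the core passages from the Transfer Learning write-up (the wording matches the CS231n course notes on transfer learning):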

<1>
New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
<2>
New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
<3>
New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
<4>
New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.

Roughly, this means:

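1. If the new dataset is small but close to the original data, continuing to train tends to overfit. Instead, take the original model's output vectors (high-level features) and build a linear classifier on top of them.
2. If the dataset is large enough and close to the original data, fine-tune the network.
3. If the dataset is small and also very different from the original data, further training again tends to overfit, and the high-level features are a poor fit too, because they encode properties specific to the original dataset. Taking lower-layer features and pairing them with an SVM classifier can give better results.
4. If the dataset is large enough and very different from the original data, you can still fine-tune from the pre-trained weights. This still works well and noticeably speeds up convergence. Keep the learning rate small while fine-tuning.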
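
The four cases above can be read as a small decision table. The following is only a minimal sketch of that table in Python. The function name `choose_strategy`, its two boolean arguments, and the returned strings are hypothetical names made up for illustration; they do not come from the original write-up, and where to draw the line for "small" or "similar" is left to the reader.

```python
def choose_strategy(dataset_is_small: bool, similar_to_original: bool) -> str:
    """Map the four cases summarized above to a training strategy.

    Both flags are judgment calls made by the user (assumption):
    the source gives no numeric threshold for "small" or "similar".
    """
    if dataset_is_small and similar_to_original:
        # Case 1: avoid fine-tuning, reuse high-level features.
        return "linear classifier on high-level features"
    if not dataset_is_small and similar_to_original:
        # Case 2: enough data to fine-tune without much overfitting risk.
        return "fine-tune the full network"
    if dataset_is_small and not similar_to_original:
        # Case 3: high-level features are too dataset-specific.
        return "linear classifier (e.g. SVM) on earlier-layer features"
    # Case 4: large and different, still start from pre-trained weights.
    return "initialize from pre-trained weights, then fine-tune the full network"


if __name__ == "__main__":
    for small in (True, False):
        for similar in (True, False):
            print(small, similar, "->", choose_strategy(small, similar))
```

The helper only encodes the rule of thumb. In practice the choice is usually confirmed by comparing validation accuracy across the candidate strategies.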

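Now for some of my own experience. Personally, I think fine-tuning is not the right choice if you want a model with really top-end accuracy (please correct me if I am wrong). People probably favor fine-tuning because GPU resources are limited and training from scratch takes too long. In addition, one of the early well-known DL papers used the VGG16 model, which has so many weight parameters that it generally fails to converge without a pre-trained model. The paper itself notes that the VGG models were deepened step by step, and that training the deep version directly would not converge. So VGG networks are usually trained from pre-trained weights, whereas some other networks, such as GoogLeNet, converge very quickly and can be trained without pre-training.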
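For image problems, data augmentation can usually enlarge the training set effectively, so the small-sample cases described above rarely apply. A while ago I took part in a Kaggle deep learning competition (49/1450) and followed the approach above: I used a pre-trained model, froze the shallow-layer parameters while continuing to train (which effectively reduces overfitting), and added a few other small tricks, which gave fairly good results. After the competition ended, however, some top competitors shared a remarkable data augmentation method. Their view was that with enough samples it is better not to use pre-training at all, since that can reach higher accuracy; their single model alone ranked in the top 20. So for image problems you should still try hard to augment the data, but when augmentation is not possible, training along the lines described above is a good option.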
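
To make "freeze the shallow layers and keep training with a small learning rate" concrete, here is a framework-free toy sketch. It is not the competition code. The `Layer` class, the helpers `freeze_first` and `sgd_step`, the layer names, and all weight and gradient values are assumptions invented for illustration. A real deep learning framework would compute the gradients itself and would offer its own mechanism for marking parameters as non-trainable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    """A toy layer: just a name, a flat weight list, and a frozen flag."""
    name: str
    weights: List[float]
    frozen: bool = False


def freeze_first(layers: List[Layer], n: int) -> None:
    """Mark the first n (shallowest) layers as frozen."""
    for layer in layers[:n]:
        layer.frozen = True


def sgd_step(layers: List[Layer], grads: Dict[str, List[float]], lr: float) -> None:
    """One plain SGD update that skips frozen layers."""
    for layer in layers:
        if layer.frozen:
            continue  # pre-trained weights stay exactly as they were
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# Pretend these weights came from a pre-trained model (made-up values).
layers = [
    Layer("conv1", [0.5, -0.2]),
    Layer("conv2", [0.1, 0.3]),
    Layer("fc", [0.0, 0.0]),
]
freeze_first(layers, 1)  # freeze only the shallowest layer

# Made-up gradients; a small learning rate, as recommended for fine-tuning.
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, -1.0], "fc": [2.0, 0.0]}
sgd_step(layers, grads, lr=0.01)

for layer in layers:
    print(layer.name, layer.frozen, layer.weights)
```

Because frozen layers receive no updates, fewer parameters are fitted to the new data, which is one way to see why freezing can reduce overfitting on a modest dataset.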
I will write about some data augmentation methods in later posts.