objective, top1 error, top5 error

wangch要好好学习

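Reposted from: http://blog.sina.com.cn/s/blog_837f83580102vwv4.html

Author: wind_静水流深_cloud

This post records some personal notes on the MatConvNet code for future reference (some of my understanding may be off). It is updated from time to time.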

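I. How the speed, obj, top1 error and top5 error printed in the MATLAB command window during training are computed

These values are computed in the cnn_train m-file; setting a breakpoint and stepping through it shows the calculation. For illustration, assume 1 epoch with 2 batches. After cnn_train finishes optimizing on the first batch it has two arrays, error=[obj, top1, top5] and stats=sum([stats,[0 ; error]],2). Here obj is the loss summed over all samples in the batch, and top1 and top5 are likewise the sums of the top1 and top5 errors over all samples in the batch; stats holds the error obtained on batch 1. cnn_train then optimizes on the second batch, after which error holds the loss, top1 and top5 of batch 2. Note that stats is now the sum of the error arrays of both batches.

The following explains how a line printed in the MATLAB command window such as “training: epoch 01: batch 1/600: 174.4 Hz obj:2.3 top1e:0.88 top5e:0.39 [100/100]” is computed. The calculation is simple:

obj = (the obj entry of stats)/n, where n = (index of the current batch)*batchSize. For the second batch, for example, n = 2*batchSize, so n is simply the number of samples processed so far in the epoch. top1 and top5 are computed the same way.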
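The bookkeeping described above can be reproduced in a few lines of MATLAB. This is only a minimal sketch with made-up per-batch sums, not the code from cnn_train; the names batchErrors, err and printed and all numbers are assumptions chosen for illustration.

```matlab
% Illustrative only: made-up per-batch sums, not output from a real run.
batchSize = 100 ;
% Each column is the error vector of one batch:
% [summed loss ; number of top1 errors ; number of top5 errors].
batchErrors = [230 210 ;
                88  80 ;
                39  31] ;
numBatches = size(batchErrors, 2) ;

stats = zeros(4, 1) ;             % first entry matches the leading 0 in [0 ; error]
printed = zeros(3, numBatches) ;  % values that would be shown as obj, top1e, top5e
for t = 1:numBatches
  err = batchErrors(:, t) ;               % "error" of the current batch
  stats = sum([stats, [0 ; err]], 2) ;    % accumulate over the batches seen so far
  n = t * batchSize ;                     % number of samples processed so far
  printed(:, t) = stats(2:4) / n ;        % running averages
  fprintf('batch %d/%d: obj:%.3g top1e:%.3g top5e:%.3g\n', ...
          t, numBatches, printed(1, t), printed(2, t), printed(3, t)) ;
end
```

After the second batch the printed values are averages over all 200 samples seen so far, not over the second batch alone.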

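When an epoch finishes, a figure is drawn with the error curves. The obj, top1 and top5 plotted for each epoch are the values reported at the last batch of that epoch, i.e. the running average over the whole epoch. In reply to a question I asked, the MatConvNet author answered as follows: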

the training error is computed as the model is updated during an epoch, whereas the validation error is computed on a “frozen” model at the end of an epoch. This is implicit in how the code works. Let M_{N-1} be the model at epoch N-1 and M_N the model at epoch [N]. Let M_{N-1} -> M’ -> M’’ -> M’’’ -> …. -> M_N [be] the sequence of intermediate models computed for each mini-batch processed during training. Then

Training error at epoch N = Average of training errors of M’, M’’, M’’’, … on the training set mini-batches

Validation error at epoch N = Validation error of model M_N on the validation set

Hence the estimated training error you get is somewhere in between the training error of model M_{N-1} and model M_N, whereas the validation error is for model M_N. Since M_N should be better than M_{N-1}, then the validation error might be smaller than this estimate (but won’t when N is large as models change little and overfitting dominates). Note that you could [compute] the “proper” training error of M_N after each epoch, but that would be expensive (as it would require freezing M_N and passing all the training data again) and not worth it in practice.

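II. How to fine-tune an existing model on your own data

This part is split in two: first, how to modify an existing model to initialize your own network; second, some issues to watch for during fine-tuning.

1. How to modify an existing model

Because the classes in our data may differ from those the model was trained on, modifying an existing model usually only requires changing the last two layers of the network (the fully-connected layer and the loss layer). The code (one example) is as follows: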

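% net, lr, input and output must already be defined (see the text below).
% Drop the last two layers: the old fully-connected layer and the loss layer.
net.layers = net.layers(1:end-2) ;
% New fully-connected layer, written as a 1x1 convolution with random weights.
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{randn(1,1,input,output, 'single'), zeros(1,output,'single')}}, ...
  'learningRate', 0.1*lr, 'stride', 1, 'pad', 0) ;
% New loss layer.
net.layers{end+1} = struct('type', 'softmaxloss') ;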

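Here input is the dimensionality of the preceding fully-connected layer and output is the number of classes in your own data.

Note: if the original model was trained with dropout layers after its fully-connected layers, it is best to add them for fine-tuning as well, as this helps convergence.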
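To make the layer replacement concrete, here is a minimal self-contained sketch that runs without MatConvNet, since it only manipulates the layer structs. The toy "pretrained" network, the sizes numFeatures and numClasses, the 0.01 weight scale, the dropout rate of 0.5 and the [10 20] multiplier are all assumptions for illustration; a real model would be loaded from a file instead.

```matlab
% Toy stand-in for a pretrained SimpleNN-style model (hypothetical sizes).
net = struct() ;
net.layers = { ...
  struct('type', 'conv', 'stride', 1, 'pad', 0, ...
         'weights', {{randn(1,1,8,16,'single'), zeros(1,16,'single')}}), ...
  struct('type', 'relu'), ...
  struct('type', 'conv', 'stride', 1, 'pad', 0, ...
         'weights', {{randn(1,1,16,1000,'single'), zeros(1,1000,'single')}}), ...
  struct('type', 'softmaxloss') } ;

numFeatures = 16 ;  % width of the layer feeding the classifier (assumed)
numClasses  = 5 ;   % number of classes in the new dataset (assumed)

% Drop the old classifier and the old loss layer.
net.layers = net.layers(1:end-2) ;

% Optional dropout before the new classifier (rate is an assumption).
net.layers{end+1} = struct('type', 'dropout', 'rate', 0.5) ;

% New classifier with small random weights and a larger learning-rate multiplier.
net.layers{end+1} = struct('type', 'conv', 'stride', 1, 'pad', 0, ...
  'weights', {{0.01*randn(1,1,numFeatures,numClasses,'single'), ...
               zeros(1,numClasses,'single')}}, ...
  'learningRate', [10 20]) ;

% New loss layer.
net.layers{end+1} = struct('type', 'softmaxloss') ;

% Print the resulting layer types.
for l = 1:numel(net.layers)
  fprintf('%d: %s\n', l, net.layers{l}.type) ;
end
```

The double braces around the weights are needed because struct() would otherwise expand a cell-array argument into a struct array.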

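2. Issues to watch for

Once the model has been modified as above, the network structure we need is fixed. There are, however, some issues to pay attention to during fine-tuning:

(1) The learning rate (lr).

Here lr can be divided into the global lr and the lr of each individual convolutional layer.

In the examples shipped with MatConvNet the global lr is set in one of two ways: either a single fixed value used for every epoch (as in the mnist example), or a per-epoch lr listed in advance according to the number of epochs (as in the cifar-10 example). When setting this learning rate for fine-tuning, 1/10 of the original network's learning rate is a good choice.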
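The two ways of setting the global learning rate can be sketched as follows. The values are illustrative and not copied from the mnist or cifar-10 examples, and pickLR is a hypothetical helper written for this sketch; it reuses the last entry when the schedule is shorter than the number of epochs, which is an assumption about how such a lookup might reasonably behave.

```matlab
% Option 1: one fixed learning rate for every epoch (illustrative value).
fixedLR = 0.001 ;

% Option 2: one learning rate per epoch, listed in advance (illustrative values).
scheduleLR = [0.05*ones(1,3), 0.005*ones(1,2), 0.0005] ;
numEpochs = numel(scheduleLR) ;

% For fine-tuning, one tenth of the original schedule, as suggested in the text.
fineTuneLR = 0.1 * scheduleLR ;

% Hypothetical helper: learning rate used at a given epoch.
pickLR = @(lrs, epoch) lrs(min(epoch, numel(lrs))) ;

used = zeros(2, numEpochs) ;
for epoch = 1:numEpochs
  used(1, epoch) = pickLR(fixedLR, epoch) ;     % same value every epoch
  used(2, epoch) = pickLR(scheduleLR, epoch) ;  % value listed for this epoch
end
disp(used) ;
```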

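The per-layer lr seems crucial to me (strictly speaking it is not the layer's learning rate; learning-rate coefficient may be the better name). Layers we have modified usually need a larger lr (for example [10,20]) so that retraining converges quickly, while layers we have not modified usually need a smaller lr (for example the default [1 2]) so that they train slowly: the original model's weights serve as the initial weights for fine-tuning, and a small lr converges more easily. Likewise, if we do not want to retrain certain convolutional layers and want to use their original weights directly (the earlier convolutional layers are harder to converge and more troublesome to train), simply set the lr of those layers to [0,0]. This matters a lot when training samples are scarce.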
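The per-layer settings described above can be sketched on a toy layer list. This runs without MatConvNet; the network, baseLR and numFrozen are hypothetical. The sketch assumes the per-layer values act as multipliers on the global learning rate, one for the filters and one for the biases, in line with the "coefficient" reading above.

```matlab
baseLR = 0.001 ;   % global learning rate (illustrative)

% Toy network: two pretrained conv layers followed by a newly added classifier.
net = struct() ;
net.layers = { ...
  struct('type', 'conv', 'learningRate', [1 2]), ...
  struct('type', 'relu'), ...
  struct('type', 'conv', 'learningRate', [1 2]), ...
  struct('type', 'relu'), ...
  struct('type', 'conv', 'learningRate', [10 20]) } ;

% Freeze the earliest layers by zeroing their multipliers.
numFrozen = 2 ;    % freeze layers 1..numFrozen (hypothetical choice)
for l = 1:numFrozen
  if isfield(net.layers{l}, 'learningRate')
    net.layers{l}.learningRate = [0 0] ;
  end
end

% Effective step sizes per layer under the multiplier assumption.
effective = zeros(numel(net.layers), 2) ;
for l = 1:numel(net.layers)
  if isfield(net.layers{l}, 'learningRate')
    effective(l, :) = baseLR * net.layers{l}.learningRate ;
    fprintf('layer %d: filters %g, biases %g\n', l, effective(l, 1), effective(l, 2)) ;
  end
end
```

With these settings the first conv layer keeps its original weights, the second keeps training slowly, and the new classifier trains ten times faster than the default.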