Problem Description
I have recently been working on a deep learning assignment in MATLAB: sleep stage classification from the single-channel Fpz-Cz EEG signal. While reproducing a paper I ran into several problems, which I record here.
Problem Log
1. Validation accuracy drops sharply at the final iteration
As the figure shows, the Final value printed at the end of training sits well below the validation curve itself, roughly 10% lower in accuracy (the loss curve likewise spikes upward at that point).
The network at this point was
CNNLayers = [
    imageInputLayer([1 3000 1],"Name","imageinput")            % single-channel EEG epoch of 3000 samples
    convolution2dLayer([1 50],64,"Name","InputConv","Padding","same","Stride",[1 5])
    dropoutLayer(0.2,"Name","dropout_1")
    batchNormalizationLayer("Name","batchnorm_1")
    reluLayer("Name","relu_1")
    convolution2dLayer([1 5],128,"Name","Conv1","Padding","same")
    maxPooling2dLayer([1 2],"Name","Pool1","Padding","same","Stride",[1 2])
    dropoutLayer(0.2,"Name","dropout_2")
    batchNormalizationLayer("Name","batchnorm_2")
    reluLayer("Name","relu_2")
    convolution2dLayer([1 5],256,"Name","Conv2","Padding","same","Stride",[1 2])
    maxPooling2dLayer([1 2],"Name","Pool2","Padding","same","Stride",[1 2])
    dropoutLayer(0.2,"Name","dropout_3")
    batchNormalizationLayer("Name","batchnorm_3")
    fullyConnectedLayer(1500,"Name","Fc1")                     % classifier head
    dropoutLayer(0.5,"Name","dropout_4")
    batchNormalizationLayer("Name","batchnorm_4")
    fullyConnectedLayer(1000,"Name","Fc2")
    dropoutLayer(0.5,"Name","dropout_5")
    batchNormalizationLayer("Name","batchnorm_5")
    fullyConnectedLayer(5,"Name","Fc3")                        % 5 sleep stages
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classoutput")];
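Before training, it is worth confirming that a [1 3000] input actually flows through the strided convolutions and poolings with the sizes you expect. A quick sanity check:

% Opens the Deep Network Analyzer with per-layer activation sizes and learnables.
analyzeNetwork(CNNLayers)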
The training options were
miniBatchSize = 400;
options = trainingOptions('adam', ...
    'ExecutionEnvironment','auto', ...
    'MaxEpochs',100, ...
    'MiniBatchSize',miniBatchSize, ...
    'ValidationData',{XValidation,YValidation}, ...
    'GradientThreshold',inf, ...
    'Shuffle','every-epoch', ...
    'Verbose',false, ...
    'Plots','training-progress');
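For completeness, training would then be launched with trainNetwork; XTrain and YTrain below are assumed names for the 1x3000x1xN training array and its categorical labels, which the original snippet does not show:

% XTrain (1x3000x1xN array) and YTrain (categorical labels) are assumed names.
net = trainNetwork(XTrain, YTrain, CNNLayers, options);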
Problem Analysis
I suspected the dropout layers were to blame, so I deleted all of them; the resulting curves are shown below. With every dropout layer removed, the model overfits severely. The accuracy curve also inexplicably falls off a cliff at one point during training, although it returns to normal values afterwards.
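One more thing worth ruling out for the final-iteration drop: dropout is disabled during validation anyway, so it is an unlikely culprit on its own. With batchNormalizationLayer in the network, however, trainNetwork by default ('BatchNormalizationStatistics','population') finalizes the batch normalization statistics after the last iteration with an extra pass over the training data, and the Final metrics are computed with those recomputed statistics, which can sit visibly below the curve. A hedged sketch of two possible mitigations, assuming R2021b or newer:

% Sketch only: keep moving BN estimates and return the best checkpoint
% instead of the last-iteration network.
options = trainingOptions('adam', ...
    'BatchNormalizationStatistics','moving', ...   % skip the population-statistics finalization pass
    'OutputNetwork','best-validation-loss', ...    % requires ValidationData
    'ValidationData',{XValidation,YValidation}, ...
    'MaxEpochs',100, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress');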
2. Fixing the overfitting problem
Since deleting every dropout layer causes overfitting, I kept only the dropout layers after the fully connected layers, deleted the earlier ones, and set the dropout rate to 0.5. The curves in this configuration are shown below.
The figure shows that the model is now over-regularized: overfitting is suppressed so aggressively that the training accuracy falls below the validation accuracy.
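This inversion is expected: dropout is active only during training, so the training-progress accuracy is measured on a randomly thinned network, while validation always runs the full deterministic network. A minimal check that dropout is indeed off at inference time, assuming the trained network is stored in a hypothetical variable trainedNet:

% trainedNet is a hypothetical variable name for the trained network.
p1 = predict(trainedNet, XValidation(:,:,:,1));   % score one validation epoch
p2 = predict(trainedNet, XValidation(:,:,:,1));   % score it again
isequal(p1, p2)                                   % true: inference is deterministic (no dropout)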
Solution
Lowering the dropout rate to 0.3 produces the curves below.
This shows that overfitting still creeps in late in training, so a further countermeasure is needed. L2 regularization can be introduced and its coefficient tuned: trainingOptions adds a weight-decay term λΩ(w) to the loss, where Ω(w) = ½·wᵀw and λ is the 'L2Regularization' value. For the run below, the dropout rate is 0.3 (dropout kept only after the fully connected layers, all others removed) and λ = 0.0005.
The network and training settings were as follows
CNNLayers = [
    imageInputLayer([1 3000 1],"Name","imageinput")
    convolution2dLayer([1 50],64,"Name","InputConv","Padding","same","Stride",[1 5])
    batchNormalizationLayer("Name","batchnorm_1")
    reluLayer("Name","relu_1")
    convolution2dLayer([1 5],128,"Name","Conv1","Padding","same")
    maxPooling2dLayer([1 2],"Name","Pool1","Padding","same","Stride",[1 2])
    batchNormalizationLayer("Name","batchnorm_2")
    reluLayer("Name","relu_2")
    convolution2dLayer([1 5],256,"Name","Conv2","Padding","same","Stride",[1 2])
    maxPooling2dLayer([1 2],"Name","Pool2","Padding","same","Stride",[1 2])
    batchNormalizationLayer("Name","batchnorm_3")
    fullyConnectedLayer(1500,"Name","Fc1")
    dropoutLayer(0.3,"Name","dropout_4")
    batchNormalizationLayer("Name","batchnorm_4")
    fullyConnectedLayer(1000,"Name","Fc2")
    dropoutLayer(0.3,"Name","dropout_5")
    batchNormalizationLayer("Name","batchnorm_5")
    fullyConnectedLayer(5,"Name","Fc3")
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classoutput")];
options = trainingOptions('adam', ...
    'ExecutionEnvironment','auto', ...
    'MaxEpochs',100, ...
    'MiniBatchSize',miniBatchSize, ...
    'ValidationFrequency',50, ...
    'ValidationData',{XValidation,YValidation}, ...
    'L2Regularization',0.0005, ...
    'GradientThreshold',inf, ...
    'Shuffle','every-epoch', ...
    'Verbose',false, ...
    'Plots','training-progress');
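One note on 'L2Regularization': it sets a single global coefficient for every learnable weight in the network. If one global value over-penalizes some layers and under-penalizes others, the coefficient can also be scaled per layer via the WeightL2Factor property, which multiplies the global rate for that layer's weights. A sketch with illustrative values (not the settings used above):

% Per-layer L2 control; the factor multiplies the global 'L2Regularization' rate.
% Values here are illustrative only.
fcHeavy   = fullyConnectedLayer(1500,"Name","Fc1","WeightL2Factor",2);   % double decay on the big FC layer
convLight = convolution2dLayer([1 5],128,"Name","Conv1","Padding","same", ...
    "WeightL2Factor",0);                                                 % no weight decay on this conv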
The simulation results are shown below.
A strange phenomenon
As the figure above shows, the final reported accuracy is 79.30%.
However, if I save this trained network and re-run the two code sections that generate the validation set and perform the final evaluation, the accuracy comes out far higher; as the figure shows, it reaches 92.13%. I tried the same thing with other networks: merely re-selecting the validation set nudges the accuracy up a little (about 3-4%), but this procedure raises it by much more, and I do not know why. My best guess is data leakage: re-generating the validation set after training lets epochs that were part of the training data land in the new validation split, which would inflate the measured accuracy.
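A way to test that guess is to reload the saved network and score it only on the original validation arrays that were passed to trainingOptions, without re-splitting the data; trainedNet below is a hypothetical name for the saved network:

% Evaluate on the ORIGINAL validation split (no re-selection of the data).
% trainedNet is a hypothetical variable name for the saved network.
YPred = classify(trainedNet, XValidation);
acc   = mean(YPred == YValidation)   % should land near the reported Final accuracy (~79%)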
Summary
In hindsight, working through these problems really came down to a process of hyperparameter tuning.