Original url:
https://blog.csdn.net/baidu_26408419/article/details/78497201
Useful URLs:
https://machinelearningmastery.com/improve-deep-learning-performance/
http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
http://lamda.nju.edu.cn/weixs/book/CNN_book.html
If the images fail to load, download the .doc version of this article from Baidu Cloud:
链接:https://pan.baidu.com/s/1mhCxQsg 密码:igfo
In summary:
0. data: http://machinelearningmastery.com/improve-deep-learning-performance/
(Also: feature selection is a discipline in its own right.)
① The learning rate matters a lot; a smaller value is safer, but training will take longer.
② Within your machine's limits, make the batch size as large as possible (set it to a power of 2).
③ Inspect the accuracy charts and extract information from them.
④ Sec. 8: Ensemble (how to combine multiple network architectures; see Zhou Zhihua's "Machine Learning" for details)
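A minimal sketch of the simplest kind of ensemble, averaging the class-probability outputs of several networks (the three probability vectors below are hypothetical, just to show the mechanics):

```python
import numpy as np

def ensemble_average(prob_list):
    # Average the class-probability outputs of several networks.
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# Hypothetical softmax outputs of three trained networks for one sample.
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.6, 0.3, 0.1])
p3 = np.array([0.5, 0.4, 0.1])
avg = ensemble_average([p1, p2, p3])
pred = int(np.argmax(avg))
```

Averaging is only one option; voting and stacking are the other common combination schemes discussed in that chapter.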
⑤ General principle of fine-tuning:
⑥ https://research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf (distributed GPU training):
Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k. (All other hyper-parameters (weight decay, momentum, etc.) are kept unchanged.)
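The rule above is a one-line computation; a sketch (the 0.1-at-batch-256 baseline is the reference configuration used in that paper):

```python
def scaled_lr(base_lr, base_batch, new_batch):
    # Linear Scaling Rule: multiply the lr by k = new_batch / base_batch.
    return base_lr * (new_batch / base_batch)

# Going from batch 256 (lr 0.1) to batch 8192 gives lr 3.2,
# while weight decay, momentum, etc. stay unchanged.
big_lr = scaled_lr(0.1, 256, 8192)
```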
The details:
I. Caffe structure and parameter tuning tips
0. During data preparation:
① Cast to float32 and shuffle the training data (imports needed: numpy as np, sklearn.utils.shuffle):
X = X.astype(np.float32)
X, y = shuffle(X, y, random_state=42)  # shuffle train data
y = y.astype(np.float32)
② Normalization, etc.:
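A minimal normalization sketch, assuming 8-bit image data in a numpy array (the values are made up): scale to [0, 1], then zero-center with the training-set mean:

```python
import numpy as np

# Hypothetical 8-bit image data: rows = samples, columns = pixel values.
X = np.array([[0., 128., 255.],
              [255., 128., 0.]], dtype=np.float32)

X /= 255.0             # scale to [0, 1]
mean = X.mean(axis=0)  # per-feature mean, computed on the TRAINING set only
X -= mean              # zero-center (apply this same mean to test data)
```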
1. http://blog.csdn.net/u011762313/article/details/47399981
In Solver initialization (Caffe provides 3 solver types: Stochastic Gradient Descent (SGD), Adaptive Gradient (ADAGRAD), and Nesterov's Accelerated Gradient (NESTEROV)).
Good default parameter settings for SGD:
base_lr: 0.01      # initial learning rate: α = 0.01
lr_policy: "step"  # learning policy: every stepsize iterations, multiply α by gamma
gamma: 0.1         # learning-rate decay factor
stepsize: 100000   # drop the learning rate every 100K iterations
max_iter: 350000   # maximum number of training iterations
momentum: 0.9      # momentum: μ = 0.9
The other two methods also work well.
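The "step" policy above can be reproduced with a small helper (a sketch of the schedule, not Caffe's actual code):

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    # Caffe "step" policy: lr = base_lr * gamma ^ floor(iteration / stepsize)
    return base_lr * gamma ** (iteration // stepsize)

# With the solver values above, the rate drops tenfold every 100K iterations.
lr_early = step_lr(0.01, 0.1, 100000, 50000)
lr_late = step_lr(0.01, 0.1, 100000, 250000)
```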
2. https://corpocrat.com/2015/02/24/facial-keypoints-extraction-using-deep-learning-with-caffe/
① We specify ReLU (to allow values > 0, plus faster converging) and a Dropout layer to prevent overfitting.
(http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html:)
In conclusion, three types of ReLU variants all consistently outperform the original ReLU in these three data sets. And PReLU and RReLU seem better choices. Moreover, He et al. also reported similar conclusions in [4].
② In the fully connected layers, add (the xavier filler by default initializes the weight Blob x from the uniform distribution x ∼ U(−a, +a)):
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
  value: 0.1
}
layer {
  name: "relu22"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
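A numpy sketch of what the xavier filler does under its fan-in default (the layer shape below is hypothetical):

```python
import numpy as np

def xavier_uniform(fan_in, shape, seed=0):
    # Caffe's default "xavier" filler: U(-a, +a) with a = sqrt(3 / fan_in).
    a = np.sqrt(3.0 / fan_in)
    return np.random.default_rng(seed).uniform(-a, a, size=shape)

W = xavier_uniform(fan_in=4096, shape=(4096, 1000))  # e.g. an fc layer
```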
③ How to initialize each layer's parameters: http://blog.csdn.net/wenlin33/article/details/53378613
3. Inspect charts and other results:
① During training, it is best to visualize some feature maps.
② Save the training/testing logs and plot them; judge the effect of each parameter setting from the accuracy curves.
① (An overfitting net can generally be made to perform better by using more training data)
Ways to add data: rotate, flip, and otherwise transform the images.
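A minimal augmentation sketch with numpy (horizontal flip plus right-angle rotations only; real pipelines also crop, jitter color, etc.):

```python
import numpy as np

def augment(img):
    # One image -> several variants: horizontal flip and 90/180/270 rotations.
    return [img,
            np.fliplr(img),
            np.rot90(img, 1),
            np.rot90(img, 2),
            np.rot90(img, 3)]

variants = augment(np.arange(9).reshape(3, 3))  # 5 variants from one "image"
```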
② Speed up network training: Remember that in our previous model, we initialized learning rate and momentum with a static 0.01 and 0.9 respectively. Let's change that such that the learning rate decreases linearly with the number of epochs, while we let the momentum increase.
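One way to sketch such linear schedules (the end values 0.0001 and 0.999 are assumptions for illustration, not from the text):

```python
import numpy as np

def linear_schedule(start, stop, num_epochs):
    # One value per epoch, moving linearly from start to stop.
    return np.linspace(start, stop, num_epochs)

lrs = linear_schedule(0.01, 0.0001, 100)       # learning rate decreases
momentums = linear_schedule(0.9, 0.999, 100)   # momentum increases
```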
③ Load the weights of a pre-trained model to speed up this training run.
④ Put the BatchNorm layer immediately after fully connected layers (or convolutional layers), and before activation.
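A numpy sketch of that ordering (FC -> BatchNorm -> ReLU); the batchnorm here omits the learned scale/shift, and the shapes are made up:

```python
import numpy as np

def batchnorm(z, eps=1e-5):
    # Normalize each feature over the batch (learned scale/shift omitted).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # batch of 8 samples, 4 features
W = rng.normal(size=(4, 3))        # hypothetical fc weights
z = x @ W                          # fully connected layer
a = np.maximum(batchnorm(z), 0.0)  # BatchNorm BEFORE the ReLU activation
```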
II. Fine-tuning some deep networks
①http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
②https://zhuanlan.zhihu.com/p/22624331
This approach is especially suitable when our data is relatively scarce. It lets us exploit the strong generalization ability of deep neural networks while avoiding both the design of a complex model and a long training run. The strongest model at the moment is ResNet; for many vision tasks, fine-tuning ResNet yields very good performance!
③ General principle of fine-tuning:
Excellent blog posts:
① http://machinelearningmastery.com/improve-deep-learning-performance/
② http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
③ https://github.com/hwdong/deep-learning/blob/master/deep%20learning%20papers.md