Caffe: The Solver Explained

Latest recommended article published on 2020-01-07 17:32:38

海边的第八只螃蟹

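This article is a repost and the original author is unknown. If you are the author, please get in touch so that a link to the original can be added.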

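The solver file (*_solver.prototxt) is used together with the network definition in train_val.prototxt.
The solver defines how the model is used, that is, how it is trained and evaluated.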

The Caffe solvers are:
1. Stochastic Gradient Descent (type: "SGD")
2. AdaDelta (type: "AdaDelta")
3. Adaptive Gradient (type: "AdaGrad")
4. Adam (type: "Adam")
5. Nesterov's Accelerated Gradient (type: "Nesterov")
6. RMSprop (type: "RMSProp")

The solver:
1. scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
2. iteratively optimizes by calling forward / backward and updating parameters
3. (periodically) evaluates the test networks
4. snapshots the model and solver state throughout the optimization

where each iteration
1. calls network forward to compute the output and loss
2. calls network backward to compute the gradients
3. incorporates the gradients into parameter updates according to the solver method
4. updates the solver state according to learning rate, history, and method

to take the weights all the way from initialization to learned model.
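
The four per-iteration steps above can be sketched on a toy problem. This is a minimal illustration, not Caffe code: the functions `forward`, `backward` and `solve` are hypothetical names, the "network" is a single weight with loss 0.5 * (w - 3)^2, and the update rule is plain SGD with momentum.

```python
# Toy illustration of the solver loop: minimise loss(w) = 0.5 * (w - 3)^2.
# All names here are hypothetical; this is not the Caffe implementation.

def forward(w):
    """Compute the loss for the current weight."""
    return 0.5 * (w - 3.0) ** 2


def backward(w):
    """Compute the gradient d(loss)/dw."""
    return w - 3.0


def solve(w=0.0, base_lr=0.1, momentum=0.9, max_iter=200):
    history = 0.0  # solver state: the momentum buffer
    for _ in range(max_iter):
        loss = forward(w)   # forward pass: output and loss
        grad = backward(w)  # backward pass: gradients
        # Fold the gradient into the solver state (momentum history) ...
        history = momentum * history + base_lr * grad
        # ... and apply the resulting update to the parameter.
        w -= history
    return w, forward(w)


w, loss = solve()
print(w, loss)
```

After 200 iterations the weight should sit very close to 3, the minimum of the toy loss.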

Like Caffe models, Caffe solvers run in CPU / GPU modes.

Let's start with an example:

net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
type: "SGD"
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 20000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU

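test_iter: 100

A model runs in a training phase and a test phase. test_iter is the number of iterations performed in each test pass, and it works together with batch_size: test_iter * batch_size should cover the test set. (If the data layer in train_val.prototxt does not define them separately, training and testing use the same batch_size.) For example, with a test set of 100000 samples and batch_size=1000, test_iter=100.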

test_interval: 500

The test interval: a test pass is run after every 500 training iterations.

base_lr: 0.01

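base_lr sets the base learning rate, which can then be adjusted as training iterates. If the loss shows up as 87.3365 during training, the optimization has most likely diverged; lower base_lr or batch_size.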
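
The value 87.3365 is not arbitrary. A minimal check, assuming the softmax loss clamps the predicted probability of the true class to the smallest positive normalized float32 (C's FLT_MIN) before taking the logarithm, shows where it comes from:

```python
import math

# Smallest positive normalized float32 value (FLT_MIN in C's <float.h>).
FLT_MIN = 1.17549435e-38

# Loss reported when the true class gets (effectively) zero probability.
saturated_loss = -math.log(FLT_MIN)
print(round(saturated_loss, 4))
```

Under that assumption, a loss pinned at this value suggests the network is assigning essentially zero probability to the correct class, which is typical when the weights have blown up.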

snapshot: 5000
snapshot_prefix: "./mynet"

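Snapshotting saves the trained model and the solver state. snapshot sets how many training iterations pass between saves; the default is 0, which means no snapshots are taken. snapshot_prefix sets the path prefix of the saved files.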

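With these settings the model is saved at every multiple of 5000 iterations. Caffe inserts _iter_ between the prefix and the iteration number, for example:
mynet_iter_5000.caffemodel
mynet_iter_5000.solverstate

mynet_iter_10000.caffemodel
mynet_iter_10000.solverstate

mynet_iter_15000.caffemodel
…..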
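
The snapshot schedule can be enumerated with a short script. This is a sketch that assumes Caffe's `<prefix>_iter_<N>` file naming and the default BINARYPROTO format; `snapshot_files` is a hypothetical helper, and it ignores any extra snapshot taken when training finishes.

```python
def snapshot_files(prefix, snapshot, max_iter):
    """List the files written at every multiple of `snapshot` up to max_iter."""
    if snapshot <= 0:  # snapshot: 0 disables periodic snapshots
        return []
    names = []
    for it in range(snapshot, max_iter + 1, snapshot):
        names.append(f"{prefix}_iter_{it}.caffemodel")   # learned weights
        names.append(f"{prefix}_iter_{it}.solverstate")  # solver state
    return names


for name in snapshot_files("./mynet", 5000, 20000):
    print(name)
```

The .solverstate file is the one to pass back to Caffe when resuming an interrupted run; the .caffemodel file holds only the weights.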

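You can also set snapshot_diff, which controls whether the gradient values (diffs) are saved; the default is false, meaning they are not.

snapshot_format sets the file format of the snapshot. There are two options, HDF5 and BINARYPROTO; the default is BINARYPROTO.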

solver_mode: CPU

Either CPU or GPU can be selected.

base_lr: 0.01

lr_policy: "inv"

gamma: 0.0001

power: 0.75

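These four lines are best read together; they configure the learning rate. Any optimizer based on gradient descent has a learning rate, also called the step size. base_lr sets the base learning rate, which can be adjusted as training iterates. How it is adjusted is the policy, which is set by lr_policy.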

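lr_policy can be set to the following values, and the learning rate is computed accordingly:
- fixed: keeps base_lr unchanged.
- step: also requires stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration.
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration.
- inv: also requires power; returns base_lr * (1 + gamma * iter) ^ (- power).
- multistep: also requires stepvalue. It is very similar to step, but where step changes the rate at uniform intervals, multistep changes it at the iterations given by stepvalue.
- poly: polynomial decay of the learning rate; returns base_lr * (1 - iter/max_iter) ^ (power).
- sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).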
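
The formulas in the list above translate directly into a small function. This is an illustrative sketch, not Caffe's implementation: `learning_rate` and its keyword arguments are hypothetical names chosen to mirror the solver fields, and multistep is assumed to multiply by gamma once for every stepvalue that the current iteration has reached.

```python
import math


def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Return the learning rate at iteration `it` for a given lr_policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        # Count how many stepvalue boundaries have been reached so far.
        passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** passed
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError(f"unknown lr_policy: {policy}")


# Example: a step policy that divides the rate by 10 every 10000 iterations.
for it in (0, 10000, 25000):
    print(it, learning_rate("step", it, base_lr=0.01, gamma=0.1, stepsize=10000))
```

Keeping every policy behind one function makes it easy to plot or compare schedules before committing to one in the solver file.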

Open ./caffe-master/src/caffe/proto/caffe.proto and you will see the following description:

// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmod decay
//      return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.

MNIST model

lr_policy: "inv"
gamma: 0.0001
power: 0.75
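
To get a feel for how gently the inv policy decays with these MNIST settings, the formula base_lr * (1 + gamma * iter) ^ (- power) can be evaluated at a few iterations. This sketch assumes base_lr: 0.01 as in the solver example earlier in the article; `inv_lr` is a hypothetical helper.

```python
base_lr, gamma, power = 0.01, 0.0001, 0.75


def inv_lr(it):
    """Learning rate under lr_policy "inv" at iteration `it`."""
    return base_lr * (1 + gamma * it) ** (-power)


for it in (0, 5000, 10000, 20000):
    print(it, round(inv_lr(it), 6))
```

The rate starts at base_lr and falls to roughly 0.0059 by iteration 10000 and roughly 0.0044 by iteration 20000, so the decay is smooth rather than stepwise.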

CIFAR model

lr_policy: "fixed"

ImageNet model

lr_policy: "step"
gamma: 0.1
stepsize: 10000
