1. scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
2. iteratively optimizes by calling forward / backward and updating parameters
3. (periodically) evaluates the test networks
4. snapshots the model and solver state throughout the optimization
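Conceptually, the solver interleaves these responsibilities in a single loop. Below is a minimal sketch of that control flow in Python; the helper names (`update`, `test`, `snapshot`) and the interval parameters are illustrative assumptions, not Caffe's actual API.

<pre>
# Sketch of the solver loop; all names are illustrative, not Caffe's API.
def solve(net, max_iter, test_interval, snapshot_interval):
    for it in range(max_iter):
        loss = net.forward()      # compute the loss on the current batch
        net.backward()            # compute parameter gradients
        update(net, it)           # apply the solver update (e.g. SGD)
        if it % test_interval == 0:
            test(net)             # periodically run the test network(s)
        if it % snapshot_interval == 0:
            snapshot(net, it)     # save model and solver state
</pre>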
Solver Parameters
<pre>
base_lr: 0.01     # begin training at a learning rate of 0.01 = 1e-2
lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
                  # by a factor of gamma every stepsize iterations
gamma: 0.1        # drop the learning rate by a factor of 10
                  # (i.e., multiply it by a factor of gamma = 0.1)
stepsize: 100000  # drop the learning rate every 100K iterations
max_iter: 350000  # train for 350K iterations total
momentum: 0.9
</pre>
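With these values, training starts at a learning rate of 1e-2, which is multiplied by 0.1 every 100K iterations. A quick check of the resulting schedule, using plain Python arithmetic rather than any Caffe API:

<pre>
base_lr, gamma, stepsize = 0.01, 0.1, 100000

def step_lr(it):
    # step policy: base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (it // stepsize)

for it in (0, 100000, 200000, 300000):
    print(it, step_lr(it))
# lr is 0.01 for iterations 0-100K, 0.001 for 100K-200K, 0.0001 for
# 200K-300K, and 1e-5 for the final 300K-350K (modulo float rounding)
</pre>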
<pre>
The learning rate decay policy. The currently implemented learning rate
policies are as follows:
  - fixed: always return base_lr.
  - step: return base_lr * gamma ^ (floor(iter / stepsize))
  - exp: return base_lr * gamma ^ iter
  - inv: return base_lr * (1 + gamma * iter) ^ (-power)
  - multistep: similar to step, but allows non-uniform steps defined by
    stepvalue
  - poly: the effective learning rate follows a polynomial decay, reaching
    zero by max_iter. return base_lr * (1 - iter/max_iter) ^ power
  - sigmoid: the effective learning rate follows a sigmoid decay.
    return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
where base_lr, max_iter, gamma, stepsize, stepvalue and power are defined
in the solver parameter protocol buffer, and iter is the current iteration.
</pre>
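For reference, these policies translate directly into code. The sketch below implements the formulas as written above; the function signature and the `stepvalues` sequence are assumptions for illustration, not Caffe's interface.

<pre>
import math

def lr(policy, it, base_lr, gamma=0.1, power=1.0, stepsize=100000,
       max_iter=350000, stepvalues=()):
    """Evaluate the learning rate at iteration `it` under the named policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** -power
    if policy == "multistep":
        # like step, but the rate drops at each (non-uniform) stepvalue
        return base_lr * gamma ** sum(1 for s in stepvalues if it >= s)
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)
</pre>

Sweeping `it` from 0 to max_iter for each policy reproduces the decay curves described above.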