# solver.prototxt Parameter Reference (Part 1)


## 1. Methods

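Solver methods address the general problem of minimizing a loss function. For a dataset D, the optimization objective is the average loss over all data instances in the dataset.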
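Written out, the objective described above might look like the following. The symbols are assumptions introduced here for illustration: $W$ stands for the network weights, $X^{(i)}$ for the $i$-th data instance, and $f_W(X^{(i)})$ for the loss on that instance.

```latex
L(W) = \frac{1}{|D|} \sum_{i=1}^{|D|} f_W\left(X^{(i)}\right)
```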

### 1.1 SGD

```
base_lr: 0.01     # begin training at a learning rate of 0.01 = 1e-2

lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
                  # by a factor of gamma every stepsize iterations

gamma: 0.1        # drop the learning rate by a factor of 10
                  # (i.e., multiply it by a factor of gamma = 0.1)

stepsize: 100000  # drop the learning rate every 100K iterations

max_iter: 350000  # train for 350K iterations total

momentum: 0.9
```
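To make the "step" policy concrete, here is a minimal Python sketch of the schedule those settings describe: the learning rate is `base_lr` multiplied by `gamma` once for every `stepsize` iterations completed. The function name `step_learning_rate` is hypothetical and is not part of Caffe; the default values simply mirror the configuration above.

```python
def step_learning_rate(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Learning rate under a "step" policy.

    Hypothetical helper for illustration: the rate is multiplied by
    gamma once for every `stepsize` iterations already completed.
    """
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == "__main__":
    # Sample the schedule around each drop, up to max_iter = 350000.
    for it in (0, 99999, 100000, 200000, 349999):
        print(it, step_learning_rate(it))
```

With these values the rate starts at 0.01, drops to roughly 0.001 at iteration 100000, and ends near 1e-5 by the last iteration.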

Note that the momentum setting $\mu$ effectively multiplies the size of your updates by a factor of $\frac{1}{1-\mu}$ after many iterations of training, so if you increase $\mu$, it may be a good idea to decrease $\alpha$ accordingly (and vice versa).

For example, with $\mu = 0.9$, we have an effective update size multiplier of $\frac{1}{1-0.9} = 10$. If we increased the momentum to $\mu = 0.99$, we’ve increased our update size multiplier to 100, so we should drop $\alpha$ (base_lr) by a factor of 10.
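The multiplier can be checked numerically. The sketch below assumes the common momentum update $V_{t+1} = \mu V_t - \alpha \nabla L$ and a constant gradient, which is a simplification for illustration only. The function name `momentum_step_size` and its parameters are hypothetical.

```python
def momentum_step_size(mu, lr=1.0, grad=1.0, iterations=5000):
    """Magnitude of the update after many steps with a constant gradient.

    Assumes the update rule V <- mu * V - lr * grad (an assumption made
    for this illustration). The magnitude approaches lr * grad / (1 - mu).
    """
    v = 0.0
    for _ in range(iterations):
        v = mu * v - lr * grad
    return abs(v)


if __name__ == "__main__":
    for mu in (0.9, 0.99):
        # Compare the simulated step size with the 1 / (1 - mu) prediction.
        print(mu, momentum_step_size(mu), 1.0 / (1.0 - mu))
```

Under these assumptions the steady-state step is about 10 times the plain gradient step for μ = 0.9 and about 100 times for μ = 0.99, which is why the learning rate is reduced by a factor of 10 when moving from one to the other.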

### 1.3 NAG

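Nesterov’s accelerated gradient (NAG) was proposed as an "optimal" method of convex optimization, achieving a convergence rate of $O(1/t^2)$ rather than $O(1/t)$. The assumptions required for that rate typically do not hold in deep learning, where the optimization problems are usually non-smooth and non-convex. Even so, in practice NAG can be a very effective optimization method for certain deep learning architectures, such as deep MNIST autoencoders [5].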
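As a rough illustration of how NAG differs from plain momentum, the sketch below evaluates the gradient at a look-ahead point $W_t + \mu V_t$ instead of at $W_t$. It is written for a scalar weight and a toy quadratic loss, both assumptions chosen to keep the example small. The function name `nag_minimize` and its parameters are hypothetical, and this is not Caffe's implementation.

```python
def nag_minimize(grad_fn, w0, lr=0.1, mu=0.9, iterations=200):
    """Minimize a scalar function with Nesterov's accelerated gradient.

    grad_fn: callable returning the gradient at a given weight value.
    The gradient is taken at the look-ahead point w + mu * v, which is
    the only difference from the classical momentum update.
    """
    w, v = w0, 0.0
    for _ in range(iterations):
        lookahead = w + mu * v               # peek where momentum would take us
        v = mu * v - lr * grad_fn(lookahead)  # velocity update with look-ahead gradient
        w = w + v                             # weight update
    return w


if __name__ == "__main__":
    # Toy loss f(w) = 0.5 * w**2, whose gradient is simply w; the minimum is at 0.
    print(nag_minimize(lambda w: w, w0=5.0))
```

On this toy problem the iterate ends up very close to the minimizer at 0; real networks replace the scalar with weight tensors and the gradient with a mini-batch estimate.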

## 2. References

[1] L. Bottou. Stochastic Gradient Descent Tricks. Neural Networks: Tricks of the Trade: Springer, 2012.
[2] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012.
[3] J. Duchi, E. Hazan, and Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. The Journal of Machine Learning Research, 2011.
[4] Y. Nesterov. A Method of Solving a Convex Programming Problem with Convergence Rate $O(1/k^2)$. Soviet Mathematics Doklady, 1983.
[5] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the Importance of Initialization and Momentum in Deep Learning. Proceedings of the 30th International Conference on Machine Learning, 2013.
[6] http://caffe.berkeleyvision.org/tutorial/solver.html