caffe学习笔记（10）：Gradient descent optimization algorithm

最新推荐文章于 2022-04-02 21:43:10 发布

guyunee

最新推荐文章于 2022-04-02 21:43:10 发布

阅读量1.3k

点赞数

分类专栏： deep learning 文章标签：梯度下降 cnn caffe

本文链接：https://blog.csdn.net/qq_30401249/article/details/51483871

版权

deep learning 专栏收录该内容

16 篇文章 1 订阅

订阅专栏

solver的方法解决了使loss最小化的问题，给定一个dataset（D），优化的对象就是求loss的平均值：
这里写图片描述

当数据集D很大时，利用mini-batch of N<<|D| instances计算loss：

这里写图片描述

GD方法有以下六种：
1. Stochastic Gradient Descent (type: “SGD”)
2. AdaDelta (type: “AdaDelta”)
3. Adaptive Gradient (type: “AdaGrad”)
4. Adam (type: “Adam”)
5. Nesterov’s Accelerated Gradient (type: “Nesterov”)
6. RMSprop (type: “RMSProp”)

利用GD方法更新weight，进行优化。

参照文件“./caffe-master/src/caffe/proto/caffe.proto”，我们可以看到如何在solver.prototxt文件中对GD optimization algorithm进行配置。

message SolverParameter {
  //////////////////////////////////////////////////////////////////////////////
  // Specifying the train and test networks
  //
  // Exactly one train net must be specified using one of the following fields:
  //     train_net_param, train_net, net_param, net
  // One or more test nets may be specified using any of the following fields:
  //     test_net_param, test_net, net_param, net
  // If more than one test net field is specified (e.g., both net and
  // test_net are specified), they will be evaluated in the field order given
  // above: (1) test_net_param, (2) test_net, (3) net_param/net.
  // A test_iter must be specified for each test_net.
  // A test_level and/or a test_stage may also be specified for each test_net.
  //////////////////////////////////////////////////////////////////////////////
// type of the solver
  optional string type = 40 [default = "SGD"];

  // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam
  optional float delta = 31 [default = 1e-8];
  // parameters for the Adam solver
  optional float momentum2 = 39 [default = 0.999];

  // RMSProp decay value
  // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t)
  optional float rms_decay = 38;

  // If true, print information about the state of the net that may help with
  // debugging learning problems.
  optional bool debug_info = 23 [default = false];

  // If false, don't save a snapshot after training finishes.
  optional bool snapshot_after_train = 28 [default = true];

  // DEPRECATED: old solver enum types, use string instead
  enum SolverType {
    SGD = 0;
    NESTEROV = 1;
    ADAGRAD = 2;
    RMSPROP = 3;
    ADADELTA = 4;
    ADAM = 5;
  }
  // DEPRECATED: use type instead of solver_type
  optional SolverType solver_type = 30 [default = SGD];
}

具体方法介绍详见链接一，链接二，链接三。

guyunee

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
caffe学习笔记（10）：Gradient descent optimization algorithm

本文介绍了六种梯度下降算法，SGD，AdaDelta, AdaGrad, Adam, Nesterov, RMSProp。
复制链接

扫一扫