Basic algorithms
Stochastic gradient descent
Requires a learning_rate to be specified.
The optimizer itself performs plain gradient descent; fed a single sample or a mini-batch per step, it serves as stochastic gradient descent.
A common refinement is to lower the learning rate as the iteration count grows (see the sketch after the argument list below).
tf.train.GradientDescentOptimizer(learning_rate, use_locking=False, name='GradientDescent')
Args:
learning_rate: A Tensor or a floating point value. The learning
rate to use.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "GradientDescent".
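A minimal sketch in TF 1.x showing GradientDescentOptimizer with a decaying learning rate; the toy quadratic loss, the tf.train.exponential_decay schedule, and all hyperparameter values are illustrative assumptions, not taken from the notes above.
import tensorflow as tf

x = tf.Variable(5.0, name='x')
loss = tf.square(x - 3.0)                      # toy quadratic loss, minimum at x = 3
global_step = tf.Variable(0, trainable=False)  # counts how many update steps have run
# Decay the learning rate as training progresses: 0.1 * 0.96^(step / 100).
learning_rate = tf.train.exponential_decay(0.1, global_step,
                                           decay_steps=100, decay_rate=0.96)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(300):
        sess.run(train_op)                     # each run is one gradient step
    print(sess.run([x, learning_rate]))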
Momentum gradient descent
Parameters: learning_rate and a momentum coefficient.
Ties the step size to the gradient history: directions whose derivatives stay large accumulate larger steps, while directions with small derivatives take smaller steps; useful for multi-dimensional data (see the sketch after the argument list below).
tf.train.MomentumOptimizer(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
learning_rate: A `Tensor` or a floating point value. The learning rate.
momentum: A `Tensor` or a floating point value. The momentum.
use_locking: If `True` use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "Momentum".
use_nesterov: If `True` use Nesterov Momentum.
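A minimal sketch in TF 1.x, assuming a toy convex loss; momentum=0.9 and use_nesterov=True are common illustrative choices, not values prescribed by the notes.
import tensorflow as tf

w = tf.Variable([4.0, -3.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss, minimum at the origin
# Momentum accumulates a velocity from past gradients; Nesterov lookahead is optional.
train_op = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9,
                                      use_nesterov=True).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))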
Adaptive learning rate algorithms
AdaGrad
Parameter: learning_rate
Speeds up convergence and helps avoid getting stuck in local optima; works very well on convex problems.
Parameters with the largest partial derivatives of the loss see their learning rate drop quickly, while parameters with small partial derivatives see a comparatively small drop in their learning rate. Works well for some models, but not all (see the sketch after the argument list below).
tf.train.AdagradOptimizer(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
Args:
learning_rate: A `Tensor` or a floating point value. The learning rate.
initial_accumulator_value: A floating point value.
Starting value for the accumulators, must be positive.
use_locking: If `True` use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "Adagrad".
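A minimal sketch in TF 1.x; the toy loss, the learning rate, and the initial_accumulator_value shown are illustrative assumptions.
import tensorflow as tf

w = tf.Variable([1.0, 100.0])                  # dimensions with very different gradient scales
loss = tf.reduce_sum(tf.square(w))
# AdaGrad divides each parameter's step by the root of its accumulated squared gradients,
# so the large-gradient dimension's effective learning rate shrinks faster.
train_op = tf.train.AdagradOptimizer(learning_rate=0.5,
                                     initial_accumulator_value=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))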
RMSProp
Parameters: learning_rate, decay, momentum, epsilon
Addresses AdaGrad's poor performance on non-convex problems by replacing the accumulated sum of squared gradients with an exponentially weighted moving average, and it can also add momentum; works well in practice (see the sketch after the argument list below).
tf.train.RMSPropOptimizer(learning_rate, decay=0.9, momentum=0.0, epsilon=1e-10, use_locking=False, centered=False, name='RMSProp')
learning_rate: A Tensor or a floating point value. The learning rate.
decay: Discounting factor for the history/coming gradient
momentum: A scalar tensor.
epsilon: Small value to avoid zero denominator.
use_locking: If True use locks for update operation.
centered: If True, gradients are normalized by the estimated variance of
the gradient; if False, by the uncentered second moment. Setting this to
True may help with training, but is slightly more expensive in terms of
computation and memory. Defaults to False.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "RMSProp".
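A minimal sketch in TF 1.x; the toy loss and the decay/momentum values are illustrative, not prescribed by the notes.
import tensorflow as tf

w = tf.Variable([3.0, -2.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss
# decay controls the moving average of squared gradients; momentum adds a velocity term.
train_op = tf.train.RMSPropOptimizer(learning_rate=0.01, decay=0.9,
                                     momentum=0.5, epsilon=1e-10).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))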
Adam
Loosely a combination of the ideas behind AdaGrad/RMSProp (per-parameter adaptive learning rates from squared-gradient statistics) and momentum (a moving average of the gradient itself); see the sketch after the argument list below.
tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor.
The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor.
The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability. This epsilon is
"epsilon hat" in the Kingma and Ba paper (in the formula just before
Section 2.1), not the epsilon in Algorithm 1 of the paper.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients.
Defaults to "Adam".
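A minimal sketch in TF 1.x using the default Adam hyperparameters from the signature above; the toy loss is an illustrative assumption.
import tensorflow as tf

w = tf.Variable([2.0, -5.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss
# Adam keeps moving averages of the gradient (beta1) and the squared gradient (beta2).
train_op = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
                                  beta2=0.999, epsilon=1e-08).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(500):
        sess.run(train_op)
    print(sess.run(w))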