Basic algorithms
Stochastic gradient descent
Requires a learning_rate to be specified.
The optimizer itself performs plain gradient descent; fed a single sample or a mini-batch per step, it serves as stochastic gradient descent.
A common refinement is to lower the learning rate as the iteration count grows (see the sketch after the argument list below).
tf.train.GradientDescentOptimizer(learning_rate, use_locking=False, name='GradientDescent')
Args:
learning_rate: A Tensor or a floating point value. The learning
rate to use.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "GradientDescent".
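A minimal sketch in TF 1.x showing GradientDescentOptimizer with a decaying learning rate; the toy quadratic loss, the tf.train.exponential_decay schedule, and all hyperparameter values are illustrative assumptions, not taken from the notes above.
import tensorflow as tf

x = tf.Variable(5.0, name='x')
loss = tf.square(x - 3.0)                      # toy quadratic loss, minimum at x = 3
global_step = tf.Variable(0, trainable=False)  # counts how many update steps have run
# Decay the learning rate as training progresses: 0.1 * 0.96^(step / 100).
learning_rate = tf.train.exponential_decay(0.1, global_step,
                                           decay_steps=100, decay_rate=0.96)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(300):
        sess.run(train_op)                     # each run is one gradient step
    print(sess.run([x, learning_rate]))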
Momentum gradient descent
Parameters: learning_rate and a momentum coefficient.
Ties the step size to the gradient history: directions whose derivatives stay large accumulate larger steps, while directions with small derivatives take smaller steps; useful for multi-dimensional data (see the sketch after the argument list below).
tf.train.MomentumOptimizer(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
learning_rate: A `Tensor` or a floating point value. The learning rate.
momentum: A `Tensor` or a floating point value. The momentum.
use_locking: If `True` use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "Momentum".
use_nesterov: If `True` use Nesterov Momentum.
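A minimal sketch in TF 1.x, assuming a toy convex loss; momentum=0.9 and use_nesterov=True are common illustrative choices, not values prescribed by the notes.
import tensorflow as tf

w = tf.Variable([4.0, -3.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss, minimum at the origin
# Momentum accumulates a velocity from past gradients; Nesterov lookahead is optional.
train_op = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9,
                                      use_nesterov=True).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))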
Adaptive learning rate algorithms
AdaGrad
Parameter: learning_rate
Speeds up convergence and helps avoid getting stuck in local optima; works very well on convex problems.
Parameters with the largest partial derivatives of the loss see their learning rate drop quickly, while parameters with small partial derivatives see a comparatively small drop in their learning rate. Works well for some models, but not all (see the sketch after the argument list below).
tf.train.AdagradOptimizer(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
Args:
learning_rate: A `Tensor` or a floating point value. The learning rate.
initial_accumulator_value: A floating point value.
Starting value for the accumulators, must be positive.
use_locking: If `True` use locks for update operations.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "Adagrad".
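A minimal sketch in TF 1.x; the toy loss, the learning rate, and the initial_accumulator_value shown are illustrative assumptions.
import tensorflow as tf

w = tf.Variable([1.0, 100.0])                  # dimensions with very different gradient scales
loss = tf.reduce_sum(tf.square(w))
# AdaGrad divides each parameter's step by the root of its accumulated squared gradients,
# so the large-gradient dimension's effective learning rate shrinks faster.
train_op = tf.train.AdagradOptimizer(learning_rate=0.5,
                                     initial_accumulator_value=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))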
RMSProp
Parameters: learning_rate, decay, momentum, epsilon
Addresses AdaGrad's poor performance on non-convex problems by replacing the accumulated sum of squared gradients with an exponentially weighted moving average, and it can also add momentum; works well in practice (see the sketch after the argument list below).
tf.train.RMSPropOptimizer(learning_rate, decay=0.9, momentum=0.0, epsilon=1e-10, use_locking=False, centered=False, name='RMSProp')
learning_rate: A Tensor or a floating point value. The learning rate.
decay: Discounting factor for the history/coming gradient
momentum: A scalar tensor.
epsilon: Small value to avoid zero denominator.
use_locking: If True use locks for update operation.
centered: If True, gradients are normalized by the estimated variance of
the gradient; if False, by the uncentered second moment. Setting this to
True may help with training, but is slightly more expensive in terms of
computation and memory. Defaults to False.
name: Optional name prefix for the operations created when applying
gradients. Defaults to "RMSProp".
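A minimal sketch in TF 1.x; the toy loss and the decay/momentum values are illustrative, not prescribed by the notes.
import tensorflow as tf

w = tf.Variable([3.0, -2.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss
# decay controls the moving average of squared gradients; momentum adds a velocity term.
train_op = tf.train.RMSPropOptimizer(learning_rate=0.01, decay=0.9,
                                     momentum=0.5, epsilon=1e-10).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))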
Adam
Loosely a combination of the ideas behind AdaGrad/RMSProp (per-parameter adaptive learning rates from squared-gradient statistics) and momentum (a moving average of the gradient itself); see the sketch after the argument list below.
tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor.
The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor.
The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability. This epsilon is
"epsilon hat" in the Kingma and Ba paper (in the formula just before
Section 2.1), not the epsilon in Algorithm 1 of the paper.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients.
Defaults to "Adam".
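A minimal sketch in TF 1.x using the default Adam hyperparameters from the signature above; the toy loss is an illustrative assumption.
import tensorflow as tf

w = tf.Variable([2.0, -5.0])
loss = tf.reduce_sum(tf.square(w))             # toy convex loss
# Adam keeps moving averages of the gradient (beta1) and the squared gradient (beta2).
train_op = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
                                  beta2=0.999, epsilon=1e-08).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(500):
        sess.run(train_op)
    print(sess.run(w))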