TensorFlow Optimizers

I. Overview

1. By default, an optimizer trains all of the trainable variables that its objective function depends on. If you do not want a particular variable to be trained, set its trainable keyword argument to False. For example:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)

increment_step = global_step.assign_add(1)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)  # the learning rate can be a tensor
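
To see the decaying learning rate in action, here is a minimal self-contained sketch; the single variable w and the quadratic loss w² are invented purely for illustration:

import tensorflow as tf

# Toy setup: one trainable variable and a quadratic loss (illustrative only).
w = tf.Variable(5.0)
loss = tf.square(w)

global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step makes minimize() increment it after every update,
# so the learning rate decays automatically from step to step.
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        sess.run(train_op)
        print(sess.run(learning_rate))  # 0.0099, 0.009801, 0.00970299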

2. The full signature of the tf.Variable class

tf.Variable(initial_value=None, trainable=True, collections=None, validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None, expected_shape=None, import_scope=None)
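
A quick way to verify the effect of the trainable keyword is to inspect tf.trainable_variables(); the variable names below are made up for illustration:

import tensorflow as tf

weights = tf.Variable(tf.zeros([3]), name="weights")                 # trainable by default
step_counter = tf.Variable(0, trainable=False, name="step_counter")  # excluded from training

# Only 'weights' is collected here, so optimizers will leave 'step_counter' untouched.
print([v.name for v in tf.trainable_variables()])  # ['weights:0']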

3. You can ask the optimizer to compute gradients with respect to specific variables, and you can also modify the gradients the optimizer computes before they are applied:

# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of (gradient, variable) tuples. Do whatever you
# need to the 'gradient' part, for example, subtract 1.0 from each of them.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the modified gradients.
train_op = optimizer.apply_gradients(subtracted_grads_and_vars)
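
In practice, the most common variation on this pattern is not subtracting a constant but clipping gradients before applying them; a sketch, with a toy loss standing in for your own:

import tensorflow as tf

# Toy loss over one variable, for illustration only.
w = tf.Variable(10.0)
loss = tf.square(w)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss)

# Clip every gradient into [-1, 1] before applying it.
clipped_grads_and_vars = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(clipped_grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run(w))  # 9.9: the raw gradient 20.0 was clipped to 1.0, step = 0.1 * 1.0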

4. Lower-level functionality

Signature:

tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None)

Explanation:

The core job of this function is to compute the derivative ∂y/∂x. In TensorFlow, both y and x are tensors.

The ys and xs arguments accepted by tf.gradients() can be not only single tensors but also lists of the form [tensor1, tensor2, …, tensorn]. When ys and xs are both lists, differentiation works as follows:

  • tf.gradients() differentiates ys with respect to xs
  • The return value is a list whose length equals len(xs)
  • Suppose the return value is [grad1, grad2, grad3], with ys = [y1, y2] and xs = [x1, x2, x3]. Then the actual computation is:

    grad1 = ∂y1/∂x1 + ∂y2/∂x1
    grad2 = ∂y1/∂x2 + ∂y2/∂x2
    grad3 = ∂y1/∂x3 + ∂y2/∂x3

This is especially useful when you only want to train parts of a model.
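
The summation rule above is easy to verify directly with a few constants (the values below are arbitrary):

import tensorflow as tf

x1, x2, x3 = tf.constant(1.0), tf.constant(2.0), tf.constant(3.0)

y1 = x1 * x2  # dy1/dx1 = x2 = 2, dy1/dx2 = x1 = 1, dy1/dx3 = 0
y2 = x2 * x3  # dy2/dx1 = 0,      dy2/dx2 = x3 = 3, dy2/dx3 = x2 = 2

grads = tf.gradients([y1, y2], [x1, x2, x3])

with tf.Session() as sess:
    # Each entry sums the partial derivatives of y1 and y2
    # with respect to the corresponding x: [2.0, 4.0, 2.0].
    print(sess.run(grads))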

5. More optimizers

  • tf.train.GradientDescentOptimizer
  • tf.train.AdadeltaOptimizer
  • tf.train.AdagradOptimizer
  • tf.train.AdagradDAOptimizer
  • tf.train.MomentumOptimizer
  • tf.train.AdamOptimizer
  • tf.train.FtrlOptimizer
  • tf.train.ProximalGradientDescentOptimizer
  • tf.train.ProximalAdagradOptimizer
  • tf.train.RMSPropOptimizer

Takeaway: RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. [15] show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.

Recommendation: use AdamOptimizer.
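
Following that recommendation, a minimal sketch of plugging in tf.train.AdamOptimizer; its default hyperparameters are written out explicitly, and the toy loss is for illustration only:

import tensorflow as tf

# Toy loss, for illustration only.
w = tf.Variable(5.0)
loss = tf.square(w)

# These are AdamOptimizer's default hyperparameters, spelled out for clarity.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op)
    print(sess.run(w))  # w moves steadily toward the minimum at 0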

II. Code Examples and Notes

 
