Setting the Learning Rate
If the learning rate is too large, the parameters will oscillate back and forth across the optimum; if it is too small, optimization becomes very slow. A common approach is to set the learning rate with exponential decay.
Exponential decay
It implements the following schedule:
decayed_learning_rate =
learning_rate * decay_rate^(global_step/decay_steps)
where decayed_learning_rate is the learning rate used in each optimization round,
learning_rate is the preset initial learning rate,
decay_rate is the decay coefficient, and decay_steps is the decay speed.
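The formula can be checked with a short pure-Python sketch (no TensorFlow needed; the function name is ours, chosen to mirror the formula above):

```python
def decayed_learning_rate(learning_rate, decay_rate, global_step,
                          decay_steps, staircase=False):
    # exponent = global_step / decay_steps; with staircase=True it is
    # truncated to an integer, producing a step-wise (staircase) schedule.
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps
    return learning_rate * decay_rate ** exponent

# Same settings as the TensorFlow example below:
# initial rate 0.1, decay_rate 0.96, decay_steps 1.
print(round(decayed_learning_rate(0.1, 0.96, 1, 1, staircase=True), 6))  # 0.096
```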
import tensorflow as tf

TRAINING_STEPS = 100
global_step = tf.Variable(0)
# Generate the learning rate with the exponential_decay function.
# When staircase=True, global_step/decay_steps is truncated to an integer,
# so the learning rate becomes a staircase function.
LEARNING_RATE = tf.train.exponential_decay(0.1, global_step, 1, 0.96, staircase=True)
x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)
# Use the exponentially decayed learning rate. Passing global_step to
# minimize() updates global_step automatically, which in turn updates
# the learning rate.
train_op = tf.train.GradientDescentOptimizer(
    LEARNING_RATE).minimize(y, global_step=global_step)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(TRAINING_STEPS):
        sess.run(train_op)
        if i % 10 == 0:
            LEARNING_RATE_value = sess.run(LEARNING_RATE)
            x_value = sess.run(x)
            print("After %s iteration(s): x%s is %f, learning rate is %f." % (i + 1, i + 1, x_value, LEARNING_RATE_value))
After 1 iteration(s): x1 is 4.000000, learning rate is 0.096000.
After 11 iteration(s): x11 is 0.690561, learning rate is 0.063824.
After 21 iteration(s): x21 is 0.222583, learning rate is 0.042432.
After 31 iteration(s): x31 is 0.106405, learning rate is 0.028210.
After 41 iteration(s): x41 is 0.065548, learning rate is 0.018755.
After 51 iteration(s): x51 is 0.047625, learning rate is 0.012469.
After 61 iteration(s): x61 is 0.038558, learning rate is 0.008290.
After 71 iteration(s): x71 is 0.033523, learning rate is 0.005511.
After 81 iteration(s): x81 is 0.030553, learning rate is 0.003664.
After 91 iteration(s): x91 is 0.028727, learning rate is 0.002436.
The Overfitting Problem
Overfitting means that when a model becomes too complex, it "memorizes" the random noise in every training example instead of learning the general trend in the training data.
Solution: regularization
Regularization adds a term to the loss function that characterizes model complexity.
Let the loss function be J(\theta), let \lambda be the proportion of the model-complexity loss in the total loss, and let R(w) characterize the model's complexity. The objective to optimize becomes
J(\theta)+\lambda R(w)
R(w) can be the sum of the absolute values of the parameters (L1) or the sum of their squares (L2):
R(w)=\|w\|_1=\sum_i|w_i|
R(w)=\|w\|_2^2=\sum_i w_i^2
L1 and L2 can also be combined:
R(w)=\sum_i\left[\alpha|w_i| + (1-\alpha)w_i^2\right]
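The three regularizers can be computed directly in plain Python (an illustrative sketch; the function names are ours):

```python
def l1(w):
    # Sum of absolute values: ||w||_1
    return sum(abs(wi) for wi in w)

def l2(w):
    # Sum of squares: ||w||_2^2
    return sum(wi ** 2 for wi in w)

def elastic(w, alpha):
    # Weighted combination of the L1 and L2 terms.
    return sum(alpha * abs(wi) + (1 - alpha) * wi ** 2 for wi in w)

w = [1.0, -2.0, 3.0]
print(l1(w))            # 6.0
print(l2(w))            # 14.0
print(elastic(w, 0.5))  # 10.0
```

Note that L1 tends to push individual weights exactly to zero (sparse models), while L2 merely shrinks them, which is why the combined form is sometimes useful.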
Implementation in TensorFlow (note that lambda is a reserved word in Python, so the regularization weight is written lambda_ here):
loss = tf.reduce_mean(tf.square(y_ - y)) + tf.contrib.layers.l2_regularizer(lambda_)(w)
Moving Average Model
When training a neural network with gradient descent, using a moving average model can, to some extent, improve the final model's performance on test data.
Usage in TensorFlow:
tf.train.ExponentialMovingAverage(decay, num_updates)
where decay is the decay rate and num_updates is an optional variable that dynamically controls the decay rate.
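The update rule behind ExponentialMovingAverage can be sketched in plain Python. In TensorFlow each shadow variable is updated as shadow = decay * shadow + (1 - decay) * variable, and when num_updates is supplied the effective decay is min(decay, (1 + num_updates) / (10 + num_updates)), so early in training the average tracks the variable more closely. The function below is our own illustration of that rule:

```python
def ema_update(shadow, variable, decay, num_updates=None):
    # Dynamically cap the decay early in training, as TensorFlow does.
    if num_updates is not None:
        decay = min(decay, (1 + num_updates) / (10 + num_updates))
    # shadow = decay * shadow + (1 - decay) * variable
    return decay * shadow + (1 - decay) * variable

shadow = 0.0
# With decay=0.99 and num_updates=0 the effective decay is min(0.99, 1/10) = 0.1,
# so the first update moves the shadow most of the way toward the variable.
shadow = ema_update(shadow, 10.0, 0.99, num_updates=0)
print(shadow)  # 9.0
```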