tf.train.exponential_decay() official documentation link
tf.train.exponential_decay(
learning_rate,
global_step,
decay_steps,
decay_rate,
staircase=False,
name=None
)
In one sentence: it applies exponential decay to the learning rate learning_rate.
In more detail: a fixed learning rate is always awkward: too small and training is slow; too large and you may never settle on the optimum. A natural idea is to set the learning rate dynamically as training progresses, decreasing it step by step as the number of training iterations grows. tf.train.exponential_decay() is TensorFlow's built-in function for generating such a dynamically decaying learning rate.
Its formula is as follows:
decayed_learning_rate = learning_rate *
decay_rate ^ (global_step / decay_steps)
That is,

decayed\_learning\_rate = learning\_rate \times decay\_rate^{\frac{global\_step}{decay\_steps}}
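To make the formula concrete, here is a minimal pure-Python sketch (no TensorFlow) that evaluates it by hand; decayed_lr is a hypothetical helper name, and the staircase branch uses integer division, matching the staircase=True behavior used in the example below:

def decayed_lr(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    # Pure-Python version of the decay formula above (illustrative only)
    if staircase:
        exponent = global_step // decay_steps  # integer division: decay in discrete jumps
    else:
        exponent = global_step / decay_steps   # smooth, continuous decay
    return learning_rate * decay_rate ** exponent

# With learning_rate=0.1, decay_rate=0.9, decay_steps=100:
print(decayed_lr(0.1, 50, 100, 0.9))                  # ~0.09487 (smooth decay)
print(decayed_lr(0.1, 50, 100, 0.9, staircase=True))  # 0.1 (no decay until step 100)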
For example:
Initial learning rate LEARNING_RATE_BASE = 0.1
Total number of training steps GLOBAL_STEPS = 1000
Decay rate DECAY_RATE = 0.9
Decay once every 100 steps (when staircase=True): DECAY_STEPS = 100
import tensorflow as tf
import matplotlib.pyplot as plt
LEARNING_RATE_BASE = 0.1
DECAY_RATE = 0.9
GLOBAL_STEPS = 1000
DECAY_STEPS = 100
# Tensor holding the current step; its value is overridden via feed_dict below
global_ = tf.Variable(tf.constant(0))
learning_rate_1 = tf.train.exponential_decay(LEARNING_RATE_BASE, global_, DECAY_STEPS, DECAY_RATE, staircase=True)
learning_rate_2 = tf.train.exponential_decay(LEARNING_RATE_BASE, global_, DECAY_STEPS, DECAY_RATE, staircase=False)
LR1 = []
LR2 = []
with tf.Session() as sess:
    for i in range(GLOBAL_STEPS):
        # Evaluate both schedules at step i by feeding i as the global step
        lr1 = sess.run(learning_rate_1, feed_dict={global_: i})
        LR1.append(lr1)
        lr2 = sess.run(learning_rate_2, feed_dict={global_: i})
        LR2.append(lr2)
plt.figure(1)
plt.plot(range(GLOBAL_STEPS), LR2, 'r-')  # smooth decay (staircase=False)
plt.plot(range(GLOBAL_STEPS), LR1, 'b-')  # stepwise decay (staircase=True)
plt.show()
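As an aside, TensorFlow 2.x exposes the same schedule as tf.keras.optimizers.schedules.ExponentialDecay; a minimal sketch of the equivalent setup, assuming a TF 2.x installation:

import tensorflow as tf  # TF 2.x

# Same parameters as the plot above: base rate 0.1, decay 0.9 every 100 steps
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=100,
    decay_rate=0.9,
    staircase=True)
# The schedule is a callable (lr_schedule(step) returns the decayed rate)
# and can be passed directly to an optimizer as its learning rate
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)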
Here is an example of using tf.train.exponential_decay() to minimize a function:
loss = (w + 1)^2
#coding:utf-8
# Loss function: loss = (w+1)^2, with w initialized to the constant 5.
# Backpropagation finds the optimal w, i.e. the w that minimizes loss.
# An exponentially decaying learning rate gives a fast descent early in
# training, so convergence is reached in fewer training steps.
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
LEARNING_RATE_STEP = 1      # update the learning rate after this many batches; usually total_samples / BATCH_SIZE

# Counter for how many batches have been run; starts at 0 and is not trainable
global_step = tf.Variable(0, trainable=False)
# Define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                           global_step,
                                           LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
# Parameter to optimize, initialized to 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# Define the loss function
loss = tf.square(w + 1)
# Define the backpropagation method; passing global_step makes the
# optimizer increment it by 1 on every run
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
# Create a session and train for 40 steps
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: global_step is %d, w is %f, learning rate is %f, loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val))
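Since the gradient of (w+1)^2 is 2(w+1), the trajectory above can be checked by hand. Below is a minimal pure-Python sketch of the same updates, assuming the optimizer evaluates the learning rate from global_step before incrementing it:

w, lr_base, decay = 5.0, 0.1, 0.99
for step in range(3):
    lr = lr_base * decay ** step  # staircase=True with decay_steps=1
    w = w - lr * 2 * (w + 1)      # gradient-descent update on loss=(w+1)^2
    print("step %d: lr=%f, w=%f, loss=%f" % (step + 1, lr, w, (w + 1) ** 2))
# step 1: lr=0.100000, w=3.800000, loss=23.040000
# step 2: lr=0.099000, w=2.849600, loss=14.819420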