Topics: activation functions, mean squared error (MSE), learning-rate scheduling, exponential moving averages, regularization, and modular programming.
- Activation functions: introduce non-linearity and increase the model's expressive power. sigmoid: use tf.nn.sigmoid() in TensorFlow; its output is always positive (in (0, 1)), which motivates the next activation. tanh: use tf.nn.tanh(); it is zero-centered, but its gradient vanishes when the input is large in magnitude, which motivates relu. relu: use tf.nn.relu().
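To illustrate why each activation motivates the next, here is a minimal pure-Python sketch (not the TensorFlow ops themselves): sigmoid's output is always positive, sigmoid and tanh both saturate for large-magnitude inputs, and relu does not saturate for positive inputs.

```python
import math

def sigmoid(z):
    # Output in (0, 1): always positive, saturates for large |z|.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Output in (-1, 1): zero-centered, but still saturates for large |z|.
    return math.tanh(z)

def relu(z):
    # max(0, z): does not saturate for positive inputs.
    return max(0.0, z)

print(sigmoid(-5.0))  # small, but still positive
print(tanh(10.0))     # nearly 1, so the gradient there is nearly 0
print(relu(10.0))     # 10.0: the gradient stays 1 for positive inputs
```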
- Mean squared error (MSE): the average over n samples of the squared difference between the prediction y and the known answer y_. In TensorFlow: loss_mse = tf.reduce_mean(tf.square(y_ - y))
#0 Import modules and generate the dataset.
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
X = np.random.rand(32, 2)
Y_ = [[x1 + x2 + np.random.rand()/10. - 0.05] for (x1, x2) in X]
#1 Define the network's inputs, parameters, and output; define the forward pass.
x = tf.placeholder(tf.float32, shape=[None, 2])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1))
y = tf.matmul(x, w1)
#2 Define the loss function and the back-propagation method.
# Loss is MSE; back-propagation uses gradient descent.
loss_mse = tf.reduce_mean(tf.square(y - y_))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
#3 Create a session and train for 20000 steps.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        loss_mse_dis, _ = sess.run([loss_mse, train_step],
                                   feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 2000 == 0:
            print('After {}steps loss:{}'.format(i, loss_mse_dis))
    print('w1:', w1)            # prints the Variable object, not its value
    print('w1:', sess.run(w1))  # prints the trained weight values
out:
After 0steps loss:0.22664415836334229
After 2000steps loss:0.03838531672954559
After 4000steps loss:0.01794697903096676
After 6000steps loss:0.009312132373452187
After 8000steps loss:0.005021141842007637
After 10000steps loss:0.0028489811811596155
After 12000steps loss:0.001746544730849564
After 14000steps loss:0.0011868053115904331
After 16000steps loss:0.0009025642066262662
After 18000steps loss:0.0007582578109577298
w1: <tf.Variable 'Variable:0' shape=(2, 1) dtype=float32_ref>
w1: [[0.9663933]
[1.0228527]]
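The index arithmetic in the training loop above cycles through the 32-sample dataset in batches of 8, wrapping around with the modulo. A standalone sketch of just that slicing logic:

```python
BATCH_SIZE = 8
DATASET_SIZE = 32

batches = []
for i in range(6):
    # Modulo wraps start back to 0 after the last full batch.
    start = (i * BATCH_SIZE) % DATASET_SIZE
    end = start + BATCH_SIZE
    batches.append((start, end))

print(batches)  # [(0, 8), (8, 16), (16, 24), (24, 32), (0, 8), (8, 16)]
```

Note this wrap-around only yields full batches because 32 is a multiple of 8; with a dataset size that is not a multiple of BATCH_SIZE, the final slice of each pass would run past the data.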
- Learning rate (learning_rate): the step size of each parameter update. If the learning rate is too large, the parameters oscillate around the minimum and fail to converge; if it is too small, convergence is slow. A common strategy is an exponentially decaying learning rate, which is updated dynamically as training progresses:
learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)
where LEARNING_RATE_STEP is how often the learning rate is updated (i.e., every how many batches).
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY,
    staircase=True/False)
Here LEARNING_RATE_BASE is the initial learning rate, LEARNING_RATE_DECAY is the decay rate, and global_step records the current training step (marked non-trainable). LEARNING_RATE_STEP is typically set to the total number of samples divided by the batch size. With staircase=True, global_step / LEARNING_RATE_STEP is truncated to an integer, so the learning rate decays in a staircase pattern; with staircase=False, the learning rate follows a smooth decay curve.
import tensorflow as tf
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                           global_step,
                                           LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
w = tf.Variable(tf.constant(5., tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        w_dis, learning_rate_dis, loss_dis, global_step_dis, _ = sess.run(
            [w, learning_rate, loss, global_step, train_step])
        print("After %s steps: global_step is %f, w is %f, learning rate is %f, loss is %f"
              % (i, global_step_dis, w_dis, learning_rate_dis, loss_dis))
out:
After 0 steps: global_step is 1.000000, w is 5.000000, learning rate is 0.100000, loss is 36.000000
After 1 steps: global_step is 2.000000, w is 3.800000, learning rate is 0.099000, loss is 23.040001
After 2 steps: global_step is 3.000000, w is 2.849600, learning rate is 0.098010, loss is 14.819419
After 3 steps: global_step is 4.000000, w is 2.095001, learning rate is 0.097030, loss is 9.579033
After 4 steps: global_step is 5.000000, w is 1.494386, learning rate is 0.096060, loss is 6.221960
After 5 steps: global_step is 6.000000, w is 1.015166, learning rate is 0.095099, loss is 4.060895
After 6 steps: global_step is 7.000000, w is 0.631886, learning rate is 0.094148, loss is 2.663051
After 7 steps: global_step is 8.000000, w is 0.324608, learning rate is 0.093207, loss is 1.754587
After 8 steps: global_step is 9.000000, w is 0.077684, learning rate is 0.092274, loss is 1.161402
After 9 steps: global_step is 10.000000, w is -0.121202, learning rate is 0.091352, loss is 0.772287
After 10 steps: global_step is 11.000000, w is -0.281761, learning rate is 0.090438, loss is 0.515867
After 11 steps: global_step is 12.000000, w is -0.411674, learning rate is 0.089534, loss is 0.346128
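The decay schedule can be checked by hand. A minimal pure-Python recomputation of the formula with this example's constants (LEARNING_RATE_STEP = 1) reproduces the learning rates printed above:

```python
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1

def decayed_lr(global_step, staircase=True):
    exponent = global_step / LEARNING_RATE_STEP
    if staircase:
        # Truncating the exponent to an integer gives the staircase shape.
        exponent = int(exponent)
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** exponent

for step in range(3):
    print(decayed_lr(step))  # 0.1, then 0.099, then 0.09801
```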
- Exponential moving average: tracks, over time, an average of every parameter w and b in the model. Using these averaged values can improve the model's generalization.
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
where MOVING_AVERAGE_DECAY is the moving-average decay rate (usually set close to 1) and global_step is the current training step.
ema_op = ema.apply(tf.trainable_variables())
where ema.apply() computes the moving average of each parameter passed to it, and tf.trainable_variables() gathers all trainable parameters into a list.
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')
This construct runs the moving-average update together with each training step.
To read a parameter's moving average, use the ema.average() function.
shadow = decay * shadow + (1 - decay) * parameter
With MOVING_AVERAGE_DECAY set to 0.99, parameter w1 and its moving average both initialized to 0, and global_step starting at 0: after w1 is updated to 1, the moving average of w1 becomes
moving average of w1 = min(0.99, 1/10) * 0 + (1 - min(0.99, 1/10)) * 1 = 0.9
(the effective decay is min(MOVING_AVERAGE_DECAY, (1 + global_step) / (10 + global_step)), which is 1/10 at global_step = 0).
import tensorflow as tf
w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0, trainable=False)
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply(tf.trainable_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("current global_step:", sess.run(global_step))
    print("current w1", sess.run([w1, ema.average(w1)]))
    sess.run(tf.assign(w1, 1))
    sess.run(ema_op)
    print("current global_step:", sess.run(global_step))
    print("current w1", sess.run([w1, ema.average(w1)]))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print("current global_step:", sess.run(global_step))
    print("current w1", sess.run([w1, ema.average(w1)]))
out:
current global_step: 0
current w1 [0.0, 0.0]
current global_step: 0
current w1 [1.0, 0.9]
current global_step: 0
current w1 [10.0, 9.09]
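The two printed averages (0.9, then 9.09) can be reproduced by hand. Since global_step stays at 0 in this example, the effective decay is min(0.99, 1/10) = 0.1 for both updates. A pure-Python sketch of the shadow update:

```python
MOVING_AVERAGE_DECAY = 0.99

def update_shadow(shadow, param, num_updates):
    # Effective decay when ExponentialMovingAverage is given a
    # num_updates (global_step) argument.
    decay = min(MOVING_AVERAGE_DECAY, (1.0 + num_updates) / (10.0 + num_updates))
    # shadow = decay * shadow + (1 - decay) * parameter
    return decay * shadow + (1.0 - decay) * param

shadow = 0.0
shadow = update_shadow(shadow, 1.0, 0)
print(shadow)  # ≈ 0.9
shadow = update_shadow(shadow, 10.0, 0)
print(shadow)  # ≈ 9.09
```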
- Regularization: adds a penalty on each weight w to the loss function, introducing a model-complexity term that suppresses fitting to noise and reduces overfitting.
loss = loss(y, y_) + REGULARIZER * loss(w)
L1 regularization: loss_L1 = Σᵢ |wᵢ|. In TensorFlow: loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)
L2 regularization: loss_L2 = Σᵢ wᵢ². In TensorFlow: loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
loss = cem + tf.add_n(tf.get_collection('losses'))
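To make the penalty terms concrete, here is a pure-Python sketch of the two formulas above, computed directly rather than via TensorFlow. (Note that TensorFlow's l2_regularizer is built on tf.nn.l2_loss, which additionally halves the sum of squares, so the framework's L2 penalty is half of this sketch's value.)

```python
def l1_penalty(weights, regularizer):
    # loss_L1 = REGULARIZER * sum(|w_i|)
    return regularizer * sum(abs(w) for w in weights)

def l2_penalty(weights, regularizer):
    # loss_L2 = REGULARIZER * sum(w_i^2)
    return regularizer * sum(w * w for w in weights)

weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, 0.01))  # 0.01 * (1 + 2 + 3) = 0.06
print(l2_penalty(weights, 0.01))  # 0.01 * (1 + 4 + 9) = 0.14
```

The L1 penalty tends to push small weights all the way to zero (sparse models), while the L2 penalty shrinks all weights smoothly without zeroing them.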
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
BATCH_SIZE = 30
X = np.random.randn(300, 2)
Y_ = [int(x0**2 + x1**2 < 2) for (x0, x1) in X]
Y_c = [['red' if y else 'blue'] for y in Y_]
X = np.vstack(X).reshape(-1, 2)
Y_ = np.vstack(Y_).reshape(-1, 1)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.show()

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b

x = tf.placeholder(tf.float32, shape=[None, 2])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
w1 = get_weight([2, 11], 0.01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
w2 = get_weight([11, 1], 0.01)
b2 = get_bias([1])
y = tf.matmul(y1, w2) + b2
loss_mse = tf.reduce_mean(tf.square(y - y_))
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

# First train on loss_mse alone (no regularization).
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d steps, loss is: %f" % (i, loss_mse_v))
    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x: grid})
    probs = probs.reshape(xx.shape)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Train again, this time on loss_total (with L2 regularization).
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_total)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 2000 == 0:
            loss_v = sess.run(loss_total, feed_dict={x: X, y_: Y_})
            print("After %d steps, loss is: %f" % (i, loss_v))
    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x: grid})
    probs = probs.reshape(xx.shape)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()
- Modular programming: split the program into three modules: 1. generateds (data generation), 2. forward (forward propagation), 3. backward (training).
# generateds.py
import numpy as np
seed = 2
def generateds():
    # Seeded random generator for reproducibility.
    rdm = np.random.RandomState(seed)
    # 300 rows x 2 columns: 300 input points (x0, x1).
    X = rdm.randn(300, 2)
    # Label each row: 1 if x0^2 + x1^2 < 2, else 0 (the "correct answers").
    Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]
    # Reshape: -1 infers the row count from the column count; X has 2 columns, Y_ has 1.
    X = np.vstack(X).reshape(-1, 2)
    Y_ = np.vstack(Y_).reshape(-1, 1)
    return X, Y_
# forward.py
import tensorflow as tf
# Define the network's parameters and the forward-propagation computation.
def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w
def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b
def forward(x, regularizer):
    w1 = get_weight([2, 11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    w2 = get_weight([11, 1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2
    return y
# backward.py
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import forward
import generateds
STEPS = 40000
BATCH_SIZE = 30
LEARNING_RATE_BASE = 0.001
LEARNING_RATE_DECAY = 0.999
REGULARIZER = 0.01
def backward():
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))
    X, Y_ = generateds.generateds()
    y = forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        300 / BATCH_SIZE,
        LEARNING_RATE_DECAY,
        staircase=True)
    # Loss: MSE plus the collected L2 regularization terms.
    loss_mse = tf.reduce_mean(tf.square(y - y_))
    loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
    # Back-propagation method, with regularization.
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss_total)
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        for i in range(STEPS):
            start = (i * BATCH_SIZE) % 300
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
            if i % 2000 == 0:
                loss_v = sess.run(loss_total, feed_dict={x: X, y_: Y_})
                print("After %d steps, loss is: %f" % (i, loss_v))
if __name__ == '__main__':
    backward()
out:
After 30000 steps, loss is: 0.090862
After 32000 steps, loss is: 0.090853
After 34000 steps, loss is: 0.090845
After 36000 steps, loss is: 0.090838
After 38000 steps, loss is: 0.090823