Building a Neural Network in a Modular Way
Forward propagation: builds the complete network structure from input to output. Describing the forward-propagation process requires defining three functions:
(1) The first function, forward(), designs the network structure: it builds the complete network from input to output and implements the forward-propagation process.
In this function, the parameter x is the input, regularizer is the regularization weight, and the return value is the prediction or classification result y:

    def forward(x, regularizer):
        w = ...  # weights, built with get_weight()
        b = ...  # biases, built with get_bias()
        y = ...  # network output computed from x, w and b
        return y
(2) The second function, get_weight(), sets up the weight parameter w.
In this function, the parameter shape is the shape of w, regularizer is the regularization weight, and the return value is w.
Here, tf.Variable() assigns an initial value to w, and tf.add_to_collection() adds the regularization loss of w to the collection losses, which is later summed into the total loss:

    def get_weight(shape, regularizer):
        w = tf.Variable(...)
        tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
        return w
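For reference, the collected term is just a scaled L2 penalty. The snippet below is an illustrative identity check under TF 1.x, not part of the original functions:

    import tensorflow as tf

    # l2_regularizer(scale)(w) equals scale * tf.nn.l2_loss(w),
    # i.e. scale * sum(w**2) / 2
    w = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    a = tf.contrib.layers.l2_regularizer(0.01)(w)
    b = 0.01 * tf.nn.l2_loss(w)
    with tf.Session() as sess:
        print(sess.run([a, b]))  # both evaluate to 0.15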
(3) The third function, get_bias(), sets up the bias parameter b.
In this function, the parameter shape is the shape of b, and the return value is b.
Here, tf.Variable() assigns an initial value to b:

    def get_bias(shape):
        b = tf.Variable(...)
        return b
Backward propagation: trains the network and optimizes the network parameters to improve model accuracy.
In the function backward(), tf.placeholder() creates placeholders for the dataset x and the labels y_, forward.forward() builds the forward-propagation network structure, and the parameter global_step counts training steps and is declared as a non-trainable variable:

    def backward():
        x = tf.placeholder(...)
        y_ = tf.placeholder(...)
        y = forward.forward(x, REGULARIZER)
        global_step = tf.Variable(0, trainable=False)
        loss = ...
Note: when training a network model, regularization, an exponentially decaying learning rate, and moving averages are commonly combined as model-optimization techniques. The global_step used by the moving average and by the exponentially decaying learning rate is the same variable, as the sketch below shows.
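A minimal sketch of how the two techniques share global_step. The hyperparameter values LR_BASE, LR_DECAY_STEPS, LR_DECAY_RATE, and MA_DECAY, as well as the stand-in variable and loss, are illustrative assumptions, not taken from the example below:

    import tensorflow as tf

    LR_BASE, LR_DECAY_STEPS, LR_DECAY_RATE, MA_DECAY = 0.1, 100, 0.99, 0.99  # assumed values
    w = tf.Variable(tf.constant(5.0))  # stand-in trainable parameter
    loss = tf.square(w)                # stand-in loss

    global_step = tf.Variable(0, trainable=False)  # incremented once per training step
    # Exponentially decaying learning rate, driven by global_step
    lr = tf.train.exponential_decay(LR_BASE, global_step,
                                    LR_DECAY_STEPS, LR_DECAY_RATE, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step=global_step)
    # Moving average of the trainable variables, driven by the SAME global_step
    ema = tf.train.ExponentialMovingAverage(MA_DECAY, global_step)
    ema_op = ema.apply(tf.trainable_variables())
    # Bundle both updates so a single sess.run(train_op) performs them together
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')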
For example, in the example from the previous article, we add an exponentially decaying learning rate to speed up optimization, add regularization to improve generalization, and use the modular design method to separate the red points from the blue points.
The code is divided into three modules: dataset generation (generateds.py), forward propagation (forward.py), and backward propagation (backward.py).
(1) generateds.py

    #coding:utf-8
    import numpy as np

    SEED = 2

    def generateds():
        # Draw 300 random 2-D points from a standard normal distribution
        rdm = np.random.RandomState(SEED)
        X = rdm.randn(300, 2)
        # Label 1 if the point falls inside the circle x0^2 + x1^2 < 2, else 0
        Y_ = [int(xi[0]*xi[0] + xi[1]*xi[1] < 2) for xi in X]
        # Color labels for plotting: red inside the circle, blue outside
        Y_c = [['red' if yi else 'blue'] for yi in Y_]
        X = np.vstack(X).reshape(-1, 2)
        Y_ = np.vstack(Y_).reshape(-1, 1)
        return X, Y_, Y_c
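A quick sanity check of this module (a usage sketch, not part of the original code):

    import generateds

    X, Y_, Y_c = generateds.generateds()
    print(X.shape, Y_.shape, len(Y_c))  # (300, 2) (300, 1) 300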
(2) forward.py
    #coding:utf-8
    import tensorflow as tf

    def get_weight(shape, regularizer):
        # Initialize w randomly and register its L2 regularization loss in 'losses'
        w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
        tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
        return w

    def get_bias(shape):
        # Initialize every bias element to 0.01
        b = tf.Variable(tf.constant(0.01, shape=shape))
        return b

    def forward(x, regularizer):
        # Hidden layer: 2 inputs -> 11 units, ReLU activation
        w1 = get_weight([2, 11], regularizer)
        b1 = get_bias([11])
        y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
        # Output layer: 11 units -> 1 output, no activation (raw score)
        w2 = get_weight([11, 1], regularizer)
        b2 = get_bias([1])
        y = tf.matmul(y1, w2) + b2
        return y
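Building the graph through this module registers one L2 term per weight matrix in the losses collection, which backward.py later sums with tf.add_n(). A small check (a usage sketch, not part of the original code):

    import tensorflow as tf
    import forward

    x = tf.placeholder(tf.float32, [None, 2])
    y = forward.forward(x, 0.01)
    print(len(tf.get_collection('losses')))  # 2: one L2 term for w1, one for w2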
(3) backward.py
    #coding:utf-8
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import generateds
    import forward

    DATA_NUM = 300
    BATCH_SIZE = 30
    REGULARIZER = 0.01
    LR = 0.001
    LR_DECAY_STEPS = DATA_NUM // BATCH_SIZE
    LR_DECAY_RATE = 0.999
    STEPS = 40000

    def backward():
        x = tf.placeholder(tf.float32, [None, 2])
        y_ = tf.placeholder(tf.float32, [None, 1])
        global_step = tf.Variable(0, trainable=False)

        X, Y_, Y_c = generateds.generateds()
        y = forward.forward(x, REGULARIZER)

        # Exponentially decaying learning rate, driven by global_step
        lr = tf.train.exponential_decay(
            learning_rate=LR,
            global_step=global_step,
            decay_steps=LR_DECAY_STEPS,
            decay_rate=LR_DECAY_RATE,
            staircase=True
        )

        # Total loss = MSE + sum of the L2 terms collected in forward.py
        loss_mse = tf.reduce_mean(tf.square(y - y_))
        loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
        train_step = tf.train.AdamOptimizer(lr).minimize(loss_total, global_step=global_step)

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(STEPS):
                # Cycle through the dataset in mini-batches of BATCH_SIZE
                start = (i * BATCH_SIZE) % DATA_NUM
                end = min(start + BATCH_SIZE, DATA_NUM)
                sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
                if i % 2000 == 0:
                    loss_v = sess.run(loss_total, feed_dict={x: X, y_: Y_})
                    print('After %d steps, loss is: %f' % (i, loss_v))

            # Evaluate the network on a dense grid to draw the decision boundary
            xx, yy = np.mgrid[-3:3:0.01, -3:3:0.01]
            grid = np.c_[xx.ravel(), yy.ravel()]
            probs = sess.run(y, feed_dict={x: grid})
            probs = probs.reshape(xx.shape)

        plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
        plt.contour(xx, yy, probs, levels=[.5])
        plt.show()

    if __name__ == '__main__':
        backward()
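To quantify accuracy beyond the printed loss, one could append a check at the end of the session. A hedged sketch, not in the original: it assumes the raw output y is thresholded at 0.5 and that these lines are placed inside the with tf.Session() block after the training loop:

    # Hypothetical addition, inside the session after training:
    # threshold the raw output at 0.5 and compare with the 0/1 labels
    pred = tf.cast(y > 0.5, tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(pred, y_), tf.float32))
    print('train accuracy:', sess.run(accuracy, feed_dict={x: X, y_: Y_}))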
Resulting plot: the scatter of red and blue points, with the learned decision boundary drawn as the 0.5 contour of the network output.