For the underlying theory, see the following articles:
https://www.cnblogs.com/guoyaohua/p/8724433.html
https://www.cnblogs.com/makefile/p/batch-norm.html
https://blog.csdn.net/qq_25737169/article/details/79048516
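This post uses the tf.layers.batch_normalization API for batch normalization. The thing to watch is its training argument. After normalizing, the layer applies two trainable variables, a scale (gamma) and a shift (beta), and it also keeps a moving mean and a moving variance. When training is True, the layer normalizes with the statistics of the current batch and updates the moving averages; when it is False, it normalizes with the stored moving averages instead. So training must be True during training and False during prediction.
In addition, when building train_op (not when computing the loss itself), the update ops have to be attached to it as control dependencies, using the code below. tf.layers.batch_normalization automatically adds its update ops to the tf.GraphKeys.UPDATE_OPS collection (note: they are added only when training is True, or a tensor that may evaluate to True such as the placeholder used in this post; with a constant False nothing is added).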
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
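To make the role of training concrete, here is a minimal NumPy sketch of what a batch-normalization layer computes in each mode. It is an illustration only, not the TensorFlow implementation: the class name BatchNorm1D is hypothetical, gamma and beta are held fixed instead of being trained, and momentum=0.99 and epsilon=1e-3 are chosen to mirror the defaults of tf.layers.batch_normalization.

```python
import numpy as np


class BatchNorm1D:
    """Hypothetical NumPy sketch of batch normalization over the batch axis."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(num_features)        # scale (trainable in a real layer)
        self.beta = np.zeros(num_features)        # shift (trainable in a real layer)
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        if training:
            # Training mode: normalize with the statistics of this batch.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # This is the job of the ops in UPDATE_OPS: refresh the moving averages.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference mode: normalize with the stored moving averages.
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
    bn = BatchNorm1D(num_features=3)
    train_out = bn(batch, training=True)
    infer_out = bn(batch, training=False)
    print("training-mode output mean:", train_out.mean(axis=0))
    print("inference-mode output mean:", infer_out.mean(axis=0))
```

If the moving-average updates never run (for example, because train_op does not depend on UPDATE_OPS), moving_mean and moving_var stay at their initial values of 0 and 1, and inference-mode outputs are normalized with the wrong statistics. Also note that with momentum 0.99 the moving averages adapt slowly: after k training steps a fraction 0.99**k of the initial values still remains.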
Full code:
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers, tf.Session)
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so an older version is required.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split


class NN_BN:
    def __init__(self, in_dim, lr=0.01):
        self.in_dim = in_dim
        self.lr = lr
        self.X = tf.placeholder(dtype=tf.float32, shape=[None, self.in_dim], name='input_x')
        self.y = tf.placeholder(dtype=tf.float32, shape=[None, 1], name='input_y')
        # Defaults to False (inference mode); fit() feeds True explicitly.
        self.training = tf.placeholder_with_default(False, shape=(), name='training')
        self.yhat = self._build_graph()
        self.loss = tf.reduce_mean(tf.square(self.y - self.yhat))
        # Make train_op depend on the BN update ops so the moving
        # mean and variance are refreshed on every training step.
        extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(extra_update_ops):
            self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

    def _build_graph(self):
        # Three hidden dense layers; batch normalization follows the first two.
        l1 = tf.layers.dense(self.X, 256, activation=tf.nn.relu)
        bn1 = tf.layers.batch_normalization(l1, training=self.training)
        l2 = tf.layers.dense(bn1, 128, activation=tf.nn.relu)
        bn2 = tf.layers.batch_normalization(l2, training=self.training)
        l3 = tf.layers.dense(bn2, 64, activation=tf.nn.relu)
        output = tf.layers.dense(l3, 1)
        return output

    def fit(self, X, y, epoch):
        # Full-batch training: each epoch is a single optimizer step.
        for i in range(epoch):
            # Loss before this epoch's update, computed with batch statistics.
            loss = self.sess.run(self.loss, feed_dict={self.X: X, self.y: y, self.training: True})
            print('Epoch', i, ', loss:', loss)
            self.sess.run(self.train_op, feed_dict={self.X: X, self.y: y, self.training: True})

    def predict(self, X):
        # Inference mode: BN uses the moving averages.
        return self.sess.run(self.yhat, feed_dict={self.X: X, self.training: False})


def run():
    X, y = load_boston(return_X_y=True)
    y = y.reshape(-1, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    model = NN_BN(X.shape[1])
    model.fit(X_train, y_train, epoch=256)
    y_pred = model.predict(X_test)
    loss = np.mean(np.square(y_pred - y_test))  # test MSE
    print('Test loss:', loss)


if __name__ == '__main__':
    run()
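The experiment uses a public dataset, Boston house prices: 506 samples in total, randomly split 7:3 into a training set (354 samples) and a test set (152 samples).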
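Results:
Compared with the same network without batch normalization, the training error drops noticeably. The test error, however, is unstable across repeated runs and shows no clear improvement over the network without BN. Two guesses as to why: first, it may be related to the amount of training data, since the dataset is so small that the normalization statistics estimated from the training set may be inaccurate; second, the network is simple and shallow, so vanishing or exploding gradients are barely an issue to begin with, and adding BN has little to gain.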
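The first guess above can be illustrated with a small NumPy experiment. This is only a sketch under simplifying assumptions: it draws synthetic standard-normal data, not the Boston features, and the helper name spread_of_estimates is hypothetical. It measures how much the estimated mean and variance fluctuate across repeated samples of a given size.

```python
import numpy as np


def spread_of_estimates(n_samples, n_trials=500, seed=0):
    """Return the std. dev., across trials, of the sample mean and sample
    variance computed from n_samples draws of a standard normal."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_samples))
    means = draws.mean(axis=1)       # one mean estimate per trial
    variances = draws.var(axis=1)    # one variance estimate per trial
    return means.std(), variances.std()


if __name__ == "__main__":
    # 354 matches the training-set size in this post; 32 and 5000 are for contrast.
    for n in (32, 354, 5000):
        mean_spread, var_spread = spread_of_estimates(n)
        print(f"n={n:>5}: spread of mean={mean_spread:.4f}, "
              f"spread of variance={var_spread:.4f}")
```

The spread of the sample mean shrinks roughly as 1/sqrt(n), so statistics estimated from a few hundred rows are noticeably noisier than those from thousands. Whether this actually explains the unstable test error here remains a guess; the sketch only shows that the effect exists.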