Neural Network Learning (Converting Octave to Python)

Latest recommended article published on 2024-04-26 11:29:53

Damon0626

Category column: Small Experiments. Article tags: neural networks, machine learning, TensorFlow

Article link: https://blog.csdn.net/u013617229/article/details/84933642

For the full code, see: github

Training a 3-layer neural network to recognize handwritten digits

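Example:
Train a simple neural network with a 400-unit input layer, a 25-unit hidden layer, and a 10-unit output layer to recognize handwritten digits.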

1. Changes to the original code

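In the original Octave code, the weights theta are computed by the custom function fmincg, which is too complex to port quickly. So, building on the original code, I used TensorFlow to build a simple 3-layer neural network, minimized the loss function step by step to obtain the weights and biases, and visualized the first-layer weights Theta1.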

The loss function and gradient are still verified with the ported original code; TensorFlow is used only to solve for Theta1 and Theta2.

2. Loading the data

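From the 5000 images, 100 are selected at random and drawn in a grid, 10 across and 10 down, with a 1-pixel white border between neighbouring images, as in the figure below. Because the selection is random on every run, the displayed result may differ.
[Figure: 10 x 10 grid of 100 randomly selected digit images]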

Reference code:

import random

import numpy as np
import scipy.io as scio
import matplotlib.pyplot as plt


def loadData(self, path):
	self.data = scio.loadmat(path)
	self.x = self.data["X"]  # (5000, 400)
	self.y = self.data["y"]  # (5000, 1)
	index = random.sample(range(5000), 100)  # 100 distinct random row indices
	self.pics = self.x[index, :]  # (100, 400)

# Slightly modified so that it can also display Theta1
def display100Data(pics):
	example_width = int(np.sqrt(pics.shape[1]))  # width of each image
	example_hight = pics.shape[1] // example_width  # height of each image

	display_rows = int(np.sqrt(pics.shape[0]))  # number of rows in the grid
	display_cols = pics.shape[0] // display_rows  # number of images per row
	# Background value 200 forms the 1-pixel border between images
	display_array = np.ones((1+display_rows*(example_hight+1), 1+display_cols*(example_width+1)))*200
	curr_ex = 0  # index of the image currently being placed
	for i in range(display_rows):
		for j in range(display_cols):
			if curr_ex >= pics.shape[0]:
				break
			max_val = np.max(np.abs(pics[curr_ex, :]))
			row = 1 + i*(example_hight+1)  # top edge of grid cell (i, j)
			col = 1 + j*(example_width+1)  # left edge of grid cell (i, j)
			# The .mat data is stored column-major, so reshape and then transpose
			display_array[row:row+example_hight, col:col+example_width] = \
				pics[curr_ex, :].reshape((example_width, example_hight)).transpose()/max_val*255
			curr_ex += 1

		if curr_ex >= pics.shape[0]:
			break
	plt.xticks([])
	plt.yticks([])
	plt.title("What the W1 look like from the NN Learning")
	plt.imshow(display_array, cmap='gray')
	plt.show()

3. Building the neural network

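The neural network has 3 layers: an input layer, one hidden layer, and an output layer. Counting the bias units, the input layer has 401 inputs (the first is the constant 1), the hidden layer has 26 units, and the output layer has 10 units (corresponding to the digits 0-9), as in the figure below.
[Figure: structure of the 3-layer neural network]
Reference code: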

def nnCostFunction(self, theta, x, y, lamda):
	m = x.shape[0]
	theta1 = np.reshape(theta[:self.hidden_layer_size*(self.input_layer_size+1)], (self.hidden_layer_size, self.input_layer_size+1))
	theta2 = np.reshape(theta[self.hidden_layer_size*(self.input_layer_size+1)::], (self.num_labels, self.hidden_layer_size+1))
	y = self.handleYtoOne(y)
	a1 = np.hstack([np.ones((m, 1)), x])  # 5000, 401
	z2 = a1.dot(theta1.T)  # 5000*25
	a2 = self.sigmoid(z2)
	n = a2.shape[0]  # 5000
	a2 = np.hstack([np.ones((n, 1)), a2])  # 5000*26
	z3 = a2.dot(theta2.T)
	a3 = self.sigmoid(z3)  # 5000*10

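One very important detail in the code above is the call y = self.handleYtoOne(y).
The original Octave source uses the single line y = eye(num_labels)(y,:); which works roughly as follows:
If y = [10,10,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9],
the result is the matrix below, where each row is the one-hot encoding of one label. Because Octave indices start at 1, the digit 0 is labeled 10 so that every label is a valid index.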

   0   0   0   0   0   0   0   0   0   1
   0   0   0   0   0   0   0   0   0   1
   1   0   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0
   0   1   0   0   0   0   0   0   0   0
   0   1   0   0   0   0   0   0   0   0
   0   0   1   0   0   0   0   0   0   0
   0   0   1   0   0   0   0   0   0   0
   0   0   0   1   0   0   0   0   0   0
   0   0   0   1   0   0   0   0   0   0
   0   0   0   0   1   0   0   0   0   0
   0   0   0   0   1   0   0   0   0   0
   0   0   0   0   0   1   0   0   0   0
   0   0   0   0   0   1   0   0   0   0
   0   0   0   0   0   0   1   0   0   0
   0   0   0   0   0   0   1   0   0   0
   0   0   0   0   0   0   0   1   0   0
   0   0   0   0   0   0   0   1   0   0
   0   0   0   0   0   0   0   0   1   0
   0   0   0   0   0   0   0   0   1   0
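
The body of handleYtoOne is not shown in this post. The following is a minimal sketch of an equivalent, assuming y holds labels from 1 to 10 with 10 standing for the digit 0. The name to_one_hot and its num_labels parameter are hypothetical and are not taken from the author's code.

```python
import numpy as np


def to_one_hot(y, num_labels=10):
    """Map labels in 1..num_labels to one-hot rows, like Octave's eye(num_labels)(y, :)."""
    y = np.asarray(y, dtype=int).ravel()  # accept shapes (m,) and (m, 1)
    # Octave indices start at 1, so label k selects row k-1 of the identity matrix.
    return np.eye(num_labels)[y - 1]


labels = np.array([[10], [1], [2]])
encoded = to_one_hot(labels)
print(encoded.shape)  # (3, 10)
```

Indexing the identity matrix with an integer array returns one row per label, which is the same trick the Octave one-liner relies on.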

4. Computing the loss function and gradient

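The loss function is regularized. The detailed code is below; each layer is written out separately for clarity, which makes it a little verbose.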

def nnCostFunction(self, theta, x, y, lamda):
	m = x.shape[0]
	theta1 = np.reshape(theta[:self.hidden_layer_size*(self.input_layer_size+1)], (self.hidden_layer_size, self.input_layer_size+1))
	theta2 = np.reshape(theta[self.hidden_layer_size*(self.input_layer_size+1)::], (self.num_labels, self.hidden_layer_size+1))
	y = self.handleYtoOne(y)
	a1 = np.hstack([np.ones((m, 1)), x])  # 5000, 401
	z2 = a1.dot(theta1.T)  # 5000*25
	a2 = self.sigmoid(z2)
	n = a2.shape[0]  # 5000
	a2 = np.hstack([np.ones((n, 1)), a2])  # 5000*26
	z3 = a2.dot(theta2.T)
	a3 = self.sigmoid(z3)  # 5000*10

	J = np.sum(np.sum(-y*np.log(a3)-(1-y)*np.log(1-a3), axis=0))/m

	regularized1 = np.sum(np.sum(theta1[:, 1::]**2, axis=0))
	regularized2 = np.sum(np.sum(theta2[:, 1::]**2, axis=0))
	regularized = lamda/(2*m)*(regularized1 + regularized2)
	return J + regularized
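
Written out, the value returned by nnCostFunction appears to be the regularized cross-entropy cost below. Here m is the number of examples, K = 10 is the number of labels, a^{(3)} are the output activations, and column index 0 of each Theta (the bias column) is left out of the penalty, matching theta1[:, 1::] and theta2[:, 1::] in the code. The notation is chosen to mirror the code and is an assumption about presentation, not a quotation from the original exercise.

```latex
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
  \left[-y^{(i)}_{k}\,\log a^{(3)(i)}_{k}
        -\left(1-y^{(i)}_{k}\right)\log\left(1-a^{(3)(i)}_{k}\right)\right]
  + \frac{\lambda}{2m}\left[
      \sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta^{(1)}_{j,k}\right)^{2}
    + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta^{(2)}_{j,k}\right)^{2}
    \right]
```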

The gradient is computed by backpropagation and consists of two parts, one for Theta1 and one for Theta2.
The detailed code is as follows:

	def nnGradient(self, theta, x, y, lamda):
		m = x.shape[0]
		theta1 = np.reshape(theta[:self.hidden_layer_size*(self.input_layer_size+1)], (self.hidden_layer_size, self.input_layer_size+1))
		theta2 = np.reshape(theta[self.hidden_layer_size*(self.input_layer_size+1)::], (self.num_labels, self.hidden_layer_size+1))
		y = self.handleYtoOne(y)
		a1 = np.hstack([np.ones((m, 1)), x])  # 5000, 401
		z2 = a1.dot(theta1.T)  # 5000*25
		a2 = self.sigmoid(z2)
		n = a2.shape[0]  # 5000
		a2 = np.hstack([np.ones((n, 1)), a2])  # 5000*26
		z3 = a2.dot(theta2.T)
		a3 = self.sigmoid(z3)  # 5000*10

		delta3 = a3 - y
		delta2 = delta3.dot(theta2)
		delta2 = delta2[:, 1::]
		delta2 = delta2*self.sigmoidGradient(z2)  # 5000*25
		Delta1 = np.zeros(theta1.shape)
		Delta2 = np.zeros(theta2.shape)

		Delta1 = Delta1 + delta2.T.dot(a1)
		Delta2 = Delta2 + delta3.T.dot(a2)

		Theta1_grad = 1/m*Delta1
		Theta2_grad = 1/m*Delta2

		Regularized_T1 = lamda/m*theta1
		Regularized_T2 = lamda/m*theta2
		Regularized_T1[:, 0] = np.zeros((Regularized_T1.shape[0], ))
		Regularized_T2[:, 0] = np.zeros((Regularized_T2.shape[0], ))

		Theta1_grad += Regularized_T1
		Theta2_grad += Regularized_T2
		grade = np.hstack([Theta1_grad.flatten(), Theta2_grad.flatten()])
		return grade
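
The helpers self.sigmoid and self.sigmoidGradient are called above but not listed in the post. A minimal sketch of what they presumably compute, written as standalone functions (the snake_case names are hypothetical):

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_gradient(z):
    """Derivative of the sigmoid: g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)


print(sigmoid_gradient(np.array([0.0])))  # [0.25]
```

In nnGradient the derivative is evaluated at z2, before the bias column is added to a2, which is why delta2 drops its first column before the elementwise product.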

5. Gradient checking

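In the neural network we minimize the loss function J, which is a function of Theta. To check the gradient, we perturb each component of Theta by a small value e = 0.0001, approximate the gradient numerically, and compare it with the gradient computed in step 4. If the two differ only slightly, the analytic gradient is correct. The underlying formula is:
$$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta + e\,u_i) - J(\theta - e\,u_i)}{2e}$$
where u_i is the unit vector along the i-th component of theta.
Reference code: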

def computeNumericalGradient(self, theta, x, y, lamda):  # (f(x+delta)-f(x-delta))/(2*delta)
	e = 0.0001
	numgrad = np.zeros(theta.shape)
	perturb = np.zeros(theta.shape)
	for i in range(theta.size):
		perturb[i] = e
		loss1 = self.nnCostFunction(theta - perturb, x, y, lamda)
		loss2 = self.nnCostFunction(theta + perturb, x, y, lamda)
		numgrad[i] = ((np.array(loss2) - np.array(loss1))/(2*e))
		perturb[i] = 0
	return numgrad

def checkNNGradients(self, lamda):
	self.input_layer_size = 3
	self.hidden_layer_size = 5
	self.num_labels = 3
	m = 5
	theta1 = self.debugInitializeWeights(self.hidden_layer_size, self.input_layer_size)
	theta2 = self.debugInitializeWeights(self.num_labels, self.hidden_layer_size)
	x = self.debugInitializeWeights(m, self.input_layer_size-1)
	y = 1 + np.mod([i+1 for i in range(m)], self.num_labels).T
	theta = np.hstack([theta1.flatten(), theta2.flatten()])
	cost = self.nnCostFunction(theta, x, y, lamda)
	grad = self.nnGradient(theta, x, y, lamda)
	numgrad = self.computeNumericalGradient(theta, x, y, lamda)
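	# Relative difference: norm of the difference divided by the norm of the sum
	diff = np.linalg.norm(numgrad-grad)/np.linalg.norm(numgrad+grad)
	print(np.hstack([grad.reshape(-1, 1), numgrad.reshape(-1, 1)]))
	print("Relative Difference:", diff)

The code computes the relative difference between the two gradients as norm(numgrad-grad)/norm(numgrad+grad), as in the Octave original. For vectors, norm is the Euclidean norm, not a largest singular value. A relative difference below 1e-9 shows that the two gradients are very close and that the analytic gradient is computed correctly.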
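
To see the check in isolation, here is a small sketch that applies the same central-difference idea to a toy cost whose gradient is known exactly. The function names and the toy cost are illustrative assumptions and are not part of the author's class.

```python
import numpy as np


def numerical_gradient(f, theta, eps=1e-4):
    """Central-difference estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta, dtype=float)
    perturb = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        perturb[i] = eps
        grad[i] = (f(theta + perturb) - f(theta - perturb)) / (2 * eps)
        perturb[i] = 0.0
    return grad


def relative_difference(a, b):
    """Euclidean norm of (a - b) divided by the Euclidean norm of (a + b)."""
    return np.linalg.norm(a - b) / np.linalg.norm(a + b)


# Toy cost with a known gradient: f(theta) = sum(theta**3), gradient = 3 * theta**2.
theta = np.array([1.0, -2.0, 0.5])
analytic = 3 * theta ** 2
numeric = numerical_gradient(lambda t: np.sum(t ** 3), theta)
diff = relative_difference(numeric, analytic)
print(diff < 1e-7)  # True
```

The loop calls the cost function twice per parameter, so the check is only practical on a tiny network such as the 3-5-3 one built in checkNNGradients.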

6. Building the neural network with TensorFlow

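Because the fmincg function in the original code is hard to port, TensorFlow is used for the optimization instead, with a few differences from the original:
(1). The original input is (5000, 401); in tf it is (5000, 400), with the bias handled separately;
(2). The weight shapes change accordingly: Theta1 (25, 401) => h1 (400, 25) plus bias b1 (25,); Theta2 (10, 26) => h2 (25, 10) plus bias b2 (10,);
(3). The loss function is changed to cross-entropy;
(4). 80% of the data is used for training and 20% for testing.
Network structure: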

X = tf.placeholder(tf.float32, [None, 400])
Y = tf.placeholder(tf.float32, [None, 10])

h1 = tf.Variable(tf.random_normal([400, 25]))
h2 = tf.Variable(tf.random_normal([25, 10]))

b1 = tf.Variable(tf.random_normal([25]))
b2 = tf.Variable(tf.random_normal([10]))


def neural_net(x):
	layer_1 = tf.add(tf.matmul(x, h1), b1)
	output_layer = tf.add(tf.matmul(layer_1, h2), b2)
	return output_layer
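
The training loop in the next section uses loss_op and accuracy, whose definitions are not shown in the post. As a sketch of what a softmax cross-entropy loss and an accuracy measure compute from the logits returned by neural_net, here is a plain NumPy version. The function names are hypothetical, and the TensorFlow ops the author actually used are not known.

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(one_hot_labels * log_probs, axis=1))


def accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the labelled class."""
    return np.mean(logits.argmax(axis=1) == one_hot_labels.argmax(axis=1))


logits = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
labels = np.eye(3)[[0, 2, 1]]
print(softmax_cross_entropy(logits, labels), accuracy(logits, labels))
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged but avoids overflow for large logits.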

7. Iterative training

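Training runs for 100 iterations, after which the model is validated on the test set. Accuracy reaches about 95% on the training set and 89.5% on the test set, which is acceptable overall. The parameters supplied by Professor Ng (Andrew Ng) reach about 96%, which is impressive. I still need to keep working on hyperparameter tuning and build up experience.
Reference code: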

with tf.Session() as sess:
	sess.run(init)
	x1, y0 = loadData('ex4data1.mat')
	y1 = handleYtoOne(y0)
	index = random.sample(range(5000), 4000)  # 80% training, 20% testing
	train_x = x1[index, :]
	train_y = y1[index, :]
	test_x = np.delete(x1, index, 0)
	test_y = np.delete(y1, index, 0)

	for i in range(100):
		sess.run(train_op, feed_dict={X: train_x, Y: train_y})
		loss, acc = sess.run([loss_op, accuracy], feed_dict={X: train_x, Y: train_y})
		print("\rStep {}: loss {:.4f} | accuracy {:.4f}".format(i, loss, acc), end="")  # accuracy reaches about 94%
	print("\nTest Accuracy: %.4f" % (sess.run(accuracy, feed_dict={X: test_x, Y: test_y})))  # accuracy about 89.5%
	pics = sess.run(h1)
	display100Data(pics.T)
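
The 80/20 split above draws 4000 random indices and deletes them to get the test rows. An alternative sketch, assuming a reproducible split is wanted, shuffles the indices once with a seeded generator. The helper name and its parameters are hypothetical.

```python
import numpy as np


def train_test_split_indices(num_samples, train_fraction=0.8, seed=0):
    """Return disjoint (train_idx, test_idx) arrays covering range(num_samples)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    cut = int(num_samples * train_fraction)
    return order[:cut], order[cut:]


train_idx, test_idx = train_test_split_indices(5000)
print(len(train_idx), len(test_idx))  # 4000 1000
```

With these indices, x1[train_idx, :] and x1[test_idx, :] would play the roles of train_x and test_x, and a fixed seed makes the reported accuracies repeatable between runs.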