Nearly burned out, but I finally got this written up before that happened.
I. Forward Propagation and the Cost Function
We can see that the network's final output layer $h_\theta$ has shape (5000, 10): there are 5000 samples, and row $i$ gives the probabilities that the $i$-th sample belongs to each of the classes 0–9.
The cost function here has the same meaning as in logistic regression. Since Y has shape (5000, 1) and the outputs are the digits 0–9, to compute $J(\theta)$ we first need to convert Y into the following form:
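That is, each label is expanded into a one-hot row vector of length $K = 10$; for example, a label $y = 3$ becomes $[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]$, which is exactly what the code below does with Z[i][Y[i]-1] = 1.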
Then we can compute $J(\theta)$. For convenience, I reshaped the matrices into vectors here, so the double sum over samples and classes becomes a pair of dot products.
(See https://blog.csdn.net/weixin_38705903/article/details/82957565)
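For reference, the regularized cost being computed below is the standard neural-network cost, summed over all $K$ output units, with the bias weights excluded from the regularization term:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)_k\right] + \frac{\lambda}{2m}\sum_{l}\sum_{j,k}\left(\Theta^{(l)}_{j,k}\right)^2$$

where the sum over $j,k$ skips the first column of each $\Theta^{(l)}$.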
'''Step 2: compute the cost. K is the number of output units, layer is the number of hidden units'''
m = X1.shape[0]
# Convert Y into one-hot form [0 0 1 0 ...]; the result is Z
Z = np.zeros((m, K))
for i in range(m):
    Z[i][Y[i]-1] = 1
# Compute J(theta): reshape the predictions and labels into column vectors
h = a3.reshape(m*K, 1)
Y1 = Z.reshape(m*K, 1)
# The regularization term must exclude the first (bias) column of each theta
Theta11 = Theta1[:, 1:].reshape(n*layer, 1)
Theta22 = Theta2[:, 1:].reshape(K*layer, 1)
J = (-Y1.T.dot(np.log(h)) - (1 - Y1).T.dot(np.log(1 - h))) / m + (Theta11.T.dot(Theta11) + Theta22.T.dot(Theta22))*lmda/2/m
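As an aside, the one-hot conversion loop can also be written in one line with NumPy indexing (a minimal sketch, assuming the labels in Y are integers taking values 1..K as in this exercise):

# Each row of the identity matrix is a one-hot vector; index it by the 0-based labels
Z = np.eye(K)[Y.flatten().astype(int) - 1]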
II. Backpropagation
1. The sigmoid function
def g(z):
    # Sigmoid activation
    g = 1/(1+np.exp(-z))
    return g

def g_d(z):
    # Derivative of the sigmoid: g'(z) = g(z)*(1 - g(z))
    g_d = g(z)*(1-g(z))
    return g_d
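The derivative form used in g_d follows from differentiating $g(z) = \frac{1}{1+e^{-z}}$ directly:

$$g'(z) = \frac{e^{-z}}{\left(1+e^{-z}\right)^2} = g(z)\,\big(1 - g(z)\big)$$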
2. Randomly initialize $\theta$ (a neural network's weights must not be initialized to zero, or every hidden unit computes the same function; they should be initialized to small random values near zero)
import numpy as np
import math

def randInitializeWeights(L_in, L_out):
    # Pick the interval half-width from the layer sizes (Glorot-style range)
    epsilon_init = math.sqrt(6)/math.sqrt(L_in+L_out)
    # Uniform samples in (-epsilon_init, epsilon_init); the +1 column is for the bias
    theta_init = np.random.random((L_out, L_in+1))*(2*epsilon_init) - epsilon_init
    return theta_init
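A quick usage sketch with the layer sizes from this exercise (400 inputs, 25 hidden units, 10 outputs; the variable names here are my own):

Theta1_init = randInitializeWeights(400, 25)  # shape (25, 401)
Theta2_init = randInitializeWeights(25, 10)   # shape (10, 26)
rand_nn_parameters = np.concatenate([Theta1_init.flatten(), Theta2_init.flatten()])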
3. Computing the gradient via backpropagation
First run forward propagation to obtain the activations $a$ and weighted inputs $z$, then propagate backwards to obtain the error terms $\delta$, and substitute them into the formulas below to compute the gradient.
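Written out, the equations implemented by the code below are (with $\odot$ denoting elementwise multiplication, and the bias column of each $\Theta^{(l)}$ excluded from the regularization term):

$$\delta^{(3)} = a^{(3)} - y$$
$$\delta^{(2)} = \left(\delta^{(3)}\,\Theta^{(2)}\right) \odot g'\!\left(z^{(2)}\right)$$
$$\frac{\partial J}{\partial \Theta^{(l)}} = \frac{1}{m}\left(\delta^{(l+1)}\right)^{T} a^{(l)} + \frac{\lambda}{m}\,\Theta^{(l)}$$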
'''Step 1: forward propagation'''
# a1 = X  # (5000, 401), the bias column is already included
z2 = X.dot(Theta1.T)  # (5000, 25)
a2 = g(z2)  # (5000, 25)
# Add a bias column
a2 = np.hstack((np.ones((a2.shape[0], 1)), a2))  # (5000, 26)
z3 = a2.dot(Theta2.T)  # (5000, 10)
a3 = g(z3)  # (5000, 10)
'''Step 3: backpropagation to compute the gradient'''
delta3 = a3 - Z  # (5000, 10)
# Pad z2 with a bias column so its shape matches Theta2's columns
z2 = np.hstack((np.ones((z2.shape[0], 1)), z2))  # (5000, 26)
delta2 = delta3.dot(Theta2)*g_d(z2)  # (5000,10)·(10,26), then elementwise with (5000,26)
delta2 = delta2[:, 1:]  # drop the bias column -> (5000, 25)
# Accumulate the gradients, regularizing all but the first column of each theta
theta1_d = delta2.T.dot(X)/m + lmda/m*(np.column_stack((np.zeros(Theta1.shape[0]), Theta1[:, 1:])))  # (25, 401)
theta2_d = delta3.T.dot(a2)/m + lmda/m*(np.column_stack((np.zeros(Theta2.shape[0]), Theta2[:, 1:])))  # (10, 26)
# Return the gradients unrolled into a flat vector for the optimizer
grad = np.concatenate([theta1_d.flatten(), theta2_d.flatten()])
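The first column of delta2 is dropped because it corresponds to the bias unit of the hidden layer, which is a constant with no incoming weights, so no error needs to be propagated back through it.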
4. Gradient checking. To verify that the gradients produced by backpropagation are correct, we compute the gradient numerically and compare the two.
Numerical gradient computation:
import numpy as np

def compute_numerial_gradient(cost_func, theta):
    np.set_printoptions(precision=20)
    numgrad = np.zeros(theta.size)
    perturb = np.zeros(theta.size)
    e = 1e-4  # perturbation size
    for p in range(theta.size):
        # Perturb one parameter at a time and take the central difference
        perturb[p] = e
        loss1, grad1 = cost_func(theta - perturb)
        loss2, grad2 = cost_func(theta + perturb)
        numgrad[p] = (loss2 - loss1) / (2 * e)
        perturb[p] = 0
    return numgrad
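This is the central (two-sided) difference approximation, evaluated once per parameter:

$$\frac{\partial J}{\partial \theta_p} \approx \frac{J(\theta + e\,u_p) - J(\theta - e\,u_p)}{2e}$$

where $u_p$ is the unit vector along the $p$-th parameter and $e = 10^{-4}$.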
Set up a separate, small set of samples to run the gradient check on:
import numpy as np
import randInitializeWeights as diw
import nnCostFunction as ncf
import computeNumericalGradient as cng

def check_nn_gradients(lmd):
    np.set_printoptions(precision=20)
    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5
    # Generate some 'random' test data
    theta1 = diw.randInitializeWeights(input_layer_size, hidden_layer_size)
    theta2 = diw.randInitializeWeights(hidden_layer_size, num_labels)
    # Reuse randInitializeWeights to generate X
    X = diw.randInitializeWeights(input_layer_size - 1, m)
    X1 = np.hstack((np.ones((X.shape[0], 1)), X))
    y = 1 + np.mod(np.arange(1, m + 1), num_labels)
    # Unroll the parameters
    nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])

    def cost_func(p):
        return ncf.ex3_nn(X, y, p, num_labels, theta1, theta2, X1, hidden_layer_size, lmd)

    cost, grad = cost_func(nn_params)
    numgrad = cng.compute_numerial_gradient(cost_func, nn_params)
    # Print the backprop gradient and the numerical gradient side by side
    print(np.c_[grad, numgrad])
The output of the check is as follows:
As you can see, the gradients from backpropagation and from the numerical method are nearly identical, which confirms that our backpropagation implementation is correct. Gradient checking can then be turned off, since it is far too slow to leave on during training.
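To compare the two quantitatively, a common check (a minimal sketch; the 1e-9 threshold is a rule of thumb, not from the code above) is the relative difference between the two gradient vectors:

diff = np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)
print('Relative difference: {}'.format(diff))  # should be tiny, e.g. < 1e-9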
III. Training the Network Weights via Optimization and Predicting
# Train the network
import scipy.optimize as opt

lmda = 1

def cost_func(t):
    return ex3_nn(X1, Y, t, K, X, layer, lmda)[0]

def grad_func(t):
    return ex3_nn(X1, Y, t, K, X, layer, lmda)[1]

# Use opt.fmin_cg() to find the optimum (I don't know why BFGS doesn't work here)
# rand_nn_parameters is the unrolled random initialization from randInitializeWeights
print('I am training...')
nn_parameter, cost, *unused = opt.fmin_cg(f=cost_func, fprime=grad_func, x0=rand_nn_parameters, maxiter=100, full_output=True, disp=True)
# Reshape the flat parameter vector back into the two weight matrices
Theta1 = nn_parameter[:layer * (n + 1)].reshape(layer, n + 1)
Theta2 = nn_parameter[layer * (n + 1):].reshape(K, layer + 1)
# Check the network's prediction accuracy on the 5000 training samples
pred11, O_2 = nn_predict(Theta1, Theta2, X)
print('Training set accuracy: {}'.format(np.mean(pred11 == Y)*100))
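The nn_predict function is not shown in this post; a minimal sketch of what it might look like, assuming X already carries the bias column and labels run from 1 to 10:

def nn_predict(Theta1, Theta2, X):
    # Forward pass, identical to the one used for the cost
    a2 = g(X.dot(Theta1.T))                          # (m, 25)
    a2 = np.hstack((np.ones((a2.shape[0], 1)), a2))  # (m, 26)
    a3 = g(a2.dot(Theta2.T))                         # (m, 10) class probabilities
    pred = np.argmax(a3, axis=1) + 1                 # pick the most likely 1-based label
    return pred, a3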
The code has been uploaded to: https://github.com/hellobigorange/my_own_machineLearning