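Machine Learning with Neural Networks
In this exercise, you will implement the backpropagation training algorithm for neural networks and apply it to handwritten digit recognition.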
# List the currently mounted dataset directory
!ls /home/aistudio/data
data6559
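Data overview
The dataset for this exercise contains 5000 training examples, each a 20x20 grayscale image of a handwritten digit from 0 to 9. Every pixel is stored as a floating-point number. When the data is loaded, each image is unrolled into a 400-dimensional vector that forms one row of the data matrix, so the full training set is a 5000x400 matrix with one example (one handwritten digit image) per row. The digit "0" is labeled "10", while the digits "1" to "9" are labeled "1" to "9" in their natural order. The dataset is stored in NN_data.mat.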
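As a rough illustration of the layout described above, the sketch below reshapes a synthetic 400-dimensional row (not taken from NN_data.mat) back into a 20x20 image and maps the label convention back to digits. Column-major order is assumed here because display_data reshapes with order='F'; the helper name label_to_digit is hypothetical and does not appear in the exercise code.

```python
import numpy as np

def label_to_digit(label):
    """Map the dataset's label convention (10 means digit 0) to the digit itself."""
    return int(label) % 10

# Synthetic stand-in for one row of the 5000x400 data matrix.
row = np.arange(400, dtype=float)

# Column-major reshape, matching the order used by display_data.
image = row.reshape((20, 20), order='F')

print(image.shape)                              # (20, 20)
print(image[1, 0], image[0, 1])                 # 1.0 20.0
print([label_to_digit(k) for k in (1, 9, 10)])  # [1, 9, 0]
```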
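Model representation
The neural network we will train has three layers: an input layer, a hidden layer, and an output layer. Because the training examples (images) are 20x20, the input layer has 400 units (not counting the extra bias unit; counting it adds 1). In the program, the data is loaded into the variables $X$ and $y$.
The exercise provides a set of pre-trained network parameters $(\Theta^{(1)}, \Theta^{(2)})$. They are stored in the data file NN_weights.mat and are loaded into the variables Theta1 and Theta2. Their dimensions correspond to a network with 25 units in the second layer and 10 output units (one per digit class).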
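To make the parameter shapes concrete, here is a minimal sketch with placeholder weights (the values are arbitrary and are not the ones stored in NN_weights.mat). It shows how the two matrices can be unrolled into a single vector and recovered again, which is the form the optimizer works with later in the exercise.

```python
import numpy as np

input_layer_size, hidden_layer_size, num_labels = 400, 25, 10

# Placeholder weights with the shapes described above (values are arbitrary).
Theta1 = np.zeros((hidden_layer_size, input_layer_size + 1))  # (25, 401)
Theta2 = np.ones((num_labels, hidden_layer_size + 1))         # (10, 26)

# Unroll both matrices into one flat parameter vector.
nn_params = np.hstack((Theta1.flatten(), Theta2.flatten()))
print(nn_params.size)  # 10285

# Recover the matrices from the flat vector.
split = hidden_layer_size * (input_layer_size + 1)
Theta1_back = nn_params[:split].reshape(Theta1.shape)
Theta2_back = nn_params[split:].reshape(Theta2.shape)
print(Theta1_back.shape, Theta2_back.shape)  # (25, 401) (10, 26)
```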
import numpy as np
import scipy.io as sio
from scipy.optimize import fmin_cg
import matplotlib.pyplot as plt
def display_data(data, img_width=20):
    """Display the images stored in the rows of data as a grid."""
    plt.figure()
    # Derive the image and grid dimensions from the data shape
    n_rows, n_cols = data.shape
    img_height = n_cols // img_width
    # Number of rows and columns in the display grid
    disp_rows = int(np.sqrt(n_rows))
    disp_cols = (n_rows + disp_rows - 1) // disp_rows
    # Padding between image tiles
    pad = 1
    disp_array = np.ones((pad + disp_rows*(img_height + pad),
                          pad + disp_cols*(img_width + pad)))
    idx = 0
    for row in range(disp_rows):
        for col in range(disp_cols):
            # Stop once every image has been placed
            if idx >= n_rows:
                break
            # Copy the image tile
            rb = pad + row*(img_height + pad)
            cb = pad + col*(img_width + pad)
            disp_array[rb:rb+img_height, cb:cb+img_width] = data[idx].reshape((img_height, -1), order='F')
            # Normalize each training example separately by its maximum value
            max_val = np.abs(data[idx].max())
            disp_array[rb:rb+img_height, cb:cb+img_width] /= max_val
            idx += 1
    plt.imshow(disp_array)
    plt.gray()
    plt.axis('off')
    plt.savefig('data-array.png', dpi=150)
    plt.show()
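Forward propagation and the cost function
You now need to implement the neural network cost function and its gradient. First, make the function nn_cost_function return the correct cost.
The cost function of the neural network (without the regularization term) is defined as: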
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[-y_k^{(i)} \log\left((h_{\theta}(x^{(i)}))_k\right) -(1 - y_k^{(i)}) \log\left(1 - (h_{\theta}(x^{(i)}))_k\right) \right]$$
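where $h_{\theta}(x^{(i)})$ is computed as shown in the network structure diagram, and $K=10$ is the total number of possible classes. Here $y$ uses a one-hot representation.
Run the program with the pre-trained network parameters and confirm that your cost is correct (the expected cost is about 0.287629).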
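The one-hot encoding and the unregularized cost can be sketched on a toy problem as follows. The helper names one_hot and cross_entropy_cost are hypothetical, and the network outputs h are simply set to 0.5 rather than produced by a real forward pass, so that the result can be checked by hand.

```python
import numpy as np

def one_hot(y, num_labels):
    """Return a (num_labels, m) 0/1 matrix; label k sets row k-1."""
    y = np.asarray(y).ravel()
    return (np.arange(1, num_labels + 1)[:, None] == y[None, :]).astype(float)

def cross_entropy_cost(h, Y):
    """Unregularized cost for outputs h and targets Y, both of shape (K, m)."""
    m = Y.shape[1]
    return (-Y * np.log(h) - (1 - Y) * np.log(1 - h)).sum() / m

Y = one_hot([1, 3], num_labels=3)   # two examples with labels 1 and 3
h = np.full((3, 2), 0.5)            # every output unit predicts 0.5
print(cross_entropy_cost(h, Y))     # 3 * log(2), about 2.0794
```

With every output at 0.5, each of the six terms contributes log(2), and dividing by m = 2 gives 3 log(2).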
Regularizing the cost function
The cost function of the neural network with the regularization term is:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \sum_{k=1}^{K} \left[-y_k^{(i)} \log\left((h_{\theta}(x^{(i)}))_k\right) -(1 - y_k^{(i)}) \log\left(1 - (h_{\theta}(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m} \left[\sum_{j=1}^{25} \sum_{k=1}^{400} (\Theta_{j,k}^{(1)})^2 +\sum_{j=1}^{10} \sum_{k=1}^{25} (\Theta_{j,k}^{(2)})^2 \right]$$
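Note that the sums in the regularization term above match the network structure used in this exercise. Your code, however, must work for a neural network of any size.
Also note that the parameters corresponding to the bias terms must not be included in the regularization term. For the matrices Theta1 and Theta2, these are the entries in the first column.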
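A minimal sketch of a size-independent regularization term, assuming the weight matrices are passed as a list and that the first column of each holds the bias weights. The function name regularization_term is hypothetical.

```python
import numpy as np

def regularization_term(thetas, lmb, m):
    """Sum of squared non-bias weights, scaled by lambda / (2m)."""
    total = sum((theta[:, 1:] ** 2).sum() for theta in thetas)
    return lmb / (2.0 * m) * total

Theta1 = np.array([[100.0, 1.0, 2.0]])   # first column is the bias weight
Theta2 = np.array([[-50.0, 3.0]])
print(regularization_term([Theta1, Theta2], lmb=1.0, m=2))  # (1 + 4 + 9) / 4 = 3.5
```

The large bias weights (100 and -50) have no effect on the result, which is the behavior the text asks for.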
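Run the program with the pre-trained weights and the regularization coefficient $\lambda=1$ (lmb), and confirm that your cost is correct (the expected cost is about 0.383770).
This step requires you to complete the implementation of nn_cost_function.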
def nn_cost_function(nn_params, *args):
    """Cost function of the neural network."""
    # Unpack parameters from *args
    input_layer_size, hidden_layer_size, num_labels, lmb, X, y = args
    # Unroll weights of neural networks from nn_params
    Theta1 = nn_params[:hidden_layer_size*(input_layer_size + 1)]
    Theta1 = Theta1.reshape((hidden_layer_size, input_layer_size + 1))  # (25,401)
    Theta2 = nn_params[hidden_layer_size*(input_layer_size + 1):]
    Theta2 = Theta2.reshape((num_labels, hidden_layer_size + 1))  # (10,26)
    # Number of training examples
    m = X.shape[0]
    # You need to return the following variable correctly
    J = 0.0
    # ====================== Your code ======================
    # Compute the value of the cost function J
    a_2 = sigmoid(np.dot(Theta1, np.insert(X.T, 0, values=1, axis=0)))  # (25,5000); a bias row is prepended to the input
    h = sigmoid(np.dot(Theta2, np.insert(a_2, 0, values=1, axis=0)))  # (10,5000)
    # Equivalent loop-based version:
    # for i in range(m):
    #     for k in range(1, num_labels+1):  # note the range of k (one-hot order: 1,2,3,4,5,6,7,8,9,0)
    #         y_i_k = int(y[i] == k)
    #         J += -y_i_k*np.log(h[k-1, i]) - (1-y_i_k)*np.log(1-h[k-1, i])
    # J /= m
    y = y.reshape(m, 1)
    Y = np.array([[i for i in range(1, num_labels+1)] for j in range(m)])  # (5000,10)
    Y = (Y == y).astype(int).T  # (10,5000)
    J = 1/m * (-1*Y*np.log(h) - (1-Y)*np.log(1-h)).sum()
    # Regularization term; the bias column of each weight matrix is excluded
    regular = (lmb/(2*m)) * ((np.delete(Theta1, 0, axis=1)**2).sum() + (np.delete(Theta2, 0, axis=1)**2).sum())
    J += regular
    # ======================================================
    return J
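Backpropagation training algorithm
You now need to implement the backpropagation algorithm. Its idea can be roughly described as follows. Given a training example $(x^{(t)}, y^{(t)})$, we first run forward propagation to compute the activations of all units (neurons) in the network, including the hypothesis output $h_{\Theta}(x)$. Then, for node $j$ in layer $l$, we want to compute an "error term" $\delta_{j}^{(l)}$ that measures how much that node "contributed" to the error in the output.
For an output node, we can directly measure the difference between the network's activation and the true target value. For our network, whose third layer is the output layer, this difference defines $\delta_{j}^{(3)}$. For a hidden unit, $\delta_{j}^{(l)}$ is computed from a weighted average of the error terms of the nodes in layer $l+1$.
The details of the backpropagation algorithm follow (see Figure 3). Implement steps 1 to 4 inside a loop that processes one training example per iteration. Step 5 divides the accumulated gradients by $m$ to obtain the gradient of the neural network cost function.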
1. Set the input layer values $a^{(1)}$ to the $t$-th training example $x^{(t)}$. Run forward propagation to compute the activations of layers 2 and 3 ($z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)}$). Note that you need to prepend a +1 entry to $a^{(1)}$ and $a^{(2)}$ (a vector of ones in the vectorized version) so that the bias unit is included. In numpy this can be done with functions such as ones, hstack, and vstack.
2. For each output unit $k$ in layer 3, compute
$$\delta_{k}^{(3)} = a_{k}^{(3)} - y_k$$
where $y_k \in \{0, 1\}$ indicates whether the current training example belongs to class $k$.
3. For the hidden layer $l=2$, compute
$$\delta^{(2)} = \left( \Theta^{(2)} \right)^T \delta^{(3)} .* g^{\prime} (z^{(2)})$$
where $g^{\prime}$ is the gradient of the sigmoid function. The element-wise product written as .* is the ordinary * operator in numpy; matrix multiplication should use numpy.dot. For the dimensions to match, the component of $\left( \Theta^{(2)} \right)^T \delta^{(3)}$ that corresponds to the bias unit is dropped, which is why the implementation uses Theta2[:, 1:].
4. Accumulate the gradient of the current example using
$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$$
In numpy, arrays support the += operator.
5. Compute the (unregularized) gradient of the neural network cost function,
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)}$$
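Here you need to (partially) complete the function nn_grad_function. The program uses the function check_nn_gradients to check whether your implementation is correct. After completing nn_grad_function with a loop, try re-implementing it in vectorized form.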
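The five steps can be sketched as a per-example loop on a tiny network. This is an illustrative sketch under stated assumptions, not the exercise's reference solution: the function name backprop_loop, the network sizes, and the random data are all made up for the example, and regularization is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_loop(Theta1, Theta2, X, y, num_labels):
    """Unregularized gradients via the per-example loop (steps 1 to 5)."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for t in range(m):
        # Step 1: forward pass with bias units prepended.
        a1 = np.hstack(([1.0], X[t]))
        z2 = Theta1.dot(a1)
        a2 = np.hstack(([1.0], sigmoid(z2)))
        a3 = sigmoid(Theta2.dot(a2))
        # Step 2: output error against the one-hot target.
        yk = (np.arange(1, num_labels + 1) == y[t]).astype(float)
        delta3 = a3 - yk
        # Step 3: hidden error; the bias column of Theta2 is skipped.
        g = sigmoid(z2)
        delta2 = Theta2[:, 1:].T.dot(delta3) * g * (1.0 - g)
        # Step 4: accumulate the outer products.
        Delta1 += np.outer(delta2, a1)
        Delta2 += np.outer(delta3, a2)
    # Step 5: average over the m examples.
    return Delta1 / m, Delta2 / m

rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.12, 0.12, (4, 3))   # 2 inputs -> 4 hidden units
Theta2 = rng.uniform(-0.12, 0.12, (3, 5))   # 4 hidden units -> 3 outputs
X = rng.normal(size=(5, 2))
y = np.array([1, 2, 3, 1, 2])
D1, D2 = backprop_loop(Theta1, Theta2, X, y, num_labels=3)
print(D1.shape, D2.shape)  # (4, 3) (3, 5)
```

Each returned gradient has the same shape as the weight matrix it belongs to, so the two can be flattened and concatenated in the same order as the parameter vector.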
Regularizing the neural network
Once backpropagation is implemented correctly, add the regularization term to the gradient.
Assuming you have computed $\Delta_{ij}^{(l)}$ with backpropagation, the regularized gradient is
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)} \qquad \text{for } j = 0$$
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)} + \frac{\lambda}{m} \Theta_{ij}^{(l)} \qquad \text{for } j \geq 1$$
Note that you should not regularize the first column of $\Theta^{(l)}$, because it corresponds to the bias terms.
This step requires you to complete the implementation of the function nn_grad_function.
def nn_grad_function(nn_params, *args):
    """Gradient of the neural network cost function."""
    # Unpack parameters from *args
    input_layer_size, hidden_layer_size, num_labels, lmb, X, y = args
    # Recover the weight matrices from the flat parameter vector
    Theta1 = nn_params[:hidden_layer_size*(input_layer_size + 1)]
    Theta1 = Theta1.reshape((hidden_layer_size, input_layer_size + 1))  # (25,401)
    Theta2 = nn_params[hidden_layer_size*(input_layer_size + 1):]
    Theta2 = Theta2.reshape((num_labels, hidden_layer_size + 1))  # (10,26)
    # Number of training examples
    m = X.shape[0]
    # ====================== Your code =====================
    # Compute the gradients of Theta1 and Theta2
    # The loop version was already tried in nn_cost_function; here the computation is vectorized
    a_1 = np.insert(X.T, 0, values=1, axis=0)  # (401,5000)
    z_2 = np.dot(Theta1, a_1)  # (25,5000)
    a_2 = np.insert(sigmoid(z_2), 0, values=1, axis=0)  # (26,5000)
    z_3 = np.dot(Theta2, a_2)  # (10,5000)
    a_3 = sigmoid(z_3)  # (10,5000)
    y = y.reshape(m, 1)
    Y = np.array([[i for i in range(1, num_labels+1)] for j in range(m)])  # (5000,10)
    Y = (Y == y).astype(int).T  # (10,5000)
    derta_3 = a_3 - Y  # (10,5000)
    # The bias column of Theta2 is dropped, so the result has one row per hidden unit
    derta_2 = np.dot(Theta2[:, 1:].T, derta_3)*sigmoid_gradient(z_2)  # (25,5000)
    Derta_1 = np.dot(derta_2, a_1.T)  # (25,401)
    Derta_2 = np.dot(derta_3, a_2.T)  # (10,26)
    # Regularization; the bias column is not regularized
    regular_1 = (lmb/m) * Theta1
    regular_1[:, 0] = 0
    regular_2 = (lmb/m) * Theta2
    regular_2[:, 0] = 0
    Theta1_grad = Derta_1/m + regular_1
    Theta2_grad = Derta_2/m + regular_2
    # =====================================================
    grad = np.hstack((Theta1_grad.flatten(), Theta2_grad.flatten()))
    return grad
Backpropagation training algorithm
Sigmoid function and its gradient
The sigmoid function is defined as
$$\text{sigmoid}(z) = g(z) = \frac{1}{1+\exp(-z)}$$
The gradient of the sigmoid function can be computed as
$$g^{\prime}(z) = \frac{d}{dz} g(z) = g(z)(1-g(z))$$
To check your implementation, the following facts may help. When $z=0$, the gradient is exactly 0.25. When $z$ is large in magnitude (positive or negative), the gradient is close to 0.
Here you need to complete the functions sigmoid and sigmoid_gradient. Make sure they accept vectors and matrices (numpy.ndarray) as input.
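One way to convince yourself of the identity $g^{\prime}(z) = g(z)(1-g(z))$ is to compare it with a central-difference estimate. The sketch below does this at a few sample points; the chosen points and the step size eps are arbitrary assumptions made for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
eps = 1e-5

# Closed-form gradient and its central-difference approximation.
analytic = sigmoid(z) * (1.0 - sigmoid(z))
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)

print(analytic[2])                          # 0.25 at z = 0
print(np.max(np.abs(analytic - numeric)))   # very small difference
```

The values at z = -10 and z = 10 are close to 0, in line with the second fact stated above.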
Random initialization of the network parameters
When training a neural network, it is important to initialize the parameters with random numbers. An effective strategy is to draw the initial values of $\Theta^{(l)}$ uniformly from the range $[-\epsilon_{init}, \epsilon_{init}]$. Here you should set $\epsilon_{init} = 0.12$. This range keeps the parameters small and makes training more efficient.
You need to complete the implementation of the function rand_initialize_weights.
For a general neural network, if layer $l$ has $L_{in}$ input units and $L_{out}$ output units, then $\epsilon_{init} = \sqrt{6}/\sqrt{L_{in} + L_{out}}$ is a useful rule of thumb.
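As a quick arithmetic check of this rule of thumb, the sketch below evaluates it for the two layers of the network in this exercise. The function name epsilon_init is hypothetical.

```python
import math

def epsilon_init(l_in, l_out):
    """Heuristic range sqrt(6) / sqrt(L_in + L_out) for uniform initialization."""
    return math.sqrt(6.0) / math.sqrt(l_in + l_out)

print(round(epsilon_init(400, 25), 4))  # 0.1188, close to the 0.12 used here
print(round(epsilon_init(25, 10), 4))   # 0.414
```

For the first layer the heuristic gives roughly the 0.12 used in the exercise, while for the smaller second layer it would allow a wider range.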
def sigmoid(z):
    """Sigmoid function."""
    return 1.0/(1.0 + np.exp(-np.asarray(z)))
def sigmoid_gradient(z):
    """Compute the gradient of the sigmoid function."""
    g = np.zeros_like(z)
    # ====================== Your code ======================
    # Compute the gradient g of the sigmoid function
    g = sigmoid(z)*(1-sigmoid(z))
    # =======================================================
    return g
def rand_initialize_weights(L_in, L_out):
    """Randomly initialize the weights of a layer."""
    # You need to return the following variables correctly
    W = np.zeros((L_out, 1 + L_in))
    # ====================== Your code ======================
    # Draw the weights uniformly from [-0.12, 0.12]
    W = np.random.uniform(-0.12, 0.12, (L_out, 1 + L_in))
    # ======================================================
    return W
def debug_initialize_weights(fan_out, fan_in):
    """Initialize the weights of a layer with
    fan_in incoming connections and
    fan_out outgoing connections using a fixed strategy."""
    W = np.linspace(1, fan_out*(fan_in+1), fan_out*(fan_in+1))
    W = 0.1*np.sin(W).reshape(fan_out, fan_in + 1)
    return W
def compute_numerical_gradient(cost_func, theta):
    """Compute the numerical gradient of the given cost_func
    at parameter theta"""
    numgrad = np.zeros_like(theta)
    perturb = np.zeros_like(theta)
    eps = 1.0e-4
    for idx in range(len(theta)):
        # Perturb one component at a time and apply the central difference
        perturb[idx] = eps
        loss1 = cost_func(theta - perturb)
        loss2 = cost_func(theta + perturb)
        numgrad[idx] = (loss2 - loss1)/(2*eps)
        perturb[idx] = 0.0
    return numgrad
Gradient checking
In the neural network, we minimize the cost function $J(\Theta)$. To check that the gradient is computed correctly, consider unrolling the parameters $\Theta^{(1)}$ and $\Theta^{(2)}$ into one long vector $\theta$. Suppose the function $f_i(\theta)$ is meant to compute $\frac{\partial}{\partial \theta_i} J(\theta)$.
Let
$$\theta^{(i+)} = \theta + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix} \qquad \theta^{(i-)} = \theta - \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix}$$
In the expressions above, $\theta^{(i+)}$ is identical to $\theta$ except that its $i$-th element is increased by $\epsilon$. Similarly, $\theta^{(i-)}$ differs from $\theta$ only in that its $i$-th element is decreased by $\epsilon$. The correctness of $f_i(\theta)$ can be checked with the numerical approximation:
$$f_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\epsilon}$$
With $\epsilon=10^{-4}$, the two sides usually agree to at least 4 significant digits (and often to more).
The function compute_numerical_gradient is already implemented in the exercise code; read it carefully and make sure you understand how it works.
Next, the program runs the function check_nn_gradients, which builds a small neural network to check whether the gradients computed by your backpropagation implementation are correct. If your implementation is correct, the absolute error between your gradient and the numerical gradient (the sum of the absolute differences over all components) should be less than $10^{-9}$.
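To see the central-difference approximation at work outside the neural network, the sketch below applies it to a toy cost whose gradient is known in closed form. The function name numerical_gradient and the toy cost are assumptions made for this illustration; they are not part of the exercise code.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-4):
    """Central-difference estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

f = lambda t: np.sum(t ** 3)          # toy cost with known gradient 3 * t**2
theta = np.array([1.0, -2.0, 0.5])
approx = numerical_gradient(f, theta)
exact = 3.0 * theta ** 2
print(np.abs(approx - exact).max())   # about 1e-8, i.e. eps**2
```

For this cubic cost the central difference equals $3\theta_i^2 + \epsilon^2$, so the error is about $\epsilon^2 = 10^{-8}$ in every component.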
def check_nn_gradients(lmb=0.0):
    """Creates a small neural network to check the backpropagation
    gradients."""
    input_layer_size, hidden_layer_size = 3, 5
    num_labels, m = 3, 5
    # Deterministic weights and inputs so that the check is reproducible
    Theta1 = debug_initialize_weights(hidden_layer_size, input_layer_size)
    Theta2 = debug_initialize_weights(num_labels, hidden_layer_size)
    X = debug_initialize_weights(m, input_layer_size - 1)
    y = np.array([1 + (t % num_labels) for t in range(m)])
    nn_params = np.hstack((Theta1.flatten(), Theta2.flatten()))
    cost_func = lambda x: nn_cost_function(x,
                                           input_layer_size,
                                           hidden_layer_size,
                                           num_labels, lmb, X, y)
    grad = nn_grad_function(nn_params,
                            input_layer_size, hidden_layer_size,
                            num_labels, lmb, X, y)
    numgrad = compute_numerical_gradient(cost_func, nn_params)
    # Print both gradients side by side, followed by the summed absolute difference
    print(np.vstack((numgrad, grad)).T, np.sum(np.abs(numgrad - grad)))
    print('The above two columns you get should be very similar.')
    print('(Left-Your Numerical Gradient, Right-Analytical Gradient)')
def predict(Theta1, Theta2, X):
    """Predict the label of each row of X."""
    m = X.shape[0]
    # num_labels = Theta2.shape[0]
    p = np.zeros((m,1), dtype=int)
    # ====================== Your code ============================
    # Forward pass through the network
    a_2 = sigmoid(np.dot(Theta1, np.insert(X.T, 0, values=1, axis=0)))  # (25,5000); a bias row is prepended to the input
    h2 = sigmoid(np.dot(Theta2, np.insert(a_2, 0, values=1, axis=0))).T  # (5000,10)
    # ============================================================
    # Labels start at 1, so shift the argmax index by one
    p = np.argmax(h2, axis=1) + 1.0
    return p
# Parameters
input_layer_size = 400  # 20x20 input images of handwritten digits
hidden_layer_size = 25  # 25 hidden units
num_labels = 10  # 10 labels, from 1 to 10
Loading the dataset
# =========== Part 1 ===============
# Load the training data
print("Loading and Visualizing Data...")
data = sio.loadmat('data/data6559/NN_data.mat')
X, y = data['X'], data['y']
m = X.shape[0]
# Randomly select 100 examples to display
rand_indices = np.array(range(m))
np.random.shuffle(rand_indices)
X_sel = X[rand_indices[:100]]
display_data(X_sel)
Loading and Visualizing Data...
Loading the neural network weights
# =========== Part 2 ===============
print('Loading Saved Neural Network Parameters ...')
# Load the weights into variables Theta1 and Theta2
data = sio.loadmat('data/data6559/NN_weights.mat')
Theta1, Theta2 = data['Theta1'], data['Theta2']
print(Theta1.shape) #(hidden_layer_size, input_layer_size + 1)
print(Theta2.shape) #(num_labels, hidden_layer_size + 1)
Loading Saved Neural Network Parameters ...
(25, 401)
(10, 26)
# ================ Part 3: Compute Cost (Feedforward) ================
# For the neural network, you should first start by implementing the
# feedforward part of the neural network that returns the cost only. You
# should complete the code in nn_cost_function to return cost. After
# implementing the feedforward to compute the cost, you can verify that
# your implementation is correct by verifying that you get the same cost
# as us for the fixed debugging parameters.
#
# We suggest implementing the feedforward cost *without* regularization
# first so that it will be easier for you to debug. Later, in part 4, you
# will get to implement the regularized cost.
print('Feedforward Using Neural Network ...')
# Weight regularization parameter (we set this to 0 here).
lmb = 0.0
nn_params = np.hstack((Theta1.flatten(), Theta2.flatten()))
J = nn_cost_function(nn_params,
input_layer_size, hidden_layer_size,
num_labels, lmb, X, y)
print('Cost at parameters (loaded from PRML_NN_weights): %f ' % J)
print('(this value should be about 0.287629)')
Feedforward Using Neural Network ...
Cost at parameters (loaded from PRML_NN_weights): 0.287629
(this value should be about 0.287629)
# =============== Part 4: Implement Regularization ===============
print('Checking Cost Function (w/ Regularization) ... ')
lmb = 1.0
J = nn_cost_function(nn_params,
input_layer_size, hidden_layer_size,
num_labels, lmb, X, y)
print('Cost at parameters (loaded from PRML_NN_weights): %f ' % J)
print('(this value should be about 0.383770)')
Checking Cost Function (w/ Regularization) ...
Cost at parameters (loaded from PRML_NN_weights): 0.383770
(this value should be about 0.383770)
# ================ Part 5: Sigmoid Gradient ================
print('Evaluating sigmoid gradient...')
g = sigmoid_gradient([1, -0.5, 0, 0.5, 1])
print('Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]: ', g)
Evaluating sigmoid gradient...
Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]: [0.19661193 0.23500371 0.25 0.23500371 0.19661193]
Initializing the neural network parameters
# ================ Part 6: Initializing Parameters ================
print('Initializing Neural Network Parameters ...')
initial_Theta1 = rand_initialize_weights(input_layer_size, hidden_layer_size)
initial_Theta2 = rand_initialize_weights(hidden_layer_size, num_labels)
# Unroll parameters
initial_nn_params = np.hstack((initial_Theta1.flatten(),
initial_Theta2.flatten()))
Initializing Neural Network Parameters ...
# =============== Part 7: Implement Backpropagation ===============
print('Checking Backpropagation... ')
# Check gradients by running check_nn_gradients
check_nn_gradients()
Checking Backpropagation...
[[ 1.27220311e-02 1.27220311e-02]
[ 1.58832807e-04 1.58832809e-04]
[ 2.17690455e-04 2.17690455e-04]
[ 7.64045005e-05 7.64045009e-05]
[ 6.46352264e-03 6.46352265e-03]
[ 2.34983744e-05 2.34983735e-05]
[-3.74199094e-05 -3.74199098e-05]
[-6.39344999e-05 -6.39345006e-05]
[-5.74199923e-03 -5.74199923e-03]
[-1.34052023e-04 -1.34052019e-04]
[-2.59146271e-04 -2.59146269e-04]
[-1.45982635e-04 -1.45982634e-04]
[-1.26792390e-02 -1.26792390e-02]
[-1.67913188e-04 -1.67913187e-04]
[-2.41809015e-04 -2.41809017e-04]
[-9.33867494e-05 -9.33867522e-05]
[-7.94573534e-03 -7.94573535e-03]
[-4.76254503e-05 -4.76254501e-05]
[-2.64923639e-06 -2.64923844e-06]
[ 4.47626713e-05 4.47626708e-05]
[ 1.09347722e-01 1.09347722e-01]
[ 5.67965185e-02 5.67965185e-02]
[ 5.25298306e-02 5.25298306e-02]
[ 5.53542907e-02 5.53542907e-02]
[ 5.59290833e-02 5.59290833e-02]
[ 5.23534682e-02 5.23534682e-02]
[ 1.08133003e-01 1.08133003e-01]
[ 5.67319602e-02 5.67319602e-02]
[ 5.14442931e-02 5.14442931e-02]
[ 5.48296085e-02 5.48296085e-02]
[ 5.56926532e-02 5.56926532e-02]
[ 5.11795651e-02 5.11795651e-02]
[ 3.06270372e-01 3.06270372e-01]
[ 1.59463135e-01 1.59463135e-01]
[ 1.45570264e-01 1.45570264e-01]
[ 1.56700533e-01 1.56700533e-01]
[ 1.56043968e-01 1.56043968e-01]
[ 1.45771544e-01 1.45771544e-01]] 9.987739364044175e-11
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
# =============== Part 8: Implement Regularization ===============
print('Checking Backpropagation (w/ Regularization) ... ')
# Check gradients by running check_nn_gradients
lmb = 3.0
check_nn_gradients(lmb)
Checking Backpropagation (w/ Regularization) ...
[[ 0.01272203 0.01272203]
[ 0.05471668 0.05471668]
[ 0.00868489 0.00868489]
[-0.04533175 -0.04533175]
[ 0.00646352 0.00646352]
[-0.01674143 -0.01674143]
[ 0.03938178 0.03938178]
[ 0.05929756 0.05929756]
[-0.005742 -0.005742 ]
[-0.03277532 -0.03277532]
[-0.06025856 -0.06025856]
[-0.03234036 -0.03234036]
[-0.01267924 -0.01267924]
[ 0.05926853 0.05926853]
[ 0.03877546 0.03877546]
[-0.01736759 -0.01736759]
[-0.00794574 -0.00794574]
[-0.04510686 -0.04510686]
[ 0.00898998 0.00898998]
[ 0.05482148 0.05482148]
[ 0.10934772 0.10934772]
[ 0.11135436 0.11135436]
[ 0.06099703 0.06099703]
[ 0.00994614 0.00994614]
[-0.00160637 -0.00160637]
[ 0.03558854 0.03558854]
[ 0.108133 0.108133 ]
[ 0.11609346 0.11609346]
[ 0.0761714 0.0761714 ]
[ 0.02218834 0.02218834]
[-0.00430676 -0.00430676]
[ 0.01898519 0.01898519]
[ 0.30627037 0.30627037]
[ 0.21889958 0.21889958]
[ 0.18458753 0.18458753]
[ 0.13942633 0.13942633]
[ 0.09836012 0.09836012]
[ 0.10071231 0.10071231]] 1.0709039384437791e-10
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
Training the neural network
# =================== Part 8: Training NN ===================
print('Training Neural Network...')
lmb, maxiter = 1.0, 200
args = (input_layer_size, hidden_layer_size, num_labels, lmb, X, y)
nn_params, cost_min, _, _, _ = fmin_cg(nn_cost_function,
initial_nn_params,
fprime=nn_grad_function,
args=args,
maxiter=maxiter,
full_output=True)
Theta1 = nn_params[:hidden_layer_size*(input_layer_size + 1)]
Theta1 = Theta1.reshape((hidden_layer_size, input_layer_size + 1))
Theta2 = nn_params[hidden_layer_size*(input_layer_size + 1):]
Theta2 = Theta2.reshape((num_labels, hidden_layer_size + 1))
Training Neural Network...
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.326676
Iterations: 200
Function evaluations: 456
Gradient evaluations: 456
Model prediction
# ================= Part 9: Implement Predict =================
pred = predict(Theta1, Theta2, X)
# print(pred.shape, y.shape)
# print(np.hstack((pred, y)))
print('Training Set Accuracy:', np.mean(pred == y[:, 0])*100.0)
Training Set Accuracy: 99.5