A Simple Neural Network Exercise
This post is a hands-on exercise in applying neural networks: using a neural network to recognize and classify images of handwritten digits, restricted here to 0 and 1.
1. Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from autils import *
%matplotlib inline
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)
2. Neural Network
2.1 Loading the Dataset
- The `load_data()` function shown below loads the data into the variables `X` and `y`.
- The dataset contains 1000 training examples of handwritten digits, restricted here to zeros and ones.
- Each training example is a 20 pixel × 20 pixel grayscale image of a digit.
- Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.
- The 20 × 20 grid of pixels is "unrolled" into a 400-dimensional vector.
- Each training example becomes a single row in the data matrix `X`.
- This gives us a 1000 × 400 matrix `X` where every row is a training example of a handwritten digit image.

$$X = \left(\begin{array}{c} ---\ (x^{(1)})\ --- \\ ---\ (x^{(2)})\ --- \\ \vdots \\ ---\ (x^{(m)})\ --- \end{array}\right)$$

- The second part of the training set is a 1000 × 1 vector `y` containing the labels for the training set: `y = 0` if the image is the digit 0, and `y = 1` if the image is the digit `1`.
# load dataset
X, y = load_data()
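`load_data()` comes from the helper module `autils`, so its implementation is not shown here. A minimal sketch of what it might look like, assuming the images and labels are stored as NumPy arrays in hypothetical files `data/X.npy` and `data/y.npy`:

import numpy as np

def load_data():
    # hypothetical file paths; the actual helper may store the data differently
    X = np.load("data/X.npy")   # (1000, 400) unrolled grayscale images
    y = np.load("data/y.npy")   # (1000, 1) labels, 0 or 1
    return X, y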
2.2 Visualizing the Dataset
Inspect the data:
print ('The first element of X is: ', X[0])
The first element of X is: [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 8.56059680e-06
1.94035948e-06 -7.37438725e-04 -8.13403799e-03 -1.86104473e-02
-1.87412865e-02 -1.87572508e-02 -1.90963542e-02 -1.64039011e-02
-3.78191381e-03 3.30347316e-04 1.27655229e-05 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 1.16421569e-04 1.20052179e-04
-1.40444581e-02 -2.84542484e-02 8.03826593e-02 2.66540339e-01
2.73853746e-01 2.78729541e-01 2.74293607e-01 2.24676403e-01
2.77562977e-02 -7.06315478e-03 2.34715414e-04 0.00000000e+00
...
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
print ('The first element of y is: ', y[0,0])
print ('The last element of y is: ', y[-1,0])
The first element of y is: 0
The last element of y is: 1
Check the data dimensions:
print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))
The shape of X is: (1000, 400)
The shape of y is: (1000, 1)
Visualize the data:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
fig.tight_layout(pad=0.1)

for i, ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20, 20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Display the label above the image
    ax.set_title(y[random_index, 0])
    ax.set_axis_off()
2.3 Model Representation
The neural network used in this assignment is shown in the figure below.
- It has three dense layers with sigmoid activation functions.
- The inputs are the pixel values of the digit images.
- Since the images are of size $20\times20$, this gives us $400$ inputs.
- The parameters are sized for a network with $25$ units in layer 1, $15$ units in layer 2, and $1$ output unit in layer 3.
- Recall that the dimensions of these parameters are determined as follows:
  - If a network has $s_{in}$ units in one layer and $s_{out}$ units in the next layer, then
    - $W$ has dimensions $s_{in} \times s_{out}$,
    - $b$ is a vector with $s_{out}$ elements.
- Therefore, the shapes of `W` and `b` are:
  - layer1: `W1` has shape (400, 25) and `b1` has shape (25,)
  - layer2: `W2` has shape (25, 15) and `b2` has shape (15,)
  - layer3: `W3` has shape (15, 1) and `b3` has shape (1,)

Note: the bias vector `b` can be represented as a 1-D (n,) array or a 2-D (n,1) array. TensorFlow uses the 1-D representation, and this post keeps that convention.
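A minimal NumPy sketch of this shape rule, using layer 1's sizes as example values:

s_in, s_out = 400, 25            # layer 1: 400 inputs, 25 units
W = np.zeros((s_in, s_out))      # W has shape (s_in, s_out) = (400, 25)
b = np.zeros(s_out)              # b is a 1-D vector with s_out elements: (25,)
print(W.shape, b.shape)          # (400, 25) (25,)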
2.4 TensorFlow Implementation of the Model
# UNQ_C1
# GRADED CELL: Sequential model

model = Sequential(
    [
        tf.keras.Input(shape=(400,)),    # specify input size
        ### START CODE HERE ###
        Dense(25, activation='sigmoid', name='layer1'),
        Dense(15, activation='sigmoid', name='layer2'),
        Dense(1, activation='sigmoid', name='layer3'),
        ### END CODE HERE ###
    ], name="my_model"
)
model.summary()
Model: "my_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
layer1 (Dense) (None, 25) 10025
layer2 (Dense) (None, 15) 390
layer3 (Dense) (None, 1) 16
=================================================================
Total params: 10,431
Trainable params: 10,431
Non-trainable params: 0
_________________________________________________________________
The three parameter counts in the right-hand column of the summary come from:
L1_num_params = 400 * 25 + 25 # W1 parameters + b1 parameters
L2_num_params = 25 * 15 + 15 # W2 parameters + b2 parameters
L3_num_params = 15 * 1 + 1 # W3 parameters + b3 parameters
print("L1 params = ", L1_num_params, ", L2 params = ", L2_num_params, ", L3 params = ", L3_num_params )
L1 params = 10025 , L2 params = 390 , L3 params = 16
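As a cross-check (not part of the original lab), Keras can also report the total directly, which should match the summary above:

print(model.count_params())   # 10431 = 10025 + 390 + 16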
Let's further verify that the weight shapes TensorFlow created match those calculated above:
[layer1, layer2, layer3] = model.layers
#### Examine Weights shapes
W1,b1 = layer1.get_weights()
W2,b2 = layer2.get_weights()
W3,b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 1), b3 shape = (1,)
`get_weights()` returns NumPy arrays. The weights can also be accessed directly as tensors. Note the shape of the tensors in the final layer.
print(model.layers[2].weights)
[<tf.Variable 'layer3/kernel:0' shape=(15, 1) dtype=float32, numpy=
array([[ 2.0668435e-01],
[ 3.0981123e-02],
[ 1.5515453e-01],
[-4.5015967e-01],
[ 1.1071807e-01],
[ 5.0223887e-02],
[-3.4112835e-01],
[ 3.1129056e-01],
[ 3.3140182e-04],
[ 7.3278010e-02],
[-3.6888242e-01],
[ 2.8538823e-02],
[-1.9153926e-01],
[ 5.5546862e-01],
[ 2.4924773e-01]], dtype=float32)>, <tf.Variable 'layer3/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]
Compile and fit the model:
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model.fit(
    X, y,
    epochs=20
)
Take a look at how well the model predicts:
prediction = model.predict(X[0].reshape(1,400)) # a zero
print(f" predicting a zero: {prediction}")
prediction = model.predict(X[500].reshape(1,400)) # a one
print(f" predicting a one: {prediction}")
The model's output is interpreted as a probability. In the first example above, the input is a zero; the model predicts the probability that the input is a one to be nearly zero.
In the second example, the input is a one; the model predicts the probability that the input is a one to be close to one.
As with logistic regression, the probability is compared against a threshold to make the final prediction.
if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print(f"prediction after threshold: {yhat}")
Compare the model's predictions against the true labels for 64 random samples:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
fig.tight_layout(pad=0.1, rect=[0, 0.03, 1, 0.92])  # [left, bottom, right, top]

for i, ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20, 20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the Neural Network
    prediction = model.predict(X[random_index].reshape(1, 400))
    if prediction >= 0.5:
        yhat = 1
    else:
        yhat = 0

    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{yhat}")
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=16)
plt.show()
3. NumPy Implementation of the Model
3.1 NumPy Model Functions
The goal here is to get familiar with the underlying implementation, starting with the dense function.
Use a for loop to visit each unit (`j`) in the layer, compute the dot product of that unit's weights (`W[:,j]`) with the input, add the unit's bias (`b[j]`) to form `z`, and then apply the activation function `g(z)` to the result.
Exercise:
# UNQ_C2
# GRADED FUNCTION: my_dense

def my_dense(a_in, W, b, g):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, )) : Data, 1 example
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, )) : bias vector, j units
      g    activation function (e.g. sigmoid, relu..)
    Returns
      a_out (ndarray (j,)) : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    ### START CODE HERE ###
    for j in range(units):
        w = W[:, j]                  # note the lowercase w: the j-th column of W, shape (n,)
        z = np.dot(a_in, w) + b[j]
        a_out[j] = g(z)
    ### END CODE HERE ###
    return a_out
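A quick sanity check with small, hypothetical arrays (this assumes the `sigmoid` helper from `autils` is in scope):

x_tst = 0.1 * np.arange(1, 3, 1)                 # (2,) : 1 example with 2 features
W_tst = 0.1 * np.arange(1, 7, 1).reshape(2, 3)   # (2,3): 2 input features, 3 units
b_tst = 0.1 * np.arange(1, 4, 1)                 # (3,) : 3 units
print(my_dense(x_tst, W_tst, b_tst, sigmoid))    # expect a (3,) array of values in (0,1)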
A custom sequential function:
def my_sequential(x, W1, b1, W2, b2, W3, b3):
    a1 = my_dense(x,  W1, b1, sigmoid)
    a2 = my_dense(a1, W2, b2, sigmoid)
    a3 = my_dense(a2, W3, b3, sigmoid)
    return a3
Try making predictions with the custom functions:
W1_tmp, b1_tmp = layer1.get_weights()
W2_tmp, b2_tmp = layer2.get_weights()
W3_tmp, b3_tmp = layer3.get_weights()

# make predictions
prediction = my_sequential(X[0], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp)
if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print("yhat = ", yhat, " label= ", y[0,0])

prediction = my_sequential(X[500], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp)
if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print("yhat = ", yhat, " label= ", y[500,0])
Compare the custom implementation against TensorFlow's predictions and the true labels:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
fig.tight_layout(pad=0.1, rect=[0, 0.03, 1, 0.92])  # [left, bottom, right, top]

for i, ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20, 20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the Neural Network implemented in Numpy
    my_prediction = my_sequential(X[random_index], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp)
    my_yhat = int(my_prediction >= 0.5)

    # Predict using the Neural Network implemented in Tensorflow
    tf_prediction = model.predict(X[random_index].reshape(1, 400))
    tf_yhat = int(tf_prediction >= 0.5)

    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{tf_yhat},{my_yhat}")
    ax.set_axis_off()
fig.suptitle("Label, yhat Tensorflow, yhat Numpy", fontsize=16)
plt.show()
3.2 Vectorized NumPy Implementation
The loop-based implementation above can be replaced with matrix operations that process all examples at once. We can demonstrate this using the example `X` and the `W1`, `b1` parameters above. We use `np.matmul` to perform the matrix multiplication. Note that the dimensions of `x` and `W` must be compatible, as shown in the figure above.
x = X[0].reshape(-1,1) # column vector (400,1)
z1 = np.matmul(x.T,W1) + b1 # (1,400)(400,25) = (1,25)
a1 = sigmoid(z1)
print(a1.shape)
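As a quick consistency check (not in the original lab), the vectorized result should match the loop-based `my_dense` from section 3.1:

a1_loop = my_dense(X[0], W1, b1, sigmoid)   # loop version, shape (25,)
print(np.allclose(a1, a1_loop))             # should print True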
The vectorized implementation of the dense function:
# UNQ_C3
# GRADED FUNCTION: my_dense_v

def my_dense_v(A_in, W, b, g):
    """
    Computes dense layer
    Args:
      A_in (ndarray (m,n)) : Data, m examples, n features each
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j,))  : bias vector, j units
      g    activation function (e.g. sigmoid, relu..)
    Returns
      A_out (ndarray (m,j)) : m examples, j units
    """
    ### START CODE HERE ###
    Z = np.matmul(A_in, W) + b    # b broadcasts across the m rows of Z
    A_out = g(Z)
    ### END CODE HERE ###
    return A_out
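The same kind of quick check for the vectorized version, now with several examples at once (again assuming `sigmoid` from `autils` is in scope):

X_tst = 0.1 * np.arange(1, 9, 1).reshape(4, 2)   # (4,2): 4 examples, 2 features each
W_tst = 0.1 * np.arange(1, 7, 1).reshape(2, 3)   # (2,3): 2 input features, 3 units
b_tst = 0.1 * np.arange(1, 4, 1)                 # (3,) : 3 units
print(my_dense_v(X_tst, W_tst, b_tst, sigmoid))  # expect shape (4,3)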
The sequential function:
def my_sequential_v(X, W1, b1, W2, b2, W3, b3):
    A1 = my_dense_v(X,  W1, b1, sigmoid)
    A2 = my_dense_v(A1, W2, b2, sigmoid)
    A3 = my_dense_v(A2, W3, b3, sigmoid)
    return A3
Get the weights:
W1_tmp,b1_tmp = layer1.get_weights()
W2_tmp,b2_tmp = layer2.get_weights()
W3_tmp,b3_tmp = layer3.get_weights()
Make predictions with the new implementation:
Prediction = my_sequential_v(X, W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp )
Prediction.shape
TensorShape([1000, 1])
Classify by applying the threshold:
Yhat = (Prediction >= 0.5).numpy().astype(int)
print("predict a zero: ",Yhat[0], "predict a one: ", Yhat[500])
Visualize the predictions:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
fig.tight_layout(pad=0.1, rect=[0, 0.03, 1, 0.92])  # [left, bottom, right, top]

for i, ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20, 20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Display the label above the image
    ax.set_title(f"{y[random_index,0]}, {Yhat[random_index, 0]}")
    ax.set_axis_off()
fig.suptitle("Label, Yhat", fontsize=16)
plt.show()
Even so, a few ambiguous-looking examples are still misclassified. Note the use of NumPy's `where` function, which accepts a conditional expression:
fig = plt.figure(figsize=(1, 1))
errors = np.where(y != Yhat)
random_index = errors[0][0]
X_random_reshaped = X[random_index].reshape((20, 20)).T
plt.imshow(X_random_reshaped, cmap='gray')
plt.title(f"{y[random_index,0]}, {Yhat[random_index, 0]}")
plt.axis('off')
plt.show()
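To count how many of the training images are misclassified in total, reuse the same `np.where` call:

errors = np.where(y != Yhat)
print(f"{len(errors[0])} errors out of {m} images")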
3.3 NumPy Broadcasting
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
- they are equal, or
- one of them is 1.

See the NumPy broadcasting documentation for details.
If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is raised, indicating that the arrays have incompatible shapes. Along each dimension, the size of the resulting array is the input size that is not 1.
That's all a bit abstract, so let's look at some examples.
(Figure: illustration of broadcasting, before and after expansion.)
As in the example below, when a scalar is added to a matrix, the scalar is added to every element of the matrix:
a = np.array([1,2,3]).reshape(-1,1) #(3,1)
b = 5
print(f"(a + b).shape: {(a + b).shape}, \na + b = \n{a + b}")
Example 2:
a = np.array([1,2,3,4]).reshape(-1,1)
b = np.array([1,2,3]).reshape(1,-1)
print(a)
print(b)
print(f"(a + b).shape: {(a + b).shape}, \na + b = \n{a + b}")