C2_W1_Assignment (Andrew Ng) with PyTorch

Neural Networks for Handwritten Digit Recognition, Binary

In this exercise, you will use a neural network to recognize the hand-written digits zero and one.

Outline

1 - Packages

First, let’s run the cell below to import all the packages that you will need during this assignment.

  • numpy is the fundamental package for scientific computing with Python.
  • matplotlib is a popular library to plot graphs in Python.
  • tensorflow is a popular platform for machine learning.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from autils import *
%matplotlib inline

import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)

Tensorflow and Keras
Tensorflow is a machine learning package developed by Google. In 2019, Google integrated Keras into Tensorflow and released Tensorflow 2.0. Keras is a framework developed independently by François Chollet that creates a simple, layer-centric interface to Tensorflow. This course will be using the Keras interface.

2 - Neural Networks

In Course 1, you implemented logistic regression. This was extended to handle non-linear boundaries using polynomial regression. For even more complex scenarios such as image recognition, neural networks are preferred.

2.1 Problem Statement

In this exercise, you will use a neural network to recognize two handwritten digits, zero and one. This is a binary classification task. Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks. You will extend this network to recognize all 10 digits (0-9) in a future assignment.

This exercise will show you how the methods you have learned can be used for this classification task.

2.2 Dataset

You will start by loading the dataset for this task.

  • The load_data() function shown below loads the data into variables X and y.

  • The data set contains 1000 training examples of handwritten digits¹, here limited to zero and one.

    • Each training example is a 20-pixel x 20-pixel grayscale image of the digit.
      • Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.
      • The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector.
      • Each training example becomes a single row in our data matrix X.
      • This gives us a 1000 x 400 matrix X where every row is a training example of a handwritten digit image.

$$X = \left(\begin{array}{c} --- (x^{(1)}) --- \\ --- (x^{(2)}) --- \\ \vdots \\ --- (x^{(m)}) --- \end{array}\right)$$

  • The second part of the training set is a 1000 x 1 dimensional vector y that contains labels for the training set
    • y = 0 if the image is of the digit 0, y = 1 if the image is of the digit 1.

¹ This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)

# load dataset
X, y = load_data()

2.2.1 View the variables

Let’s get more familiar with your dataset.

  • A good place to start is to print out each variable and see what it contains.

The code below prints elements of the variables X and y.

print ('The first element of X is: ', X[0])
The first element of X is:  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  ...
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
print ('The first element of y is: ', y[0,0])
print ('The last element of y is: ', y[-1,0])
The first element of y is:  0
The last element of y is:  1

2.2.2 Check the dimensions of your variables

Another way to get familiar with your data is to view its dimensions. Please print the shape of X and y and see how many training examples you have in your dataset.

print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))
The shape of X is: (1000, 400)
The shape of y is: (1000, 1)

2.2.3 Visualizing the Data

You will begin by visualizing a subset of the training set.

  • In the cell below, the code randomly selects 64 rows from X, maps each row back to a 20 pixel by 20 pixel grayscale image and displays the images together.
  • The label for each image is displayed above the image.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(8,8))
fig.tight_layout(pad=0.1)

for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T
    
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')
    
    # Display the label above the image
    ax.set_title(y[random_index,0])
    ax.set_axis_off()

[Figure: 8x8 grid of randomly selected training digits, each titled with its label]

2.3 Model representation

The neural network you will use in this assignment is shown in the figure below.

  • This has three dense layers with sigmoid activations.
    • Recall that our inputs are pixel values of digit images.
    • Since the images are of size $20\times20$, this gives us $400$ inputs.

[Figure: network diagram with three dense layers of 25, 15 and 1 units]

  • The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $1$ output unit in layer 3.

    • Recall that the dimensions of these parameters are determined as follows:

      • If network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then
        • $W$ will be of dimension $s_{in} \times s_{out}$.
        • $b$ will be a vector with $s_{out}$ elements.
    • Therefore, the shapes of W and b are

      • layer1: The shape of W1 is (400, 25) and the shape of b1 is (25,)
      • layer2: The shape of W2 is (25, 15) and the shape of b2 is (15,)
      • layer3: The shape of W3 is (15, 1) and the shape of b3 is (1,)

Note: The bias vector b could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention.
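The distinction mainly matters when the bias is added to a batch of pre-activation values. A small NumPy sketch (not part of the lab code; the variable names are illustrative) contrasting the two shapes:

b_1d = np.zeros(3)        # shape (3,)  -- the 1-D convention used by Tensorflow
b_2d = np.zeros((3, 1))   # shape (3, 1) -- an equivalent 2-D column-vector form

Z = np.ones((4, 3))       # pretend pre-activation values for 4 examples, 3 units
print((Z + b_1d).shape)   # (4, 3): the (3,) bias broadcasts across the 4 example rows
print((Z + b_2d.T).shape) # (4, 3): the column vector must be transposed before adding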

2.4 Tensorflow Model Implementation

Tensorflow models are built layer by layer. A layer’s input dimensions ($s_{in}$ above) are calculated for you. You specify a layer’s output dimensions and this determines the next layer’s input dimension. The input dimension of the first layer is derived from the size of the input data specified in the model.fit statement below.

Note: It is also possible to add an input layer that specifies the input dimension of the first layer. For example:
tf.keras.Input(shape=(400,)), #specify input shape
We will include that here to illuminate some model sizing.

Exercise 1

Below, use the Keras Sequential model and Dense layer with a sigmoid activation to construct the network described above.

# UNQ_C1
# GRADED CELL: Sequential model

model = Sequential(
    [               
        tf.keras.Input(shape=(400,)),    #specify input size
        ### START CODE HERE ### 
        Dense(25, activation='sigmoid'),
        Dense(15, activation='sigmoid'),
        Dense(1,activation= 'sigmoid')
        ### END CODE HERE ### 
    ], name = "my_model" 
)                            

model.summary()
Model: "my_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 25)                10025     
_________________________________________________________________
dense_1 (Dense)              (None, 15)                390       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 16        
=================================================================
Total params: 10,431
Trainable params: 10,431
Non-trainable params: 0
_________________________________________________________________
Expected Output (Click to Expand)

The model.summary() function displays a useful summary of the model. Because we have specified an input layer size, the shape of the weight and bias arrays are determined and the total number of parameters per layer can be shown. Note, the names of the layers may vary as they are auto-generated.


Model: "my_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 25)                10025     
_________________________________________________________________
dense_1 (Dense)              (None, 15)                390       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 16        
=================================================================
Total params: 10,431
Trainable params: 10,431
Non-trainable params: 0
_________________________________________________________________
Click for hints
As described in the lecture:
model = Sequential(                      
    [                                   
        tf.keras.Input(shape=(400,)),    # specify input size (optional)
        Dense(25, activation='sigmoid'), 
        Dense(15, activation='sigmoid'), 
        Dense(1,  activation='sigmoid')  
    ], name = "my_model"                                    
)                                       
# UNIT TESTS
from public_tests import * 

test_c1(model)
All tests passed!

The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays as shown below.

L1_num_params = 400 * 25 + 25  # W1 parameters  + b1 parameters
L2_num_params = 25 * 15 + 15   # W2 parameters  + b2 parameters
L3_num_params = 15 * 1 + 1     # W3 parameters  + b3 parameters
print("L1 params = ", L1_num_params, ", L2 params = ", L2_num_params, ",  L3 params = ", L3_num_params )
L1 params =  10025 , L2 params =  390 ,  L3 params =  16

Let’s further examine the weights to verify that tensorflow produced the same dimensions as we calculated above.

[layer1, layer2, layer3] = model.layers
#### Examine Weights shapes
W1,b1 = layer1.get_weights()
W2,b2 = layer2.get_weights()
W3,b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 1), b3 shape = (1,)

Expected Output

W1 shape = (400, 25), b1 shape = (25,)  
W2 shape = (25, 15), b2 shape = (15,)  
W3 shape = (15, 1), b3 shape = (1,)

xx.get_weights returns a NumPy array. One can also access the weights directly in their tensor form. Note the shape of the tensors in the final layer.

print(model.layers[2].weights)
[<tf.Variable 'dense_2/kernel:0' shape=(15, 1) dtype=float32, numpy=
array([[ 0.02292085],
       [ 0.05047107],
       [-0.54231304],
       [ 0.27297366],
       [-0.20304939],
       [ 0.14754218],
       [-0.18506527],
       [ 0.3251729 ],
       [-0.00183654],
       [ 0.2951706 ],
       [-0.58827853],
       [-0.2511043 ],
       [ 0.48695832],
       [-0.3000403 ],
       [ 0.37903446]], dtype=float32)>, <tf.Variable 'dense_2/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

The following code will define a loss function and run gradient descent to fit the weights of the model to the training data. This will be explained in more detail in the following week.
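For reference, the binary cross-entropy loss being fit here is $-\frac{1}{m}\sum_{i}\left[y^{(i)}\log f(\mathbf{x}^{(i)}) + (1-y^{(i)})\log(1-f(\mathbf{x}^{(i)}))\right]$, where $f(\mathbf{x}^{(i)})$ is the model's sigmoid output. A minimal NumPy sketch of that formula (not part of the graded cells; the clipping constant is only a numerical safeguard against log(0)):

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # average binary cross-entropy over the examples
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Example: confident, correct predictions give a small loss
print(binary_cross_entropy(np.array([0., 1.]), np.array([0.02, 0.98])))  # ~0.0202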

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model.fit(
    X,y,
    epochs=20
)
Epoch 1/20
32/32 [==============================] - 2s 2ms/step - loss: 0.6361
Epoch 2/20
32/32 [==============================] - 0s 2ms/step - loss: 0.4901
Epoch 3/20
32/32 [==============================] - 0s 2ms/step - loss: 0.3357
Epoch 4/20
32/32 [==============================] - 0s 2ms/step - loss: 0.2240
Epoch 5/20
32/32 [==============================] - 0s 2ms/step - loss: 0.1581
Epoch 6/20
32/32 [==============================] - 0s 1ms/step - loss: 0.1194
Epoch 7/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0946
Epoch 8/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0773
Epoch 9/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0649
Epoch 10/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0559
Epoch 11/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0487
Epoch 12/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0430
Epoch 13/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0385
Epoch 14/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0348
Epoch 15/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0318
Epoch 16/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0292
Epoch 17/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0271
Epoch 18/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0252
Epoch 19/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0235
Epoch 20/20
32/32 [==============================] - 0s 1ms/step - loss: 0.0222





<keras.callbacks.History at 0x1b90431dca0>

To run the model on an example to make a prediction, use Keras predict. The input to predict is an array so the single example is reshaped to be two dimensional.

prediction = model.predict(X[0].reshape(1,400))  # a zero
print(f" predicting a zero: {prediction}")
prediction = model.predict(X[500].reshape(1,400))  # a one
print(f" predicting a one:  {prediction}")
 predicting a zero: [[0.0165375]]
 predicting a one:  [[0.98365146]]

The output of the model is interpreted as a probability. In the first example above, the input is a zero. The model predicts the probability that the input is a one is nearly zero.
In the second example, the input is a one. The model predicts the probability that the input is a one is nearly one.
As in the case of logistic regression, the probability is compared to a threshold to make a final prediction.

if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print(f"prediction after threshold: {yhat}")
prediction after threshold: 1

Let’s compare the predictions vs the labels for a random sample of 64 digits. This takes a moment to run.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(8,8))
fig.tight_layout(pad=0.1,rect=[0, 0.03, 1, 0.92]) #[left, bottom, right, top]

for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T
    
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')
    
    # Predict using the Neural Network
    prediction = model.predict(X[random_index].reshape(1,400))
    if prediction >= 0.5:
        yhat = 1
    else:
        yhat = 0
    
    # Display the label above the image 
    ax.set_title(f"{y[random_index,0]},{yhat}")
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=16)
plt.show()

[Figure: 8x8 grid of digits titled with label and model prediction (Label, yhat)]

2.5 NumPy Model Implementation (Forward Prop in NumPy)

As described in lecture, it is possible to build your own dense layer using NumPy. This can then be utilized to build a multi-layer neural network.

[Figure: diagram of the NumPy forward-prop model]

Exercise 2

Below, build a dense layer subroutine. The example in lecture utilized a for loop to visit each unit (j) in the layer, perform the dot product of the weights for that unit (W[:,j]) with the input, and add the bias for the unit (b[j]) to form z. An activation function g(z) is then applied to that result. This section will not utilize some of the matrix operations described in the optional lectures. These will be explored in a later section.

# UNQ_C2
# GRADED FUNCTION: my_dense

def my_dense(a_in, W, b, g):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, )) : Data, 1 example 
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, )) : bias vector, j units  
      g    activation function (e.g. sigmoid, relu..)
    Returns
      a_out (ndarray (j,))  : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
### START CODE HERE ### 
    for j in range(units):
        w = W[:,j]
        z = np.dot(w,a_in) + b[j]
        a_out[j] = g(z)    
### END CODE HERE ### 
    return(a_out)

# Quick Check
x_tst = 0.1*np.arange(1,3,1).reshape(2,)  # (1 example, 2 features)
W_tst = 0.1*np.arange(1,7,1).reshape(2,3) # (2 input features, 3 output features)
b_tst = 0.1*np.arange(1,4,1).reshape(3,)  # (3 features)
A_tst = my_dense(x_tst, W_tst, b_tst, sigmoid)
print(A_tst)
[0.54735762 0.57932425 0.61063923]

Expected Output

[0.54735762 0.57932425 0.61063923]
Click for hints
As described in the lecture:
def my_dense(a_in, W, b, g):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, )) : Data, 1 example 
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, )) : bias vector, j units  
      g    activation function (e.g. sigmoid, relu..)
    Returns
      a_out (ndarray (j,))  : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):             
        w =                            # Select weights for unit j. These are in column j of W
        z =                            # dot product of w and a_in + b
        a_out[j] =                     # apply activation to z
    return(a_out)
Click for more hints
def my_dense(a_in, W, b, g):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, )) : Data, 1 example 
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, )) : bias vector, j units  
      g    activation function (e.g. sigmoid, relu..)
    Returns
      a_out (ndarray (j,))  : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):             
        w = W[:,j]                     
        z = np.dot(w, a_in) + b[j]     
        a_out[j] = g(z)                
    return(a_out)
# UNIT TESTS
test_c2(my_dense)
All tests passed!

The following cell builds a three-layer neural network utilizing the my_dense subroutine above.

def my_sequential(x, W1, b1, W2, b2, W3, b3):
    a1 = my_dense(x,  W1, b1, sigmoid)
    a2 = my_dense(a1, W2, b2, sigmoid)
    a3 = my_dense(a2, W3, b3, sigmoid)
    return(a3)

We can copy trained weights and biases from Tensorflow.

W1_tmp,b1_tmp = layer1.get_weights()
W2_tmp,b2_tmp = layer2.get_weights()
W3_tmp,b3_tmp = layer3.get_weights()
# make predictions
prediction = my_sequential(X[0], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp )
if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print( "yhat = ", yhat, " label= ", y[0,0])
prediction = my_sequential(X[500], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp )
if prediction >= 0.5:
    yhat = 1
else:
    yhat = 0
print( "yhat = ", yhat, " label= ", y[500,0])
yhat =  0  label=  0
yhat =  1  label=  1

Run the following cell to see predictions from both the Numpy model and the Tensorflow model. This takes a moment to run.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(8,8))
fig.tight_layout(pad=0.1,rect=[0, 0.03, 1, 0.92]) #[left, bottom, right, top]

for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T
    
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the Neural Network implemented in Numpy
    my_prediction = my_sequential(X[random_index], W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp )
    my_yhat = int(my_prediction >= 0.5)

    # Predict using the Neural Network implemented in Tensorflow
    tf_prediction = model.predict(X[random_index].reshape(1,400))
    tf_yhat = int(tf_prediction >= 0.5)
    
    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{tf_yhat},{my_yhat}")
    ax.set_axis_off() 
fig.suptitle("Label, yhat Tensorflow, yhat Numpy", fontsize=16)
plt.show()

[Figure: 8x8 grid of digits titled with label, Tensorflow prediction and NumPy prediction]

2.6 Vectorized NumPy Model Implementation (Optional)

The optional lectures described vector and matrix operations that can be used to speed the calculations.
Below describes a layer operation that computes the output for all units in a layer on a given input example:

[Figure: vectorized dense-layer computation for a single example]

We can demonstrate this using the examples X and the W1,b1 parameters above. We use np.matmul to perform the matrix multiply. Note, the dimensions of x and W must be compatible as shown in the diagram above.

x = X[0].reshape(-1,1)         # column vector (400,1)
z1 = np.matmul(x.T,W1) + b1    # (1,400)(400,25) = (1,25)
a1 = sigmoid(z1)
print(a1.shape)
(1, 25)

You can take this a step further and compute all the units for all examples in one Matrix-Matrix operation.

[Figure: matrix-matrix dense-layer computation for all examples]

The full operation is $\mathbf{Z}=\mathbf{X}\mathbf{W}+\mathbf{b}$. This will utilize NumPy broadcasting to expand $\mathbf{b}$ to $m$ rows. If this is unfamiliar, a short tutorial is provided at the end of the notebook.

Exercise 3

Below, compose a new my_dense_v subroutine that performs the layer calculations for a matrix of examples. This will utilize np.matmul().

# UNQ_C3
# GRADED FUNCTION: my_dense_v

def my_dense_v(A_in, W, b, g):
    """
    Computes dense layer
    Args:
      A_in (ndarray (m,n)) : Data, m examples, n features each
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j,1)) : bias vector, j units  
      g    activation function (e.g. sigmoid, relu..)
    Returns
      A_out (ndarray (m,j)) : m examples, j units
    """
### START CODE HERE ### 
    z = np.matmul(A_in, W) + b
    A_out = g(z)
        
    
### END CODE HERE ### 
    return(A_out)
X_tst = 0.1*np.arange(1,9,1).reshape(4,2) # (4 examples, 2 features)
W_tst = 0.1*np.arange(1,7,1).reshape(2,3) # (2 input features, 3 output features)
b_tst = 0.1*np.arange(1,4,1).reshape(1,3) # (1, 3 features)
A_tst = my_dense_v(X_tst, W_tst, b_tst, sigmoid)
print(A_tst)
tf.Tensor(
[[0.54735762 0.57932425 0.61063923]
 [0.57199613 0.61301418 0.65248946]
 [0.5962827  0.64565631 0.6921095 ]
 [0.62010643 0.67699586 0.72908792]], shape=(4, 3), dtype=float64)

Expected Output

[[0.54735762 0.57932425 0.61063923]
 [0.57199613 0.61301418 0.65248946]
 [0.5962827  0.64565631 0.6921095 ]
 [0.62010643 0.67699586 0.72908792]]
Click for hints
In matrix form, this can be written in one or two lines.
   Z = np.matmul of A_in and W plus b    
   A_out is g(Z)  
Click for code
def my_dense_v(A_in, W, b, g):
    """
    Computes dense layer
    Args:
      A_in (ndarray (m,n)) : Data, m examples, n features each
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j,1)) : bias vector, j units  
      g    activation function (e.g. sigmoid, relu..)
    Returns
      A_out (ndarray (m,j)) : m examples, j units
    """
    Z = np.matmul(A_in,W) + b    
    A_out = g(Z)                 
    return(A_out)
# UNIT TESTS
test_c3(my_dense_v)
All tests passed!

The following cell builds a three-layer neural network utilizing the my_dense_v subroutine above.

def my_sequential_v(X, W1, b1, W2, b2, W3, b3):
    A1 = my_dense_v(X,  W1, b1, sigmoid)
    A2 = my_dense_v(A1, W2, b2, sigmoid)
    A3 = my_dense_v(A2, W3, b3, sigmoid)
    return(A3)

We can again copy trained weights and biases from Tensorflow.

W1_tmp,b1_tmp = layer1.get_weights()
W2_tmp,b2_tmp = layer2.get_weights()
W3_tmp,b3_tmp = layer3.get_weights()

Let’s make a prediction with the new model. This will make a prediction on all of the examples at once. Note the shape of the output.

Prediction = my_sequential_v(X, W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp )
Prediction.shape
TensorShape([1000, 1])

We’ll apply a threshold of 0.5 as before, but to all predictions at once.

Yhat = (Prediction >= 0.5).numpy().astype(int)
print("predict a zero: ",Yhat[0], "predict a one: ", Yhat[500])
predict a zero:  [0] predict a one:  [1]

Run the following cell to see predictions. This will use the predictions we just calculated above. This takes a moment to run.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
fig.tight_layout(pad=0.1, rect=[0, 0.03, 1, 0.92]) #[left, bottom, right, top]

for i, ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20, 20)).T
    
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')
   
    # Display the label above the image
    ax.set_title(f"{y[random_index,0]}, {Yhat[random_index, 0]}")
    ax.set_axis_off() 
fig.suptitle("Label, Yhat", fontsize=16)
plt.show()

[Figure: 8x8 grid of digits titled with label and vectorized-model prediction (Label, Yhat)]

You can see how one of the misclassified images looks.

fig = plt.figure(figsize=(1, 1))
errors = np.where(y != Yhat)
random_index = errors[0][0]
X_random_reshaped = X[random_index].reshape((20, 20)).T
plt.imshow(X_random_reshaped, cmap='gray')
plt.title(f"{y[random_index,0]}, {Yhat[random_index, 0]}")
plt.axis('off')
plt.show()

[Figure: one misclassified digit shown with its label and prediction]

2.7 Congratulations!

You have successfully built and utilized a neural network.

2.8 NumPy Broadcasting Tutorial (Optional)

In the last example, $\mathbf{Z}=\mathbf{X}\mathbf{W} + \mathbf{b}$ utilized NumPy broadcasting to expand the vector $\mathbf{b}$. If you are not familiar with NumPy broadcasting, this short tutorial is provided.

$\mathbf{X}\mathbf{W}$ is a matrix-matrix operation with dimensions $(m,j_1)(j_1,j_2)$ which results in a matrix with dimension $(m,j_2)$. To that, we add a vector $\mathbf{b}$ with dimension $(j_2,)$. $\mathbf{b}$ must be expanded to be a $(m,j_2)$ matrix for this element-wise operation to make sense. This expansion is accomplished for you by NumPy broadcasting.

Broadcasting applies to element-wise operations.
Its basic operation is to ‘stretch’ a smaller dimension by replicating elements to match a larger dimension.

More specifically:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

  • they are equal, or
  • one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
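Before the graphical examples, here is a quick programmatic check of the rule. This sketch is not part of the original tutorial and assumes np.broadcast_shapes is available (NumPy 1.20 and later):

# Compatible: trailing dimensions are 3 and 3; the missing dimension is stretched
print(np.broadcast_shapes((1000, 3), (3,)))   # (1000, 3)

# Compatible: each axis pairs a 1 with a larger size, so both operands are stretched
print(np.broadcast_shapes((4, 1), (1, 3)))    # (4, 3)

# Incompatible: trailing dimensions 3 and 4 are unequal and neither is 1
try:
    np.broadcast_shapes((1000, 3), (4,))
except ValueError as err:
    print("ValueError:", err)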

Here are some examples:

[Figure: examples of compatible and incompatible shapes for broadcasting]

The graphic below describes expanding dimensions. Note the red text below:

[Figure: NumPy notionally expanding the smaller operand before the element-wise operation]

The graphic above shows NumPy expanding the arguments to match before the final operation. Note that this is a notional description. The actual mechanics of NumPy operation choose the most efficient implementation.

For each of the following examples, try to guess the size of the result before running the example.

a = np.array([1,2,3]).reshape(-1,1)  #(3,1)
b = 5
print(f"(a + b).shape: {(a + b).shape}, \na + b = \n{a + b}")
(a + b).shape: (3, 1), 
a + b = 
[[6]
 [7]
 [8]]

Note that this applies to all element-wise operations:

a = np.array([1,2,3]).reshape(-1,1)  #(3,1)
b = 5
print(f"(a * b).shape: {(a * b).shape}, \na * b = \n{a * b}")
(a * b).shape: (3, 1), 
a * b = 
[[ 5]
 [10]
 [15]]

Row-Column Element-Wise Operations
[Figure: element-wise operation between a column vector and a row vector]

a = np.array([1,2,3,4]).reshape(-1,1)
b = np.array([1,2,3]).reshape(1,-1)
print(a)
print(b)
print(f"(a + b).shape: {(a + b).shape}, \na + b = \n{a + b}")
[[1]
 [2]
 [3]
 [4]]
[[1 2 3]]
(a + b).shape: (4, 3), 
a + b = 
[[2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]]

This is the scenario in the dense layer you built above: adding a 1-D vector $b$ to a (m,j) matrix.
[Figure: Matrix + 1-D Vector]
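A minimal sketch of that case, in the same style as the examples above:

a = np.zeros((4, 3))           # an (m,j) matrix, here m=4 examples and j=3 units
b = np.array([1., 2., 3.])     # a 1-D (j,) bias vector
print(f"(a + b).shape: {(a + b).shape}, \na + b = \n{a + b}")
# (a + b).shape: (4, 3) -- b is broadcast (replicated) across all 4 rows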

2.9 PyTorch Model Implementation

Exercise 1: Build the network model

The model you will build in this exercise is shown below:
[Figure: network diagram (400 inputs, layers of 25, 15 and 1 sigmoid units)]

import torch
import numpy as np
from torch import nn
from torch.utils.data import DataLoader, Dataset,TensorDataset
class HandWritingModel(nn.Module):
    def __init__(self):
        super(HandWritingModel,self).__init__()
        self.model = nn.Sequential(
            ### START CODE HERE ###

            ### END CODE HERE ###
        )

    def forward(self,x):
        x = self.model(x)
        return x
Hint
class HandWritingModel(nn.Module):
    def __init__(self):
        super(HandWritingModel,self).__init__()
        self.model = nn.Sequential(
            nn.Linear(400,25),
            nn.Sigmoid(),
            nn.Linear(25,15),
            nn.Sigmoid(),
            nn.Linear(15,1),
            nn.Sigmoid()
        )

    def forward(self,x):
        x = self.model(x)
        return x    

Run on the GPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Examine the weights and biases
hz_HWM = HandWritingModel()
hz_HWM.to(device)

w1, b1= hz_HWM.model[0].weight, hz_HWM.model[0].bias
w2, b2= hz_HWM.model[2].weight, hz_HWM.model[2].bias
w3, b3= hz_HWM.model[4].weight, hz_HWM.model[4].bias

print(f"w1:\n{w1.shape}\nb1:\n{b1.shape}\nw2:\n{w2.shape}\nb2:\n{b2.shape}\nw3:\n{w3.shape}\nb3:\n{b3.shape}")
w1:
torch.Size([25, 400])
b1:
torch.Size([25])
w2:
torch.Size([15, 25])
b2:
torch.Size([15])
w3:
torch.Size([1, 15])
b3:
torch.Size([1])

The following code loads the handwritten-digit (0 and 1) dataset.


def load_data():
    X = np.load("data/X.npy")
    y = np.load("data/y.npy")
    X = X[0:1000]
    y = y[0:1000]
    return X, y

# Build the training set
X, y = load_data()
X = torch.tensor(X,dtype=torch.float32)
y = torch.tensor(y,dtype=torch.float32)
data = TensorDataset(X, y)
data_loader = DataLoader(data, batch_size=10, shuffle=False) # shuffle: whether to shuffle the data

Explanation:

TensorDataset: packages the tensors together (the data must be Tensors), much like Python's zip function.
DataLoader: feeds the data to the model in batches, i.e. it splits all of the data in the Dataset into batches that are passed to the model in turn.

The example below shows what TensorDataset produces; a short DataLoader batching sketch follows it.

x = torch.Tensor([[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]])
y = torch.Tensor([1, 2, 0])
tensorDataset = TensorDataset(x, y)

print(tensorDataset)

<torch.utils.data.dataset.TensorDataset object at 0x00000201B8E979A0>
for i, j in tensorDataset:
    print(i, j)

tensor([ 1.,  4.,  7., 10.]) tensor(1.)
tensor([ 2.,  5.,  8., 11.]) tensor(2.)
tensor([ 3.,  6.,  9., 12.]) tensor(0.)

As the results show, this is exactly the format we want the data in.
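For completeness, here is a minimal sketch (not part of the original notebook) of how DataLoader then splits that same TensorDataset into batches; batch_size=2 is used purely for illustration:

loader = DataLoader(tensorDataset, batch_size=2, shuffle=False)
for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y)
# torch.Size([2, 4]) tensor([1., 2.])
# torch.Size([1, 4]) tensor([0.])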

Now let's train the model.

Note:

  • When using an optimizer in PyTorch, the three calls optimizer.zero_grad(), loss.backward() and optimizer.step() must be executed in that order during training: the forward pass produces the loss, back-propagation then computes the gradients, and finally the optimizer updates the parameters so that the model can converge.
  • optimizer.zero_grad() clears the previous gradients before the new ones are computed. loss.backward() performs back-propagation, and once the new gradients are available optimizer.step() updates the parameters. Without the clearing step the gradients keep accumulating and grow larger and larger, which can prevent the model from converging (this is also described as exploding gradients).

# Define the loss function and optimizer
loss_fn = nn.BCELoss()
loss_fn.to(device)

learning_rate = 1e-2
optimizer = torch.optim.Adam(hz_HWM.parameters(), lr=learning_rate)  

# Count the number of training steps
total_train_step = 0

# Number of training epochs
epoch = 20

for i in range(epoch):
    print("--------------------- Epoch {} starting -------------------------".format(i+1))
    hz_HWM.train()
    for data,target in data_loader:
        data = data.to(device)
        target = target.to(device)
        output = hz_HWM(data)
        loss = loss_fn(output, target)

        #优化器优化模型
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_train_step += 1
        if total_train_step % 100 == 0:
            print("Training step: {}, loss: {}".format(total_train_step,loss.item()))

    if i == 10:
        torch.save(hz_HWM, "model_{}.pth".format(i+1))
        torch.save(hz_HWM.state_dict(), "model_dict_{}.pth".format(i+1))
        print("Model saved")
--------------------- Epoch 1 starting -------------------------
Training step: 100, loss: 0.0012543974444270134
--------------------- Epoch 2 starting -------------------------
Training step: 200, loss: 0.0011438236106187105
--------------------- Epoch 3 starting -------------------------
Training step: 300, loss: 0.001060314942151308
--------------------- Epoch 4 starting -------------------------
Training step: 400, loss: 0.0009814611403271556
--------------------- Epoch 5 starting -------------------------
Training step: 500, loss: 0.0007359710871241987
--------------------- Epoch 6 starting -------------------------
Training step: 600, loss: 0.0005468075978569686
--------------------- Epoch 7 starting -------------------------
Training step: 700, loss: 0.000650236033834517
--------------------- Epoch 8 starting -------------------------
Training step: 800, loss: 0.00045749294804409146
--------------------- Epoch 9 starting -------------------------
Training step: 900, loss: 0.0003325061989016831
--------------------- Epoch 10 starting -------------------------
Training step: 1000, loss: 0.00025803802418522537
--------------------- Epoch 11 starting -------------------------
Training step: 1100, loss: 0.00020820880308747292
Model saved
--------------------- Epoch 12 starting -------------------------
Training step: 1200, loss: 0.0001726418995531276
--------------------- Epoch 13 starting -------------------------
Training step: 1300, loss: 0.00014589898637495935
--------------------- Epoch 14 starting -------------------------
Training step: 1400, loss: 0.000125117992865853
--------------------- Epoch 15 starting -------------------------
Training step: 1500, loss: 0.0001085101903299801
--------------------- Epoch 16 starting -------------------------
Training step: 1600, loss: 9.499047155259177e-05
--------------------- Epoch 17 starting -------------------------
Training step: 1700, loss: 8.38434134493582e-05
--------------------- Epoch 18 starting -------------------------
Training step: 1800, loss: 7.453242869814858e-05
--------------------- Epoch 19 starting -------------------------
Training step: 1900, loss: 6.669983122264966e-05
--------------------- Epoch 20 starting -------------------------
Training step: 2000, loss: 5.9999842051183805e-05

torch.save(model, 'path') saves the entire model, while
torch.save(model.state_dict(), 'path') saves only the model's parameters (the weights w and biases b).

model = HandWritingModel()
model.load_state_dict(torch.load("model_dict_11.pth",map_location=torch.device('cpu')))
model.eval()
HandWritingModel(
  (model): Sequential(
    (0): Linear(in_features=400, out_features=25, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=25, out_features=15, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=15, out_features=1, bias=True)
    (5): Sigmoid()
  )
)

When saving a model for inference, it is enough to save only the trained parameters: saving the state_dict with torch.save() makes the model easy to load, so this is the recommended way to save a model. model.eval() puts dropout and normalization layers into evaluation mode; otherwise each inference pass could produce different results.
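Since gradients are not needed at inference time, it is also common to wrap the forward pass in torch.no_grad(). A minimal sketch (not part of the original notebook), reusing the model and X defined above:

with torch.no_grad():                        # disable gradient tracking for inference
    prediction = model(X[0].reshape(1, -1))  # X[0] is an image of a zero
    yhat = int((prediction >= 0.5).item())   # threshold the sigmoid output at 0.5
print(f"yhat: {yhat}")                       # expected: 0 for this example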


model = torch.load("model_11.pth",map_location=torch.device('cpu'))

prediction = model(X[0].reshape(1, -1))
print(f"prediction a zero:{prediction}")
prediction = model(X[500].reshape(1, -1))
print(f"prediction a one:{prediction}")
prediction a zero:tensor([[0.0004]], grad_fn=<SigmoidBackward0>)
prediction a one:tensor([[0.9992]], grad_fn=<SigmoidBackward0>)
yhat = np.where(prediction.detach().numpy() > 0.5, 1, 0)  # detach before converting to NumPy
print(f"yhat:{yhat}")
yhat:[[1]]
