Step by Step
Convolution
- Complete zero_pad
```python
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)  # the second argument specifies how much padding to add on both sides of each dimension
```
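As a quick sanity check, here is a minimal sketch (with assumed input shapes) of what this padding does to a batch of images; only the height and width axes grow.

```python
import numpy as np

def zero_pad(X, pad):
    # Pad only the height and width axes; batch and channel axes are left untouched
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)

x = np.random.randn(4, 3, 3, 2)   # 4 examples, 3x3 spatial size, 2 channels (assumed shapes)
x_pad = zero_pad(x, 2)
print(x.shape, x_pad.shape)       # (4, 3, 3, 2) (4, 7, 7, 2)
```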
- Complete conv_single_step
```python
### START CODE HERE ### (≈ 2 lines of code)
# Element-wise product between a_slice and W. Add bias.
s = np.multiply(a_slice_prev, W)
# Sum over all entries of the volume s
Z = np.sum(s) + float(b)
### END CODE HERE ###
```
- Multiply the filter element-wise with the slice it covers, sum the products, then add a bias (a quick usage sketch follows)
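A minimal usage sketch of that single step; the slice and filter shapes below are arbitrary assumptions for illustration.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product with the filter, sum over the whole volume, then add the scalar bias
    return np.sum(np.multiply(a_slice_prev, W)) + float(b)

np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)   # one (f, f, n_C_prev) slice of the padded input
W = np.random.randn(4, 4, 3)              # one filter
b = np.random.randn(1, 1, 1)              # bias for that filter
print(conv_single_step(a_slice_prev, W, b))   # a single scalar Z[i, h, w, c]
```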
- Complete conv_forward
- Extract the slice to be convolved according to the filter size and stride
- Convolve each slice and assign the result to the output
```python
### START CODE HERE ###
# Retrieve dimensions from A_prev's shape (≈1 line)
(m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

# Retrieve dimensions from W's shape (≈1 line)
(f, f, n_C_prev, n_C) = W.shape

# Retrieve information from "hparameters" (≈2 lines)
stride = hparameters['stride']
pad = hparameters['pad']

# Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
# Size of the output matrix after convolution
n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
n_W = int((n_W_prev - f + 2 * pad) / stride + 1)

# Initialize the output volume Z with zeros. (≈1 line)
Z = np.zeros((m, n_H, n_W, n_C))

# Create A_prev_pad by padding A_prev
A_prev_pad = zero_pad(A_prev, pad)

for i in range(m):                      # loop over all training examples
    a_prev_pad = A_prev_pad[i]          # select the ith padded training example
    for h in range(n_H):                # loop over vertical axis of the output volume
        for w in range(n_W):            # loop over horizontal axis of the output volume
            for c in range(n_C):        # loop over channels (= #filters) of the output volume

                # Find the corners of the current "slice" (≈4 lines)
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Select the slice to be convolved
                a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                # Perform a single convolution step
                Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
### END CODE HERE ###
```
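A tiny check of the output-size formula used above, with assumed values for the hyperparameters:

```python
# n_H = floor((n_H_prev - f + 2*pad) / stride) + 1, and likewise for n_W
n_H_prev, n_W_prev = 5, 7      # assumed input height/width
f, pad, stride = 3, 1, 2       # assumed filter size, padding, stride
n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
print(n_H, n_W)                # 3 4
```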
Pooling
- Complete pool_forward
- Extract the slice to pool in the same way as in the convolution
- Then apply the pooling operation and assign the result
```python
### START CODE HERE ###
for i in range(m):                          # loop over the training examples
    for h in range(n_H):                    # loop on the vertical axis of the output volume
        for w in range(n_W):                # loop on the horizontal axis of the output volume
            for c in range(n_C):            # loop over the channels of the output volume

                # Find the corners of the current "slice" (≈4 lines)
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]

                # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                if mode == "max":
                    A[i, h, w, c] = np.max(a_prev_slice)
                elif mode == "average":
                    A[i, h, w, c] = np.average(a_prev_slice)
### END CODE HERE ###
```
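For a single window, the two modes reduce to np.max and np.average; a tiny illustration with an assumed 2x2 slice:

```python
import numpy as np

a_prev_slice = np.array([[1., 3.],
                         [2., 7.]])   # one assumed 2x2 window from a single channel
print(np.max(a_prev_slice))           # 7.0   -> "max" mode
print(np.average(a_prev_slice))       # 3.25  -> "average" mode
```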
Convolution Backpropagation
Backpropagation derivation
- First we compute the gradient of the loss with respect to the convolution layer's input. Let the layer's input be $A$ and its output be $Z$.
- $dA += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ_{hw}$
- Here $W_c$ is a filter, i.e. the matrix that is multiplied element-wise with the input slice.
- $dZ_{hw}$ denotes the gradient of the loss with respect to element $(h, w)$ of the convolution output.
- Note that the loss is a scalar and $Z_{hw}$ is a scalar, so the derivative of a scalar with respect to a scalar, $dZ_{hw}$, is also a scalar.
- For the data in the yellow region of the figure, $dA_{yellow} = W_c \times dZ_{00}$.
- As we traverse the convolution output matrix, each position contributes a gradient to $dA$; the total gradient is the sum of the contributions from all positions.
- Then compute $dW_c$
- $dW_c += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$
- It is easy to see that the gradient with respect to $W$ simply replaces the $W_c$ in the $dA$ formula with $a_{slice}$.
- This is intuitive: for the yellow region in the figure, $Z_{hw} = W_c \times a_{slice}$ (an element-wise product followed by a sum), so the derivative of $Z_{hw}$ with respect to $W_c$ is $a_{slice}$, and its derivative with respect to $a_{slice}$ is $W_c$.
- Finally, compute $db$
- $db = \sum_h \sum_w dZ_{hw}$
- The difference from the cases above is that the derivative of $Z_{hw}$ with respect to $b$ is simply 1.
- Complete conv_backward
- With the derivation above, this function is straightforward to fill in; a tiny numerical sketch of the three update rules follows the code.
```python
### START CODE HERE ###
# Retrieve information from "cache"
(A_prev, W, b, hparameters) = cache

# Retrieve dimensions from A_prev's shape
(m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

# Retrieve dimensions from W's shape
(f, f, n_C_prev, n_C) = W.shape

# Retrieve information from "hparameters"
stride = hparameters['stride']
pad = hparameters['pad']

# Retrieve dimensions from dZ's shape
(m, n_H, n_W, n_C) = dZ.shape

# Initialize dA_prev, dW, db with the correct shapes
dA_prev = np.zeros(A_prev.shape)
dW = np.zeros(W.shape)
db = np.zeros(b.shape)

# Pad A_prev and dA_prev
A_prev_pad = zero_pad(A_prev, pad)
dA_prev_pad = zero_pad(dA_prev, pad)

for i in range(m):                          # loop over the training examples

    # select ith training example from A_prev_pad and dA_prev_pad
    a_prev_pad = A_prev_pad[i]
    da_prev_pad = dA_prev_pad[i]

    for h in range(n_H):                    # loop over vertical axis of the output volume
        for w in range(n_W):                # loop over horizontal axis of the output volume
            for c in range(n_C):            # loop over the channels of the output volume

                # Find the corners of the current "slice"
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Use the corners to define the slice from a_prev_pad
                a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                # Update gradients for the window and the filter's parameters using the formulas given above
                da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                db[:, :, :, c] += dZ[i, h, w, c]

    # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
    dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad]
### END CODE HERE ###
```
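The tiny numerical sketch mentioned above: the three update rules for a single example, a single filter, one channel, stride 1 and no padding. All shapes and values here are assumptions for illustration, not the assignment's test case.

```python
import numpy as np

np.random.seed(0)
f, n_H, n_W = 2, 3, 3
A = np.random.randn(n_H + f - 1, n_W + f - 1)   # 4x4 single-channel input slice
W_c = np.random.randn(f, f)                     # one 2x2 filter
dZ = np.random.randn(n_H, n_W)                  # upstream gradient on the 3x3 output

dA, dW_c, db = np.zeros_like(A), np.zeros_like(W_c), 0.0
for h in range(n_H):
    for w in range(n_W):
        a_slice = A[h:h + f, w:w + f]
        dA[h:h + f, w:w + f] += W_c * dZ[h, w]   # dA rule: scatter W_c scaled by dZ_hw
        dW_c += a_slice * dZ[h, w]               # dW rule: accumulate a_slice scaled by dZ_hw
        db += dZ[h, w]                           # db rule: just sum dZ
print(dA.shape, dW_c.shape, np.isclose(db, dZ.sum()))   # (4, 4) (2, 2) True
```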
Pooling Backpropagation
Max pooling
As the figure shows, in max pooling only the maximum value contributes to the output, so the gradient of the pooling output is produced entirely by the maximum input. We can therefore apply a mask to the gradient so that only the position of the maximum value receives it.
For a single pooling window, the output $Z$ and the input $A$ are related by $Z = \max(A)$.
```python
def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask
```
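A quick usage sketch, assuming the create_mask_from_window defined above is in scope; the 2x3 window here is an arbitrary example.

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(2, 3)
mask = create_mask_from_window(x)   # True only at the position of the maximum of x
print(x)
print(mask)
```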
We apply the mask in the gradient computation:
```python
if mode == "max":
    a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]   # think of this as the blue region in the top-left of the figure
    mask = create_mask_from_window(a_prev_slice)                           # the mask for that region: 1 at the position of the 7, 0 everywhere else
    # The gradient of the pooling output is caused by the 7, so we multiply the upstream
    # gradient element-wise by the mask and let only the 7 receive it
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
```
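A tiny pure-numpy illustration of the idea: the upstream gradient (assumed to be 5.0 here) is routed only to the position that held the maximum.

```python
import numpy as np

a_prev_slice = np.array([[1., 3.],
                         [2., 7.]])
mask = (a_prev_slice == np.max(a_prev_slice))
print(mask * 5.0)   # only the entry that was the max (the 7) receives the gradient
```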
Average pooling
For the top-left region of the figure, let the pooling output be $Z$ and the input be $A$. Then $Z = \frac{1}{4}\mathrm{sum}(A) = \frac{1}{4}(a_1 + a_2 + a_3 + a_4)$, i.e. the four input positions influence $Z$ equally. For example, if the 2 in the top-left corner becomes 6, the output becomes 5; if instead the 3 becomes 7, the output is likewise 5. So a change in $Z$ is produced jointly by the elements of $A$, each contributing equally. By the same formula, $dZ = \frac{1}{4}(da_1 + da_2 + da_3 + da_4)$.
In other words, the gradient of the pooling output (i.e. its change) is shared evenly among the pooled input elements.
Here we write a function that distributes this influence evenly and returns a mask:
```python
def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.zeros(shape) + average
    ### END CODE HERE ###

    return a
```
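A quick check, assuming the distribute_value defined above is in scope:

```python
a = distribute_value(2, (2, 2))
print(a)   # [[0.5 0.5]
           #  [0.5 0.5]]
```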
We apply this mask in the computation:
```python
elif mode == "average":
    da = dA[i, h, w, c]   # think of this as the gradient of the 4 on the right of the figure
    shape = [f, f]        # size of the window over which the gradient is spread
    # The change in the output 4 is shared evenly by the four elements of the top-left window
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
```
Application
TensorFlow requires you to create placeholders for the input data that will be fed into the model when the session is run. We now implement the function that creates these placeholders. Because we train on mini-batches, the number of input examples is not fixed, so we use None for that dimension. The input X has shape [None, n_H0, n_W0, n_C0], and the corresponding Y has shape [None, n_y].
```python
def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    ### END CODE HERE ###

    return X, Y
```
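A quick sanity check of the placeholder shapes; this sketch assumes TensorFlow 1.x (as in the rest of this assignment) and the create_placeholders defined above.

```python
import tensorflow as tf

X, Y = create_placeholders(64, 64, 3, 6)
print("X = " + str(X))   # shape (?, 64, 64, 3)
print("Y = " + str(Y))   # shape (?, 6)
```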
Complete initialize_parameters
```python
# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)   # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters
```
Complete forward_propagation
- tf.nn.conv2d(X, W1, strides=[1, s, s, 1], padding='SAME'): given the input X and a set of filters W1, this function convolves W1 over X; the strides argument [1, s, s, 1] gives the step taken along each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev).
- tf.nn.max_pool(A, ksize=[1, f, f, 1], strides=[1, s, s, 1], padding='SAME'): given the input A, this function slides a window of size (f, f) with stride (s, s) over it and takes the maximum of each window.
- tf.nn.relu(Z1): computes the element-wise ReLU activation of Z1.
- tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1-D vector and returns a tensor of shape (batch_size, k).
- tf.contrib.layers.fully_connected(F, num_outputs): given a flattened input F, this returns the output of a fully connected layer. The fully connected layer initializes its own weights and trains them together with the rest of the model, so we do not need to initialize them ourselves.
```python
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    ### END CODE HERE ###

    return Z3
```
Complete compute_cost
tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y) computes the softmax cross-entropy loss; it applies the softmax activation and computes the loss in a single call.
tf.reduce_mean() takes the mean of the losses over all the examples.
```python
# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y))
    ### END CODE HERE ###

    return cost
```
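A hedged sketch of checking the graph end to end on random inputs in a TF 1.x session, assuming the functions defined above are in scope. The random Y below is only a shape check, not real one-hot labels.

```python
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    sess.run(tf.global_variables_initializer())
    a = sess.run(cost, {X: np.random.randn(4, 64, 64, 3), Y: np.random.randn(4, 6)})
    print("cost = " + str(a))
```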
Build and train the full model
- Create placeholders for X and Y
- Initialize the parameters
- Forward propagation
- Compute the cost
- Create an optimizer
- Train on mini-batches
```python
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009,
          num_epochs=100, minibatch_size=64, print_cost=True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()       # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)           # to keep results consistent (tensorflow seed)
    seed = 3                        # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []                      # To keep track of the cost

    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost, the feed_dict should contain a minibatch for (X, Y).
                ### START CODE HERE ### (1 line)
                _, temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                minibatch_cost += temp_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True and epoch % 5 == 0:
                print("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return train_accuracy, test_accuracy, parameters
```
Tensor("Mean_1:0", shape=(), dtype=float32)
Train Accuracy: 0.86851853
Test Accuracy: 0.73333335
Finally, please give this post a like!
Residual Networks
Deep networks can fit very complex functions, but they are prone to the vanishing gradient problem.
Residual Networks add 'shortcut' connections so that gradients can propagate directly to earlier layers.
The Residual Network in this assignment is built mainly from two kinds of blocks:
- identity block
- The standard block in ResNets, used when the input and output dimensions are the same
- No convolution is applied on the 'shortcut' path
- Here we complete the identity_block function according to the design in the assignment
```python
# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 3

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
```
- convolutional block
- Used when the input and output dimensions differ
- The only difference from the identity block above is that a convolution is also applied on the 'shortcut' path
- Here we complete the convolutional_block function according to the design in the assignment
```python
# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s=2):
    """
    Implementation of the convolutional block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path
    X = Conv2D(F1, (1, 1), strides=(s, s), name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding='valid', name=conv_name_base + '1', kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
```
- Finally, we use the two ResNet building blocks constructed above to build and train the whole network
- Complete the ResNet50 function
```python
# GRADED FUNCTION: ResNet50

def ResNet50(input_shape=(64, 64, 3), classes=6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    ### START CODE HERE ###

    # Stage 3 (≈4 lines)
    X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='b')
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='c')
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='d')

    # Stage 4 (≈6 lines)
    X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='b')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='c')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='d')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='e')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='f')

    # Stage 5 (≈3 lines)
    X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b')
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c')

    # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
    X = AveragePooling2D((2, 2), name='avg_pool')(X)

    ### END CODE HERE ###

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer=glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs=X_input, outputs=X, name='ResNet50')

    return model
```
- Train on the SIGNS dataset and obtain the final results (a sketch of the compile/fit/evaluate calls follows below):
- 120/120 [==============================] - 3s 24ms/step
- Loss = 0.5301783005396525
- Test Accuracy = 0.8666666626930237
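The results above come from compiling and fitting the model in Keras; a minimal sketch of those calls is shown here. The epoch count and batch size are assumptions, and X_train/Y_train/X_test/Y_test are the SIGNS splits loaded as in the assignment.

```python
model = ResNet50(input_shape=(64, 64, 3), classes=6)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, batch_size=32)   # assumed training settings

preds = model.evaluate(X_test, Y_test)                 # evaluate on the 120 test examples
print("Loss = " + str(preds[0]))
print("Test Accuracy = " + str(preds[1]))
```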