Andrew Ng Deep Learning Assignment 04

Step by Step

Convolution

  • Complete zero_pad

    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), 'constant', constant_values=0)  # the second argument specifies how much padding to add on both sides of each dimension
    
  • Complete conv_single_step

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Add bias.
    s = np.multiply(a_slice_prev , W)
    # Sum over all entries of the volume s
    Z = np.sum(s) + float(b)
    ### END CODE HERE ###
    
    • Multiply the filter element-wise with the slice being convolved, sum the products, then add a bias
  • Complete conv_forward

    • Extract the slice to be convolved according to the filter size
    • Convolve the slice and assign the result (a quick shape check follows the code below)
     ### START CODE HERE ###
        # Retrieve dimensions from A_prev's shape (≈1 line)  
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        
        # Retrieve dimensions from W's shape (≈1 line)
        (f, f, n_C_prev, n_C) = W.shape
        
        # Retrieve information from "hparameters" (≈2 lines)
        stride = hparameters['stride']
        pad = hparameters['pad']
        
        # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
        # Compute the size of the output volume after convolution
        n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
        n_W = int((n_W_prev - f + 2 * pad) / stride + 1)
        
        # Initialize the output volume Z with zeros. (≈1 line)
        Z = np.zeros((m , n_H , n_W , n_C))
        
        # Create A_prev_pad by padding A_prev
        A_prev_pad = zero_pad(A_prev , pad)
        
        for i in range(m):                             # loop over the training examples
            a_prev_pad = A_prev_pad[i]                 # select the ith padded example, shape (n_H_prev + 2*pad, n_W_prev + 2*pad, n_C_prev)
            for h in range(n_H):                           # loop over vertical axis of the output volume
                for w in range(n_W):                       # loop over horizontal axis of the output volume
                    for c in range(n_C):                   # loop over channels (= #filters) of the output volume
                        
                        # Find the corners of the current "slice" (≈4 lines)
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        
                        # Select the slice to be convolved
                        a_slice_prev = a_prev_pad[vert_start:vert_end , horiz_start:horiz_end , :]
                        
                        # Apply one convolution step
                        Z[i, h, w, c] = conv_single_step(a_slice_prev , W[:,:,:,c] , b[:,:,:,c])
                                            
        ### END CODE HERE ###
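
A quick shape check for the functions above (a sketch only; the shapes and hyperparameters below are illustrative, and it assumes conv_forward returns (Z, cache) as in the notebook skeleton):

import numpy as np

np.random.seed(1)

# Illustrative inputs: 10 examples of size 4x4x3, convolved with 8 filters of size 2x2x3
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z.shape =", Z.shape)   # (10, 4, 4, 8): n_H = n_W = int((4 - 2 + 2*2)/2 + 1) = 4
print("Z mean  =", np.mean(Z))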
        
    

Pooling


  • Complete pool_forward

    • The slice is extracted the same way as in convolution: first select the window to be pooled
    • Then apply the pooling operation and assign the result (a quick check follows the code below)
    ### START CODE HERE ###
        for i in range(m):                         # loop over the training examples
            for h in range(n_H):                     # loop on the vertical axis of the output volume
                for w in range(n_W):                 # loop on the horizontal axis of the output volume
                    for c in range (n_C):            # loop over the channels of the output volume
                        
                        # Find the corners of the current "slice" (≈4 lines)
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        
                        # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                        a_prev_slice = A_prev[i , vert_start:vert_end , horiz_start:horiz_end , c]
                        
                        # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                        if mode == "max":
                            A[i, h, w, c] = np.max(a_prev_slice)
                        elif mode == "average":
                            A[i, h, w, c] = np.average(a_prev_slice)
        
        ### END CODE HERE ###
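
A quick check of pool_forward (again a sketch with illustrative shapes; it assumes pool_forward returns (A, cache) as in the notebook skeleton):

import numpy as np

np.random.seed(1)

A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride": 2, "f": 3}

A, cache = pool_forward(A_prev, hparameters, mode="max")
print("max pooling,     A.shape =", A.shape)   # (2, 1, 1, 3): n_H = int(1 + (4 - 3)/2) = 1
A, cache = pool_forward(A_prev, hparameters, mode="average")
print("average pooling, A.shape =", A.shape)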
    

Convolution Backprop

Backprop derivation
  • First compute the gradient of the loss with respect to the convolution layer's input. Let the layer's input be $A$ and its output be $Z$.

    • $dA += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ_{hw}$
      [Figure: a filter-sized window (the yellow region) of the input and the corresponding entry of the output volume]

      • Here $W_c$ is one filter, i.e. the matrix that is multiplied element-wise with the input window
      • $dZ_{hw}$ is the gradient of the loss with respect to entry $(h, w)$ of the output volume
        • Note that the loss is a scalar and $Z_{hw}$ is a scalar, so the derivative of a scalar with respect to a scalar, $dZ_{hw}$, is also a scalar
      • For the yellow region, $dA_{yellow} = W_c \times dZ_{00}$
      • Traversing the output volume on the right, every position contributes a term to $dA$; the total gradient is the sum of all these contributions
  • Next compute $dW_c$

    • $dW_c += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$
      • Notice that the gradient with respect to $W$ simply replaces the $W_c$ in the gradient with respect to $A$ by $a_{slice}$
      • This is easy to see: for the yellow region in the figure above, $Z_{hw} = W_c \times a_{slice}$ (an element-wise product), so the derivative of $Z_{hw}$ with respect to $W_c$ is $a_{slice}$, and its derivative with respect to $a_{slice}$ is $W_c$
  • Finally compute $db$

    • $db = \sum_h \sum_w dZ_{hw}$
    • The only difference from the above is that the derivative of $Z_{hw}$ with respect to $b$ is 1
  • Complete conv_backward

    • Given the derivation above, this function is straightforward to complete (a quick sanity check follows the code below)
        ### START CODE HERE ###
        # Retrieve information from "cache"
        (A_prev, W, b, hparameters) = cache
        
        # Retrieve dimensions from A_prev's shape
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        
        # Retrieve dimensions from W's shape
        (f, f, n_C_prev, n_C) = W.shape
        
        # Retrieve information from "hparameters"
        stride = hparameters['stride']
        pad = hparameters['pad']
        
        # Retrieve dimensions from dZ's shape
        (m, n_H, n_W, n_C) = dZ.shape
        
        # Initialize dA_prev, dW, db with the correct shapes
        dA_prev = np.zeros(A_prev.shape)                           
        dW = np.zeros(W.shape)
        db = np.zeros(b.shape)
    
        # Pad A_prev and dA_prev
        A_prev_pad = zero_pad(A_prev , pad)
        dA_prev_pad = zero_pad(dA_prev , pad)
        
        for i in range(m):                       # loop over the training examples
            
            # select ith training example from A_prev_pad and dA_prev_pad
            a_prev_pad = A_prev_pad[i]
            da_prev_pad = dA_prev_pad[i]
            
            for h in range(n_H):                   # loop over vertical axis of the output volume
                for w in range(n_W):               # loop over horizontal axis of the output volume
                    for c in range(n_C):           # loop over the channels of the output volume
                        
                        # Find the corners of the current "slice"
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        
                        # Use the corners to define the slice from a_prev_pad
                        a_slice = a_prev_pad[vert_start:vert_end , horiz_start:horiz_end , :]
    
                        # Update gradients for the window and the filter's parameters using the code formulas given above
                        da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i , h , w , c]
                        dW[:,:,:,c] += a_slice * dZ[i , h , w , c]
                        db[:,:,:,c] += dZ[i , h , w, c]
                        
            # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
            dA_prev[i, :, :, :] = da_prev_pad[pad:-pad , pad:-pad , :]
        ### END CODE HERE ###
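
A quick sanity check (illustrative values; it reuses the conv_forward output as dZ purely to exercise the code, and assumes conv_backward returns (dA_prev, dW, db) as in the notebook skeleton):

import numpy as np

np.random.seed(1)

A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
dA, dW, db = conv_backward(Z, cache_conv)   # using dZ = Z just for a shape/sanity check
print("dA mean =", np.mean(dA))
print("dW mean =", np.mean(dW))
print("db mean =", np.mean(db))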
    

Pooling Backprop

Max pooling

[Figure: max-pooling example; only the maximum entry of the window (here 7) determines the output]

As the figure above shows, in max pooling only the maximum value plays a role, so the gradient of the pooling output is caused entirely by the maximum entry of the pooling input. We can therefore apply a mask to the gradient flowing back to the pooling input, so that only the maximum entry receives gradient.

For a single pooling window, the relationship between the pooling output $Z$ and the pooling input $A$ is $Z = \max(A)$.

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.
    
    Arguments:
    x -- Array of shape (f, f)
    
    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    
    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###
    
    return mask
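
A quick check of the mask (illustrative values):

import numpy as np

np.random.seed(1)

x = np.random.randn(2, 3)
mask = create_mask_from_window(x)
print("x    =", x)
print("mask =", mask)   # True only at the position of the maximum of x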

We now use the mask in the gradient computation:
[Figure: propagating the gradient through a max-pooling window with the mask applied]

if mode == "max":
    a_prev_slice = a_prev[vert_start:vert_end,horiz_start:horiz_end,c]#看作上图左上角蓝色区域
    
    mask = create_mask_from_window(a_prev_slice)#计算该区域对应的mask,即除了 7 的位置为 1 外,其余位置都是 0
   #池化结果的梯度由 7 造成,因此我们将原梯度点乘 mask,只让 7 发挥作用
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i,h,w,c]
Average pooling

[Figure: average-pooling example; every entry of the window contributes equally to the output]

For the upper-left window in the figure above, let the pooling output be $Z$ and the input be $A$. Then $Z = \frac{1}{4}\mathrm{sum}(A) = \frac{1}{4}(a_1+a_2+a_3+a_4)$. In other words, a change in any of the four positions of $A$ affects $Z$ to exactly the same degree. For example, if the 2 in the upper-left corner becomes 6, the output becomes 5; if instead the 3 becomes 7, the output is likewise 5. So a change in $Z$ is influenced jointly, and equally, by the elements of $A$. By the same formula, $dZ = \frac{1}{4}(da_1+da_2+da_3+da_4)$.

That is, the gradient of the pooling output (its rate of change) is shared equally among the pooling input elements.

Here we write a value-distributing function that returns this "mask":

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape
    
    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
    
    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """
    
    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    
    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)
    
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.zeros(shape) + average
    ### END CODE HERE ###
    
    return a
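
A quick check of distribute_value: distributing a gradient of 2 over a 2x2 window should give 0.5 everywhere.

a = distribute_value(2, (2, 2))
print(a)   # [[0.5, 0.5], [0.5, 0.5]]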

We now use this distribution in the gradient computation:

[Figure: distributing the gradient of an average-pooling output over its window]

elif mode == "average":
    da = dA[i, h, w, c]   # the gradient of the pooling output (the 4 on the right of the figure above)
    shape = [f, f]        # the size of the pooling window
    # The change in the output 4 is shared equally by the four upper-left input elements
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
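
For reference, here is a minimal sketch of how the two branches above fit together into the full pool_backward function. It assumes the cache stored by pool_forward is (A_prev, hparameters), as in the notebook skeleton.

import numpy as np

def pool_backward(dA, cache, mode="max"):
    """
    Sketch of the backward pass of the pooling layer, combining the max and
    average branches shown above. Assumes cache = (A_prev, hparameters).
    """
    (A_prev, hparameters) = cache
    stride = hparameters["stride"]
    f = hparameters["f"]

    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    dA_prev = np.zeros(A_prev.shape)

    for i in range(m):                              # loop over the training examples
        a_prev = A_prev[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Corners of the current window
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    if mode == "max":
                        # Only the max entry of the window receives the gradient
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        mask = create_mask_from_window(a_prev_slice)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]
                    elif mode == "average":
                        # The gradient is shared equally across the window
                        da = dA[i, h, w, c]
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, [f, f])

    assert dA_prev.shape == A_prev.shape
    return dA_prev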

Application

TensorFlow requires you to create placeholders for the input data that will be fed into the model when the session is run. We now implement the function that creates these placeholders. Because we train on mini-batches, the number of input examples may vary, so we use None for the batch dimension. The input X has shape [None, n_H0, n_W0, n_C0] and the corresponding Y has shape [None, n_y].

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes
        
    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    ### END CODE HERE ###
    
    return X, Y
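
A quick check of the placeholders (this is the TensorFlow 1.x API used throughout the assignment):

import tensorflow as tf

X, Y = create_placeholders(64, 64, 3, 6)
print("X =", X)   # Tensor of shape (?, 64, 64, 3)
print("Y =", Y)   # Tensor of shape (?, 6)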

Complete initialize_parameters

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    
    tf.set_random_seed(1)                              # so that your "random" numbers match ours
        
    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4,4,3,8], initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    W2 = tf.get_variable("W2", [2,2,8,16], initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}
    
    return parameters

Complete forward_propagation

  • tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME')  # given an input X and a set of filters W1, this function convolves X with W1; the strides argument [1,s,s,1] gives the stride along each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev)

  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME')  # given an input A, this function slides a window of size (f,f) with stride (s,s) over it and takes the maximum of each window

  • tf.nn.relu(Z1)  # computes the ReLU activation of Z1

  • tf.contrib.layers.flatten(P)  # given an input P, this function flattens each example into a 1-D vector and returns a tensor of shape (batch_size, k)

  • tf.contrib.layers.fully_connected(F, num_outputs)  # given a flattened input F, this function returns the output of a fully connected layer. The fully connected layer initializes its own weights and trains them along with the model, so we do not need to initialize them ourselves in initialize_parameters
    
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    W2 = parameters['W2']
    
    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = "SAME")
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None" 
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn = None)
    ### END CODE HERE ###

    return Z3
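
A quick way to exercise the forward pass (a sketch; TF 1.x, with random inputs only there to run the graph once):

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    sess.run(tf.global_variables_initializer())
    a = sess.run(Z3, {X: np.random.randn(2, 64, 64, 3), Y: np.random.randn(2, 6)})
    print("Z3 =", a)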

Complete compute_cost

tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y)  # computes the softmax cross-entropy loss; it applies the softmax activation and computes the loss in one call
tf.reduce_mean()  # computes the mean of the losses over all examples
# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
    ### END CODE HERE ###
    
    return cost

Build and train the full model

  • Create placeholders for X and Y
  • Initialize the parameters
  • Forward propagation
  • Compute the cost
  • Create an optimizer
  • Train with mini-batches
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    
    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training set labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test set labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 5 epochs
    
    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep results consistent (tensorflow seed)
    seed = 3                                          # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []                                        # To keep track of the cost
    
    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    ### END CODE HERE ###
    
    # Initialize all the variables globally
    init = tf.global_variables_initializer()
     
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost, the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                
                minibatch_cost += temp_cost / num_minibatches
                
            # Print the cost every 5 epochs
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)
        
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
        
        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)
                
        return train_accuracy, test_accuracy, parameters
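
The model is then invoked roughly as follows (a sketch; it assumes X_train, Y_train, X_test, Y_test have been loaded and preprocessed as in the notebook, i.e. images scaled to [0, 1] and labels one-hot encoded), producing the output below:

_, _, parameters = model(X_train, Y_train, X_test, Y_test, num_epochs=100)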

[Figure: training cost vs. epoch]

Tensor("Mean_1:0", shape=(), dtype=float32)

Train Accuracy: 0.86851853

Test Accuracy: 0.73333335

Finally, please give this post a like.


Residual Networks

Very deep networks can fit extremely complex functions, but they are prone to the vanishing-gradient problem.

Residual Networks add 'shortcut' connections so that gradients can propagate directly to earlier layers.

The Residual Network in this assignment is built from two kinds of blocks:

  • identity block

    • the standard block in ResNets, used when the input and the output have the same dimensions
    • no convolution is applied on the 'shortcut' path

[Figure: identity block architecture]

  • Following the design in the assignment, we complete the identity_block function

    # GRADED FUNCTION: identity_block

    def identity_block(X, f, filters, stage, block):
        """
        Implementation of the identity block as defined in Figure 3
        
        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network
        
        Returns:
        X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
        """
        
        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        
        # Retrieve Filters
        F1, F2, F3 = filters
        
        # Save the input value. You'll need this later to add back to the main path. 
        X_shortcut = X
        
        # First component of main path
        X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)
        
        ### START CODE HERE ###
        
        # Second component of main path (≈3 lines)
        X = Conv2D(filters=F2, kernel_size=(f,f), strides=(1,1), padding='SAME', name=conv_name_base+'2b', kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
        X = Activation('relu')(X)

        # Third component of main path (≈2 lines)
        X = Conv2D(filters=F3, kernel_size=(1,1), strides=(1,1), padding='VALID', name=conv_name_base+'2c', kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)
        
        ### END CODE HERE ###
        
        return X
    
  • convolutional block

    • used when the input and the output dimensions differ

    • the only difference from the identity block above is that a convolution is also applied on the 'shortcut' path

[Figure: convolutional block architecture]

  • Following the design in the assignment, we complete the convolutional_block function

    # GRADED FUNCTION: convolutional_block

    def convolutional_block(X, f, filters, stage, block, s = 2):
        """
        Implementation of the convolutional block as defined in Figure 4
        
        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network
        s -- Integer, specifying the stride to be used
        
        Returns:
        X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
        """
        
        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        
        # Retrieve Filters
        F1, F2, F3 = filters
        
        # Save the input value
        X_shortcut = X

        ##### MAIN PATH #####
        # First component of main path 
        X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)
        
        ### START CODE HERE ###

        # Second component of main path (≈3 lines)
        X = Conv2D(F2, (f,f), strides=(1,1), padding='SAME', name=conv_name_base+'2b', kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base+'2b')(X)
        X = Activation('relu')(X)

        # Third component of main path (≈2 lines)
        X = Conv2D(F3, (1,1), strides=(1,1), padding='VALID', name=conv_name_base+'2c', kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base+'2c')(X)

        ##### SHORTCUT PATH #### (≈2 lines)
        X_shortcut = Conv2D(F3, (1,1), strides=(s,s), padding='VALID', name=conv_name_base+'1', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
        X_shortcut = BatchNormalization(axis=3, name = bn_name_base+'1')(X_shortcut)

        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)
        
        ### END CODE HERE ###
        
        return X
    
  • Finally, we use the two ResNet building blocks constructed above to build and train the full network

    • Complete the ResNet50 function

      # GRADED FUNCTION: ResNet50

      def ResNet50(input_shape = (64, 64, 3), classes = 6):
          """
          Implementation of the popular ResNet50 with the following architecture:
          CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
          -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

          Arguments:
          input_shape -- shape of the images of the dataset
          classes -- integer, number of classes

          Returns:
          model -- a Model() instance in Keras
          """
          
          # Define the input as a tensor with shape input_shape
          X_input = Input(input_shape)
          
          # Zero-Padding
          X = ZeroPadding2D((3, 3))(X_input)
          
          # Stage 1
          X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
          X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
          X = Activation('relu')(X)
          X = MaxPooling2D((3, 3), strides=(2, 2))(X)

          # Stage 2
          X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
          X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
          X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

          ### START CODE HERE ###

          # Stage 3 (≈4 lines)
          X = convolutional_block(X, f=3, filters=[128,128,512], stage=3, block='a', s=2)
          X = identity_block(X, 3, [128,128,512], stage=3, block='b')
          X = identity_block(X, 3, [128,128,512], stage=3, block='c')
          X = identity_block(X, 3, [128,128,512], stage=3, block='d')

          # Stage 4 (≈6 lines)
          X = convolutional_block(X, f=3, filters=[256,256,1024], stage=4, block='a', s=2)
          X = identity_block(X, 3, [256,256,1024], stage=4, block='b')
          X = identity_block(X, 3, [256,256,1024], stage=4, block='c')
          X = identity_block(X, 3, [256,256,1024], stage=4, block='d')
          X = identity_block(X, 3, [256,256,1024], stage=4, block='e')
          X = identity_block(X, 3, [256,256,1024], stage=4, block='f')

          # Stage 5 (≈3 lines)
          X = convolutional_block(X, f=3, filters=[512,512,2048], stage=5, block='a', s=2)
          X = identity_block(X, 3, [512,512,2048], stage=5, block='b')
          X = identity_block(X, 3, [512,512,2048], stage=5, block='c')

          # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
          X = AveragePooling2D((2,2), name='avg_pool')(X)
          
          ### END CODE HERE ###

          # output layer
          X = Flatten()(X)
          X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)
          
          # Create model
          model = Model(inputs = X_input, outputs = X, name='ResNet50')

          return model
      
    • Train on the SIGNS dataset and obtain the final results (a compile/train sketch follows the results below)

      • 120/120 [==============================] - 3s 24ms/step
      • Loss = 0.5301783005396525
      • Test Accuracy = 0.8666666626930237
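
A minimal sketch of how the network is built, compiled, trained, and evaluated on SIGNS (assuming X_train, Y_train, X_test, Y_test are the normalized images and one-hot labels prepared in the notebook):

# Build, compile, train, and evaluate the ResNet50 model (illustrative sketch;
# data preparation follows the notebook: images scaled to [0, 1], labels one-hot encoded).
model = ResNet50(input_shape=(64, 64, 3), classes=6)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, batch_size=32)

preds = model.evaluate(X_test, Y_test)
print("Loss = " + str(preds[0]))
print("Test Accuracy = " + str(preds[1]))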