03 - (Course 1) Week 4 Assignment: Deep Neural Networks

1. Building your Deep Neural Network: Step by Step


1.1 Assignment Outline

To build your neural network, you will implement several "helper functions". These functions will be used in the next assignment to build a two-layer and an L-layer neural network. Below is the outline of this assignment; you will:

  • Initialize the parameters for a two-layer network and for an L-layer neural network.
  • Implement the forward propagation module (shown in purple in the figure below).
    • Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).
    • The ACTIVATION function is provided for you (relu/sigmoid).
    • Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
    • Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer L). This gives you a new L_model_forward function.
  • Compute the loss.
  • Implement the backward propagation module (shown in red in the figure below).
    • Complete the LINEAR part of a layer's backward propagation step.
    • The gradient of the ACTIVATION function is provided for you (relu_backward/sigmoid_backward).
    • Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
    • Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward once. This gives you a new L_model_backward function.
  • Finally, update the parameters.
  [Figure 1]
1.2 Initialization

      You will write two helper functions to initialize the parameters for your model. The first function is used to initialize the parameters of a two-layer model. The second one generalizes this initialization to L layers.

      1.2.1 2-layer Neural Network

      Instructions:

      • The model’s structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
      • Use random initialization for the weight matrices. Use np.random.randn(shape)*0.01 with the correct shape.
      • Use zero initialization for the biases. Use np.zeros(shape).
      def initialize_parameters(n_x, n_h, n_y):
          """
          Argument:
          n_x -- size of the input layer
          n_h -- size of the hidden layer
          n_y -- size of the output layer
      
          Returns:
          parameters -- python dictionary containing your parameters:
                          W1 -- weight matrix of shape (n_h, n_x)
                          b1 -- bias vector of shape (n_h, 1)
                          W2 -- weight matrix of shape (n_y, n_h)
                          b2 -- bias vector of shape (n_y, 1)
          """
      
          np.random.seed(1)
      
          ### START CODE HERE ### (≈ 4 lines of code)
          W1 = np.random.randn(n_h, n_x) * 0.01
          b1 = np.zeros((n_h, 1))
          W2 = np.random.randn(n_y, n_h) * 0.01
          b2 = np.zeros((n_y, 1))
          ### END CODE HERE ###
      
          assert(W1.shape == (n_h, n_x))      # sanity-check the shapes; an AssertionError is raised if a shape is wrong
          assert(b1.shape == (n_h, 1))
          assert(W2.shape == (n_y, n_h))
          assert(b2.shape == (n_y, 1))
      
          parameters = {"W1": W1,
                        "b1": b1,
                        "W2": W2,
                        "b2": b2}
      
          return parameters    
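
      As a quick sanity check (the layer sizes below are only illustrative, not the assignment's test case), the returned shapes follow the documented pattern:

          import numpy as np  # required by initialize_parameters

          parameters = initialize_parameters(n_x=3, n_h=2, n_y=1)
          for key in ("W1", "b1", "W2", "b2"):
              print(key, parameters[key].shape)
          # W1 (2, 3), b1 (2, 1), W2 (1, 2), b2 (1, 1)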

      1.2.2 L-layer Neural Network

      Instructions:

      • The model's structure is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. I.e., it has L-1 layers using a ReLU activation function followed by an output layer with a sigmoid activation function.

      • Use random initialization for the weight matrices. Use np.random.randn(shape) * 0.01.
      • Use zeros initialization for the biases. Use np.zeros(shape).
      • We will store $n^{[l]}$, the number of units in different layers, in a variable layer_dims. For example, the layer_dims for the "Planar Data classification model" from last week would have been [2,4,1]: there were two inputs, one hidden layer with 4 hidden units, and an output layer with 1 output unit. This means W1's shape was (4,2), b1 was (4,1), W2 was (1,4) and b2 was (1,1). Now you will generalize this to L layers!
      • Here is the implementation for L=1 (one layer neural network). It should inspire you to implement the general case (L-layer neural network).
        if L == 1:
                parameters["W" + str(L)] = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
                parameters["b" + str(L)] = np.zeros((layer_dims[1], 1))
        def initialize_parameters_deep(layer_dims):
            """
            Arguments:
            layer_dims -- python array (list) containing the dimensions of each layer in our network
            Returns:
            parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                            Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                            bl -- bias vector of shape (layer_dims[l], 1)
            """
        
            np.random.seed(3)
            parameters = {}
            L = len(layer_dims)            # number of layers in the network
        
            for l in range(1, L):         # loop over layers 1 to L-1
                ### START CODE HERE ### (≈ 2 lines of code)
                parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
                parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
                ### END CODE HERE ###
        
                assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
                assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
        
        
            return parameters
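
        As a quick check (the layer sizes here are illustrative), the returned shapes follow the (layer_dims[l], layer_dims[l-1]) pattern described in the docstring:

            parameters = initialize_parameters_deep([5, 4, 3])
            print(parameters["W1"].shape, parameters["b1"].shape)  # (4, 5) (4, 1)
            print(parameters["W2"].shape, parameters["b2"].shape)  # (3, 4) (3, 1)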

        1.3 Forward Propagation Module

        1.3.1 Linear Forward

        Now that you have initialized your parameters, you will do the forward propagation module. You will start by implementing some basic functions that you will use later when implementing the model. You will complete three functions in this order:

        • LINEAR
        • LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
        • [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (whole model)

        The linear forward module (vectorized over all the examples) computes the following equation:

        $$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{4}$$

        where $A^{[0]} = X$.

        def linear_forward(A, W, b):
            """
            Implement the linear part of a layer's forward propagation.
        
            Arguments:
            A -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)
        
            Returns:
            Z -- the input of the activation function, also called the pre-activation parameter
            cache -- a python tuple containing "A", "W" and "b"; stored for computing the backward pass efficiently
            """
        
            ### START CODE HERE ### (≈ 1 line of code)
            Z = np.dot(W, A) + b
            ### END CODE HERE ###
        
            assert(Z.shape == (W.shape[0], A.shape[1]))
            cache = (A, W, b)
        
            return Z, cache
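
        A minimal shape check on random data (the sizes are illustrative only):

            import numpy as np

            np.random.seed(1)
            A = np.random.randn(3, 2)   # 3 units in the previous layer, 2 examples
            W = np.random.randn(1, 3)   # current layer has 1 unit
            b = np.random.randn(1, 1)
            Z, cache = linear_forward(A, W, b)
            print(Z.shape)              # (1, 2)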

        1.3.2 Linear-Activation Forward

        In this notebook, you will use two activation functions:

        • Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$. We have provided you with the sigmoid function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:
        A, activation_cache = sigmoid(Z)
        • ReLU: The mathematical formula for ReLU is $A = \mathrm{ReLU}(Z) = \max(0, Z)$. We have provided you with the relu function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:
        A, activation_cache = relu(Z)
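
        The sigmoid and relu helpers are provided by the course utilities and are not part of the graded exercise; a minimal sketch that matches the behaviour described above (returning the activation A and a cache holding Z) could look like this:

            import numpy as np

            def sigmoid(Z):
                """Sigmoid activation. Returns the activation A and a cache containing Z."""
                A = 1 / (1 + np.exp(-Z))
                cache = Z
                return A, cache

            def relu(Z):
                """ReLU activation. Returns the activation A and a cache containing Z."""
                A = np.maximum(0, Z)
                cache = Z
                return A, cache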
        def linear_activation_forward(A_prev, W, b, activation):
            """
            Implement the forward propagation for the LINEAR->ACTIVATION layer
        
            Arguments:
            A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
            Returns:
            A -- the output of the activation function, also called the post-activation value 
            cache -- a python dictionary containing "linear_cache" and "activation_cache";
                     stored for computing the backward pass efficiently
            """
        
            if activation == "sigmoid":
                # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
                ### START CODE HERE ### (≈ 2 lines of code)
                Z, linear_cache = linear_forward(A_prev, W, b) # linear_cache holds (A_prev, W, b)
                A, activation_cache = sigmoid(Z)               # activation_cache holds Z
                ### END CODE HERE ###
        
            elif activation == "relu":
                # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
                ### START CODE HERE ### (≈ 2 lines of code)
                Z, linear_cache = linear_forward(A_prev, W, b)
                A, activation_cache = relu(Z)
                ### END CODE HERE ###
        
            assert (A.shape == (W.shape[0], A_prev.shape[1]))
            cache = (linear_cache, activation_cache)    # the cache for this layer is a tuple of both caches
        
            return A, cache

        1.3.3 L-Layer Model

        Instruction: In the code below, the variable AL will denote $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$. (This is sometimes also called Yhat, i.e., this is $\hat{Y}$.)

        Tips:

        • Use the functions you had previously written
        • Use a for loop to replicate [LINEAR->RELU] (L-1) times
        • Don't forget to keep track of the caches in the "caches" list. To add a new value c to a list, you can use list.append(c).
        def L_model_forward(X, parameters):
            """
            Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
        
            Arguments:
            X -- data, numpy array of shape (input size, number of examples)
            parameters -- output of initialize_parameters_deep()
            Returns:
            AL -- last post-activation value
            caches -- list of caches containing:
                        every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                        the cache of linear_sigmoid_forward() (there is one, indexed L-1)
            """
        
            caches = []
            A = X
            L = len(parameters) // 2                  # number of layers in the neural network (integer division by 2, since parameters holds both W and b for every layer)
        
            # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
            for l in range(1, L):
                A_prev = A 
                ### START CODE HERE ### (≈ 2 lines of code)
                A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
                caches.append(cache)
                ### END CODE HERE ###
        
            # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
            ### START CODE HERE ### (≈ 2 lines of code)
            AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
            caches.append(cache)
            ### END CODE HERE ###
        
            assert(AL.shape == (1,X.shape[1]))
        
            return AL, caches
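
        Putting the forward pieces together on random data (the layer sizes are illustrative only, and the sigmoid/relu helpers sketched above must be available):

            import numpy as np

            np.random.seed(1)
            X = np.random.randn(5, 4)                        # 5 input features, 4 examples
            parameters = initialize_parameters_deep([5, 4, 3, 1])
            AL, caches = L_model_forward(X, parameters)
            print(AL.shape, len(caches))                     # (1, 4) 3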

        1.4 Cost Function

        Now you have implemented forward propagation. You need to compute the cost, because you want to check if your model is actually learning.

        Exercise: Compute the cross-entropy cost J , using the following formula:

        $$-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right) \tag{7}$$

        def compute_cost(AL, Y):
            """
            Implement the cost function defined by equation (7).
        
            Arguments:
            AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
            Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)
        
            Returns:
            cost -- cross-entropy cost
            """
        
            m = Y.shape[1]
        
            # Compute loss from aL and y.
            ### START CODE HERE ### (≈ 1 lines of code)
            cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))
            ### END CODE HERE ###
        
            cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
            assert(cost.shape == ())
        
            return cost
        import numpy as np
        Y, AL = compute_cost_test_case()
        AL = np.array([[0.01, 0.01, 0.03]])
        print(AL)
        print(Y)
        print("cost = " + str(compute_cost(AL, Y)))
        ------------
        [[ 0.01  0.01  0.03]]
        [[1 1 1]]
        cost = 4.23896608977

        By tweaking AL you can see that as long as AL is far enough from Y, the cost can become very large!

        1.5 Backward Propagation Module


        1.5.1 Linear backward

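        Assuming the derivative $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ has already been computed, the three outputs are obtained from the values cached during the forward pass with the standard formulas from the lecture:

        $$dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$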

        def linear_backward(dZ, cache):
            """
            Implement the linear portion of backward propagation for a single layer (layer l)
        
            Arguments:
            dZ -- Gradient of the cost with respect to the linear output (of current layer l)
            cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer
        
            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            A_prev, W, b = cache
            m = A_prev.shape[1]   # number of examples
        
            ### START CODE HERE ### (≈ 3 lines of code)
            dW = np.dot(dZ, A_prev.T) / m
            db = np.sum(dZ, axis=1, keepdims=True) / m
            dA_prev = np.dot(W.T, dZ)
            ### END CODE HERE ###
        
            assert (dA_prev.shape == A_prev.shape)
            assert (dW.shape == W.shape)
            assert (db.shape == b.shape)
        
            return dA_prev, dW, db

        1.5.2 Linear-Activation backward

        Next, you will create a function that merges the two helper functions: linear_backward and the backward step for the activation linear_activation_backward.

        To help you implement linear_activation_backward, we provided two backward functions:

        • sigmoid_backward: Implements the backward propagation for SIGMOID unit. You can call it as follows:
        dZ = sigmoid_backward(dA, activation_cache)
        • relu_backward: Implements the backward propagation for RELU unit. You can call it as follows:
        dZ = relu_backward(dA, activation_cache)

        If $g(\cdot)$ is the activation function, sigmoid_backward and relu_backward compute

        $$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) \tag{11}$$
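
        These two backward helpers are also provided by the course utilities; a minimal sketch consistent with equation (11), assuming the activation cache holds Z as in the forward pass, could be:

            import numpy as np

            def relu_backward(dA, cache):
                """Backward pass for a ReLU unit: dZ = dA * g'(Z), where g'(Z) is 0 for Z <= 0 and 1 otherwise."""
                Z = cache
                dZ = np.array(dA, copy=True)   # copy dA, then zero out entries where Z <= 0
                dZ[Z <= 0] = 0
                return dZ

            def sigmoid_backward(dA, cache):
                """Backward pass for a sigmoid unit: dZ = dA * s * (1 - s), with s = sigmoid(Z)."""
                Z = cache
                s = 1 / (1 + np.exp(-Z))
                dZ = dA * s * (1 - s)
                return dZ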

        def linear_activation_backward(dA, cache, activation):
            """
            Implement the backward propagation for the LINEAR->ACTIVATION layer.
        
            Arguments:
            dA -- post-activation gradient for current layer l
            cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
        
            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            linear_cache, activation_cache = cache
        
            if activation == "relu":
                ### START CODE HERE ### (≈ 2 lines of code)
                dZ = relu_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
                ### END CODE HERE ###
        
            elif activation == "sigmoid":
                ### START CODE HERE ### (≈ 2 lines of code)
                dZ = sigmoid_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
                ### END CODE HERE ###
        
            return dA_prev, dW, db

        1.5.3 L-Model Backward

        Now you will implement the backward function for the whole network. Recall that when you implemented the L_model_forward function, at each iteration you stored a cache which contains (X, W, b, and z). In the backpropagation module, you will use those variables to compute the gradients. Therefore, in the L_model_backward function, you will iterate through all the hidden layers backward, starting from layer L. On each step, you will use the cached values for layer l to backpropagate through layer l. Figure 5 below shows the backward pass.
        [Figure 5]
        Initializing backpropagation:
        To backpropagate through this network, we know that the output is
        $A^{[L]} = \sigma(Z^{[L]})$. Your code thus needs to compute dAL $= \frac{\partial \mathcal{L}}{\partial A^{[L]}}$.
        To do so, use this formula (derived using calculus, which you don't need in-depth knowledge of):

        dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
        def L_model_backward(AL, Y, caches):
            """
            Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
        
            Arguments:
            AL -- probability vector, output of the forward propagation (L_model_forward())
            Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
            caches -- list of caches containing:
                        every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e. l = 0...L-2)
                        the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
            Returns:
            grads -- A dictionary with the gradients
                     grads["dA" + str(l)] = ... 
                     grads["dW" + str(l)] = ...
                     grads["db" + str(l)] = ... 
            """
            grads = {}
            L = len(caches) # the number of layers (the input layer is not counted)
            m = AL.shape[1]
            Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
        
            # Initializing the backpropagation, i.e. computing the gradient of the cost with respect to AL
            ### START CODE HERE ### (1 line of code)
            dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
            ### END CODE HERE ###
        
            # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
            ### START CODE HERE ### (approx. 2 lines)
            current_cache = caches[-1]  # the last entry of caches (the sigmoid layer)
            grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid")
            ### END CODE HERE ###
        
            for l in reversed(range(L-1)):  # l goes from L-2 down to 0
                # lth layer: (RELU -> LINEAR) gradients.
                # Inputs: "grads["dA" + str(l + 2)], caches". Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
                ### START CODE HERE ### (approx. 5 lines)
                current_cache = caches[l]  # caches is indexed 0..L-1, so the second-to-last layer has index L-2
                dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l+2)], current_cache, activation = "relu")
                grads["dA" + str(l + 1)] = dA_prev_temp
                grads["dW" + str(l + 1)] = dW_temp
                grads["db" + str(l + 1)] = db_temp
                ### END CODE HERE ###
        
            return grads

        1.5.4 Update Parameters

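        Each parameter is updated with one step of gradient descent, where $\alpha$ is the learning rate:

        $$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$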

        def update_parameters(parameters, grads, learning_rate):
            """
            Update parameters using gradient descent
        
            Arguments:
            parameters -- python dictionary containing your parameters 
            grads -- python dictionary containing your gradients, output of L_model_backward
        
            Returns:
            parameters -- python dictionary containing your updated parameters 
                          parameters["W" + str(l)] = ... 
                          parameters["b" + str(l)] = ...
            """
        
            L = len(parameters) // 2 # number of layers in the neural network
        
            # Update rule for each parameter. Use a for loop.
            ### START CODE HERE ### (≈ 3 lines of code)  # parameters and gradients are indexed from 1 to L
            for l in range(L):
                parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
                parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
            ### END CODE HERE ###
            return parameters

        1.6 Summary

        Congrats on implementing all the functions required for building a deep neural network!

        We know it was a long assignment but going forward it will only get better. The next part of the assignment is easier.

        In the next assignment you will put all these together to build two models:
        - A two-layer neural network
        - An L-layer neural network

        You will in fact use these models to classify cat vs non-cat images!

        2. Deep Neural Network for Image Classification: Application


        2.1 Dataset

        You will use the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you had built there had 70% test accuracy on classifying cats vs non-cats images. Hopefully, your new model will perform better!

        Problem Statement: You are given a dataset (“data.h5”) containing:
        - a training set of m_train images labelled as cat (1) or non-cat (0)
        - a test set of m_test images labelled as cat and non-cat
        - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB).

        Number of training examples: 209
        Number of testing examples: 50
        Each image is of size: (64, 64, 3)
        train_x_orig shape: (209, 64, 64, 3)
        train_y shape: (1, 209)
        test_x_orig shape: (50, 64, 64, 3)
        test_y shape: (1, 50)

        Shapes after reshaping and standardizing the dataset:
        train_x's shape: (12288, 209)
        test_x's shape: (12288, 50)
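
        The (12288, 209) and (12288, 50) shapes come from flattening each 64×64×3 image into a column vector and scaling pixel values into [0, 1]. A sketch of that preprocessing step (the variable names follow the notebook, but treat it as illustrative):

            # Reshape the training and test examples: one flattened image per column
            train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # shape (64*64*3, 209)
            test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T      # shape (64*64*3, 50)

            # Standardize the data to have feature values between 0 and 1
            train_x = train_x_flatten / 255.
            test_x = test_x_flatten / 255.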

        2.2 Building the Network

        Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish cat images from non-cat images.

        You will build two different models:
        - A 2-layer neural network
        - An L-layer deep neural network

        You will then compare the performance of these models, and also try out different values for L.

        2.2.1 2-layer neural network

        [Figure: 2-layer neural network]

        2.2.2 L-layer deep neural network

        [Figure: L-layer neural network]

        2.2.3 General methodology

        As usual you will follow the Deep Learning methodology to build the model:

        1. Initialize parameters / Define hyperparameters
        2. Loop for num_iterations:
            a. Forward propagation
            b. Compute cost function
            c. Backward propagation
            d. Update parameters (using parameters, and grads from backprop) 
        3. Use trained parameters to predict labels
        

        2.3 Two-Layer Neural Network

        Question: Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: LINEAR -> RELU -> LINEAR -> SIGMOID. The functions you may need and their inputs are:

        def initialize_parameters(n_x, n_h, n_y):
            ...  # this initializer is specific to the two-layer network
            return parameters 
        def linear_activation_forward(A_prev, W, b, activation):
            ...  # L_model_forward is not needed here; with only two layers you simply call this twice
            return A, cache
        def compute_cost(AL, Y):
            ...
            return cost
        def linear_activation_backward(dA, cache, activation):
            ...  # as in the forward pass, call this twice
            return dA_prev, dW, db
        def update_parameters(parameters, grads, learning_rate):
            ...
            return parameters
        def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
            """
            Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.
        
            Arguments:
            X -- input data, of shape (n_x, number of examples)
            Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
            layers_dims -- dimensions of the layers (n_x, n_h, n_y)
            num_iterations -- number of iterations of the optimization loop
            learning_rate -- learning rate of the gradient descent update rule
            print_cost -- If set to True, this will print the cost every 100 iterations 
        
            Returns:
            parameters -- a dictionary containing W1, W2, b1, and b2
            """
        
            np.random.seed(1)
            grads = {}
            costs = []                              # to keep track of the cost
            m = X.shape[1]                           # number of examples
            (n_x, n_h, n_y) = layers_dims
        
            # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
            ### START CODE HERE ### (≈ 1 line of code)
            parameters = initialize_parameters(n_x, n_h, n_y)
            ### END CODE HERE ###
        
            # Get W1, b1, W2 and b2 from the dictionary parameters.
            W1 = parameters["W1"]
            b1 = parameters["b1"]
            W2 = parameters["W2"]
            b2 = parameters["b2"]
        
            # Loop (gradient descent)
        
            for i in range(0, num_iterations):
        
                # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
                ### START CODE HERE ### (≈ 2 lines of code)
                A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
                A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
                ### END CODE HERE ###
        
                # Compute cost
                ### START CODE HERE ### (≈ 1 line of code)
                cost = compute_cost(A2, Y)
                ### END CODE HERE ###
        
                # Initializing backward propagation
                dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
        
                # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
                ### START CODE HERE ### (≈ 2 lines of code)
                dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
                dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
                ### END CODE HERE ###
        
                # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
                grads['dW1'] = dW1
                grads['db1'] = db1
                grads['dW2'] = dW2
                grads['db2'] = db2
        
                # Update parameters.
                ### START CODE HERE ### (approx. 1 line of code)
                parameters = update_parameters(parameters, grads, learning_rate)
                ### END CODE HERE ###
        
                # Retrieve W1, b1, W2, b2 from parameters
                W1 = parameters["W1"]
                b1 = parameters["b1"]
                W2 = parameters["W2"]
                b2 = parameters["b2"]
        
                # Print the cost every 100 iterations
                if print_cost and i % 100 == 0:
                    print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
                if print_cost and i % 100 == 0:
                    costs.append(cost)
        
            # plot the cost
        
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per hundreds)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()
        
            return parameters
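
        In the notebook the two-layer model is then trained on the preprocessed cat/non-cat data. The constants below (n_x = 12288, n_h = 7, n_y = 1, 2500 iterations) follow the assignment's setup, but treat this call as illustrative:

            ### CONSTANTS DEFINING THE MODEL ####
            n_x = 12288     # num_px * num_px * 3
            n_h = 7
            n_y = 1
            layers_dims = (n_x, n_h, n_y)

            parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y),
                                         num_iterations=2500, print_cost=True)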

        [Figures: cost curve and accuracy output]
        Note: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping" and we will talk about it in the next course. Early stopping is a way to prevent overfitting.

        Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an L-layer model.

        2.4 L-Layer Neural Network

        Question: Use the helper functions you have implemented previously to build an L -layer neural network with the following structure: [LINEAR -> RELU]×(L-1) -> LINEAR -> SIGMOID. The functions you may need and their inputs are:

        def initialize_parameters_deep(layer_dims):
            ...  # use the deep initialization function here, not the two-layer one
            return parameters 
        def L_model_forward(X, parameters):
            ...
            return AL, caches
        def compute_cost(AL, Y):
            ...
            return cost
        def L_model_backward(AL, Y, caches):
            ...
            return grads
        def update_parameters(parameters, grads, learning_rate):
            ...
            return parameters
        ### CONSTANTS ###
        layers_dims = [12288, 20, 7, 5, 1] #  4-layer model
        def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):#lr was 0.009
            """
            Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.
        
            Arguments:
            X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
            Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
            layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
            learning_rate -- learning rate of the gradient descent update rule
            num_iterations -- number of iterations of the optimization loop
            print_cost -- if True, it prints the cost every 100 steps
        
            Returns:
            parameters -- parameters learnt by the model. They can then be used to predict.
            """
        
            np.random.seed(1)
            costs = []                         # keep track of cost
        
            # Parameters initialization.
            ### START CODE HERE ###
            parameters = initialize_parameters_deep(layers_dims)
            ### END CODE HERE ###
        
            # Loop (gradient descent)
            for i in range(0, num_iterations):
        
                # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
                ### START CODE HERE ### (≈ 1 line of code)
                AL, caches = L_model_forward(X, parameters)
                ### END CODE HERE ###
        
                # Compute cost.
                ### START CODE HERE ### (≈ 1 line of code)
                cost = compute_cost(AL, Y)
                ### END CODE HERE ###
        
                # Backward propagation.
                ### START CODE HERE ### (≈ 1 line of code)
                grads = L_model_backward(AL, Y, caches)
                ### END CODE HERE ###
        
                # Update parameters.
                ### START CODE HERE ### (≈ 1 line of code)
                parameters = update_parameters(parameters,  grads, learning_rate)
                ### END CODE HERE ###
        
                # Print the cost every 100 iterations
                if print_cost and i % 100 == 0:
                    print ("Cost after iteration %i: %f" %(i, cost))
                if print_cost and i % 100 == 0:
                    costs.append(cost)
        
            # plot the cost
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per hundreds)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()
        
            return parameters
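
        Training the deeper model uses the same data and the layers_dims constant defined above; 2500 iterations matches the notebook's setup (illustrative call):

            parameters = L_layer_model(train_x, train_y, layers_dims,
                                       num_iterations=2500, print_cost=True)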

        Congrats! It seems that your 4-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set.

        This is good performance for this task. Nice job!

        Though in the next course on "Improving deep neural networks" you will learn how to obtain even higher accuracy by systematically searching for better hyperparameters (learning_rate, layers_dims, num_iterations, and others you'll also learn about in the next course).

        2.5 Results Analysis

        First, let's take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images.
        [Figure: examples of mislabeled images]
        A few types of images the model tends to do poorly on include:

        • Cat body in an unusual position
        • Cat appears against a background of a similar color
        • Unusual cat color and species
        • Camera angle
        • Brightness of the picture
        • Scale variation (cat is very large or small in the image)