03 - (Course 1) Week 4 Assignment: Deep Neural Networks

1. Building your Deep Neural Network: Step by Step


1.1 Assignment Outline

To build your neural network, you will implement several "helper functions". These functions will be used in the next assignment to build a two-layer and an L-layer neural network. Below is the outline of this assignment; you will:

  • Initialize the parameters for a two-layer network and for an L-layer neural network.
  • Implement the forward propagation module (shown in purple in the figure below).
    • Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).
    • The ACTIVATION function is provided for you (relu/sigmoid).
    • Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
    • Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer L). This gives you a new L_model_forward function.
  • Compute the loss.
  • Implement the backward propagation module (shown in red in the figure below).
    • Complete the LINEAR part of a layer's backward propagation step.
    • The gradient of the ACTIVATION function is provided for you (relu_backward/sigmoid_backward).
    • Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
    • Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward once. This gives you a new L_model_backward function.
  • Finally, update the parameters.
  [Figure 1]
1.2 Initialization

      You will write two helper functions to initialize the parameters for your model. The first function is used to initialize the parameters of a two-layer model. The second one generalizes this initialization to L layers.

      1.2.1 2-layer Neural Network

      Instructions:

      • The model’s structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
      • Use random initialization for the weight matrices. Use np.random.randn(shape)*0.01 with the correct shape.
      • Use zero initialization for the biases. Use np.zeros(shape).
      def initialize_parameters(n_x, n_h, n_y):
          """
          Argument:
          n_x -- size of the input layer
          n_h -- size of the hidden layer
          n_y -- size of the output layer
      
          Returns:
          parameters -- python dictionary containing your parameters:
                          W1 -- weight matrix of shape (n_h, n_x)
                          b1 -- bias vector of shape (n_h, 1)
                          W2 -- weight matrix of shape (n_y, n_h)
                          b2 -- bias vector of shape (n_y, 1)
          """
      
          np.random.seed(1)
      
          ### START CODE HERE ### (≈ 4 lines of code)
          W1 = np.random.randn(n_h, n_x) * 0.01
          b1 = np.zeros((n_h, 1))
          W2 = np.random.randn(n_y, n_h) * 0.01
          b2 = np.zeros((n_y, 1))
          ### END CODE HERE ###
      
          assert(W1.shape == (n_h, n_x))      # sanity-check the shapes; an AssertionError is raised if a shape is wrong
          assert(b1.shape == (n_h, 1))
          assert(W2.shape == (n_y, n_h))
          assert(b2.shape == (n_y, 1))
      
          parameters = {"W1": W1,
                        "b1": b1,
                        "W2": W2,
                        "b2": b2}
      
          return parameters    
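
      As a quick sanity check (the layer sizes below are only illustrative, not the assignment's test case), the returned shapes follow the documented pattern:

          import numpy as np  # required by initialize_parameters

          parameters = initialize_parameters(n_x=3, n_h=2, n_y=1)
          for key in ("W1", "b1", "W2", "b2"):
              print(key, parameters[key].shape)
          # W1 (2, 3), b1 (2, 1), W2 (1, 2), b2 (1, 1)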

      1.2.2 L-layer Neural Network

      Instructions:

      • The model's structure is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. I.e., it has L-1 layers using a ReLU activation function followed by an output layer with a sigmoid activation function.

      • Use random initialization for the weight matrices. Use np.random.randn(shape) * 0.01.
      • Use zeros initialization for the biases. Use np.zeros(shape).
      • We will store $n^{[l]}$, the number of units in different layers, in a variable layer_dims. For example, the layer_dims for the "Planar Data classification model" from last week would have been [2,4,1]: there were two inputs, one hidden layer with 4 hidden units, and an output layer with 1 output unit. This means W1's shape was (4,2), b1 was (4,1), W2 was (1,4) and b2 was (1,1). Now you will generalize this to L layers!
      • Here is the implementation for L=1 (one layer neural network). It should inspire you to implement the general case (L-layer neural network).
        if L == 1:
                parameters["W" + str(L)] = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
                parameters["b" + str(L)] = np.zeros((layer_dims[1], 1))
        def initialize_parameters_deep(layer_dims):
            """
            Arguments:
            layer_dims -- python array (list) containing the dimensions of each layer in our network
            Returns:
            parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                            Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                            bl -- bias vector of shape (layer_dims[l], 1)
            """
        
            np.random.seed(3)
            parameters = {}
            L = len(layer_dims)            # number of layers in the network
        
            for l in range(1, L):         # loop over layers 1 to L-1
                ### START CODE HERE ### (≈ 2 lines of code)
                parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
                parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
                ### END CODE HERE ###
        
                assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
                assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
        
        
            return parameters
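
        As a quick check (the layer sizes here are illustrative), the returned shapes follow the (layer_dims[l], layer_dims[l-1]) pattern described in the docstring:

            parameters = initialize_parameters_deep([5, 4, 3])
            print(parameters["W1"].shape, parameters["b1"].shape)  # (4, 5) (4, 1)
            print(parameters["W2"].shape, parameters["b2"].shape)  # (3, 4) (3, 1)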

        1.3 Forward Propagation Module

        1.3.1 Linear Forward

        Now that you have initialized your parameters, you will do the forward propagation module. You will start by implementing some basic functions that you will use later when implementing the model. You will complete three functions in this order:

        • LINEAR
        • LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
        • [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (whole model)

        The linear forward module (vectorized over all the examples) computes the following equation:

        $$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{4}$$

        where $A^{[0]} = X$.

        def linear_forward(A, W, b):
            """
            Implement the linear part of a layer's forward propagation.
        
            Arguments:
            A -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)
        
            Returns:
            Z -- the input of the activation function, also called the pre-activation parameter
            cache -- a python tuple containing "A", "W" and "b"; stored for computing the backward pass efficiently
            """
        
            ### START CODE HERE ### (≈ 1 line of code)
            Z = np.dot(W, A) + b
            ### END CODE HERE ###
        
            assert(Z.shape == (W.shape[0], A.shape[1]))
            cache = (A, W, b)
        
            return Z, cache
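
        A minimal shape check on random data (the sizes are illustrative only):

            import numpy as np

            np.random.seed(1)
            A = np.random.randn(3, 2)   # 3 units in the previous layer, 2 examples
            W = np.random.randn(1, 3)   # current layer has 1 unit
            b = np.random.randn(1, 1)
            Z, cache = linear_forward(A, W, b)
            print(Z.shape)              # (1, 2)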

        1.3.2 Linear-Activation Forward

        In this notebook, you will use two activation functions:

        • Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$. We have provided you with the sigmoid function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:
        A, activation_cache = sigmoid(Z)
        • ReLU: The mathematical formula for ReLU is $A = \mathrm{ReLU}(Z) = \max(0, Z)$. We have provided you with the relu function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:
        A, activation_cache = relu(Z)
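
        The sigmoid and relu helpers are provided by the course utilities and are not part of the graded exercise; a minimal sketch that matches the behaviour described above (returning the activation A and a cache holding Z) could look like this:

            import numpy as np

            def sigmoid(Z):
                """Sigmoid activation. Returns the activation A and a cache containing Z."""
                A = 1 / (1 + np.exp(-Z))
                cache = Z
                return A, cache

            def relu(Z):
                """ReLU activation. Returns the activation A and a cache containing Z."""
                A = np.maximum(0, Z)
                cache = Z
                return A, cache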
        def linear_activation_forward(A_prev, W, b, activation):
            """
            Implement the forward propagation for the LINEAR->ACTIVATION layer
        
            Arguments:
            A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
            W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
            b -- bias vector, numpy array of shape (size of the current layer, 1)
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
            Returns:
            A -- the output of the activation function, also called the post-activation value 
            cache -- a python dictionary containing "linear_cache" and "activation_cache";
                     stored for computing the backward pass efficiently
            """
        
            if activation == "sigmoid":
                # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
                ### START CODE HERE ### (≈ 2 lines of code)
                Z, linear_cache = linear_forward(A_prev, W, b) # linear_cache holds (A_prev, W, b)
                A, activation_cache = sigmoid(Z)               # activation_cache holds Z
                ### END CODE HERE ###
        
            elif activation == "relu":
                # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
                ### START CODE HERE ### (≈ 2 lines of code)
                Z, linear_cache = linear_forward(A_prev, W, b)
                A, activation_cache = relu(Z)
                ### END CODE HERE ###
        
            assert (A.shape == (W.shape[0], A_prev.shape[1]))
            cache = (linear_cache, activation_cache)    # the cache for this layer is a tuple of both caches
        
            return A, cache

        1.3.3 L-Layer Model

        Instruction: In the code below, the variable AL will denote $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$. (This is sometimes also called Yhat, i.e., this is $\hat{Y}$.)

        Tips:

        • Use the functions you had previously written
        • Use a for loop to replicate [LINEAR->RELU] (L-1) times
        • Don't forget to keep track of the caches in the "caches" list. To add a new value c to a list, you can use list.append(c).
        def L_model_forward(X, parameters):
            """
            Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
        
            Arguments:
            X -- data, numpy array of shape (input size, number of examples)
            parameters -- output of initialize_parameters_deep()
            Returns:
            AL -- last post-activation value
            caches -- list of caches containing:
                        every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                        the cache of linear_sigmoid_forward() (there is one, indexed L-1)
            """
        
            caches = []
            A = X
            L = len(parameters) // 2                  # number of layers in the neural network (integer division by 2, since parameters holds both W and b for every layer)
        
            # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
            for l in range(1, L):
                A_prev = A 
                ### START CODE HERE ### (≈ 2 lines of code)
                A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
                caches.append(cache)
                ### END CODE HERE ###
        
            # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
            ### START CODE HERE ### (≈ 2 lines of code)
            AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
            caches.append(cache)
            ### END CODE HERE ###
        
            assert(AL.shape == (1,X.shape[1]))
        
            return AL, caches
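
        Putting the forward pieces together on random data (the layer sizes are illustrative only, and the sigmoid/relu helpers sketched above must be available):

            import numpy as np

            np.random.seed(1)
            X = np.random.randn(5, 4)                        # 5 input features, 4 examples
            parameters = initialize_parameters_deep([5, 4, 3, 1])
            AL, caches = L_model_forward(X, parameters)
            print(AL.shape, len(caches))                     # (1, 4) 3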

        1.4 Cost Function

        Now you have implemented forward propagation. You need to compute the cost, because you want to check if your model is actually learning.

        Exercise: Compute the cross-entropy cost J , using the following formula:

        $$-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right) \tag{7}$$

        def compute_cost(AL, Y):
            """
            Implement the cost function defined by equation (7).
        
            Arguments:
            AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
            Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)
        
            Returns:
            cost -- cross-entropy cost
            """
        
            m = Y.shape[1]
        
            # Compute loss from aL and y.
            ### START CODE HERE ### (≈ 1 lines of code)
            cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))
            ### END CODE HERE ###
        
            cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
            assert(cost.shape == ())
        
            return cost
        import numpy as np
        Y, AL = compute_cost_test_case()
        AL = np.array([[0.01, 0.01, 0.03]])
        print(AL)
        print(Y)
        print("cost = " + str(compute_cost(AL, Y)))
        ------------
        [[ 0.01  0.01  0.03]]
        [[1 1 1]]
        cost = 4.23896608977

        By tweaking AL you can see that as long as AL is far enough from Y, the cost can become very large!

        1.5 Backward Propagation Module


        1.5.1 Linear backward

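        Assuming the derivative $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ has already been computed, the three outputs are obtained from the values cached during the forward pass with the standard formulas from the lecture:

        $$dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$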

        def linear_backward(dZ, cache):
            """
            Implement the linear portion of backward propagation for a single layer (layer l)
        
            Arguments:
            dZ -- Gradient of the cost with respect to the linear output (of current layer l)
            cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer
        
            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            A_prev, W, b = cache
            m = A_prev.shape[1]   # number of examples
        
            ### START CODE HERE ### (≈ 3 lines of code)
            dW = np.dot(dZ, A_prev.T) / m
            db = np.sum(dZ, axis=1, keepdims=True) / m
            dA_prev = np.dot(W.T, dZ)
            ### END CODE HERE ###
        
            assert (dA_prev.shape == A_prev.shape)
            assert (dW.shape == W.shape)
            assert (db.shape == b.shape)
        
            return dA_prev, dW, db

        1.5.2 Linear-Activation backward

        Next, you will create a function that merges the two helper functions: linear_backward and the backward step for the activation linear_activation_backward.

        To help you implement linear_activation_backward, we provided two backward functions:

        • sigmoid_backward: Implements the backward propagation for SIGMOID unit. You can call it as follows:
        dZ = sigmoid_backward(dA, activation_cache)
        • relu_backward: Implements the backward propagation for RELU unit. You can call it as follows:
        dZ = relu_backward(dA, activation_cache)

        If $g(\cdot)$ is the activation function, sigmoid_backward and relu_backward compute

        $$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) \tag{11}$$
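
        These two backward helpers are also provided by the course utilities; a minimal sketch consistent with equation (11), assuming the activation cache holds Z as in the forward pass, could be:

            import numpy as np

            def relu_backward(dA, cache):
                """Backward pass for a ReLU unit: dZ = dA * g'(Z), where g'(Z) is 0 for Z <= 0 and 1 otherwise."""
                Z = cache
                dZ = np.array(dA, copy=True)   # copy dA, then zero out entries where Z <= 0
                dZ[Z <= 0] = 0
                return dZ

            def sigmoid_backward(dA, cache):
                """Backward pass for a sigmoid unit: dZ = dA * s * (1 - s), with s = sigmoid(Z)."""
                Z = cache
                s = 1 / (1 + np.exp(-Z))
                dZ = dA * s * (1 - s)
                return dZ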

        def linear_activation_backward(dA, cache, activation):
            """
            Implement the backward propagation for the LINEAR->ACTIVATION layer.
        
            Arguments:
            dA -- post-activation gradient for current layer l
            cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
            activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
        
            Returns:
            dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            db -- Gradient of the cost with respect to b (current layer l), same shape as b
            """
            linear_cache, activation_cache = cache
        
            if activation == "relu":
                ### START CODE HERE ### (≈ 2 lines of code)
                dZ = relu_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
                ### END CODE HERE ###
        
            elif activation == "sigmoid":
                ### START CODE HERE ### (≈ 2 lines of code)
                dZ = sigmoid_backward(dA, activation_cache)
                dA_prev, dW, db = linear_backward(dZ, linear_cache)
                ### END CODE HERE ###
        
            return dA_prev, dW, db

        1.5.3 L-Model Backward

        Now you will implement the backward function for the whole network. Recall that when you implemented the L_model_forward function, at each iteration you stored a cache which contains (X, W, b, and z). In the backpropagation module, you will use those variables to compute the gradients. Therefore, in the L_model_backward function, you will iterate through all the hidden layers backward, starting from layer L. On each step, you will use the cached values for layer l to backpropagate through layer l. Figure 5 below shows the backward pass.
        [Figure 5]
        Initializing backpropagation:
        To backpropagate through this network, we know that the output is
        $A^{[L]} = \sigma(Z^{[L]})$. Your code thus needs to compute dAL $= \frac{\partial \mathcal{L}}{\partial A^{[L]}}$.
        To do so, use this formula (derived using calculus, which you don't need in-depth knowledge of):

        dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
        def L_model_backward(AL, Y, caches):
            """
            Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
        
            Arguments:
            AL -- probability vector, output of the forward propagation (L_model_forward())
            Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
            caches -- list of caches containing:
                        every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e. l = 0...L-2)
                        the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
            Returns:
            grads -- A dictionary with the gradients
                     grads["dA" + str(l)] = ... 
                     grads["dW" + str(l)] = ...
                     grads["db" + str(l)] = ... 
            """
            grads = {}
            L = len(caches) # the number of layers (the input layer is not counted)
            m = AL.shape[1]
            Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
        
            # Initializing the backpropagation, i.e. computing the gradient of the cost with respect to AL
            ### START CODE HERE ### (1 line of code)
            dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
            ### END CODE HERE ###
        
            # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
            ### START CODE HERE ### (approx. 2 lines)
            current_cache = caches[-1]  # the last entry of caches (the sigmoid layer)
            grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid")
            ### END CODE HERE ###
        
            for l in reversed(range(L-1)):  # l goes from L-2 down to 0
                # lth layer: (RELU -> LINEAR) gradients.
                # Inputs: "grads["dA" + str(l + 2)], caches". Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
                ### START CODE HERE ### (approx. 5 lines)
                current_cache = caches[l]  # caches is indexed 0..L-1, so the second-to-last layer has index L-2
                dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l+2)], current_cache, activation = "relu")
                grads["dA" + str(l + 1)] = dA_prev_temp
                grads["dW" + str(l + 1)] = dW_temp
                grads["db" + str(l + 1)] = db_temp
                ### END CODE HERE ###
        
            return grads

        1.5.4 Update Parameters

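        Each parameter is updated with one step of gradient descent, where $\alpha$ is the learning rate:

        $$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$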

        def update_parameters(parameters, grads, learning_rate):
            """
            Update parameters using gradient descent
        
            Arguments:
            parameters -- python dictionary containing your parameters 
            grads -- python dictionary containing your gradients, output of L_model_backward
        
            Returns:
            parameters -- python dictionary containing your updated parameters 
                          parameters["W" + str(l)] = ... 
                          parameters["b" + str(l)] = ...
            """
        
            L = len(parameters) // 2 # number of layers in the neural network
        
            # Update rule for each parameter. Use a for loop.
            ### START CODE HERE ### (≈ 3 lines of code)  # parameters and gradients are indexed from 1 to L
            for l in range(L):
                parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
                parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
            ### END CODE HERE ###
            return parameters

        1.6 Summary

        Congrats on implementing all the functions required for building a deep neural network!

        We know it was a long assignment but going forward it will only get better. The next part of the assignment is easier.

        In the next assignment you will put all these together to build two models:
        - A two-layer neural network
        - An L-layer neural network

        You will in fact use these models to classify cat vs non-cat images!

        2. Deep Neural Network for Image Classification: Application


        2.1 Dataset

        You will use the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you had built there had 70% test accuracy on classifying cats vs non-cats images. Hopefully, your new model will perform better!

        Problem Statement: You are given a dataset (“data.h5”) containing:
        - a training set of m_train images labelled as cat (1) or non-cat (0)
        - a test set of m_test images labelled as cat and non-cat
        - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB).

        Number of training examples: 209
        Number of testing examples: 50
        Each image is of size: (64, 64, 3)
        train_x_orig shape: (209, 64, 64, 3)
        train_y shape: (1, 209)
        test_x_orig shape: (50, 64, 64, 3)
        test_y shape: (1, 50)

        Shapes after reshaping and standardizing the dataset:
        train_x's shape: (12288, 209)
        test_x's shape: (12288, 50)
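
        The (12288, 209) and (12288, 50) shapes come from flattening each 64×64×3 image into a column vector and scaling pixel values into [0, 1]. A sketch of that preprocessing step (the variable names follow the notebook, but treat it as illustrative):

            # Reshape the training and test examples: one flattened image per column
            train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # shape (64*64*3, 209)
            test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T      # shape (64*64*3, 50)

            # Standardize the data to have feature values between 0 and 1
            train_x = train_x_flatten / 255.
            test_x = test_x_flatten / 255.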

        2.2 Building the Network

        Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish cat images from non-cat images.

        You will build two different models:
        - A 2-layer neural network
        - An L-layer deep neural network

        You will then compare the performance of these models, and also try out different values for L.

        2.2.1 2-layer neural network

        [Figure: 2-layer neural network]

        2.2.2 L-layer deep neural network

        [Figure: L-layer neural network]

        2.2.3 General methodology

        As usual you will follow the Deep Learning methodology to build the model:

        1. Initialize parameters / Define hyperparameters
        2. Loop for num_iterations:
            a. Forward propagation
            b. Compute cost function
            c. Backward propagation
            d. Update parameters (using parameters, and grads from backprop) 
        3. Use trained parameters to predict labels
        

        2.3 Two-Layer Neural Network

        Question: Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: LINEAR -> RELU -> LINEAR -> SIGMOID. The functions you may need and their inputs are:

        def initialize_parameters(n_x, n_h, n_y):
            ...  # this initializer is specific to the two-layer network
            return parameters 
        def linear_activation_forward(A_prev, W, b, activation):
            ...  # L_model_forward is not needed here; with only two layers you simply call this twice
            return A, cache
        def compute_cost(AL, Y):
            ...
            return cost
        def linear_activation_backward(dA, cache, activation):
            ...  # as in the forward pass, call this twice
            return dA_prev, dW, db
        def update_parameters(parameters, grads, learning_rate):
            ...
            return parameters
        def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
            """
            Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.
        
            Arguments:
            X -- input data, of shape (n_x, number of examples)
            Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
            layers_dims -- dimensions of the layers (n_x, n_h, n_y)
            num_iterations -- number of iterations of the optimization loop
            learning_rate -- learning rate of the gradient descent update rule
            print_cost -- If set to True, this will print the cost every 100 iterations 
        
            Returns:
            parameters -- a dictionary containing W1, W2, b1, and b2
            """
        
            np.random.seed(1)
            grads = {}
            costs = []                              # to keep track of the cost
            m = X.shape[1]                           # number of examples
            (n_x, n_h, n_y) = layers_dims
        
            # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
            ### START CODE HERE ### (≈ 1 line of code)
            parameters = initialize_parameters(n_x, n_h, n_y)
            ### END CODE HERE ###
        
            # Get W1, b1, W2 and b2 from the dictionary parameters.
            W1 = parameters["W1"]
            b1 = parameters["b1"]
            W2 = parameters["W2"]
            b2 = parameters["b2"]
        
            # Loop (gradient descent)
        
            for i in range(0, num_iterations):
        
                # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
                ### START CODE HERE ### (≈ 2 lines of code)
                A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
                A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
                ### END CODE HERE ###
        
                # Compute cost
                ### START CODE HERE ### (≈ 1 line of code)
                cost = compute_cost(A2, Y)
                ### END CODE HERE ###
        
                # Initializing backward propagation
                dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
        
                # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
                ### START CODE HERE ### (≈ 2 lines of code)
                dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
                dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
                ### END CODE HERE ###
        
                # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
                grads['dW1'] = dW1
                grads['db1'] = db1
                grads['dW2'] = dW2
                grads['db2'] = db2
        
                # Update parameters.
                ### START CODE HERE ### (approx. 1 line of code)
                parameters = update_parameters(parameters, grads, learning_rate)
                ### END CODE HERE ###
        
                # Retrieve W1, b1, W2, b2 from parameters
                W1 = parameters["W1"]
                b1 = parameters["b1"]
                W2 = parameters["W2"]
                b2 = parameters["b2"]
        
                # Print the cost every 100 iterations
                if print_cost and i % 100 == 0:
                    print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
                if print_cost and i % 100 == 0:
                    costs.append(cost)
        
            # plot the cost
        
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per hundreds)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()
        
            return parameters
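
        In the notebook the two-layer model is then trained on the preprocessed cat/non-cat data. The constants below (n_x = 12288, n_h = 7, n_y = 1, 2500 iterations) follow the assignment's setup, but treat this call as illustrative:

            ### CONSTANTS DEFINING THE MODEL ####
            n_x = 12288     # num_px * num_px * 3
            n_h = 7
            n_y = 1
            layers_dims = (n_x, n_h, n_y)

            parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y),
                                         num_iterations=2500, print_cost=True)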

        [Figures: cost curve and accuracy output]
        Note: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping" and we will talk about it in the next course. Early stopping is a way to prevent overfitting.

        Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an L-layer model.

        2.4 L-Layer Neural Network

        Question: Use the helper functions you have implemented previously to build an L -layer neural network with the following structure: [LINEAR -> RELU]×(L-1) -> LINEAR -> SIGMOID. The functions you may need and their inputs are:

        def initialize_parameters_deep(layer_dims):
            ...  # use the deep initialization function here, not the two-layer one
            return parameters 
        def L_model_forward(X, parameters):
            ...
            return AL, caches
        def compute_cost(AL, Y):
            ...
            return cost
        def L_model_backward(AL, Y, caches):
            ...
            return grads
        def update_parameters(parameters, grads, learning_rate):
            ...
            return parameters
        ### CONSTANTS ###
        layers_dims = [12288, 20, 7, 5, 1] #  4-layer model
        def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):#lr was 0.009
            """
            Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.
        
            Arguments:
            X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
            Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
            layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
            learning_rate -- learning rate of the gradient descent update rule
            num_iterations -- number of iterations of the optimization loop
            print_cost -- if True, it prints the cost every 100 steps
        
            Returns:
            parameters -- parameters learnt by the model. They can then be used to predict.
            """
        
            np.random.seed(1)
            costs = []                         # keep track of cost
        
            # Parameters initialization.
            ### START CODE HERE ###
            parameters = initialize_parameters_deep(layers_dims)
            ### END CODE HERE ###
        
            # Loop (gradient descent)
            for i in range(0, num_iterations):
        
                # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
                ### START CODE HERE ### (≈ 1 line of code)
                AL, caches = L_model_forward(X, parameters)
                ### END CODE HERE ###
        
                # Compute cost.
                ### START CODE HERE ### (≈ 1 line of code)
                cost = compute_cost(AL, Y)
                ### END CODE HERE ###
        
                # Backward propagation.
                ### START CODE HERE ### (≈ 1 line of code)
                grads = L_model_backward(AL, Y, caches)
                ### END CODE HERE ###
        
                # Update parameters.
                ### START CODE HERE ### (≈ 1 line of code)
                parameters = update_parameters(parameters,  grads, learning_rate)
                ### END CODE HERE ###
        
                # Print the cost every 100 iterations
                if print_cost and i % 100 == 0:
                    print ("Cost after iteration %i: %f" %(i, cost))
                if print_cost and i % 100 == 0:
                    costs.append(cost)
        
            # plot the cost
            plt.plot(np.squeeze(costs))
            plt.ylabel('cost')
            plt.xlabel('iterations (per hundreds)')
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()
        
            return parameters
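
        Training the deeper model uses the same data and the layers_dims constant defined above; 2500 iterations matches the notebook's setup (illustrative call):

            parameters = L_layer_model(train_x, train_y, layers_dims,
                                       num_iterations=2500, print_cost=True)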

        Congrats! It seems that your 4-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set.

        This is good performance for this task. Nice job!

        Though in the next course on "Improving deep neural networks" you will learn how to obtain even higher accuracy by systematically searching for better hyperparameters (learning_rate, layers_dims, num_iterations, and others you'll also learn about in the next course).

        2.5 Results Analysis

        First, let's take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images.
        [Figure: examples of mislabeled images]
        A few types of images the model tends to do poorly on include:

        • Cat body in an unusual position
        • Cat appears against a background of a similar color
        • Unusual cat color and species
        • Camera angle
        • Brightness of the picture
        • Scale variation (cat is very large or small in the image)