Andrew Ng's "Neural Networks and Deep Learning", Week 4 Programming Assignment: Building a Deep Neural Network

Roar冷颜

Category: AI Learning: Deep Learning. Tag: deep learning

Article link: https://blog.csdn.net/qq_29923461/article/details/121330403

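Andrew Ng's "Neural Networks and Deep Learning": Building a Deep Neural Network

1 Packages
2 Outline of building a deep neural network
3 Initialization
- 3.1 Initializing the parameters of a two-layer network
- 3.2 Initializing the parameters of an L-layer network
4 Forward propagation module
5 Cost function
6 Backward propagation module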

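In the previous tutorial we trained a two-layer neural network (with a single hidden layer). In this article we build a deep neural network with an arbitrary number of layers and implement every function needed to do so.

After working through this article you will be able to:
$\bullet$ Use non-linear units such as ReLU to improve the model
$\bullet$ Build a deeper neural network (with more than one hidden layer)
$\bullet$ Implement an easy-to-use neural network class

Materials used in this article: [click to download], extraction code: hwwc. Download them before you start and unpack the files into the same directory as your code; make sure dnn_utils.py, testCases.py and lr_utils.py are present there.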

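[Notation]:

$\bullet$ The superscript $\left [ l \right ]$ denotes a quantity associated with the $l^{th}$ layer.
- Example: $a^{\left [ L \right ]}$ is the activation of the $L^{th}$ layer. $W^{\left [ L \right ]}$ and $b^{\left [ L \right ]}$ are the parameters of the $L^{th}$ layer.
$\bullet$ The superscript $\left ( i \right )$ denotes a quantity associated with the $i^{th}$ example.
- Example: $x^{\left ( i \right )}$ is the $i^{th}$ training example.
$\bullet$ The subscript $i$ denotes the $i^{th}$ entry of a vector.
- Example: $a_{i}^{\left [ l \right ]}$ is the $i^{th}$ entry of the activations of the $l^{th}$ layer.

1 Packages

Before starting, we need to import a few packages: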

import numpy as np
import h5py
import matplotlib.pyplot as plt
import testCases                                                        # see the resource pack
from dnn_utils import sigmoid, sigmoid_backward, relu, relu_backward    # see the resource pack
import lr_utils                                                         # see the resource pack

To reproduce the numbers shown in this article, set the random seed:

np.random.seed(1)

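2 Outline of building a deep neural network

To build a deep neural network we need to implement several "helper functions". They will be used in the next article, [Deep Neural Network Application: Image Classification], to build a two-layer network and an L-layer network.

The overall procedure is as follows:

$\bullet$ Initialize the parameters of the two-layer network and of the $L$-layer network.

$\bullet$ Implement the forward propagation module (shown in purple in the figure below).
- Complete the LINEAR part of a layer's forward step (which produces $Z^{\left [ l \right ]}$ ).
- The ACTIVATION functions are provided (relu / sigmoid).
- Combine the previous two steps into a new [LINEAR -> ACTIVATION] forward function.
- Stack the [LINEAR -> RELU] forward function L-1 times (layers 1 through L-1) and append a [LINEAR -> SIGMOID] at the end (the final layer). This gives a new L_model_forward function.

$\bullet$ Compute the cost.

$\bullet$ Implement the backward propagation module (shown in red in the figure below).
- Complete the LINEAR part of a layer's backward step.
- The gradients of the ACTIVATION functions are provided (relu_backward / sigmoid_backward).
- Combine the previous two steps into a new [LINEAR -> ACTIVATION] backward function.
- Stack [LINEAR -> RELU] backward L-1 times and add [LINEAR -> SIGMOID] backward in a new L_model_backward function.

$\bullet$ Finally, update the parameters.

[Figure: outline of the forward (purple) and backward (red) propagation modules]

[Note]: Every forward function has a corresponding backward function. This is why each step of the forward module stores some values in a cache: the backward module uses the cached values to compute the gradients.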

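3 Initialization

We first write two helper functions that initialize the model parameters. The first initializes the parameters of a two-layer model. The second generalizes the procedure to an $L$-layer model.

3.1 Initializing the parameters of a two-layer network

[Instructions]:

$\bullet$ The model structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
$\bullet$ Initialize the weight matrices randomly with np.random.randn(shape) * 0.01, using the correct shapes.
$\bullet$ Initialize the biases to zero with np.zeros(shape).

[Code]: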

# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """

    np.random.seed(1)

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

With initialization written, let's test it:

[Test]:

print("==============Test initialize_parameters==============")
parameters = initialize_parameters(3, 2, 1)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

[Result]:

==============Test initialize_parameters==============
W1 = [[ 0.01624345 -0.00611756 -0.00528172]
 [-0.01072969  0.00865408 -0.02301539]]
b1 = [[0.]
 [0.]]
W2 = [[ 0.01744812 -0.00761207]]
b2 = [[0.]]

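3.2 Initializing the parameters of an L-layer network

Initializing a deeper L-layer network is more involved because there are many more weight matrices and bias vectors. When completing initialize_parameters_deep, make sure the dimensions of consecutive layers match. Recall that $n^{\left [ l \right ]}$ is the number of units in layer $l$. So if the input $X$ has size $(12288, 209)$ (that is, $m = 209$ examples), then:

[Figure: table of the shapes of $W^{\left [ l \right ]}$, $b^{\left [ l \right ]}$ and the activations for each layer]

When we compute $(W X + b)$ in Python, broadcasting is used. For example, given:

[Figure: example matrices $W$, $X$ and $b$]
then:

[Figure: the resulting $WX + b$]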
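
As a small illustration of the broadcasting mentioned above, here is a minimal sketch with made-up values (the arrays W, X and b below are hypothetical, not the ones from the original figures). The bias has shape (2, 1) and NumPy adds it to every column of the (2, 4) product:

```python
import numpy as np

# Hypothetical values, chosen only so the result is easy to check by hand.
W = np.arange(6.0).reshape(2, 3)    # shape (2, 3): rows [0, 1, 2] and [3, 4, 5]
X = np.ones((3, 4))                 # shape (3, 4): four examples with three features each
b = np.array([[10.0], [20.0]])      # shape (2, 1): one bias per unit

# np.dot(W, X) has shape (2, 4); b is broadcast across the 4 columns.
Z = np.dot(W, X) + b

print(Z.shape)
print(Z)
```

Each column of Z receives the same bias vector, which is why b can be stored with shape $(n^{[l]}, 1)$ regardless of the number of examples.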

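[Instructions]:

$\bullet$ The model structure is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. That is, the first $L - 1$ layers use ReLU as the activation function and the last layer outputs through a sigmoid.
$\bullet$ Initialize the weight matrices randomly with np.random.randn(shape) * 0.01.
$\bullet$ Initialize the biases to zero with np.zeros(shape).
$\bullet$ We store $n^{\left [ l \right ]}$ , the number of units in each layer, in the variable layer_dims. For example, the "2D data classification model" from the previous article has layer_dims = [2,4,1]: each example has 2 features, the single hidden layer has 4 hidden units, and the output layer has 1 output unit. W1 therefore has shape (4,2), b1 has shape (4,1), W2 has shape (1,4) and b2 has shape (1,1). We now apply the same idea to $L$ layers.

[Code]: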

# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of entries in layer_dims (the input layer is included)

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        
        assert (parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert (parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters

Let's test it:

[Test]:

# Test initialize_parameters_deep
print("==============Test initialize_parameters_deep==============")
layers_dims = [5, 4, 3]
parameters = initialize_parameters_deep(layers_dims)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

[Result]:

==============Test initialize_parameters_deep==============
W1 = [[ 0.01788628  0.0043651   0.00096497 -0.01863493 -0.00277388]
 [-0.00354759 -0.00082741 -0.00627001 -0.00043818 -0.00477218]
 [-0.01313865  0.00884622  0.00881318  0.01709573  0.00050034]
 [-0.00404677 -0.0054536  -0.01546477  0.00982367 -0.01101068]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01185047 -0.0020565   0.01486148  0.00236716]
 [-0.01023785 -0.00712993  0.00625245 -0.00160513]
 [-0.00768836 -0.00230031  0.00745056  0.01976111]]
b2 = [[0.]
 [0.]
 [0.]]
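
To make the dimension matching concrete, the following sketch lists the shapes implied by a given layer_dims. The helper name layer_shapes and the hidden layer sizes [20, 7, 5] are assumptions made for illustration; only the input size 12288 and m = 209 come from the text above.

```python
def layer_shapes(layer_dims, m):
    """Return the shapes of W, b and Z/A for every layer of an L-layer network.

    layer_dims -- list of layer sizes, input layer first
    m          -- number of examples
    """
    shapes = []
    for l in range(1, len(layer_dims)):
        shapes.append({
            "layer": l,
            "W": (layer_dims[l], layer_dims[l - 1]),  # (units in layer l, units in layer l-1)
            "b": (layer_dims[l], 1),                  # broadcast over the m examples
            "Z_and_A": (layer_dims[l], m),            # one column per example
        })
    return shapes


# Hypothetical architecture: 12288 input features, three hidden layers, one output unit.
for row in layer_shapes([12288, 20, 7, 5, 1], m=209):
    print(row)
```

The second dimension of each W equals the first dimension of the W before it, which is exactly the condition needed for np.dot(W, A_prev) to be defined in every layer.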

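We now have parameter initialization functions for both the two-layer and the multi-layer network, so we can move on to forward propagation.

4 Forward propagation module

We first implement a few basic functions that the model will use later. Complete the following three functions in order:

$\bullet$ LINEAR
$\bullet$ LINEAR -> ACTIVATION, where the activation is ReLU or sigmoid.
$\bullet$ [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (the whole model).

4.1 Linear forward

The linear forward module (vectorized over all examples) computes: $Z^{\left [ l \right ]}=W^{\left [ l \right ]}A^{\left [ l-1 \right ]}+b^{\left [ l \right ]}$ where $A^{\left [ 0 \right ]} = X$ .

The linear part of forward propagation is implemented as follows:

[Code]: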

# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """

    Z = np.dot(W, A) + b

    assert (Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache

Let's test the linear part:

[Test]:

# Test linear_forward
print("==============Test linear_forward==============")
A, W, b = testCases.linear_forward_test_case()
Z, linear_cache = linear_forward(A, W, b)
print("A = " + str(A))
print("W = " + str(W))
print("b = " + str(b))
print("Z = " + str(Z))

[Result]:

==============Test linear_forward==============
A = [[ 1.62434536 -0.61175641]
 [-0.52817175 -1.07296862]
 [ 0.86540763 -2.3015387 ]]
W = [[ 1.74481176 -0.7612069   0.3190391 ]]
b = [[-0.24937038]]
Z = [[ 3.26295337 -1.23429987]]

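4.2 Linear-activation forward

We will use two activation functions:

$\bullet$ Sigmoid: $\sigma \left ( Z \right )=\sigma \left ( WA+b \right )=\frac{1}{1+e^{-\left ( WA+b \right )}}$ The function returns two values: the activation "A" and a "cache" containing "Z" (this is what we feed into the corresponding backward function to compute gradients). Call it like this:

A, activation_cache = sigmoid(Z)

$\bullet$ ReLU: $RELU\left ( Z \right ) = max\left ( 0,Z \right )$ The function returns two values: the activation "A" and a "cache" containing "Z" (this is what we feed into the corresponding backward function to compute gradients). Call it like this: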

A, activation_cache = relu(Z)
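
The dnn_utils helpers are not listed in this article. As a rough sketch only, assuming they do nothing more than what is described above (return the activation and keep Z as the cache), they could look like this. The names sigmoid_sketch and relu_sketch are hypothetical, chosen to avoid implying this is the actual dnn_utils source:

```python
import numpy as np


def sigmoid_sketch(Z):
    """Element-wise sigmoid. Returns the activation A and Z as the cache."""
    A = 1.0 / (1.0 + np.exp(-Z))
    cache = Z  # kept so the backward pass can evaluate the derivative at Z
    return A, cache


def relu_sketch(Z):
    """Element-wise ReLU. Returns the activation A and Z as the cache."""
    A = np.maximum(0, Z)
    cache = Z
    return A, cache


Z = np.array([[0.0, -2.0, 3.0]])
A_sig, _ = sigmoid_sketch(Z)
A_relu, _ = relu_sketch(Z)
print(A_sig)
print(A_relu)
```

Both functions preserve the shape of Z, which is what the shape assertion in linear_activation_forward relies on.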

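For convenience, we combine the two functions (linear and activation) into one (LINEAR -> ACTIVATION). That is, we implement a single function that performs the LINEAR forward step followed by the ACTIVATION forward step.

[Instructions]: Implement forward propagation for a LINEAR -> ACTIVATION layer. The mathematical expression is: $A^{\left [ l \right ]}=g^{\left [ l \right ]}\left ( Z^{\left [ l \right ]} \right )=g^{\left [ l \right ]}\left ( W^{\left [ l \right ]}A^{\left [ l-1 \right ]}+b^{\left [ l \right ]} \right )$ where the activation "g" is either sigmoid() or relu().

[Code]: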

# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value 
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """

    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

[Test]:

# Test linear_activation_forward
print("==============Test linear_activation_forward==============")
A_prev, W, b = testCases.linear_activation_forward_test_case()
print("A_prev = " + str(A_prev))
print("W = " + str(W))
print("b = " + str(b))

A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation="sigmoid")
print("sigmoid, A = " + str(A))

A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation="relu")
print("ReLU, A = " + str(A))

[Result]:

==============Test linear_activation_forward==============
A_prev = [[-0.41675785 -0.05626683]
 [-2.1361961   1.64027081]
 [-1.79343559 -0.84174737]]
W = [[ 0.50288142 -1.24528809 -1.05795222]]
b = [[-0.90900761]]
sigmoid, A = [[0.96890023 0.11013289]]
ReLU, A = [[3.43896131 0.        ]]

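[Note]: In deep learning, the "[LINEAR->ACTIVATION]" computation counts as a single layer of the network, not two.

4.3 L-layer model

That completes the forward functions needed for the two-layer model. What does forward propagation look like for a multi-layer network? We build it from the functions above: to make the L-layer network easier to implement, we need a function that repeats the previous one (linear_activation_forward with RELU) L-1 times and then follows it with one linear_activation_forward with SIGMOID. Its structure is shown below:
[Figure: structure of the [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID model]

In the code below, the variable AL denotes $A^{\left [ L \right ]}=\sigma \left ( Z^{\left [ L \right ]} \right )=\sigma \left ( W^{\left [ L \right ]}A^{\left [ L-1 \right ]}+b^{\left [ L \right ]} \right )$ (sometimes also called Yhat, i.e. $\hat{Y}$ ).

[Code]: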

# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)],
                                             activation="relu")
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation="sigmoid")
    caches.append(cache)

    assert (AL.shape == (1, X.shape[1]))

    return AL, caches

[Test]:

# Test L_model_forward
print("==============Test L_model_forward==============")
X, parameters = testCases.L_model_forward_test_case()
AL, caches = L_model_forward(X, parameters)
print("X = " + str(X))
print("parameters = " + str(parameters))
print("AL = " + str(AL))
print("Length of caches = " + str(len(caches)))
print("caches = " + str(caches))

[Result]:

==============Test L_model_forward==============
X = [[ 1.62434536 -0.61175641]
 [-0.52817175 -1.07296862]
 [ 0.86540763 -2.3015387 ]
 [ 1.74481176 -0.7612069 ]]
parameters = {'W1': array([[ 0.3190391 , -0.24937038,  1.46210794, -2.06014071],
       [-0.3224172 , -0.38405435,  1.13376944, -1.09989127],
       [-0.17242821, -0.87785842,  0.04221375,  0.58281521]]), 'b1': array([[-1.10061918],
       [ 1.14472371],
       [ 0.90159072]]), 'W2': array([[ 0.50249434,  0.90085595, -0.68372786]]), 'b2': array([[-0.12289023]])}
AL = [[0.17007265 0.2524272 ]]
Length of caches = 2
caches = [((array([[ 1.62434536, -0.61175641],
       [-0.52817175, -1.07296862],
       [ 0.86540763, -2.3015387 ],
       [ 1.74481176, -0.7612069 ]]), array([[ 0.3190391 , -0.24937038,  1.46210794, -2.06014071],
       [-0.3224172 , -0.38405435,  1.13376944, -1.09989127],
       [-0.17242821, -0.87785842,  0.04221375,  0.58281521]]), array([[-1.10061918],
       [ 1.14472371],
       [ 0.90159072]])), array([[-2.77991749, -2.82513147],
       [-0.11407702, -0.01812665],
       [ 2.13860272,  1.40818979]])), ((array([[0.        , 0.        ],
       [0.        , 0.        ],
       [2.13860272, 1.40818979]]), array([[ 0.50249434,  0.90085595, -0.68372786]]), array([[-0.12289023]])), array([[-1.58511248, -1.08570881]]))]

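We now have a complete forward propagation module: it takes the input $X$ and outputs the row vector $A^{\left [ L \right ]}$ containing the predictions. It also records all intermediate values in "caches" for use in the backward pass. From $A^{\left [ L \right ]}$ we can compute the cost of the predictions.

5 Cost function

With forward propagation complete for both models, we need to compute the cost to check whether the model is actually learning. The cross-entropy cost $J$ is computed as: $J = -\frac{1}{m}\sum_{i=1}^{m}\left ( y^{\left ( i \right )}log\left ( a^{\left [ L \right ]\left ( i \right )} \right ) +\left ( 1-y^{\left ( i \right )} \right )log\left ( 1-a^{\left [ L \right ]\left ( i \right )} \right ) \right )$

[Code]: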

# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cross-entropy cost function J.

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute loss from aL and y.
    cost = -1 / m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL), axis=1, keepdims=True)

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert (cost.shape == ())

    return cost

[Test]:

# Test compute_cost
print("==============Test compute_cost==============")
Y, AL = testCases.compute_cost_test_case()
print("Y = " + str(Y))
print("AL = " + str(AL))
print("cost = " + str(compute_cost(AL, Y)))

[Result]:

==============Test compute_cost==============
Y = [[1 1 1]]
AL = [[0.8 0.9 0.4]]
cost = 0.41493159961539694
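
The value above can be reproduced term by term. The following sketch recomputes the cost for the same Y and AL shown in the result, first as per-example losses and then as their mean. The variable names are illustrative only:

```python
import numpy as np

# Same inputs as in the test result above.
Y = np.array([[1, 1, 1]])
AL = np.array([[0.8, 0.9, 0.4]])

# Per-example cross-entropy loss: since every label is 1, only the -log(a) term is non-zero.
per_example = -(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

# The cost J is the average of the per-example losses.
cost = per_example.mean()

print(per_example)
print(cost)
```

The third example, predicted at 0.4 for a true label of 1, contributes by far the largest loss, which is the behaviour we want from cross-entropy.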

6 Backward propagation module

Backward propagation computes the gradients of the loss function with respect to the parameters. The forward and backward flow looks like this:

[Figure: flow of forward and backward propagation]

If you know some calculus, the chain rule gives the derivative of the loss with respect to $z^{\left [ 1 \right ]}$ in a two-layer network: $dz^{\left [ 1 \right ]}=\frac{\partial L}{\partial z^{\left [ 1 \right ]}}=\frac{\partial L}{\partial a^{\left [ 2 \right ]}}\cdot \frac{\partial a^{\left [ 2 \right ]}}{\partial z^{\left [ 2 \right ]}}\cdot \frac{\partial z^{\left [ 2 \right ]}}{\partial a^{\left [ 1 \right ]}}\cdot \frac{\partial a^{\left [ 1 \right ]}}{\partial z^{\left [ 1 \right ]}} \tag1$

To compute the gradient $dW^{\left [ 1 \right ]}$ , extend equation (1) by one more factor: $dW^{\left [ 1 \right ]} = dz^{\left [ 1 \right ]}\cdot \frac{\partial z^{\left [ 1 \right ]}}{\partial W^{\left [ 1 \right ]}} \tag2$

Likewise, to compute the gradient $db^{\left [ 1 \right ]}$ , extend equation (1) by: $db^{\left [ 1 \right ]} = dz^{\left [ 1 \right ]}\cdot \frac{\partial z^{\left [ 1 \right ]}}{\partial b^{\left [ 1 \right ]}} \tag3$

This is why it is called backpropagation: the gradients are computed from the output layer back toward the input.

As with forward propagation, the backward pass can be built in three steps:

$\bullet$ LINEAR backward
$\bullet$ LINEAR -> ACTIVATION backward, where the activation part uses the derivative of ReLU or sigmoid
$\bullet$ [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID backward (the whole model)

6.1 Linear backward

For layer $l$ , the linear part is: $Z^{\left [ l \right ]} = W^{\left [ l \right ]}A^{\left [ l-1 \right ]}+b^{\left [ l \right ]}$ .

[Figure: inputs and outputs of the linear backward step for layer $l$]

Suppose the derivative $dZ^{\left [ l \right ]}=\frac{\partial L}{\partial Z^{\left [ l \right ]}}$ has already been computed. From this input $dZ^{\left [ l \right ]}$ we need to compute three outputs: $dW^{\left [ l \right ]}$ , $db^{\left [ l \right ]}$ and $dA^{\left [ l-1 \right ]}$ . The formulas are:

$dW^{\left [ l \right ]}=\frac{\partial L}{\partial W^{\left [ l \right ]} } = \frac{1}{m}dZ^{\left [ l \right ]}A^{\left [ l-1 \right ]T} \tag4$

$db^{\left [ l \right ]}=\frac{\partial L}{\partial b^{\left [ l \right ]} } = \frac{1}{m}\sum_{i=1}^{m}dZ^{\left [ l \right ]\left ( i \right )} \tag5$

$dA^{\left [ l-1 \right ]}=\frac{\partial L}{\partial A^{\left [ l-1 \right ]} } = W^{\left [ l \right ]T}dZ^{\left [ l \right ]} \tag6$

[Code]:

# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = 1 / m * np.dot(dZ, A_prev.T)
    db = 1 / m * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db

[Test]:

# Test linear_backward
print("==============Test linear_backward==============")
dZ, linear_cache = testCases.linear_backward_test_case()

dA_prev, dW, db = linear_backward(dZ, linear_cache)
print("dA_prev = " + str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db))

[Result]:

==============Test linear_backward==============
dA_prev = [[ 0.51822968 -0.19517421]
 [-0.40506361  0.15255393]
 [ 2.37496825 -0.89445391]]
dW = [[-0.10076895  1.40685096  1.64992505]]
db = [[0.50629448]]
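
One way to gain confidence in equations (4) and (5) is a finite-difference check. The sketch below is not part of the original assignment: it assumes a toy cost (the average over examples of 0.5 * ||z||^2, for which the per-example derivative dZ equals Z itself) and compares the analytic dW and db with numerical estimates. All function names here are hypothetical.

```python
import numpy as np


def linear_backward_sketch(dZ, A_prev, W):
    """Equations (4)-(6) for a single layer."""
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db


def toy_cost(W, b, A_prev):
    """Assumed toy cost: mean over examples of 0.5 * ||z||^2."""
    Z = np.dot(W, A_prev) + b
    return 0.5 * np.sum(Z ** 2) / A_prev.shape[1]


def numerical_grad(f, x, eps=1e-6):
    """Central finite differences of the scalar f() with respect to array x (perturbed in place)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        plus = f()
        x[idx] = old - eps
        minus = f()
        x[idx] = old  # restore the original entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))   # 3 units in the previous layer, 5 examples
W = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 1))

dZ = np.dot(W, A_prev) + b             # for the toy cost, the per-example dL/dZ is Z
dA_prev, dW, db = linear_backward_sketch(dZ, A_prev, W)

f = lambda: toy_cost(W, b, A_prev)
err_W = np.max(np.abs(dW - numerical_grad(f, W)))
err_b = np.max(np.abs(db - numerical_grad(f, b)))
print(err_W, err_b)
```

Both errors should be tiny. Note that in this convention dZ is a per-example derivative, so the 1/m factor of the averaged cost appears in dW and db but not in dA_prev.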

6.2 Linear-activation backward

To help implement linear_activation_backward, two backward functions are provided:

$\bullet$ sigmoid_backward: implements backward propagation for a SIGMOID unit. Call it like this:

dZ = sigmoid_backward(dA, activation_cache)

$\bullet$ relu_backward: implements backward propagation for a RELU unit. Call it like this:

dZ = relu_backward(dA, activation_cache)

If $g\left ( \cdot \right )$ is the activation function, sigmoid_backward and relu_backward compute: $dZ^{\left [ l \right ]} = dA^{\left [ l \right ]}\ast g^{'}\left ( Z^{\left [ l \right ]} \right )$
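
As with the forward helpers, the source of sigmoid_backward and relu_backward is not shown here. The sketch below is one plausible implementation of the formula above, assuming the activation cache is simply Z. The names with the _sketch suffix are hypothetical:

```python
import numpy as np


def sigmoid_backward_sketch(dA, cache):
    """dZ = dA * sigma'(Z), with sigma'(Z) = s * (1 - s) and s = sigmoid(Z)."""
    Z = cache
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1 - s)


def relu_backward_sketch(dA, cache):
    """dZ = dA where Z > 0 and 0 elsewhere (the derivative at Z = 0 is taken as 0 here)."""
    Z = cache
    dZ = np.array(dA, copy=True)  # copy so the caller's dA is left untouched
    dZ[Z <= 0] = 0
    return dZ


Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.array([[3.0, 1.0, 4.0]])
dZ_sig = sigmoid_backward_sketch(dA, Z)
dZ_relu = relu_backward_sketch(dA, Z)
print(dZ_sig)
print(dZ_relu)
```

The ReLU case explains the zero rows and columns seen in the gradients of this article's test results: wherever Z was not positive in the forward pass, no gradient flows back.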

[Code]:

# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db

[Test]:

# Test linear_activation_backward
print("==============Test linear_activation_backward==============")
AL, linear_activation_cache = testCases.linear_activation_backward_test_case()

dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation="sigmoid")
print("sigmoid:")
print("dA_prev = " + str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db) + "\n")

dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation="relu")
print("relu:")
print("dA_prev = " + str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db))

[Result]:

==============Test linear_activation_backward==============
sigmoid:
dA_prev = [[ 0.11017994  0.01105339]
 [ 0.09466817  0.00949723]
 [-0.05743092 -0.00576154]]
dW = [[ 0.10266786  0.09778551 -0.01968084]]
db = [[-0.05729622]]

relu:
dA_prev = [[ 0.44090989  0.        ]
 [ 0.37883606  0.        ]
 [-0.2298228   0.        ]]
dW = [[ 0.44513824  0.37371418 -0.10478989]]
db = [[-0.20837892]]

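6.3 L-layer model backward

Now we implement the backward function for the whole network. Recall that in L_model_forward, each iteration stored a cache containing (A, W, b and Z). The backward module uses these values to compute the gradients. So in L_model_backward we start from layer $L$ and iterate backward through all the hidden layers. At each step, we use the cached values of layer $l$ to backpropagate through layer $l$ . The figure below shows the backward pass.

[Figure: the backward pass through the L-layer model]

For the output layer we have $A^{\left [ L \right ]} = \sigma \left ( Z^{\left [ L \right ]} \right )$ , so we first need to compute dAL, which can be done with:

dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

This post-activation gradient dAL is then propagated backward. As the figure shows, dAL is fed into the LINEAR -> SIGMOID backward function implemented earlier (which uses the cached values stored by L_model_forward). After that, a for loop applies the LINEAR -> RELU backward function to all the other layers. Each dA, dW and db is stored in the grads dictionary along the way.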
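
The expression for dAL follows from differentiating the per-example cross-entropy loss with respect to the output activation. A short derivation, writing $a$ for one entry of $A^{[L]}$ and $y$ for the matching label, and using the standard identity $\sigma'(z) = a(1-a)$:

```latex
\begin{aligned}
\mathcal{L}(a, y) &= -\bigl( y \log a + (1 - y) \log (1 - a) \bigr) \\
\frac{\partial \mathcal{L}}{\partial a} &= -\left( \frac{y}{a} - \frac{1 - y}{1 - a} \right) \\
\frac{\partial \mathcal{L}}{\partial z} &= \frac{\partial \mathcal{L}}{\partial a} \cdot a (1 - a)
  = -y (1 - a) + (1 - y)\, a = a - y
\end{aligned}
```

The second line is exactly the dAL expression in the code. The third line shows what the LINEAR -> SIGMOID backward step produces for the output layer: the familiar $dZ^{[L]} = A^{[L]} - Y$.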

[Code]:

# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...  (in this implementation: gradient w.r.t. the activations of layer l-1)
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR) gradients. 
    # Inputs: "AL, Y, caches". 
    # Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
    current_cache = caches[L - 1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = \
        linear_activation_backward(dAL, current_cache, activation="sigmoid")

    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 2)], caches". 
        # Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], 
                                                                    current_cache,
                                                                    activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        
    return grads

[Test]:

# Test L_model_backward
print("==============Test L_model_backward==============")
AL, Y_assess, caches = testCases.L_model_backward_test_case()
grads = L_model_backward(AL, Y_assess, caches)
print("dW1 = " + str(grads["dW1"]))
print("db1 = " + str(grads["db1"]))
print("dA1 = " + str(grads["dA1"]))

[Result]:

==============Test L_model_backward==============
dW1 = [[0.41010002 0.07807203 0.13798444 0.10502167]
 [0.         0.         0.         0.        ]
 [0.05283652 0.01005865 0.01777766 0.0135308 ]]
db1 = [[-0.22007063]
 [ 0.        ]
 [-0.02835349]]
dA1 = [[ 0.          0.52257901]
 [ 0.         -0.3269206 ]
 [ 0.         -0.32070404]
 [ 0.         -0.74079187]]

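6.4 Updating the parameters

Finally, update the model parameters with gradient descent: $W^{\left [ l \right ]} = W^{\left [ l \right ]} - \alpha dW^{\left [ l \right ]} \tag7$ $b^{\left [ l \right ]} = b^{\left [ l \right ]} - \alpha db^{\left [ l \right ]} \tag8$ where $\alpha$ is the learning rate. After computing the updated parameters, store them in the parameters dictionary.

[Code]: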

# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients, output of L_model_backward

    Returns:
    parameters -- python dictionary containing your updated parameters 
                  parameters["W" + str(l)] = ... 
                  parameters["b" + str(l)] = ...
    """

    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

    return parameters

[Test]:

# Test update_parameters
print("==============Test update_parameters==============")
parameters, grads = testCases.update_parameters_test_case()
parameters = update_parameters(parameters, grads, 0.1)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

[Result]:

==============Test update_parameters==============
W1 = [[-0.59562069 -0.09991781 -2.14584584  1.82662008]
 [-1.76569676 -0.80627147  0.51115557 -1.18258802]
 [-1.0535704  -0.86128581  0.68284052  2.20374577]]
b1 = [[-0.04659241]
 [-1.28888275]
 [ 0.53405496]]
W2 = [[-0.55569196  0.0354055   1.32964895]]
b2 = [[-0.84610769]]
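
Because the test case values are random, the effect of equations (7) and (8) is easier to see on numbers small enough to check by hand. The following sketch applies one gradient descent step to a single hypothetical layer. The values of params and grads_demo are made up for illustration:

```python
import numpy as np

# Hypothetical one-layer parameters and gradients.
params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads_demo = {"dW1": np.array([[0.5, -1.0]]), "db1": np.array([[2.0]])}
learning_rate = 0.1

# Equations (7) and (8): move each parameter against its gradient.
updated = {name: value - learning_rate * grads_demo["d" + name]
           for name, value in params.items()}

print(updated["W1"])
print(updated["b1"])
```

A positive gradient lowers the parameter (1.0 becomes 0.95) and a negative gradient raises it (2.0 becomes 2.1), each by the gradient scaled by the learning rate.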

This completes all the functions needed to build a deep neural network.
