Andrew Ng Deep Learning, Course 1, Week 4: Deep Neural Networks (1)

1. Packages used in this model
2. Initialization: two-layer and L-layer neural networks
3. Forward propagation module
4. Backward propagation module
5. Update parameters


Course 1: Neural Networks and Deep Learning
Week 4: Deep Neural Networks

Assignment Outline

1. Initialize the parameters for a two-layer network and for an L-layer neural network
2. Implement the forward propagation module
1) Complete the LINEAR part of a layer's forward propagation step
2) Apply the ACTIVATION function (relu/sigmoid)
3) Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function
4) Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer L), giving the new L_model_forward function
3. Compute the loss
4. Implement the backward propagation module
1) Complete the LINEAR part of a layer's backward propagation step
2) Apply the gradients of the provided ACTIVATION functions (relu_backward/sigmoid_backward)
3) Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function
4) Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in the new L_model_backward function
5. Update parameters

Note: for every forward function there is a corresponding backward function. That is why, at each step of the forward module, some values are stored in a cache. These cached values are very useful for computing gradients: in the backward propagation module, the caches are used to compute the gradients.
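As a quick illustration of this forward/backward cache pattern (a toy example, not part of the assignment), here is a one-operation "layer" whose forward pass returns both its output and a cache, and whose backward pass consumes that cache:

import numpy as np

def square_forward(x):
    out = x ** 2
    cache = x                      # store what the backward pass will need
    return out, cache

def square_backward(dout, cache):
    x = cache
    dx = dout * 2 * x              # chain rule: d(x^2)/dx = 2x
    return dx

x = np.array([1.0, -2.0, 3.0])
out, cache = square_forward(x)
dx = square_backward(np.ones_like(out), cache)
print(dx)                          # [ 2. -4.  6.]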

I. Packages

import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward   # basic functions used in this assignment

plt.rcParams['figure.figsize'] = (5.0, 4.0)        # set the default figure size
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
np.random.seed(1)   # keep all random function calls consistent

Code explanation:

This block configures Matplotlib's plotting parameters, which set the default behaviour of later plots.
1. plt.rcParams['figure.figsize'] = (5.0, 4.0): sets the default figure size. plt.rcParams is Matplotlib's configuration dictionary; here the default width is set to 5.0 and the height to 4.0.
2. plt.rcParams['image.interpolation'] = 'nearest': sets the default image interpolation. Interpolation determines how pixel values are filled in when an image is scaled or rotated; 'nearest' means nearest-neighbour interpolation, i.e. each output pixel takes the value of the closest input pixel.
3. plt.rcParams['image.cmap'] = 'gray': sets the default colormap to grayscale, so images are displayed in gray tones.
In short, this code configures Matplotlib so that subsequent plots use the specified default figure size, image interpolation and colormap.
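To see the effect of these defaults, a minimal illustration (not part of the assignment, reusing the imports above) is to display a small random array; it is rendered with the 'gray' colormap and 'nearest' interpolation just configured:

img = np.random.rand(8, 8)   # a hypothetical 8x8 "image"
plt.imshow(img)              # picks up image.cmap='gray' and image.interpolation='nearest'
plt.colorbar()
plt.show()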

II. Initialization

Two helper functions initialize the model parameters: the first initializes a two-layer model, and the second generalizes this initialization procedure to L layers.

1. Two-layer neural network: implement a two-layer neural network and initialize its parameters

1. The structure of this model is: LINEAR -> RELU -> LINEAR -> SIGMOID
2. Use random initialization for the weight matrices: np.random.randn(shape) * 0.01 with the correct dimensions
3. Use zero initialization for the biases: np.zeros(shape)

def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(1)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros(shape=(n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros(shape=(n_y, 1))     

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,"b1": b1,"W2": W2,"b2": b2}
    return parameters
    
parameters = initialize_parameters(2, 2, 1)   # initialize with n_x = 2, n_h = 2, n_y = 1
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

Expected output: (figure omitted)

2. L-layer neural network

1. The structure of the model is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID
2. The first L-1 layers use ReLU as the activation function, followed by a sigmoid activation in the output layer
3. Use random initialization for the weight matrices: np.random.randn(shape) * 0.01; use np.zeros(shape) for the biases
4. The variable layer_dims stores the number of units in each layer
For example, applied to L = 1:

if L == 1:
    parameters["W" + str(L)] = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
    parameters["b" + str(L)] = np.zeros((layer_dims[1], 1))

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network (including the input layer)

    for l in range(1, L):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros(shape=(layer_dims[l], 1))
        
        assert (parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert (parameters['b' + str(l)].shape == (layer_dims[l], 1))
    return parameters
    
parameters = initialize_parameters_deep([5,4,3])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
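As a quick check (illustrative, not required by the assignment), the shapes follow directly from layer_dims = [5, 4, 3]: W^[l] has shape (layer_dims[l], layer_dims[l-1]) and b^[l] has shape (layer_dims[l], 1). Reusing the parameters dictionary just created:

layer_dims = [5, 4, 3]
for l in range(1, len(layer_dims)):
    print("W" + str(l), parameters["W" + str(l)].shape, "b" + str(l), parameters["b" + str(l)].shape)
# W1 (4, 5) b1 (4, 1)
# W2 (3, 4) b2 (3, 1)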


III. Forward Propagation Module

1. Linear forward

The linear forward step computes Z^[l] = W^[l] A^[l-1] + b^[l], where A^[0] = X.
TIP: make good use of np.dot() and W.shape

def linear_forward(A, W, b):
    """
    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    Z = np.dot(W, A) + b
    assert (Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    return Z, cache
    
A, W, b = linear_forward_test_case()
Z, linear_cache = linear_forward(A, W, b)
print("Z = " + str(Z))


2. Linear-activation forward

This step computes A^[l] = g(Z^[l]) = g(W^[l] A^[l-1] + b^[l]), where g is the layer's activation function (sigmoid or ReLU).

def linear_activation_forward(A_prev, W, b, activation):
    """
    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python dictionary containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)

    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache
    
A_prev, W, b = linear_activation_forward_test_case()
A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation="sigmoid")
print("With sigmoid: A = " + str(A))
A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation="relu")
print("With ReLU: A = " + str(A))


3. L-layer model

Repeat the function implemented earlier (linear_activation_forward with RELU) L-1 times,
then finish with a single linear_activation_forward with SIGMOID.
Don't forget to keep track of the caches in the "caches" list. To add a new value c to a list, use list.append(c).

def L_model_forward(X, parameters):
    """
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)],
                                             parameters["b" + str(l)], activation="relu")
        caches.append(cache)
        
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)],
                                          parameters["b" + str(L)], activation="sigmoid")
    caches.append(cache)
    assert (AL.shape == (1, X.shape[1]))
    return AL, caches
    
X, parameters = L_model_forward_test_case()
AL, caches = L_model_forward(X, parameters)
print("AL = " + str(AL))
print("Length of caches list = " + str(len(caches)))

Code explanation:

   Forward propagation through the network, caching the intermediate values of every layer:
    1. A_prev = A: assigns the current activation A to A_prev for use in the next layer's computation. On the first iteration of the loop, A holds the input layer's activations.
    2. A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], activation="relu"):
    calls linear_activation_forward to perform the linear step and the activation step for the current layer, returning the activation A and the cache information cache.
    Its arguments are the previous layer's activation A_prev, the current layer's weights parameters["W" + str(l)] and bias parameters["b" + str(l)], and the activation type (here "relu").
    3. caches.append(cache): appends the current layer's cache to the caches list for later use in backward propagation; each loop iteration adds one cache.
    This loop walks through every hidden layer of the network, performs the linear and activation computations at each layer, and saves each layer's activation and cache.
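A quick end-to-end sanity check (a sketch with made-up sizes, assuming the functions defined above): push random data through a small [5, 4, 1] network; the last dimension is 1 so the final sigmoid layer matches the shape assert in L_model_forward. AL should have one row and one column per example, and caches should hold one entry per layer.

np.random.seed(1)
params_demo = initialize_parameters_deep([5, 4, 1])   # hypothetical layer sizes
X_demo = np.random.randn(5, 6)                        # 6 made-up examples
AL_demo, caches_demo = L_model_forward(X_demo, params_demo)
print(AL_demo.shape)      # (1, 6)
print(len(caches_demo))   # 2 -> one cache per layer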

4. Cost function

The cross-entropy cost is J = -(1/m) * Σ_i [ y^(i) log(a^[L](i)) + (1 - y^(i)) log(1 - a^[L](i)) ].

def compute_cost(AL, Y):
    """
    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)
    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]
    cost = - np.sum(np.multiply(np.log(AL), Y) + np.multiply(np.log(1 - AL), 1 - Y)) / m
    cost = np.squeeze(cost)
    assert (cost.shape == ())
    return cost
    
Y, AL = compute_cost_test_case()
print("cost = " + str(compute_cost(AL, Y)))

cost = 0.414931599615397
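As a small hand-checkable example (illustrative values, not the course test case): with AL = [0.8, 0.9, 0.4] and Y = [1, 1, 0], the cost is -(log(0.8) + log(0.9) + log(0.6)) / 3 ≈ 0.2798:

AL_demo = np.array([[0.8, 0.9, 0.4]])
Y_demo = np.array([[1, 1, 0]])
print(compute_cost(AL_demo, Y_demo))   # ≈ 0.2798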

IV. Backward Propagation Module


1. Linear backward

For layer l, given dZ^[l], the linear backward step computes: dW^[l] = (1/m) dZ^[l] A^[l-1]T, db^[l] = (1/m) Σ dZ^[l] (summed over the examples), and dA^[l-1] = W^[l]T dZ^[l].

def linear_backward(dZ, cache):
    """
    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    
    return dA_prev, dW, db

dZ, linear_cache = linear_backward_test_case()
dA_prev, dW, db = linear_backward(dZ, linear_cache)
print("dA_prev = "+ str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db))
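A tiny worked example with made-up numbers makes the three formulas concrete: with A_prev = [[1, 2]], W = [[3]], b = [[0]], dZ = [[1, 1]] and m = 2, we get dW = dZ·A_prev^T / m = (1·1 + 1·2) / 2 = 1.5, db = (1 + 1) / 2 = 1, and dA_prev = W^T·dZ = [[3, 3]]:

A_prev_demo = np.array([[1., 2.]])
W_demo = np.array([[3.]])
b_demo = np.array([[0.]])
dZ_demo = np.array([[1., 1.]])
print(linear_backward(dZ_demo, (A_prev_demo, W_demo, b_demo)))
# (array([[3., 3.]]), array([[1.5]]), array([[1.]]))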


2. Linear-activation backward

Here dZ^[l] = dA^[l] * g'(Z^[l]), where g' is the derivative of the layer's activation function; relu_backward and sigmoid_backward compute this product directly from the activation cache.

def linear_activation_backward(dA, cache, activation):
    """
    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db

AL, linear_activation_cache = linear_activation_backward_test_case()
dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation = "sigmoid")
print("sigmoid:")
print("dA_prev = "+ str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db) + "\n")
dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation = "relu")
print("relu:")
print("dA_prev = "+ str(dA_prev))
print("dW = " + str(dW))
print("db = " + str(db))
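An optional sanity check (a sketch with made-up values, assuming the compute_cost, linear_activation_forward and linear_activation_backward functions defined in this post): compare the dW returned by linear_activation_backward for a single sigmoid unit against a central finite difference of the cost with respect to one weight:

np.random.seed(2)
A_prev_c = np.random.randn(3, 4)        # 3 input units, 4 examples (hypothetical sizes)
W_c = np.random.randn(1, 3)             # one sigmoid output unit
b_c = np.zeros((1, 1))
Y_c = np.array([[1, 0, 1, 0]])

def cost_for(W_try):
    A_try, _ = linear_activation_forward(A_prev_c, W_try, b_c, activation="sigmoid")
    return compute_cost(A_try, Y_c)

A_c, cache_c = linear_activation_forward(A_prev_c, W_c, b_c, activation="sigmoid")
dA_c = - (np.divide(Y_c, A_c) - np.divide(1 - Y_c, 1 - A_c))   # same dAL formula used below
_, dW_c, _ = linear_activation_backward(dA_c, cache_c, activation="sigmoid")

eps = 1e-7
W_plus, W_minus = W_c.copy(), W_c.copy()
W_plus[0, 1] += eps
W_minus[0, 1] -= eps
dW_num = (cost_for(W_plus) - cost_for(W_minus)) / (2 * eps)
print(dW_c[0, 1], dW_num)   # the two numbers should agree to about 7 decimal places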


3. L-model backward propagation

When you implemented the L_model_forward function, at each iteration you stored a cache containing (X, W, b, and Z). In the backward propagation module you use those variables to compute the gradients, so in the L_model_backward function you iterate backward through all the hidden layers, starting from layer L; at each step you use the cached values of layer l to backpropagate through layer l.
To initialize the backward pass you need dAL, the derivative of the cost with respect to the final activation AL. For the cross-entropy cost above this is dAL = -(Y/AL - (1-Y)/(1-AL)), computed in code as dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).
You can then use this post-activation gradient dAL to keep moving backward: feed dAL into the LINEAR->SIGMOID backward function you implemented (which uses the cache values stored by L_model_forward), then use a for loop to iterate through all the other layers with the LINEAR->RELU backward function. Store each dA, dW and db in the grads dictionary, using the convention grads["dW" + str(l)] = dW^[l] (and similarly for dA and db); for example, dW^[3] is stored in grads["dW3"].

def L_model_backward(AL, Y, caches):
    """
    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, caches". Outputs: "grads["dA" + str(L-1)], grads["dWL"], grads["dbL"]"
    current_cache = caches[L - 1]
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation="sigmoid")

    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], caches". Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]"
        grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = linear_activation_backward(
            grads["dA" + str(l + 1)], caches[l], activation="relu")

    return grads

AL, Y_assess, caches = L_model_backward_test_case()
grads = L_model_backward(AL, Y_assess, caches)
print("dW1 = "+ str(grads["dW1"]))
print("db1 = "+ str(grads["db1"]))
print("dA0 = "+ str(grads["dA0"]))
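For the two-layer test case used above (which the printed dW1, db1 and dA0 values suggest), the grads dictionary should end up with one dW/db pair per layer plus the dA values indexed one lower, something like:

print(sorted(grads.keys()))   # expected: ['dA0', 'dA1', 'dW1', 'dW2', 'db1', 'db2']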


V. Update Parameters

Gradient descent update rule: W^[l] = W^[l] - α dW^[l] and b^[l] = b^[l] - α db^[l], where α is the learning rate.

def update_parameters(parameters, grads, learning_rate):
    """
    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- learning rate of the gradient descent update rule
    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """

    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]

    return parameters

parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads, 0.1)

print("W1 = "+ str(parameters["W1"]))
print("b1 = "+ str(parameters["b1"]))
print("W2 = "+ str(parameters["W2"]))
print("b2 = "+ str(parameters["b2"]))

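To tie everything together, here is a minimal gradient-descent training loop built from the helper functions above. It is only a sketch: train_x (shape (n_x, m)) and train_y (shape (1, m)) are assumed to be already loaded, and the layer sizes, iteration count and learning rate are illustrative; the full L_layer_model is built in the next part of the assignment.

layers_dims = [train_x.shape[0], 20, 7, 5, 1]            # hypothetical architecture
parameters = initialize_parameters_deep(layers_dims)

for i in range(2500):                                    # illustrative number of iterations
    AL, caches = L_model_forward(train_x, parameters)    # forward pass
    cost = compute_cost(AL, train_y)                     # cross-entropy cost
    grads = L_model_backward(AL, train_y, caches)        # backward pass
    parameters = update_parameters(parameters, grads, learning_rate=0.0075)
    if i % 100 == 0:
        print("Cost after iteration %i: %f" % (i, cost))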
