Course 1 Neural Networks and Deep Learning, Week 4: Building a Two-Layer Neural Network to Recognize Cat Images

Notation conventions

  • Superscript $[l]$ denotes the $l^{th}$ layer of the network; for example, $a^{[L]}$ is the activation of layer $[L]$, and $W^{[L]}$ and $b^{[L]}$ are the weights and bias of layer $[L]$.
  • Superscript $(i)$ denotes the $i^{th}$ example; for example, $x^{(i)}$ is the $i^{th}$ training example.
  • Subscript $i$ denotes the $i^{th}$ entry of layer $[l]$; for example, $a^{[l]}_i$ is the $i^{th}$ activation of layer $l$.
|           | Shape of W | Shape of b | Activation computation | Shape of activations |
|-----------|------------|------------|------------------------|----------------------|
| Layer 1   | $(n^{[1]}, 12288)$ | $(n^{[1]}, 1)$ | $Z^{[1]} = W^{[1]} X + b^{[1]}$ | $(n^{[1]}, 209)$ |
| Layer 2   | $(n^{[2]}, n^{[1]})$ | $(n^{[2]}, 1)$ | $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ | $(n^{[2]}, 209)$ |
| $\vdots$  | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| Layer L-1 | $(n^{[L-1]}, n^{[L-2]})$ | $(n^{[L-1]}, 1)$ | $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ | $(n^{[L-1]}, 209)$ |
| Layer L   | $(n^{[L]}, n^{[L-1]})$ | $(n^{[L]}, 1)$ | $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$ | $(n^{[L]}, 209)$ |
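
The shape bookkeeping in this table can be checked with a short NumPy sketch. This is a minimal example with hypothetical hidden-layer sizes; only the 12288 input features and the 209 training examples come from the table above.

import numpy as np

layer_dims = [12288, 20, 7, 5, 1]   # hypothetical layer sizes; layer_dims[0] is the input size
m = 209                             # number of training examples

A = np.random.randn(layer_dims[0], m)                              # A^[0] = X, shape (12288, 209)
for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01   # (n^[l], n^[l-1])
    b = np.zeros((layer_dims[l], 1))                               # (n^[l], 1)
    Z = np.dot(W, A) + b                                           # (n^[l], 209)
    A = np.maximum(0, Z)                                           # keep the shapes flowing forward
    print("layer", l, "- W:", W.shape, "b:", b.shape, "Z:", Z.shape)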

I. Principles

1.1 Initialize parameters

The parameters to initialize for each layer $l$ are the weights $W^{[l]}$ and the biases $b^{[l]}$.

1.2 Forward propagation

1.2.1 The linear part of forward propagation: $WX + b$

$$W = \begin{bmatrix} j & k & l\\ m & n & o \\ p & q & r \end{bmatrix}\;\;\; X = \begin{bmatrix} a & b & c\\ d & e & f \\ g & h & i \end{bmatrix} \;\;\; b = \begin{bmatrix} s \\ t \\ u \end{bmatrix}$$
$$WX + b = \begin{bmatrix} (ja + kd + lg) + s & (jb + ke + lh) + s & (jc + kf + li) + s\\ (ma + nd + og) + t & (mb + ne + oh) + t & (mc + nf + oi) + t\\ (pa + qd + rg) + u & (pb + qe + rh) + u & (pc + qf + ri) + u \end{bmatrix}$$
$$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$
where $A^{[0]} = X$.
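
As a small illustration of the linear step, the following sketch (with arbitrary random 3×3 matrices) shows that NumPy broadcasting adds the column vector b to every column of WX, exactly as written out element-wise above.

import numpy as np

np.random.seed(1)
W = np.random.randn(3, 3)
X = np.random.randn(3, 3)
b = np.random.randn(3, 1)

# b has shape (3, 1); broadcasting adds it to each of the 3 columns of W @ X
Z = np.dot(W, X) + b
print(np.allclose(Z[:, 0], np.dot(W, X[:, 0]) + b[:, 0]))  # True: the first column matches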

1.2.2 The activation part of forward propagation

Sigmoid activation: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$
ReLU activation: $A = \mathrm{ReLU}(Z) = \max(0, Z)$
$$A^{[l]} = g(Z^{[l]}) = g(W^{[l]}A^{[l-1]} + b^{[l]})$$
where $g(\cdot)$ can be either sigmoid() or relu().

1.3 Compute the cost

1.3.1 Cost function

$$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right)$$

1.4 Backward propagation

$A^{[L]}$ belongs to the output layer and is obtained from $A^{[L]} = \sigma(Z^{[L]})$. The derivative of the cost with respect to $A^{[L]}$ is $dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$:

dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
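
As a quick check of this expression, here is a minimal sketch (with made-up values a = 0.8, y = 1) that compares the analytic derivative of the per-example loss with a centered finite difference:

import numpy as np

a, y, eps = 0.8, 1.0, 1e-7
loss = lambda a: -(y * np.log(a) + (1 - y) * np.log(1 - a))   # per-example cross-entropy loss
analytic = -(y / a - (1 - y) / (1 - a))                       # the dAL formula above, for one example
numeric = (loss(a + eps) - loss(a - eps)) / (2 * eps)         # centered finite difference
print(analytic, numeric)                                      # both are about -1.25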

1.4.1 Backward propagation: activation part

$$dZ^{[L]} = \frac{\partial \mathcal{L}}{\partial Z^{[L]}} = dA^{[L]} * g'(Z^{[L]})$$
where $g'(\cdot)$ is the derivative of the activation function.

1.4.2 Backward propagation: linear part

Assume that $dZ^{[l]}$ has already been obtained:
$$dW^{[L]} = \frac{\partial \mathcal{L}}{\partial W^{[L]}} = \frac{1}{m} dZ^{[L]} A^{[L-1]T}$$
$$db^{[L]} = \frac{\partial \mathcal{L}}{\partial b^{[L]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[L](i)}$$
$$dZ^{[L-1]} = W^{[L]T} dZ^{[L]} * g'(Z^{[L-1]})$$
$$dW^{[L-1]} = \frac{1}{m} dZ^{[L-1]} A^{[L-2]T}$$
$$db^{[L-1]} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[L-1](i)}$$
$$\vdots$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g'(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[1](i)}$$
where $dA^{[L-1]} = \frac{\partial \mathcal{L}}{\partial A^{[L-1]}} = W^{[L]T} dZ^{[L]}$.

1.5 Update parameters

$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$$
$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$
where $\alpha$ is the learning rate.


II. Building the two-layer neural network

2.1 Package setup

  • numpy is the main package for scientific computing with Python.
  • matplotlib is a library to plot graphs in Python.
  • np.random.seed(1) is used to keep all the random function calls consistent, so results are reproducible.
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases import *  # test cases
from dnn_utils import sigmoid, sigmoid_backward, relu, relu_backward  # activation functions
import lr_utils
np.random.seed(1)  # set the random seed

2.2 Initialize parameters

  • The structure of the two-layer model is LINEAR -> RELU -> LINEAR -> SIGMOID
  • Initialize the weights W randomly with np.random.randn(shape) * 0.01
  • Initialize the biases b to zero with np.zeros(shape)
def initialize_parameters(n_x, n_h, n_y):
    '''
    Initialize the parameters of the two-layer network.
    :param n_x: number of nodes in the input layer
    :param n_h: number of nodes in the hidden layer
    :param n_y: number of nodes in the output layer
    :return:
        parameters: a dictionary containing
            W1: weight matrix of shape (n_h, n_x)
            W2: weight matrix of shape (n_y, n_h)
            b1: bias vector of shape (n_h, 1)
            b2: bias vector of shape (n_y, 1)
    '''
    W1 = np.random.randn(n_h, n_x) * 0.01
    W2 = np.random.randn(n_y, n_h) * 0.01
    b1 = np.zeros(shape=(n_h, 1))  # note: the shape passed to np.zeros must be a tuple
    b2 = np.zeros((n_y, 1))

    # Use assertions to make sure the shapes are correct
    assert (W1.shape == (n_h, n_x))
    assert (W2.shape == (n_y, n_h))
    assert (b1.shape == (n_h, 1))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "W2": W2,
                  "b1": b1,
                  "b2": b2}
    return parameters
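
A quick sanity check of the returned shapes, using hypothetical layer sizes n_x = 3, n_h = 2, n_y = 1:

parameters = initialize_parameters(3, 2, 1)
print(parameters["W1"].shape, parameters["b1"].shape)  # (2, 3) (2, 1)
print(parameters["W2"].shape, parameters["b2"].shape)  # (1, 2) (1, 1)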

2.3 Forward propagation functions

Forward propagation has three steps:

  • compute the linear part
  • linear part -> activation part, where the activation function is ReLU or sigmoid
  • for the full model, apply [LINEAR -> RELU] (L-1) times, followed by one [LINEAR -> SIGMOID]

2.3.1 Linear part [LINEAR]

$$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$.

  • matrix multiplication is done with np.dot(W, A)
  • W.shape can be used to check whether the matrix dimensions match
def linear_forward(A, W, b):
    '''
    Implement the linear part of forward propagation.
    :param A: activations from the previous layer (or the input data), of shape (size of previous layer, number of examples)
    :param W: weight matrix, of shape (size of current layer, size of previous layer)
    :param b: bias vector, of shape (size of current layer, 1)
    :return:
        Z: the input of the activation function, also called the pre-activation parameter
        cache: a tuple containing A, W and b, stored for computing the backward pass
    '''
    Z = np.dot(W, A) + b
    assert (Z.shape == (W.shape[0], A.shape[1]))

    cache = (A, W, b)  # cache is a tuple
    return Z, cache
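
A small usage example (hypothetical shapes: 3 units in the previous layer, 2 examples):

np.random.seed(1)
A_prev = np.random.randn(3, 2)   # activations from the previous layer
W = np.random.randn(1, 3)        # current layer has 1 unit
b = np.random.randn(1, 1)
Z, linear_cache = linear_forward(A_prev, W, b)
print(Z.shape)                   # (1, 2): one pre-activation value per example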

2.3.2 Linear-activation part [LINEAR -> ACTIVATION]

We use the following two activation functions:

  • Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$.

  • Derivative of the sigmoid function:
    $\sigma'(z) = \left(\frac{1}{1+e^{-z}}\right)' = \frac{e^{-z}}{(1+e^{-z})^{2}} = \frac{1+e^{-z}-1}{(1+e^{-z})^{2}} = \frac{1}{1+e^{-z}}\left(1-\frac{1}{1+e^{-z}}\right) = f(z)(1-f(z))$

A, activation_cache = sigmoid(Z)
def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """

    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
  • ReLU: $A = \mathrm{ReLU}(Z) = \max(0, Z)$
  • Derivative of ReLU:

$$\mathrm{ReLU}'(Z) = \begin{cases} 0 & Z \leq 0 \\ 1 & Z > 0 \end{cases}$$

A, activation_cache = relu(Z)
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """

    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)

    cache = Z 
    return A, cache

For both functions, the activation_cache is simply $Z$.

The formula used to implement the LINEAR -> ACTIVATION step is $A^{[l]} = g(Z^{[l]}) = g(W^{[l]}A^{[l-1]} + b^{[l]})$, where the activation function $g$ can be sigmoid() or relu().

def linear_activation_forward(A_prev, W, b, activation):
    '''
    Implement forward propagation for the LINEAR -> ACTIVATION layer.
    :param A_prev: activations from the previous layer (or the input layer), of shape (size of previous layer, number of examples)
    :param W: weight matrix, a numpy array of shape (size of current layer, size of previous layer)
    :param b: bias vector, a numpy array of shape (size of current layer, 1)
    :param activation: the activation to use in this layer, a string: "sigmoid" or "relu"
    :return:
        A: the output of the activation function, also called the post-activation value
        cache: a tuple containing 'linear_cache' and 'activation_cache', stored for computing the backward pass efficiently
    '''
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)  # linear_cache = (A_prev, W, b)
        A, activation_cache = sigmoid(Z)  # activation_cache = Z
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)  # linear_cache = (A_prev, W, b)
        A, activation_cache = relu(Z)  # activation_cache = Z

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)  # ((A_prev, W, b), Z)

    return A, cache
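
A small usage example (hypothetical shapes again: 3 units in the previous layer, 2 examples, 1 unit in the current layer):

np.random.seed(2)
A_prev = np.random.randn(3, 2)
W = np.random.randn(1, 3)
b = np.random.randn(1, 1)
A_sig, _ = linear_activation_forward(A_prev, W, b, "sigmoid")
A_relu, _ = linear_activation_forward(A_prev, W, b, "relu")
print(A_sig.shape, A_relu.shape)                                     # (1, 2) (1, 2)
print((0 < A_sig).all() and (A_sig < 1).all(), (A_relu >= 0).all())  # sigmoid output in (0, 1), relu output >= 0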

2.4 Compute the cost

Having completed the forward propagation of the two-layer model, we need to compute the cost (error) to determine whether the model is actually learning. The cost function is:
$$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right)$$

def compute_cost(AL, Y):
    '''
    Compute the cost function.
    :param AL: probability vector of label predictions, of shape (1, number of examples)
    :param Y: true label vector (e.g. 1 if cat, 0 if non-cat), of shape (1, number of examples)
    :return:
        cost: the cross-entropy cost
    '''
    m = Y.shape[1]  # number of examples
    cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))

    cost = np.squeeze(cost)  # make sure cost has the expected shape, e.g. turn [[17]] into 17
    assert (cost.shape == ())  # a scalar
    return cost
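
A quick usage example with made-up predictions and labels:

AL = np.array([[0.8, 0.9, 0.4]])   # hypothetical predicted probabilities
Y = np.array([[1, 1, 0]])          # hypothetical true labels
print(compute_cost(AL, Y))         # approximately 0.2798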

2.5 Backward propagation

Backward propagation is used to compute the gradients of the loss function. The forward and backward passes fit together as shown in the course's flow chart.
(figure: forward and backward propagation flow)
Backward propagation again has three steps:

  • LINEAR backward: the linear part of the backward computation
  • LINEAR -> ACTIVATION backward, where the activation step computes the derivative of ReLU or sigmoid
  • for the whole model: [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID backward

2.5.1 The linear part of backward propagation

For layer $l$, the linear part is $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$, followed by the activation. Suppose we already have the derivative $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ and want to obtain $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$; they are computed with the following three formulas:
$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$
$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]}$$

def linear_backward(dZ, cache):
    '''
    Implement the linear part of backward propagation for a single layer (layer l).
    :param dZ: gradient of the cost with respect to the linear output of the current layer l
    :param cache: tuple of values (A_prev, W, b) from the forward propagation of the current layer
    :return:
        dA_prev: gradient of the cost with respect to the activation of the previous layer (l-1), same shape as A_prev
        dW: gradient of the cost with respect to W of the current layer l, same shape as W
        db: gradient of the cost with respect to b of the current layer l, same shape as b
    '''
    A_prev, W, b = cache
    m = A_prev.shape[1]  # number of examples

    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # sum across the rows, producing a column vector
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_prev, dW, db
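
A small usage example checking the returned shapes (hypothetical layer with 1 unit, previous layer with 3 units, 4 examples):

np.random.seed(3)
dZ = np.random.randn(1, 4)
A_prev = np.random.randn(3, 4)
W = np.random.randn(1, 3)
b = np.random.randn(1, 1)
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))
print(dA_prev.shape, dW.shape, db.shape)   # (3, 4) (1, 3) (1, 1)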

2.5.2 The linear-activation part of backward propagation [LINEAR -> ACTIVATION backward]

To implement the linear-activation backward step, two backward functions are provided:

  • sigmoid_backward implements the backward propagation for a sigmoid unit: dZ = sigmoid_backward(dA, activation_cache)
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.
 
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ
  • relu_backward implements the backward propagation for a ReLU unit: dZ = relu_backward(dA, activation_cache)
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well. 
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ

If $g(\cdot)$ is the activation function, then sigmoid_backward and relu_backward compute
$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$

def linear_activation_backward(dA, cache, activation):
    '''
    Implement backward propagation for the LINEAR -> ACTIVATION layer.
    :param dA: post-activation gradient of the current layer
    :param cache: tuple of values (linear_cache, activation_cache) stored for computing the backward pass efficiently,
                  where linear_cache = (A_prev, W, b) and activation_cache = Z
    :param activation: the activation used in this layer, a string: "relu" or "sigmoid"
    :return:
        dA_prev: gradient of the cost with respect to the activation of the previous layer (l-1), same shape as A_prev
        dW: gradient of the cost with respect to W of the current layer l, same shape as W
        db: gradient of the cost with respect to b of the current layer l, same shape as b
    '''
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)  # activation_cache = Z
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
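
As an optional sanity check, the analytic gradient produced by these backward functions can be compared against a numerical gradient. The sketch below (with made-up data for a single LINEAR -> SIGMOID layer) follows the convention used in this post: dA carries no 1/m factor and linear_backward applies the 1/m, so dW should match the derivative of compute_cost.

np.random.seed(2)
A_prev = np.random.randn(3, 5)                   # hypothetical previous activations
W = np.random.randn(1, 3) * 0.01
b = np.zeros((1, 1))
Y = (np.random.rand(1, 5) > 0.5).astype(float)   # hypothetical labels

# Analytic gradient via the functions defined above
A, cache = linear_activation_forward(A_prev, W, b, "sigmoid")
dA = -(np.divide(Y, A) - np.divide(1 - Y, 1 - A))
_, dW, _ = linear_activation_backward(dA, cache, "sigmoid")

# Centered finite difference of the cost with respect to W[0, 0]
eps = 1e-7
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
cost_plus = compute_cost(linear_activation_forward(A_prev, W_plus, b, "sigmoid")[0], Y)
cost_minus = compute_cost(linear_activation_forward(A_prev, W_minus, b, "sigmoid")[0], Y)
print(dW[0, 0], (cost_plus - cost_minus) / (2 * eps))   # the two values should agree closely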

2.6 Update parameters

Once forward and backward propagation are complete, we update the parameters $W^{[l]}$ and $b^{[l]}$ for $l = 1, 2, ..., L$:
$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$$
$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$

where $\alpha$ is the learning rate.

def update_parameters(parameters, grads, learning_rate):
    '''
    Update the parameters using gradient descent.
    :param parameters: dictionary containing the parameters "W1", "b1", ..., "WL", "bL"
    :param grads: dictionary containing the gradients "dW1", "db1", ..., "dWL", "dbL"
    :param learning_rate: the learning rate
    :return:
        parameters: dictionary containing the updated parameters
            parameters["W" + str(l)] = ...
            parameters["b" + str(l)] = ...
    '''
    L = len(parameters) // 2  # number of layers (integer division)
    for l in range(L):  # l runs from 0 to L-1, so add 1 below
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

    return parameters
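
A tiny usage example with hypothetical one-element parameters and gradients:

params = {"W1": np.array([[1.0]]), "b1": np.array([[0.0]]),
          "W2": np.array([[2.0]]), "b2": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1]]), "db1": np.array([[0.2]]),
         "dW2": np.array([[0.3]]), "db2": np.array([[0.4]])}
params = update_parameters(params, grads, learning_rate=0.1)
print(params["W1"], params["b2"])   # [[0.99]] [[0.46]]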

III. Applying the two-layer neural network

We will build a two-layer neural network that recognizes whether an image is a cat.
(figure: two-layer network architecture)
The model can be summarized as: INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT

  • an image of shape (64, 64, 3) is flattened into a vector of shape (12288, 1)
  • the input $[x_0, x_1, ..., x_{12287}]^T$ is multiplied by the weight matrix $W^{[1]}$ of shape $(n^{[1]}, 12288)$
  • after adding the bias $b^{[1]}$ and applying ReLU, we obtain $A^{[1]} = [a_0^{[1]}, a_1^{[1]}, ..., a_{n^{[1]}-1}^{[1]}]^T$
  • $A^{[1]}$ is multiplied by the weight matrix $W^{[2]}$ of shape $(1, n^{[1]})$, and the bias $b^{[2]}$ of shape (1, 1) is added
  • finally the sigmoid activation is applied; if the result is greater than 0.5, the image is classified as cat, otherwise as non-cat

3.1 Prepare the data

We have a dataset "data.h5" consisting of a training set "train_catvnoncat.h5" and a test set "test_catvnoncat.h5":

  • a training set of m_train examples labeled 0 (non-cat) or 1 (cat)
  • a test set of m_test examples labeled 0 (non-cat) or 1 (cat)
  • each image has shape (num_px, num_px, 3), where 3 is the number of RGB channels

3.1.1 Load the data

  • number of training examples: 209
  • number of test examples: 50
  • size of each image: (64, 64, 3)
  • train_x_orig shape: (209, 64, 64, 3)
  • train_y shape: (1, 209)
  • test_x_orig shape: (50, 64, 64, 3)
  • test_y shape: (1, 50)
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")  # read the training set
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # training set features, shape (m_train(209), num_px, num_px, 3)
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # training set labels, shape (m_train(209),)

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")  # read the test set
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features, shape (m_test(50), num_px, num_px, 3)
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels, shape (m_test(50),)

    classes = np.array(test_dataset["list_classes"][:])  # numpy array of strings containing 'cat' and 'non-cat'

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))  # reshape to (1, m_train(209))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))  # reshape to (1, m_test(50))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

3.1.2 Standardize the data

Typically, before feeding the data into the neural network, we reshape the images and standardize them.
(figure: flattening an image into a column vector)

  • train_x's shape: (12288, 209)
  • test_x's shape: (12288, 50)
  • 12288 = 64 * 64 * 3, which is exactly the size of one flattened image vector
# Load the data
train_x_orig, train_y, test_x_orig, test_y, classes = load_dataset()
# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T  # -1 lets numpy infer that dimension; note the transpose
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T
# Standardize the data so that values lie between 0 and 1
train_x = train_x_flatten / 255
test_x = test_x_flatten / 255
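
A quick check (assuming the dataset files are available locally) that the reshaped, normalized data matches the shapes listed above:

print("train_x's shape:", train_x.shape)                        # (12288, 209)
print("test_x's shape:", test_x.shape)                          # (12288, 50)
print("pixel value range:", train_x.min(), "-", train_x.max())  # values lie in [0, 1]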

3.2 Implement the two-layer neural network model

We assemble the two-layer model from the functions written above; their inputs and return values are as follows:

def initialize_parameters(n_x, n_h, n_y):
    ...
    # Initialize the parameters of the two-layer network
    return parameters
def linear_activation_forward(A_prev, W, b, activation):
    ...
    # Forward propagation for the LINEAR -> ACTIVATION layer
    return A, cache
def compute_cost(AL, Y):
    ...
    # Compute the cost
    return cost
def linear_activation_backward(dA, cache, activation):
    ...
    # Backward propagation for the LINEAR -> ACTIVATION layer
    return dA_prev, dW, db
def update_parameters(parameters, grads, learning_rate):
    ...
    # Update the parameters
    return parameters

3.2.1 The two-layer neural network model

# Build the two-layer neural network
def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False, isPlot=True):
    '''
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.
    :param X: input data, of shape (n_x, number of examples)
    :param Y: true "label" vector (1 if cat, 0 if non-cat), of shape (1, number of examples)
    :param layers_dims: dimensions of the layers (n_x, n_h, n_y)
    :param learning_rate: learning rate of the gradient descent update rule
    :param num_iterations: number of iterations of the optimization loop
    :param print_cost: if set to True, print the cost every 100 iterations
    :param isPlot: if set to True, plot the cost curve after training
    :return:
        parameters: a dictionary containing W1, W2, b1, and b2
    '''

    np.random.seed(1)
    grads = {}
    costs = []  # to keep track of the cost
    m = X.shape[1]  # number of examples
    (n_x, n_h, n_y) = layers_dims
    # Initialize the parameters of the two-layer network
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
        # Compute the cost
        cost = compute_cost(A2, Y)

        # Initialize backward propagation: compute dA2
        dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
            costs.append(cost)

    # After the loop, plot the cost curve if requested
    if isPlot:
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per hundreds)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

    # Return the parameters
    return parameters

3.3 Train the model

# With the data loaded, start training the two-layer network
n_x = 12288
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)
parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y), num_iterations=2500, print_cost=True,
                             isPlot=True)

3.3.1 Training output

Cost after iteration 0: 0.693049735659989
Cost after iteration 100: 0.6464320953428849
Cost after iteration 200: 0.6325140647912677
Cost after iteration 300: 0.6015024920354665
Cost after iteration 400: 0.5601966311605747
Cost after iteration 500: 0.515830477276473
Cost after iteration 600: 0.47549013139433266

:
:

Cost after iteration 2000: 0.07439078704319078
Cost after iteration 2100: 0.06630748132267926
Cost after iteration 2200: 0.059193295010381654
Cost after iteration 2300: 0.05336140348560552
Cost after iteration 2400: 0.04855478562877014

(figure: the cost curve plotted against the number of iterations)

3.3.2 Analysis of the results

After the code above finishes running, the model is trained, but we still need to check how well it actually performs.

  • first, use the model to predict on the training set to see how well it fits the training data
  • then predict on the test set and check the accuracy

The prediction function is as follows:

def predict(X, y, parameters):
    """
    Predict the results of the two-layer neural network.
    Arguments:
     X - the dataset to predict on
     y - the true labels
     parameters - the parameters of the trained model
    Returns:
     p - the predictions for the given dataset X
    """
    m = X.shape[1]
    n = len(parameters) // 2  # number of layers in the neural network
    p = np.zeros((1, m))

    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    # Forward propagation with the trained parameters
    A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
    A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
    probas = A2
    for i in range(0, probas.shape[1]):  # loop over the examples
        if probas[0, i] > 0.5:
            p[0, i] = 1
        else:
            p[0, i] = 0

    print("Accuracy: " + str(float(np.sum((p == y)) / m)))
    return p

3.4 Make predictions

pred_train = predict(train_x, train_y, parameters)  # training set
pred_test = predict(test_x, test_y, parameters)  # test set

3.4.1 Prediction results

Accuracy: 1.0
Accuracy: 0.72

IV. Complete code

import numpy as np
import h5py
import matplotlib.pyplot as plt

np.random.seed(1)  # set the random seed


def initialize_parameters(n_x, n_h, n_y):
    '''
    Initialize the parameters of the two-layer network.
    :param n_x: number of nodes in the input layer
    :param n_h: number of nodes in the hidden layer
    :param n_y: number of nodes in the output layer
    :return:
        parameters: a dictionary containing
            W1: weight matrix of shape (n_h, n_x)
            W2: weight matrix of shape (n_y, n_h)
            b1: bias vector of shape (n_h, 1)
            b2: bias vector of shape (n_y, 1)
    '''
    W1 = np.random.randn(n_h, n_x) * 0.01
    W2 = np.random.randn(n_y, n_h) * 0.01
    b1 = np.zeros(shape=(n_h, 1))  # note: the shape passed to np.zeros must be a tuple
    b2 = np.zeros((n_y, 1))

    # Use assertions to make sure the shapes are correct
    assert (W1.shape == (n_h, n_x))
    assert (W2.shape == (n_y, n_h))
    assert (b1.shape == (n_h, 1))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "W2": W2,
                  "b1": b1,
                  "b2": b2}
    return parameters


def linear_forward(A, W, b):
    '''
    Implement the linear part of forward propagation.
    :param A: activations from the previous layer (or the input data), of shape (size of previous layer, number of examples)
    :param W: weight matrix, of shape (size of current layer, size of previous layer)
    :param b: bias vector, of shape (size of current layer, 1)
    :return:
        Z: the input of the activation function, also called the pre-activation parameter
        cache: a tuple containing A, W and b, stored for computing the backward pass
    '''
    Z = np.dot(W, A) + b
    assert (Z.shape == (W.shape[0], A.shape[1]))

    cache = (A, W, b)  # cache is a tuple
    return Z, cache


def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """

    A = 1 / (1 + np.exp(-Z))
    cache = Z

    return A, cache


def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """

    A = np.maximum(0, Z)

    assert (A.shape == Z.shape)

    cache = Z
    return A, cache


def linear_activation_forward(A_prev, W, b, activation):
    '''
    Implement forward propagation for the LINEAR -> ACTIVATION layer.
    :param A_prev: activations from the previous layer (or the input layer), of shape (size of previous layer, number of examples)
    :param W: weight matrix, a numpy array of shape (size of current layer, size of previous layer)
    :param b: bias vector, a numpy array of shape (size of current layer, 1)
    :param activation: the activation to use in this layer, a string: "sigmoid" or "relu"
    :return:
        A: the output of the activation function, also called the post-activation value
        cache: a tuple containing 'linear_cache' and 'activation_cache', stored for computing the backward pass efficiently
    '''
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)  # linear_cache = (A_prev, W, b)
        A, activation_cache = sigmoid(Z)  # activation_cache = Z
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)  # linear_cache = (A_prev, W, b)
        A, activation_cache = relu(Z)  # activation_cache = Z

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)  # ((A_prev, W, b), Z)

    return A, cache


def compute_cost(AL, Y):
    '''
    Compute the cost function.
    :param AL: probability vector of label predictions, of shape (1, number of examples)
    :param Y: true label vector (e.g. 1 if cat, 0 if non-cat), of shape (1, number of examples)
    :return:
        cost: the cross-entropy cost
    '''
    m = Y.shape[1]  # number of examples
    cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))

    cost = np.squeeze(cost)  # make sure cost has the expected shape, e.g. turn [[17]] into 17
    assert (cost.shape == ())  # a scalar
    return cost


def linear_backward(dZ, cache):
    '''
    Implement the linear part of backward propagation for a single layer (layer l).
    :param dZ: gradient of the cost with respect to the linear output of the current layer l
    :param cache: tuple of values (A_prev, W, b) from the forward propagation of the current layer
    :return:
        dA_prev: gradient of the cost with respect to the activation of the previous layer (l-1), same shape as A_prev
        dW: gradient of the cost with respect to W of the current layer l, same shape as W
        db: gradient of the cost with respect to b of the current layer l, same shape as b
    '''
    A_prev, W, b = cache
    m = A_prev.shape[1]  # number of examples

    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # sum across the rows, producing a column vector
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_prev, dW, db


def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)

    assert (dZ.shape == Z.shape)

    return dZ


def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ


def linear_activation_backward(dA, cache, activation):
    '''
    Implement backward propagation for the LINEAR -> ACTIVATION layer.
    :param dA: post-activation gradient of the current layer
    :param cache: tuple of values (linear_cache, activation_cache) stored for computing the backward pass efficiently,
                  where linear_cache = (A_prev, W, b) and activation_cache = Z
    :param activation: the activation used in this layer, a string: "relu" or "sigmoid"
    :return:
        dA_prev: gradient of the cost with respect to the activation of the previous layer (l-1), same shape as A_prev
        dW: gradient of the cost with respect to W of the current layer l, same shape as W
        db: gradient of the cost with respect to b of the current layer l, same shape as b
    '''
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)  # activation_cache = Z
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db


def update_parameters(parameters, grads, learning_rate):
    '''
    Update the parameters using gradient descent.
    :param parameters: dictionary containing the parameters "W1", "b1", ..., "WL", "bL"
    :param grads: dictionary containing the gradients "dW1", "db1", ..., "dWL", "dbL"
    :param learning_rate: the learning rate
    :return:
        parameters: dictionary containing the updated parameters
            parameters["W" + str(l)] = ...
            parameters["b" + str(l)] = ...
    '''
    L = len(parameters) // 2  # number of layers (integer division)
    for l in range(L):  # l runs from 0 to L-1, so add 1 below
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

    return parameters


def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")  # read the training set
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # training set features, shape (m_train(209), num_px, num_px, 3)
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # training set labels, shape (m_train(209),)

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")  # read the test set
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features, shape (m_test(50), num_px, num_px, 3)
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels, shape (m_test(50),)

    classes = np.array(test_dataset["list_classes"][:])  # numpy array of strings containing 'cat' and 'non-cat'

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))  # reshape to (1, m_train(209))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))  # reshape to (1, m_test(50))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes


# Build the two-layer neural network
def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=True, isPlot=True):
    '''
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.
    :param X: input data, of shape (n_x, number of examples)
    :param Y: true "label" vector (1 if cat, 0 if non-cat), of shape (1, number of examples)
    :param layers_dims: dimensions of the layers (n_x, n_h, n_y)
    :param learning_rate: learning rate of the gradient descent update rule
    :param num_iterations: number of iterations of the optimization loop
    :param print_cost: if set to True, print the cost every 100 iterations
    :param isPlot: if set to True, plot the cost curve after training
    :return:
        parameters: a dictionary containing W1, W2, b1, and b2
    '''

    np.random.seed(1)
    grads = {}
    costs = []  # to keep track of the cost
    m = X.shape[1]  # number of examples
    (n_x, n_h, n_y) = layers_dims
    # Initialize the parameters of the two-layer network
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
        # Compute the cost
        cost = compute_cost(A2, Y)

        # Initialize backward propagation: compute dA2
        dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
            costs.append(cost)

    # After the loop, plot the cost curve if requested
    if isPlot:
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per hundreds)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

    # Return the parameters
    return parameters

def predict(X, y, parameters):
    """
    Predict the results of the two-layer neural network.
    Arguments:
     X - the dataset to predict on
     y - the true labels
     parameters - the parameters of the trained model
    Returns:
     p - the predictions for the given dataset X
    """
    m = X.shape[1]
    n = len(parameters) // 2  # number of layers in the neural network
    p = np.zeros((1, m))

    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    # Forward propagation with the trained parameters
    A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
    A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")
    probas = A2
    for i in range(0, probas.shape[1]):  # loop over the examples
        if probas[0, i] > 0.5:
            p[0, i] = 1
        else:
            p[0, i] = 0

    print("Accuracy: " + str(float(np.sum((p == y)) / m)))
    return p

# Load the data
train_x_orig, train_y, test_x_orig, test_y, classes = load_dataset()
# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T  # -1 lets numpy infer that dimension; note the transpose
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T
# Standardize the data so that values lie between 0 and 1
train_x = train_x_flatten / 255
test_x = test_x_flatten / 255

# With the data loaded, start training the two-layer network
n_x = 12288
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)
parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y), num_iterations=3000, print_cost=True,
                             isPlot=True)





# Make predictions
pred_train = predict(train_x, train_y, parameters)  # training set
pred_test = predict(test_x, test_y, parameters)  # test set
