Andrew Ng Deep Learning Course Exercises (1-3)

programDH

Last modified 2024-04-05 15:33:25

Tags: python, programming languages

First published 2024-03-28 14:21:28

Article link: https://blog.csdn.net/programDH/article/details/137110018

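Code from working through the assignments; updates are ongoing.

The assignments follow the repository linked below, which also provides the datasets and other resources: GitHub - AccumulateMore/CV: ✔（已完结）最全面的深度学习笔记【土堆 Pytorch】【李沐动手学深度学习】【吴恩达深度学习】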

hw1

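Writing the sigmoid function

The sigmoid function is defined as sigmoid(x) = 1 / (1 + e^(-x)).

Use np.exp(-x) rather than math.exp(-x): math.exp only accepts a scalar and raises an error when given a vector, whereas np.exp works element-wise on arrays.

Note: arrays are created with np.array([...]). To hold several row vectors, nest the lists: np.array([[1,2,3],[4,5,6],[7,8,9]]).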

 import math
 import numpy as np
 
 def sigmoid(x):
     result = 1 / (1 + np.exp(-x))
     return result
 
 print(sigmoid(np.array([[1,2,3],[4,5,6],[7,8,9]])))
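To make the difference between the two exp functions concrete, here is a minimal sketch that calls both on the same input. The names sigmoid_math and sigmoid_np are hypothetical and used only for this comparison.

```python
import math
import numpy as np

def sigmoid_math(x):
    # Works only for a single number: math.exp expects a scalar.
    return 1 / (1 + math.exp(-x))

def sigmoid_np(x):
    # Works for scalars and for arrays of any shape (element-wise).
    return 1 / (1 + np.exp(-x))

print(sigmoid_math(0))
print(sigmoid_np(np.array([0.0, 1.0, 2.0])))

try:
    sigmoid_math(np.array([0.0, 1.0, 2.0]))
    math_failed = False
except TypeError as err:
    # math.exp cannot convert a multi-element array to a Python float.
    math_failed = True
    print("math.exp failed on an array:", err)
```

The scalar call succeeds with either version; only the array call separates them.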

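Creating the sigmoid_grad function

This computes the gradient of the sigmoid function with respect to its input x: sigmoid_derivative(x) = σ'(x) = σ(x)(1 - σ(x))

 import numpy as np
 
 def sigmoid(x):
     return 1 / (1 + np.exp(-x))
 
 def sigmoid_derivative(x):
     # σ'(x) = σ(x)(1 - σ(x)), evaluated element-wise
     s = sigmoid(x)
     return s * (1 - s)
 
 result = sigmoid_derivative(np.array([1,2,3]))
 print(result)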
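One way to check the formula σ'(x) = σ(x)(1 - σ(x)) is to compare it with a numerical derivative. The sketch below uses a central difference; the step size eps is an arbitrary choice for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # small step for the central difference (an arbitrary choice)

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

print("analytic:", analytic)
print("numeric: ", numeric)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("largest difference:", max_abs_diff)
```

If the two results disagree by more than rounding noise, the analytic formula or its implementation is likely wrong.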

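Implementing image2vector()

The function takes an input of shape (length, height, depth) and returns a vector of shape (length * height * depth, 1); in other words, it flattens a multi-dimensional array into a single column vector.

The shape attribute gives the size of the array along each dimension. For example, (3,3,2) means there are three elements, each of which is a 3*2 matrix.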

 import numpy as np
 
 def image2vector(image):
     """
     Argument:
     image -- a numpy array of shape (length, height, depth)
     
     Returns:
     v -- a vector of shape (length*height*depth, 1)
     """
     v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)      
     
     return v
 
 image = np.array([[[ 0.67826139,  0.29380381],
         [ 0.90714982,  0.52835647],
         [ 0.4215251 ,  0.45017551]],
 
        [[ 0.92814219,  0.96677647],
         [ 0.85304703,  0.52351845],
         [ 0.19981397,  0.27417313]],
 
        [[ 0.60659855,  0.00533165],
         [ 0.10820313,  0.49978937],
         [ 0.34144279,  0.94630077]]])
 
 print("image2vector(image) = " + str(image2vector(image)))
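The product of the three dimensions does not have to be written out by hand: passing -1 to reshape lets NumPy infer it. A small sketch, where image2vector_auto is a hypothetical name and the array values are arbitrary:

```python
import numpy as np

def image2vector_auto(image):
    # -1 asks NumPy to infer that dimension from the total number of elements.
    return image.reshape(-1, 1)

# Small stand-in array of shape (3, 3, 2); the values are arbitrary.
demo = np.arange(18).reshape(3, 3, 2)

explicit = demo.reshape(demo.shape[0] * demo.shape[1] * demo.shape[2], 1)
inferred = image2vector_auto(demo)

print(inferred.shape)
print(np.array_equal(explicit, inferred))
```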

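Implementing normalizeRows() to normalize the rows of a matrix

After this function is applied to an input matrix x, every row of x should be a vector of unit length (length 1).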

 import numpy as np
 
 def normalizeRows(x):
     
     x_norm = np.linalg.norm(x, axis = 1, keepdims = True)
     
     return x/x_norm
 
 x = np.array([
     [0, 3, 4],
     [1, 6, 4]])
 print("normalizeRows(x) = " + str(normalizeRows(x)))
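The following sketch checks the unit-length claim on the same example matrix and shows why keepdims = True matters for the division. The variable names are illustrative only.

```python
import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])

# keepdims=True keeps the result as shape (2, 1) instead of (2,),
# so the division below broadcasts row by row.
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
normalized = x / x_norm

print(x_norm.shape)
# Length of each row after normalization.
row_lengths = np.linalg.norm(normalized, axis=1)
print(row_lengths)
```

Without keepdims, x_norm would have shape (2,), and dividing a (2, 3) matrix by it would fail to broadcast.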

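The softmax function: the model's predicted probability for each class

The softmax function takes a vector (a set of raw prediction scores) and outputs a new vector of the same length in which every element lies between 0 and 1 and all elements sum to 1. Each element of the output can therefore be read as the model's predicted probability for one class.

For a row vector x, softmax is defined as softmax(x)_i = e^(x_i) / Σ_j e^(x_j); for a matrix it is applied to each row separately: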

 def softmax(x):
     """Calculates the softmax for each row of the input x.
 
     Your code should work for a row vector and also for matrices of shape (n, m).
 
     Argument:
     x -- A numpy matrix of shape (n,m)
 
     Returns:
     s -- A numpy matrix equal to the softmax of x, of shape (n,m)
     """
     x_exp = np.exp(x)
     x_sum = np.sum(x_exp, axis = 1, keepdims = True)
     
     print(x_exp.shape)
     print(x_sum.shape)
     
     s = x_exp / x_sum
     
     return s
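The version above calls np.exp on the raw inputs, which can overflow for large values. A common variant, not part of the original assignment, subtracts each row's maximum first; the result is mathematically the same. A sketch, with softmax_stable as a hypothetical name:

```python
import numpy as np

def softmax_stable(x):
    # Subtracting each row's maximum leaves the result unchanged
    # but keeps np.exp from overflowing on large inputs.
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

small = np.array([[9.0, 2.0, 5.0, 0.0, 0.0],
                  [7.0, 5.0, 0.0, 0.0, 0.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])  # np.exp(1000) alone would overflow

probs_small = softmax_stable(small)
probs_large = softmax_stable(large)
print(probs_small.sum(axis=1))
print(probs_large)
```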

hw2

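This assignment determines whether an input image shows a cat. The steps are as follows.

The code below is run in a Jupyter environment.

Importing the required packages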

 import numpy as np
 import matplotlib.pyplot as plt
 from lr_utils import load_dataset
 
 %matplotlib inline

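Loading the dataset

Create the Jupyter notebook inside the directory of the downloaded dataset.

 train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
 # load_dataset is defined in wed_dataseet/wed/lr_utils.py, where you can see how the data is loaded. Its code is: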

 import numpy as np
 import h5py
     
     
 def load_dataset():
     train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
     train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # read every value stored under the key "train_set_x" and convert it to a numpy array
     train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # likewise for the training labels
 
     test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
     test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # read every value stored under the key "test_set_x" and convert it to a numpy array
     test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # likewise for the test labels
 
     classes = np.array(test_dataset["list_classes"][:]) # class names saying whether a picture is a cat: 'non-cat' or 'cat'
     
     train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0])) # reshape train_set_y_orig from (m,) into a row vector of shape (1, m)
     test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
     
     return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

Viewing images from the training and test sets

 index = 5
 plt.imshow(train_set_x_orig[index]) # the image data to display; indexing starts at 0, so index = 5 shows the sixth image
 print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")  # y = [0] means the image is not a cat, y = [1] means it is a cat

Checking the number of images in the dataset and the height and width of each image

 m_train = train_set_x_orig.shape[0]   
 m_test = test_set_x_orig.shape[0]     
 num_px = train_set_x_orig.shape[1]    
 num_py = train_set_x_orig.shape[2] 
 # shape[0] is the number of images in the dataset
 # shape[1] is the number of rows in each image (its height)
 # shape[2] is the number of columns in each image (its width)
 # print can be used to inspect these values, for example:
 print ("Number of training examples: m_train = " + str(m_train))
 print ("Number of testing examples: m_test = " + str(m_test))
 print ("Height of each image: num_px = " + str(num_px))
 print ("Width of each image: num_py = " + str(num_py))
 
 '''Output:
 Number of training examples: m_train = 209
 Number of testing examples: m_test = 50
 Height of each image: num_px = 64
 Width of each image: num_py = 64
 '''

Reshaping the dimensions

The current shapes of the arrays can be printed for inspection:

 print ("train_set_x shape: " + str(train_set_x_orig.shape))
 print ("train_set_y shape: " + str(train_set_y.shape)) 
 print ("test_set_x shape: " + str(test_set_x_orig.shape))
 print ("test_set_y shape: " + str(test_set_y.shape))
 
 '''Output:
 train_set_x shape: (209, 64, 64, 3)
 train_set_y shape: (1, 209)
 test_set_x shape: (50, 64, 64, 3)
 test_set_y shape: (1, 50)
 '''

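Each sample is flattened into a column vector, and the column vectors are placed side by side to form a matrix, which turns the 4-dimensional image array into a 2-dimensional one.

 # train_set_x_orig has shape (m, num_px, num_py, 3), where m is the number of samples. reshape(m, -1) turns it into a 2-D array of shape (m, num_px * num_py * 3), and .T then transposes that array so that each column is one sample.
 # Flatten and transpose the training set.
 train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
 # Flatten and transpose the test set.
 test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T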
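To see what reshape(m, -1).T does, it helps to try it on an array small enough to read. The sketch below uses a made-up "dataset" of 2 images of 2 x 2 pixels with 3 channels; the values are just consecutive integers.

```python
import numpy as np

# Hypothetical tiny dataset: 2 images, each 2 x 2 pixels with 3 channels.
images = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

flat = images.reshape(images.shape[0], -1)   # shape (2, 12): one row per image
flat_T = flat.T                              # shape (12, 2): one column per image

print(flat.shape, flat_T.shape)
# Column 0 holds every pixel value of image 0, in order.
print(flat_T[:, 0])
```

Keeping the sample dimension first in reshape and transposing afterwards is what guarantees that each column contains the pixels of exactly one image.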

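Checking the shapes after reshaping

 print ("Training set shape after flattening: " + str(train_set_x_flatten.shape))
 print ("Training label shape: " + str(train_set_y.shape))
 print ("Test set shape after flattening: " + str(test_set_x_flatten.shape))
 print ("Test label shape: " + str(test_set_y.shape))
 '''Output:
 Training set shape after flattening: (12288, 209)   the first number is how many pixel values one image has (64 * 64 * 3), the second means there are 209 images
 Training label shape: (1, 209)
 Test set shape after flattening: (12288, 50)
 Test label shape: (1, 50)
 '''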

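Standardizing the dataset

For image data, every value in the dataset can simply be divided by 255, the maximum value of a pixel channel. No RGB value exceeds 255, so after the division all values lie in [0, 1].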

 train_set_x = train_set_x_flatten/255
 test_set_x = test_set_x_flatten/255

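Summary: steps for preparing the dataset

1. Find the size and shape of the raw data.
2. Reshape the dataset so that each example is a vector of size (num_px x num_py x 3, 1).
3. Standardize the dataset (here, by dividing by 255 so that every value lies in [0, 1]).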

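Building the neural network (key section!)

Main steps for building a neural network

1. Define the model structure (for example, the number of input features)
2. Initialize the model's parameters
3. Loop:

3.1 Compute the current loss (forward propagation)

3.2 Compute the current gradient (backward propagation)

3.3 Update the parameters (gradient descent)

Creating the sigmoid function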

 def sigmoid(z):
     """
     Arguments:
         z  - a scalar or numpy array of any size.
     
     Returns:
         s  - sigmoid(z)
     """
     
     s = 1 / (1 + np.exp(-z))    
     return s

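Building initialize_with_zeros(): initializing the parameters w and b

 def initialize_with_zeros(dim):
     """
         Creates a zero vector of shape (dim, 1) for w and initializes b to 0.
         Arguments:
             dim  - the size of the w vector we want (the number of parameters in this case)
         
         Returns:
             w  - initialized vector of shape (dim, 1)
             b  - initialized scalar (the bias)
     """
     w = np.zeros((dim, 1))
     b = 0
     
     # assert that w has shape (dim, 1)
     assert(w.shape == (dim, 1)) 
     # assert that b is a float or an int
     assert(isinstance(b, float) or isinstance(b, int)) 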
     
     return w, b
 
 dim = 2
 w, b = initialize_with_zeros(dim)
 print ("w = " + str(w))
 print ("b = " + str(b))

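Building the propagate function: it computes the cost function and its gradient. Forward and backward propagation are computed as in the code that follows (computing the gradient means computing dw and db).

 def propagate(w, b, X, Y):
     """
     Implements forward and backward propagation, computing the cost function and its gradient.
     Arguments:
         w  - weights, a numpy array of shape (num_px * num_px * 3, 1)
         b  - bias, a scalar
         X  - data matrix of shape (num_px * num_px * 3, number of training examples)
         Y  - vector of true labels (0 if non-cat, 1 if cat), shape (1, number of training examples)
     Returns:
         cost- negative log-likelihood cost for logistic regression
         dw  - gradient of the cost with respect to w, so the same shape as w
         db  - gradient of the cost with respect to b, so the same shape as b
     """
     m = X.shape[1]
 
     # forward propagation
     A = sigmoid(np.dot(w.T, X) + b)
     # cost function
     cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
 
     # backward propagation (compute dw and db)
     dw = 1 / m * np.dot(X, (A - Y).T)
     db = 1 / m * np.sum(A - Y)
 
     # use assert to check that the shapes and types are as expected
     assert (dw.shape == w.shape)
     assert (db.dtype == float)
     cost = np.squeeze(cost) # remove any dimensions of length 1 from cost
     assert (cost.shape == ())
 
     # store dw and db in a dictionary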
     grads = {"dw": dw, "db": db}
 
     return grads, cost
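A numerical gradient check is a simple way to gain confidence in the dw and db formulas. The sketch below repeats the same formulas in a helper with the hypothetical name cost_and_grads, so that it runs on its own, and compares them with central differences. The data and the step size eps are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    # Same formulas as propagate(), repeated so this block runs alone.
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return cost, dw, db

# Made-up data: 2 features, 3 examples.
w = np.array([[0.1], [-0.2]])
b = 0.3
X = np.array([[1.0, 2.0, -1.0], [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])

cost, dw, db = cost_and_grads(w, b, X, Y)

eps = 1e-5  # step for the central difference (an arbitrary choice)
numeric_dw = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = eps
    cost_plus, _, _ = cost_and_grads(w + step, b, X, Y)
    cost_minus, _, _ = cost_and_grads(w - step, b, X, Y)
    numeric_dw[i, 0] = (cost_plus - cost_minus) / (2 * eps)

cost_plus, _, _ = cost_and_grads(w, b + eps, X, Y)
cost_minus, _, _ = cost_and_grads(w, b - eps, X, Y)
numeric_db = (cost_plus - cost_minus) / (2 * eps)

print("dw analytic:", dw.ravel(), "numeric:", numeric_dw.ravel())
print("db analytic:", db, "numeric:", numeric_db)
```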

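The parameters w and b are updated with w = w - α * dw and b = b - α * db, where α is the learning rate.

 def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
     """
     Optimizes w and b by running the gradient descent algorithm.
     
     Arguments:
         w  - weights, a numpy array of shape (num_px * num_px * 3, 1)
         b  - bias, a scalar
         X  - array of shape (num_px * num_px * 3, number of training examples)
         Y  - vector of true labels (0 if non-cat, 1 if cat), shape (1, number of training examples)
         num_iterations  - number of iterations of the optimization loop
         learning_rate  - learning rate of the gradient descent update rule
         print_cost  - if True, print the cost every 100 steps
     Returns:
         params  - dictionary containing the weights w and the bias b
         grads  - dictionary containing the gradients of the cost with respect to the weights and the bias
         costs - list of the costs recorded every 100 iterations during optimization, used to plot the learning curve.
     
     Tips:
     Two steps need to be written and iterated over:
         1) Compute the cost and the gradient for the current parameters using propagate().
         2) Update the parameters using the gradient descent rule for w and b.
     """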
 
     costs = []
 
     for i in range(num_iterations):
 
         grads, cost = propagate(w, b, X, Y)
 
         dw = grads["dw"]
         db = grads["db"]
 
         # update w and b using w = w - α * dw and b = b - α * db (α is the learning rate)
         w = w - learning_rate * dw
         b = b - learning_rate * db
 
         # every 100 iterations, append cost to the costs list and optionally print it
         if i % 100 == 0:
             costs.append(cost)
         if print_cost and i % 100 == 0:
             print("Cost after iteration %i: %f" % (i, cost))
 
     params = {"w": w, "b": b}
 
     grads = {"dw": dw, "db": db}
 
     return params, grads, costs

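The prediction function. Inside it, w = w.reshape(X.shape[0], 1) reshapes w according to the following rules:

The shape of the weight vector w is determined by the input feature matrix X. There are two cases:

Weights as a column vector: if the weight vector is written as a column vector, its length must equal the number of input features. So if the input X has shape (n, m), the weight vector w should have shape (n, 1).
Weights as a row vector: if the weight vector is written as a row vector, its length must likewise equal the number of input features, so its shape should be (1, n). In this case np.transpose() can be used to convert a column vector into a row vector.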

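 def predict(w, b, X):
     """
     Predicts whether each label is 0 or 1 using the learned logistic regression parameters (w, b).
     
     Arguments:
         w  - weights, a numpy array of shape (num_px * num_px * 3, 1)
         b  - bias, a scalar
         X  - data of shape (num_px * num_px * 3, number of examples)
     
     Returns:
         Y_prediction  - a numpy array containing the predictions (0 or 1) for all images in X
     
     """
 
     # number of images
     m = X.shape[1] # shape[1] is the number of columns of X
     Y_prediction = np.zeros((1, m))
     w = w.reshape(X.shape[0], 1) # shape[0] is the number of rows of X
 
     # compute the probability that each image contains a cat; sigmoid returns a value between 0 and 1
     A = sigmoid(np.dot(w.T, X) + b)
 
     for i in range(A.shape[1]):
         # convert the probability A[0, i] into the actual prediction Y_prediction[0, i]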
         if A[0, i] <= 0.5:
             Y_prediction[0, i] = 0
         else:
             Y_prediction[0, i] = 1
 
     assert (Y_prediction.shape == (1, m))
 
     return Y_prediction
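The thresholding loop can also be written as a single vectorized comparison. The sketch below, on made-up probabilities, shows that both forms give the same result; it is an alternative for illustration, not what the original code does.

```python
import numpy as np

# Made-up probabilities of shape (1, m), as sigmoid would return.
A = np.array([[0.1, 0.5, 0.50001, 0.9]])

# Loop version, mirroring the rule used in predict().
loop_pred = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    loop_pred[0, i] = 0 if A[0, i] <= 0.5 else 1

# Vectorized version: the comparison gives booleans,
# and astype(float) turns them into 0.0 / 1.0.
vec_pred = (A > 0.5).astype(float)

print(loop_pred)
print(vec_pred)
```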

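Assembling the model

 def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
     """
     Builds the logistic regression model by calling the functions implemented above.
     Arguments:
         X_train  - training set, a numpy array of shape (num_px * num_px * 3, m_train)
         Y_train  - training labels, a numpy array (vector) of shape (1, m_train)
         X_test   - test set, a numpy array of shape (num_px * num_px * 3, m_test)
         Y_test   - test labels, a numpy array (vector) of shape (1, m_test)
         num_iterations  - hyperparameter: the number of iterations used to optimize the parameters
         learning_rate  - hyperparameter: the learning rate used in the update rule of optimize()
         print_cost  - set to True to print the cost every 100 iterations
     
     Returns:
         d  - dictionary containing information about the model.
     """
 
     # initialize the parameters with zeros
     w, b = initialize_with_zeros(X_train.shape[0])
 
     # gradient descent
     parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
 
     # retrieve the parameters w and b from the "parameters" dictionary
     w = parameters["w"]
     b = parameters["b"]
 
     # predict on the test and training examples
     Y_prediction_test = predict(w, b, X_test)
     Y_prediction_train = predict(w, b, X_train)
 
     # print the accuracy after training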
     print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
     print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
 
     d = {"costs": costs,
          "Y_prediction_test": Y_prediction_test,
          "Y_prediction_train": Y_prediction_train,
          "w": w,
          "b": b,
          "learning_rate": learning_rate,
          "num_iterations": num_iterations}
 
     return d

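Plotting the cost function

plt.plot draws a chart from the data it is given. Passed a single sequence, as here, it plots the values against their index and joins consecutive points with line segments, producing a line chart.

 # Train the model first so that d exists. The hyperparameter values here are an assumption; adjust them as needed.
 d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)
 
 costs = np.squeeze(d['costs'])
 plt.plot(costs)
 plt.ylabel('cost')
 plt.xlabel('iterations (per hundreds)')
 plt.title("Learning rate =" + str(d["learning_rate"]))
 plt.show()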

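Choosing the number of iterations and the learning rate

The learning rate is varied through the learning_rates list, and num_iterations sets the number of iterations.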

 learning_rates = [0.01, 0.001, 0.0001]
 models = {}
 for i in learning_rates:
     print("learning rate is: " + str(i))
     models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1500, learning_rate=i, print_cost=False)
     print('\n' + "-------------------------------------------------------" + '\n')
 
 for i in learning_rates:
     plt.plot(np.squeeze(models[str(i)]["costs"]), label=str(models[str(i)]["learning_rate"]))
 
 plt.ylabel('cost')
 plt.xlabel('iterations')
 
 legend = plt.legend(loc='upper center', shadow=True)
 frame = legend.get_frame()
 frame.set_facecolor('0.90')
 plt.show()
 '''
 plt.legend(loc='upper center', shadow=True): adds a legend. The `loc` parameter sets its position; `'upper center'` places it at the top center of the figure. `shadow=True` draws a shadow around the legend so that it stands out.
 frame = legend.get_frame(): gets the frame of the legend. The legend object is the value returned by `plt.legend()`, and its `get_frame()` method returns the frame.
 frame.set_facecolor('0.90'): sets the background color of the legend frame. The string `'0.90'` is a grayscale value between 0 and 1, so this gives a light gray background.
 '''

hw3

Checking the predictions of LogisticRegression

# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

%matplotlib inline

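# Set a fixed random seed so that randomly generated values (such as the initial w and b) are the same on every run, which makes the experiment reproducible.
np.random.seed(1) 

# Load the dataset
X, Y = load_planar_dataset() 

# Scatter plot of the data: X[0, :] holds the x coordinates and X[1, :] the y coordinates.
# c=Y.reshape(X[0,:].shape) colors each point according to its label in Y:
# red points have y=0 and blue points have y=1
plt.scatter(X[0, :], X[1, :], c=Y.reshape(X[0,:].shape), s=40, cmap=plt.cm.Spectral)  


shape_X = X.shape
shape_Y = Y.shape
m = shape_X[1]  # shape[0] is the number of rows, shape[1] the number of columns

print ('Shape of X: ' + str(shape_X))
print ('Shape of Y: ' + str(shape_Y))
print ("Number of examples in the dataset: " + str(m))

# Train a logistic regression classifier on the dataset to see how well it does.
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, Y.ravel())  # ravel() passes the labels as a 1-D array, which is what scikit-learn expects

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y) 
# Plot title
plt.title("Logistic Regression") 

# Print the accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) + '% ' + "(percentage of correctly labelled data points)")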

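Training the model

The mathematics of the neural network

For a neural network with a single hidden layer, the forward pass for an input X is as follows (tanh in the hidden layer and sigmoid at the output, as in the forward_propagation code later in this section):

Z1 = W1 X + b1
A1 = tanh(Z1)
Z2 = W2 A1 + b2
A2 = σ(Z2)

Steps for building a neural network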

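① The general method for building a neural network is:

1. Define the network structure (number of input units, number of hidden units, and so on)
2. Initialize the model's parameters
3. Loop:

3.1 Run forward propagation
3.2 Compute the loss
3.3 Run backward propagation
3.4 Update the parameters (gradient descent)

② We usually write helper functions for steps 1 to 3 and then combine them into a single nn_model() function.

③ Once nn_model() has been built and has learned suitable parameters, it can make predictions on new data.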

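Defining the network structure

def layer_sizes(X, Y):
    """
    Arguments:
     X - input dataset, shape (number of input features, number of examples)
     Y - labels, shape (number of outputs, number of examples)
    
    Returns:
     n_x - size of the input layer
     n_h - size of the hidden layer
     n_y - size of the output layer
    """
    n_x = X.shape[0]
    n_h = 4
    n_y = Y.shape[0]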
    
    return n_x,n_h,n_y

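Initializing the model parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
        n_x - number of nodes in the input layer
        n_h - number of nodes in the hidden layer
        n_y - number of nodes in the output layer
    
    Returns:
        parameters - dictionary containing the parameters:
            W1 - weight matrix of shape (n_h, n_x), randomly generated
            b1 - bias vector of shape (n_h, 1), usually set to all zeros
            W2 - weight matrix of shape (n_y, n_h), randomly generated
            b2 - bias vector of shape (n_y, 1), usually set to all zeros

    """
    np.random.seed(2) 
    
    W1 = np.random.randn(n_h,n_x) * 0.01 # multiply by 0.01 to keep the initial weights small
    b1 = np.zeros((n_h,1))  # note: np.zeros takes the shape as a tuple, hence the double parentheses
    W2 = np.random.randn(n_y,n_h) * 0.01
    b2 = np.zeros((n_y,1))
    
    parameters = {"W1":W1,"b1":b1,"W2":W2,"b2":b2}
    
    return parameters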
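One reason for keeping the initial weights small is that tanh flattens out for large inputs, so its slope, and with it the gradient, becomes nearly zero. The sketch below compares the average slope of tanh for two weight scales. The inputs, the seed and the helper name mean_tanh_slope are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)       # seed chosen arbitrarily for the demo
X = rng.standard_normal((2, 200))    # made-up inputs: 2 features, 200 examples

def mean_tanh_slope(scale):
    # The slope of tanh at z is 1 - tanh(z)**2; it shrinks toward 0 as |z| grows.
    W1 = rng.standard_normal((4, 2)) * scale
    Z1 = np.dot(W1, X)
    return np.mean(1 - np.tanh(Z1) ** 2)

slope_small = mean_tanh_slope(0.01)
slope_large = mean_tanh_slope(10.0)
print("average slope with scale 0.01:", slope_small)
print("average slope with scale 10:  ", slope_large)
```

With small weights the hidden units start in the nearly linear part of tanh, where gradient descent makes steady progress.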

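Forward propagation

Each layer computes Z = W·X + b and then applies its activation function, following the equations described earlier.

def forward_propagation(X, parameters):
    """
    Arguments:
         X - input data of shape (n_x, m).
         parameters - output of the initialization function (initialize_parameters)
    
    Returns:
         A2 - the second activation, computed with sigmoid()
         cache - dictionary containing "Z1", "A1", "Z2" and "A2"
     """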

    Z1 = np.dot(parameters["W1"], X) + parameters["b1"]
    A1 = np.tanh(Z1)
    Z2 = np.dot(parameters["W2"], A1) + parameters["b2"]
    A2 = sigmoid(Z2)

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

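Computing the cost function

The cost function is the cross-entropy cost: J = -(1/m) Σ_i [ y(i) log(a2(i)) + (1 - y(i)) log(1 - a2(i)) ]

def compute_cost(A2, Y):
    """
    Computes the cross-entropy cost.
    
    Arguments:
         A2 - the second activation, computed with sigmoid()
         Y - vector of true labels, shape (1, number of examples)
    
    Returns:
         cost - the cross-entropy cost
    """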
    m = Y.shape[1]
    cost = -1 / m * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return cost

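Backward propagation

The original figure summarizes backward propagation, with the formulas for a single example on the left and the vectorized formulas for multiple examples on the right. The vectorized formulas, as implemented in backward_propagation, are:

dZ2 = A2 - Y
dW2 = (1/m) dZ2 A1^T
db2 = (1/m) Σ dZ2 (summed over the examples)
dZ1 = W2^T dZ2 * (1 - A1^2) (element-wise product)
dW1 = (1/m) dZ1 X^T
db1 = (1/m) Σ dZ1 (summed over the examples)

def backward_propagation(parameters, cache, X, Y):
    """
    Implements backward propagation using the formulas above.
    
    Arguments:
     parameters - dictionary containing W1, b1, W2 and b2
     cache - dictionary containing "Z1", "A1", "Z2" and "A2".
     X - input data, shape (2, number of examples)
     Y - true labels, shape (1, number of examples)
    
    Returns:
     grads - dictionary containing the gradients with respect to W and b.
    """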
    m = X.shape[1]
    dz2 = cache["A2"] - Y
    dw2 = 1 / m * np.dot(dz2, cache["A1"].T)
    db2 = 1 / m * np.sum(dz2, axis=1, keepdims=True)
    dz1 = np.dot(parameters["W2"].T, dz2) * (1 - np.power(cache["A1"], 2))
    dw1 = 1 / m * np.dot(dz1, X.T)
    db1 = 1 / m * np.sum(dz1, axis=1, keepdims=True)

    grads = {"dW2": dw2, "db2": db2, "dW1": dw1, "db1": db1}

    return grads
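As with logistic regression, the backward pass can be checked against numerical derivatives. The sketch below repeats the forward and backward formulas in helpers with the hypothetical names forward_cost and backward, so that it runs alone, and perturbs each parameter entry in turn. The data, seed and layer sizes are made up.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_cost(params, X, Y):
    # Forward pass (tanh hidden layer, sigmoid output) and cross-entropy cost.
    A1 = np.tanh(np.dot(params["W1"], X) + params["b1"])
    A2 = sigmoid(np.dot(params["W2"], A1) + params["b2"])
    m = Y.shape[1]
    cost = -1 / m * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return cost, A1, A2

def backward(params, A1, A2, X, Y):
    # Same formulas as backward_propagation(), keyed by parameter name.
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = 1 / m * np.dot(dZ2, A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(params["W2"].T, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1 / m * np.dot(dZ1, X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(1)       # arbitrary seed
X = rng.standard_normal((2, 5))      # made-up data: 2 features, 5 examples
Y = np.array([[1, 0, 1, 1, 0]])
params = {"W1": rng.standard_normal((3, 2)) * 0.5, "b1": np.zeros((3, 1)),
          "W2": rng.standard_normal((1, 3)) * 0.5, "b2": np.zeros((1, 1))}

_, A1, A2 = forward_cost(params, X, Y)
grads = backward(params, A1, A2, X, Y)

eps = 1e-5
max_diff = 0.0
for name in params:
    for idx in np.ndindex(params[name].shape):
        original = params[name][idx]
        params[name][idx] = original + eps
        cost_plus, _, _ = forward_cost(params, X, Y)
        params[name][idx] = original - eps
        cost_minus, _, _ = forward_cost(params, X, Y)
        params[name][idx] = original          # restore the entry
        numeric = (cost_plus - cost_minus) / (2 * eps)
        max_diff = max(max_diff, abs(numeric - grads[name][idx]))

print("largest analytic-vs-numeric difference:", max_diff)
```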

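Updating the parameters

The parameters are updated with the following formulas:

w = w - α * dw
b = b - α * db (where α is the learning rate)

def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Updates the parameters using the gradient descent rule given above.
    
    Arguments:
     parameters - dictionary containing the parameters.
     grads - dictionary containing the gradients.
     learning_rate - the learning rate
    
    Returns:
     parameters - dictionary containing the updated parameters.
    """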
    W1 = parameters["W1"] - learning_rate * grads["dW1"]
    b1 = parameters["b1"] - learning_rate * grads["db1"]
    W2 = parameters["W2"] - learning_rate * grads["dW2"]
    b2 = parameters["b2"] - learning_rate * grads["db2"]

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters
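The effect of the update rule w = w - α * dw is easiest to see on a one-dimensional toy problem. The sketch below minimizes f(w) = (w - 3)^2, a function chosen only for illustration, with two learning rates. What counts as too large depends on the problem, so the values here say nothing about the network above.

```python
# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
def gradient_descent(learning_rate, steps=50, w=0.0):
    for _ in range(steps):
        dw = 2 * (w - 3)
        w = w - learning_rate * dw      # same rule as update_parameters()
    return w

w_good = gradient_descent(learning_rate=0.1)
w_too_large = gradient_descent(learning_rate=1.1)
print("learning rate 0.1:", w_good)
print("learning rate 1.1:", w_too_large)
```

For this toy function, a learning rate of 0.1 moves steadily toward the minimum at w = 3, while 1.1 overshoots further on every step and diverges.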

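Putting the model together

The steps for training a neural network are:

1. Prepare the dataset:
Collect and prepare the training and test datasets.
Preprocess the data, including normalization, standardization, feature engineering and similar operations.

2. Initialize the model parameters:
Initialize the network's weights and biases, usually with random values.

3. Forward propagation:
Feed the training data into the network and compute the predictions through forward propagation.
In each layer, compute the weighted input, apply the activation function, and pass the result to the next layer.

4. Compute the loss function:
Use a loss function (such as cross-entropy) to measure the gap between the network's predictions and the true labels.
The loss measures how accurate the predictions are and guides the parameter updates that reduce the prediction error.

5. Backward propagation:
Propagate the error gradient backward from the output layer to the input layer.
Compute the gradient of each layer's parameters, which is used to update the model.

6. Parameter update:
Update the model parameters with gradient descent or one of its variants (such as Adam or RMSprop).
The updates reduce the loss as far as possible and improve the model's performance.

7. Iterate:
Repeat forward propagation, loss computation, backward propagation and parameter update until a stopping condition is met (for example, the maximum number of iterations is reached or the loss converges).

8. Evaluate the model:
Evaluate the trained model on the test dataset.
Compute metrics such as accuracy, precision, recall and F1 score on the test set to assess how well the model generalizes.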

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    """
    Arguments:
        X - dataset, shape (2, number of examples)
        Y - labels, shape (1, number of examples)
        n_h - size of the hidden layer
        num_iterations - number of iterations of the gradient descent loop
        print_cost - if True, print the cost every 1000 iterations
    
    Returns:
        parameters - the parameters learned by the model, which can be used for prediction.
     """
    # Fix the random seed and read the sizes of the input and output layers.
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    # Initialize the parameters, then retrieve W1, b1, W2, b2.
    # Inputs: n_x, n_h, n_y. Output: the parameters dictionary holding W1, b1, W2, b2.
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation
        A2, cache = forward_propagation(X, parameters)

        # Compute the cost
        cost = compute_cost(A2, Y)

        # Backward propagation
        grads = backward_propagation(parameters, cache, X, Y)

        # Update the parameters
        parameters = update_parameters(parameters, grads)

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters

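Prediction

Build a predict function that uses forward propagation to produce the predictions.

def predict(parameters, X): 
    """
    Uses the learned parameters to predict a class for each example in X.
    
    Arguments:
        parameters - dictionary containing the parameters.
        X - input data of shape (n_x, m)
    
    Returns:
        predictions - vector of the model's predictions (red: 0 / blue: 1)
     
     """
    
    # Compute the probabilities with forward propagation and classify them as 0/1 using 0.5 as the threshold.
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2) # round every element of A2 to the nearest integer (0 or 1)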
    
    return predictions
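A small detail about np.round: NumPy rounds values that lie exactly halfway to the nearest even number, so a probability of exactly 0.5 becomes 0. On probabilities in [0, 1], rounding therefore gives the same result as the rule A2 > 0.5. A sketch on made-up values:

```python
import numpy as np

A2 = np.array([[0.2, 0.5, 0.5000001, 0.8]])   # made-up probabilities

rounded = np.round(A2)                 # 0.5 rounds to the nearest even value, 0.0
thresholded = (A2 > 0.5).astype(float)

print(rounded)
print(thresholded)
```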

Running the model

Call the model and predict the results.

parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

Plotting the decision boundary

plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))

Printing the accuracy

predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T) is the number of correct predictions: the first term counts the examples predicted as 1 whose true label is 1, and the second counts the examples predicted as 0 whose true label is 0. Their sum is the total number of correctly predicted examples.
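The dot-product formula for accuracy can be checked by hand on a few labels. The sketch below uses made-up labels and predictions and compares the formula with a shorter, equivalent expression.

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])            # made-up true labels, shape (1, m)
predictions = np.array([[1, 0, 0, 1, 1]])  # made-up predictions, shape (1, m)

true_positives = np.dot(Y, predictions.T)          # predicted 1 and label 1
true_negatives = np.dot(1 - Y, 1 - predictions.T)  # predicted 0 and label 0
accuracy_dot = (true_positives + true_negatives).item() / Y.size * 100

# Equivalent and shorter: fraction of positions where the two arrays agree.
accuracy_mean = np.mean(predictions == Y) * 100

print(accuracy_dot, accuracy_mean)
```

Here three of the five predictions match their labels, and both expressions report the same percentage.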

Tuning the hidden layer size

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20] # numbers of hidden units to try
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Number of hidden units: {}, accuracy: {} %".format(n_h, accuracy))