Andrew Ng Machine Learning Notes --- ex4 (Python implementation)

ML0209

Column: 机器学习excise (machine learning exercises) | Tags: neural networks, python, machine learning

Article link: https://blog.csdn.net/qq_45604750/article/details/107796278

Exercise link

Programming Exercise 4: Neural Networks Learning

1. Neural Networks

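In the previous exercise, you implemented feedforward propagation for neural networks and used it to predict handwritten digits with the weights we provided. In this exercise, you will implement the backpropagation algorithm to learn the parameters of the neural network.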

1.1 Visualizing the data

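The file ex4data1.mat contains a dataset of 5000 training examples of handwritten digits. The .mat extension means the data was saved in the native Octave/MATLAB matrix format rather than in a text (ASCII) format like a CSV file. These matrices can be read directly into the program with the loadmat command (scipy.io.loadmat in Python).
Each of the 5000 training examples in ex4data1.mat is a 20×20 pixel grayscale image of a digit. Each pixel is represented by a floating-point number giving the grayscale intensity at that location. The 20×20 grid of pixels is "unrolled" into a 400-dimensional vector, and each training example becomes a single row of the data matrix X. This gives a 5000×400 matrix X in which every row is one training example (one handwritten digit image). The second part of the training set is a 5000-dimensional vector y containing the labels. To stay compatible with Octave/MATLAB indexing, which has no index 0, the digit zero is mapped to the value 10: the digit "0" is labeled "10", while the digits "1" to "9" are labeled "1" to "9" in their natural order.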
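To make the data layout concrete, here is a small sketch on made-up values (not the real ex4data1.mat): a 20×20 image is unrolled into one 400-element row, and the label 10 is mapped back to the digit 0 when a human-readable digit is needed. The variable names are illustrative only, and order='F' is an assumption chosen to mimic MATLAB's column-by-column storage.

```python
import numpy as np

# Hypothetical stand-in for one 20x20 grayscale image (values are made up)
image = np.arange(400, dtype=float).reshape(20, 20)

# "Unrolling": the 20x20 grid becomes one 400-dimensional row of X.
# order='F' unrolls column by column, the way MATLAB/Octave store matrices.
row = image.reshape(1, 400, order='F')
print(row.shape)                                              # (1, 400)
print(np.array_equal(row.reshape(20, 20, order='F'), image))  # True

# Labels as stored in the dataset: 10 stands for the digit 0
y = np.array([[10], [1], [5], [10]])

# Map the label 10 back to the digit 0 for display purposes
digits = np.where(y == 10, 0, y).ravel()
print(digits)                                                 # [0 1 5 0]
```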

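import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Load the data
path = r'C:\Users\Administrator\Desktop\ML\machine-learning-ex4\ex4\ex4data1.mat'
data = loadmat(path)

X = data['X']                           # 5000 x 400
y = data['y']                           # 5000 x 1

# Visualize the data
def DisplayData(X):
    fig, ax = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True)     # 10 x 10 grid of subplots
    pick_one = np.random.randint(0, 5000, (100, ))                           # 100 random row indices in [0, 5000)
    for row in range(10):
        for col in range(10):
            x = (X[pick_one[col + row * 10]].reshape((20, 20))).T            # pick one example and restore its 20 x 20 image
            ax[row, col].matshow(x, cmap='gray_r')                           # draw it in grayscale

    plt.xticks([])
    plt.yticks([])
    plt.show()

DisplayData(X)
'''
Plotted directly (without .T), the digits come out transposed and look scrambled,
which is how my plot looked in the previous assignment. Adding .T to transpose
each image makes the digits upright: the data was unrolled in MATLAB's
column-major order, while NumPy's reshape fills the grid row by row.
'''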

Output:
[Figure: 100 randomly selected training digits in a 10 × 10 grid]
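Why does the transpose fix the picture? A short check on a toy 3×3 "image" (values made up) suggests the explanation: if the data was unrolled column by column, as MATLAB does, then NumPy's default row-major reshape returns the transpose, and reshape followed by .T is the same as reshaping with order='F' (column-major order).

```python
import numpy as np

# Toy 3x3 "image" and its column-major (MATLAB-style) unrolling
img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
unrolled = img.flatten(order='F')          # [1 4 7 2 5 8 3 6 9]

# A row-major reshape alone yields the transposed picture...
wrong = unrolled.reshape(3, 3)
# ...transposing afterwards restores it, exactly like a column-major reshape
fixed_T = unrolled.reshape(3, 3).T
fixed_F = unrolled.reshape(3, 3, order='F')

print(np.array_equal(wrong, img.T))        # True
print(np.array_equal(fixed_T, img))        # True
print(np.array_equal(fixed_F, img))        # True
```

So in DisplayData, `X[i].reshape((20, 20), order='F')` would be an equivalent alternative to `.reshape((20, 20)).T`.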

1.2 Model representation

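[Figure: the three-layer neural network model]
Our neural network, shown in the figure, has 3 layers: an input layer, a hidden layer, and an output layer. Not counting the bias units, the input layer has 400 units because each input is a 20×20 image, the hidden layer has 25 units, and the output layer has 10 units, one for each digit class. It follows that theta1 is 25 x 401 and theta2 is 10 x 26 (the extra column holds the weights of the bias unit). We load the already-trained model parameters that come with the exercise.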
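A quick way to confirm these dimensions is to push a dummy batch through arrays of the stated shapes. The sketch below uses random weights of shape 25×401 and 10×26 as stand-ins (not the trained values from ex4weights.mat) and three fake examples.

```python
import numpy as np

n_input, n_hidden, n_labels = 400, 25, 10

# Random stand-ins with the same shapes as Theta1 and Theta2 (not the trained weights)
theta1 = np.random.rand(n_hidden, n_input + 1)      # 25 x 401 (+1 column for the bias unit)
theta2 = np.random.rand(n_labels, n_hidden + 1)     # 10 x 26

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X_dummy = np.random.rand(3, n_input)                        # 3 fake examples, 400 pixels each
a1 = np.hstack([np.ones((3, 1)), X_dummy])                  # 3 x 401
a2 = np.hstack([np.ones((3, 1)), sigmoid(a1 @ theta1.T)])   # 3 x 26
h = sigmoid(a2 @ theta2.T)                                  # 3 x 10, one output per class

print(a1.shape, a2.shape, h.shape)                          # (3, 401) (3, 26) (3, 10)
```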

# Model representation: load the pre-trained weights
path = r'C:\Users\Administrator\Desktop\ML\machine-learning-ex4\ex4\ex4weights.mat'
weights = loadmat(path)

theta1 = weights['Theta1']             # 25 x 401
theta2 = weights['Theta2']             # 10 x 26

1.3 Feedforward and cost function

Here we implement feedforward propagation and write down the regularized cost function.
[Figure: the regularized cost function]
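For reference, the regularized cost as stated in the ex4 assignment can be written as follows. Here m = 5000, K = 10, λ is the variable lr in the code, and the columns of Θ are indexed from 0, so column 0 (the bias weights) is left out of the penalty, matching `theta[:, 1:]` in the code:

$$
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(\left(h_\Theta(x^{(i)})\right)_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta^{(1)}_{j,k}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta^{(2)}_{j,k}\right)^2\right]
$$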

# Define the sigmoid function
def Sigmoid(z):
    return 1/(1+np.exp(-z))

# Feedforward propagation
def ForwardPropagation(X, theta1, theta2):
    a1 = np.insert(X, 0, 1, axis=1)             # add the bias unit (column of ones) -> 5000 x 401
    z2 = np.dot(a1, theta1.T)                   # 5000 x 25
    a2 = np.insert(Sigmoid(z2), 0, 1, axis=1)   # add the bias unit -> 5000 x 26
    z3 = np.dot(a2, theta2.T)                   # 5000 x 10
    h = Sigmoid(z3)
    return h


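# Cost function (uses the globally loaded theta1 and theta2);
# lr is the regularization parameter lambda, not a learning rate, and lr = 0 gives the unregularized cost
def CostFunction(X, y, lr):
    X = np.asmatrix(X)                          # np.asmatrix: same as np.mat, which NumPy 2.x no longer provides
    y = np.asmatrix(y)                          # y must already be one-hot encoded (5000 x 10)
    m = X.shape[0]                              # number of examples (5000)
    h = ForwardPropagation(X, theta1, theta2)   # network output, 5000 x 10

    J = 0
    for i in range(m):                          # accumulate the error of each example
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)

    J = J / m

    # Regularization term (the bias columns theta[:, 0] are not regularized)
    J += (float(lr) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))

    return J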

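Another important point: $y$ holds the digits 1 to 10, so it cannot be plugged into the cost function directly. It must first be converted into one-hot form: for each example, a 10-element vector that is 1 at the position of its label and 0 everywhere else.
[Figure: one-hot encoding of the labels]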

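# y holds the labels 1 to 10; convert it to one-hot form for the cost function and backpropagation
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)   # scikit-learn >= 1.2; older versions use sparse=False
y = encoder.fit_transform(y)           # 5000 x 10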
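If you would rather not depend on scikit-learn, the same encoding can be built with NumPy indexing. This is an alternative sketch, not what the post uses; the helper name one_hot is hypothetical, and it assumes the labels are the integers 1 to 10, so column k−1 holds label k (column 9 is the digit 0, stored as 10). That is the same column order OneHotEncoder produces when it sees all ten labels, since it sorts the categories.

```python
import numpy as np

def one_hot(y, num_labels=10):
    """Convert labels 1..num_labels (shape (m, 1) or (m,)) into an (m, num_labels) 0/1 matrix."""
    y = np.asarray(y).ravel().astype(int)
    return np.eye(num_labels)[y - 1]      # row k-1 of the identity matrix encodes label k

y_small = np.array([[1], [10], [3]])
Y = one_hot(y_small)
print(Y.shape)                            # (3, 10)
print(Y.argmax(axis=1) + 1)               # [ 1 10  3]
```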

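Finally, calling the function with the given theta yields a regularized cost (lr = 1) of 0.38376985909092354 and an unregularized cost of 0.2876291651613188. The regularized value is larger simply because, for the same weights, it adds the non-negative penalty term (λ/2m)·Σθ² to the unregularized cost. This penalty is what discourages large weights, and thus overfitting, once the weights are being trained.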
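The per-example loop in CostFunction can be replaced by whole-matrix operations. Below is a vectorized sketch, an alternative to the author's loop rather than the post's method: it uses plain ndarrays instead of np.matrix, takes the weights as arguments instead of reading globals, and uses lam in place of lr. The check at the end runs on small random data (not the ex4 dataset) and only confirms that the vectorized and looped versions agree and that the penalty makes the cost larger.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, theta1, theta2):
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    return sigmoid(a2 @ theta2.T)                       # m x K

def cost_vectorized(X, Y, theta1, theta2, lam=0.0):
    """Cross-entropy cost for one-hot labels Y (m x K), plus an optional L2 penalty."""
    m = X.shape[0]
    h = forward(X, theta1, theta2)
    J = np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m
    # Bias columns (index 0) are excluded from the penalty
    J += lam / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return J

def cost_loop(X, Y, theta1, theta2):
    """Unregularized cost accumulated example by example, like CostFunction."""
    h = forward(X, theta1, theta2)
    return sum(np.sum(-Y[i] * np.log(h[i]) - (1 - Y[i]) * np.log(1 - h[i]))
               for i in range(X.shape[0])) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.random((6, 4))                     # 6 examples, 4 inputs
Y = np.eye(3)[rng.integers(0, 3, 6)]       # one-hot labels for 3 classes
t1 = rng.uniform(-0.5, 0.5, (5, 5))        # 5 hidden units, 4 inputs + bias
t2 = rng.uniform(-0.5, 0.5, (3, 6))        # 3 labels, 5 hidden units + bias

print(np.isclose(cost_vectorized(X, Y, t1, t2), cost_loop(X, Y, t1, t2)))    # True
print(cost_vectorized(X, Y, t1, t2, lam=1) > cost_vectorized(X, Y, t1, t2))  # True
```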
Complete code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Load the data
path = r'C:\Users\Administrator\Desktop\ML\machine-learning-ex4\ex4\ex4data1.mat'
data = loadmat(path)

X = data['X']                           # 5000 x 400
y = data['y']                           # 5000 x 1

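# y holds the labels 1 to 10; convert it to one-hot form for the cost function and backpropagation
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)   # scikit-learn >= 1.2; older versions use sparse=False
y = encoder.fit_transform(y)           # 5000 x 10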

# Visualize the data
def DisplayData(X):
    fig, ax = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True)     # 10 x 10 grid of subplots
    pick_one = np.random.randint(0, 5000, (100, ))                           # 100 random row indices in [0, 5000)
    for row in range(10):
        for col in range(10):
            # The final .T transposes the image so the digits display upright instead of scrambled
            x = (X[pick_one[col + row * 10]].reshape((20, 20))).T            # pick one example and restore its 20 x 20 image
            ax[row, col].matshow(x, cmap='gray_r')                           # draw it in grayscale

    plt.xticks([])
    plt.yticks([])
    plt.show()

DisplayData(X)

# Model representation: load the pre-trained weights
path = r'C:\Users\Administrator\Desktop\ML\machine-learning-ex4\ex4\ex4weights.mat'
weights = loadmat(path)

theta1 = weights['Theta1']             # 25 x 401
theta2 = weights['Theta2']             # 10 x 26

# Define the sigmoid function
def Sigmoid(z):
    return 1/(1+np.exp(-z))

# Feedforward propagation
def ForwardPropagation(X, theta1, theta2):
    a1 = np.insert(X, 0, 1, axis=1)             # add the bias unit (column of ones)
    z2 = np.dot(a1, theta1.T)
    a2 = np.insert(Sigmoid(z2), 0, 1, axis=1)   # add the bias unit
    z3 = np.dot(a2, theta2.T)
    h = Sigmoid(z3)
    return h


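# Cost function (uses the globally loaded theta1 and theta2);
# lr is the regularization parameter lambda, not a learning rate, and lr = 0 gives the unregularized cost
def CostFunction(X, y, lr):
    X = np.asmatrix(X)                          # np.asmatrix: same as np.mat, which NumPy 2.x no longer provides
    y = np.asmatrix(y)                          # one-hot labels, 5000 x 10
    m = X.shape[0]                              # number of examples (5000)
    h = ForwardPropagation(X, theta1, theta2)   # network output, 5000 x 10

    J = 0
    for i in range(m):                          # accumulate the error of each example
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)

    J = J / m

    # Regularization term (the bias columns are not regularized)
    J += (float(lr) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))

    return J

lr = 1
print(CostFunction(X, y, lr))   # regularized cost (author's result: 0.38376985909092354)
print(CostFunction(X, y, 0))    # unregularized cost (author's result: 0.2876291651613188)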

2. Backpropagation

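In this part we use the backpropagation algorithm to optimize the model parameters. Put simply, backpropagation computes the partial derivative of the cost function with respect to every model parameter. The steps are roughly as follows, and we implement them one by one.
[Figure: the backpropagation steps]
With regularization added, the gradient of every non-bias weight gains an extra term (λ/m)·Θ, while the bias weights (column 0) are left unregularized.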
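The steps in the figure, as given in the ex4 assignment, can be summarized as follows for each training example t. Here g is the sigmoid, ⊙ is element-wise multiplication, λ is lr in the code, and the subscript 1: drops the bias entry (the `d2t[:, 1:]` in BackPropagation). The quantities δ^(3), δ^(2), Δ^(1), Δ^(2) correspond to d3t, d2t, delta1, and delta2 in the code:

$$
\begin{aligned}
a^{(1)} &= \begin{bmatrix} 1 \\ x^{(t)} \end{bmatrix}, \quad z^{(2)} = \Theta^{(1)} a^{(1)}, \quad a^{(2)} = \begin{bmatrix} 1 \\ g\left(z^{(2)}\right) \end{bmatrix}, \quad z^{(3)} = \Theta^{(2)} a^{(2)}, \quad a^{(3)} = g\left(z^{(3)}\right) \\
\delta^{(3)} &= a^{(3)} - y^{(t)} \\
\delta^{(2)} &= \left[\left(\Theta^{(2)}\right)^{T} \delta^{(3)}\right]_{1:} \odot g'\left(z^{(2)}\right) \\
\Delta^{(l)} &:= \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^{T}, \quad l = 1, 2 \\
\frac{\partial J}{\partial \Theta^{(l)}_{ij}} &= \frac{1}{m}\Delta^{(l)}_{ij} \quad (j = 0), \qquad \frac{\partial J}{\partial \Theta^{(l)}_{ij}} = \frac{1}{m}\Delta^{(l)}_{ij} + \frac{\lambda}{m}\Theta^{(l)}_{ij} \quad (j \ge 1)
\end{aligned}
$$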

2.1 Sigmoid gradient

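Looking at the steps above, we need the derivative of the sigmoid function. For $g(z) = \frac{1}{1 + e^{-z}}$ the derivative is $g'(z) = g(z)\,(1 - g(z))$, which we implement directly: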

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return np.multiply(Sigmoid(z), (1 - Sigmoid(z)))
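A quick sanity check: the gradient peaks at z = 0 with value 0.25, is symmetric (g'(z) = g'(−z)), and should agree with a centered finite difference. The step size 1e-5 is an arbitrary choice for illustration.

```python
import numpy as np

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return np.multiply(Sigmoid(z), (1 - Sigmoid(z)))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
analytic = sigmoid_gradient(z)

eps = 1e-5
numeric = (Sigmoid(z + eps) - Sigmoid(z - eps)) / (2 * eps)   # centered difference

print(analytic[2])                             # 0.25
print(np.allclose(analytic, numeric))          # True
print(np.allclose(analytic, analytic[::-1]))   # True, since g'(z) = g'(-z)
```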

2.2 Random initialization

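The model parameters are initialized randomly, and as the assignment specifies, their values are kept between −0.12 and 0.12; we implement this with NumPy's uniform distribution function np.random.uniform. Random rather than all-zero initialization is needed to break symmetry: if all weights start out equal, every hidden unit computes the same function and receives the same update.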

def random(size):
    return np.random.uniform(-0.12, 0.12, size)
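Where does 0.12 come from? The ex4 handout suggests basing the range on the layer sizes, ε_init = √6 / √(L_in + L_out), where L_in and L_out are the numbers of units in the layers adjacent to Θ^(l). For Θ^(1) (400 inputs, 25 hidden units) this is about 0.119, which rounds to 0.12. A sketch, with init_epsilon as a hypothetical helper name:

```python
import numpy as np

def init_epsilon(l_in, l_out):
    """Hypothetical helper: half-width of the uniform initialization range."""
    return np.sqrt(6) / np.sqrt(l_in + l_out)

eps1 = init_epsilon(400, 25)
print(round(eps1, 4))                               # 0.1188

# Draw Theta1 uniformly from [-eps1, eps1], bias column included
theta1 = np.random.uniform(-eps1, eps1, (25, 401))
print(theta1.shape, np.abs(theta1).max() <= eps1)   # (25, 401) True
```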

We define a few variables for later use.

l1 = 400       # layer 1 (input) has 400 units
l2 = 25        # layer 2 (hidden) has 25 units
l3 = 10        # layer 3 (output) has 10 units

params = random(l2 * (l1 + 1) + l3 * (l2 + 1))      # all parameters unrolled into one vector and initialized, size 25*401 + 10*26
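BackPropagation later recovers the two weight matrices from params by slicing at l2 * (l1 + 1) and reshaping. A small round-trip check (unroll and roll are hypothetical helper names) shows that this slicing matches how the vector is laid out:

```python
import numpy as np

l1, l2, l3 = 400, 25, 10

def unroll(theta1, theta2):
    return np.concatenate([theta1.ravel(), theta2.ravel()])

def roll(params):
    split = l2 * (l1 + 1)                                   # 25 * 401 = 10025
    theta1 = params[:split].reshape(l2, l1 + 1)
    theta2 = params[split:].reshape(l3, l2 + 1)
    return theta1, theta2

t1 = np.random.uniform(-0.12, 0.12, (l2, l1 + 1))
t2 = np.random.uniform(-0.12, 0.12, (l3, l2 + 1))
p = unroll(t1, t2)
r1, r2 = roll(p)
print(p.size, np.array_equal(r1, t1) and np.array_equal(r2, t2))   # 10285 True
```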

2.3 Backpropagation

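# ForwardPropagation now also returns the intermediate values needed here
# (replaces the section 1.3 version, which returned only h)
def ForwardPropagation(X, theta1, theta2):
    a1 = np.insert(X, 0, 1, axis=1)             # 5000 x 401
    z2 = a1 * theta1.T                          # 5000 x 25 (matrix product, X is an np.matrix)
    a2 = np.insert(Sigmoid(z2), 0, 1, axis=1)   # 5000 x 26
    z3 = a2 * theta2.T                          # 5000 x 10
    h = Sigmoid(z3)
    return a1, z2, a2, z3, h

# Backpropagation: returns the regularized cost J and the unrolled gradient.
# X: np.matrix (5000 x 400), y: one-hot labels (5000 x 10), lr: regularization parameter lambda
def BackPropagation(params, l1, l2, l3, X, y, lr):
    m = X.shape[0]                                                               # number of examples (5000)

    theta1 = np.asmatrix(np.reshape(params[:l2 * (l1 + 1)], (l2, (l1 + 1))))     # restore the weight matrices
    theta2 = np.asmatrix(np.reshape(params[l2 * (l1 + 1):], (l3, (l2 + 1))))     # (np.asmatrix replaces np.mat, absent from NumPy 2.x)

    a1, z2, a2, z3, h = ForwardPropagation(X, theta1, theta2)     # forward pass gives the output h

    delta1 = np.zeros(theta1.shape)  # (25, 401)                  # gradient accumulators start at 0
    delta2 = np.zeros(theta2.shape)  # (10, 26)

    # Cost function
    J = 0
    for i in range(m):
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)

    J = J / m

    # Regularization
    J += (float(lr) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))

    for t in range(m):                    # process one example at a time
        a1t = a1[t, :]  # (1, 401)
        z2t = z2[t, :]  # (1, 25)
        a2t = a2[t, :]  # (1, 26)
        ht = h[t, :]  # (1, 10)
        yt = y[t, :]  # (10,) one-hot row

        d3t = ht - yt  # (1, 10)         # output-layer error

        z2t = np.insert(z2t, 0, 1, axis=1)  # (1, 26)
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))  # (1, 26)    # hidden-layer error

        delta1 = delta1 + (d2t[:, 1:]).T * a1t      # accumulate gradients; the bias error d2t[:, 0] is dropped
        delta2 = delta2 + d3t.T * a2t

    delta1 = delta1 / m                  # average over the examples
    delta2 = delta2 / m

    delta1[:, 1:] = delta1[:, 1:] + (theta1[:, 1:] * lr) / m     # add regularization, skipping the bias column
    delta2[:, 1:] = delta2[:, 1:] + (theta2[:, 1:] * lr) / m

    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))   # unroll into one gradient vector

    return J, grad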
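The loop over all 5000 examples is the main reason training is slow. The same gradients can be computed for all examples at once with matrix products. The sketch below is a vectorized alternative, not the author's code: it uses plain ndarrays, lam in place of lr, and takes the number of examples from X. The check compares it with a per-example loop on small random data (not the ex4 dataset).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def backprop_vectorized(theta1, theta2, X, Y, lam):
    """Regularized cost and gradients of a 3-layer network for all examples at once; Y is one-hot (m x K)."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])                 # m x (n+1)
    z2 = a1 @ theta1.T                                   # m x h
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])       # m x (h+1)
    a3 = sigmoid(a2 @ theta2.T)                          # m x K

    J = np.sum(-Y * np.log(a3) - (1 - Y) * np.log(1 - a3)) / m
    J += lam / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))

    d3 = a3 - Y                                          # output-layer errors, m x K
    s2 = sigmoid(z2)
    d2 = (d3 @ theta2[:, 1:]) * s2 * (1 - s2)            # hidden-layer errors, m x h (bias column skipped)

    grad1 = d2.T @ a1 / m                                # same shape as theta1
    grad2 = d3.T @ a2 / m                                # same shape as theta2
    grad1[:, 1:] += lam / m * theta1[:, 1:]              # no penalty on the bias weights
    grad2[:, 1:] += lam / m * theta2[:, 1:]
    return J, grad1, grad2

def backprop_loop(theta1, theta2, X, Y, lam):
    """Per-example accumulation, following the same steps as the loop in BackPropagation."""
    m = X.shape[0]
    D1, D2 = np.zeros_like(theta1), np.zeros_like(theta2)
    for t in range(m):
        a1 = np.concatenate([[1.0], X[t]])
        z2 = theta1 @ a1
        a2 = np.concatenate([[1.0], sigmoid(z2)])
        d3 = sigmoid(theta2 @ a2) - Y[t]
        d2 = (theta2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
        D1 += np.outer(d2, a1)
        D2 += np.outer(d3, a2)
    D1, D2 = D1 / m, D2 / m
    D1[:, 1:] += lam / m * theta1[:, 1:]
    D2[:, 1:] += lam / m * theta2[:, 1:]
    return D1, D2

# Small random problem (not the ex4 data): 4 inputs, 5 hidden units, 3 classes, 8 examples
rng = np.random.default_rng(1)
X = rng.random((8, 4))
Y = np.eye(3)[rng.integers(0, 3, 8)]
t1 = rng.uniform(-0.12, 0.12, (5, 5))
t2 = rng.uniform(-0.12, 0.12, (3, 6))

J, g1, g2 = backprop_vectorized(t1, t2, X, Y, lam=1.0)
loop_g1, loop_g2 = backprop_loop(t1, t2, X, Y, lam=1.0)
print(np.allclose(g1, loop_g1) and np.allclose(g2, loop_g2))   # True
```

To use it with minimize(..., jac=True), a wrapper would reshape params into the two matrices as BackPropagation does and return J together with np.concatenate((g1.ravel(), g2.ravel())).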

2.4 Parameter optimization

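from scipy.optimize import minimize

# jac=True: BackPropagation returns both the cost and the gradient.
# y_onehot is the one-hot label matrix (5000 x 10) and X must be an np.matrix
fmin = minimize(fun=BackPropagation, x0=params, args=(l1, l2, l3, X, y_onehot, lr),
                method='TNC', jac=True, options={'maxfun': 250})   # at most 250 function evaluations ('maxiter' is the older, deprecated name)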

2.5 Computing the accuracy

theta1 = np.asmatrix(np.reshape(fmin.x[:l2 * (l1 + 1)], (l2, (l1 + 1))))      # optimized parameters
theta2 = np.asmatrix(np.reshape(fmin.x[l2 * (l1 + 1):], (l3, (l2 + 1))))

a1, z2, a2, z3, h = ForwardPropagation(X, theta1, theta2)
y_pred = np.array(np.argmax(h, axis=1) + 1)      # index of the largest output; +1 because labels run from 1 to 10

# y must be the original labels (5000 x 1), not the one-hot matrix
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, y)]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print('accuracy = {0}%'.format(accuracy * 100))

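This part runs slowly, taking a few minutes, so I skipped gradient checking, which would make it slower still. Final result (accuracy on the training set):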

accuracy = 99.26%
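Gradient checking does not have to be slow if it runs on a tiny network instead of the full 400-25-10 model. Below is a sketch on a hypothetical 3-5-3 network with random data: it compares the analytic backpropagation gradient (same formulas and parameter layout as BackPropagation, written with ndarrays) against centered finite differences of the cost. The network size, ε = 1e-4, the 1e-7 threshold, and the helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grad(params, n_in, n_hid, n_out, X, Y, lam):
    """Cost and unrolled gradient, with params laid out like params in the post."""
    split = n_hid * (n_in + 1)
    t1 = params[:split].reshape(n_hid, n_in + 1)
    t2 = params[split:].reshape(n_out, n_hid + 1)
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    z2 = a1 @ t1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
    a3 = sigmoid(a2 @ t2.T)
    J = np.sum(-Y * np.log(a3) - (1 - Y) * np.log(1 - a3)) / m
    J += lam / (2 * m) * (np.sum(t1[:, 1:] ** 2) + np.sum(t2[:, 1:] ** 2))
    d3 = a3 - Y
    d2 = (d3 @ t2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))
    g1 = d2.T @ a1 / m
    g2 = d3.T @ a2 / m
    g1[:, 1:] += lam / m * t1[:, 1:]
    g2[:, 1:] += lam / m * t2[:, 1:]
    return J, np.concatenate([g1.ravel(), g2.ravel()])

def numerical_gradient(f, params, eps=1e-4):
    """Centered finite differences, one parameter at a time."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
n_in, n_hid, n_out, m, lam = 3, 5, 3, 5, 1.0
X = rng.random((m, n_in))
Y = np.eye(n_out)[rng.integers(0, n_out, m)]
params = rng.uniform(-0.12, 0.12, n_hid * (n_in + 1) + n_out * (n_hid + 1))

_, analytic = cost_and_grad(params, n_in, n_hid, n_out, X, Y, lam)
numeric = numerical_gradient(lambda p: cost_and_grad(p, n_in, n_hid, n_out, X, Y, lam)[0], params)

rel_diff = np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)
print(rel_diff < 1e-7)   # True
```

A bug in the backward pass (for example, forgetting to drop the bias error) would typically make the relative difference many orders of magnitude larger.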

3. Visualizing the hidden layer

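We take the hidden-layer parameters theta1 and plot each hidden unit's 400 input weights (excluding the bias) as a 20×20 image. This shows roughly which input pattern each of the 25 hidden units responds to.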

def plot_hidden(theta1):
    t1 = np.asarray(theta1[:, 1:])             # drop the bias column -> 25 x 400
    fig, ax = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(6, 6))
    for r in range(5):
        for c in range(5):
            # one hidden unit per subplot; .T gives the same orientation as the digit images
            ax[r, c].matshow(t1[r * 5 + c].reshape(20, 20).T, cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.show()

plot_hidden(theta1)

Output:
[Figure: the weights of the 25 hidden units, each shown as a 20 × 20 image]
Complete code:

import numpy as np
from scipy.io import loadmat
import matplotlib.pyplot as plt

# Load the data
path = r'C:\Users\Administrator\Desktop\ML\machine-learning-ex4\ex4\ex4data1.mat'
data = loadmat(path)

X = data['X']                           # 5000 x 400
y = data['y']                           # 5000 x 1
X = np.asmatrix(X)                      # np.asmatrix: same as np.mat, which NumPy 2.x no longer provides
y = np.asmatrix(y)

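l1 = 400       # layer 1 (input) has 400 units
l2 = 25        # layer 2 (hidden) has 25 units
l3 = 10        # layer 3 (output) has 10 units

lr = 1         # regularization parameter lambda

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)   # scikit-learn >= 1.2; older versions use sparse=False
y_onehot = encoder.fit_transform(np.asarray(y))   # recent scikit-learn rejects np.matrix input, so pass an ndarray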

# Define the sigmoid function
def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Derivative of the sigmoid function
def sigmoid_gradient(z):
    return np.multiply(Sigmoid(z), (1 - Sigmoid(z)))

# Random initialization in [-0.12, 0.12]
def random(size):
    return np.random.uniform(-0.12, 0.12, size)

# Feedforward propagation; also returns the intermediate values needed by backpropagation
def ForwardPropagation(X, theta1, theta2):

    a1 = np.insert(X, 0, 1, axis=1)
    z2 = a1 * theta1.T
    a2 = np.insert(Sigmoid(z2), 0, 1, axis=1)
    z3 = a2 * theta2.T
    h = Sigmoid(z3)

    return a1, z2, a2, z3, h

params = random(l2 * (l1 + 1) + l3 * (l2 + 1))      # all parameters unrolled into one vector and initialized, size 25*401 + 10*26

# Backpropagation: returns the regularized cost J and the unrolled gradient
def BackPropagation(params, l1, l2, l3, X, y, lr):
    m = X.shape[0]                                                               # number of examples (5000)

    theta1 = np.asmatrix(np.reshape(params[:l2 * (l1 + 1)], (l2, (l1 + 1))))     # restore the weight matrices
    theta2 = np.asmatrix(np.reshape(params[l2 * (l1 + 1):], (l3, (l2 + 1))))

    a1, z2, a2, z3, h = ForwardPropagation(X, theta1, theta2)     # forward pass gives the output h

    delta1 = np.zeros(theta1.shape)  # (25, 401)                  # gradient accumulators start at 0
    delta2 = np.zeros(theta2.shape)  # (10, 26)

    # Cost function
    J = 0
    for i in range(m):
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)

    J = J / m

    # Regularization
    J += (float(lr) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))

    for t in range(m):                    # process one example at a time
        a1t = a1[t, :]  # (1, 401)
        z2t = z2[t, :]  # (1, 25)
        a2t = a2[t, :]  # (1, 26)
        ht = h[t, :]  # (1, 10)
        yt = y[t, :]  # (10,) one-hot row

        d3t = ht - yt  # (1, 10)         # output-layer error

        z2t = np.insert(z2t, 0, 1, axis=1)  # (1, 26)
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))  # (1, 26)    # hidden-layer error

        delta1 = delta1 + (d2t[:, 1:]).T * a1t      # accumulate gradients; the bias error d2t[:, 0] is dropped
        delta2 = delta2 + d3t.T * a2t

    delta1 = delta1 / m                  # average over the examples
    delta2 = delta2 / m

    delta1[:, 1:] = delta1[:, 1:] + (theta1[:, 1:] * lr) / m     # add regularization, skipping the bias column
    delta2[:, 1:] = delta2[:, 1:] + (theta2[:, 1:] * lr) / m

    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))   # unroll into one gradient vector

    return J, grad

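from scipy.optimize import minimize

# jac=True: BackPropagation returns both the cost and the gradient
fmin = minimize(fun=BackPropagation, x0=params, args=(l1, l2, l3, X, y_onehot, lr),
                method='TNC', jac=True, options={'maxfun': 250})   # at most 250 function evaluations ('maxiter' is the older, deprecated name)

theta1 = np.asmatrix(np.reshape(fmin.x[:l2 * (l1 + 1)], (l2, (l1 + 1))))      # optimized parameters
theta2 = np.asmatrix(np.reshape(fmin.x[l2 * (l1 + 1):], (l3, (l2 + 1))))

# Accuracy on the training set; y is the original label vector (5000 x 1)
def pred_accuracy():
    a1, z2, a2, z3, h = ForwardPropagation(X, theta1, theta2)
    y_pred = np.array(np.argmax(h, axis=1) + 1)      # +1 because the labels run from 1 to 10

    correct = [1 if a == b else 0 for (a, b) in zip(y_pred, y)]
    accuracy = (sum(map(int, correct)) / float(len(correct)))
    print('accuracy = {0}%'.format(accuracy * 100))

# Visualize the hidden layer
def plot_hidden(theta1):
    t1 = np.asarray(theta1[:, 1:])             # drop the bias column -> 25 x 400
    fig, ax = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(6, 6))
    for r in range(5):
        for c in range(5):
            # one hidden unit per subplot; .T gives the same orientation as the digit images
            ax[r, c].matshow(t1[r * 5 + c].reshape(20, 20).T, cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.show()

pred_accuracy()
plot_hidden(theta1)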

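Summary: I ran into many problems along the way, mostly in the partial-derivative computations of backpropagation. Watch the dimensions of every variable; keep track of when the bias unit is added and when it is removed (no error term is computed for the bias unit); use the right kind of product (matrix product versus element-wise multiplication); and do not mix up the order of function arguments.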
