吴恩达机器学习ex4:BP神经网络

最新推荐文章于 2021-08-07 19:54:45 发布

wssssang

最新推荐文章于 2021-08-07 19:54:45 发布

阅读量925

点赞数

分类专栏： python 机器学习文章标签： python 神经网络机器学习

本文链接：https://blog.csdn.net/weixin_47798560/article/details/115897601

版权

python 同时被 2 个专栏收录

34 篇文章 5 订阅

订阅专栏

机器学习

8 篇文章 0 订阅

订阅专栏

吴恩达机器学习作业四：BP神经网络

在这里插入图片描述

1.1 Visualizing the data

这与上一个练习中使用的数据集相同。ex3data1.mat中有5000个训练示例，其中每个训练示例是数字的20像素乘20像素灰度图像。每个像素由一个浮点数表示，表示该位置的灰度强度。像素的20乘20网格被“展开”成400维向量。这些训练示例中的每一个都成为我们的数据矩阵X中的一行。这给了我们一个5000×400的矩阵X，其中每一行都是手写数字图像的训练示例。
训练集的第二部分是包含训练集标签的5000维向量y。为了使事情更兼容八进制/MATLAB索引，在没有零索引的情况下，我们将数字零映射到值10。因此，“0”数字标记为“10”，而数字“1”到“9”按其自然顺序标记为“1”到“9”。

#coding=utf-8
import numpy as np
import pandas as pd
from scipy.io import loadmat
import scipy.optimize as opt
from matplotlib import pyplot as plt
from sklearn.metrics import classification_report  # 这个包是评价报告

def load_data(path):
    data = loadmat(path)
    x = data['X']
    y = data['y']
    return x,y

path_data = 'D:\编程\ex3data1.mat'

x1,y1= load_data(path_data)
print(x1.shape,y1.shape)



# 随机画100个数字
def plot_images(x):
    pick_index = np.random.choice(x.shape[0],100)
    pick_images = x[pick_index,:]  #(100,400)
    fig, ax = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True, figsize=(8, 8))
    for i in range(10):
        for j in range(10):
            # fig, ax = plt.subplots()等价于：
            # fig = plt.figure()
            # ax = fig.add_subplot(1, 1, 1)
            ax[i,j].matshow(pick_images[10*i+j,:].reshape(20,20))
            plt.xticks([])
            plt.yticks([])
    plt.show()



print(plot_images(x1))

在这里插入图片描述

1.2 Model representation 模型表示

我们的网络有三层，输入层，隐藏层，输出层。我们的输入是数字图像的像素值，因为每个数字的图像大小为20*20，所以我们输入层有400个单元（这里不包括总是输出要加一个偏置单元）。

在这里插入图片描述

1.2.1 获取输入变量

# 获取输入变量x
x1 = np.insert(x1,0,values=1,axis=1)
print(x1.shape)

首先我们要将标签值（1，2，3，4，…，10）转化成非线性相关的向量，向量对应位置（y[i-1]）上的值等于1，例如y[0]=6转化为y[0]=[0,0,0,0,0,1,0,0,0,0]。

# 获取输入变量y
y1_new = []
for i in y1:
    y_array = np.zeros(10)
    y_array[i-1] = 1
    y1_new.append(y_array)
y1_new = np.array(y1_new)
print(y1_new.shape)
# （500，10）

方法二，也可对y标签进行one hot编码，如下。

from sklearn.preprocessing import OneHotEncoder
encoder  = OneHotEncoder(sparse=False)
y1_new = encoder.fit_transform(y1)
print(y1_new.shape)

1.2.2 获取权重参数

我们已经向您提供了一组网络参数θ1和θ2，这些参数已经过我们的培训。它们存储在ex4weights.mat将由ex4.m加载到θ1和θ2。这些参数的维数是为第二层有25个单元和10个输出单元（对应于10位类）的神经网络确定的。

# 获取权重
path_theta = 'D:\编程\ex3weights.mat'
theta = loadmat(path_theta)
theta1 = theta['Theta1']  # (25, 401)
theta2 = theta['Theta2']  # (10, 26)
print(theta1.shape)
print(theta2.shape)

当我们使用高级优化方法来优化神经网络时，我们需要将多个参数矩阵展开，才能传入优化函数，然后再恢复形状。

# 序列化权重参数：将θ1和θ2合并转化为一维数组
def serialize(a,b):
    return np.append(a.flatten(),b.flatten())
theta_serialize = serialize(theta1,theta2)
print(theta_serialize.size)

# 解序列化权重参数
def deserialize(a,b):
    theta1 = serialize(a,b)[:25*401].reshape(25,401)
    theta2 = deserialize(a,b)[25*401:].reshape(10,26)
    return theta1,theta2

1.3 Feedforward and cost function

1.3.1 Feedforward

确保每层的单元数，注意输出时加一个偏置单元，s(1)=400+1，s(2)=25+1，s(3)=10。
在这里插入图片描述

# 前向传播
def sigmoid(z):
    return 1/(1+np.exp(-z))

def feedforword(x,theta):
    t1,t2 = deserialize(theta)
    # 前面已经插入过偏执单元，这里就不用插入了
    a1 = x
    z2 = a1 @ t1.T
    a2 = np.insert(sigmoid(z2),0,values=1,axis = 1)
    z3 = a2 @ t2.T
    a3 = sigmoid(z3)
    return a1,z2,a2,z3,a3

a1,z2,a2,z3,h = feedforword(x1_new,theta_serialize)
print(a1,z2,a2,z3,h)

1.3.2 Cost function

在这里插入图片描述

# 无正则化代价函数
def costfunction( theta,x,y):
    a1, z2, a2, z3, h = feedforword( theta,x)
    J = - np.sum(y * np.log(h) + (1-y) * np.log(1-h))/len(x)
    return J
print(costfunction(theta_serialize,x1_new,y1_new))
# 0.2876291651613189

1.4 Regularized cost function

在这里插入图片描述
注意不要将每层的偏置项正则化。

# 有正则化代价函数
def regular_costf(theta,x,y,lam):
    t1,t2 = deserialize(theta)
    # 注意这里是从 1 开始
    part1 = np.sum(np.power(t1[:,1:],2))
    part2 = np.sum(np.power(t2[:,1:],2))
    reg = lam * (part1+part2) / (2*len(x))
    return costfunction(theta,x,y) + reg

print(regular_costf(theta_serialize,x1_new,y1_new,1))
# 0.38376985909092365

2 Backpropagation 反向传播

在这部分练习中，您将实现反向传播算法来计算神经网络代价函数的梯度。您需要完成nnCostFunction.m，以便它为grad返回适当的值。一旦你计算了梯度，你就可以通过使用诸如fmincg之类的高级优化器最小化代价函数J（Θ）来训练神经网络。您将首先实现反向传播算法来计算（未规范化）神经网络参数的梯度。

2.1 Sigmoid gradient

在这里插入图片描述

# sigmoid gradient  对z求偏导
def sigmoid_gradient(z):
    return sigmoid(z) * (1-sigmoid(z))

2.2 Random initialization

当我们训练神经网络时，随机初始化参数是很重要的，可以打破数据的对称性。一个有效的策略是在均匀分布(−e，e)中随机选择值，我们可以选择 e = 0.12 这个范围的值来确保参数足够小，使得训练更有效率。

# 随机初始化
def random_init(size):
    # 服从均匀分布的范围中随机返回size大小的值
    return np.random.uniform(-0.12,0.12,size)

2.3 Backpropagation

在这里插入图片描述

目标：获取整个网络代价函数的梯度。以便在优化算法中求解。

这里面一定要理解正向传播和反向传播的过程，才能弄清楚各种参数在网络中的维度，切记。比如手写出每次传播的式子。

# 检查一下各参数的维度

print('a1', a1.shape,'t1', t1.shape)
print('z2', z2.shape)
print('a2', a2.shape, 't2', t2.shape)
print('z3', z3.shape)
print('a3', h.shape)

在这里插入图片描述

# 反向传播梯度下降
def gradient(theta,x,y):
    t1,t2 = deserialize(theta)
    a1, z2, a2, z3, h = feedforword(theta, x)
    d3 = h - y #(5000,10)
    d2 = d3 @ t2[:,1:] * sigmoid_gradient(z2) #(5000,25)
    D2 = d3.T @ a2  # (10,26)
    D1 = d2.T @ a1 # (25,401)
    D = (1 / len(x)) * serialize(D1,D2)  # (1025,)
    return D

2.4 Gradient checking 梯度检测

在你的神经网络,你是最小化代价函数J(Θ)。执行梯度检查你的参数,你可以想象展开参数Θ(1)Θ(2)成一个长向量θ。通过这样做,你能使用以下梯度检查过程。
在这里插入图片描述

# 梯度检测
def gradient_cheecking(theta,x,y,e):
    def a_numeric_grad(plus,minus):
        # 对每个参数theta计算数值梯度，即理论梯度
        return (costfunction(plus,x,y) - costfunction(minus,x,y)) / (e * 2)
    numeric_grad = []
    for i in range(len(theta)):
        plus = theta.copy()
        minus = theta.copy()
        plus[i] = plus[i] + e
        minus[i] = minus[i] - e
        grad_i = a_numeric_grad(plus,minus)
        numeric_grad.append(grad_i)
    numeric_grad = np.array(numeric_grad)
    analytic_grad = regularized_gradient(theta,x,y)
    
    diff = np.linalg.norm(numeric_grad - analytic_grad) / np.linalg.norm(numeric_grad + analytic_grad)
    print(
        'If your backpropagation implementation is correct,\nthe relative difference will be smaller than 10e-9 (assume epsilon=0.0001).\nRelative Difference: {}\n'.format(
            diff))

gradient_checking(theta, X, y, epsilon= 0.0001)#这个运行很慢，谨慎运行

2.5 Regularized Neural Networks

成功实现反向传播算法后，将向梯度添加正则化。为了说明正则化，可以在使用反向传播计算梯度之后将其作为附加项添加。
在这里插入图片描述
注意，你不应该正则化用来表示偏差项。此外，在参数中，i 从1开始索引，j 从0开始索引。

# 有正则化的梯度下降
def regularized_gradient(theta,x,y,lam):
    # 不惩罚偏执单元的参数
    D1,D2 = deserialize(gradient(theta,x,y))

    t1[:,0] = 0
    t2[:,0] = 0
    reg_D1 = D1 + (lam / len(x)) * t1
    reg_D2 = D2 + (lam / len(x)) * t2
    return serialize(reg_D1,reg_D2)

2.6 Learning parameters using fmincg

# 最优化函数
def nn_training(x,y):
    init_theta = random_init(10285) # 25*401+10*26
    result = opt.minimize(fun=regular_costf,x0 = init_theta,args=(x,y,1),method='TNC',jac = regularized_gradient,options={'maxiter':400})
    # 可选项：最大迭代次数，选择为300次
    return result

result = nn_training(x1_new,y1_new)
print(result)

在这里插入图片描述

# 计算精度
_, _, _, _, h = feedforword(result.x, x1_new)
y_pred = np.argmax(h, axis=1) + 1
print(classification_report(y1,y_pred))

在这里插入图片描述

# 方法二：计算精度
correct = [1 if a==b else 0 for (a,b) in zip(y1,y_pred)]
accuracy = sum(map(float,correct)) / float(len(correct))
print('accuracy = {}％'.format(accuracy * 100))

在这里插入图片描述

3 Visualizing the hidden layer

在这里插入图片描述

# 可视化隐藏层
def plot_hidden(theta):
    t1, _ = deserialize(theta)
    t1 = t1[:, 1:]  # （25，400）
    fig,ax_array = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(6,6))
    for row in range(5):
        for column in range(5):
            ax_array[row, column].matshow(t1[row * 5 + column].reshape(20, 20), cmap='gray_r')
            plt.xticks([])
            plt.yticks([])
    plt.show()

plot_hidden(result.x)

在这里插入图片描述
以下为完整代码：

#coding=utf-8
import numpy as np
import pandas as pd
from scipy.io import loadmat
import scipy.optimize as opt
from matplotlib import pyplot as plt
from sklearn.metrics import classification_report  # 这个包是评价报告

def load_data(path):
    data = loadmat(path)
    x = data['X']
    y = data['y']
    return x,y

path_data = 'D:\编程\ex3data1.mat'

x1,y1= load_data(path_data)
print(x1.shape,y1.shape)
print(y1)



# # 随机画100个数字
# def plot_images(x):
#     pick_index = np.random.choice(x.shape[0],100)
#     pick_images = x[pick_index,:]  #(100,400)
#     fig, ax = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True, figsize=(8, 8))
#     for i in range(10):
#         for j in range(10):
#             # fig, ax = plt.subplots()等价于：
#             # fig = plt.figure()
#             # ax = fig.add_subplot(1, 1, 1)
#             ax[i,j].matshow(pick_images[10*i+j,:].reshape(20,20))
#             plt.xticks([])
#             plt.yticks([])
#     plt.show()
# print(plot_images(x1))


# 获取输入变量x
x1_new = np.insert(x1,0,values=1,axis=1)
print(x1_new.shape)



# 获取输入变量y
y1_new = []
for i in y1:
    y_array = np.zeros(10)
    y_array[i-1] = 1
    y1_new.append(y_array)
y1_new = np.array(y1_new)
print(y1_new.shape)

# 方法二：one hot 编码
from sklearn.preprocessing import OneHotEncoder
encoder  = OneHotEncoder(sparse=False)
y1_new = encoder.fit_transform(y1)
print(y1_new.shape)

# 获取权重
path_theta = 'D:\编程\ex3weights.mat'
theta = loadmat(path_theta)
theta1 = theta['Theta1']  # (25, 401)
theta2 = theta['Theta2']  # (10, 26)
print(theta1.shape)
print(theta2.shape)

# 序列化权重参数：将θ1和θ2合并转化为一维数组
def serialize(a,b):
    return np.append(a.flatten(),b.flatten())
theta_serialize = serialize(theta1,theta2)
print(theta_serialize.size)

# 解序列化权重参数
def deserialize(theta_serialize):
    theta1 = theta_serialize[:25*401].reshape(25,401)
    theta2 = theta_serialize[25*401:].reshape(10,26)
    return theta1,theta2

# 前向传播
def sigmoid(z):
    return 1/(1+np.exp(-z))

def feedforword(theta,x):
    t1,t2 = deserialize(theta)
    # 前面已经插入过偏执单元，这里就不用插入了
    a1 = x
    z2 = a1 @ t1.T
    a2 = np.insert(sigmoid(z2),0,values=1,axis = 1)
    z3 = a2 @ t2.T
    a3 = sigmoid(z3)
    return a1,z2,a2,z3,a3

a1,z2,a2,z3,h = feedforword(theta_serialize,x1_new,)
print(a1,z2,a2,z3,h)


# 无正则化代价函数
def costfunction( theta,x,y):
    a1, z2, a2, z3, h = feedforword( theta,x)
    J = - np.sum(y * np.log(h) + (1-y) * np.log(1-h))/len(x)
    return J
print(costfunction(theta_serialize,x1_new,y1_new))


# 有正则化代价函数
def regular_costf(theta,x,y,lam):
    t1,t2 = deserialize(theta)
    # 注意这里是从1 开始
    part1 = np.sum(np.power(t1[:,1:],2))
    part2 = np.sum(np.power(t2[:,1:],2))
    reg = lam * (part1+part2) / (2*len(x))
    return costfunction(theta,x,y) + reg
print(regular_costf(theta_serialize,x1_new,y1_new,1))

# 反向传播
# sigmoid gradient  对z求偏导
def sigmoid_gradient(z):
    return sigmoid(z) * (1-sigmoid(z))

# 随机初始化
def random_init(size):
    # 服从均匀分布的范围中随机返回size大小的值
    return np.random.uniform(-0.12,0.12,size)


# 检查一下各参数的维度
t1,t2 = deserialize(theta_serialize)
print('a1', a1.shape,'t1', t1.shape)
print('z2', z2.shape)
print('a2', a2.shape, 't2', t2.shape)
print('z3', z3.shape)
print('a3', h.shape)


# 反向传播梯度下降
def gradient(theta,x,y):
    t1,t2 = deserialize(theta)
    a1, z2, a2, z3, h = feedforword(theta, x)
    d3 = h - y #(5000,10)
    d2 = d3 @ t2[:,1:] * sigmoid_gradient(z2) #(5000,25)
    D2 = d3.T @ a2  # (10,26)
    D1 = d2.T @ a1 # (25,401)
    D = (1 / len(x)) * serialize(D1,D2)  # (1025,)
    return D


# 梯度检测
def gradient_cheecking(theta,x,y,e):
    def a_numeric_grad(plus,minus):
        # 对每个参数theta计算数值梯度，即理论梯度
        return (costfunction(plus,x,y) - costfunction(minus,x,y)) / (e * 2)
    numeric_grad = []
    for i in range(len(theta)):
        plus = theta.copy()
        minus = theta.copy()
        plus[i] = plus[i] + e
        minus[i] = minus[i] - e
        grad_i = a_numeric_grad(plus,minus)
        numeric_grad.append(grad_i)
    numeric_grad = np.array(numeric_grad)
    analytic_grad = regularized_gradient(theta,x,y)

    diff = np.linalg.norm(numeric_grad - analytic_grad) / np.linalg.norm(numeric_grad + analytic_grad)
    print(
        'If your backpropagation implementation is correct,\nthe relative difference will be smaller than 10e-9 (assume epsilon=0.0001).\nRelative Difference: {}\n'.format(
            diff))


# 有正则化的梯度下降
def regularized_gradient(theta,x,y,lam):
    # 不惩罚偏执单元的参数
    D1,D2 = deserialize(gradient(theta,x,y))
    t1[:,0] = 0
    t2[:,0] = 0
    reg_D1 = D1 + (lam / len(x)) * t1
    reg_D2 = D2 + (lam / len(x)) * t2
    return serialize(reg_D1,reg_D2)

# 最优化函数
def nn_training(x,y):
    init_theta = random_init(10285) # 25*401+10*26
    result = opt.minimize(fun=regular_costf,x0 = init_theta,args=(x,y,1),method='TNC',jac = regularized_gradient,options={'maxiter':400})
    return result
result = nn_training(x1_new,y1_new)
print(result)




# 计算精度
_, _, _, _, h = feedforword(result.x, x1_new)
y_pred = np.argmax(h, axis=1) + 1
print(classification_report(y1,y_pred))

# 可视化隐藏层
def plot_hidden(theta):
    t1, _ = deserialize(theta)
    t1 = t1[:, 1:]  # （25，400）
    fig,ax_array = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(6,6))
    for row in range(5):
        for column in range(5):
            ax_array[row, column].matshow(t1[row * 5 + column].reshape(20, 20), cmap='gray_r')
            plt.xticks([])
            plt.yticks([])
    plt.show()

plot_hidden(result.x)

wssssang

关注

0
点赞
踩
13

收藏

觉得还不错? 一键收藏
0
评论
吴恩达机器学习ex4:BP神经网络

吴恩达机器学习作业四：BP神经网络1.1 Visualizing the data这与上一个练习中使用的数据集相同。ex3data1.mat中有5000个训练示例，其中每个训练示例是数字的20像素乘20像素灰度图像。每个像素由一个浮点数表示，表示该位置的灰度强度。像素的20乘20网格被“展开”成400维向量。这些训练示例中的每一个都成为我们的数据矩阵X中的一行。这给了我们一个5000×400的矩阵X，其中每一行都是手写数字图像的训练示例。训练集的第二部分是包含训练集标签的5000维向量y。为了使事
复制链接

扫一扫