BP Neural Network: Code Implementation and Backpropagation Derivation

KW.KW

Last modified 2023-07-28 21:00:08

Tags: machine learning, python, neural networks

First published 2023-07-28 19:02:36

Article link: https://blog.csdn.net/m0_67782255/article/details/131837850

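This article walks through a Python implementation of the backpropagation algorithm for a neural network, covering data preprocessing, forward propagation, the cost function, regularization, serialization and deserialization of the weights, and gradient computation. Using the exercise data from Andrew Ng's course, it demonstrates training the network and visualizing the hidden-layer weights.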

Contents

I. Derivation of Backpropagation
II. Implementation Steps

I. Derivation of Backpropagation

[Figure: derivation of backpropagation (image not included)]
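The derivation can be summarized in the notation of the code in Part II. This is a sketch of the standard result, not a transcription of the author's own derivation. It assumes one sample per row, a bias column of ones prepended to each layer's input, a sigmoid activation g, and the cross-entropy cost; the symbol \tilde{\Theta}_2 (Θ2 without its first column) is introduced here only for illustration.

```latex
\begin{align*}
&\text{Forward pass:}\\
&a^{(1)} = X,\qquad z^{(2)} = a^{(1)}\Theta_1^{T},\qquad a^{(2)} = \bigl[\,1,\ g(z^{(2)})\,\bigr],\\
&z^{(3)} = a^{(2)}\Theta_2^{T},\qquad h = g(z^{(3)}),\qquad g(z) = \frac{1}{1+e^{-z}}.\\[6pt]
&\text{Cost over } m \text{ samples and } K \text{ classes:}\\
&J = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Bigl[\,y_{ik}\log h_{ik} + (1-y_{ik})\log(1-h_{ik})\Bigr].\\[6pt]
&\text{Output error, using } g'(z) = g(z)\bigl(1-g(z)\bigr):\\
&\frac{\partial}{\partial h}\Bigl(-y\log h-(1-y)\log(1-h)\Bigr) = \frac{h-y}{h(1-h)}
\;\Longrightarrow\; \delta^{(3)} = \frac{h-y}{h(1-h)}\,h(1-h) = h-y.\\[6pt]
&\text{Hidden error (bias column of } \Theta_2 \text{ dropped):}\\
&\delta^{(2)} = \bigl(\delta^{(3)}\tilde{\Theta}_2\bigr)\odot g'(z^{(2)}).\\[6pt]
&\text{Gradients:}\\
&\frac{\partial J}{\partial \Theta_2} = \frac{1}{m}\,\delta^{(3)T}a^{(2)},\qquad
\frac{\partial J}{\partial \Theta_1} = \frac{1}{m}\,\delta^{(2)T}a^{(1)}.
\end{align*}
```

In the code, these quantities correspond to the variables d3, d2, D2 and D1.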

II. Implementation Steps

1. Import libraries

The code is as follows (example):

import numpy as np
import scipy.io as sio
from scipy.optimize import minimize
import matplotlib.pyplot as plt

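2. Load the data

The code is as follows (example):

# A raw string keeps the backslashes in the Windows path from being read as escape sequences.
data=sio.loadmat(r'E:\AAA\深度学习\吴恩达神经网络模型实现\exerise3\ex3data1.mat')
raw_X=data['X']  # features, one sample per row
raw_y=data['y']  # class labels

#%%
X=np.insert(raw_X,0,values=1,axis=1)  # prepend a column of ones as the bias term
X.shape

The data here is loaded from a local .mat file with scipy.io.loadmat.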

3. One-hot encode y

The code is as follows (example):

def one_hot_encoder(raw_y):
    result=[]  # collects one encoded row per label
    for i in raw_y:  # iterate over every label in raw_y
        y_temp=np.zeros(10)  # length-10 zero vector, one slot per class
        y_temp[i-1]=1  # set index (i-1) to 1, so label 10 maps to index 9
        result.append(y_temp)  # store the encoded row
    return np.array(result)  # convert the list to a NumPy array and return it


#%%
y=one_hot_encoder(raw_y)
y.shape
#%%
theta=sio.loadmat(r'E:\AAA\深度学习\吴恩达神经网络模型实现\exerise3\ex3weights.mat')
#%%
theta1,theta2=theta['Theta1'],theta['Theta2']
theta1.shape,theta2.shape
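The loop in one_hot_encoder can also be written without explicit iteration. The following is a minimal sketch, assuming labels run from 1 to num_classes as in the code above; the function name one_hot_vectorized and the sample labels are hypothetical and used only for illustration.

```python
import numpy as np

def one_hot_vectorized(raw_y, num_classes=10):
    """Vectorized one-hot encoding for labels in 1..num_classes."""
    labels = np.asarray(raw_y).reshape(-1)    # accept shape (m,) or (m, 1)
    return np.eye(num_classes)[labels - 1]    # row (k-1) of the identity matrix encodes label k

# Three made-up labels in the same (m, 1) layout as raw_y.
sample = np.array([[10], [1], [3]])
encoded = one_hot_vectorized(sample)
print(encoded.shape)
```

Indexing the identity matrix with the label array selects one row per sample, which gives the same encoding as the loop for labels in the expected range.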

4. Serialize and deserialize the weight parameters

The code is as follows (example):

def serialize(a,b):
    # Flatten a and b and join them into a single one-dimensional array,
    # which is the form scipy.optimize.minimize expects for its parameters.
    return np.append(a.flatten(),b.flatten())
#%%
theta_serialize=serialize(theta1,theta2)
theta_serialize.shape
def deserialize(theta_serialize):
    # The first 25*401 elements are reshaped back into theta1 with shape (25, 401).
    theta1=theta_serialize[:25*401].reshape(25,401)
    # The remaining elements are reshaped back into theta2 with shape (10, 26).
    theta2=theta_serialize[25*401:].reshape(10,26)
    return theta1,theta2
#%%
theta1,theta2=deserialize(theta_serialize)
#%%
theta1.shape,theta2.shape
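A quick way to gain confidence in the serialization step is a round-trip check. The sketch below is an illustration under stated assumptions: deserialize_general is a hypothetical variant that takes the two shapes as arguments instead of hard-coding them, and the weights are random stand-ins for the real Theta1 and Theta2.

```python
import numpy as np

def serialize(a, b):
    # Flatten both matrices and join them into one 1-D vector.
    return np.concatenate([a.ravel(), b.ravel()])

def deserialize_general(flat, shape1, shape2):
    # Split the vector at the size of the first matrix, then restore both shapes.
    split = shape1[0] * shape1[1]
    return flat[:split].reshape(shape1), flat[split:].reshape(shape2)

rng = np.random.default_rng(0)
t1 = rng.normal(size=(25, 401))   # stand-in for theta1
t2 = rng.normal(size=(10, 26))    # stand-in for theta2

flat = serialize(t1, t2)
r1, r2 = deserialize_general(flat, (25, 401), (10, 26))
print(flat.shape, np.array_equal(r1, t1), np.array_equal(r2, t2))
```

Passing the shapes explicitly makes the same helper usable for other layer sizes, at the cost of two extra arguments.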

5. Forward propagation

def sigmoid(z):
    return 1/(1+np.exp(-z))
#%%
def feed_forward(theta_serialize,X):
    theta1,theta2=deserialize(theta_serialize)
    a1=X  # input layer, bias column already included
    z2=a1@theta1.T
    a2=sigmoid(z2)
    a2=np.insert(a2,0,1,axis=1)  # add the bias unit to the hidden layer
    z3=a2@theta2.T
    h=sigmoid(z3)  # output layer activations
    return a1,z2,a2,z3,h
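To make the shapes in the forward pass concrete, here is a small walk-through on synthetic data. It is a sketch, not the author's code: the five random samples and random weights are made up, and scipy.special.expit is swapped in for the hand-written sigmoid because it computes the same function while avoiding overflow warnings for large negative inputs.

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid, numerically stable

rng = np.random.default_rng(0)
X = np.insert(rng.random((5, 400)), 0, 1, axis=1)   # 5 synthetic samples plus bias column
theta1 = rng.uniform(-0.5, 0.5, (25, 401))          # random stand-in weights
theta2 = rng.uniform(-0.5, 0.5, (10, 26))

z2 = X @ theta1.T                          # (5, 25): hidden pre-activations
a2 = np.insert(expit(z2), 0, 1, axis=1)    # (5, 26): hidden activations plus bias unit
z3 = a2 @ theta2.T                         # (5, 10): output pre-activations
h = expit(z3)                              # (5, 10): one score per class and sample

for name, arr in [("X", X), ("z2", z2), ("a2", a2), ("z3", z3), ("h", h)]:
    print(name, arr.shape)
```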

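6. Cost function

1. Cost without regularization

The code is as follows (example):

def cost(theta_serialize,X,y):
    a1,z2,a2,z3,h=feed_forward(theta_serialize,X)
    J=-np.sum(y*np.log(h)+(1-y)*np.log(1-h))/len(X)  # cross-entropy averaged over samples
    return J
#%% md
2. Cost with regularization
#%%
def reg_cost(theta_serialize,X,y,lamda):
    theta1,theta2=deserialize(theta_serialize)  # unpack the weights being evaluated
    sum1=np.sum(np.power(theta1[:,1:],2))  # bias column is not regularized
    sum2=np.sum(np.power(theta2[:,1:],2))
    reg=lamda*(sum1+sum2)/(2*len(X))
    return reg+cost(theta_serialize,X,y)
#%%
lamda=1
reg=reg_cost(theta_serialize,X,y,lamda)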
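Written out, the regularized cost computed above corresponds to the following formula. The index ranges are read off the array shapes used in the code (theta1 is 25 x 401, theta2 is 10 x 26) and column index 0 denotes the bias weights, which are left out of the penalty.

```latex
J_{\text{reg}}(\Theta) =
-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{10}
\Bigl[\,y_{ik}\log h_{ik} + (1-y_{ik})\log(1-h_{ik})\Bigr]
+\frac{\lambda}{2m}\left[\,
\sum_{j=1}^{25}\sum_{k=1}^{400}\bigl(\Theta^{(1)}_{jk}\bigr)^{2}
+\sum_{j=1}^{10}\sum_{k=1}^{25}\bigl(\Theta^{(2)}_{jk}\bigr)^{2}
\right]
```

Here m is the number of samples (len(X) in the code) and λ is the parameter named lamda.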

7. Backpropagation

def sigmoid_gradient(z):
    return sigmoid(z)*(1-sigmoid(z))
#%%
def gradient(theta_serialize,X,y):
    theta1,theta2=deserialize(theta_serialize)
    a1,z2,a2,z3,h=feed_forward(theta_serialize,X)
    d3=h-y  # output layer error
    d2=d3@theta2[:,1:]*sigmoid_gradient(z2)  # hidden layer error, bias column excluded
    D2=(d3.T@ a2)/len(X)  # gradient with respect to theta2
    D1=(d2.T@ a1)/len(X)  # gradient with respect to theta1
    return serialize(D1,D2)

#%% md
**Gradient with regularization**
#%%
def reg_gradient(theta_serialize,X,y,lamda):
    D=gradient(theta_serialize,X,y)
    D1,D2=deserialize(D)
    theta1,theta2=deserialize(theta_serialize)
    # Add the regularization term to every column except the bias column.
    D1[:,1:]=D1[:,1:]+theta1[:,1:]*lamda/len(X)
    D2[:,1:]=D2[:,1:]+theta2[:,1:]*lamda/len(X)
    return serialize(D1,D2)
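A common way to validate a backpropagation implementation is to compare it with a finite-difference estimate of the gradient. The sketch below does this on a toy network so that it runs quickly. The layer sizes, the random data, and the helper names unpack, forward and numerical_gradient are all assumptions made for this example, while the cost and gradient formulas follow the code above.

```python
import numpy as np

HIDDEN, INPUTS, CLASSES = 3, 4, 2   # toy layer sizes, chosen only for a fast check

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def unpack(theta):
    split = HIDDEN * (INPUTS + 1)
    return (theta[:split].reshape(HIDDEN, INPUTS + 1),
            theta[split:].reshape(CLASSES, HIDDEN + 1))

def forward(theta, X):
    theta1, theta2 = unpack(theta)
    z2 = X @ theta1.T
    a2 = np.insert(sigmoid(z2), 0, 1, axis=1)
    h = sigmoid(a2 @ theta2.T)
    return z2, a2, h

def cost(theta, X, y):
    _, _, h = forward(theta, X)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / len(X)

def gradient(theta, X, y):
    _, theta2 = unpack(theta)
    z2, a2, h = forward(theta, X)
    d3 = h - y
    s = sigmoid(z2)
    d2 = (d3 @ theta2[:, 1:]) * s * (1 - s)
    D1 = (d2.T @ X) / len(X)
    D2 = (d3.T @ a2) / len(X)
    return np.concatenate([D1.ravel(), D2.ravel()])

def numerical_gradient(theta, X, y, eps=1e-5):
    # Central differences: perturb one parameter at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.insert(rng.normal(size=(6, INPUTS)), 0, 1, axis=1)   # 6 samples plus bias column
y = np.eye(CLASSES)[rng.integers(0, CLASSES, size=6)]       # random one-hot labels
theta = rng.uniform(-0.5, 0.5, HIDDEN * (INPUTS + 1) + CLASSES * (HIDDEN + 1))

max_diff = np.max(np.abs(gradient(theta, X, y) - numerical_gradient(theta, X, y)))
print(max_diff)
```

If the analytic and numerical gradients agree to many decimal places, the backpropagation formulas are very likely implemented correctly. The numerical version is far too slow for the full 10285-parameter network and is meant only as a check.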

#%% md
**Optimization**
#%%
from scipy.optimize import minimize
def nn_training(X,y):
    init_theta=np.random.uniform(-0.5,0.5,10285)  # 25*401 + 10*26 randomly initialized weights
    res=minimize(fun=cost,
                 x0=init_theta,
                 args=(X,y),
                 method='TNC',
                 jac=gradient,
                 options={'maxiter':300})  # newer SciPy releases may expect 'maxfun' here for TNC
    return res
#%%

res=nn_training(X,y)
raw_y=data['y'].reshape(5000,)  # reshape data['y'] into a one-dimensional array of shape (5000,)

#%%
_,_,_,_,h=feed_forward(res.x,X)  # run the trained parameters res.x on X; only the fifth return value, h, is needed
y_pred=np.argmax(h,axis=1)+1  # index of the largest output in each row, plus 1 to turn it into a class label
acc=np.mean(y_pred==raw_y)  # fraction of correct predictions
acc
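The training call above minimizes the unregularized cost. To train with regularization, the regularized cost and its gradient can be passed to minimize together with the regularization strength. The following is a minimal sketch on a toy problem, not the author's setup: the layer sizes, the synthetic data, the combined function reg_cost_and_grad, and the choice of L-BFGS-B (used here instead of TNC) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

H, N, K = 5, 4, 3   # hidden units, input features, classes (toy values)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def unpack(theta):
    split = H * (N + 1)
    return theta[:split].reshape(H, N + 1), theta[split:].reshape(K, H + 1)

def reg_cost_and_grad(theta, X, y, lam):
    """Return the regularized cost and its gradient in one pass."""
    theta1, theta2 = unpack(theta)
    m = len(X)
    z2 = X @ theta1.T
    s = sigmoid(z2)
    a2 = np.insert(s, 0, 1, axis=1)
    h = sigmoid(a2 @ theta2.T)
    J = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    J += lam * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2)) / (2 * m)
    d3 = h - y
    d2 = (d3 @ theta2[:, 1:]) * s * (1 - s)
    D1 = (d2.T @ X) / m
    D2 = (d3.T @ a2) / m
    D1[:, 1:] += lam * theta1[:, 1:] / m   # bias columns are not regularized
    D2[:, 1:] += lam * theta2[:, 1:] / m
    return J, np.concatenate([D1.ravel(), D2.ravel()])

rng = np.random.default_rng(0)
features = rng.normal(size=(60, N))
labels = np.argmax(features[:, :K], axis=1)    # synthetic rule: largest of the first K features
X = np.insert(features, 0, 1, axis=1)          # add bias column
y = np.eye(K)[labels]                          # one-hot targets

init_theta = rng.uniform(-0.5, 0.5, H * (N + 1) + K * (H + 1))
initial_cost, _ = reg_cost_and_grad(init_theta, X, y, 1.0)

# jac=True tells minimize that the function returns (cost, gradient).
res = minimize(fun=reg_cost_and_grad, x0=init_theta, args=(X, y, 1.0),
               method='L-BFGS-B', jac=True, options={'maxiter': 300})
print(initial_cost, res.fun)
```

Returning the cost and gradient from one function avoids running the forward pass twice per iteration. A larger lam shrinks the weights more strongly and typically lowers training accuracy in exchange for better generalization.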

#%% md
**Visualizing the hidden layer**
#%%
def plot_hidden_layer(theta):
    theta1,_=deserialize(theta)  # theta is the serialized parameter vector
    hidden_layer=theta1[:,1:]  # drop the bias column; each row holds one hidden unit's weights
    # 5x5 grid of subplots that share their x and y axes, one subplot per hidden unit
    fig,ax=plt.subplots(nrows=5,ncols=5,figsize=(8,8),sharex=True,sharey=True)
    for r in range(5):
        for c in range(5):
            # reshape the unit's 400 weights to 20x20 and draw them with a reversed grayscale colormap
            ax[r,c].imshow(hidden_layer[5*r+c].reshape(20,20).T,cmap='gray_r')
    plt.xticks([])  # hide the x-axis ticks
    plt.yticks([])  # hide the y-axis ticks
    plt.show()  # display the figure
#%%
plot_hidden_layer(res.x)