Hand-Writing a BP Neural Network in Python

This post walks through the error backpropagation (BP) algorithm for neural networks: computing the mean squared error, deriving the parameter updates, and implementing the algorithm in code, with a focus on how the gradient descent strategy is used to update the weights and biases to improve model performance.

The Error Backpropagation Algorithm

Output Layer

For a training example $(\boldsymbol{x}_k, \boldsymbol{y}_k)$, suppose the network's output is $\hat{\boldsymbol{y}}_k=(\hat{y}_1^k, \hat{y}_2^k, \ldots, \hat{y}_l^k)$, i.e.

$$\hat{y}_j^k = f\left(\beta_j - \theta_j\right),$$

then the mean squared error of the network on $(\boldsymbol{x}_k, \boldsymbol{y}_k)$ is

$$E_k = \frac{1}{2}\sum_{j=1}^{l}\left(\hat{y}_j^k - y_j^k\right)^2.$$
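As a quick sanity check, this per-example error is easy to compute with NumPy (a minimal sketch; `y_hat` and `y` are hypothetical length-$l$ arrays, not from the original post):

```python
import numpy as np

# Hypothetical output and target vectors for one training example
y_hat = np.array([0.8, 0.1, 0.3])
y     = np.array([1.0, 0.0, 0.0])

# E_k = 1/2 * sum_j (y_hat_j - y_j)^2
E_k = 0.5 * np.sum((y_hat - y) ** 2)
print(E_k)  # 0.07
```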
The BP algorithm is based on the gradient descent strategy: it adjusts the parameters in the direction of the negative gradient of the objective. For the error $E_k$ and a given learning rate $\eta$, we have

$$\Delta w_{hj} = -\eta \frac{\partial E_k}{\partial w_{hj}}.$$

Note that $w_{hj}$ first affects the input $\beta_j$ of the $j$-th output-layer neuron, which affects its output $\hat{y}_j^k$, which in turn affects $E_k$. By the chain rule,

$$\frac{\partial E_k}{\partial w_{hj}} = \frac{\partial E_k}{\partial \hat{y}_j^k} \cdot \frac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial w_{hj}}.$$

Since $\beta_j = \sum\limits_{h=1}^{q} w_{hj} b_h$, we can view $\beta_j$ as a linear function of $w_{hj}$ with slope $b_h$, so naturally

$$\frac{\partial \beta_j}{\partial w_{hj}} = b_h.$$

Define the output-layer gradient term

$$\begin{aligned}
g_j &= -\frac{\partial E_k}{\partial \hat{y}_j^k} \cdot \frac{\partial \hat{y}_j^k}{\partial \beta_j} \\
&= -\left(\hat{y}_j^k - y_j^k\right) f^{\prime}\left(\beta_j - \theta_j\right) \\
&= \hat{y}_j^k\left(1 - \hat{y}_j^k\right)\left(y_j^k - \hat{y}_j^k\right),
\end{aligned}$$

where the last step uses the sigmoid property $f^{\prime}(x) = f(x)\left(1 - f(x)\right)$.

Combining the expressions above gives $\Delta w_{hj}$:

$$\Delta w_{hj} = \eta g_j b_h.$$
Similarly, we obtain

$$\Delta \theta_j = -\eta g_j,$$

$$\Delta \gamma_h = -\eta e_h,$$

where $e_h$ is the gradient term of the hidden layer, derived in the next section.
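A minimal NumPy sketch of the output-layer formulas, assuming a sigmoid activation; the arrays `b` (hidden-layer outputs), `y_hat`, and `y` are hypothetical values chosen only for illustration:

```python
import numpy as np

eta = 0.1                              # learning rate
b = np.array([0.5, 0.2, 0.9])          # hidden-layer outputs b_h (q = 3)
y_hat = np.array([0.7, 0.4])           # output-layer outputs y_hat_j (l = 2)
y = np.array([1.0, 0.0])               # targets y_j

# g_j = y_hat_j * (1 - y_hat_j) * (y_j - y_hat_j)
g = y_hat * (1 - y_hat) * (y - y_hat)  # shape (l,)

# Delta w_hj = eta * g_j * b_h  -> a q x l matrix of weight increments
delta_W = eta * np.outer(b, g)

# Delta theta_j = -eta * g_j
delta_theta = -eta * g
```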

Hidden Layer

By the same reasoning, the hidden-layer gradient term is

$$\begin{aligned}
e_h &= -\frac{\partial E_k}{\partial b_h} \cdot \frac{\partial b_h}{\partial \alpha_h} \\
&= -\sum_{j=1}^{l} \frac{\partial E_k}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial b_h}\, f^{\prime}\left(\alpha_h - \gamma_h\right) \\
&= f^{\prime}\left(\alpha_h - \gamma_h\right) \sum_{j=1}^{l} w_{hj} g_j,
\end{aligned}$$

and the corresponding updates are

$$\Delta v_{ih} = \eta e_h x_i,$$

$$\Delta \gamma_h = -\eta e_h.$$
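Continuing the sketch above with the same hypothetical variable names (the concrete numbers and the random `W` are illustrative assumptions), the hidden-layer update looks like:

```python
import numpy as np

eta = 0.1                              # learning rate
x = np.array([0.3, 0.6, 0.1, 0.8])     # inputs x_i (d = 4)
b = np.array([0.5, 0.2, 0.9])          # hidden-layer outputs b_h (q = 3)
W = np.random.randn(3, 2)              # hidden-to-output weights w_hj (q x l)
g = np.array([0.063, -0.096])          # output-layer gradient terms g_j from the previous sketch

# e_h = b_h * (1 - b_h) * sum_j w_hj * g_j   (f'(alpha_h - gamma_h) for sigmoid units)
e = b * (1 - b) * (W @ g)              # shape (q,)

# Delta v_ih = eta * e_h * x_i  -> a d x q matrix of weight increments
delta_V = eta * np.outer(x, e)

# Delta gamma_h = -eta * e_h
delta_gamma = -eta * e
```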

Code Implementation

from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn import datasets


iris = datasets.load_iris()
data = iris.data
target = iris.target

class NeuralNetwork:
    def __init__(self, in_size, o_size, h_size):
        # Record the layer sizes (input n, output k, hidden b).
        self.in_size = in_size
        self.o_size = o_size
        self.h_size = h_size

        # Randomly initialize the weight matrices.
        # Note: the thresholds (theta, gamma) are omitted in this implementation.
        self.W1 = np.random.randn(in_size, h_size)  # n x b matrix
        self.W2 = np.random.randn(h_size, o_size)   # b x k matrix

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # Mapping function: turn a continuous value into a discrete class label.
    def ref(self, x):
        if x <= (1 / 3):
            return 0
        elif x <= (2 / 3):
            return 1
        else:
            return 2

    # The input X is assumed to be an m x n matrix.
    def forward(self, X):
        self.z2 = np.dot(X, self.W1)          # m x b
        self.act2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.act2, self.W2)  # m x k
        # Keep the continuous output for backpropagation;
        # the discrete mapping is applied only in predict().
        self.y_hat = self.sigmoid(self.z3)
        return self.y_hat

    # y is assumed to be an m x k matrix.
    def backward(self, X, y, y_hat, learning_rate):
        # Gradient term of the output layer: g = y_hat * (1 - y_hat) * (y - y_hat)
        Grd_1 = (y - y_hat) * self.sigmoid(self.z3) * (1 - self.sigmoid(self.z3))  # m x k
        # Delta for the output-layer weights
        Delta_W2 = np.dot(self.act2.T, Grd_1)  # b x k
        # Gradient term of the hidden layer: e = b * (1 - b) * (g W2^T)
        Grd_2 = np.dot(Grd_1, self.W2.T) * self.sigmoid(self.z2) * (1 - self.sigmoid(self.z2))  # m x b
        # Delta for the hidden-layer weights
        Delta_W1 = np.dot(X.T, Grd_2)  # n x b

        # Update the weights
        self.W1 += learning_rate * Delta_W1
        self.W2 += learning_rate * Delta_W2

    def train(self, X, y, learning_rate, num_epochs):
        # Check that X and y have the same number of samples.
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y must have the same number of samples")
        for i in range(1, num_epochs + 1):
            y_hat = self.forward(X)
            self.backward(X, y, y_hat, learning_rate)
            # Report the mean squared error.
            loss = np.mean((y - y_hat) ** 2)
            print(f"loss = {loss}, epochs/num_epochs: {i}/{num_epochs}")

    def predict(self, X):
        # Map the continuous network output to discrete class labels.
        vec_rule = np.vectorize(self.ref)
        return vec_rule(self.forward(X))
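To round things out, here is a minimal usage sketch on the iris data loaded above. The specific choices (a single output unit, scaling the labels 0/1/2 into [0, 1] so that `ref()` can map the sigmoid output back to a class, the split ratio, the learning rate, and the epoch count) are illustrative assumptions, not part of the original post:

```python
# Split the iris data loaded above into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)

# 4 input features, 1 output unit, 8 hidden units (sizes chosen for illustration).
nn = NeuralNetwork(in_size=4, o_size=1, h_size=8)

# Scale the labels {0, 1, 2} to {0, 0.5, 1} so the sigmoid output can match them,
# and reshape them into an m x 1 column to match the network's output shape.
nn.train(X_train, y_train.reshape(-1, 1) / 2, learning_rate=0.01, num_epochs=200)

# predict() maps the continuous output back to class labels 0/1/2 via ref().
y_pred = nn.predict(X_test)
print("test accuracy:", np.mean(y_pred.ravel() == y_test))
```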
        


Note: some of the formulas above come from Zhou Zhihua's Machine Learning (the "Watermelon Book").
