Implementing a BP Neural Network from Scratch in Python

The Error Backpropagation Algorithm

Output Layer

For a training example $(\boldsymbol{x}_k, \boldsymbol{y}_k)$, suppose the network's output is $\hat{\boldsymbol{y}}_k = (\hat{y}_1^k, \hat{y}_2^k, \ldots, \hat{y}_l^k)$, i.e.

$$\hat{y}_j^k = f\left(\beta_j - \theta_j\right),$$

where $f$ is the sigmoid activation, $\beta_j$ is the input to the $j$-th output neuron and $\theta_j$ is its threshold. The mean squared error of the network on $(\boldsymbol{x}_k, \boldsymbol{y}_k)$ is then

$$E_k = \frac{1}{2}\sum_{j=1}^{l}\left(\hat{y}_j^k - y_j^k\right)^2.$$
The BP algorithm is based on gradient descent: parameters are adjusted in the direction of the negative gradient of the objective. For the error $E_k$ and a given learning rate $\eta$,

$$\Delta w_{hj} = -\eta\,\frac{\partial E_k}{\partial w_{hj}}.$$

Note that $w_{hj}$ first affects the input $\beta_j$ of the $j$-th output-layer neuron, which then affects its output $\hat{y}_j^k$, which in turn affects $E_k$. By the chain rule,

$$\frac{\partial E_k}{\partial w_{hj}} = \frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial w_{hj}}.$$

Since $\beta_j = \sum\limits_{h=1}^{q} w_{hj} b_h$, where $b_h = f(\alpha_h - \gamma_h)$ is the output of the $h$-th of the $q$ hidden neurons ($\alpha_h$ is its input and $\gamma_h$ its threshold), we can view $\beta_j$ as a linear function of $w_{hj}$ with slope $b_h$, so

$$\frac{\partial \beta_j}{\partial w_{hj}} = b_h.$$

Define the output-layer gradient term

$$
\begin{aligned}
g_j &= -\frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j} \\
&= -\left(\hat{y}_j^k - y_j^k\right) f'\left(\beta_j - \theta_j\right) \\
&= \hat{y}_j^k\left(1 - \hat{y}_j^k\right)\left(y_j^k - \hat{y}_j^k\right),
\end{aligned}
$$

where the last step uses the fact that the sigmoid satisfies $f'(x) = f(x)\left(1 - f(x)\right)$.
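That sigmoid-derivative identity is easy to sanity-check numerically. The following throwaway snippet (not part of the network code below) compares the analytic derivative with a central finite-difference estimate:

import numpy as np

def f(x):
    return 1 / (1 + np.exp(-x))

x, h = 0.7, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central finite difference
analytic = f(x) * (1 - f(x))                # f'(x) = f(x)(1 - f(x))
print(numeric, analytic)                    # the two values agree to ~1e-10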

Combining the expressions above gives $\Delta w_{hj}$:

$$\Delta w_{hj} = \eta\, g_j b_h.$$
Similarly,

$$\Delta\theta_j = -\eta\, g_j,$$

$$\Delta\gamma_h = -\eta\, e_h,$$

where $e_h$ is the hidden-layer gradient term derived in the next section.
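To make the update rules concrete, here is a minimal NumPy sketch of the output-layer step for a single training example. The variable names (b for the hidden-layer outputs, theta for the output thresholds, eta for the learning rate) are illustrative choices, not identifiers from the implementation below.

import numpy as np

# hypothetical shapes: b is (q,), y_hat and y are (l,), W is (q, l), theta is (l,)
def output_layer_update(W, theta, b, y_hat, y, eta):
    g = y_hat * (1 - y_hat) * (y - y_hat)   # g_j = y_hat_j (1 - y_hat_j)(y_j - y_hat_j)
    W = W + eta * np.outer(b, g)            # Δw_hj = η g_j b_h
    theta = theta - eta * g                 # Δθ_j = -η g_j
    return W, theta, g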

Hidden Layer

By the same reasoning,

$$
\begin{aligned}
e_h &= -\frac{\partial E_k}{\partial b_h}\cdot\frac{\partial b_h}{\partial \alpha_h} \\
&= -\sum_{j=1}^{l}\frac{\partial E_k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial b_h}\, f'\left(\alpha_h - \gamma_h\right) \\
&= f'\left(\alpha_h - \gamma_h\right)\sum_{j=1}^{l} w_{hj}\, g_j \\
&= b_h\left(1 - b_h\right)\sum_{j=1}^{l} w_{hj}\, g_j,
\end{aligned}
$$

and the corresponding updates are

$$\Delta v_{ih} = \eta\, e_h x_i, \qquad \Delta\gamma_h = -\eta\, e_h.$$
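And a matching sketch for the hidden-layer step, under the same assumptions as before (x is the input vector, V the input-to-hidden weights, gamma the hidden thresholds, and g the output-layer gradient term computed above):

# hypothetical shapes: x is (n,), b is (q,), W is (q, l), g is (l,), V is (n, q), gamma is (q,)
def hidden_layer_update(V, gamma, W, x, b, g, eta):
    e = b * (1 - b) * np.dot(W, g)   # e_h = b_h (1 - b_h) Σ_j w_hj g_j
    V = V + eta * np.outer(x, e)     # Δv_ih = η e_h x_i
    gamma = gamma - eta * e          # Δγ_h = -η e_h
    return V, gamma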

Code Implementation

from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn import datasets


iris = datasets.load_iris()
data = iris.data
target = iris.target

class NeuralNetwork:
    def __init__(self, in_size, o_size, h_size):
        # record the layer sizes
        self.in_size = in_size
        self.o_size = o_size
        self.h_size = h_size

        # weight matrices; thresholds/biases are omitted to keep the code short
        self.W1 = np.random.randn(in_size, h_size)  # n x q matrix (input -> hidden)
        self.W2 = np.random.randn(h_size, o_size)   # q x l matrix (hidden -> output)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # mapping function: turn a continuous value in (0, 1) into a discrete class label
    def ref(self, x):
        if x <= (1 / 3):
            return 0
        elif x <= (2 / 3):
            return 1
        else:
            return 2

    # the input X is assumed to be an m x n matrix
    def forward(self, X):
        self.z2 = np.dot(X, self.W1)          # m x q
        self.act2 = self.sigmoid(self.z2)     # hidden-layer outputs b_h
        self.z3 = np.dot(self.act2, self.W2)  # m x l
        self.y_hat = self.sigmoid(self.z3)    # keep the continuous output for backprop
        return self.y_hat

    # y is assumed to be an m x l matrix
    def backward(self, X, y, y_hat, learning_rate):
        # output-layer gradient term g_j
        Grd_1 = (y - y_hat) * y_hat * (1 - y_hat)                        # m x l
        # output-layer delta
        Delta_W2 = np.dot(self.act2.T, Grd_1)                            # q x l
        # hidden-layer gradient term e_h
        Grd_2 = np.dot(Grd_1, self.W2.T) * self.act2 * (1 - self.act2)   # m x q
        # hidden-layer delta
        Delta_W1 = np.dot(X.T, Grd_2)                                    # n x q

        # update the weights
        self.W1 += learning_rate * Delta_W1
        self.W2 += learning_rate * Delta_W2

    def train(self, X, y, learning_rate, num_epochs):
        # check that the shapes match
        if X.shape[0] != y.shape[0]:
            return -1
        for i in range(1, num_epochs + 1):
            y_hat = self.forward(X)
            self.backward(X, y, y_hat, learning_rate)
            # report the mean squared error
            loss = np.mean((y - y_hat) ** 2)
            print(f"loss = {loss}, epochs/num_epochs: {i}/{num_epochs}")

    def predict(self, X):
        # discretise the continuous network output into class labels 0/1/2
        vec_rule = np.vectorize(self.ref)
        y_pred = vec_rule(self.forward(X))
        return y_pred
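The class above is never actually run on the iris data loaded at the top, so here is one possible way to use it. The ref mapping suggests a single output unit with the three class labels scaled into [0, 1], so this sketch encodes the targets as 0, 0.5 and 1; the hidden size, learning rate and epoch count are illustrative values, not ones given in the original post.

# standardise the features so the sigmoid units do not saturate immediately
X = (data - data.mean(axis=0)) / data.std(axis=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.2, random_state=0)

nn = NeuralNetwork(in_size=4, o_size=1, h_size=8)
# scale the class labels 0/1/2 to 0/0.5/1 to match the single sigmoid output
nn.train(X_train, y_train.reshape(-1, 1) / 2.0, learning_rate=0.001, num_epochs=500)

pred = nn.predict(X_test).ravel()   # discrete labels 0/1/2 via the ref mapping
print("test accuracy:", np.mean(pred == y_test))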
        


Note: some of the formulas above are taken from Zhou Zhihua's *Machine Learning* (the "watermelon book").
