The Full Process of Computing Gradients with Error Backpropagation

政通人和本通

Tags: artificial intelligence, python, deep learning, neural networks

Article link: https://blog.csdn.net/weixin_51262110/article/details/132814784

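Preface

Hello, hello~ I'm back again. I recently reached error backpropagation while studying the "fish book" (Deep Learning from Scratch), so here is a summary with the code tidied up.

This article uses a 2-layer neural network as the example and builds it from layers: an Affine layer, an activation function layer (the activation function used here is ReLU), and a Softmax-with-Loss layer. We first go through the code for each layer (the code is highly reusable, so feel free to take it), then build a 2-layer neural network, compute gradients with backpropagation, and finally complete the classification task.

The dataset used in this article is the MNIST handwritten digit image dataset.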

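Role and code implementation of each layer

Activation function layer

The activation function is ReLU, which is used to introduce nonlinearity. The Python implementation of the Relu layer follows.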

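# Activation function implementation: the Relu layer
class Relu:
    def __init__(self):
        # Boolean mask of inputs <= 0, saved in forward for use in backward
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0  # zero out non-positive inputs
        return out

    def backward(self, dout):
        # Copy so the caller's array is not modified in place
        dx = dout.copy()
        dx[self.mask] = 0  # no gradient flows where the input was <= 0
        return dx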
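
To see what the mask does, here is a minimal sketch that repeats the forward and backward steps of the Relu layer by hand with plain NumPy. The arrays `x` and `dout` are made-up illustrative values, not data from this article.

```python
import numpy as np

# Illustrative input: two samples with two features each (hypothetical values)
x = np.array([[1.0, -0.5],
              [-2.0, 3.0]])

# Forward: remember where the input is <= 0 and zero those entries
mask = (x <= 0)
out = x.copy()
out[mask] = 0          # out is [[1, 0], [0, 3]]

# Backward: the upstream gradient passes through only where x > 0
dout = np.ones_like(x)
dx = dout.copy()
dx[mask] = 0           # dx is [[1, 0], [0, 1]]

print(out)
print(dx)
```

The same mask is used in both directions, which is why the layer stores it during the forward pass.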

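Affine layer

This layer computes the weighted sum of the input signals, i.e. the matrix multiplication part. The code is below (it also handles tensor inputs).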

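import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b

        self.x = None
        self.original_x_shape = None
        # Gradients of the weight and bias parameters
        self.dW = None
        self.db = None

    def forward(self, x):
        # Handle tensor inputs: flatten everything except the batch axis
        self.original_x_shape = x.shape
        x = x.reshape(x.shape[0], -1)
        self.x = x

        out = np.dot(self.x, self.W) + self.b

        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)

        # Restore the shape of the original input (for tensor inputs)
        dx = dx.reshape(*self.original_x_shape)
        return dx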
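
The tensor handling is easiest to follow by tracking shapes. The sketch below walks through the Affine computation with plain NumPy on a small random 4-D input; the sizes (3 samples, 1 channel, 2x2 pixels, 5 outputs) are arbitrary illustrative choices, and the backward formulas are the standard ones for `out = x·W + b`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 1, 2, 2))   # batch of 3 tiny "images": 1 channel, 2x2 pixels
W = rng.random((4, 5))         # 4 flattened input features -> 5 outputs
b = np.zeros(5)

# Forward: flatten each sample, then apply the affine transform
original_x_shape = x.shape
x2d = x.reshape(x.shape[0], -1)        # shape (3, 4)
out = np.dot(x2d, W) + b               # shape (3, 5)

# Backward: given an upstream gradient with the same shape as out
dout = np.ones_like(out)
dW = np.dot(x2d.T, dout)               # shape (4, 5), same as W
db = np.sum(dout, axis=0)              # shape (5,), same as b
dx = np.dot(dout, W.T).reshape(*original_x_shape)  # back to (3, 1, 2, 2)

print(out.shape, dW.shape, db.shape, dx.shape)
```

Each gradient has the same shape as the quantity it belongs to, which is a quick sanity check when writing a backward method.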

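Softmax-with-Loss layer

This layer normalizes the output scores with softmax, computes the cross-entropy error loss, and in the backward pass returns the gradient of that loss with respect to its input.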

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss
        self.y = None     # output of softmax
        self.t = None     # supervised labels (one-hot vector)

    def forward(self, x, t):
        # softmax and cross_entropy_error are helper functions defined elsewhere
        self.t = t
        self.y = softmax(x)  # softmax is applied to the scores x, not to the labels t
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        return dx
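
The layer relies on two helper functions, `softmax` and `cross_entropy_error`, that are not listed in this article. Below is a minimal sketch of what they could look like, assuming one-hot labels and a batch laid out as one sample per row; the max subtraction and the small `1e-7` constant are common numerical-stability choices, not something the article specifies.

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max so np.exp cannot overflow
    x = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # Treat a single sample as a batch of size 1
    if y.ndim == 1:
        y = y.reshape(1, -1)
        t = t.reshape(1, -1)
    batch_size = y.shape[0]
    # 1e-7 avoids log(0); t is assumed to be one-hot
    return -np.sum(t * np.log(y + 1e-7)) / batch_size

# Illustrative scores for two samples and three classes (hypothetical values)
scores = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[0, 0, 1],
                   [1, 0, 0]])

probs = softmax(scores)
loss = cross_entropy_error(probs, labels)
print(probs.sum(axis=1), loss)
```

Each row of `probs` sums to 1, and the loss is the average negative log-probability assigned to the correct class.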

A simple two-layer neural network example using error backpropagation

import numpy as np
from mnist import load_mnist
import matplotlib.pyplot as plt

# NOTE: softmax, cross_entropy_error and TwoLayerNet are used below but are not
# defined in this listing; they must be defined or imported before running.

# Activation function implementation: Sigmoid layer
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out
        return dx


class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b

        self.x = None
        self.original_x_shape = None
        # Gradients of the weight and bias parameters
        self.dW = None
        self.db = None

    def forward(self, x):
        # Handle tensor inputs: flatten everything except the batch axis
        self.original_x_shape = x.shape
        x = x.reshape(x.shape[0], -1)
        self.x = x

        out = np.dot(self.x, self.W) + self.b

        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)

        # Restore the shape of the original input (for tensor inputs)
        dx = dx.reshape(*self.original_x_shape)
        return dx


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss
        self.y = None     # output of softmax
        self.t = None     # supervised labels (one-hot vector)

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)  # softmax is applied to the scores x, not to the labels t
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        return dx


# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Compute the gradient (backpropagation; the numerical version is commented out)
    # grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)

    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)
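
The training script calls `TwoLayerNet`, which this article does not list. Below is a minimal sketch of such a class, assuming the layer order Affine, Relu, Affine with a Softmax-with-Loss layer on top. The constructor arguments and the `params`, `loss`, `accuracy`, and `gradient` names match how the script uses them; `weight_init_std`, the `OrderedDict` of layers, and the compact helper and layer definitions (without the tensor reshaping) are assumptions made to keep the block self-contained, so treat it as one possible version rather than the original code.

```python
import numpy as np
from collections import OrderedDict

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)   # avoid overflow in exp
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]   # t is one-hot

class Relu:
    def __init__(self):
        self.mask = None
    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out
    def backward(self, dout):
        dx = dout.copy()
        dx[self.mask] = 0
        return dx

class Affine:
    def __init__(self, W, b):
        self.W, self.b = W, b
        self.x = self.dW = self.db = None
    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b
    def backward(self, dout):
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        return np.dot(dout, self.W.T)

class SoftmaxWithLoss:
    def __init__(self):
        self.y = self.t = None
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        return cross_entropy_error(self.y, self.t)
    def backward(self, dout=1):
        return dout * (self.y - self.t) / self.t.shape[0]

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # Small random weights, zero biases (weight_init_std is an assumed default)
        self.params = {
            'W1': weight_init_std * np.random.randn(input_size, hidden_size),
            'b1': np.zeros(hidden_size),
            'W2': weight_init_std * np.random.randn(hidden_size, output_size),
            'b2': np.zeros(output_size),
        }
        # Layers share the parameter arrays, so in-place updates reach them
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])
        self.lastLayer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        return self.lastLayer.forward(self.predict(x), t)

    def accuracy(self, x, t):
        y = np.argmax(self.predict(x), axis=1)
        return np.sum(y == np.argmax(t, axis=1)) / float(x.shape[0])

    def gradient(self, x, t):
        self.loss(x, t)                       # forward pass caches intermediate values
        dout = self.lastLayer.backward(1)     # backward pass starts at the loss layer
        for layer in reversed(list(self.layers.values())):
            dout = layer.backward(dout)
        return {
            'W1': self.layers['Affine1'].dW, 'b1': self.layers['Affine1'].db,
            'W2': self.layers['Affine2'].dW, 'b2': self.layers['Affine2'].db,
        }
```

Because the script updates parameters with `-=`, the arrays are modified in place and the Affine layers keep seeing the current weights. If the update were written as `params[key] = params[key] - ...`, the layers would still hold the old arrays.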

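Compared with numerical differentiation, this is dramatically faster. I have not explained the role of each method used in the example here; you can refer to the implementation of the SGD learning algorithm, which is much the same. I hope this article is useful to you~~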
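
To make the comparison with numerical differentiation concrete, here is a small sketch on a single affine + softmax layer (no bias, random made-up data), not the article's full network. The function names `loss_fn`, `numerical_gradient`, and `analytic_gradient` are hypothetical helpers for this illustration. The numerical version needs two loss evaluations per parameter, while the backpropagation version gets every gradient from one forward and one backward pass.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def loss_fn(W, x, t):
    # Cross-entropy loss of softmax(x @ W) against one-hot labels t
    y = softmax(np.dot(x, W))
    return -np.sum(t * np.log(y + 1e-7)) / x.shape[0]

def numerical_gradient(f, W, h=1e-4):
    # Central differences: perturb one parameter at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + h
        f_plus = f(W)
        W[idx] = orig - h
        f_minus = f(W)
        W[idx] = orig                      # restore the parameter
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

def analytic_gradient(W, x, t):
    # Backpropagation: softmax-with-loss yields (y - t) / N,
    # and the affine layer turns that into x.T @ dout
    y = softmax(np.dot(x, W))
    dout = (y - t) / x.shape[0]
    return np.dot(x.T, dout)

rng = np.random.default_rng(0)
x = rng.random((5, 4))                          # 5 samples, 4 features
t = np.eye(3)[rng.integers(0, 3, size=5)]       # one-hot labels, 3 classes
W = 0.01 * rng.standard_normal((4, 3))

grad_num = numerical_gradient(lambda w: loss_fn(w, x, t), W)
grad_bp = analytic_gradient(W, x, t)
max_diff = np.max(np.abs(grad_num - grad_bp))
print(max_diff)
```

A very small `max_diff` indicates that the two methods agree, which is the usual way to check a backpropagation implementation before relying on it for training.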
