《深度学习入门》 (Deep Learning from Scratch), Chapter 4 in Practice: Handwritten Digit Recognition

Article link: https://blog.csdn.net/rellvera/article/details/127933046

Contents

Preface
1. Theory
2. Full code

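Preface

This article follows Chapter 4 of 《深度学习入门》 and works through a small handwritten digit recognition example. The focus of the chapter is how to make a neural network learn. To do that, we introduce a loss function as the metric and look for the weight parameters that make it as small as possible. To find a loss value that is as small as possible, we use gradient descent.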

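1. Theory

(1) Learning steps of a neural network

1. mini-batch: Randomly pick a subset of the training data; this subset is called a mini-batch. Feed the mini-batch into the network to obtain predictions, then compute the loss from the predictions and the correct labels.
2. Compute the gradient: To reduce the loss on the mini-batch, compute the gradient of the loss with respect to each weight parameter. The gradient points in the direction in which the loss increases fastest, so its negative is the direction in which the loss decreases fastest.
3. Update the parameters: Move the weight parameters a small step in the negative gradient direction.
4. Repeat: Repeat steps 1 to 3.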
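The four steps above can be sketched on a toy problem. This is only a minimal illustration, not the book's code: the data (a noisy line y = 3x + 1), the linear model, and all variable names are assumptions chosen so that the gradient can be written down by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: t = 3x + 1 plus a little noise
x_train = rng.uniform(-1.0, 1.0, size=(1000, 1))
t_train = 3.0 * x_train + 1.0 + 0.01 * rng.standard_normal((1000, 1))

w = np.zeros((1, 1))  # weight of the linear model
b = np.zeros(1)       # bias of the linear model
learning_rate = 0.1
batch_size = 100


def loss(x, t):
    # Half the mean squared error of the linear model on (x, t)
    return 0.5 * np.mean((x @ w + b - t) ** 2)


initial_loss = loss(x_train, t_train)

for i in range(500):
    # Step 1: draw a random mini-batch
    batch_mask = rng.choice(x_train.shape[0], batch_size)
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

    # Step 2: gradient of the mini-batch loss (analytic for this linear model)
    err = x_batch @ w + b - t_batch        # shape (batch_size, 1)
    grad_w = x_batch.T @ err / batch_size  # shape (1, 1)
    grad_b = err.mean(axis=0)              # shape (1,)

    # Step 3: take a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    # Step 4: the loop repeats steps 1 to 3

final_loss = loss(x_train, t_train)
print(initial_loss, final_loss, w.ravel(), b)
```

After training, w and b should be close to the values 3 and 1 used to generate the data, and the loss should be far below its initial value.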

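(2) Gradient and gradient descent

Gradient: The vector that collects the partial derivatives with respect to all variables is called the gradient. At each point the gradient points in the direction in which the function value increases fastest, so the negative gradient points in the direction in which it decreases fastest.
Gradient method: A method that repeatedly takes small steps guided by the gradient and gradually improves the function value. Gradient descent steps against the gradient to search for a minimum; gradient ascent steps along the gradient to search for a maximum.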
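To make the two definitions concrete, here is a minimal sketch of a central-difference gradient and of gradient descent on f(x) = x0^2 + x1^2. The function names, the step size h, and the learning rate are illustrative assumptions; this simplified numerical_gradient only handles 1-D float arrays and is not the version in the book's common.gradient module.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-4):
    """Central-difference estimate of the gradient of f at x (1-D float array)."""
    grad = np.zeros_like(x)
    for idx in range(x.size):
        orig = x[idx]
        x[idx] = orig + h
        f_plus = f(x)
        x[idx] = orig - h
        f_minus = f(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = orig  # restore the original value
    return grad


def gradient_descent(f, init_x, lr=0.1, step_num=100):
    """Repeatedly step against the gradient, starting from init_x."""
    x = init_x.copy()
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x


def function_2(x):
    return x[0] ** 2 + x[1] ** 2


grad_at_point = numerical_gradient(function_2, np.array([3.0, 4.0]))
x_min = gradient_descent(function_2, np.array([-3.0, 4.0]))
print(grad_at_point)  # close to [6, 8]
print(x_min)          # close to [0, 0], the minimum of function_2
```

The gradient at (3, 4) points away from the origin, where the function grows; stepping against it moves toward the minimum at the origin.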

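(3) Loss functions

Loss function: The metric used when training a neural network. It expresses how poorly the current network fits the supervised data. Common choices are the mean squared error and the cross-entropy error.
Mean squared error:
$$E = \frac{1}{2} \sum_k (y_k - t_k)^2$$
Here y_k is the network output, t_k is the supervised data, and k indexes the dimensions of the data.
Implementation: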

import numpy as np
def mean_squared_error(y, t):
    # y: network output, t: supervised data with the same shape as y
    return 0.5 * np.sum((y - t) ** 2)
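As a quick check of the formula, the sketch below evaluates the mean squared error for two made-up network outputs against the same one-hot label. The numbers in y_good and y_bad are illustrative assumptions, not results from a trained network.

```python
import numpy as np


def mean_squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)


# One-hot label: the correct digit is 2
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
# Hypothetical output that puts most probability on digit 2
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Hypothetical output that puts most probability on digit 7
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

mse_good = mean_squared_error(y_good, t)
mse_bad = mean_squared_error(y_bad, t)
print(mse_good)  # about 0.0975
print(mse_bad)   # about 0.5975
```

The output that favors the correct digit has the smaller error, which is the behavior a loss function needs.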

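Cross-entropy error:
$$E = -\sum_k t_k \log y_k$$
Here log is the natural logarithm, y_k is the network output (a probability, for example the output of sigmoid or softmax), and t_k is the correct label in one-hot representation.
Implementation: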

import numpy as np
def cross_entropy_error(y, t):
    delta = 1e-7  # small constant so that np.log never receives 0
    return -np.sum(t * np.log(y + delta))
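Because t is one-hot, only the term for the correct class survives the sum, so the cross-entropy error is simply the negative log of the probability assigned to the correct class. The sketch below checks this with two made-up outputs; the values in y_good and y_bad are illustrative assumptions.

```python
import numpy as np


def cross_entropy_error(y, t):
    delta = 1e-7  # small constant so that np.log never receives 0
    return -np.sum(t * np.log(y + delta))


# One-hot label: the correct digit is 2
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
# Hypothetical output with probability 0.6 on the correct digit
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Hypothetical output with probability 0.1 on the correct digit
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

ce_good = cross_entropy_error(y_good, t)
ce_bad = cross_entropy_error(y_bad, t)
print(ce_good)  # about 0.51, i.e. -log(0.6)
print(ce_bad)   # about 2.30, i.e. -log(0.1)
```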

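(4) epoch and iters_num

epoch: An epoch is a unit. One epoch is the number of updates after which all of the training data has been used once. With 10000 training samples and mini-batches of 100 samples, 100 iterations of stochastic gradient descent process as many samples as the whole training set. So in this example one epoch is 100 iterations.
iters_num: The number of iterations of the gradient method. (In this handwritten digit example, iters_num = 10000 means that a random mini-batch is drawn 10000 times. Note that the listing in section 2 sets iters_num = 500.)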
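The arithmetic behind these two terms can be written out as follows. The helper name iterations_per_epoch is a hypothetical one introduced for this sketch; the figure of 60000 training images is the size of the standard MNIST training set.

```python
def iterations_per_epoch(train_size, batch_size):
    """Number of mini-batch updates that make up one epoch (at least 1)."""
    return max(train_size // batch_size, 1)


# The example from the text: 10000 samples, mini-batches of 100
example_iters = iterations_per_epoch(10000, 100)
print(example_iters)  # 100

# MNIST: 60000 training images, mini-batches of 100
mnist_iters = iterations_per_epoch(60000, 100)
print(mnist_iters)  # 600

# With iters_num = 10000 updates, training covers this many epochs
iters_num = 10000
num_epochs = iters_num / mnist_iters
print(num_epochs)  # about 16.7
```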

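(5) Network structure used in this example

The network is a two-layer neural network with roughly the following structure:

Input layer: 784 neurons.
Hidden layer: 50 neurons.
Output layer: 10 neurons.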
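The layer sizes fix the shapes of the weights and of every intermediate array. The sketch below traces those shapes for one mini-batch, assuming a sigmoid hidden layer and a softmax output as in the code of section 2. The input is random data standing in for flattened 28x28 images (784 = 28 x 28), and the variable names are assumptions for illustration.

```python
import numpy as np

input_size, hidden_size, output_size = 784, 50, 10
batch_size = 100

rng = np.random.default_rng(0)
W1 = 0.01 * rng.standard_normal((input_size, hidden_size))
b1 = np.zeros(hidden_size)
W2 = 0.01 * rng.standard_normal((hidden_size, output_size))
b2 = np.zeros(output_size)

# Random stand-in for a mini-batch of flattened images
x = rng.random((batch_size, input_size))

a1 = x @ W1 + b1                         # (100, 784) @ (784, 50) -> (100, 50)
z1 = 1 / (1 + np.exp(-a1))               # sigmoid keeps the shape
a2 = z1 @ W2 + b2                        # (100, 50) @ (50, 10) -> (100, 10)
a2 = a2 - a2.max(axis=1, keepdims=True)  # shift for numerical stability
y = np.exp(a2) / np.exp(a2).sum(axis=1, keepdims=True)  # row-wise softmax

num_params = W1.size + b1.size + W2.size + b2.size
print(a1.shape, a2.shape, y.shape)  # (100, 50) (100, 10) (100, 10)
print(num_params)                   # 39760
```

Each row of y sums to 1, so it can be read as a probability distribution over the 10 digits for one input image.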

2. Full code

import sys, os

# Make the book's common/ and dataset/ packages in the parent directory importable
sys.path.append(os.pardir)
import numpy as np
import matplotlib.pyplot as plt
from common.functions import *
from common.gradient import numerical_gradient
from dataset.mnist import load_mnist
# TwoLayerNet is defined in this file, so it is not imported


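def cross_entropy_error(y, t):
    # Treat a single sample as a batch of size 1
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    # Average the cross-entropy over the batch (t is one-hot)
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size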


class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # Initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

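    @staticmethod
    def sigmoid(a):
        return 1 / (1 + np.exp(-a))

    @staticmethod
    def softmax(a):
        # Normalize each row separately so that batched input works;
        # subtracting the row maximum avoids overflow in np.exp
        a = a - np.max(a, axis=-1, keepdims=True)
        exp_a = np.exp(a)
        return exp_a / np.sum(exp_a, axis=-1, keepdims=True)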

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
        a1 = np.dot(x, W1) + b1
        z1 = self.sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        z2 = self.softmax(a2)
        return z2

    # x is the input data, t is the label
    def loss(self, x, t):
        y = self.predict(x)
        return cross_entropy_error(y, t)  # cross-entropy loss

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def numerical_gradient(self, x, t):
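        # W is a dummy argument: the book's numerical_gradient perturbs the
        # parameter array in place and then re-evaluates the loss
        loss_W = lambda W: self.loss(x, t)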
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        return grads

# Load MNIST with pixels scaled to [0, 1] and one-hot labels
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)
train_loss_list = []
train_acc_list = []
test_acc_list = []


# Hyperparameters
iters_num = 500
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1
network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
# Number of iterations per epoch
iter_per_epoch = max(train_size // batch_size, 1)
for i in range(iters_num):
    # Draw a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Compute the gradient
    grad = network.numerical_gradient(x_batch, t_batch)
    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    # Record the training loss
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    # Compute the accuracy once per epoch
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)


# x-axis values for the recorded lists
m = list(np.arange(1, iters_num+1))
n = list(np.arange(1, len(train_acc_list)+1))
t = list(np.arange(1, len(test_acc_list)+1))
# Print the recorded values
print(train_loss_list)
print(train_acc_list)
print(test_acc_list)
# Plot the training loss in the first panel of a 2x2 grid
plt.subplot(221)
plt.plot(m, train_loss_list)
# Show the figure
plt.show()