Training a Deep Network with Stochastic Gradient Descent - CSDN Blog

Article link: https://blog.csdn.net/weixin_43872532/article/details/112603013

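This article introduces how a neural network learns. It covers two loss functions, mean squared error and cross-entropy, and the principle of computing gradients numerically. Through examples, it shows how to compute gradients and how to train a two-layer neural network, updating the parameters with stochastic gradient descent, and finally measures accuracy on the training and test data to show how the network's performance improves.

(Summary generated automatically by CSDN.)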

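Neural network learning (stochastic gradient descent)

Loss functions
- Mean squared error
- Cross-entropy error
Gradients
- Computing gradients numerically
- The gradient method
Implementing the learning algorithm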

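Loss functions

Mean squared error

Many functions can serve as a loss function. One of the best known is the mean squared error:
$E=\frac{1}{2}\sum_{k} (y_k-t_k)^2$
Here $y_k$ is the output of the neural network, $t_k$ is the supervision data (the correct label), and $k$ indexes the dimensions of the data.
For example, in the earlier handwritten-digit recognition example, $y_k$ and $t_k$ are arrays of 10 elements: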

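import numpy as np
y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])  # softmax output: class 2 has the highest probability (0.6)
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # one-hot label; the correct class here is 2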

It can be implemented as follows:

import numpy as np

def mean_squared_error(y, t):  # y and t must be NumPy arrays of the same shape
    return 0.5 * np.sum((y - t)**2)

The smaller the value of the loss function, the smaller the error with respect to the supervision data.
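
As a quick numeric check of this statement, here is a small sketch that evaluates the mean squared error for the example output above (0.6 on the correct class 2) and for a made-up output `y_wrong` that puts 0.6 on class 7 instead. `y_wrong` is hypothetical and exists only for the comparison.

```python
import numpy as np

def mean_squared_error(y, t):
    # E = 1/2 * sum_k (y_k - t_k)^2
    return 0.5 * np.sum((y - t) ** 2)

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # one-hot label: class 2

# output that is most confident in the correct class (2)
y_right = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# hypothetical output that is most confident in class 7
y_wrong = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

print(mean_squared_error(y_right, t))  # about 0.0975
print(mean_squared_error(y_wrong, t))  # about 0.5975
```

The correct-looking output gets the smaller loss, which is exactly the behavior a loss function needs.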

Cross-entropy error

Besides mean squared error, the cross-entropy error is also frequently used as a loss function. It is defined as:
$E=-\sum_{k} t_k \log y_k$
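Here log is the natural logarithm. Because $t_k$ is one-hot, only the term for the correct class is nonzero, so $E$ is simply the negative log of the probability the network assigns to the correct answer. Note that np.log(0) evaluates to -inf, which breaks any computation that follows, so a tiny value delta is added inside the logarithm to prevent negative infinity. The implementation is: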

import numpy as np

def cross_entropy_error(y, t):
    delta = 1e-7  # small constant so that np.log never receives 0
    return -np.sum(t * np.log(y + delta))
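
A short sketch of how this function behaves on the example from the mean squared error section. Because the label is one-hot, the result reduces to $-\log(y_2 + \delta)$, and `delta` keeps the loss finite even when the network outputs exactly 0 for the correct class. The all-zero output `y_zero` is an artificial input chosen only to exercise that edge case.

```python
import numpy as np

def cross_entropy_error(y, t):
    delta = 1e-7  # keeps np.log away from log(0) = -inf
    return -np.sum(t * np.log(y + delta))

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # one-hot label: class 2
y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])

# only the t_k = 1 term survives, so the loss is -log(y_2 + delta)
print(cross_entropy_error(y, t))  # about 0.51
print(-np.log(0.6))               # about 0.51 as well

# hypothetical edge case: zero probability on the correct class
y_zero = np.zeros(10)
print(cross_entropy_error(y_zero, t))  # -log(1e-7), about 16.12
```

Without `delta`, the zeros in `y` would produce `0 * -inf`, which NumPy evaluates to `nan`, and the whole sum would become `nan`.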

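This function is used later to implement the network's learning process (possibly extended to handle batches; see the code that follows for details).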
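
One common way to extend the loss to mini-batches is to average the per-sample cross-entropy over the batch and accept both 1-D and 2-D inputs. The sketch below does that under the assumption that labels are one-hot; the name `cross_entropy_error_batch` is hypothetical, and the actual `cross_entropy_error` in functions.py from the previous post may be written differently.

```python
import numpy as np

def cross_entropy_error_batch(y, t):
    """Mean cross-entropy over a batch (hypothetical helper).

    y: network outputs, shape (N, C) or (C,)
    t: one-hot labels with the same shape as y
    """
    if y.ndim == 1:  # treat a single sample as a batch of size 1
        y = y.reshape(1, -1)
        t = t.reshape(1, -1)
    delta = 1e-7
    batch_size = y.shape[0]
    # divide by N so the loss scale does not depend on the batch size
    return -np.sum(t * np.log(y + delta)) / batch_size

t = np.array([[0, 0, 1], [1, 0, 0]])
y = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])
print(cross_entropy_error_batch(y, t))  # mean of -log(0.7) and -log(0.8)
```

Dividing by the batch size is also why the network's `gradient` method later divides `(y - t)` by `batch_num`.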

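Gradients

Computing gradients numerically

The gradient can be computed directly from the definition of the derivative; this approach is covered in courses such as linear algebra and numerical methods. The idea is simple: for each variable $x_i$ in turn, hold the others fixed and use the central difference $\frac{\partial f}{\partial x_i} \approx \frac{f(x+h e_i)-f(x-h e_i)}{2h}$, where $e_i$ is the $i$-th unit vector. The code is as follows: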

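import numpy as np

def function(x):  # a simple function: f(x0, x1) = x0^2 + x1^2
    return x[0]**2 + x[1]**2

def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)  # array with the same shape as x

    for idx in range(x.size):
        tmp_val = x[idx]
        # f(x+h)
        x[idx] = tmp_val + h
        fh1 = f(x)
        # f(x-h)
        x[idx] = tmp_val - h
        fh2 = f(x)
        grad[idx] = (fh1 - fh2) / (2*h)  # central difference
        x[idx] = tmp_val  # restore the original value

    return grad

# x must be a float array; with an int array, x + h would be silently truncated
grad = numerical_gradient(function, np.array([3.0, 4.0]))
print(grad)  # approximately [6, 8]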
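
The loop above indexes with `x[idx]` over `range(x.size)`, so it only handles 1-D arrays, while weight matrices such as `W1` in the network below are 2-D. A minimal sketch of a shape-agnostic variant follows; iterating with `np.ndindex` is my choice here, not something from the post, and `numerical_gradient_nd` is a hypothetical name.

```python
import numpy as np

def numerical_gradient_nd(f, x, h=1e-4):
    """Central-difference gradient of scalar f at a float array x of any shape.

    x is modified in place during evaluation and restored afterwards.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):  # idx is a tuple such as (i, j)
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad

# f(W) = sum of squares, whose exact gradient is 2W
W = np.array([[1.0, -2.0], [0.5, 3.0]])
g = numerical_gradient_nd(lambda w: np.sum(w ** 2), W)
print(g)  # close to 2 * W
```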

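The gradient method

The main task of machine learning is to find optimal parameters during learning, and a neural network likewise has to find its optimal parameters (weights and biases) during learning. Here, the optimal parameters are those at which the loss function reaches its minimum. In general the loss function is complicated and the parameter space is huge, so we do not know where the minimum is. The gradient method searches for the minimum of a function (or a value as small as possible) by making clever use of the gradient.
The negative gradient does not necessarily point toward the minimum, but locally, moving against the gradient decreases the function's value the fastest.
Mathematically, the gradient method can be written as: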
$x_0 = x_0-\eta\frac{\partial f}{\partial x_0}$
$x_1 = x_1-\eta\frac{\partial f}{\partial x_1}$
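Here $\eta$ is the step size of the update; in neural-network learning it is called the learning rate. It determines how much is learned in a single step, that is, how far the parameters are moved in each update.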
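
A minimal sketch of the update rule above, applied to the same $f(x_0, x_1) = x_0^2 + x_1^2$ with the numerical gradient. The starting point $(-3, 4)$, the step count, and the two learning rates are assumptions picked for illustration. With $\eta = 0.1$ each step multiplies $x$ by $0.8$ (since $\partial f/\partial x_i = 2x_i$), so after 100 steps it sits very close to the minimum at $(0, 0)$; with a tiny $\eta$ it barely moves.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    # central difference, same idea as the function above (1-D float x)
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fh1 = f(x)
        x[idx] = tmp_val - h
        fh2 = f(x)
        grad[idx] = (fh1 - fh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    # repeatedly apply x <- x - lr * grad f(x)
    x = init_x.copy()  # do not modify the caller's array
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x

def function(x):
    return x[0] ** 2 + x[1] ** 2

init = np.array([-3.0, 4.0])
x_min = gradient_descent(function, init, lr=0.1, step_num=100)
x_slow = gradient_descent(function, init, lr=1e-10, step_num=100)
print(x_min)   # very close to [0, 0]
print(x_slow)  # barely moves from [-3, 4]
```

The learning rate is a hyperparameter: too small and learning stalls, too large and the updates can overshoot and diverge.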

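Implementing the learning algorithm

Steps

Step 1 (mini-batch)
Randomly select a subset of the training data; this subset is called a mini-batch. The goal is to reduce the value of the loss function on the mini-batch.
Step 2 (compute gradients)
To reduce the mini-batch loss, compute the gradient of the loss with respect to each weight parameter.
Step 3 (update parameters)
Update each weight parameter by a small step in the negative gradient direction.
Step 4 (repeat)
Repeat steps 1 to 3.
The "stochastic" in stochastic gradient descent (SGD) means "randomly chosen": gradient descent is performed on randomly selected data.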
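
To see the four steps in isolation before the full network, here is a sketch on a hypothetical toy problem: fitting $y = wx + b$ to made-up data generated from $y = 2x + 1$ plus a little noise, with the loss $\frac{1}{2}\,\mathrm{mean}((wx + b - y)^2)$ and its gradient written out by hand. The data, learning rate, batch size, and iteration count are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy data: y = 2x + 1 plus a little noise
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = 2.0 * x_train + 1.0 + 0.01 * rng.standard_normal(200)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 10

for i in range(2000):  # step 4: repeat
    # step 1: draw a random mini-batch (indices sampled with replacement)
    batch_mask = rng.choice(x_train.shape[0], batch_size)
    xb, yb = x_train[batch_mask], y_train[batch_mask]

    # step 2: gradient of the loss 0.5 * mean((w*x + b - y)^2)
    err = w * xb + b - yb
    grad_w = np.mean(err * xb)
    grad_b = np.mean(err)

    # step 3: move a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 2 and 1
```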

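The two-layer network class

First, we implement the two-layer neural network as a class named TwoLayerNet, as shown below.
functions.py is given at the end of the previous blog post.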

# coding: utf-8
import numpy as np
from functions import *  # provides sigmoid, sigmoid_grad, softmax, cross_entropy_error


class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # initialize weights (small Gaussian values) and biases (zeros)
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
    
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)
        
        return y
        
    # x: input data, t: supervision data (one-hot labels)
    def loss(self, x, t):
        y = self.predict(x)
        
        return cross_entropy_error(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
           
    def gradient(self, x, t):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
        grads = {}
        
        batch_num = x.shape[0]
        
        # forward
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)
        
        # backward
        dy = (y - t) / batch_num
        grads['W2'] = np.dot(z1.T, dy)
        grads['b2'] = np.sum(dy, axis=0)
        
        da1 = np.dot(dy, W2.T)
        dz1 = sigmoid_grad(a1) * da1
        grads['W1'] = np.dot(x.T, dz1)
        grads['b1'] = np.sum(dz1, axis=0)

        return grads

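| Variable | Description |
| --- | --- |
| params | Dictionary holding the network parameters (W1, b1, W2, b2) |
| grads | Dictionary holding the gradients (returned by gradient) |

| Method | Description |
| --- | --- |
| `__init__(self, input_size, hidden_size, output_size, weight_init_std=0.01)` | Initialization. The arguments are, in order, the number of neurons in the input layer, the hidden layer, and the output layer; weight_init_std scales the Gaussian initial weights |
| `predict(self, x)` | Runs inference (recognition); x is the image data |
| `loss(self, x, t)` | Computes the value of the loss function; x is the image data and t the correct labels (the methods below take the same arguments) |
| `accuracy(self, x, t)` | Computes the recognition accuracy |
| `gradient(self, x, t)` | Computes the gradients of the weight parameters with backpropagation, a much faster alternative to numerical differentiation; explained in the next post |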
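
Because gradient() replaces numerical differentiation, it is worth confirming that the two agree, a procedure usually called gradient checking. The sketch below re-implements TwoLayerNet's forward and backward pass as plain functions so that it runs without functions.py. The `sigmoid`, `sigmoid_grad`, `softmax`, and `cross_entropy_error` here are assumed stand-ins for the versions in functions.py, and the sizes (4 inputs, 5 hidden units, 3 outputs, batch of 6) and random data are arbitrary.

```python
import numpy as np

# Assumed stand-ins for functions.py (the real file may differ in detail)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return (1.0 - s) * s

def softmax(a):
    a = a - np.max(a, axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # mean over the batch, matching the "/ batch_num" in gradient()
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

def loss(params, x, t):
    z1 = sigmoid(x @ params["W1"] + params["b1"])
    y = softmax(z1 @ params["W2"] + params["b2"])
    return cross_entropy_error(y, t)

def backprop_grads(params, x, t):
    # same computation as TwoLayerNet.gradient above
    a1 = x @ params["W1"] + params["b1"]
    z1 = sigmoid(a1)
    y = softmax(z1 @ params["W2"] + params["b2"])
    dy = (y - t) / x.shape[0]
    da1 = dy @ params["W2"].T
    dz1 = sigmoid_grad(a1) * da1
    return {"W2": z1.T @ dy, "b2": dy.sum(axis=0),
            "W1": x.T @ dz1, "b1": dz1.sum(axis=0)}

def numerical_grads(params, x, t, h=1e-4):
    grads = {}
    for key, p in params.items():
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            tmp = p[idx]
            p[idx] = tmp + h
            f1 = loss(params, x, t)
            p[idx] = tmp - h
            f2 = loss(params, x, t)
            g[idx] = (f1 - f2) / (2 * h)
            p[idx] = tmp  # restore
        grads[key] = g
    return grads

rng = np.random.default_rng(0)
params = {"W1": 0.5 * rng.standard_normal((4, 5)), "b1": np.zeros(5),
          "W2": 0.5 * rng.standard_normal((5, 3)), "b2": np.zeros(3)}
x = rng.standard_normal((6, 4))
t = np.eye(3)[rng.integers(0, 3, size=6)]  # random one-hot labels

bp = backprop_grads(params, x, t)
num = numerical_grads(params, x, t)
for key in ("W1", "b1", "W2", "b2"):
    print(key, np.max(np.abs(bp[key] - num[key])))  # expected to be tiny
```

The numerical version needs two forward passes per parameter, which is why it is far too slow for the 784 x 50 weight matrix used below, while backpropagation gets all gradients from one forward and one backward pass.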

Training the network

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing files from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from load_mnist import load_mnist
from two_layer_net import TwoLayerNet

# load MNIST (normalized inputs, one-hot labels)
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000  # number of iterations (set as appropriate)
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size // batch_size, 1)  # iterations per epoch (600 for 60,000 samples)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)  # random indices (sampled with replacement)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # compute the gradients (backpropagation)
    grad = network.gradient(x_batch, t_batch)
    
    # update the parameters (SGD step)
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))

# plot training and test accuracy per epoch
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()

# To view how the loss changes, uncomment the lines below (and comment out the accuracy plot above)
# x = np.arange(len(train_loss_list))
# plt.plot(x, train_loss_list, label='loss')
# plt.xlabel("iteration")
# plt.ylabel("loss")
# plt.ylim(0, 5.0)
# plt.legend(loc='best')
# plt.show()

Running the code gives the following result:
[Figure: training and test accuracy per epoch]
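Here is the idea behind this code:
The mini-batch size is 100, so each iteration draws 100 samples at random from the 60,000 training samples. The gradient is computed on this 100-sample mini-batch, and the parameters are updated with stochastic gradient descent. The number of gradient updates (loop iterations) is 10,000. After every update, the loss on the current mini-batch is computed and appended to a list. Running the commented-out part of the code above (and commenting out the accuracy plot) produces the following figure: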
[Figure: training loss per iteration]
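As data is repeatedly fed to the network, its loss decreases gradually, which shows that the network is approaching optimal parameters.
Next comes evaluation.
A neural network must be evaluated on data that is not part of its training set; otherwise overfitting cannot be detected (overfitting means the network works only on the samples it was trained on and does not generalize). Here, the recognition accuracy on both the training data and the test data is recorded once per epoch, where one epoch is the number of iterations needed to pass through the training set once: 60,000 / 100 = 600 iterations.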
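
To make the epoch bookkeeping concrete, the snippet below reproduces the loop's own condition `i % iter_per_epoch == 0` with the numbers used above. It shows that in 10,000 iterations the accuracy is recorded at i = 0, 600, ..., 9600, that is, 17 times, which is the number of points on the accuracy plot.

```python
train_size = 60000
batch_size = 100
iters_num = 10000

iter_per_epoch = max(train_size // batch_size, 1)  # 600 iterations per epoch
checkpoints = [i for i in range(iters_num) if i % iter_per_epoch == 0]

print(iter_per_epoch)    # 600
print(len(checkpoints))  # 17
print(checkpoints[-1])   # 9600
```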