"Deep Learning" study diary. Neural network learning: the gradient of a neural network

在撒哈拉卖雨伞

Last modified 2023-01-11 19:49:36

First published 2023-01-11 19:45:24

Article link: https://blog.csdn.net/m0_72675651/article/details/128647962

2023.1.11

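Training a neural network also requires a gradient. Here the gradient means the gradient of the loss function with respect to the weights.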

Take the cross-entropy error as an example: $E=-\sum_{k} t_{k} \log y_{k}$
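As a small worked example of this formula, the sketch below evaluates E for a one-hot label. The values of `t` and `y` are hypothetical and chosen only for illustration; because `t` is one-hot, only the term of the correct class contributes to the sum.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
t = np.array([0.0, 0.0, 1.0])  # one-hot label: class 2 is the correct answer
y = np.array([0.1, 0.3, 0.6])  # network output, interpreted as probabilities

E = -np.sum(t * np.log(y))     # only the k = 2 term survives
print(E)                       # equals -log(0.6), about 0.51
```

The closer the probability of the correct class is to 1, the closer E gets to 0.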

Suppose the weights form a 2×3 matrix w. The loss E is then a function of w, and its gradient has the same shape as w:

$w=\begin{bmatrix} w_{1} & w_{2} & w_{3} \\ w_{4} & w_{5} & w_{6} \end{bmatrix}$

$\frac{\partial E}{\partial w}= \begin{bmatrix} \frac{\partial E}{\partial w_{1} }& \frac{\partial E}{\partial w_{2} }&\frac{\partial E}{\partial w_{3} } \\ \frac{\partial E}{\partial w_{4} }& \frac{\partial E}{\partial w_{5} }& \frac{\partial E}{\partial w_{6} } \end{bmatrix}$
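Each entry of this matrix can be approximated numerically. The central difference below is what the `numerical_gradient` function in this post computes, with $h=10^{-4}$; all weights other than $w_{i}$ are held fixed while $w_{i}$ is perturbed:

$\frac{\partial E}{\partial w_{i}} \approx \frac{E(w_{i}+h)-E(w_{i}-h)}{2h}$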

Code implementation:

import numpy as np


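def softmax(x):  # softmax activation: turns raw scores into probabilities
    if x.ndim == 2:  # batch input with one sample per row
        x = x.T  # transpose so that each sample becomes a column
        x = x - np.max(x, axis=0)  # subtract the max to avoid overflow in exp
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # subtract the max to avoid overflow in exp
    return np.exp(x) / np.sum(np.exp(x))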


def cross_entropy_error(y, t):
    delta = 1e-7  # small constant that keeps np.log away from log(0)
    return -1 * np.sum(t * np.log(y + delta))


def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    # np.nditer walks over every element of a multi-dimensional array
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)

        x[idx] = tmp_val  # restore the original value
        it.iternext()

    return grad


class simplenet:  # a single-layer network used as a toy classifier
    def __init__(self):
        self.W = np.random.randn(2, 3)  # weights: a random (2, 3) matrix

    def predict(self, x):
        return np.dot(x, self.W)  # forward pass (inference)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)  # class probabilities
        loss = cross_entropy_error(y, t)

        return loss  # value of the loss function


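# Loss as a function of the weights. The argument W is a dummy:
# numerical_gradient perturbs the array it is given in place, and
# net.loss reads net.W together with the globals x and t.
def f(W):
    return net.loss(x, t)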


def gradient_descent(f, init_x, lr, step_num):  # minimize f by gradient descent
    x = init_x  # same array object, so init_x is updated in place

    for i in range(step_num):
        grad = numerical_gradient(f, x)
        x -= lr * grad

    return x


net = simplenet()
print("Weights used in the computation:", "\n", net.W)
print("Enter the two input values x0 and x1:", "\n")
x0 = float(input())  # the input sample, read from stdin
x1 = float(input())
x = np.array([x0, x1])

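y = net.predict(x)  # forward pass; its output is treated as the "classification result"
print("Assumed classification result:", "\n", y)
y1 = np.argmax(y)  # index of the largest score
t = np.zeros_like(y)
# Build a one-hot label; the predicted class is assumed to be the correct answer
for i in range(y.size):
    if i == y1:
        t[i] = 1

print("Supervised data (label):", "\n", t)  # the one-hot label t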

dw = numerical_gradient(f, net.W)  # gradient of the loss with respect to the weights
print("Gradient of the network:", "\n", dw)  # same shape as net.W

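# Gradient descent has to update the weights net.W (not the input x) to reduce the loss
gradient_descent(f, net.W, 0.01, 100)
print("Weights after gradient descent:", "\n", net.W)
print("Loss after gradient descent:", "\n", net.loss(x, t))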
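The script above reads its input interactively and starts from random weights, so its numbers change from run to run. As a reproducible cross-check, the sketch below uses fixed, hypothetical values for `W`, `x` and `t`, and compares the central-difference gradient with the standard closed form for a softmax output with cross-entropy loss on a single linear layer, $\frac{\partial E}{\partial w} = x^{T}(y - t)$. The helper names `loss` and `numerical_grad` are introduced here for illustration only and are not part of the original code.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)


def loss(W, x, t):
    # Cross-entropy of softmax(x @ W) against the one-hot label t
    y = softmax(x @ W)
    return -np.sum(t * np.log(y))


def numerical_grad(W, x, t, h=1e-4):
    # Central difference on every element of W
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = loss(W, x, t)
        W[idx] = old - h
        f_minus = loss(W, x, t)
        W[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical fixed values, chosen only for illustration
W = np.array([[0.5, -0.2, 0.1],
              [0.3, 0.8, -0.6]])
x = np.array([0.6, 0.9])
t = np.array([0.0, 0.0, 1.0])

num = numerical_grad(W, x, t)
ana = np.outer(x, softmax(x @ W) - t)  # closed form: x^T (y - t)

print(num.shape)                         # same shape as W
print(np.allclose(num, ana, atol=1e-6))  # the two gradients agree
```

Agreement between the two results is a common sanity check for a numerical gradient.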

The result dw of numerical_gradient(f, net.W) is a two-dimensional array of shape 2×3 (see figure):

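Combining this with the mathematical meaning: the value of $\frac{\partial E}{\partial w_{1}}$ is about -1, which means that if $w_{1}$ increases by a small $\Delta w$, the loss E changes by about $-\Delta w$, i.e. it decreases. It follows that a weight with a positive partial derivative should be updated in the negative direction, while a weight with a negative partial derivative should be updated in the positive direction.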
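The update rule described here, $w \leftarrow w - \eta \frac{\partial E}{\partial w}$, can be sketched as a single step. The values of `W`, `x`, `t` and the learning rate `lr` are hypothetical, and the gradient is taken from the standard closed form $x^{T}(y - t)$ for softmax with cross-entropy rather than from the numerical routine, only to keep the example short.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)


def loss(W, x, t):
    # Cross-entropy of softmax(x @ W) against the one-hot label t
    return -np.sum(t * np.log(softmax(x @ W)))


# Hypothetical fixed values, chosen only for illustration
W = np.array([[0.5, -0.2, 0.1],
              [0.3, 0.8, -0.6]])
x = np.array([0.6, 0.9])
t = np.array([0.0, 0.0, 1.0])
lr = 0.1  # learning rate (assumed)

grad = np.outer(x, softmax(x @ W) - t)  # gradient of the loss with respect to W
W_new = W - lr * grad                   # step against the sign of each partial derivative

# Weights with a positive partial derivative went down, negative ones went up
print(np.all(np.sign(W_new - W) == -np.sign(grad)))
# The loss after the step is smaller than before
print(loss(W_new, x, t) < loss(W, x, t))
```

Repeating this step many times is what the `gradient_descent` function in the post does.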