Feedforward Neural Network Experiment (Part 2)

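Building on the previous feedforward neural network experiment, this one explores automatic gradient computation and predefined operators.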

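Contents

Automatic Gradient Computation and Predefined Operators

Re-implementing the Feedforward Neural Network with Predefined Operators

Extra Practice

Completing the Runner Class

Model Training

Performance Evaluation

Discussion Question

Optimization Problems

Parameter Initialization

The Vanishing Gradient Problem

The Dying ReLU Problem

References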


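Automatic Gradient Computation and Predefined Operators

Building a neural network out of modules works reasonably well, but deriving and coding the gradient of every module by hand is tedious and error-prone. Deep learning frameworks already provide automatic gradient computation, so we can focus on the model architecture instead of spending effort on computing gradients.

Re-implementing the Feedforward Neural Network with Predefined Operators

Below, the binary classification task is re-implemented with torch's predefined operators. The main operator used is torch.nn.Linear.

class torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

The torch.nn.Linear operator accepts an input tensor of shape [batch_size, *, in_features], where "*" stands for any number of additional dimensions. It multiplies the input by the transpose of its weight matrix, which is stored with shape [out_features, in_features], and produces an output tensor of shape [batch_size, *, out_features]. torch.nn.Linear has a bias parameter by default; pass bias=False to create the layer without one.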
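The shape rules above can be checked with a small sketch. The tensor sizes used here (a batch of 4 with one extra dimension of size 3) are arbitrary choices for illustration, not values from the experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A layer mapping 2 input features to 5 output features.
layer = nn.Linear(in_features=2, out_features=5)

# Extra dimensions may sit between the batch axis and the feature axis.
x = torch.randn(4, 3, 2)
y = layer(x)

weight_shape = tuple(layer.weight.shape)  # stored as [out_features, in_features]
output_shape = tuple(y.shape)

# The same result computed by hand: x @ W^T + b.
manual = x @ layer.weight.T + layer.bias
matches_manual = torch.allclose(y, manual, atol=1e-6)

# bias=False creates the layer without a bias parameter.
no_bias = nn.Linear(2, 5, bias=False)

print(weight_shape, output_shape, matches_manual, no_bias.bias)
```

Only the last dimension is transformed; all leading dimensions pass through unchanged.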

The model is implemented as follows:

import torch
import torch.nn as nn

class Model_MLP_L2_V2(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V2, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)

        self.fc2 = nn.Linear(hidden_size, output_size)
        # Use 'torch.sigmoid' as the Logistic activation function
        self.act_fn = torch.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2

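Extra Practice

Add a hidden layer with 3 neurons, implement the binary classification again, and compare the result with the model above (item 1).

The specific changes are: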

class Model_MLP_L2_V3(torch.nn.Module):
    def __init__(self, input_size, hidden_size, hidden_size2, output_size):
        super(Model_MLP_L2_V3, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        w1=torch.normal(0,0.1,size=(hidden_size,input_size),requires_grad=True)
        self.fc1.weight = nn.Parameter(w1)

        self.fc2 = nn.Linear(hidden_size, hidden_size2)
        w2 = torch.normal(0, 0.1, size=(hidden_size2, hidden_size), requires_grad=True)
        self.fc2.weight = nn.Parameter(w2)

        self.fc3 = nn.Linear(hidden_size2, output_size)
        w3 = torch.normal(0, 0.1, size=(output_size, hidden_size2), requires_grad=True)
        self.fc3.weight = nn.Parameter(w3)

        # Use 'torch.sigmoid' as the Logistic activation function
        self.act_fn = torch.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        z3 = self.fc3(a2)
        a3 = self.act_fn(z3)
        return a3
input_size = 2
hidden_size = 5
hidden_size2 = 3
output_size = 1
model = Model_MLP_L2_V3(input_size=input_size, hidden_size=hidden_size, hidden_size2=hidden_size2, output_size=output_size)

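The main change is an added hidden layer of size hidden_size2. It has 3 neurons, so hidden_size2=3. The forward method is extended accordingly so that the data also passes through the new layer.

Output: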

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000
[Train] epoch: 0/1000, loss: 0.6949098110198975
[Train] epoch: 50/1000, loss: 0.6932579278945923
[Train] epoch: 100/1000, loss: 0.6932216882705688
[Train] epoch: 150/1000, loss: 0.6931880712509155
[Train] epoch: 200/1000, loss: 0.6931560039520264
[Train] epoch: 250/1000, loss: 0.6931244730949402
[Train] epoch: 300/1000, loss: 0.6930925846099854
[Train] epoch: 350/1000, loss: 0.6930593848228455
[Train] epoch: 400/1000, loss: 0.6930238604545593
[Train] epoch: 450/1000, loss: 0.6929848790168762
[Train] epoch: 500/1000, loss: 0.6929409503936768
[Train] epoch: 550/1000, loss: 0.692890465259552
[Train] epoch: 600/1000, loss: 0.6928313374519348
[Train] epoch: 650/1000, loss: 0.6927607655525208
[Train] epoch: 700/1000, loss: 0.692674994468689
[Train] epoch: 750/1000, loss: 0.6925693154335022
[Train] epoch: 800/1000, loss: 0.6924368143081665
[Train] epoch: 850/1000, loss: 0.6922679543495178
[Evaluate] best accuracy performence has been updated: 0.50000 --> 0.50625
[Train] epoch: 900/1000, loss: 0.6920491456985474
[Evaluate] best accuracy performence has been updated: 0.50625 --> 0.51250
[Evaluate] best accuracy performence has been updated: 0.51250 --> 0.51875
[Evaluate] best accuracy performence has been updated: 0.51875 --> 0.52500
[Evaluate] best accuracy performence has been updated: 0.52500 --> 0.53125
[Evaluate] best accuracy performence has been updated: 0.53125 --> 0.54375
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.55000
[Train] epoch: 950/1000, loss: 0.6917603015899658
[Evaluate] best accuracy performence has been updated: 0.55000 --> 0.55625
[Evaluate] best accuracy performence has been updated: 0.55625 --> 0.56250
[Evaluate] best accuracy performence has been updated: 0.56250 --> 0.56875
[Evaluate] best accuracy performence has been updated: 0.56875 --> 0.57500

Process finished with exit code 0

 

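Completing the Runner Class

Building on the Runner class from the previous experiment, the Runner class in this experiment adds automatic gradient computation. When saving the model, the parameters are obtained with the state_dict method; when loading, they are restored with the load_state_dict method.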

class RunnerV2_2:
    def __init__(self, model, optimizer, metric, loss_fn, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric

        # Track the evaluation metric during training
        self.train_scores = []
        self.dev_scores = []

        # Track the loss during training
        self.train_loss = []
        self.dev_loss = []

    def train(self, train_set, dev_set, **kwargs):
        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging interval in epochs; defaults to 100 if not given
        log_epochs = kwargs.get("log_epochs", 100)
        # Path for saving the model; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")
        # Custom logging callback; defaults to None if not given
        custom_print_log = kwargs.get("custom_print_log", None)
        # Best score seen so far
        best_score = 0
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Switch to training mode (evaluate() leaves the model in eval mode)
            self.model.train()
            X, y = train_set
            # Get the model predictions
            logits = self.model(X)
            # Compute the cross-entropy loss
            trn_loss = self.loss_fn(logits, y)
            self.train_loss.append(trn_loss.item())
            # Compute the evaluation metric
            trn_score = self.metric(logits, y).item()
            self.train_scores.append(trn_score)

            # Compute the parameter gradients automatically
            trn_loss.backward()
            if custom_print_log is not None:
                # Custom per-step logging (called after backward, before the update)
                custom_print_log(self)

            # Update the parameters
            self.optimizer.step()
            # Clear the gradients
            self.optimizer.zero_grad()

            dev_score, dev_loss = self.evaluate(dev_set)
            # Save the model if the current score is the best so far
            if dev_score > best_score:
                self.save_model(save_path)
                print(f"[Evaluate] best accuracy performence has been updated: {best_score:.5f} --> {dev_score:.5f}")
                best_score = dev_score

            if log_epochs and epoch % log_epochs == 0:
                print(f"[Train] epoch: {epoch}/{num_epochs}, loss: {trn_loss.item()}")

    # Evaluation: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def evaluate(self, data_set):
        # Switch the model to evaluation mode
        self.model.eval()
        X, y = data_set
        # Compute the model output
        logits = self.model(X)
        # Compute the loss
        loss = self.loss_fn(logits, y).item()
        self.dev_loss.append(loss)
        # Compute the evaluation metric
        score = self.metric(logits, y).item()
        self.dev_scores.append(score)
        return score, loss

    # Prediction: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def predict(self, X):
        # Switch the model to evaluation mode
        self.model.eval()
        return self.model(X)

    # Get the model parameters with 'model.state_dict()' and save them
    def save_model(self, saved_path):
        torch.save(self.model.state_dict(), saved_path)

    # Load the model parameters with 'model.load_state_dict'
    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)
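The save and load steps can be sketched in isolation as a round trip between two models of the same architecture. This is a minimal illustration, not the Runner itself: the two nn.Linear models and the in-memory buffer (used in place of a file path) are assumptions made to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)
source = nn.Linear(2, 1)
target = nn.Linear(2, 1)  # same architecture, different random weights

# Save only the parameters; the buffer stands in for a file path.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# Load the parameters back and copy them into the second model.
state_dict = torch.load(buffer)
target.load_state_dict(state_dict)

same_weights = torch.equal(source.weight, target.weight)
same_bias = torch.equal(source.bias, target.bias)
print(same_weights, same_bias)
```

Saving the state_dict rather than the whole model object keeps the checkpoint independent of how the model class is defined, at the cost of having to rebuild the model before loading.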

Model Training

Instantiate the Runner class and pass in the training configuration. The code is as follows:

# Set up the model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V2(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy
# Set up the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)
# Set up the evaluation metric
metric = accuracy
# Other parameters
epoch_num = 1000
saved_path = 'best_model.pdparams'
# Instantiate RunnerV2_2 and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=epoch_num, log_epochs=50, save_path=saved_path)

The accuracy function used above is:

# Accuracy function
def accuracy(preds, labels):
    # Decide between binary and multi-class: preds.shape[1] == 1 means binary, preds.shape[1] > 1 means multi-class
    if preds.shape[1] == 1:
        # Binary: a probability of at least 0.5 is class 1, otherwise class 0
        # Convert preds to float32
        preds = (preds >= 0.5).to(torch.float32)
    else:
        # Multi-class: use torch.argmax to take the index of the largest element as the class
        preds = torch.argmax(preds, 1)
        preds = preds.to(torch.int32)
    return torch.mean(torch.as_tensor((preds == labels), dtype=torch.float32))
# Suppose the predictions are [[0.],[1.],[1.],[0.]] and the true classes are [[1.],[1.],[0.],[0.]]; compute the accuracy
preds = torch.tensor([[0.], [1.], [1.], [0.]])
labels = torch.tensor([[1.], [1.], [0.], [0.]])
print("accuracy is:", accuracy(preds, labels))

Output:

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.21875
[Train] epoch: 0/1000, loss: 0.7022157311439514
[Evaluate] best accuracy performence has been updated: 0.21875 --> 0.26250
[Evaluate] best accuracy performence has been updated: 0.26250 --> 0.31875
[Evaluate] best accuracy performence has been updated: 0.31875 --> 0.43750
[Evaluate] best accuracy performence has been updated: 0.43750 --> 0.48750
[Evaluate] best accuracy performence has been updated: 0.48750 --> 0.52500
[Evaluate] best accuracy performence has been updated: 0.52500 --> 0.53125
[Evaluate] best accuracy performence has been updated: 0.53125 --> 0.54375
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.55625
[Evaluate] best accuracy performence has been updated: 0.55625 --> 0.57500
[Evaluate] best accuracy performence has been updated: 0.57500 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.60625
[Evaluate] best accuracy performence has been updated: 0.60625 --> 0.63125
[Evaluate] best accuracy performence has been updated: 0.63125 --> 0.66875
[Evaluate] best accuracy performence has been updated: 0.66875 --> 0.68125
[Evaluate] best accuracy performence has been updated: 0.68125 --> 0.71875
[Evaluate] best accuracy performence has been updated: 0.71875 --> 0.72500
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.78125
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.81250
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Train] epoch: 50/1000, loss: 0.6558495759963989
[Train] epoch: 100/1000, loss: 0.5948771238327026
[Train] epoch: 150/1000, loss: 0.5388158559799194
[Train] epoch: 200/1000, loss: 0.5058477520942688
[Train] epoch: 250/1000, loss: 0.4894803464412689
[Train] epoch: 300/1000, loss: 0.4813789427280426
[Train] epoch: 350/1000, loss: 0.47720932960510254
[Train] epoch: 400/1000, loss: 0.47499004006385803
[Train] epoch: 450/1000, loss: 0.4737810492515564
[Train] epoch: 500/1000, loss: 0.4731082022190094
[Train] epoch: 550/1000, loss: 0.47272247076034546
[Train] epoch: 600/1000, loss: 0.4724907875061035
[Train] epoch: 650/1000, loss: 0.4723418354988098
[Train] epoch: 700/1000, loss: 0.4722374379634857
[Train] epoch: 750/1000, loss: 0.47215747833251953
[Train] epoch: 800/1000, loss: 0.4720911979675293
[Train] epoch: 850/1000, loss: 0.47203296422958374
[Train] epoch: 900/1000, loss: 0.4719797670841217
[Train] epoch: 950/1000, loss: 0.4719299376010895

Visualize how the accuracy on the training and validation sets changes during training:

import matplotlib.pyplot as plt

# Plot the loss and the metric on the training and validation sets
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))
    epochs = list(range(len(runner.train_scores)))

    plt.subplot(1, 2, 1)
    plt.plot(epochs, runner.train_loss, color='#e4007f', label="Train loss")
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axis labels and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')
    plt.savefig(fig_name)
    plt.show()

plot(runner, 'fw-acc.pdf')

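Performance Evaluation

Evaluate the best model saved during training on the test data, and look at its accuracy and loss on the test set.

# Model evaluation
runner.load_model("best_model.pdparams")
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

Output:

[Test] score/loss: 0.7600/0.4883

On the test set the model reaches an accuracy of 0.76 with a loss of 0.4883.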

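Discussion Question

Compare hand-written gradient computation with automatic gradient computation in terms of computational performance, results, and other aspects, and share your own view.

Hand-written gradient computation

With hand-written gradients, a poor choice of hyperparameters combined with randomly initialized weights may lead to a poor fit.

Automatic gradient computation

PyTorch's autograd package builds a computation graph automatically from the inputs and the forward pass, then traverses that graph backward to compute the gradients.

Tensor is the core class of PyTorch's automatic differentiation. Setting a tensor's .requires_grad attribute to True makes PyTorch track every operation on it, so that gradients can be propagated with the chain rule. Once the computation is finished, calling .backward() computes all the gradients, and the gradient for this tensor is accumulated into its .grad attribute.

To stop tracking a tensor, call .detach() to separate it from the recorded history; no gradient flows through it afterwards. Alternatively, wrap the code that should not be tracked in a with torch.no_grad() block. This is common when evaluating a model, because gradients are not needed at that point.

Note: when calling y.backward(), no argument is needed if y is a scalar; otherwise a tensor with the same shape as y must be passed in.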
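The points above (requires_grad, backward on scalar and non-scalar outputs, gradient accumulation, detach, and no_grad) can be put together in a short sketch. The input values are arbitrary and chosen only so that the gradients are easy to check by hand.

```python
import torch

# Scalar output: backward() needs no argument.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
grad_scalar = x.grad.clone()  # dy/dx = 2x

# Gradients accumulate in .grad, so reset before the next backward pass.
x.grad.zero_()

# Non-scalar output: pass a tensor with the same shape as the output.
z = x ** 2
z.backward(torch.ones_like(z))
grad_vector = x.grad.clone()

# detach() and torch.no_grad() both stop operation tracking.
detached = (x * 2).detach()
with torch.no_grad():
    untracked = x * 2

print(grad_scalar, grad_vector, detached.requires_grad, untracked.requires_grad)
```

Passing a tensor of ones to backward weights every output element equally, which for this example gives the same gradient as summing the outputs first.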

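For more detail on using PyTorch's automatic gradient computation, see the links at the end of this post; it is not covered further here.

The two approaches can be compared using the code from this experiment:

For example, take the hand-written-gradient counterpart of the Model_MLP_L2_V2 class from the previous experiment, where

the activation function is the Logistic function

and the loss function is the one used there.

 

After adding timing code, the run time with hand-written gradients is printed.

To switch to PyTorch's automatic gradient computation, only the gradient computation step needs to be replaced with the automatic call:

# Automatic gradient computation
trn_loss.backward()

 

In this run, automatic gradient computation was faster than the hand-written version. Part of the difference may also come from the hyperparameters or the randomly initialized weights used in the hand-written run.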
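On the "results" side of the comparison, a hand-derived gradient and the autograd gradient should agree up to floating-point error. The sketch below checks this for a single linear layer followed by a sigmoid and mean binary cross-entropy. The data, shapes, and variable names are assumptions for illustration and are not taken from the experiment code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(8, 2)
y = torch.randint(0, 2, (8, 1)).float()
W = torch.randn(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Automatic gradients.
probs = torch.sigmoid(X @ W + b)
loss = F.binary_cross_entropy(probs, y)
loss.backward()

# Hand-derived gradients for sigmoid + mean binary cross-entropy:
# dL/dW = X^T (probs - y) / N and dL/db = sum((probs - y) / N).
with torch.no_grad():
    delta = (probs - y) / X.shape[0]
    manual_grad_W = X.T @ delta
    manual_grad_b = delta.sum(dim=0)

grads_match = (torch.allclose(W.grad, manual_grad_W, atol=1e-5)
               and torch.allclose(b.grad, manual_grad_b, atol=1e-5))
print(grads_match)
```

When the hand-written formulas are correct, the two approaches differ only in convenience and speed, not in the values they produce.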

Reference for this answer:

Implementing automatic gradient computation in PyTorch

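Optimization Problems

This part uses experiments to expose optimization problems in neural network models and to think about how to address them.

Parameter Initialization

Before a neural network can be trained, its parameters have to be initialized. If the weights and biases of every layer are initialized to 0, all hidden neurons produce the same activation in the first forward pass, and in backpropagation all their weights receive the same update. The hidden neurons therefore never become different from one another, which is known as the symmetric weights phenomenon.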
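The symmetry argument can be verified with a small sketch: a two-layer sigmoid network with all parameters set to zero is trained for a few steps, and the rows of the first layer's weight matrix (one row per hidden neuron) are then compared. The random inputs and the deliberately unbalanced labels are assumptions standing in for the real dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(16, 2)
# 10 positive and 6 negative labels, so the first update is non-zero.
y = (torch.arange(16) < 10).float().unsqueeze(1)

fc1, fc2 = nn.Linear(2, 5), nn.Linear(5, 1)
for layer in (fc1, fc2):
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)

params = list(fc1.parameters()) + list(fc2.parameters())
optimizer = torch.optim.SGD(params, lr=0.2)

for _ in range(20):
    optimizer.zero_grad()
    probs = torch.sigmoid(fc2(torch.sigmoid(fc1(X))))
    F.binary_cross_entropy(probs, y).backward()
    optimizer.step()

# The weights have moved away from zero, yet every hidden neuron
# (row of fc1.weight) still has the same weights as every other one.
weights_moved = bool(fc1.weight.abs().sum() > 0)
rows_identical = torch.allclose(
    fc1.weight, fc1.weight[0].expand_as(fc1.weight), atol=1e-6)
print(fc1.weight)
print(weights_moved, rows_identical)
```

Because all five hidden neurons stay identical, the hidden layer behaves like a single neuron no matter how long training runs.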

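Next, all model parameters are initialized to 0 to see what happens. A new class, Model_MLP_L2_V4, is defined for this, in which the parameters of both linear layers are all initialized to 0.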

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torch.nn.functional as F

from nndl2.dataset import make_moons
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

class Model_MLP_L2_V4(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        # Initialize all weights and biases of both linear layers to 0
        for layer in (self.fc1, self.fc2):
            nn.init.zeros_(layer.weight)
            nn.init.zeros_(layer.bias)

        self.act_fn = torch.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2

The code for the moons dataset is attached here:

import math
import torch

# make_moons: generate the two-moons dataset
def make_moons(n_samples=1000, shuffle=True, noise=None):
    n_samples_out = n_samples // 2
    n_samples_in = n_samples - n_samples_out
    # Outer half circle (class 0) and inner half circle (class 1)
    outer_circ_x = torch.cos(torch.linspace(0, math.pi, n_samples_out))
    outer_circ_y = torch.sin(torch.linspace(0, math.pi, n_samples_out))
    inner_circ_x = 1 - torch.cos(torch.linspace(0, math.pi, n_samples_in))
    inner_circ_y = 0.5 - torch.sin(torch.linspace(0, math.pi, n_samples_in))
    X = torch.stack(
        [torch.cat([outer_circ_x, inner_circ_x]),
         torch.cat([outer_circ_y, inner_circ_y])],
        dim=1
    )
    y = torch.cat(
        [torch.zeros([n_samples_out]), torch.ones([n_samples_in])]
    )
    if shuffle:
        idx = torch.randperm(X.shape[0])
        X = X[idx]
        y = y[idx]
    if noise is not None:
        # Add Gaussian noise with standard deviation 'noise'
        X += torch.normal(0.0, noise, X.shape)

    return X, y

def print_weights(runner):
    print('The weights of the Layers:')
    for item in runner.model.named_parameters():
        print(item)
    # Prints the same (name, parameter) pairs a second time
    for _, param in enumerate(runner.model.named_parameters()):
        print(param)

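Train the model with the Runner class:

# Set up the model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy

# Set up the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)

# Set up the evaluation metric
metric = accuracy

# Other parameters
epoch = 2000
saved_path = 'best_model.pdparams'
# Instantiate RunnerV2_2 and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=5, log_epochs=50, save_path="best_model.pdparams", custom_print_log=print_weights)

Printed weights from the run (print_weights outputs only the parameters, not their gradients; the values below are non-zero, which indicates that this run started from PyTorch's default initialization rather than from zeros):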

The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[ 0.4618, -0.2339],
        [-0.5633,  0.3300],
        [-0.6991, -0.2421],
        [ 0.1939, -0.0767],
        [-0.0565,  0.4028]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5646, 0.1304, 0.3827, 0.0918], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0198,  0.0295, -0.1418,  0.4028, -0.2293]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3413], requires_grad=True))
('fc1.weight', Parameter containing:
tensor([[ 0.4618, -0.2339],
        [-0.5633,  0.3300],
        [-0.6991, -0.2421],
        [ 0.1939, -0.0767],
        [-0.0565,  0.4028]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5646, 0.1304, 0.3827, 0.0918], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0198,  0.0295, -0.1418,  0.4028, -0.2293]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3413], requires_grad=True))
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.41250
[Train] epoch: 0/5, loss: 0.6785968542098999
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[ 0.4620, -0.2341],
        [-0.5630,  0.3297],
        [-0.7005, -0.2408],
        [ 0.1988, -0.0802],
        [-0.0596,  0.4048]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5647, 0.1303, 0.3831, 0.0914], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0312,  0.0240, -0.1446,  0.4098, -0.2303]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3347], requires_grad=True))
('fc1.weight', Parameter containing:
tensor([[ 0.4620, -0.2341],
        [-0.5630,  0.3297],
        [-0.7005, -0.2408],
        [ 0.1988, -0.0802],
        [-0.0596,  0.4048]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5647, 0.1303, 0.3831, 0.0914], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0312,  0.0240, -0.1446,  0.4098, -0.2303]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3347], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[ 0.4623, -0.2343],
        [-0.5628,  0.3296],
        [-0.7020, -0.2395],
        [ 0.2036, -0.0839],
        [-0.0626,  0.4068]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5647, 0.1302, 0.3834, 0.0910], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0421,  0.0181, -0.1477,  0.4166, -0.2316]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3288], requires_grad=True))
('fc1.weight', Parameter containing:
tensor([[ 0.4623, -0.2343],
        [-0.5628,  0.3296],
        [-0.7020, -0.2395],
        [ 0.2036, -0.0839],
        [-0.0626,  0.4068]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5647, 0.1302, 0.3834, 0.0910], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0421,  0.0181, -0.1477,  0.4166, -0.2316]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3288], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[ 0.4627, -0.2347],
        [-0.5626,  0.3294],
        [-0.7034, -0.2383],
        [ 0.2085, -0.0876],
        [-0.0656,  0.4088]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5648, 0.1301, 0.3836, 0.0906], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0527,  0.0120, -0.1511,  0.4231, -0.2333]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3234], requires_grad=True))
('fc1.weight', Parameter containing:
tensor([[ 0.4627, -0.2347],
        [-0.5626,  0.3294],
        [-0.7034, -0.2383],
        [ 0.2085, -0.0876],
        [-0.0656,  0.4088]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5648, 0.1301, 0.3836, 0.0906], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0527,  0.0120, -0.1511,  0.4231, -0.2333]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3234], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[ 0.4633, -0.2352],
        [-0.5624,  0.3293],
        [-0.7048, -0.2369],
        [ 0.2135, -0.0914],
        [-0.0686,  0.4108]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5648, 0.1301, 0.3838, 0.0902], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0630,  0.0056, -0.1547,  0.4293, -0.2353]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3185], requires_grad=True))
('fc1.weight', Parameter containing:
tensor([[ 0.4633, -0.2352],
        [-0.5624,  0.3293],
        [-0.7048, -0.2369],
        [ 0.2135, -0.0914],
        [-0.0686,  0.4108]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0.2812, 0.5648, 0.1301, 0.3838, 0.0902], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[ 0.0630,  0.0056, -0.1547,  0.4293, -0.2353]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.3185], requires_grad=True))

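Visualize how the accuracy and loss change on the training and validation sets:

plot(runner, "fw-zero.pdf")

According to the output, binary classification accuracy is around 50%, which indicates that the model has learned nothing.

However, the page provided by the teacher says that the training and validation loss barely decrease, whereas in my run both of them clearly increased. Does that still count as "not decreasing", or did my experiment go wrong?

After running it several more times, the resulting plot changed to 

 which matches the description on that page more closely.

To avoid the symmetric weights phenomenon, the parameters of a neural network can be initialized from a Gaussian or a uniform distribution.

The code for sampling from and visualizing the Gaussian and uniform distributions is as follows: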

# Sample from a Gaussian distribution with 'torch.normal': 'mean' is the mean, 'std' the standard deviation, 'size' the output shape
gaussian_weights = torch.normal(mean=0.0, std=1.0, size=[10000])
# Sample uniformly from [-1, 1) in place with 'Tensor.uniform_'
uniform_weights = torch.empty(10000).uniform_(-1, 1)
print(uniform_weights)
# Plot both parameter distributions
plt.figure()
plt.subplot(1, 2, 1)
plt.title('Gaussian Distribution')
plt.hist(gaussian_weights.numpy(), bins=200, density=True, color='#f19ec2')
plt.subplot(1, 2, 2)
plt.title('Uniform Distribution')
plt.hist(uniform_weights.numpy(), bins=200, density=True, color='#e4007f')
plt.savefig('fw-gausian-uniform.pdf')
plt.show()
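The same two distributions can also be applied directly to a layer's parameters through torch.nn.init. The sketch below is one possible way to do that; the layer size and the distribution parameters are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(1000, 100)

# Gaussian initialization: overwrite the weights in place with N(0, 0.1^2).
nn.init.normal_(layer.weight, mean=0.0, std=0.1)
gaussian_std = layer.weight.std().item()

# Uniform initialization: overwrite the weights in place with U(-1, 1).
nn.init.uniform_(layer.weight, a=-1.0, b=1.0)
uniform_min = layer.weight.min().item()
uniform_max = layer.weight.max().item()

print(gaussian_std, uniform_min, uniform_max)
```

Both functions modify the tensor in place, so they can be called on the layers of an existing model inside its constructor.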

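The Vanishing Gradient Problem

When building a neural network, adding more layers should in theory improve its fitting capacity. As the network gets deeper, however, the parameters become harder to learn and the vanishing gradient problem tends to appear.

Sigmoid-type functions saturate, and in the saturated regions their derivative is close to 0, so the error is attenuated every time it passes back through a layer. In a very deep network the gradient keeps shrinking and may vanish altogether, which makes the whole network hard to train. This is the vanishing gradient problem.
There are many ways to reduce the vanishing gradient problem in deep networks. A simple and effective one is to use an activation function with a larger derivative, such as ReLU.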
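The attenuation can be quantified with a little arithmetic: the derivative of the Logistic function never exceeds 0.25, so the activation functions alone can shrink the backpropagated error by a factor of at least 4 per layer. The sketch below computes this upper bound for up to five layers; it ignores the weight matrices, which also scale the gradient.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0 and is much smaller in the saturated regions.
max_grad = sigmoid_grad(0.0)
saturated_grad = sigmoid_grad(5.0)

# Upper bound on the product of activation derivatives across 1..5 layers.
upper_bounds = [max_grad ** depth for depth in range(1, 6)]

print(max_grad, saturated_grad)
print(upper_bounds)
```

ReLU has a derivative of exactly 1 for positive inputs, so the corresponding factor does not shrink with depth.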

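The following simple experiment looks at the vanishing gradient phenomenon in a feedforward neural network and at a way to improve it.

Model Construction

Define a feedforward neural network with 4 hidden layers and 1 output layer. The activation function is selected through an argument.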

# Define the multi-layer feedforward neural network
class Model_MLP_L5(nn.Module):
    def __init__(self, input_size, output_size, act='sigmoid', w_std=0.01, b_init=1.0):
        super(Model_MLP_L5, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 3)
        self.fc2 = torch.nn.Linear(3, 3)
        self.fc3 = torch.nn.Linear(3, 3)
        self.fc4 = torch.nn.Linear(3, 3)
        self.fc5 = torch.nn.Linear(3, output_size)
        # Select the activation function used by the network
        if act == 'sigmoid':
            self.act = torch.sigmoid
        elif act == 'relu':
            self.act = F.relu
        elif act == 'lrelu':
            self.act = F.leaky_relu
        else:
            raise ValueError("Please enter sigmoid relu or lrelu!")
        # Initialize the weights and biases of the linear layers
        self.init_weights(w_std, b_init)

    # Initialize the weights and biases of the linear layers
    def init_weights(self, w_std, b_init):
        # Iterate over all sub-modules with 'named_modules'
        for _, m in self.named_modules():
            # Linear layers get N(0, w_std^2) weights and a constant bias
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0.0, std=w_std)
                # float() also accepts a 0-dim tensor such as torch.tensor(-0.8)
                nn.init.constant_(m.bias, float(b_init))

    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        outputs = self.fc3(outputs)
        outputs = self.act(outputs)
        outputs = self.fc4(outputs)
        outputs = self.act(outputs)
        outputs = self.fc5(outputs)
        outputs = torch.sigmoid(outputs)
        return outputs

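Training with the Sigmoid function

With Sigmoid as the activation function, only one optimization step is run so that the vanishing gradient phenomenon is easy to observe.

Define the gradient-printing function:

def print_grads(runner):
    # Print the L2 norm of the gradient of each layer's weight matrix
    print('The gradient of the Layers:')
    for name, item in runner.model.named_parameters():
        # Weight matrices are 2-D; biases are skipped
        if item.dim() == 2 and item.grad is not None:
            print(name, torch.norm(item.grad, p=2).item())

torch.manual_seed(102)
# Learning rate
lr = 0.01

# Define the network with sigmoid as the activation function
model = Model_MLP_L5(input_size=2, output_size=1, act='sigmoid')

# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr)

# Define the loss function (cross-entropy)
loss_fn = F.binary_cross_entropy

# Define the evaluation metric
metric = accuracy

# Set the gradient-printing function
custom_print_log = print_grads

Instantiate the Runner class and pass in the training configuration. The code is as follows:

runner = RunnerV2_2(model, optimizer, metric, loss_fn)

Train the model and print the ℓ2 norm of each layer's gradient. The code is as follows:

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            save_path="best_model.pdparams",
            custom_print_log=custom_print_log)

Output: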

The gradient of the Layers:
fc1.weight tensor(1.0447, grad_fn=<NormBackward1>)
fc2.weight tensor(1.2803, grad_fn=<NormBackward1>)
fc3.weight tensor(0.8694, grad_fn=<NormBackward1>)
fc4.weight tensor(1.0071, grad_fn=<NormBackward1>)
fc5.weight tensor(0.5389, grad_fn=<NormBackward1>)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.53125

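With Sigmoid activations, the gradient is expected to shrink every time it passes back through a layer and to have almost vanished by the time it reaches the first layer. The values recorded above carry grad_fn=<NormBackward1>, which means they are ℓ2 norms of the weight matrices themselves, so they do not show this decay; it becomes visible when the norm of each parameter's .grad is printed.

Training with the ReLU function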

torch.manual_seed(102)
lr = 0.01  # Learning rate

# Define the network with relu as the activation function
model = Model_MLP_L5(input_size=2, output_size=1, act='relu')

# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr)

# Define the loss function (cross-entropy)
loss_fn = F.binary_cross_entropy

# Define the evaluation metric
metric = accuracy

# Instantiate the Runner
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            save_path="best_model.pdparams",
            custom_print_log=custom_print_log)

Output:

The gradient of the Layers:
fc1.weight tensor(0.8176, grad_fn=<NormBackward1>)
fc2.weight tensor(0.9802, grad_fn=<NormBackward1>)
fc3.weight tensor(0.9874, grad_fn=<NormBackward1>)
fc4.weight tensor(1.0451, grad_fn=<NormBackward1>)
fc5.weight tensor(0.4850, grad_fn=<NormBackward1>)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.53125

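Comparing the ℓ2 norm of each layer's gradient under different activation functions, the expected picture is as follows. In a 5-layer fully connected feedforward network with Sigmoid activations, the gradient shrinks every time it passes back through a layer and has almost vanished when it reaches the first layer. With ReLU activations, the vanishing gradient problem is reduced and the parameters of every layer receive a gradient.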
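A self-contained way to observe this comparison is to build the five linear layers directly, run a single backward pass, and read the norm of each weight's .grad. This is a sketch under stated assumptions: the helper name layer_grad_norms and the random data standing in for the moons dataset are hypothetical, while the layer sizes, the N(0, 0.01²) weights, and the bias of 1.0 follow the defaults described for Model_MLP_L5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def layer_grad_norms(act, seed=102):
    """Return the L2 norm of each layer's weight gradient after one backward pass."""
    torch.manual_seed(seed)
    sizes = [2, 3, 3, 3, 3, 1]
    layers = [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    for layer in layers:
        nn.init.normal_(layer.weight, mean=0.0, std=0.01)
        nn.init.constant_(layer.bias, 1.0)

    # Hypothetical random data standing in for the moons dataset.
    X = torch.randn(64, 2)
    y = torch.randint(0, 2, (64, 1)).float()

    out = X
    for layer in layers[:-1]:
        out = act(layer(out))
    probs = torch.sigmoid(layers[-1](out))
    F.binary_cross_entropy(probs, y).backward()
    return [layer.weight.grad.norm().item() for layer in layers]


sigmoid_norms = layer_grad_norms(torch.sigmoid)
relu_norms = layer_grad_norms(torch.relu)
for i, (s, r) in enumerate(zip(sigmoid_norms, relu_norms), start=1):
    print(f"fc{i}.weight  sigmoid: {s:.3e}  relu: {r:.3e}")
```

With such small initial weights the gradient shrinks towards the first layer for both activations; the sigmoid network loses an additional factor per layer from its derivative, which is why its first-layer gradient is the smaller of the two.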

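The Dying ReLU Problem

ReLU reduces the vanishing gradient problem to some extent, but in some situations it runs into the dying ReLU problem, which makes the network hard to train. The cause is that ReLU outputs 0 whenever x<0. If an unlucky parameter update during training leaves a ReLU neuron inactive (output 0) on all training data, the gradient of that neuron's own parameters is 0 from then on, and it can never be activated again in later training. A simple and effective remedy is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU.

Why does ReLU lead to dead nodes?

Training with ReLU

The multi-layer fully connected feedforward network defined above is used with ReLU as the activation function to observe the dying ReLU phenomenon and a way to address it. When the bias of a layer is initialized to a negative value that is large relative to the weights, the output of that layer is negative for every input, which leads to the dying ReLU phenomenon.

# Define the network and initialize the biases to a large negative value
model = Model_MLP_L5(input_size=2, output_size=1, act='relu', b_init=torch.tensor(-0.8))

# Re-create the optimizer so that it updates the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr)

Instantiate the Runner class, start training, and print the ℓ2 norm of each layer's gradient. The code is as follows:

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=0,
            save_path="best_model.pdparams",
            custom_print_log=custom_print_log)

Output: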

The gradient of the Layers:
linear_14 0.0
linear_15 0.0
linear_16 0.0
linear_17 0.0
linear_18 0.0
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.53750

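The output shows that with ReLU as the activation function, the dying ReLU problem occurs once the condition above is met: the gradients of the ReLU neurons stay at 0 throughout training and the parameters cannot be updated.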
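The mechanism can be isolated in a single layer. In the sketch below, a linear layer with small weights and a bias of -0.8 feeds a ReLU; every pre-activation is negative, so the outputs and the parameter gradients are all exactly zero. The layer size and the random inputs are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 3)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)
nn.init.constant_(layer.bias, -0.8)

# With weights this small, w.x + b stays negative for every sample.
X = torch.randn(32, 2)
hidden = torch.relu(layer(X))
hidden.sum().backward()

all_outputs_zero = bool((hidden == 0).all())
weight_grad_norm = layer.weight.grad.norm().item()
bias_grad_norm = layer.bias.grad.norm().item()
print(all_outputs_zero, weight_grad_norm, bias_grad_norm)
```

Because the gradients are zero, no optimizer step can move the bias back towards the active region, which is why the neurons stay dead.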

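A simple and effective remedy for the dying ReLU problem is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU. Next, look at the gradients when the activation function is changed to Leaky ReLU.

Training with Leaky ReLU

Change the activation function to Leaky ReLU, train the model, and look at the gradients.

# Redefine the network with the Leaky ReLU activation function
model = Model_MLP_L5(input_size=2, output_size=1, act='lrelu', b_init=torch.tensor(-0.8))

# Re-create the optimizer so that it updates the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr)

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            save_path="best_model.pdparams",
            custom_print_log=custom_print_log)

Output: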

The gradient of the Layers:
fc1.weight tensor(0.7548, grad_fn=<NormBackward1>)
fc2.weight tensor(1.1612, grad_fn=<NormBackward1>)
fc3.weight tensor(1.0495, grad_fn=<NormBackward1>)
fc4.weight tensor(1.0805, grad_fn=<NormBackward1>)
fc5.weight tensor(0.5799, grad_fn=<NormBackward1>)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.4965
[Train] epoch: 0/1, loss: 0.7061845328474692

Process finished with exit code 0

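When a neuron's value is always 0 in the forward pass, the gradient of its weights is 0 and the weights are never updated. This is the dying ReLU problem.
A ReLU neuron may be stuck at 0 because it was poorly initialized (in which case the trained parameters are not at fault), or because a large update during training pushed its parameters to a point where its output tends to 0 on the next sample; the weight gradient is then 0, the weights cannot change, and the output stays at 0. Either way, the neuron remains dead permanently.

The output shows that after switching to Leaky ReLU the dying ReLU problem is reduced: the gradients are back to normal and the parameters can be updated. However, the default slope of Leaky ReLU for x<0 is only 0.01, so during backpropagation the gradient still becomes smaller and smaller as the network gets deeper. To improve this, increase the slope of Leaky ReLU for x<0.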
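The slope for x<0 is controlled by the negative_slope argument of F.leaky_relu. The sketch below compares the gradient at negative inputs for the default slope and for a larger one; the value 0.1 is an arbitrary example, not a recommendation from the experiment.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 3.0], requires_grad=True)

# Default negative_slope is 0.01.
F.leaky_relu(x).sum().backward()
default_grad = x.grad.clone()

# A steeper negative slope passes more gradient through inactive units.
x.grad.zero_()
F.leaky_relu(x, negative_slope=0.1).sum().backward()
steeper_grad = x.grad.clone()

print(default_grad, steeper_grad)
```

To use a different slope inside a model that stores the activation as a plain function, one option is functools.partial(F.leaky_relu, negative_slope=0.1).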

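Working through this part of the feedforward neural network experiment was quite hard for me. I did not understand many parts of the sample code, and much of the time I was simply hunting for bugs. I have forgotten a lot of what I learned in last semester's machine learning course and need to find time to go through it again. If anyone has good learning resources on neural networks, recommendations are welcome.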

 

References

Course instructor Wei's CSDN homepage: (https://blog.csdn.net/qq_38975453?type=blog)

AI Studio, cnblogs (https://blog.csdn.net/lanchunhui/article/details/52083273)
