NNDL Assignment 4: Chapter 4 Exercises and the Dying ReLU Problem


OhMy棒棒糖yqy


Tags: deep learning, pytorch, neural networks

Article link: https://blog.csdn.net/qq_58153224/article/details/127212721


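Exercise 4-2

The XOR problem is the exclusive-or problem.
The points (0, 0) and (1, 1) are labeled 0, and the points (0, 1) and (1, 0) are labeled 1. No single straight line can split these points by label, which means a linear model cannot solve XOR. Here a nonlinear binary classifier is used to solve it.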
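To make the role of the nonlinearity concrete, here is a minimal sketch of a two-unit ReLU network that reproduces XOR with hand-picked weights. The weights and the function name `xor_relu` are illustrative assumptions; they are not taken from the exercise or from any trained model.

```python
def relu(v):
    """Rectified linear unit: max(0, v)."""
    return max(0.0, v)


def xor_relu(x1, x2):
    """XOR via two hidden ReLU units with hand-picked (assumed) weights."""
    h1 = relu(x1 + x2)        # grows with the number of active inputs
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are active
    return h1 - 2.0 * h2      # linear output layer


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_relu(a, b) for a, b in points]
print(outputs)
```

The second hidden unit cancels the contribution of the first one exactly when both inputs are 1, which a single linear unit cannot do.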

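Following the exercise, the network is built with PyTorch.
Problems are to be expected, because the activation function here is ReLU, whereas Sigmoid was used in all earlier experiments.
For a closer look, the input and output of each activation are printed at every iteration.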

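import torch


class XOR_net(torch.nn.Module):

    def __init__(self):
        super(XOR_net, self).__init__()
        self.hide = torch.nn.Linear(2, 2)  # hidden layer: 2 inputs -> 2 units
        self.out = torch.nn.Linear(2, 1)   # output layer: 2 units -> 1 output
        self.relu = torch.nn.ReLU()

    def forward(self, x, i):
        # Print the input and output of every activation so dead units are visible.
        print('##########################epoches:', i, '##############################')
        x1 = self.hide(x)
        print('hide activate input:', x1)
        x1 = self.relu(x1)
        print('hide activate output:', x1)
        x2 = self.out(x1)
        print('out activate input:', x2)
        pre_y = self.relu(x2)  # ReLU is applied to the output as well
        print('out activate output:', pre_y)
        return pre_y

    def save_model(self, path):
        # Save only the parameters rather than the pickled module object.
        torch.save(self.state_dict(), path)

    def load_model(self, path):
        # Load the saved parameters into this module.
        self.load_state_dict(torch.load(path))


if __name__ == '__main__':
    X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
    y = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)
    net = XOR_net()
    epoches = 200  # the logs below run through epoch 199
    # Build the optimizer and the loss once, outside the training loop.
    optim = torch.optim.SGD(net.parameters(), lr=0.1)
    loss = torch.nn.BCELoss()
    for i in range(epoches):
        pre_y = net(X, i)
        print(net.state_dict())
        l = loss(pre_y, y)
        print('loss:', l.item())
        optim.zero_grad()
        l.backward()
        optim.step()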

Over many runs, the outcomes fall roughly into the following cases:

Case 1

##########################epoches: 0 ##############################
hide activate input: tensor([[0.3926, 0.5016],
        [0.9581, 0.4340],
        [0.5886, 0.4157],
        [1.1542, 0.3481]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.3926, 0.5016],
        [0.9581, 0.4340],
        [0.5886, 0.4157],
        [1.1542, 0.3481]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.3059],
        [-0.5245],
        [-0.4223],
        [-0.6410]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.1961,  0.5655],
        [-0.0859, -0.0676]])), ('hide.bias', tensor([0.3926, 0.5016])), ('out.weight', tensor([[-0.3088,  0.6508]])), ('out.bias', tensor([-0.5111]))])
loss: 50.0
...
##########################epoches: 199 ##############################
hide activate input: tensor([[0.3926, 0.5016],
        [0.9581, 0.4340],
        [0.5886, 0.4157],
        [1.1542, 0.3481]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.3926, 0.5016],
        [0.9581, 0.4340],
        [0.5886, 0.4157],
        [1.1542, 0.3481]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.3059],
        [-0.5245],
        [-0.4223],
        [-0.6410]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.1961,  0.5655],
        [-0.0859, -0.0676]])), ('hide.bias', tensor([0.3926, 0.5016])), ('out.weight', tensor([[-0.3088,  0.6508]])), ('out.bias', tensor([-0.5111]))])
loss: 50.0

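All inputs to the hidden-layer ReLU are positive, so all of its outputs are positive.
All inputs to the output-layer ReLU are negative, however, so every output is zero.
In all later iterations, none of the parameters is updated.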
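The constant loss of 50.0 and the frozen parameters can both be reproduced by hand. The sketch below plugs the parameters from the log just above into a pure-Python forward pass and evaluates a binary cross-entropy whose log terms are clamped at -100, which is how the PyTorch documentation describes `BCELoss`. The helper names (`clamped_log`, `bce`, `forward`) are illustrative and not part of the original code.

```python
import math

# Parameters copied from the log above.
W_HIDE = [[0.1961, 0.5655], [-0.0859, -0.0676]]
B_HIDE = [0.3926, 0.5016]
W_OUT = [-0.3088, 0.6508]
B_OUT = -0.5111

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0.0, 1.0, 1.0, 0.0]


def relu(v):
    return max(0.0, v)


def relu_grad(v):
    # Derivative of ReLU: 1 for positive input, 0 otherwise.
    return 1.0 if v > 0 else 0.0


def clamped_log(p):
    # Assumption: mimic the documented clamp of the log output at -100.
    return max(math.log(p), -100.0) if p > 0 else -100.0


def bce(preds, targets):
    terms = [-(t * clamped_log(p) + (1 - t) * clamped_log(1 - p))
             for p, t in zip(preds, targets)]
    return sum(terms) / len(terms)


def forward(x):
    hidden = [relu(w[0] * x[0] + w[1] * x[1] + b)
              for w, b in zip(W_HIDE, B_HIDE)]
    z_out = W_OUT[0] * hidden[0] + W_OUT[1] * hidden[1] + B_OUT
    return z_out, relu(z_out)


z_outs = [forward(x)[0] for x in X]
preds = [forward(x)[1] for x in X]
loss = bce(preds, Y)
# Every gradient is multiplied by the output ReLU's derivative, which is 0 here.
gates = [relu_grad(z) for z in z_outs]
print(z_outs, preds, loss, gates)
```

Each sample with target 1 and prediction 0 contributes 100, and two of the four targets are 1, hence the mean of 50. Every gradient has to pass through the output ReLU, whose derivative is 0 for these negative inputs, so SGD has nothing to apply.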

Case 2

##########################epoches: 0 ##############################
hide activate input: tensor([[-0.0369, -0.0821],
        [-0.3322,  0.0315],
        [ 0.0280,  0.5838],
        [-0.2673,  0.6974]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0000],
        [0.0000, 0.0315],
        [0.0280, 0.5838],
        [0.0000, 0.6974]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.6933],
        [-0.6838],
        [-0.5170],
        [-0.4830]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.0649, -0.2953],
        [ 0.6659,  0.1136]])), ('hide.bias', tensor([-0.0369, -0.0821])), ('out.weight', tensor([[0.0100, 0.3015]])), ('out.bias', tensor([-0.6933]))])
loss: 50.0
...
##########################epoches: 199 ##############################
hide activate input: tensor([[-0.0369, -0.0821],
        [-0.3322,  0.0315],
        [ 0.0280,  0.5838],
        [-0.2673,  0.6974]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0000],
        [0.0000, 0.0315],
        [0.0280, 0.5838],
        [0.0000, 0.6974]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.6933],
        [-0.6838],
        [-0.5170],
        [-0.4830]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.0649, -0.2953],
        [ 0.6659,  0.1136]])), ('hide.bias', tensor([-0.0369, -0.0821])), ('out.weight', tensor([[0.0100, 0.3015]])), ('out.bias', tensor([-0.6933]))])
loss: 50.0

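Some inputs to the hidden-layer activation are negative, so some of its outputs are zero.
All inputs to the output-layer activation are negative, so every output is zero.
None of the parameters is updated.

Case 3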

##########################epoches: 0 ##############################
hide activate input: tensor([[0.6826, 0.0530],
        [1.1280, 0.3284],
        [1.1546, 0.1576],
        [1.6000, 0.4330]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.6826, 0.0530],
        [1.1280, 0.3284],
        [1.1546, 0.1576],
        [1.6000, 0.4330]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.7043],
        [0.6422],
        [0.7387],
        [0.6766]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.7043],
        [0.6422],
        [0.7387],
        [0.6766]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[0.4720, 0.4453],
        [0.1046, 0.2754]])), ('hide.bias', tensor([0.6826, 0.0530])), ('out.weight', tensor([[ 0.1914, -0.5348]])), ('out.bias', tensor([0.6020]))])
loss: 0.7732471823692322
...
##########################epoches: 199 ##############################
hide activate input: tensor([[0.6724, 0.0879],
        [1.1153, 0.3708],
        [1.1399, 0.2079],
        [1.5828, 0.4908]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.6724, 0.0879],
        [1.1153, 0.3708],
        [1.1399, 0.2079],
        [1.5828, 0.4908]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.5755],
        [0.4777],
        [0.5699],
        [0.4721]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.5755],
        [0.4777],
        [0.5699],
        [0.4721]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[0.4674, 0.4429],
        [0.1200, 0.2829]])), ('hide.bias', tensor([0.6724, 0.0879])), ('out.weight', tensor([[ 0.1281, -0.5464]])), ('out.bias', tensor([0.5374]))])
loss: 0.699190080165863

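All inputs to the hidden-layer activation are positive, so all of its outputs are positive.
All inputs to the output-layer activation are positive, so all of its outputs are positive.
All parameters are updated.

Case 4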

##########################epoches: 0 ##############################
hide activate input: tensor([[ 0.5341, -0.1522],
        [ 0.5581, -0.2097],
        [ 0.7221,  0.3766],
        [ 0.7461,  0.3191]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.5341, 0.0000],
        [0.5581, 0.0000],
        [0.7221, 0.3766],
        [0.7461, 0.3191]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.2988],
        [0.2895],
        [0.4744],
        [0.4274]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.2988],
        [0.2895],
        [0.4744],
        [0.4274]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.1880,  0.0240],
        [ 0.5288, -0.0574]])), ('hide.bias', tensor([ 0.5341, -0.1522])), ('out.weight', tensor([[-0.3837,  0.6579]])), ('out.bias', tensor([0.5037]))])
loss: 0.7244118452072144
...
##########################epoches: 199 ##############################
hide activate input: tensor([[ 0.5104, -0.1533],
        [ 0.5165, -0.2734],
        [ 0.6989,  0.3744],
        [ 0.7050,  0.2544]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.5104, 0.0000],
        [0.5165, 0.0000],
        [0.6989, 0.3744],
        [0.7050, 0.2544]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.3902],
        [0.3880],
        [0.5736],
        [0.4915]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.3902],
        [0.3880],
        [0.5736],
        [0.4915]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.1885,  0.0061],
        [ 0.5277, -0.1201]])), ('hide.bias', tensor([ 0.5104, -0.1533])), ('out.weight', tensor([[-0.3495,  0.6657]])), ('out.bias', tensor([0.5685]))])
loss: 0.6683593392372131

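Some inputs to the hidden-layer activation are negative, so some of its outputs are zero.
All inputs to the output-layer activation are positive, so all of its outputs are positive.
All parameters are updated.

Case 5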

##########################epoches: 0 ##############################
hide activate input: tensor([[-0.6011,  0.0203],
        [-0.6969, -0.4559],
        [-0.3011, -0.1647],
        [-0.3969, -0.6409]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0203],
        [0.0000, 0.0000],
        [0.0000, 0.0000],
        [0.0000, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.4616],
        [-0.4527],
        [-0.4527],
        [-0.4527]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.3000, -0.0958],
        [-0.1850, -0.4762]])), ('hide.bias', tensor([-0.6011,  0.0203])), ('out.weight', tensor([[-0.4714, -0.4393]])), ('out.bias', tensor([-0.4527]))])
loss: 50.0
...
##########################epoches: 199 ##############################
hide activate input: tensor([[-0.6011,  0.0203],
        [-0.6969, -0.4559],
        [-0.3011, -0.1647],
        [-0.3969, -0.6409]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0203],
        [0.0000, 0.0000],
        [0.0000, 0.0000],
        [0.0000, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.4616],
        [-0.4527],
        [-0.4527],
        [-0.4527]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.],
        [0.],
        [0.],
        [0.]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.3000, -0.0958],
        [-0.1850, -0.4762]])), ('hide.bias', tensor([-0.6011,  0.0203])), ('out.weight', tensor([[-0.4714, -0.4393]])), ('out.bias', tensor([-0.4527]))])
loss: 50.0

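Almost all inputs to the hidden-layer activation are negative (the single exception is 0.0203), so almost all of its outputs are 0.
All inputs to the output-layer activation are negative, so every output is 0.
No parameters are updated.

Case 6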

##########################epoches: 0 ##############################
hide activate input: tensor([[-0.6065, -0.6682],
        [-1.3076, -0.7910],
        [-1.2815, -0.8835],
        [-1.9827, -1.0063]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.4852],
        [0.4852],
        [0.4852],
        [0.4852]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.4852],
        [0.4852],
        [0.4852],
        [0.4852]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[-0.6750, -0.7011],
        [-0.2153, -0.1228]])), ('hide.bias', tensor([-0.6065, -0.6682])), ('out.weight', tensor([[0.6479, 0.0477]])), ('out.bias', tensor([0.4852]))])
loss: 0.6935874223709106
...
##########################epoches: 199 ##############################
hide activate input: tensor([[-0.6065, -0.6682],
        [-1.3076, -0.7910],
        [-1.2815, -0.8835],
        [-1.9827, -1.0063]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], grad_fn=<ReluBackward0>)
out activate input: tensor([[0.4933],
        [0.4933],
        [0.4933],
        [0.4933]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.4933],
        [0.4933],
        [0.4933],
        [0.4933]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[-0.6750, -0.7011],
        [-0.2153, -0.1228]])), ('hide.bias', tensor([-0.6065, -0.6682])), ('out.weight', tensor([[0.6479, 0.0477]])), ('out.bias', tensor([0.4933]))])
loss: 0.6932364106178284

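All inputs to the hidden-layer activation are negative, so all of its outputs are 0.
All inputs to the output-layer activation are positive, so all of its outputs are positive.
The hidden-layer parameters are not updated, and in the output layer only the bias b is updated.
(Here I realized that this contradicts my earlier conclusion: in PyTorch, at least, the bias b is still being updated.)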
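One way to see why only the output bias moves is to write the gradients out by hand. The sketch below does this for the epoch-0 state in the log just above, assuming mean binary cross-entropy as the loss; the variable names are illustrative.

```python
# State copied from the log above: hidden pre-activations and output-layer parameters.
hidden_in = [[-0.6065, -0.6682], [-1.3076, -0.7910],
             [-1.2815, -0.8835], [-1.9827, -1.0063]]
w_out = [0.6479, 0.0477]
b_out = 0.4852
targets = [0.0, 1.0, 1.0, 0.0]
n = len(targets)

grad_w_out = [0.0, 0.0]
grad_b_out = 0.0
grad_hidden_in = [[0.0, 0.0] for _ in range(n)]

for k in range(n):
    h = [max(0.0, z) for z in hidden_in[k]]            # hidden ReLU outputs (all 0)
    z_out = w_out[0] * h[0] + w_out[1] * h[1] + b_out
    p = max(0.0, z_out)                                # output ReLU
    dloss_dp = (p - targets[k]) / (p * (1.0 - p)) / n  # d(mean BCE)/dp
    dloss_dz = dloss_dp * (1.0 if z_out > 0 else 0.0)  # back through the output ReLU
    grad_b_out += dloss_dz                             # the bias sees a constant input of 1
    for j in range(2):
        grad_w_out[j] += dloss_dz * h[j]               # scaled by h_j, which is 0
        gate = 1.0 if hidden_in[k][j] > 0 else 0.0     # hidden ReLU derivative
        grad_hidden_in[k][j] = dloss_dz * w_out[j] * gate

print(grad_w_out, grad_b_out, grad_hidden_in)
```

The output bias enters the pre-activation with a constant factor of 1, so its gradient does not depend on the hidden outputs. The output weights are multiplied by hidden outputs that are all 0, and the hidden parameters sit behind a ReLU whose derivative is 0. The negative sign of the bias gradient means SGD raises b, which matches the direction of the change in the log (0.4852 to 0.4933).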

Case 7

##########################epoches: 0 ##############################
hide activate input: tensor([[-0.4791,  0.4359],
        [ 0.0395, -0.1372],
        [-0.0212,  0.5126],
        [ 0.4974, -0.0605]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.4359],
        [0.0395, 0.0000],
        [0.0000, 0.5126],
        [0.4974, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[-0.0088],
        [ 0.1920],
        [-0.0430],
        [ 0.2640]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.0000],
        [0.1920],
        [0.0000],
        [0.2640]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.4578,  0.5186],
        [ 0.0767, -0.5731]])), ('hide.bias', tensor([-0.4791,  0.4359])), ('out.weight', tensor([[ 0.1572, -0.4464]])), ('out.bias', tensor([0.1858]))])
loss: 25.48916244506836
...
##########################epoches: 98 ##############################
hide activate input: tensor([[-0.4676, -0.7569],
        [ 0.0624, -1.3301],
        [-0.0150, -1.8832],
        [ 0.5150, -2.4563]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0000],
        [0.0624, 0.0000],
        [0.0000, 0.0000],
        [0.5150, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[2.8760],
        [2.8850],
        [2.8760],
        [2.9508]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[2.8760],
        [2.8850],
        [2.8760],
        [2.9508]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.4526,  0.5300],
        [-1.1263, -0.5731]])), ('hide.bias', tensor([-0.4676, -0.7569])), ('out.weight', tensor([[0.1453, 0.9219]])), ('out.bias', tensor([2.8760]))])

It raised an error:

Traceback (most recent call last):
  File "C:/Users/lenovo/PycharmProjects/pythonProject1/deep_learning/实验五 前馈神经网络/XOR.py", line 40, in <module>
    l=loss(pre_y,y)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: all elements of input should be between 0 and 1

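Some inputs to the hidden-layer activation are negative, so some of its outputs are 0.
Some inputs to the output-layer activation are negative, so some of its outputs are 0.
All parameters are being updated.

Case 8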

##########################epoches: 0 ##############################
hide activate input: tensor([[-0.0765,  0.0088],
        [ 0.5111,  0.0178],
        [ 0.5068, -0.3536],
        [ 1.0944, -0.3445]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0088],
        [0.5111, 0.0178],
        [0.5068, 0.0000],
        [1.0944, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[ 0.1682],
        [-0.1535],
        [-0.1582],
        [-0.5324]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.1682],
        [0.0000],
        [0.0000],
        [0.0000]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.5833,  0.5876],
        [-0.3623,  0.0091]])), ('hide.bias', tensor([-0.0765,  0.0088])), ('out.weight', tensor([[-0.6368,  0.4180]])), ('out.bias', tensor([0.1646]))])
loss: 50.046051025390625
...
##########################epoches: 199 ##############################
hide activate input: tensor([[-7.6483e-02, -1.7142e-05],
        [ 5.1114e-01,  9.0340e-03],
        [ 5.0682e-01, -3.6234e-01],
        [ 1.0944e+00, -3.5329e-01]], grad_fn=<AddmmBackward0>)
hide activate output: tensor([[0.0000, 0.0000],
        [0.5111, 0.0090],
        [0.5068, 0.0000],
        [1.0944, 0.0000]], grad_fn=<ReluBackward0>)
out activate input: tensor([[ 0.1069],
        [-0.2148],
        [-0.2158],
        [-0.5900]], grad_fn=<AddmmBackward0>)
out activate output: tensor([[0.1069],
        [0.0000],
        [0.0000],
        [0.0000]], grad_fn=<ReluBackward0>)
OrderedDict([('hide.weight', tensor([[ 0.5833,  0.5876],
        [-0.3623,  0.0091]])), ('hide.bias', tensor([-7.6483e-02, -1.7142e-05])), ('out.weight', tensor([[-0.6368,  0.4179]])), ('out.bias', tensor([0.1069]))])
loss: 50.02827835083008

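Some inputs to the hidden-layer activation are negative, so some of its outputs are 0.
Some inputs to the output-layer activation are negative, so some of its outputs are 0.
In the hidden layer only the biases are updated, and in the output layer only some of the parameters are updated.

There are in fact other cases as well. For example, when both the hidden layer and the output layer receive negative inputs, different patterns of where those negatives fall lead to different results.
To sum up, using ReLU as the activation function can leave parameters stuck without updates. When ReLU receives a negative input its output is 0, and when all of its inputs are negative all of its outputs are 0. During backpropagation, ReLU's derivative is 0 for a negative input, so the gradient flowing through that node is 0. The node's parameters therefore cannot be updated, and the nodes before it receive no update from the paths that pass through it.
Training only proceeds as intended when the ReLU inputs stay positive after every parameter update, which depends partly on luck. This is why Sigmoid is commonly used as the activation function for binary classification.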

Exercise 4-3

The previous exercise is a good example; the earlier diagram is reused here with small changes.
[Figure: network diagram]

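For example, when the ReLU output of h1 is 0, that is, when h1 outputs 0, the parameters of this node can no longer be updated.
$\frac{\partial loss}{\partial w_1}=\frac{\partial loss}{\partial y}\cdot\frac{\partial y}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}$
The gradient has to pass through the inactive ReLU at h1, whose derivative is 0 for a negative input, so this product is 0 and the parameter w1 is not updated.
When the network is deeper, for example with one more hidden layer added in front:
[Figure: network diagram with an added hidden layer]
then when h3 outputs 0, its own parameters cannot be updated to begin with: w5 and w6 are not updated.
$\frac{\partial loss}{\partial w_5}=\frac{\partial loss}{\partial y}\cdot\frac{\partial y}{\partial h_3}\cdot\frac{\partial h_3}{\partial w_5}$
$\frac{\partial loss}{\partial w_6}=\frac{\partial loss}{\partial y}\cdot\frac{\partial y}{\partial h_3}\cdot\frac{\partial h_3}{\partial w_6}$
For the earlier parameters w1, w2, w3, w4, the part of the derivative that passes through node h3 also vanishes, so their gradients, and hence their updates, can only come through h4.
$\frac{\partial loss}{\partial w_1}=\frac{\partial loss}{\partial y}\cdot\left(\frac{\partial y}{\partial h_3}\cdot\frac{\partial h_3}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}+\frac{\partial y}{\partial h_4}\cdot\frac{\partial h_4}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}\right)$
Here the term
$\frac{\partial y}{\partial h_3}\cdot\frac{\partial h_3}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}$
contributes nothing, so the partial derivative can only come from
$\frac{\partial y}{\partial h_4}\cdot\frac{\partial h_4}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}$
In effect:
$\frac{\partial loss}{\partial w_1}=\frac{\partial loss}{\partial y}\cdot\left(\frac{\partial y}{\partial h_4}\cdot\frac{\partial h_4}{\partial h_1}\cdot\frac{\partial h_1}{\partial w_1}\right)$
The node h3 is then dead: it cannot update its own parameters, and it contributes nothing to the parameter updates of other nodes.
This is the dying ReLU problem.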
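The two-path argument can be checked numerically. The following is a minimal sketch that assumes a 2-2-2-1 layout in which w1 to w4 feed h1 and h2, and w5 and w6 feed h3. The weights w7 to w10, the input, and every numeric value are hypothetical, chosen so that h3 receives a negative input. For brevity it differentiates y directly and leaves out the common factor $\frac{\partial loss}{\partial y}$.

```python
def relu(v):
    return max(0.0, v)


def relu_grad(v):
    return 1.0 if v > 0 else 0.0


# Hypothetical weights: w7..w10 and all values are assumptions for illustration.
w = {1: 0.5, 2: 0.5, 3: 0.2, 4: 0.3,    # first hidden layer (h1, h2)
     5: -1.0, 6: -1.0, 7: 0.4, 8: 0.6,  # second hidden layer (h3, h4)
     9: 2.0, 10: 3.0}                   # output layer
x1, x2 = 1.0, 1.0


def forward(w):
    z1 = w[1] * x1 + w[2] * x2
    z2 = w[3] * x1 + w[4] * x2
    h1, h2 = relu(z1), relu(z2)
    z3 = w[5] * h1 + w[6] * h2
    z4 = w[7] * h1 + w[8] * h2
    h3, h4 = relu(z3), relu(z4)
    y = w[9] * h3 + w[10] * h4
    return z1, z3, z4, h1, h3, y


z1, z3, z4, h1, h3, y = forward(w)

# dy/dw1 split into the two chain-rule paths.
via_h3 = w[9] * relu_grad(z3) * w[5] * relu_grad(z1) * x1
via_h4 = w[10] * relu_grad(z4) * w[7] * relu_grad(z1) * x1
dy_dw1 = via_h3 + via_h4
# h3's own incoming weight receives no gradient either.
dy_dw5 = w[9] * relu_grad(z3) * h1

# Central finite-difference check of dy/dw1.
eps = 1e-6
w_plus = dict(w)
w_plus[1] += eps
w_minus = dict(w)
w_minus[1] -= eps
numeric = (forward(w_plus)[-1] - forward(w_minus)[-1]) / (2 * eps)
print(h3, via_h3, via_h4, dy_dw5, numeric)
```

The path through h3 contributes exactly 0, and the finite-difference estimate agrees with the h4 path alone.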
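Solution: use an activation function that does not output 0 for negative inputs, so that the gradient does not vanish there.
ELU and LeakyReLU are variants of ReLU. They were analyzed in an earlier post, and the figure from that analysis is reused here.
[Figure: activation functions and their gradients]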
Note: the "grad of Relu" for inputs below zero is 0 rather than missing; strictly speaking, the derivative is undefined only at the input 0 itself.
Next, try ELU or LeakyReLU as the activation function.

self.relu=torch.nn.ELU()
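Why the swap helps can be checked directly from the derivatives. A minimal pure-Python sketch, assuming the usual definitions with a LeakyReLU slope of 0.01 and an ELU alpha of 1.0 (the defaults of `torch.nn.LeakyReLU` and `torch.nn.ELU`); the function names are illustrative:

```python
import math


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, slope=0.01):
    return 1.0 if x > 0 else slope


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    return 1.0 if x > 0 else alpha * math.exp(x)


x = -0.4527  # a negative pre-activation taken from one of the logs above
grads = {
    "relu": relu_grad(x),
    "leaky_relu": leaky_relu_grad(x),
    "elu": elu_grad(x),
}
print(grads)
print(elu(x))
```

For a negative input, ReLU passes no gradient back, while LeakyReLU and ELU both pass a small positive one, so the node can still recover. Note that the ELU output itself is negative for a negative input.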

Over many runs no problems appeared, although an error is still raised now and then:

Traceback (most recent call last):
  File "C:/Users/lenovo/PycharmProjects/pythonProject1/deep_learning/实验五 前馈神经网络/XOR.py", line 40, in <module>
    l=loss(pre_y,y)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "E:\anaconda\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: all elements of input should be between 0 and 1

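The error is the same as the one above, but the parameters are clearly all being updated normally and the dead-node problem no longer appears. Why this error is raised still needs to be investigated.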
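One plausible explanation, offered as a hedged sketch rather than a confirmed diagnosis of these runs: binary cross-entropy takes the logarithm of p and of 1 - p, so it needs p in [0, 1], while ReLU and ELU outputs are not confined to that range. The pure-Python check below uses two output pre-activations that appear in the logs earlier in this post; `in_unit_interval` and `sigmoid` are illustrative helpers.

```python
import math


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def in_unit_interval(p):
    return 0.0 <= p <= 1.0


z_big = 2.8760   # output pre-activation logged at epoch 98, just before the error
z_neg = -0.3059  # a negative output pre-activation from the first log

checks = {
    "relu(z_big)": in_unit_interval(relu(z_big)),
    "elu(z_big)": in_unit_interval(elu(z_big)),
    "elu(z_neg)": in_unit_interval(elu(z_neg)),
    "sigmoid(z_big)": in_unit_interval(sigmoid(z_big)),
    "sigmoid(z_neg)": in_unit_interval(sigmoid(z_neg)),
}
print(checks)
```

ReLU can exceed 1, and ELU can both exceed 1 and go below 0, whereas Sigmoid always stays inside the interval. In PyTorch, the usual ways to keep the prediction in range are a `torch.nn.Sigmoid` on the output or `torch.nn.BCEWithLogitsLoss` applied to the raw output; either would change the setup of the exercise, so this is a suggestion only.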


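Exercise 4-7

Regularizing the bias b can lead to underfitting.

In neural-network regularization, the penalty is usually applied only to the weights of each layer's affine transformation, not to the biases.

Compared with a weight, a bias needs less data to fit accurately. Each weight specifies how two variables interact, so fitting it well requires observing both variables under a variety of conditions. Each bias controls only a single variable. This means that leaving the biases unregularized does not introduce much variance. Regularizing the biases, on the other hand, can easily cause underfitting.
(From "Deep Learning", Chapter 7.1)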
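As a concrete illustration of penalizing weights but not biases, here is a minimal sketch of one SGD step with an L2 penalty. The function `sgd_step_l2`, the penalty form (lam times the sum of squared weights, whose gradient is 2 * lam * w), and the numbers are assumptions chosen for the example.

```python
def sgd_step_l2(weights, biases, grad_w, grad_b, lr=0.1, lam=0.01):
    """One SGD step; the L2 penalty lam * sum(w^2) is applied to weights only."""
    new_w = [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grad_w)]
    new_b = [b - lr * g for b, g in zip(biases, grad_b)]  # no penalty term
    return new_w, new_b


weights, biases = [1.0, -2.0], [0.5]
# With a zero data gradient only the penalty acts: weights shrink, the bias stays.
new_w, new_b = sgd_step_l2(weights, biases, grad_w=[0.0, 0.0], grad_b=[0.0])
print(new_w, new_b)
```

In PyTorch, the same split can be expressed by passing the optimizer two parameter groups with different `weight_decay` values.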

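Exercise 4-8

This was also analyzed earlier: initializing all parameters to 0 makes the units in a layer receive identical updates at every step, so their weights stay symmetric. This is why parameters are usually initialized randomly.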
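The symmetry can be observed in a few lines of pure Python. This sketch trains a 2-2-1 Sigmoid network from an all-zero initialization with batch gradient descent on binary cross-entropy. The OR labels, the learning rate, and the step count are assumptions for the demonstration; OR is used so that the gradients do not cancel on the first step.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 1.0]  # OR labels (assumed for this demo)

# All-zero initialization of a 2-2-1 sigmoid network.
W = [[0.0, 0.0], [0.0, 0.0]]  # W[j] holds the incoming weights of hidden unit j
b = [0.0, 0.0]                # hidden biases
v = [0.0, 0.0]                # output weights
c = 0.0                       # output bias
lr = 0.5

for _ in range(100):
    gW = [[0.0, 0.0], [0.0, 0.0]]
    gb = [0.0, 0.0]
    gv = [0.0, 0.0]
    gc = 0.0
    for (x1, x2), t in zip(X, Y):
        h = [sigmoid(W[j][0] * x1 + W[j][1] * x2 + b[j]) for j in range(2)]
        p = sigmoid(v[0] * h[0] + v[1] * h[1] + c)
        d_out = p - t  # gradient of BCE w.r.t. the output pre-activation
        gc += d_out
        for j in range(2):
            gv[j] += d_out * h[j]
            d_hid = d_out * v[j] * h[j] * (1.0 - h[j])
            gW[j][0] += d_hid * x1
            gW[j][1] += d_hid * x2
            gb[j] += d_hid
    for j in range(2):
        W[j][0] -= lr * gW[j][0]
        W[j][1] -= lr * gW[j][1]
        b[j] -= lr * gb[j]
        v[j] -= lr * gv[j]
    c -= lr * gc

print(W, b, v)
```

The parameters do move away from zero, but the two hidden units stay exact copies of each other, so the layer behaves like a single unit.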

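Exercise 4-9

Vanishing gradients were analyzed earlier. The main cause is that the derivative of Sigmoid is small, and as these derivatives are multiplied together on the way back through the layers the product keeps shrinking, so the partial derivatives for the first few layers become vanishingly small. Raising the learning rate for those layers scales up the tiny partial derivatives, so their parameters still receive noticeable updates, which alleviates the vanishing-gradient problem to some extent. If the learning rate is raised for every layer, however, the later layers are "stimulated" too strongly: their update steps become so large that they easily jump past the optimum.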
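A small numeric sketch of this effect, assuming a chain of scalar Sigmoid layers with every weight equal to 1 and no biases (a toy setup, not the network from the exercises). It backpropagates through the chain and records the gradient magnitude for each layer's weight; `layer_gradients` is an illustrative name.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_gradients(depth, x=0.5, weight=1.0):
    """Return |d(output)/d(weight_k)| for k = 0..depth-1 in a scalar sigmoid chain."""
    # Forward pass: a_{k+1} = sigmoid(weight * a_k).
    acts = [x]
    for _ in range(depth):
        acts.append(sigmoid(weight * acts[-1]))
    # Backward pass from the output towards the first layer.
    grads = [0.0] * depth
    upstream = 1.0
    for k in reversed(range(depth)):
        local = acts[k + 1] * (1.0 - acts[k + 1])  # sigmoid derivative, at most 0.25
        grads[k] = abs(upstream * local * acts[k])
        upstream *= local * weight
    return grads


grads = layer_gradients(depth=6)
# Learning-rate multiplier that would give layer 0 the same step size as the last layer.
compensation = grads[-1] / grads[0]
print(grads, compensation)
```

The gradient shrinks by roughly a factor of four or more per layer, so the first layer would need a far larger learning rate than the last to move at a comparable pace. In PyTorch, per-layer learning rates can be set through optimizer parameter groups.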

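Summary:
1. Analyzed the dying ReLU problem in depth and learned how to address it.
2. The results contradict my earlier conclusion that the bias b is not updated. The reason is not yet entirely clear.
Reference:
"Why is the bias b usually not regularized?"
That author has a similar discussion.
However, it only shows that the bias b does not contribute to overfitting; the bias b seems not to matter much in the model.
3. Gained a somewhat deeper understanding of the bias b.
ref:
https://blog.csdn.net/xylin1012/article/details/71429566
http://events.jianshu.io/p/00a405962dca
https://blog.csdn.net/qq_36810398/article/details/90756828
https://blog.csdn.net/qq_58153224/article/details/127188012?spm=1001.2014.3001.5501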
