NNDL Assignment 4

CikL160

Category: Assignments. Tags: deep learning

Article link: https://blog.csdn.net/weixin_63316615/article/details/133844557

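1. Derivation: understand how backpropagation (BP) works

2. Numerical calculation: work through the numbers by hand to master the details

3. Code implementation: hand-derived gradients in numpy, automatic differentiation in pytorch

The derivation and the numerical calculation are as follows:

Code implementation:

1. Compare the [numpy] and [pytorch] programs, then summarize and report.

numpy: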

import numpy as np

# Initial weights, inputs and targets given by the assignment
w1, w2, w3, w4, w5, w6, w7, w8 = 0.2, -0.4, 0.5, 0.6, 0.1, -0.5, -0.3, 0.8
x1, x2 = 0.5, 0.3
y1, y2 = 0.23, -0.07
print("Inputs:", x1, x2)
print("Targets:", y1, y2)


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward(x1, x2, y1, y2, w1, w2, w3, w4, w5, w6, w7, w8):
    # Hidden layer
    in_h1 = w1 * x1 + w3 * x2
    h1 = sigmoid(in_h1)
    in_h2 = w2 * x1 + w4 * x2
    h2 = sigmoid(in_h2)
    # Output layer
    in_o1 = w5 * h1 + w7 * h2
    o1 = sigmoid(in_o1)
    in_o2 = w6 * h1 + w8 * h2
    o2 = sigmoid(in_o2)
    print(round(h1, 5), round(h2, 5))
    print(round(o1, 5), round(o2, 5))
    error = (1 / 2) * (o1 - y1) ** 2 + (1 / 2) * (o2 - y2) ** 2
    print("Mean squared error:", round(error, 5))
    return o1, o2, h1, h2


def back_ward(o1, o2, h1, h2):
    # Note: x1, x2, y1, y2 and w5..w8 are read from module scope
    d_o1 = o1 - y1
    d_o2 = o2 - y2
    # Output-layer weights: dL/do * sigmoid'(in_o) * hidden activation
    d_w5 = d_o1 * o1 * (1 - o1) * h1
    d_w7 = d_o1 * o1 * (1 - o1) * h2
    d_w6 = d_o2 * o2 * (1 - o2) * h1
    d_w8 = d_o2 * o2 * (1 - o2) * h2
    # Hidden-layer weights: sum the error signal coming back from both outputs
    d_w1 = x1 * (1 - h1) * h1 * (w6 * (1 - o2) * o2 * d_o2 + w5 * (1 - o1) * o1 * d_o1)
    d_w3 = x2 * (1 - h1) * h1 * (w6 * (1 - o2) * o2 * d_o2 + w5 * (1 - o1) * o1 * d_o1)
    d_w2 = x1 * (1 - h2) * h2 * (w7 * (1 - o1) * o1 * d_o1 + w8 * (1 - o2) * o2 * d_o2)
    d_w4 = x2 * (1 - h2) * h2 * (w7 * (1 - o1) * o1 * d_o1 + w8 * (1 - o2) * o2 * d_o2)
    print("Gradients of w:", round(d_w1, 2), round(d_w2, 2), round(d_w3, 2), round(d_w4, 2),
          round(d_w5, 2), round(d_w6, 2), round(d_w7, 2), round(d_w8, 2))
    return d_w1, d_w2, d_w3, d_w4, d_w5, d_w6, d_w7, d_w8


def update(w1, w2, w3, w4, w5, w6, w7, w8):
    # Note: the gradients d_w1..d_w8 are read from module scope
    step = 1  # step size (learning rate)
    w1 = w1 - step * d_w1
    w2 = w2 - step * d_w2
    w3 = w3 - step * d_w3
    w4 = w4 - step * d_w4
    w5 = w5 - step * d_w5
    w6 = w6 - step * d_w6
    w7 = w7 - step * d_w7
    w8 = w8 - step * d_w8
    return w1, w2, w3, w4, w5, w6, w7, w8


if __name__ == "__main__":
    print("Weights w0-w7:", round(w1, 2), round(w2, 2), round(w3, 2), round(w4, 2),
          round(w5, 2), round(w6, 2), round(w7, 2), round(w8, 2))

    # Number of training rounds; the logs in question 2 go up to round 1000
    for i in range(1):
        print("===== Round " + str(i + 1) + " =====")
        o1, o2, h1, h2 = forward(x1, x2, y1, y2, w1, w2, w3, w4, w5, w6, w7, w8)
        d_w1, d_w2, d_w3, d_w4, d_w5, d_w6, d_w7, d_w8 = back_ward(o1, o2, h1, h2)
        w1, w2, w3, w4, w5, w6, w7, w8 = update(w1, w2, w3, w4, w5, w6, w7, w8)

    print("Updated weights w:", round(w1, 2), round(w2, 2), round(w3, 2), round(w4, 2),
          round(w5, 2), round(w6, 2), round(w7, 2), round(w8, 2))

pytorch:

import torch

x = [0.5, 0.3]
y = [0.23, -0.07]
print("Inputs x0,x1:", x[0], x[1])
print("Targets y0,y1:", y[0], y[1])

# Eight scalar weights, each a leaf tensor tracked by autograd
w = [torch.tensor([v], requires_grad=True)
     for v in (0.2, -0.4, 0.5, 0.6, 0.1, -0.5, -0.3, 0.8)]
print("Weights w0-w7:")
for i in range(0, 8):
    print(w[i].data, end=' ')


def forward(x):
    # Hidden layer
    in_h1 = w[0] * x[0] + w[2] * x[1]
    h1 = torch.sigmoid(in_h1)
    in_h2 = w[1] * x[0] + w[3] * x[1]
    h2 = torch.sigmoid(in_h2)

    # Output layer
    in_o1 = w[4] * h1 + w[6] * h2
    o1 = torch.sigmoid(in_o1)
    in_o2 = w[5] * h1 + w[7] * h2
    o2 = torch.sigmoid(in_o2)
    print("Hidden layer h1,h2:", end=' ')
    print(h1.data, h2.data)
    print("Predictions o1,o2:", end=' ')
    print(o1.data, o2.data)
    return o1, o2


def loss(x, y):
    y_pre = forward(x)
    loss_mse = (1 / 2) * (y_pre[0] - y[0]) ** 2 + (1 / 2) * (y_pre[1] - y[1]) ** 2
    print("Loss:", loss_mse.item())
    return loss_mse


if __name__ == "__main__":
    for k in range(1000):
        print("\n===== Round " + str(k + 1) + " =====")
        l = loss(x, y)
        l.backward()  # autograd fills w[i].grad
        print("Gradients of w:", end=' ')
        for i in range(0, 8):
            print(round(w[i].grad.item(), 2), end=' ')
        step = 1
        # Gradient-descent update, kept out of the autograd graph
        with torch.no_grad():
            for i in range(0, 8):
                w[i] -= step * w[i].grad
                w[i].grad.zero_()  # clear the gradient before the next round
        print("\nUpdated weights w:")
        for i in range(0, 8):
            print(w[i].data, end=' ')

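pytorch is a deep learning framework in which almost everything needed here, such as differentiation and the common activation functions, is already packaged and can be called directly. With numpy, the activation function has to be implemented and the derivatives worked out by hand, so the numpy version needs more code. Writing it in numpy, however, makes you more familiar with the whole process, including the details.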
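One way to gain confidence in hand-derived gradients without a framework is a finite-difference check. The sketch below is not part of the original assignment: it re-implements the forward pass in plain Python (the names `loss` and `numeric_grad` and the value of `eps` are my own choices) and estimates each gradient numerically, so the result can be compared with the "Gradients of w" line printed by the programs above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x=(0.5, 0.3), y=(0.23, -0.07)):
    """Forward pass of the 2-2-2 network; w holds the 8 weights w1..w8 in order."""
    h1 = sigmoid(w[0] * x[0] + w[2] * x[1])
    h2 = sigmoid(w[1] * x[0] + w[3] * x[1])
    o1 = sigmoid(w[4] * h1 + w[6] * h2)
    o2 = sigmoid(w[5] * h1 + w[7] * h2)
    return 0.5 * (o1 - y[0]) ** 2 + 0.5 * (o2 - y[1]) ** 2


def numeric_grad(w, eps=1e-6):
    """Central-difference estimate of dLoss/dw for every weight."""
    grads = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads


w0 = [0.2, -0.4, 0.5, 0.6, 0.1, -0.5, -0.3, 0.8]
grads = numeric_grad(w0)
print("loss:", round(loss(w0), 5))
print("numeric gradients:", [round(g, 4) for g in grads])
```

If the hand derivation is right, these estimates agree with the analytic gradients to several decimal places; a mismatch in a single weight usually points at the term for that weight.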

2. Use pytorch's built-in torch.sigmoid() as the Sigmoid activation; observe, summarize and report.

Training results with the sigmoid function implemented in numpy:

Weights w0-w7: 0.2 -0.4 0.5 0.6 0.1 -0.5 -0.3 0.8
===== Round 1 =====
0.56218 0.495
0.47695 0.5287
Mean squared error: 0.20971
Gradients of w: -0.01 0.01 -0.01 0.01 0.03 0.08 0.03 0.07
===== Round 5 =====
0.56913 0.48834
0.44506 0.44726
Mean squared error: 0.1569
Gradients of w: -0.01 0.01 -0.01 0.0 0.03 0.07 0.03 0.06
===== Round 10 =====
0.58087 0.48571
0.41089 0.36465
Mean squared error: 0.11082
Gradients of w: -0.02 0.0 -0.01 0.0 0.03 0.06 0.02 0.05
Updated weights w: 0.33 -0.45 0.58 0.57 -0.2 -1.21 -0.56 0.19
===== Round 1000 =====
0.77501 0.592
0.22963 0.00981
Mean squared error: 0.00319
Gradients of w: -0.0 -0.0 -0.0 -0.0 -0.0 0.0 -0.0 0.0
Updated weights w: 1.65 0.18 1.37 0.95 -0.78 -4.27 -1.02 -2.2

Training results with the built-in torch.sigmoid:

Weights w0-w7:
tensor([0.2000]) tensor([-0.4000]) tensor([0.5000]) tensor([0.6000]) tensor([0.1000]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])
===== Round 1 =====
Hidden layer h1,h2: tensor([0.5622]) tensor([0.4950])
Predictions o1,o2: tensor([0.4769]) tensor([0.5287])
Loss: 0.2097097933292389
Gradients of w: -0.01 0.01 -0.01 0.01 0.03 0.08 0.03 0.07
===== Round 5 =====
Hidden layer h1,h2: tensor([0.5691]) tensor([0.4883])
Predictions o1,o2: tensor([0.4451]) tensor([0.4473])
Loss: 0.15690487623214722
Gradients of w: -0.01 0.01 -0.01 0.0 0.03 0.07 0.03 0.06
===== Round 10 =====
Hidden layer h1,h2: tensor([0.5809]) tensor([0.4857])
Predictions o1,o2: tensor([0.4109]) tensor([0.3647])
Loss: 0.11082295328378677
Gradients of w: -0.02 0.0 -0.01 0.0 0.03 0.06 0.02 0.05
Updated weights w:
tensor([0.3273]) tensor([-0.4547]) tensor([0.5764]) tensor([0.5672]) tensor([-0.1985]) tensor([-1.2127]) tensor([-0.5561]) tensor([0.1883])
===== Round 1000 =====
Hidden layer h1,h2: tensor([0.7750]) tensor([0.5920])
Predictions o1,o2: tensor([0.2296]) tensor([0.0098])
Loss: 0.003185197012498975
Gradients of w: -0.0 -0.0 -0.0 -0.0 -0.0 0.0 -0.0 0.0
Updated weights w:
tensor([1.6515]) tensor([0.1770]) tensor([1.3709]) tensor([0.9462]) tensor([-0.7798]) tensor([-4.2741]) tensor([-1.0236]) tensor([-2.1999])

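Judging by the results, the built-in torch.sigmoid and the numpy implementation behave the same. The small apparent differences come only from output formatting: the numpy program uses round to limit the number of decimal places (two for the gradients and weights here). Printed to four places, the values match the pytorch ones.

Note, however, that torch.sigmoid() requires its input to be a tensor.

3. Change the activation function from Sigmoid to Relu; observe, summarize and report.

After switching to relu: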

tensor([0.2000]) tensor([-0.4000]) tensor([0.5000]) tensor([0.6000]) tensor([0.1000]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000]) 
===== Round 1 =====
Hidden layer h1,h2: tensor([0.2500]) tensor([0.])
Predictions o1,o2: tensor([0.0250]) tensor([0.])
Loss: 0.023462500423192978
Gradients of w: -0.01 0.0 -0.01 0.0 -0.05 0.0 -0.0 0.0
Updated weights w:
tensor([0.2103]) tensor([-0.4000]) tensor([0.5062]) tensor([0.6000]) tensor([0.1513]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])
===== Round 5 =====
Hidden layer h1,h2: tensor([0.2924]) tensor([0.])
Predictions o1,o2: tensor([0.0855]) tensor([0.])
Loss: 0.012893404811620712
Gradients of w: -0.02 0.0 -0.01 0.0 -0.04 0.0 0.0 0.0
Updated weights w:
tensor([0.2834]) tensor([-0.4000]) tensor([0.5501]) tensor([0.6000]) tensor([0.3346]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])
===== Round 10 =====
Hidden layer h1,h2: tensor([0.3596]) tensor([0.])
Predictions o1,o2: tensor([0.1679]) tensor([0.])
Loss: 0.004378797020763159
Gradients of w: -0.01 0.0 -0.01 0.0 -0.02 0.0 0.0 0.0
Updated weights w:
tensor([0.3757]) tensor([-0.4000]) tensor([0.6054]) tensor([0.6000]) tensor([0.4892]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])

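After switching to relu, the loss falls much faster: by round 10 it is about 0.0044, compared with about 0.1108 for sigmoid at the same round. Note, though, that h2 and o2 are 0 in every round shown, so the gradients of w[1], w[3], w[5], w[6] and w[7] are 0 and those weights are never updated; only the path through h1 and o1 is being trained.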
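The zero gradients in the relu log can be reproduced by hand. The following is a minimal plain-Python sketch of the first round, not the code used for the run above (which is not shown in the post); the helper names and the choice of a zero derivative at z = 0 are my own assumptions.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Derivative of relu; taken as 0 at z == 0 (an assumed convention)
    return 1.0 if z > 0 else 0.0


x1, x2 = 0.5, 0.3
y1, y2 = 0.23, -0.07
w = [0.2, -0.4, 0.5, 0.6, 0.1, -0.5, -0.3, 0.8]  # w1..w8

# Forward pass
in_h1 = w[0] * x1 + w[2] * x2
in_h2 = w[1] * x1 + w[3] * x2
h1, h2 = relu(in_h1), relu(in_h2)
in_o1 = w[4] * h1 + w[6] * h2
in_o2 = w[5] * h1 + w[7] * h2
o1, o2 = relu(in_o1), relu(in_o2)

# Error signals at the output layer for the 1/2 squared-error loss
delta_o1 = (o1 - y1) * relu_grad(in_o1)
delta_o2 = (o2 - y2) * relu_grad(in_o2)

# Output-layer weight gradients
grad_w5 = delta_o1 * h1
grad_w6 = delta_o2 * h1
grad_w7 = delta_o1 * h2
grad_w8 = delta_o2 * h2

print("h1, h2:", h1, h2)
print("o1, o2:", o1, o2)
print("grad w5..w8:", grad_w5, grad_w6, grad_w7, grad_w8)
```

Because in_h2 and in_o2 are negative, relu clamps h2 and o2 to 0 and blocks the gradient through them, which is why only w5 among the output-layer weights receives a nonzero gradient here.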

4. Replace the MSE loss with PyTorch's built-in t.nn.MSELoss(); observe, summarize and report.
Results after the change:

===== Round 999 =====
Forward pass: o1, o2
tensor([0.2298]) tensor([0.0050])
Loss (mean squared error): 0.005628135986626148
	grad W:  -0.0 -0.0 -0.0 -0.0 -0.0 0.0 -0.0 0.0
Updated weights
tensor([1.8441]) tensor([0.3147]) tensor([1.4865]) tensor([1.0288]) tensor([-0.7469]) tensor([-4.6932]) tensor([-0.9992]) tensor([-2.5217])

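Compared with question 2, the reported loss is larger (about 0.0056 versus 0.0032), which at first looks like slower convergence. However, the reported value matches (o1 - y1)^2 + (o2 - y2)^2 for the printed outputs, that is, the squared error without the 1/2 factor of the hand-written loss, so the two numbers are not directly comparable. Measured with the 1/2 factor, the loss here is about 0.0028, and o2 is closer to its target (0.0050 versus 0.0098), so the MSELoss run has not actually converged more slowly.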
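The effect of the loss scale can be checked directly from the printed outputs. The sketch below is an illustration in plain Python, not the author's code: `half_sse` is the hand-written loss from the programs above, while `sse` (no 1/2 factor) is only my assumption about how the MSELoss variant was assembled, chosen because it reproduces the reported value.

```python
def half_sse(outputs, targets):
    """Hand-written loss from the programs above: 1/2 * sum of squared errors."""
    return sum(0.5 * (o - t) ** 2 for o, t in zip(outputs, targets))


def sse(outputs, targets):
    """Sum of squared errors without the 1/2 factor (assumed form of the MSELoss run)."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


targets = (0.23, -0.07)
outputs_q2 = (0.2296, 0.0098)  # printed outputs, question 2, round 1000
outputs_q4 = (0.2298, 0.0050)  # printed outputs, question 4, round 999

print("question 2, 1/2 scale:", half_sse(outputs_q2, targets))
print("question 4, no 1/2   :", sse(outputs_q4, targets))
print("question 4, 1/2 scale:", half_sse(outputs_q4, targets))
```

Dropping the 1/2 factor doubles both the loss and its gradient, which acts like doubling the step size, so comparisons between the two runs are fairer when both are reported on the same scale.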

5. Change the loss function from MSE to cross-entropy; observe, summarize and report.

Training results after the change:

===== Round 999 =====
Forward pass: o1, o2
tensor([0.9929]) tensor([0.0072])
Loss (cross-entropy): -0.018253758549690247
	grad W:  -0.0 -0.0 -0.0 -0.0 -0.0 0.0 -0.0 0.0
Updated weights
tensor([2.2809]) tensor([0.6580]) tensor([1.7485]) tensor([1.2348]) tensor([3.8104]) tensor([-4.2013]) tensor([2.5933]) tensor([-2.0866])

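Now the loss is negative. This follows from the cross-entropy expression H = -sum_i y_i log p_i, whose terms are guaranteed to be non-negative only when every target y_i is a valid probability in [0, 1]. The predicted probabilities lie between 0 and 1, so they are not the cause; the target y2 = -0.07 is negative, which makes its term negative and can pull the total below zero. (The reported value is consistent with applying softmax to o1 and o2 and using y as soft targets.) Cross-entropy loss is mainly used for classification, and this task is not a classification problem.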
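The sign of the loss can be checked with a few lines of plain Python. This is a sketch under an assumption: the code for this variant is not shown in the post, so I assume the loss was computed as softmax over the two outputs followed by -sum(y_i * log_softmax_i), which reproduces the reported value. The function names are my own.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a short list of floats."""
    m = max(values)
    log_sum = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_sum for v in values]


def soft_target_cross_entropy(outputs, targets):
    """-sum_i targets[i] * log_softmax(outputs)[i] (assumed form of the loss)."""
    return -sum(t * lp for t, lp in zip(targets, log_softmax(outputs)))


outputs = (0.9929, 0.0072)  # printed outputs at round 999

# Targets from the assignment: y2 is negative, so it is not a probability
loss_reported = soft_target_cross_entropy(outputs, (0.23, -0.07))

# Hypothetical valid probability targets, for contrast only
loss_valid = soft_target_cross_entropy(outputs, (0.23, 0.77))

print("assignment targets:", loss_reported)
print("valid probability targets:", loss_valid)
```

With targets that are valid probabilities, every term of the sum is non-negative, so the loss cannot go below zero; the negative value appears only because of the negative target.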

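6. Change the step size and the number of training rounds; observe, summarize and report.

Changing the step size and number of training rounds in the numpy program:

Step size 1, 10 training rounds:

Step size 10, 10 training rounds:

Step size 5, 10 training rounds:

From the plots, among the step sizes tried (1, 5 and 10), the larger the step size, the faster the loss falls.

Step size 10, 50 training rounds:

Step size 10, 100 training rounds:

The more training rounds, the smaller the loss. After a certain number of rounds, however, the loss falls very slowly and barely changes, so further training is not worthwhile.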
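The step-size comparison can be repeated without plots. The following is a minimal plain-Python sketch of the same sigmoid network and update rule as the numpy program, wrapped in a hypothetical `train(step, epochs)` helper of my own that returns the loss recorded at each round's forward pass.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(step, epochs, x=(0.5, 0.3), y=(0.23, -0.07)):
    """Plain gradient descent on the 2-2-2 sigmoid network; returns the loss per round."""
    w1, w2, w3, w4, w5, w6, w7, w8 = 0.2, -0.4, 0.5, 0.6, 0.1, -0.5, -0.3, 0.8
    history = []
    for _ in range(epochs):
        # Forward pass
        h1 = sigmoid(w1 * x[0] + w3 * x[1])
        h2 = sigmoid(w2 * x[0] + w4 * x[1])
        o1 = sigmoid(w5 * h1 + w7 * h2)
        o2 = sigmoid(w6 * h1 + w8 * h2)
        history.append(0.5 * (o1 - y[0]) ** 2 + 0.5 * (o2 - y[1]) ** 2)
        # Backward pass: error signals at the output and hidden layers
        delta_o1 = (o1 - y[0]) * o1 * (1 - o1)
        delta_o2 = (o2 - y[1]) * o2 * (1 - o2)
        delta_h1 = (w5 * delta_o1 + w6 * delta_o2) * h1 * (1 - h1)
        delta_h2 = (w7 * delta_o1 + w8 * delta_o2) * h2 * (1 - h2)
        # Gradient-descent update (all deltas were computed with the old weights)
        w1 -= step * delta_h1 * x[0]
        w2 -= step * delta_h2 * x[0]
        w3 -= step * delta_h1 * x[1]
        w4 -= step * delta_h2 * x[1]
        w5 -= step * delta_o1 * h1
        w6 -= step * delta_o2 * h1
        w7 -= step * delta_o1 * h2
        w8 -= step * delta_o2 * h2
    return history


for step in (1, 5, 10):
    print("step", step, "loss at round 10:", round(train(step, 10)[-1], 5))
```

Calling `train(10, 50)` and `train(10, 100)` gives the longer runs. A larger step helps on this small problem, but that is not a general rule: a step that is too large can make the loss oscillate or diverge.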

7. Replace the initial values of weights w1-w8 with random numbers, compare with the "specified weights" results; observe, summarize and report.

Specified:

Weights w0-w7:
tensor([0.2000]) tensor([-0.4000]) tensor([0.5000]) tensor([0.6000]) tensor([0.1000]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])
===== Round 1 =====
Hidden layer h1,h2: tensor([0.2500]) tensor([0.])
Predictions o1,o2: tensor([0.0250]) tensor([0.])
Loss: 0.023462500423192978
Gradients of w: -0.01 0.0 -0.01 0.0 -0.05 0.0 -0.0 0.0
Updated weights w:
tensor([0.2103]) tensor([-0.4000]) tensor([0.5062]) tensor([0.6000]) tensor([0.1513]) tensor([-0.5000]) tensor([-0.3000]) tensor([0.8000])

Random:

Weights w0-w7:
tensor([[-0.6710]]) tensor([[0.1085]]) tensor([[0.6052]]) tensor([[0.2582]]) tensor([[0.7214]]) tensor([[-1.4928]]) tensor([[0.0076]]) tensor([[-0.8092]])
===== Round 1 =====
Hidden layer h1,h2: tensor([[0.]]) tensor([[0.1317]])
Predictions o1,o2: tensor([[0.0010]]) tensor([[0.]])
Loss: 0.028670920059084892
Gradients of w: 0.0 -0.0 0.0 -0.0 -0.0 0.0 -0.03 0.0
Updated weights w:
tensor([[-0.6710]]) tensor([[0.1094]]) tensor([[0.6052]]) tensor([[0.2587]]) tensor([[0.7214]]) tensor([[-1.4928]]) tensor([[0.0377]]) tensor([[-0.8092]])

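Changing the initial values changes how fast the weights update, as the gradients show. Both runs here use the relu version. With this random draw h1 is 0 instead of h2, so only w[1], w[3] and w[6] change in the first round, and the largest gradient is about -0.03 compared with -0.05 for the specified weights.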

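8. Set the initial values of weights w1-w8 to 0; observe, summarize and report.

All gradients become zero, so the weights never change and training is pointless. The outputs of 0 show that this run uses relu: with all-zero weights every hidden activation is 0, which zeroes the output-layer gradients, and the zero output weights block any gradient from reaching the hidden layer.

===== Round 10 =====
Hidden layer h1,h2: tensor([0.]) tensor([0.])
Predictions o1,o2: tensor([0.]) tensor([0.])
Loss: 0.02890000119805336
Gradients of w: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Updated weights w:
tensor([0.]) tensor([0.]) tensor([0.]) tensor([0.]) tensor([0.]) tensor([0.]) tensor([0.]) tensor([0.])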
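How zero initialization interacts with the activation function can be illustrated by computing two first-round gradients by hand. This plain-Python sketch is my own illustration, not the code behind the log above; `first_step_grads` is a hypothetical helper, and the relu derivative is taken as 0 at 0 by assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def first_step_grads(act, act_grad, x=(0.5, 0.3), y=(0.23, -0.07)):
    """Gradients of the 1/2 squared-error loss w.r.t. w1 and w5 when all weights are 0.

    act_grad takes the activation's OUTPUT and returns the derivative at that point.
    """
    w = [0.0] * 8
    h1 = act(w[0] * x[0] + w[2] * x[1])
    h2 = act(w[1] * x[0] + w[3] * x[1])
    o1 = act(w[4] * h1 + w[6] * h2)
    o2 = act(w[5] * h1 + w[7] * h2)
    delta_o1 = (o1 - y[0]) * act_grad(o1)
    delta_o2 = (o2 - y[1]) * act_grad(o2)
    grad_w5 = delta_o1 * h1
    grad_w1 = (w[4] * delta_o1 + w[5] * delta_o2) * act_grad(h1) * x[0]
    return grad_w1, grad_w5


sig = first_step_grads(sigmoid, lambda a: a * (1 - a))
rel = first_step_grads(relu, lambda a: 1.0 if a > 0 else 0.0)
print("sigmoid: grad_w1, grad_w5 =", sig)
print("relu:    grad_w1, grad_w5 =", rel)
```

With sigmoid the hidden activations are 0.5 rather than 0, so the output-layer weights still receive a gradient in the first round even though the hidden-layer gradients are 0; with relu everything is 0 and nothing can move.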

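9. Summary:

1. My derivation differs in some details from what the teacher's program does, but the results are the same. See the boxed part of the d_w1 derivation.

2. Both the step size and the activation function affect how fast the model converges. A suitable step size reduces training time and speeds up convergence.

3. Writing the code yourself in numpy deepens your memory and understanding of the propagation process but takes more time. Using a framework saves a lot of time but does not help you understand the details. Both approaches are worth mastering: once the process is understood, a framework makes the program more concise.

Reference blog: https://blog.csdn.net/liuzi_hang/article/details/127116020?spm=1001.2014.3001.5502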
