HBU Neural Networks and Deep Learning, Assignment 4: Feedforward Neural Networks

ZodiAc7

Last modified: 2022-10-29 23:32:11

Tags: deep learning, neural networks, machine learning

First published: 2022-10-09 22:53:26

Article link: https://blog.csdn.net/m0_61227501/article/details/127226671

Preliminary notes

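These exercises are Exercises 4-2, 4-3, 4-7, 4-8, and 4-9 from pages 117 and 118 of the PDF e-book of Neural Networks and Deep Learning (pages 102 and 103 of the print edition). For the full problem statements, see NNDL Assignment 4.
This report draws in part on HBU-NNDL Assignment 4 by 不是蒋承翰.
My knowledge is limited and mistakes are likely; corrections are welcome.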

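Exercise 4-2

Design a feedforward neural network that solves the XOR problem. The network must have two hidden neurons and one output neuron, and must use ReLU as the activation function.

Figure: network structure for the XOR operation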

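In my experiments the network could not be trained with ReLU as the activation function, for the reason discussed in Exercise 4-3. Training with Leaky ReLU also ran into problems (including, but not limited to, the -0 problem), so ELU is used as the activation function for training here.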
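To make the difference between the three activations concrete, the sketch below compares their derivatives at a few inputs. It is only an illustration: the function names are made up for this example, and the slope 0.01 and alpha = 1.0 are assumed to match the defaults of torch.nn.LeakyReLU and torch.nn.ELU. The point is that ReLU passes no gradient for negative pre-activations, while ELU always passes a nonzero one.

```python
import math

def relu_grad(x):
    # Derivative of max(0, x): exactly zero for non-positive inputs
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, negative_slope=0.01):
    # Small constant slope for non-positive inputs (assumed default 0.01)
    return 1.0 if x > 0 else negative_slope

def elu_grad(x, alpha=1.0):
    # ELU(x) = x for x > 0, alpha * (exp(x) - 1) otherwise,
    # so the derivative for non-positive inputs is alpha * exp(x)
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in (-2.0, -0.5, 0.5):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  "
          f"leaky'={leaky_relu_grad(x):.3f}  elu'={elu_grad(x):.3f}")
```

With only two hidden units, a unit whose pre-activation is negative for all four XOR inputs receives no gradient under ReLU, which is one plausible reason training stalls.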

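Note: the ReLU and Leaky ReLU training problems described above can be resolved by following "Julia: Solving the XOR Problem with a Multilayer Perceptron" by 强劲九; this post continues to use ELU.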
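Independently of whether gradient descent finds it, a ReLU network with two hidden neurons that computes XOR can be written down by hand. The sketch below uses a well-known textbook construction rather than anything from the original post; the weight values and the name xor_relu are assumptions made for illustration.

```python
def relu(z):
    return max(0.0, z)

# Hand-picked parameters (not learned): both hidden units see x1 + x2,
# and the second one only switches on when both inputs are 1.
W1 = [[1.0, 1.0], [1.0, 1.0]]  # one row per hidden neuron
b1 = [0.0, -1.0]
w2 = [1.0, -2.0]               # output weights
b2 = 0.0

def xor_relu(x1, x2):
    hidden = [relu(W1[i][0] * x1 + W1[i][1] * x2 + b1[i]) for i in range(2)]
    # ReLU on the output as well; it leaves the values 0 and 1 unchanged
    return relu(w2[0] * hidden[0] + w2[1] * hidden[1] + b2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), xor_relu(a, b))
```

The output is h1 - 2*h2 with h1 = relu(x1 + x2) and h2 = relu(x1 + x2 - 1), which gives 0, 1, 1, 0 on the four inputs. So the architecture required by the exercise can represent XOR; the difficulty lies in training it from a random initialization.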

The code is as follows:

import numpy as np
import torch
from torch.nn.init import constant_, normal_

# XOR truth table: the four 2-bit inputs and their target outputs
input_x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(torch.float32)
real_y = torch.tensor([[0], [1], [1], [0]]).to(torch.float32)

class XOR_module(torch.nn.Module):
    def __init__(self, input_size=2, output_size=1, mean_init=0., std_init=1., b_init=0.0):
        super().__init__()
        # Hidden layer with two neurons: normal-initialized weights, constant bias
        self.fc1 = torch.nn.Linear(input_size, 2)
        normal_(tensor=self.fc1.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc1.bias, val=b_init)
        # Output layer with a single neuron, initialized the same way
        self.fc2 = torch.nn.Linear(2, output_size)
        normal_(tensor=self.fc2.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc2.bias, val=b_init)
        # ELU activation, defined with torch.nn.ELU
        self.act = torch.nn.ELU()

    # Forward pass: ELU is applied after both the hidden and the output layer
    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        return outputs

net = XOR_module()
learning_rate = 0.1
epochs = 10000
loss_function = torch.nn.MSELoss()  # cross-entropy loss raised a dimension error here
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)

# Training loop
for epoch in range(epochs):
    out_y = net(input_x)
    loss = loss_function(out_y, real_y)  # compute the loss
    loss.backward()  # backpropagation
    optimizer.step()  # update the parameters
    optimizer.zero_grad()  # reset gradients so they do not accumulate

# Print the learned weights and biases
print('w1 = ', net.fc1.weight.detach().numpy())
print('b1 = ', net.fc1.bias.detach().numpy())
print('w2 = ', net.fc2.weight.detach().numpy())
print('b2 = ', net.fc2.bias.detach().numpy())

# Test on the four XOR inputs; outputs are rounded to the nearest integer
input_test = input_x
out_test = net(input_test)
print('input_x', input_test.detach().numpy())
print('out_y', np.around(out_test.detach().numpy()))

Output:

w1 =  [[ 1.650061   -1.1503348 ]
 [-1.5363238   0.98035395]]
b1 =  [-0.2242281  -0.18534027]
w2 =  [[1.3331912 1.7923447]]
b2 =  [0.5710211]
input_x [[0. 0.]
 [0. 1.]
 [1. 0.]
 [1. 1.]]
out_y [[0.]
 [1.]
 [1.]
 [0.]]
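As a sanity check, the printed parameters can be plugged into a manual forward pass. The sketch below is plain Python with no PyTorch dependency. The numbers are copied from the output above, the helper names elu and forward are made up for this example, and alpha = 1.0 is assumed to match the default of torch.nn.ELU. Each row of W1 holds the weights of one hidden neuron, following the (out_features, in_features) layout of torch.nn.Linear.

```python
import math

def elu(z, alpha=1.0):
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

# Parameters copied from the printed training result
W1 = [[1.650061, -1.1503348], [-1.5363238, 0.98035395]]
B1 = [-0.2242281, -0.18534027]
W2 = [1.3331912, 1.7923447]
B2 = 0.5710211

def forward(x):
    # Hidden layer: ELU(W1 x + B1), one value per hidden neuron
    hidden = [elu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    # Output layer: ELU(W2 . hidden + B2)
    return elu(sum(w * h for w, h in zip(W2, hidden)) + B2)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
raw_outputs = [forward(x) for x in inputs]
for x, y in zip(inputs, raw_outputs):
    print(x, f"{y:.4f}")
```

The raw outputs land very close to 0, 1, 1, 0, so rounding them gives the XOR truth table shown above.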