Experiment 4: Feed-Forward Networks for Natural Language Processing

Latest recommended article published on 2024-09-17 20:39:46

Root brother

Article tags: natural language processing, artificial intelligence

Article link: https://blog.csdn.net/hsidkxkalls/article/details/140058328

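A simple perceptron can solve binary classification problems, but it falls short on linearly inseparable problems and on multi-dimensional data. In this experiment we study feed-forward neural networks, namely the multilayer perceptron and the convolutional neural network.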

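1. Feed-Forward Neural Networks

1.1 Multilayer Perceptron

Compared with a simple perceptron, the multilayer perceptron (MLP) has greater computational power because it introduces hidden layers to handle non-linear problems. Neurons in the hidden layers can learn more complex feature representations, which improves classification accuracy. An MLP is trained with the backpropagation algorithm, which repeatedly adjusts the weights and biases so that the network gradually approximates the target function.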

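1.1.1 Structure of the Multilayer Perceptron

A multilayer perceptron consists of three kinds of layers: an input layer, one or more hidden layers, and an output layer. Each layer is made up of several neurons, and each neuron is connected to every neuron in the previous layer. The input layer receives the data; each following layer forms a linear combination of its inputs using weights and a bias and passes the result through an activation function; the output layer produces the final result.

The simplest MLP has three stages of representation and two linear layers. The first stage is the input vector, that is, the vector given to the model. From the input vector, the first linear layer computes a hidden vector, the second stage of representation. The hidden vector is so called because it is the output of a layer that sits between the input and the output. From this hidden vector, the second linear layer computes the output vector.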

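1.1.2 How the Multilayer Perceptron Works

Input propagation: the input data passes from the input layer to the hidden layer. Each neuron computes a weighted sum of its inputs and applies a non-linear activation function (such as Sigmoid or ReLU) to produce the hidden layer's output.

Hidden-layer propagation: the output of a hidden layer is passed on to the next hidden layer, and so on, until it reaches the output layer. The output of each layer serves as the input of the next, where it is again processed by weights and an activation function.

Output generation: the output layer receives the output of the last hidden layer and applies a suitable activation function (for example Softmax for classification, or the identity function for regression) to produce the final prediction.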
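
The three steps above can be sketched in a few lines of NumPy. This is only an illustration, not the model used later in the experiment: the layer sizes, the random weights, and the helper names `relu`, `softmax`, and `mlp_forward` are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mlp_forward(x, layers):
    """Forward pass: ReLU after every hidden layer, Softmax at the output.

    layers is a list of (W, b) pairs; W has shape (in_dim, out_dim).
    """
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)          # weighted sum, then non-linearity
    W_out, b_out = layers[-1]
    return softmax(h @ W_out + b_out)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                 # hypothetical: 4 inputs, two hidden layers, 3 classes
layers = [(rng.normal(size=(i, o)), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(2, 4))          # a batch of 2 input vectors
probs = mlp_forward(x, layers)
print(probs.shape)                   # (2, 3)
```

Each row of `probs` is a probability distribution over the three assumed classes, so it sums to 1.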

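1.1.3 Advantages of the Multilayer Perceptron

By stacking hidden layers that extract and abstract features, the MLP gains expressive power and learning capacity. It adapts to many data types and problems, including classification and regression, and with enough data and suitable hyperparameters it can generalize well, which makes it a powerful tool for practical problems.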

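1.2 Convolutional Neural Network

The convolutional neural network (CNN) is a deep learning model that is particularly suited to data with a grid-like structure, such as images and audio.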

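1.2.1 Convolution

Defining the kernel: a convolution needs a kernel (also called a filter), a small matrix of weight parameters.

Sliding window: the kernel is placed at an initial position on the input and then slid across it horizontally and vertically with a given stride. At each position the kernel covers a window of the input that has the same size as the kernel.

Dot product: at each position, every kernel element is multiplied by the input element it covers, and the products are summed into a single value.

Output feature map: the values obtained at all window positions are arranged in a new matrix, which is the output feature map.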
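
The four steps above can be written out directly as a loop. The sketch below is a minimal NumPy version without padding; the function name `conv2d` and the toy input and kernel are assumptions made for illustration. Like the convolution layers in deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # top-left corner of the window
            window = image[r:r + kh, c:c + kw]
            # element-wise product followed by a sum: one output value
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # toy 2x2 kernel
fmap = conv2d(image, kernel)
print(fmap.shape)                                  # (3, 3)
print(conv2d(image, kernel, stride=2).shape)       # (2, 2)
```

With this kernel every output value is the top-left element of the window minus its bottom-right element, which is -5 everywhere for this input.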

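1.2.2 Structure of a Convolutional Neural Network

Input layer: receives the raw data and converts it into a form suitable for convolution, usually a multi-dimensional array.

Convolutional layer: extracts features from the input. A convolutional layer contains several kernels; each kernel is convolved with the input to produce one feature map.

Pooling layer: downsamples the feature maps, which reduces their size and the number of parameters in later layers. Max pooling and average pooling are the most common pooling operations.

Fully connected layer: flattens the feature maps produced by the pooling layer into a one-dimensional vector and uses it for tasks such as classification or regression.

Output layer: applies an activation function suited to the task to the output of the fully connected layer, for example Sigmoid for binary classification or Softmax for multi-class classification.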
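
To see how the layers fit together, it helps to trace how the data size changes from layer to layer. The sketch below uses the standard output-size formula for convolution and pooling; the 28x28 input, the kernel sizes, and the number of kernels are hypothetical values chosen for the example, and `conv_output_size` is an illustrative helper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # floor((size + 2 * padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                           # assumed 28x28 single-channel input
size = conv_output_size(size, kernel=5)             # 5x5 convolution        -> 24
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 pooling, stride 2  -> 12
channels = 16                                       # assumed number of kernels
flat_features = channels * size * size              # length of the flattened vector
print(size, flat_features)                          # 12 2304
```

The value of `flat_features` is the input size that the first fully connected layer would need in this hypothetical network.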

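2. Classifying Surnames by Country of Origin with a Multilayer Perceptron

We implement the MLP described above with two PyTorch Linear modules. The Linear objects are named fc1 and fc2, following the common convention of calling a Linear module a "fully connected layer", or "fc layer" for short. Besides the two Linear layers there is a rectified linear unit (ReLU) non-linearity (introduced in the "Activation Functions" section of Experiment 3), which is applied to the output of the first Linear layer before it is fed to the second. Because the layers are sequential, the number of outputs of one layer must equal the number of inputs of the next. The non-linearity between the two Linear layers is essential: without it, the two layers are mathematically equivalent to a single Linear layer and therefore cannot model complex patterns. The MLP implementation defines only the forward pass, because PyTorch derives the backward pass and the gradients needed for the updates automatically from the model definition and the forward implementation.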
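
The claim that two stacked linear layers collapse into one can be checked numerically. The sketch below uses NumPy with random weights; the shapes and variable names are assumptions made for the example. It relies on the identity W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)   # first layer: 4 -> 5
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)   # second layer: 5 -> 3
x = rng.normal(size=4)

stacked = W2 @ (W1 @ x + b1) + b2      # two linear layers, no activation between them
W, b = W2 @ W1, W2 @ b1 + b2           # weights and bias of the equivalent single layer
collapsed = W @ x + b

print(np.allclose(stacked, collapsed)) # True
```

Once a ReLU is placed between the two layers, this algebraic simplification is no longer possible, which is why the non-linearity adds modelling capacity.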

2.1 Code Implementation

import torch.nn as nn
import torch.nn.functional as F


class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the MLP

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate = F.relu(self.fc1(x_in))
        # F.dropout is active by default, even in eval mode, so tie it to
        # the module's train/eval state explicitly
        output = self.fc2(F.dropout(intermediate, p=0.5, training=self.training))

        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output

The training loop:

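classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # reset the running statistics and enable dropout for this epoch;
        # batch_generator must be a fresh iterable over the training split
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()

            # track accuracy (in percent) as a running mean over batches
            correct = y_pred.argmax(dim=1) == batch_dict['y_nationality']
            acc_t = correct.float().mean().item() * 100
            running_acc += (acc_t - running_acc) / (batch_index + 1)

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")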
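
The update `running_loss += (loss_t - running_loss) / (batch_index + 1)` in the loop above is an incremental mean: after each batch, `running_loss` equals the average of all batch losses seen so far, without storing them. A small pure-Python check, using made-up loss values:

```python
losses = [0.9, 0.7, 0.4, 0.6]   # made-up per-batch loss values

running_loss = 0.0
for batch_index, loss_t in enumerate(losses):
    # same update rule as in the training loop
    running_loss += (loss_t - running_loss) / (batch_index + 1)

mean_loss = sum(losses) / len(losses)
print(abs(running_loss - mean_loss) < 1e-12)   # True
```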

Loss and accuracy on the test set:

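classifier.load_state_dict(torch.load(train_state['model_filename']))
classifier = classifier.to(args.device)

# batch_generator must now iterate over the test split
running_loss = 0.0
running_acc = 0.0
classifier.eval()  # disable dropout for evaluation

with torch.no_grad():  # no gradients are needed at test time
    for batch_index, batch_dict in enumerate(batch_generator):
        # compute the output
        y_pred = classifier(batch_dict['x_surname'])

        # compute the loss
        loss = loss_func(y_pred, batch_dict['y_nationality'])
        loss_t = loss.item()
        running_loss += (loss_t - running_loss) / (batch_index + 1)

        # compute the accuracy (in percent)
        correct = y_pred.argmax(dim=1) == batch_dict['y_nationality']
        acc_t = correct.float().mean().item() * 100
        running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))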

Some key helper functions:

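import os

import numpy as np
import torch


def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            # read when the saved model is reloaded for testing;
            # args.model_state_file is assumed to hold the checkpoint path
            'model_filename': args.model_state_file}

def set_seed_everywhere(seed, cuda):
    # seed NumPy and PyTorch (CPU, and all GPUs if requested)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)

def handle_dirs(dirpath):
    # create the directory, including parents, if it does not exist yet
    os.makedirs(dirpath, exist_ok=True)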