Natural Language Processing: Surname Classification with Feedforward Networks

Contents

1. Deep Feedforward Networks

1.1 What Is a Feedforward Network

1.2 Introduction to the Multilayer Perceptron

1.3 How the Multilayer Perceptron Works

1.4 Applications of Deep Feedforward Networks

2. Convolutional Neural Networks

2.1 Overview of Convolutional Neural Networks

2.2 How Convolutional Neural Networks Work

2.2.1 Convolutional Layer

2.2.2 Pooling Layer

2.2.3 Fully Connected Layer

3. Experiment Overview

3.1 Objectives

3.2 Environment

4. Implementation

4.1 Data Preprocessing

4.2 Building the Multilayer Perceptron Model

4.3 Training the Model

4.4 Prediction

5. Summary


1. Deep Feedforward Networks

1.1 What Is a Feedforward Network

A deep feedforward network (Deep Feedforward Network), also known as a feedforward neural network or multilayer perceptron (Multilayer Perceptron, MLP), is one of the most fundamental and classic models in deep learning. It is a feedforward network with multiple hidden layers that learns representations and features of the data through a stack of nonlinear transformations, which can then be used for a wide range of machine learning tasks.

In a deep feedforward network, information flows from the input layer to the output layer. Each layer applies a linear combination of its inputs, defined by a set of weights and biases, followed by a nonlinear transformation. The connections between layers are strictly forward: information can only pass from one layer to the next, with no feedback connections. This structure makes the network a directed acyclic graph (DAG) whose output can be computed with a simple forward-propagation pass.

Each node in the network is called a neuron. A neuron receives the outputs of the neurons in the previous layer and applies an activation function, which introduces the nonlinearity that allows the network to learn more complex patterns and representations. Common activation functions include Sigmoid, ReLU, and Tanh.
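
As a quick illustration (the input values below are arbitrary and chosen only for demonstration), the three activation functions mentioned above can be compared side by side in PyTorch:

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])   # arbitrary example inputs

print(torch.sigmoid(x))  # squashes values into (0, 1)
print(torch.relu(x))     # zeroes out negatives, keeps positives unchanged
print(torch.tanh(x))     # squashes values into (-1, 1)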

The hidden layers can be viewed as a sequence of nonlinear mappings that extract features from the input. As the number of hidden layers grows, the network can learn increasingly complex and abstract representations, which improves its expressive power. This depth lets the network gradually build richer, higher-level representations of the input and helps it tackle harder machine learning problems.

Deep feedforward networks are usually trained with the backpropagation algorithm, which is based on gradient descent: the error between the output and the true label is computed, the error signal is propagated backwards through the network, and the weights and biases are updated accordingly. This process is repeated until a convergence criterion is met.
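
A minimal sketch of what one such training step looks like in PyTorch is shown below; the layer sizes and random data are invented purely for illustration and are not part of the surname experiment:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))  # toy network
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)            # a fake mini-batch of 16 samples
y = torch.randint(0, 3, (16,))    # fake class labels

optimizer.zero_grad()             # 1. clear old gradients
loss = loss_fn(model(x), y)       # 2. forward pass and loss computation
loss.backward()                   # 3. backpropagate the error signal
optimizer.step()                  # 4. update weights and biases by gradient descent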

Deep feedforward networks have been very successful across many machine learning tasks, especially in image classification, speech recognition, and natural language processing. Trained on large datasets, they automatically learn high-level feature representations of the input and achieve strong performance in pattern recognition and prediction.

In short, a deep feedforward network is a feedforward model built from stacked nonlinear transformations, whose hidden layers progressively construct complex feature representations. It is trained with backpropagation and has delivered notable results on many machine learning tasks; its development has driven the rapid progress of deep learning and had a profound impact on artificial intelligence.

The feedforward neural network (FNN) is the simplest neural network architecture: information flows in a single direction, from the input layer to the output layer, with no feedback loops. The network consists of neurons organized into layers, namely an input layer, one or more hidden layers, and an output layer.

1.2 Introduction to the Multilayer Perceptron

The multilayer perceptron (MLP), sometimes referred to more broadly as an artificial neural network (ANN), can have any number of hidden layers between its input and output layers; the simplest MLP has a single hidden layer, i.e. a three-layer structure.

More precisely, a multilayer perceptron is built from multiple layers of perceptron-like artificial neurons, and its layer structure forms a directed acyclic graph. Typically each layer is fully connected to the next: the output of every neuron in one layer becomes an input to the neurons in the following layer. An MLP therefore has at least three layers of neurons: an input layer, a hidden layer, and an output layer.

1.3 How the Multilayer Perceptron Works

Input layer: The input layer receives the feature vector as input; each input feature corresponds to one input neuron. The input layer performs no computation of its own and simply passes the features on to the first hidden layer.

Hidden layers: The hidden layers are the core of the MLP and are responsible for learning the nonlinear mapping of the input data. Each hidden layer contains multiple neurons; every neuron takes the previous layer's outputs, computes a weighted sum, adds a bias, applies an activation function, and passes its output on to the next layer.

Output layer: The output layer takes the output of the last hidden layer and produces the final result, e.g. a classification or regression prediction. The number of output neurons depends on the task: a binary classification problem usually uses a single output neuron, while a multi-class problem uses one neuron per class.

Activation functions: The neurons in the hidden and output layers usually apply an activation function to introduce nonlinearity. Common choices include ReLU (Rectified Linear Unit), Sigmoid, and Tanh; they increase the network's expressive power and allow it to learn complex nonlinear relationships.

Backpropagation: The MLP is trained with the backpropagation algorithm, which is based on gradient descent. It computes the partial derivatives (gradients) of the loss function with respect to the network parameters and updates the parameters along the negative gradient so that the network's predictions move closer to the true labels.

Forward propagation: After training, at inference time the input data is passed forward through the network, through the hidden layers and the output layer, to produce the prediction.
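
The forward pass described above can be sketched in a few lines of PyTorch; the dimensions here are made up for illustration only:

import torch
import torch.nn as nn
import torch.nn.functional as F

input_dim, hidden_dim, output_dim = 10, 16, 3
fc1 = nn.Linear(input_dim, hidden_dim)   # hidden layer: weighted sum + bias
fc2 = nn.Linear(hidden_dim, output_dim)  # output layer: one score per class

x = torch.randn(2, input_dim)            # the input layer just passes features on
hidden = F.relu(fc1(x))                  # nonlinear activation in the hidden layer
logits = fc2(hidden)
probs = F.softmax(logits, dim=1)         # class probabilities for a classification task

print(hidden.shape, logits.shape, probs.shape)   # (2, 16) (2, 3) (2, 3)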

1.4 Applications of Deep Feedforward Networks

Image recognition and computer vision: Deep feedforward networks perform well in image classification, object detection, face recognition, and related tasks. Training deep networks makes it possible to classify and recognize complex images efficiently.

Speech recognition: Deep feedforward networks are widely used in speech recognition systems to convert audio signals into text. Deep learning models have significantly improved speech recognition accuracy.

Natural language processing: Deep feedforward networks are widely applied to NLP tasks such as text classification, sentiment analysis, and machine translation. A trained deep network can extract the key features of a text and use them to solve the task.

Recommender systems: Based on a user's historical behaviour and preferences, deep feedforward networks can predict the items or content the user is likely to enjoy, enabling personalized recommendation. Deep learning has improved both the accuracy and the efficiency of recommender systems.

Finance: In finance, deep feedforward networks are used for stock price prediction, risk management, fraud detection, and similar tasks, helping uncover patterns and trends in financial data.

Medical diagnosis: In healthcare, deep feedforward networks are applied to medical image analysis, pathological diagnosis, and disease risk assessment, and are expected to improve the accuracy of disease prediction and diagnosis.

2. Convolutional Neural Networks

2.1 Overview of Convolutional Neural Networks

A convolutional neural network (CNN) is a class of deep feedforward networks that includes convolution operations. CNNs were inspired by the biological mechanism of the receptive field and are designed specifically for data with a grid-like structure, for example time-series data (a one-dimensional grid of samples taken at regular intervals) and image data (a two-dimensional grid of pixels).

2.2 How Convolutional Neural Networks Work

A typical CNN is built from three kinds of layers:

Convolutional layers: extract local features from the image;
Pooling layers: drastically reduce the number of parameters (dimensionality reduction);
Fully connected layers: the part that resembles a traditional neural network and produces the final output.

2.2.1 Convolutional Layer

Convolution can be understood as sliding a filter (convolution kernel) over small regions of the image to obtain a feature value for each region. In practice there are usually several kernels, and each kernel can be thought of as representing one image pattern: if an image patch produces a large value when convolved with a kernel, that patch is considered very similar to the pattern the kernel encodes. If we design 6 kernels, we are effectively saying that the image can be described in terms of 6 basic low-level texture patterns. The convolutional layer thus extracts local features from the image, much like feature extraction in human vision.
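
The "6 kernels = 6 basic patterns" idea can be made concrete with a small sketch; the image size and kernel size below are arbitrary examples:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, padding=1)  # 6 kernels
image = torch.randn(1, 1, 28, 28)   # (batch, channels, height, width), fake image

feature_maps = conv(image)
print(feature_maps.shape)           # torch.Size([1, 6, 28, 28]): one feature map per kernel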

2.2.2 Pooling Layer

The pooling layer is essentially down-sampling and can greatly reduce the dimensionality of the data. It is needed because the feature maps are still large after convolution (the kernels are usually small), so down-sampling is applied to shrink them. A pooling function is really a statistic over a local region, e.g. max pooling, average pooling, or sum pooling.

Pooling also provides a degree of invariance to local transformations of the input: if a local region of the input is slightly transformed (e.g. translated), the pooled output stays approximately the same. This local translation invariance is especially useful when we care about whether a feature is present rather than exactly where it appears. For example, when detecting faces we only care whether the image contains facial features, not whether the face sits in the top-left or bottom-right corner.
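
A minimal sketch of max pooling as down-sampling, assuming feature maps shaped like the output of the convolution example above:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # 2x2 window, stride 2
feature_maps = torch.randn(1, 6, 28, 28)       # e.g. the conv output from the sketch above

pooled = pool(feature_maps)
print(pooled.shape)                            # torch.Size([1, 6, 14, 14]): height and width halved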

2.2.3 Fully Connected Layer

The data processed by the convolutional and pooling layers is fed into the fully connected layer, which produces the final result. Only after the convolutional and pooling layers have reduced the dimensionality can the fully connected layer run efficiently; otherwise the amount of data would be too large and the computation too costly and slow. "Fully connected" means that every neuron in the previous layer is connected to every neuron in the next layer.

The purpose of the fully connected layer is to map the "distributed feature representation" learned by the earlier layers into the "sample label space"; the loss function then steers the learning process, and the layer finally outputs the class prediction.

Although the fully connected layer may look like an unremarkable final step in the architecture, because it "connects" everything it introduces a large number of redundant parameters. A poorly designed fully connected layer is therefore prone to overfitting, which can be mitigated with Dropout. Its very large parameter count also hurts efficiency; to address this, Yan Shuicheng's team proposed in the Network in Network (NIN) paper to replace the fully connected layer with Global Average Pooling (GAP).
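
To make the trade-off concrete, the rough sketch below (with made-up sizes) contrasts a fully connected head regularized with Dropout against a Global Average Pooling head in the spirit of Network in Network:

import torch
import torch.nn as nn

num_classes = 10
feature_maps = torch.randn(1, 64, 7, 7)     # fake output of the conv/pool stages

fc_head = nn.Sequential(
    nn.Flatten(),                           # 64 * 7 * 7 = 3136 inputs: many parameters
    nn.Dropout(p=0.5),                      # randomly drops units to reduce overfitting
    nn.Linear(64 * 7 * 7, num_classes),
)

gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),                # global average pooling: one value per channel
    nn.Flatten(),
    nn.Linear(64, num_classes),             # far fewer parameters than the FC head
)

print(fc_head(feature_maps).shape, gap_head(feature_maps).shape)   # both torch.Size([1, 10])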

3. Experiment Overview

3.1 Objectives

  1. Understand how feedforward networks are applied in NLP: the experiment gives a hands-on view of how a feedforward network handles text data, extracts features, and performs classification in a text classification task.

  2. Explore the surname classification problem: surname classification is an interesting and challenging NLP problem. The experiment shows how machine learning can classify surnames by nationality and how language differences surface across cultures.

  3. Learn about feature extraction and representation learning: the features learned by the hidden layer are crucial for classification. The experiment helps illustrate how a neural network learns and uses features in the data, and why representation learning matters for NLP tasks.

  4. Evaluate model performance: the experiment assesses how well a feedforward network performs on surname classification, using metrics such as accuracy, precision, and recall, and allows different models to be compared.

  5. Provide a basis for further work: understanding how the feedforward network behaves on surname classification lays a foundation for further research and applications, e.g. more complex NLP problems or practical scenarios such as name recognition and general text classification.

3.2 Environment

Python 3.7, with PyTorch, NumPy, pandas, and tqdm installed (see the imports in Section 4.1).

4. Implementation

4.1 Data Preprocessing

from argparse import Namespace
from collections import Counter
import json
import os
import string

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm_notebook


class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """

        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx

        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk
        self._unk_token = unk_token

        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token)

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.
        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]
        except KeyError:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
          or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
              for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)


class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""

    def __init__(self, surname_vocab, nationality_vocab):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname
        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)
        for token in surname:
            one_hot[vocab.lookup_token(token)] = 1

        return one_hot

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)

        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)

        return cls(surname_vocab, nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab = Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab)

    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}


class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from the dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        self.train_df = self.surname_df[self.surname_df.split == 'train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split == 'val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split == 'test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

        # Class weights
        class_counts = surname_df.nationality.value_counts().to_dict()

        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])

        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split == 'train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's:
                features (x_surname)
                label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_vector = \
            self._vectorizer.vectorize(row.surname)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size


def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
      ensure each tensor is on the right device.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict
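
Before building the model, a quick optional sanity check of the preprocessing code might look like the following sketch; it assumes the surnames_with_splits.csv file used later in Section 4.3 is available in the working directory:

# optional sanity check of the preprocessing pipeline (assumes the CSV is present)
dataset = SurnameDataset.load_dataset_and_make_vectorizer("surnames_with_splits.csv")
vectorizer = dataset.get_vectorizer()

sample = dataset[0]                                   # one vectorized training example
print(sample['x_surname'].shape)                      # collapsed one-hot over the character vocabulary
print(vectorizer.nationality_vocab.lookup_index(sample['y_nationality']))   # its nationality label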

4.2 Building the Multilayer Perceptron Model

class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """

    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor. 
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector


def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}


def update_train_state(args, model, train_state):
    """Handle the training state updates.
    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better
    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]

        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state


def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100


def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)


def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

4.3 Training the Model

args = Namespace(
    # Data and path information
    surname_csv="surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/surname_mlp",
    # Model hyper parameters
    hidden_dim=300,
    # Training  hyper parameters
    seed=1337,
    num_epochs=100,
    early_stopping_criteria=5,
    learning_rate=0.001,
    batch_size=64,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")

print("Using CUDA: {}".format(args.cuda))

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

if args.reload_from_files:
    # training from a checkpoint
    print("Reloading!")
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    print("Creating fresh!")
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)

vectorizer = dataset.get_vectorizer()
classifier = SurnameClassifier(input_dim=len(vectorizer.surname_vocab),
                               hidden_dim=args.hidden_dim,
                               output_dim=len(vectorizer.nationality_vocab))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on

        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.to("cpu").item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")

4.4 Prediction

# compute the loss & accuracy on the test set using the best available model

classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_surname'])

    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))
def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname

    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).view(1, -1)
    result = classifier(vectorized_surname, apply_softmax=True)

    probability_values, indices = result.max(dim=1)
    index = indices.item()

    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()

    return {'nationality': predicted_nationality, 'probability': probability_value}


new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))
# spot-check the nationality vocabulary by looking up an arbitrary index
print(vectorizer.nationality_vocab.lookup_index(8))


def predict_topk_nationality(name, classifier, vectorizer, k=5):
    vectorized_name = vectorizer.vectorize(name)
    vectorized_name = torch.tensor(vectorized_name).view(1, -1)
    prediction_vector = classifier(vectorized_name, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values.detach().numpy()[0]
    indices = indices.detach().numpy()[0]

    results = []
    for prob_value, index in zip(probability_values, indices):
        nationality = vectorizer.nationality_vocab.lookup_index(index)
        results.append({'nationality': nationality,
                        'probability': prob_value})

    return results


new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")

k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],

5. Summary

This section summarizes the experimental results of the feedforward network on the surname classification task, including how well the model trains, how well it generalizes, and how hyperparameter choices affect performance, and suggests directions for improvement such as changes to the model architecture or data augmentation.

Conclusion: the experiment shows that a feedforward network is reasonably effective for surname classification, while also revealing remaining problems and room for improvement, providing a reference for follow-up work in this area.

From this NLP project on surname classification with a feedforward network, I take away the following points:

  1. Data matters: data is the foundation of model training, so its quality and diversity are critical to model performance. For surname classification, collecting as many representative surnames as possible improves the model's ability to generalize.

  2. Feature representation is key: choosing an appropriate feature representation is essential for learning and can improve both accuracy and efficiency. In NLP, techniques such as word embeddings capture the semantic information of words effectively.

  3. Hyperparameter tuning: adjusting hyperparameters such as the learning rate and the number of hidden units is an important way to improve performance. Repeated experiments and tuning help find the best configuration.

  4. Evaluation matters: after training, the model should be assessed with appropriate metrics such as accuracy, precision, and recall. The evaluation results show how good the model is and guide further improvement and optimization.

  5. Keep learning and practicing: NLP evolves quickly, so continuously learning new techniques and methods is key. Working on real projects builds experience and sharpens the ability to solve practical problems.

References
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting.
 
