Natural Language Processing (NLP): Feed-Forward Networks

I. Lab Introduction

*1. Lab Content

In this lab we study the family of neural network models traditionally called feed-forward networks, focusing on two kinds: the multilayer perceptron (MLP) and the convolutional neural network (CNN). The multilayer perceptron extends the simple perceptron we studied in Lab 3 by grouping many perceptrons into a single layer and stacking multiple layers together. We introduce MLPs shortly and demonstrate their use in multiclass classification in "Example: Surname Classification with an MLP."

The second kind of feed-forward network, the convolutional neural network, is deeply inspired by windowed filters from digital signal processing. Through this windowing property, a CNN can learn localized patterns in its input, which makes it the workhorse of computer vision and also well suited to detecting substructures in sequential data, such as words and sentences. We give an overview of CNNs in "Convolutional Neural Networks" and show their application in "Example: Classifying Surnames with a CNN."

MLPs and CNNs are discussed together in this lab because both are feed-forward networks; they stand in contrast to recurrent neural networks (RNNs), which allow feedback (cyclic) connections so that each computation can use information from previous computations. Labs 6 and 7 cover RNNs and why cyclic connections benefit a network. A useful way to understand how these models work is to pay attention to the size and shape of the data tensors as they are computed: each type of neural network layer affects the size and shape of the tensors it computes, and understanding these effects is essential to understanding the models themselves.

*2. Key Points

* Work through "Example: Surname Classification with an MLP" to master the application of multilayer perceptrons to multiclass classification
* Understand the effect each type of neural network layer has on the size and shape of the data tensors it computes

*3. Environment

        Python 3.6.7

*4. Lab Directory

* Upload the data file required for this lab **(`surnames.csv`)** to the directory **`/data/surnames/`**.
* Complete example code:
        * exp4-In-Text-Examples.ipynb
        * exp4-munging_surname_dataset.ipynb
        * exp4-2D-Perceptron-MLP.ipynb
        * exp4_4_Classify_Surnames_CNN.ipynb
        * exp4_4_Classify_Surnames_MLP.ipynb

II. The Multilayer Perceptron (MLP)

A multilayer perceptron (MLP), sometimes also called an artificial neural network (ANN), consists of multiple layers of artificial neurons, each of which may be a perceptron. Its layer structure forms a directed acyclic graph: typically every layer is fully connected to the next, so the output of each neuron in one layer becomes an input to several neurons in the following layer. An MLP has at least three layers: an input layer, one or more hidden layers, and an output layer; the simplest MLP contains exactly one hidden layer, giving a three-layer structure. (Figure: MLP network structure.) The input layer consists of simple input neurons, each connected to at least one neuron in a hidden layer. The hidden layers represent latent variables: neither their inputs nor their outputs appear in the training set. The hidden layers are followed by the output layer.
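Concretely, for an MLP with a single hidden layer, the forward computation can be written as follows (a standard formulation given here for reference; the symbols below are introduced for illustration and do not appear in the lab code):

$$h = f(W_1 x + b_1), \qquad \hat{y} = W_2 h + b_2$$

where $x$ is the input vector, $W_1, b_1$ and $W_2, b_2$ are the weights and biases of the two fully connected layers, and $f$ is a nonlinear activation function such as ReLU.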

*1. A Simple Example: XOR

The classic XOR problem highlights the performance difference between the perceptron and the multilayer perceptron. Figure 4-3 shows the decision boundaries learned by a perceptron and by an MLP trained on a binary classification task with star and circle data points. In the left panel, the perceptron struggles to find a decision boundary that correctly separates the stars from the circles, so many points are misclassified. In the right panel, the MLP learns a decision boundary that separates the two classes much more accurately.

Note that although the figure appears to show the MLP with two decision boundaries, these are in fact a single hyperplane: the MLP's intermediate representation has warped the data space so that a single hyperplane can separate the points correctly. Figure 4-4 shows the intermediate representation computed by the MLP, where the shape of each point indicates its class (star or circle). By "warping" the data space in this way, the network can separate the classes with a single line as the data passes through the final layer.

By contrast, as the figure shows, the perceptron has no intermediate representation with which to reshape the data until it becomes linearly separable, so it cannot separate the stars from the circles effectively.

This comparison makes clear the advantage an MLP holds over a simple perceptron, especially on problems that are not linearly separable. By introducing additional layers and nonlinear activation functions, an MLP can learn more complex relationships in the data, improving both its representational capacity and its generalization.
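As a concrete illustration, the minimal sketch below trains a tiny MLP on the four XOR points. It is not part of the lab code; the layer sizes, learning rate, and number of steps are illustrative assumptions.

import torch
import torch.nn as nn

# The four XOR inputs and their labels; no single line can separate the two classes.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A tiny MLP: 2 -> 4 -> 1 with a ReLU nonlinearity between the Linear layers.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# After training, the rounded predictions should typically be 0, 1, 1, 0;
# a single Linear layer (a perceptron) cannot achieve this on XOR.
print(torch.sigmoid(model(X)).round().squeeze())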

*2. Implementing MLPs in PyTorch

This section shows how to implement an MLP in PyTorch. Compared with the simple perceptron, the MLP adds an extra layer of computation. In the implementation given in Example 4-1, we express this idea with two of PyTorch's Linear modules, named fc1 and fc2, following the common convention of calling Linear modules "fully connected layers," or "fc layers" for short. In addition to the two Linear layers, a rectified linear unit (ReLU) nonlinearity is applied to the output of the first Linear layer before it is passed to the second. Because the layers are applied in sequence, the number of outputs of one layer must equal the number of inputs of the next.

The nonlinearity between the two Linear layers is essential: without it, two Linear layers are mathematically equivalent to a single Linear layer and cannot capture complex patterns. Note that the MLP implementation only defines the forward pass; PyTorch derives the backward pass and gradient updates automatically from the model definition and the forward computation, which simplifies training.

import torch.nn as nn
import torch.nn.functional as F

# Definition of the multilayer perceptron model
class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(MultilayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)  # first fully connected layer
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # second fully connected layer

    def forward(self, x_in, apply_softmax=False):
        """
        The forward pass of the MLP
        Args:
            x_in (torch.Tensor): an input data tensor; x_in.shape should be (batch, input_dim)
            apply_softmax (bool): flag for the softmax activation; should be False when used with the cross-entropy loss
        Returns:
            the resulting tensor; tensor.shape should be (batch, output_dim)
        """
        intermediate = F.relu(self.fc1(x_in))  # apply the ReLU nonlinearity to the first layer's output
        output = self.fc2(intermediate)

        if apply_softmax:
            output = F.softmax(output, dim=1)  # optionally convert the outputs to probabilities
        return output

In the code below we instantiate the MLP. Because the implementation is generic, it can model inputs of any size. For demonstration we use an input dimension of 3, an output dimension of 4, and a hidden dimension of 100. Note how, in the output of the print statement, the numbers of units in each layer line up so that an input of dimension 3 produces an output of dimension 4.

batch_size = 2  # number of samples in each input batch
input_dim = 3
hidden_dim = 100
output_dim = 4

# instantiate the model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)  # print the model structure

We can quickly test the model's "wiring" by passing in some random inputs, as shown in Example 4-3. Because the model is not yet trained, the outputs are random, but this is a useful sanity check before spending time on training. Note how PyTorch's interactivity lets us do all of this on the fly during development, much as we would with NumPy or pandas:

import torch

# helper that prints a tensor's type, shape, and values
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

x_input = torch.rand(batch_size, input_dim)  # generate a random input tensor
describe(x_input)  # print information about the input tensor

Output of the code above:

It is important to learn how to read the inputs and outputs of a PyTorch model. In the preceding example, the output of the MLP is a tensor with two rows and four columns. The rows of this tensor correspond to the batch dimension, i.e., the number of data points in the minibatch; the columns are the final feature vector for each data point. In some settings, such as classification, the feature vector is a prediction vector, so called because it corresponds to a probability distribution. What happens to the prediction vector depends on whether we are training or performing inference. During training the output is used as is, together with a loss function and a representation of the target class labels; we cover this in depth in "Example: Surname Classification with an MLP."
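The listing that produced this two-row, four-column output is not reproduced above; a minimal sketch of the forward pass being described, reusing the mlp and x_input defined earlier, would be:

y_output = mlp(x_input)  # forward pass without softmax
describe(y_output)       # expected shape: (batch_size, output_dim) == (2, 4)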

However, if we want to turn the prediction vector into probabilities, an extra step is required: the softmax function, which converts a vector of values into a probability distribution. Softmax has many roots: in physics it is known as the Boltzmann or Gibbs distribution; in statistics it is multinomial logistic regression; and in the NLP community it is the maximum entropy (MaxEnt) classifier. Whatever the name, the intuition behind the function is that large positive values lead to higher probabilities and small negative values lead to lower probabilities. In Example 4-3 the apply_softmax argument applies this extra step; Example 4-4 shows the same output, but this time with the apply_softmax flag set to True:

y_output = mlp(x_input, apply_softmax=True)
describe(y_output)

Output: the same (2, 4) tensor, but with each row now forming a probability distribution that sums to 1.

To summarize: an MLP is a stack of Linear layers that map one tensor to another. Nonlinear activation functions placed between the Linear layers break up the linearity and allow the model to warp the vector space; in a classification setting, this warping should make the classes more linearly separable. The output of an MLP can also be interpreted as probabilities by applying the softmax function, but softmax should not be hard-coded together with a particular loss function, because the underlying loss implementations often rely on more efficient and numerically stable mathematical shortcuts.
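As a quick check of the claim that stacked Linear layers without a nonlinearity collapse into a single Linear layer, the sketch below composes two Linear layers into one equivalent layer. It is not part of the lab code, and the layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(3, 100)
fc2 = nn.Linear(100, 4)
x = torch.rand(2, 3)

# Two Linear layers applied back to back, with no nonlinearity in between.
y_stacked = fc2(fc1(x))

# The same mapping expressed as a single Linear layer: W = W2 @ W1, b = W2 @ b1 + b2.
combined = nn.Linear(3, 4)
with torch.no_grad():
    combined.weight.copy_(fc2.weight @ fc1.weight)
    combined.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

# Should print True (up to floating-point error), showing the two forms are equivalent.
print(torch.allclose(y_stacked, combined(x), atol=1e-5))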

III. Lab Procedure

*1. Data Preprocessing

The dataset, surnames.csv, collects 10,000 surnames from 18 different nationalities, gathered from several name sources on the internet. Preprocessing serves two purposes. First, it reduces the imbalance among the 18 nationalities, since a more even class distribution makes it easier to train an effective model. Second, it splits the data into three parts: 70% for training, 15% for validation, and the final 15% for testing, so that the class-label distributions are comparable across the splits.
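The preprocessing itself is carried out in the separate notebook exp4-munging_surname_dataset.ipynb; a minimal sketch of the splitting step is shown below, with the column names and output file name assumed to match the description above.

import collections
import numpy as np
import pandas as pd

surnames = pd.read_csv("surnames.csv")  # assumed columns: surname, nationality

np.random.seed(1337)
by_nationality = collections.defaultdict(list)
for _, row in surnames.iterrows():
    by_nationality[row.nationality].append(row.to_dict())

# Split each nationality 70/15/15 so the class-label distributions stay comparable.
final_rows = []
for _, rows in sorted(by_nationality.items()):
    np.random.shuffle(rows)
    n_train = int(0.70 * len(rows))
    n_val = int(0.15 * len(rows))
    for i, row in enumerate(rows):
        if i < n_train:
            row['split'] = 'train'
        elif i < n_train + n_val:
            row['split'] = 'val'
        else:
            row['split'] = 'test'
        final_rows.append(row)

pd.DataFrame(final_rows).to_csv("surnames_with_splits.csv", index=False)

The classes below (Vocabulary, SurnameVectorizer, and SurnameDataset) then operate on the resulting surnames_with_splits.csv file.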

from argparse import Namespace
from collections import Counter
import json
import os
import string
 
import numpy as np
import pandas as pd
 
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm_notebook
 
class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""
 
    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """
 
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx
 
        self._idx_to_token = {idx: token 
                              for token, idx in self._token_to_idx.items()}
        
        self._add_unk = add_unk
        self._unk_token = unk_token
        
        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token) 
        
        
    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx, 
                'add_unk': self._add_unk, 
                'unk_token': self._unk_token}
 
    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)
 
    def add_token(self, token):
        """Update mapping dicts based on the token.
        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]
        except KeyError:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index
    
    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary
        
        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]
 
    def lookup_token(self, token):
        """Retrieve the index associated with the token 
          or the UNK index if token isn't present.
        
        Args:
            token (str): the token to look up 
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary) 
              for the UNK functionality 
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]
 
    def lookup_index(self, index):
        """Return the token associated with the index
        
        Args: 
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]
 
    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)
 
    def __len__(self):
        return len(self._token_to_idx)
 
class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, surname_vocab, nationality_vocab):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab
 
    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname
        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding 
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)
        for token in surname:
            one_hot[vocab.lookup_token(token)] = 1
 
        return one_hot
 
    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe
        
        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)
 
        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)
 
        return cls(surname_vocab, nationality_vocab)
 
    @classmethod
    def from_serializable(cls, contents):
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab =  Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab)
 
    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}
 
class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer
 
        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)
 
        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)
 
        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)
 
        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}
 
        self.set_split('train')
        
        # Class weights
        class_counts = surname_df.nationality.value_counts().to_dict()
        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])
        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)
 
    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch
        
        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))
 
    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer. 
    Used in the case where the vectorizer has been cached for re-use
        
        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)
 
    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file
        
        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))
 
    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json
        
        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)
 
    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer
 
    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]
 
    def __len__(self):
        return self._target_size
 
    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets
        
        Args:
            index (int): the index to the data point 
        Returns:
            a dictionary holding the data point's:
                features (x_surname)
                label (y_nationality)
        """
        row = self._target_df.iloc[index]
 
        surname_vector = \
            self._vectorizer.vectorize(row.surname)
 
        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)
 
        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}
 
    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset
        
        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size
 
    
def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"): 
    """
    A generator function which wraps the PyTorch DataLoader. It will 
      ensure each tensor is on the right device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)
 
    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict

*2. Building the Multilayer Perceptron Model

class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
 
    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier
        
        Args:
            x_in (torch.Tensor): an input data tensor. 
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)
 
        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)
 
        return prediction_vector
 
def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}
 
def update_train_state(args, model, train_state):
    """Handle the training state updates.
    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better
    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """
 
    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False
 
    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]
 
        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])
                # Track the best validation loss seen so far; without this update,
                # early stopping would never trigger.
                train_state['early_stopping_best_val'] = loss_t

            # Reset early stopping step
            train_state['early_stopping_step'] = 0
 
        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria
 
    return train_state
 
def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100
 
def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)
 
def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

*3. Training the Model

args = Namespace(
    # Data and path information
    surname_csv="surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/surname_mlp",
    # Model hyper parameters
    hidden_dim=300,
    # Training  hyper parameters
    seed=1337,
    num_epochs=100,
    early_stopping_criteria=5,
    learning_rate=0.001,
    batch_size=64,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)
 
if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)
 
    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)
    
    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))
    
# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False
 
args.device = torch.device("cuda" if args.cuda else "cpu")
    
print("Using CUDA: {}".format(args.cuda))
 
 
# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)
 
# handle dirs
handle_dirs(args.save_dir)
 
if args.reload_from_files:
    # training from a checkpoint
    print("Reloading!")
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    print("Creating fresh!")
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)
    
vectorizer = dataset.get_vectorizer()
classifier = SurnameClassifier(input_dim=len(vectorizer.surname_vocab), 
                               hidden_dim=args.hidden_dim, 
                               output_dim=len(vectorizer.nationality_vocab))
 
classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
 
    
loss_func = nn.CrossEntropyLoss(dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)
 
train_state = make_train_state(args)
 
epoch_bar = tqdm_notebook(desc='training routine', 
                          total=args.num_epochs,
                          position=0)
 
dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size), 
                          position=1, 
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size), 
                        position=1, 
                        leave=True)
 
try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index
 
        # Iterate over training dataset
 
        # setup: batch generator, set loss and acc to 0, set train mode on
 
        dataset.set_split('train')
        batch_generator = generate_batches(dataset, 
                                           batch_size=args.batch_size, 
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()
 
        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:
 
            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()
 
            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])
 
            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)
 
            # step 4. use loss to produce gradients
            loss.backward()
 
            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
 
            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc, 
                            epoch=epoch_index)
            train_bar.update()
 
        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)
 
        # Iterate over val dataset
 
        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset, 
                                           batch_size=args.batch_size, 
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()
 
        for batch_index, batch_dict in enumerate(batch_generator):
 
            # compute the output
            y_pred =  classifier(batch_dict['x_surname'])
 
            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.to("cpu").item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)
 
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc, 
                            epoch=epoch_index)
            val_bar.update()
 
        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)
 
        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)
 
        scheduler.step(train_state['val_loss'][-1])
 
        if train_state['stop_early']:
            break
 
        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")

*4. Prediction Results

# compute the loss & accuracy on the test set using the best available model
 
classifier.load_state_dict(torch.load(train_state['model_filename']))
 
classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)
 
dataset.set_split('test')
batch_generator = generate_batches(dataset, 
                                   batch_size=args.batch_size, 
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()
 
for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred =  classifier(batch_dict['x_surname'])
    
    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)
 
    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)
 
train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc
 
print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))

Predicting the nationality of a new surname:

def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname
    
    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).view(1, -1)
    result = classifier(vectorized_surname, apply_softmax=True)
 
    probability_values, indices = result.max(dim=1)
    index = indices.item()
 
    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()
 
    return {'nationality': predicted_nationality, 'probability': probability_value}
 
new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))

    

      
