Stanford CS224N: PyTorch Tutorial (Winter '21), Part 3

放肆荒原

Original link: https://web.stanford.edu/class/cs224n/materials/CS224N_PyTorch_Tutorial.html

The previous part of this translated tutorial is in my previous post:

Stanford CS224N: PyTorch Tutorial (Winter '21), Part 2 (放肆荒原's blog, CSDN)

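Demo: Word Window Classification I

We have learned the fundamentals of PyTorch and built a basic network solving a toy task. Now we will attempt to solve an example NLP task. Here are the things we will learn:

Data: Creating a Dataset of Batched Tensors
Modeling
Training
Prediction

In this section, our goal will be to train a model that finds the words in a sentence corresponding to a LOCATION, which will always be of span 1 (meaning that San Francisco won't be recognized as a LOCATION). Our task is called Word Window Classification for a reason: instead of letting our model look at only one word in each forward pass, we would like it to be able to consider the context of the word in question. That is, for each word, we want our model to be aware of the surrounding words. Let's dive in!

Data

The very first task of any machine learning project is to set up our training set. Usually, there will be a training corpus we will be utilizing. In NLP tasks, the corpus would generally be a .txt or .csv file where each row corresponds to a sentence or a tabular datapoint. In our toy task, we will assume that we have already read our data and the corresponding labels into Python lists.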

In [71]:

# Our raw data, which consists of sentences
corpus = [
          "We always come to Paris",
          "The professor is from Australia",
          "I live in Stanford",
          "He comes from Taiwan",
          "The capital of Turkey is Ankara"
         ]

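Preprocessing

To make it easier for our models to learn, we usually apply some preprocessing steps to our data. This is especially important when dealing with text data. Here are some examples of text preprocessing:

Tokenization: Tokenizing the sentences into words.
Lowercasing: Changing all the letters to be lowercase.
Noise removal: Removing special characters (such as punctuation).
Stop words removal: Removing commonly used words (translator's note: frequent function words that carry little content).

Which preprocessing steps are necessary is determined by the task at hand. For example, although it is useful to remove special characters in some tasks, for others they may be important (for example, if we are dealing with multiple languages). For our task, we will lowercase our words and tokenize.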
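The tutorial itself only lowercases and tokenizes. As a rough sketch of how the other two steps from the list (noise removal and stop words removal) might look, here is a small function. The name preprocess_full, the STOP_WORDS set and the regular expression are assumptions made for this illustration, not part of the original notebook. Note that the pattern keeps only ASCII letters and digits, which is exactly the kind of choice that would be wrong for multilingual data.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration
STOP_WORDS = {"the", "is", "of", "to", "in"}

def preprocess_full(sentence, remove_stop_words=False):
    # Lowercase, then replace everything that is not an ASCII letter,
    # a digit or whitespace with a space (noise removal)
    cleaned = re.sub(r"[^a-z0-9\s]", " ", sentence.lower())
    # Tokenize on whitespace
    tokens = cleaned.split()
    # Optionally drop the stop words
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

print(preprocess_full("The capital of Turkey is Ankara!"))
# ['the', 'capital', 'of', 'turkey', 'is', 'ankara']
print(preprocess_full("The capital of Turkey is Ankara!", remove_stop_words=True))
# ['capital', 'turkey', 'ankara']
```

For the LOCATION task, removing stop words would probably hurt, since words such as "to" and "in" are exactly the context the window classifier relies on.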

In [72]:

# The preprocessing function we will use to generate our training examples
# Our function is a simple one, we lowercase the letters
# and then tokenize the words.
def preprocess_sentence(sentence):
  return sentence.lower().split()

# Create our training set
train_sentences = [preprocess_sentence(sent) for sent in corpus]
train_sentences

Out [72]:

[['we', 'always', 'come', 'to', 'paris'],
 ['the', 'professor', 'is', 'from', 'australia'],
 ['i', 'live', 'in', 'stanford'],
 ['he', 'comes', 'from', 'taiwan'],
 ['the', 'capital', 'of', 'turkey', 'is', 'ankara']]

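For each training example we have, we should also have a corresponding label. Recall that the goal of our model is to determine which words correspond to a LOCATION. That is, we want our model to output 0 for all the words that are not LOCATIONs and 1 for the ones that are.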

In [73]:

# Set of locations that appear in our corpus
locations = set(["australia", "ankara", "paris", "stanford", "taiwan", "turkey"])

# Our train labels
train_labels = [[1 if word in locations else 0 for word in sent] for sent in train_sentences]
train_labels

Out [73]:

[[0, 0, 0, 0, 1],
 [0, 0, 0, 0, 1],
 [0, 0, 0, 1],
 [0, 0, 0, 1],
 [0, 0, 0, 1, 0, 1]]

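Converting Words to Embeddings

Let us look at our training data a little more closely. Each datapoint we have is a sequence of words. On the other hand, we know that machine learning models work with numbers in vectors. How are we going to turn words into numbers? You may be thinking embeddings, and you are right!

Imagine that we have an embedding lookup table E, where each row corresponds to an embedding. That is, each word in our vocabulary has a corresponding embedding row i in this table. Whenever we want to find the embedding for a word, we follow these steps:

Find the corresponding index i of the word in the embedding table: word->index.
Index into the embedding table and get the embedding: index->embedding.

Let's look at the first step. We should assign every word in our vocabulary to a corresponding index. We can do it as follows:

Find all the unique words in our corpus.
Assign an index to each.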

In [74]:

# Find all the unique words in our corpus (a set removes duplicates)
vocabulary = set(w for s in train_sentences for w in s)
vocabulary

Out [74]:

{'always',
 'ankara',
 'australia',
 'capital',
 'come',
 'comes',
 'from',
 'he',
 'i',
 'in',
 'is',
 'live',
 'of',
 'paris',
 'professor',
 'stanford',
 'taiwan',
 'the',
 'to',
 'turkey',
 'we'}

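Our vocabulary now contains all the words in our corpus. On the other hand, during test time we can see words that are not contained in our vocabulary. If we can figure out a way to represent unknown words, our model can still reason about whether they are a LOCATION or not, since we are also looking at the neighboring words for each prediction.

We introduce a special token, <unk>, to tackle the words that are out of vocabulary. We could pick another string for our unknown token if we wanted. The only requirement is that our token should be unique: we should only be using this token for unknown words. We will also add this special token to our vocabulary.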

In [75]:

# Add the unknown token to our vocabulary
vocabulary.add("<unk>")

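Earlier we mentioned that our task is called Word Window Classification because our model looks at the surrounding words in addition to the given word when it needs to make a prediction.

For example, let's take the sentence "We always come to Paris". The corresponding training label for this sentence is 0, 0, 0, 0, 1, since only Paris, the last word, is a LOCATION. In one pass (meaning one call to forward()), our model will try to generate the correct label for one word. Let's say our model is trying to generate the correct label 1 for Paris. If we only allow our model to see Paris and nothing else, we will miss out on the important information that the word to quite often appears with LOCATIONs.

Word windows allow our model to consider the N words before and the N words after each word when making a prediction. In our earlier example for Paris, if we have a window size of 1, our model will look at the words that come immediately before and after Paris, which are to and, well, nothing. This raises another issue: Paris is at the end of our sentence, so there isn't another word following it. Remember that we define the input dimensions of our PyTorch models when we initialize them. If we set the window size to 1, our model will be accepting 3 words in every pass. We cannot have our model expect 2 words from time to time.

The solution is to introduce a special token, such as <pad>, that will be added to our sentences to make sure that every word has a valid window around it. Similar to the <unk> token, we could pick another string for our pad token if we wanted, as long as we make sure it is used for a unique purpose.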

In [76]:

# Add the <pad> token to our vocabulary
vocabulary.add("<pad>")

# Function that pads the given sentence
# We are introducing this function here as an example
# We will be utilizing it later in the tutorial
def pad_window(sentence, window_size, pad_token="<pad>"):
  window = [pad_token] * window_size
  return window + sentence + window

# Show padding example
window_size = 2
pad_window(train_sentences[0], window_size=window_size)

Out [76]:

['<pad>', '<pad>', 'we', 'always', 'come', 'to', 'paris', '<pad>', '<pad>']
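To make the link between padding and windows concrete, the sketch below lists the window that belongs to each word of a padded sentence. The helper word_windows is a hypothetical name introduced only for this illustration; pad_window is repeated from the cell above so that the snippet runs on its own.

```python
def pad_window(sentence, window_size, pad_token="<pad>"):
    window = [pad_token] * window_size
    return window + sentence + window

def word_windows(sentence, window_size):
    # One window of 2 * window_size + 1 tokens per original word;
    # window i is centered on the i-th word of the unpadded sentence
    padded = pad_window(sentence, window_size)
    width = 2 * window_size + 1
    return [padded[i:i + width] for i in range(len(sentence))]

sentence = ["we", "always", "come", "to", "paris"]
for window in word_windows(sentence, window_size=1):
    print(window)
# ['<pad>', 'we', 'always']
# ['we', 'always', 'come']
# ['always', 'come', 'to']
# ['come', 'to', 'paris']
# ['to', 'paris', '<pad>']
```

Thanks to the padding, the first and last words get a window of the same width as every other word.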

Now that our vocabulary is ready, let's assign an index to each of our words.

In [77]:

# We are just converting our vocabulary to a list to be able to index into it
# Sorting is not necessary, we sort to show an ordered word_to_ix dictionary
# That being said, we will see that having the index for the padding token
# be 0 is convenient as some PyTorch functions use it as a default value
# such as nn.utils.rnn.pad_sequence, which we will cover in a bit
ix_to_word = sorted(list(vocabulary))

# Creating a dictionary to find the index of a given word
word_to_ix = {word: ind for ind, word in enumerate(ix_to_word)}
word_to_ix

Out [77]:

{'<pad>': 0,
 '<unk>': 1,
 'always': 2,
 'ankara': 3,
 'australia': 4,
 'capital': 5,
 'come': 6,
 'comes': 7,
 'from': 8,
 'he': 9,
 'i': 10,
 'in': 11,
 'is': 12,
 'live': 13,
 'of': 14,
 'paris': 15,
 'professor': 16,
 'stanford': 17,
 'taiwan': 18,
 'the': 19,
 'to': 20,
 'turkey': 21,
 'we': 22}
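In the output above, <pad> ends up at index 0 only because the character "<" happens to sort before the lowercase letters. That would no longer hold if the vocabulary contained, for example, digits. The following is a minimal sketch of reserving the lowest indices for the special tokens explicitly. The function build_vocab and the demo_ names are assumptions for this illustration and do not appear in the original tutorial.

```python
def build_vocab(sentences, specials=("<pad>", "<unk>")):
    # Reserve the lowest indices for the special tokens, in the given order,
    # then append the remaining words in sorted order
    words = sorted({w for s in sentences for w in s} - set(specials))
    ix_to_word = list(specials) + words
    word_to_ix = {word: ix for ix, word in enumerate(ix_to_word)}
    return ix_to_word, word_to_ix

demo_ix_to_word, demo_word_to_ix = build_vocab([["we", "visited", "3", "cities"]])
print(demo_word_to_ix)
# {'<pad>': 0, '<unk>': 1, '3': 2, 'cities': 3, 'visited': 4, 'we': 5}

# Plain sorting would have put "3" before "<pad>"
print(sorted(["<pad>", "3"]))
# ['3', '<pad>']
```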

Great! We are ready to convert our training sentences into a sequence of indices corresponding to each token.

In [78]:

# Given a sentence of tokens, return the corresponding indices
def convert_token_to_indices(sentence, word_to_ix):
  indices = []
  for token in sentence:
    # Check if the token is in our vocabulary. If it is, get its index.
    # If not, get the index for the unknown token.
    if token in word_to_ix:
      index = word_to_ix[token]
    else:
      index = word_to_ix["<unk>"]
    indices.append(index)
  return indices

# More compact version of the same function
def _convert_token_to_indices(sentence, word_to_ix):
  return [word_to_ix.get(token, word_to_ix["<unk>"]) for token in sentence]

# Show an example
example_sentence = ["we", "always", "come", "to", "kuwait"]
example_indices = convert_token_to_indices(example_sentence, word_to_ix)
restored_example = [ix_to_word[ind] for ind in example_indices]

print(f"Original sentence is: {example_sentence}")
print(f"Going from words to indices: {example_indices}")
print(f"Going from indices to words: {restored_example}")

Original sentence is: ['we', 'always', 'come', 'to', 'kuwait']
Going from words to indices: [22, 2, 6, 20, 1]
Going from indices to words: ['we', 'always', 'come', 'to', '<unk>']

In the example above, kuwait shows up as <unk>, because it is not included in our vocabulary. Let's convert our train_sentences to example_padded_indices.

In [79]:

# Converting our sentences to indices
example_padded_indices = [convert_token_to_indices(s, word_to_ix) for s in train_sentences]
example_padded_indices

Out [79]:

[[22, 2, 6, 20, 15],
 [19, 16, 12, 8, 4],
 [10, 13, 11, 17],
 [9, 7, 8, 18],
 [19, 5, 14, 21, 12, 3]]

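Now that we have an index for each word in our vocabulary, we can create an embedding table with the nn.Embedding class in PyTorch. It is called as follows: nn.Embedding(num_words, embedding_dimension), where num_words is the number of words in our vocabulary and embedding_dimension is the dimension of the embeddings we want to have. There is nothing fancy about nn.Embedding: it is just a wrapper class around a trainable NxE dimensional tensor, where N is the number of words in our vocabulary and E is the number of embedding dimensions. This table is initially random, but it will change over time. As we train our network, the gradients will be backpropagated all the way to the embedding layer, and hence our word embeddings will be updated. We will initialize the embedding layer we will use for our model inside the model itself, but we show an example here first.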

In [80]:

# Imports repeated here so that this part of the tutorial runs on its own
import torch
import torch.nn as nn

# Creating an embedding table for our words
embedding_dim = 5
embeds = nn.Embedding(len(vocabulary), embedding_dim)

# Printing the parameters in our embedding table
list(embeds.parameters())

Out [80]:

[Parameter containing:
 tensor([[-0.5421,  0.6919,  0.8236, -1.3510,  1.4048],
         [ 1.2983,  1.4740,  0.1002, -0.5475,  1.0871],
         [ 1.4604, -1.4934, -0.4363, -0.3231, -1.9746],
         [ 0.8021,  1.5121,  0.8239,  0.9865, -1.3801],
         [ 0.3502, -0.5920,  0.9295,  0.6062, -0.6258],
         [ 0.5038, -1.0187,  0.2860,  0.3231, -1.2828],
         [ 1.5232, -0.5983, -0.4971, -0.5137,  1.4319],
         [ 0.3826,  0.6501, -0.3948,  1.3998, -0.5133],
         [-0.1728, -0.7658,  0.2873, -2.1812,  0.9506],
         [-0.5617,  0.4552,  0.0618, -1.7503,  0.2192],
         [-0.5405,  0.7887, -0.9843, -0.6110,  0.6391],
         [ 0.6581, -0.7067,  1.3208,  1.3860, -1.5113],
         [ 1.1594,  0.4977, -1.9175,  0.0916,  0.0085],
         [ 0.3317,  1.8169,  0.0802, -0.1456, -0.7304],
         [ 0.4997, -1.4895,  0.1237, -0.4121,  0.8909],
         [ 0.6732,  0.4117, -0.5378,  0.6632, -2.7096],
         [-0.4580, -0.9436, -1.6345,  0.1284, -1.6147],
         [-0.3537,  1.9635,  1.0702, -0.1894, -0.8822],
         [-0.4057, -1.2033, -0.7083,  0.4087, -1.1708],
         [-0.6373,  0.5272,  1.8711, -0.5865, -0.7643],
         [ 0.4714, -2.5822,  0.4338,  0.1537, -0.7650],
         [-2.1828,  1.3178,  1.3833,  0.5018, -1.7209],
         [-0.5354,  0.2153, -0.1482,  0.3903,  0.0900]], requires_grad=True)]

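To get the word embedding for a word in our vocabulary, all we need to do is create a lookup tensor. The lookup tensor is just a tensor containing the indices we want to look up. The nn.Embedding class expects an index tensor of type Long Tensor, so we should create our tensor accordingly.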

In [81]:

# Get the embedding for the word Paris
index = word_to_ix["paris"]
index_tensor = torch.tensor(index, dtype=torch.long)
paris_embed = embeds(index_tensor)
paris_embed

Out [81]:

tensor([ 0.6732,  0.4117, -0.5378,  0.6632, -2.7096],
       grad_fn=<EmbeddingBackward>)

In [82]:

# We can also get multiple embeddings at once
index_paris = word_to_ix["paris"]
index_ankara = word_to_ix["ankara"]
indices = [index_paris, index_ankara]
indices_tensor = torch.tensor(indices, dtype=torch.long)
embeddings = embeds(indices_tensor)
embeddings

Out [82]:

tensor([[ 0.6732,  0.4117, -0.5378,  0.6632, -2.7096],
        [ 0.8021,  1.5121,  0.8239,  0.9865, -1.3801]],
       grad_fn=<EmbeddingBackward>)
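The statement that nn.Embedding is just a wrapper around a trainable NxE tensor can be checked directly. The sketch below uses a small stand-alone table (demo_embeds, with sizes chosen arbitrarily for the illustration) and compares a lookup with plain row indexing into the layer's weight parameter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the random table reproducible
demo_embeds = nn.Embedding(num_embeddings=4, embedding_dim=3)

# The table itself is a single trainable parameter of shape (N, E)
print(demo_embeds.weight.shape)          # torch.Size([4, 3])
print(demo_embeds.weight.requires_grad)  # True

# Looking up indices selects the corresponding rows of that parameter
lookup = torch.tensor([2, 0], dtype=torch.long)
print(torch.equal(demo_embeds(lookup), demo_embeds.weight[lookup]))  # True
```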

Usually, we define the embedding layer as part of our model, which you will see in the later sections of this notebook.

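Batching Sentences

We learned about batches in class. Waiting for our whole training corpus to be processed before making an update is costly. On the other hand, updating the parameters after every training example causes the loss to be less stable between updates. To combat these issues, we instead update our parameters after training on a batch of data. This allows us to get a better estimate of the gradient of the global loss. In this section, we will learn how to structure our data into batches using the torch.utils.data.DataLoader class.

We will call the DataLoader class as follows: DataLoader(data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn). The batch_size parameter determines the number of examples per batch. In every epoch, we will iterate over all the batches using the DataLoader. The order of the batches is deterministic by default, but we can ask the DataLoader to reshuffle the data at every epoch by setting the shuffle parameter to True. This way we ensure that we don't encounter a bad batch multiple times.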
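Before moving on to the real data, the effect of batch_size and shuffle can be seen on a toy dataset. This is only a sketch: toy_data is a list of integers made up for the illustration, and no collate_fn is passed, so the DataLoader falls back to its default collation, which turns each batch of integers into a tensor.

```python
from torch.utils.data import DataLoader

toy_data = list(range(10))

# Without shuffling, samples are batched in their original order;
# the last batch is smaller because 10 is not divisible by 4
ordered = DataLoader(toy_data, batch_size=4, shuffle=False)
print([batch.tolist() for batch in ordered])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# With shuffle=True the samples are drawn in a new random order every epoch,
# so the composition of the batches changes from epoch to epoch
shuffled = DataLoader(toy_data, batch_size=4, shuffle=True)
for epoch in range(2):
    print(f"epoch {epoch}:", [batch.tolist() for batch in shuffled])
```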

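If provided, the DataLoader passes the batches it prepares to the collate_fn. We can write a custom function to pass to the collate_fn parameter in order to print stats about our batch or perform extra processing. In our case, we will use the collate_fn to:

Window pad our train sentences.
Convert the words in the training examples to indices.
Pad the training examples so that all the sentences and labels have the same length. Similarly, we also need to pad the labels. This creates an issue, because when calculating the loss we need to know the actual number of words in a given example. We will also keep track of this number in the function we pass to the collate_fn parameter.

Because our version of the collate_fn function needs access to our word_to_ix dictionary (so that it can turn words into indices), we will make use of the partial function in Python (functools.partial), which fixes the arguments we give it on the function we pass to it.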
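As a small illustration of what functools.partial does, consider the sketch below. The function describe_batch and the tiny dictionary are stand-ins invented for this example; the point is only that partial returns a new callable that takes the remaining argument, which is the single-argument form the DataLoader expects from a collate_fn.

```python
from functools import partial

def describe_batch(batch, window_size, word_to_ix):
    # Stand-in for a collate function that needs two extra arguments
    return f"{len(batch)} examples, window_size={window_size}, vocab={len(word_to_ix)}"

# Fix the extra arguments now; the resulting callable takes only `batch`
demo_collate = partial(describe_batch, window_size=2, word_to_ix={"<pad>": 0, "<unk>": 1})

print(demo_collate([("a", 0), ("b", 1)]))
# 2 examples, window_size=2, vocab=2
```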

In [83]:

from torch.utils.data import DataLoader
from functools import partial

def custom_collate_fn(batch, window_size, word_to_ix):
  # Break our batch into the training examples (x) and labels (y)
  # Further below we turn x and y into tensors because the
  # nn.utils.rnn.pad_sequence function expects tensors. This is also useful
  # since our model will be expecting tensor inputs.
  x, y = zip(*batch)

  # Now we need to window pad our training examples. We have already defined a
  # function to handle window padding. We are including it here again so that
  # everything is in one place.
  def pad_window(sentence, window_size, pad_token="<pad>"):
    window = [pad_token] * window_size
    return window + sentence + window

  # Pad the train examples.
  x = [pad_window(s, window_size=window_size) for s in x]

  # Now we need to turn words in our training examples to indices. We are
  # copying the function defined earlier for the same reason as above.
  def convert_tokens_to_indices(sentence, word_to_ix):
    return [word_to_ix.get(token, word_to_ix["<unk>"]) for token in sentence]

  # Convert the train examples into indices.
  x = [convert_tokens_to_indices(s, word_to_ix) for s in x]

  # We will now pad the examples so that the lengths of all the examples in
  # one batch are the same, making it possible to do matrix operations.
  # We set the batch_first parameter to True so that the returned matrix has
  # the batch as the first dimension.
  pad_token_ix = word_to_ix["<pad>"]

  # pad_sequence function expects the input to be a tensor, so we turn x into one
  x = [torch.LongTensor(x_i) for x_i in x]
  x_padded = nn.utils.rnn.pad_sequence(x, batch_first=True, padding_value=pad_token_ix)

  # We will also pad the labels. Before doing so, we will record the number
  # of labels so that we know how many words existed in each example.
  lengths = [len(label) for label in y]
  lengths = torch.LongTensor(lengths)

  y = [torch.LongTensor(y_i) for y_i in y]
  y_padded = nn.utils.rnn.pad_sequence(y, batch_first=True, padding_value=0)

  # We are now ready to return our variables. The order we return our variables
  # here will match the order we read them in our training loop.
  return x_padded, y_padded, lengths

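This function seems long, but it really doesn't have to be. Check out the alternative version below, where we remove the extra function declarations and comments. It relies on the pad_window and convert_token_to_indices functions defined earlier.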

In [84]:

def _custom_collate_fn(batch, window_size, word_to_ix):
  # Prepare the datapoints
  x, y = zip(*batch)
  x = [pad_window(s, window_size=window_size) for s in x]
  x = [convert_token_to_indices(s, word_to_ix) for s in x]

  # Pad x so that all the examples in the batch have the same size
  pad_token_ix = word_to_ix["<pad>"]
  x = [torch.LongTensor(x_i) for x_i in x]
  x_padded = nn.utils.rnn.pad_sequence(x, batch_first=True, padding_value=pad_token_ix)

  # Pad y and record the length
  lengths = torch.LongTensor([len(label) for label in y])
  y = [torch.LongTensor(y_i) for y_i in y]
  y_padded = nn.utils.rnn.pad_sequence(y, batch_first=True, padding_value=0)

  return x_padded, y_padded, lengths

Now, we can see the DataLoader in action.

In [85]:

# Parameters to be passed to the DataLoader
data = list(zip(train_sentences, train_labels))
batch_size = 2
shuffle = True
window_size = 2
collate_fn = partial(custom_collate_fn, window_size=window_size, word_to_ix=word_to_ix)

# Instantiate the DataLoader
loader = DataLoader(data, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn)

# Go through one loop
counter = 0
for batched_x, batched_y, batched_lengths in loader:
  print(f"Iteration {counter}")
  print("Batched Input:")
  print(batched_x)
  print("Batched Labels:")
  print(batched_y)
  print("Batched Lengths:")
  print(batched_lengths)
  print("")
  counter += 1

Iteration 0
Batched Input:
tensor([[ 0,  0, 22,  2,  6, 20, 15,  0,  0],
        [ 0,  0, 19, 16, 12,  8,  4,  0,  0]])
Batched Labels:
tensor([[0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]])
Batched Lengths:
tensor([5, 5])

Iteration 1
Batched Input:
tensor([[ 0,  0, 19,  5, 14, 21, 12,  3,  0,  0],
        [ 0,  0, 10, 13, 11, 17,  0,  0,  0,  0]])
Batched Labels:
tensor([[0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0, 0]])
Batched Lengths:
tensor([6, 4])

Iteration 2
Batched Input:
tensor([[ 0,  0,  9,  7,  8, 18,  0,  0]])
Batched Labels:
tensor([[0, 0, 0, 1]])
Batched Lengths:
tensor([4])
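Note that in Iteration 1 the labels of the shorter sentence are padded with 0, which is also the label for "not a LOCATION". This is why the lengths are returned alongside the labels. One possible way to use them, shown here only as a sketch (the loss function used by the tutorial comes in the next part and may be written differently), is to build a boolean mask that marks the real words. The tensors below are copied from the Iteration 1 output, and the demo_ names are introduced only for this example.

```python
import torch

demo_y = torch.tensor([[0, 0, 0, 1, 0, 1],
                       [0, 0, 0, 1, 0, 0]])
demo_lengths = torch.tensor([6, 4])

# mask[i, j] is True when position j holds a real word of example i
positions = torch.arange(demo_y.size(1)).unsqueeze(0)  # shape (1, max_len)
mask = positions < demo_lengths.unsqueeze(1)           # shape (batch, max_len)
print(mask)
# tensor([[ True,  True,  True,  True,  True,  True],
#         [ True,  True,  True,  True, False, False]])

# Selecting with the mask keeps only the labels of real words
print(demo_y[mask])
# tensor([0, 0, 0, 1, 0, 1, 0, 0, 0, 1])
```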

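The batched input tensors we see above will be passed into our model. On the other hand, we started off saying that our model would be a window classifier. The way our input tensors are currently formatted, we have all the words of a sentence in one datapoint. When we pass this input to our model, it needs to create the windows for each word, make a prediction as to whether the center word is a LOCATION or not for each window, put the predictions together and return.

We could avoid this problem if we formatted our data by breaking it into windows beforehand. In this example, we will instead have our model take care of the formatting.

Given that our window_size is N, we want our model to make a prediction on every window of 2N+1 tokens. That is, if we have an input with 9 tokens and a window_size of 2, we want our model to return 5 predictions. This makes sense because before we padded it with 2 tokens on each side, our input also had 5 tokens in it!

We can create these windows by using for loops, but there is a faster PyTorch alternative, which is the unfold(dimension, size, step) method. We can create the windows we need using this method as follows: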

In [86]:

# Print the original tensor
print(f"Original Tensor: ")
print(batched_x)
print("")

# Create the 2 * 2 + 1 chunks
chunk = batched_x.unfold(1, window_size*2 + 1, 1)
print(f"Windows: ")
print(chunk)

Original Tensor: 
tensor([[ 0,  0,  9,  7,  8, 18,  0,  0]])

Windows: 
tensor([[[ 0,  0,  9,  7,  8],
         [ 0,  9,  7,  8, 18],
         [ 9,  7,  8, 18,  0],
         [ 7,  8, 18,  0,  0]]])
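To see that unfold really produces the same windows as the for-loop approach mentioned above, the sketch below builds them both ways and compares the results. The tensor is copied from the output above; the demo_ names and the explicit loop are assumptions written for this comparison.

```python
import torch

demo_x = torch.tensor([[0, 0, 9, 7, 8, 18, 0, 0]])
demo_window_size = 2
width = 2 * demo_window_size + 1

# Explicit loop: slice every width-sized window along the token dimension
num_windows = demo_x.size(1) - width + 1
looped = torch.stack([demo_x[:, i:i + width] for i in range(num_windows)], dim=1)

# unfold(dimension, size, step) builds the same windows without a Python loop
unfolded = demo_x.unfold(1, width, 1)

print(unfolded.shape)                  # torch.Size([1, 4, 5])
print(torch.equal(looped, unfolded))   # True
```

The padded sentence has 8 tokens, so there are 8 - 5 + 1 = 4 windows, one for each of the 4 original words.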

(Continued in Part 4: Demo: Word Window Classification II)