For the previous part of this tutorial translation, see my previous post:
Stanford CS224N: PyTorch Tutorial (Winter '21), Part 3 (放肆荒原's blog on CSDN)
Demo: Word Window Classification II
Model
Now that our data is ready, we can build the model. We have already learned how to write custom nn.Module classes, so here we put everything we have learned so far together.
class WordWindowClassifier(nn.Module):

    def __init__(self, hyperparameters, vocab_size, pad_ix=0):
        super(WordWindowClassifier, self).__init__()

        """ Instance variables """
        self.window_size = hyperparameters["window_size"]
        self.embed_dim = hyperparameters["embed_dim"]
        self.hidden_dim = hyperparameters["hidden_dim"]
        self.freeze_embeddings = hyperparameters["freeze_embeddings"]

        """ Embedding Layer
        Takes in a tensor containing embedding indices, and returns the
        corresponding embeddings. The output is of dim
        (number_of_indices * embedding_dim).

        If freeze_embeddings is True, set the embedding layer parameters to be
        non-trainable. This is useful if we only want the parameters other than
        the embedding parameters to change.
        """
        self.embeds = nn.Embedding(vocab_size, self.embed_dim, padding_idx=pad_ix)
        if self.freeze_embeddings:
            self.embeds.weight.requires_grad = False

        """ Hidden Layer """
        full_window_size = 2 * self.window_size + 1
        self.hidden_layer = nn.Sequential(
            nn.Linear(full_window_size * self.embed_dim, self.hidden_dim),
            nn.Tanh()
        )

        """ Output Layer """
        self.output_layer = nn.Linear(self.hidden_dim, 1)

        """ Probabilities """
        self.probabilities = nn.Sigmoid()

    def forward(self, inputs):
        """
        Let B := batch_size
            L := window-padded sentence length
            D := self.embed_dim
            S := self.window_size
            H := self.hidden_dim

        inputs: a (B, L) tensor of token indices
        """
        B, L = inputs.size()

        """
        Reshaping.
        Takes in a (B, L) LongTensor.
        Outputs a (B, L~, S) LongTensor.
        """
        # First, get our word windows for each word in our input.
        token_windows = inputs.unfold(1, 2 * self.window_size + 1, 1)
        _, adjusted_length, _ = token_windows.size()

        # Good idea to do internal tensor-size sanity checks, at the least in comments!
        assert token_windows.size() == (B, adjusted_length, 2 * self.window_size + 1)

        """
        Embedding.
        Takes in a torch.LongTensor of size (B, L~, S).
        Outputs a (B, L~, S, D) FloatTensor.
        """
        embedded_windows = self.embeds(token_windows)

        """
        Reshaping.
        Takes in a (B, L~, S, D) FloatTensor.
        Resizes it into a (B, L~, S*D) FloatTensor.
        The -1 argument "infers" what the last dimension should be based on the
        leftover axes.
        """
        embedded_windows = embedded_windows.view(B, adjusted_length, -1)

        """
        Layer 1.
        Takes in a (B, L~, S*D) FloatTensor.
        Resizes it into a (B, L~, H) FloatTensor.
        """
        layer_1 = self.hidden_layer(embedded_windows)

        """
        Layer 2.
        Takes in a (B, L~, H) FloatTensor.
        Resizes it into a (B, L~, 1) FloatTensor.
        """
        output = self.output_layer(layer_1)

        """
        Sigmoid.
        Takes in a (B, L~, 1) FloatTensor of unnormalized class scores.
        Outputs a (B, L~, 1) FloatTensor of probabilities.
        """
        output = self.probabilities(output)
        output = output.view(B, -1)

        return output
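The key step in the forward pass is Tensor.unfold, which slides a window of size 2 * self.window_size + 1 along dimension 1 of the padded index tensor. As a quick standalone illustration (the toy tensor below is made up for this post, and torch is assumed to be imported as in the earlier parts of the tutorial), unfold turns a padded sentence into overlapping word windows like this:

# Toy example of how unfold builds the word windows (window_size = 1 here, so S = 3).
toy = torch.tensor([[0, 5, 6, 7, 8, 0]])   # (B=1, L=6), padded with index 0 on both sides
windows = toy.unfold(1, 3, 1)              # (B=1, L~=4, S=3)
print(windows)
# tensor([[[0, 5, 6],
#          [5, 6, 7],
#          [6, 7, 8],
#          [7, 8, 0]]])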
Training
Now we can put everything together. We start by preparing the data and initializing the model, then we initialize the optimizer and define our loss function. This time, instead of using one of the predefined loss functions as we did before, we define our own loss function.
In [88]:
# Prepare the data
data = list(zip(train_sentences, train_labels))
batch_size = 2
shuffle = True
window_size = 2
collate_fn = partial(custom_collate_fn, window_size=window_size, word_to_ix=word_to_ix)

# Instantiate a DataLoader
loader = DataLoader(data, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn)

# Initialize a model
# It is useful to put all the model hyperparameters in a dictionary
model_hyperparameters = {
    "batch_size": 4,
    "window_size": 2,
    "embed_dim": 25,
    "hidden_dim": 25,
    "freeze_embeddings": False,
}

vocab_size = len(word_to_ix)
model = WordWindowClassifier(model_hyperparameters, vocab_size)

# Define an optimizer
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Define a loss function, which computes the binary cross entropy loss
def loss_function(batch_outputs, batch_labels, batch_lengths):
    # Calculate the loss for the whole batch
    bceloss = nn.BCELoss()
    loss = bceloss(batch_outputs, batch_labels.float())

    # Rescale the loss. Remember that we have used lengths to store the
    # number of words in each training example
    loss = loss / batch_lengths.sum().float()

    return loss
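As a quick sanity check of the custom loss (the tensors below are made up for illustration and are not part of the tutorial), note that nn.BCELoss with its default settings averages over every position of the padded (B, L~) output, and the extra division rescales that average by the total number of real tokens stored in batch_lengths:

# Hypothetical dummy batch: 2 sentences, padded output length 4.
dummy_outputs = torch.tensor([[0.9, 0.2, 0.1, 0.8],
                              [0.1, 0.7, 0.5, 0.5]])   # (B=2, L~=4) predicted probabilities
dummy_labels = torch.tensor([[1, 0, 0, 1],
                             [0, 1, 0, 0]])            # (B=2, L~=4) gold 0/1 labels
dummy_lengths = torch.tensor([4, 3])                   # number of real (unpadded) words per sentence
print(loss_function(dummy_outputs, dummy_labels, dummy_lengths))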
Unlike the earlier examples, this time we use batching instead of passing all the training data to the model at once in each epoch. Therefore, within each training epoch we also iterate over the batches.
In [89]:
# Function that will be called in every epoch
def train_epoch(loss_function, optimizer, model, loader):

    # Keep track of the total loss for the batch
    total_loss = 0
    for batch_inputs, batch_labels, batch_lengths in loader:
        # Clear the gradients
        optimizer.zero_grad()
        # Run a forward pass
        outputs = model.forward(batch_inputs)
        # Compute the batch loss
        loss = loss_function(outputs, batch_labels, batch_lengths)
        # Calculate the gradients
        loss.backward()
        # Update the parameters
        optimizer.step()

        total_loss += loss.item()

    return total_loss

# Function containing our main training loop
def train(loss_function, optimizer, model, loader, num_epochs=10000):

    # Iterate through each epoch and call our train_epoch function
    for epoch in range(num_epochs):
        epoch_loss = train_epoch(loss_function, optimizer, model, loader)
        if epoch % 100 == 0: print(epoch_loss)
Let's start training!
In [90]:
num_epochs = 1000
train(loss_function, optimizer, model, loader, num_epochs=num_epochs)
0.3274914249777794
0.24941639229655266
0.1968013420701027
0.1381114460527897
0.11672545038163662
0.09148690290749073
0.07141915801912546
0.05857925023883581
0.04900792893022299
0.04107789508998394
Prediction
Let's see how the model does at prediction. We can start by creating some test data.
In [91]:
# Create test sentences
test_corpus = ["She comes from Paris"]
test_sentences = [s.lower().split() for s in test_corpus]
test_labels = [[0, 0, 0, 1]]

# Create a test loader
test_data = list(zip(test_sentences, test_labels))
batch_size = 1
shuffle = False
window_size = 2
collate_fn = partial(custom_collate_fn, window_size=2, word_to_ix=word_to_ix)
test_loader = torch.utils.data.DataLoader(test_data,
                                          batch_size=1,
                                          shuffle=False,
                                          collate_fn=collate_fn)
Let's loop over our test examples and see how well we did.
In [92]:
for test_instance, labels, _ in test_loader:
    outputs = model.forward(test_instance)
    print(labels)
    print(outputs)
tensor([[0, 0, 0, 1]])
tensor([[0.0339, 0.1031, 0.0500, 0.9770]], grad_fn=<ViewBackward>)
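Since the model outputs a probability for each token, a simple way to turn these into hard 0/1 predictions is to threshold at 0.5. This is a small sketch added for this post, not part of the original notebook:

# Hypothetical follow-up: threshold the sigmoid probabilities at 0.5.
for test_instance, labels, _ in test_loader:
    outputs = model.forward(test_instance)
    predictions = (outputs > 0.5).long()
    print(predictions)   # for the output above this gives tensor([[0, 0, 0, 1]]), matching the labels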
(End of the tutorial.)