Implementing an LSTM classification model in PyTorch

Article link: https://blog.csdn.net/Sableye/article/details/110545298

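This article walks through how to implement an LSTM model for part-of-speech tagging in PyTorch. It covers the initialization parameters of the LSTM, the structure of its inputs and outputs, and how the input data is handled during training. The network combines the LSTM with an nn.Linear layer, which turns the LSTM output into scores for a multi-class classification task, here part-of-speech tagging.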


The original tutorial is here: Tutorial. It uses an LSTM to build a simple part-of-speech tagging model. Some detailed notes follow:

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
# The lines above import the required libraries and fix the random seed

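We first look at how to initialize an nn.LSTM instance and what its inputs and outputs are. The parameters that can be set when constructing nn.LSTM include input_size, hidden_size, num_layers, bias, batch_first, dropout and bidirectional.
The first two are the ones used most often. They give the dimension of the word vectors fed into the LSTM and the dimension of the output vectors (which is the same as that of the hidden state). num_layers is the number of LSTM layers stacked on top of one another, each layer passing its output sequence to the next layer as input.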

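This arrangement is called a stacked LSTM; with num_layers=2, two LSTM layers are stacked. bidirectional means the LSTM reads the sentence in both the forward and the backward direction, feeding in the word vectors in order. The two directions do not share a hidden state; each keeps its own.
[Figure: bidirectional LSTM]
The first line of the code below therefore creates a single-layer, unidirectional LSTM whose input and output dimensions are both 3.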
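To make the effect of num_layers and bidirectional concrete, the following is a minimal sketch that compares tensor shapes for a plain LSTM and a two-layer bidirectional one. The sizes (sequence length 5, batch size 1, dimension 3) are chosen only to match the tutorial's example, and the variable names are illustrative rather than taken from the original.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

seq_len, batch, input_size, hidden_size = 5, 1, 3, 3
x = torch.randn(seq_len, batch, input_size)  # (seq_len, batch, input_size)

# Single layer, one direction: equivalent to nn.LSTM(3, 3)
plain = nn.LSTM(input_size, hidden_size)
out, (h_n, c_n) = plain(x)
plain_shapes = (tuple(out.shape), tuple(h_n.shape), tuple(c_n.shape))

# Two stacked layers, read in both directions
stacked_bi = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)
out2, (h_n2, c_n2) = stacked_bi(x)
stacked_shapes = (tuple(out2.shape), tuple(h_n2.shape), tuple(c_n2.shape))

print(plain_shapes)    # ((5, 1, 3), (1, 1, 3), (1, 1, 3))
print(stacked_shapes)  # ((5, 1, 6), (4, 1, 3), (4, 1, 3))
```

In the bidirectional case the last dimension of the output doubles, because the forward and backward outputs are concatenated, and the first dimension of the final states becomes num_layers times the number of directions, since every layer and direction keeps its own state.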

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

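What do the inputs and outputs of this network look like? The basic job of an LSTM is to take a sentence (a sequence of word vectors) and move through it one word at a time, starting from the first. At each word it computes an output from the hidden state, the cell state and the current word vector, and then updates the hidden state and cell state. The input is therefore first of all a sequence of word vectors. An initial hidden state and cell state can also be supplied; if they are not, both are initialized to zeros. The call returns three tensors: the sequence of per-step outputs, one for each word in the sentence, followed by a tuple holding the final hidden state and the final cell state.
[Figure: inputs and outputs of nn.LSTM]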
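The input/output behaviour described above can be checked with a short sketch. It assumes the same 3-dimensional, single-layer LSTM as the tutorial; the names seq, out_all and out_steps are introduced here for illustration only. It runs the whole sequence in one call with the default zero initial state, then repeats the computation one step at a time while carrying the state forward, and compares the results.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

lstm = nn.LSTM(3, 3)
seq = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

# Whole sequence at once; no initial state is passed, so h_0 and c_0 are zeros.
out_all, (h_n, c_n) = lstm(seq)

# The same computation one word at a time, passing the state along explicitly.
state = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outputs = []
for t in range(seq.size(0)):
    out_t, state = lstm(seq[t].view(1, 1, -1), state)
    step_outputs.append(out_t)
out_steps = torch.cat(step_outputs, dim=0)

# Both routes give the same per-step outputs.
same_outputs = torch.allclose(out_all, out_steps, atol=1e-6)
# For a single-layer, unidirectional LSTM the last output is the final hidden state.
last_is_hidden = torch.allclose(out_all[-1], h_n[-1], atol=1e-6)

print(out_all.shape, h_n.shape, c_n.shape)
print(same_outputs, last_is_hidden)
```

The per-step output sequence gives access to every hidden state along the sentence, while the returned state tuple is what would be passed back in to continue the sequence later.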

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
          
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument  to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch