This section provides six example cases using LSTMs: three application scenarios each for PyTorch and TensorFlow, all built on simulated data so the code can run directly in any environment.
- PyTorch examples
  1. Time series prediction (predict the next step of a sine wave)
  2. Text classification (simple binary classification)
  3. Sequence-to-sequence (Seq2Seq) prediction (output the reversed input sequence)
- TensorFlow examples
  4. Time series prediction (predict the next step of a sine wave)
  5. Text classification (simple binary classification)
  6. Sequence-to-sequence (Seq2Seq) prediction (output the reversed input sequence)
Each case includes runnable example code with fairly detailed comments to help readers understand how LSTMs are used and implemented. The code runs on a CPU as-is; for real problems, substitute a real dataset and adjust the model architecture and hyperparameters as needed.
Part 1: PyTorch Examples
Case 1 (PyTorch): Time Series Prediction
Overview
- This example uses a noisy sine wave as simulated data and trains an LSTM to predict the value at the next time step.
- Twenty consecutive points serve as the input, and the model is trained to predict the 21st point.
- The case demonstrates a many-to-one time series regression task.
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# ----- 1. Simulated data ----- #
# Generate a sine wave and add some random noise
def generate_data(seq_length=1000, noise_amplitude=0.1):
    x = np.linspace(0, 20*np.pi, seq_length)
    y = np.sin(x) + noise_amplitude * np.random.randn(seq_length)
    return y
time_series = generate_data(seq_length=1200)  # generate 1200 points
# Set the input sequence length
input_size = 20  # use the previous 20 points to predict the 21st
data_x = []
data_y = []
for i in range(len(time_series) - input_size):
    data_x.append(time_series[i:i+input_size])
    data_y.append(time_series[i+input_size])  # the next point is the label
data_x = np.array(data_x)
data_y = np.array(data_y)
# Split into training and test sets
train_size = 1000
train_x = data_x[:train_size]
train_y = data_y[:train_size]
test_x = data_x[train_size:]
test_y = data_y[train_size:]
# Convert to PyTorch tensors
train_x_tensor = torch.tensor(train_x, dtype=torch.float32).unsqueeze(-1) # (batch, seq_len, 1)
train_y_tensor = torch.tensor(train_y, dtype=torch.float32).unsqueeze(-1) # (batch, 1)
test_x_tensor = torch.tensor(test_x, dtype=torch.float32).unsqueeze(-1)
test_y_tensor = torch.tensor(test_y, dtype=torch.float32).unsqueeze(-1)
# ----- 2. Define the LSTM model ----- #
class LSTMTimeSeries(nn.Module):
    def __init__(self, hidden_size=16, num_layers=1):
        super(LSTMTimeSeries, self).__init__()
        # Input dimension is 1 (one feature per time step); hidden size is configurable
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # final output is a single value

    def forward(self, x):
        # x shape: (batch, seq_len, 1)
        out, (h_n, c_n) = self.lstm(x)
        # out shape: (batch, seq_len, hidden_size)
        # Take the output of the last time step
        last_out = out[:, -1, :]   # (batch, hidden_size)
        out = self.fc(last_out)    # (batch, 1)
        return out
model = LSTMTimeSeries(hidden_size=32, num_layers=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# ----- 3. Train the model ----- #
epochs = 10
for epoch in range(epochs):
    # Forward pass
    pred = model(train_x_tensor)
    loss = criterion(pred, train_y_tensor)
    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 2 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")
# ----- 4. Evaluation and visualization ----- #
model.eval()
with torch.no_grad():
    test_pred = model(test_x_tensor).squeeze(-1).numpy()
# Plot predictions against the ground truth
plt.figure(figsize=(10,4))
plt.plot(range(len(time_series)), time_series, label='True Time Series')
plt.axvline(x=train_size+input_size, color='r', linestyle='--', label='Train/Test Split')
plt.plot(range(train_size+input_size, len(time_series)), test_pred, label='Predicted', color='orange')
plt.title("PyTorch LSTM - Time Series Prediction")
plt.legend()
plt.show()
Key points
- Data generation: the generate_data function produces a sine wave plus noise to form a simulated time series.
- Sequence slicing: the previous 20 points are the input; the 21st point is the prediction target.
- LSTM model: the recurrent structure is built with nn.LSTM; batch_first=True makes the input shape (batch, seq_len, feature).
- Training: mean squared error (MSELoss) is the loss function, optimized with Adam; a mini-batch variant is sketched below.
- Visualization: predictions on the test set are plotted against the true curve.
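The training loop above pushes the entire training set through the model in a single batch. For larger datasets, mini-batch training is more common; below is a minimal sketch (not part of the original example) using torch.utils.data.TensorDataset and DataLoader with the tensors and objects defined above:

from torch.utils.data import TensorDataset, DataLoader

# Wrap the existing tensors in a dataset and iterate in mini-batches
train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

for epoch in range(epochs):
    for batch_x, batch_y in train_loader:
        pred = model(batch_x)            # (batch, 1)
        loss = criterion(pred, batch_y)  # MSE against the next-step target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()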
Case 2 (PyTorch): Text Classification
Overview
- This example constructs a simulated text classification task: random "sentences" are generated, with each word represented by an integer ID.
- Labels fall into two classes (0 or 1). An LSTM reads the sentence, the hidden state at the last time step is taken, and a classifier is applied (many-to-one).
- In practice this can be replaced with real text embeddings (e.g. word vectors or BERT vectors); here we only demonstrate the workflow.
import torch
import torch.nn as nn
import numpy as np
import random
# ----- 1. Simulated text data ----- #
def generate_fake_text_data(num_samples=2000, vocab_size=50, max_len=10):
    # vocab_size = 50 means word IDs fall in [0, 49]
    # max_len = 10 means each sentence has at most 10 words
    # Generate random "sentences" and random labels (0/1)
    data_x = []
    data_y = []
    for _ in range(num_samples):
        seq_len = random.randint(3, max_len)  # random sentence length
        sentence = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        label = random.randint(0, 1)
        data_x.append(sentence)
        data_y.append(label)
    return data_x, data_y
train_x, train_y = generate_fake_text_data(num_samples=1800, vocab_size=50, max_len=10)
test_x, test_y = generate_fake_text_data(num_samples=200, vocab_size=50, max_len=10)
# Pad the sentences to the same length so they can be batched
def pad_sequences(sequences, max_len):
    padded_seqs = []
    for seq in sequences:
        if len(seq) < max_len:
            seq = seq + [0]*(max_len - len(seq))  # pad with 0
        else:
            seq = seq[:max_len]
        padded_seqs.append(seq)
    return np.array(padded_seqs)
max_seq_len = 10  # must match max_len used when generating the data
train_x_padded = pad_sequences(train_x, max_seq_len)
test_x_padded = pad_sequences(test_x, max_seq_len)
train_x_tensor = torch.tensor(train_x_padded, dtype=torch.long) # (batch, seq_len)
train_y_tensor = torch.tensor(train_y, dtype=torch.long)
test_x_tensor = torch.tensor(test_x_padded, dtype=torch.long)
test_y_tensor = torch.tensor(test_y, dtype=torch.long)
# ----- 2. Define the LSTM classification model ----- #
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32, num_layers=1, num_classes=2):
        super(LSTMClassifier, self).__init__()
        self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)
        self.lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x shape: (batch, seq_len)
        embedded = self.embedding(x)  # (batch, seq_len, embed_dim)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        # h_n shape: (num_layers, batch, hidden_size)
        # Take the hidden state of the last layer for classification
        last_hidden = h_n[-1, :, :]    # (batch, hidden_size)
        logits = self.fc(last_hidden)  # (batch, num_classes)
        return logits
model = LSTMClassifier(vocab_size=50, embed_dim=16, hidden_size=32, num_layers=1, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# ----- 3. Train the model ----- #
epochs = 5
batch_size = 64
def get_batch(x, y, batch_size):
for i in range(0, len(x), batch_size):
yield x[i:i+batch_size], y[i:i+batch_size]
for epoch in range(epochs):
model.train()
total_loss = 0
for batch_x, batch_y in get_batch(train_x_tensor, train_y_tensor, batch_size):
logits = model(batch_x)
loss = criterion(logits, batch_y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
total_loss += loss.item()
avg_loss = total_loss / (len(train_x_tensor) // batch_size)
print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")
# ----- 4. Evaluate the model ----- #
model.eval()
correct = 0
total = 0
with torch.no_grad():
for batch_x, batch_y in get_batch(test_x_tensor, test_y_tensor, batch_size):
logits = model(batch_x)
preds = torch.argmax(logits, dim=1)
correct += (preds == batch_y).sum().item()
total += batch_y.size(0)
print(f"Test Accuracy: {100 * correct/total:.2f}%")
Key points
- Data generation: random numbers simulate sentences (word IDs) and binary labels.
- Embedding layer: maps word IDs to learnable vector representations.
- LSTM classifier: the last layer's hidden state h_n is used for classification.
- Training loop: mini-batch training, printing the loss periodically to monitor convergence.
- Test accuracy: torch.argmax extracts the predicted class, which is compared with the true label.
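In this example the zero-padded positions are also fed through the LSTM. A common refinement is to pack the sequences so the recurrence skips the padding. The sketch below is an illustration under assumptions (the helper forward_packed and the lengths argument are hypothetical, not part of the original model); it reuses the layers of the LSTMClassifier defined above.

from torch.nn.utils.rnn import pack_padded_sequence

def forward_packed(model, x, lengths):
    # x: (batch, seq_len) padded word IDs; lengths: list / CPU tensor of true lengths
    embedded = model.embedding(x)                       # (batch, seq_len, embed_dim)
    packed = pack_padded_sequence(embedded, lengths,
                                  batch_first=True, enforce_sorted=False)
    _, (h_n, _) = model.lstm(packed)                    # h_n: (num_layers, batch, hidden)
    return model.fc(h_n[-1])                            # (batch, num_classes)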
Case 3 (PyTorch): Sequence-to-Sequence (Seq2Seq) Prediction
Overview
- In sequence-to-sequence tasks, the input and output sequences may or may not have the same length.
- Here a simple "reverse the sequence" task serves as the demonstration: given an input sequence of digits [1,2,3,...], the target is its reverse [...,3,2,1].
- A simple Encoder-Decoder architecture is built with LSTMs:
  - the Encoder encodes the input sequence into a hidden state;
  - the Decoder decodes the output token by token from that hidden state.
import torch
import torch.nn as nn
import numpy as np
import random
# ----- 1. Simulated data: reverse the sequence ----- #
def generate_seq2seq_data(num_samples=2000, seq_len=5, vocab_size=10):
    # Each input sequence has fixed length seq_len; each element lies in [1, vocab_size-1]
    # The output sequence is the reversed input sequence
    data_input = []
    data_target = []
    for _ in range(num_samples):
        seq = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        rev_seq = seq[::-1]
        data_input.append(seq)
        data_target.append(rev_seq)
    return data_input, data_target
seq_len = 5
train_input, train_target = generate_seq2seq_data(num_samples=1800, seq_len=seq_len, vocab_size=10)
test_input, test_target = generate_seq2seq_data(num_samples=200, seq_len=seq_len, vocab_size=10)
train_input_tensor = torch.tensor(train_input, dtype=torch.long) # (batch, seq_len)
train_target_tensor = torch.tensor(train_target, dtype=torch.long) # (batch, seq_len)
test_input_tensor = torch.tensor(test_input, dtype=torch.long)
test_target_tensor = torch.tensor(test_target, dtype=torch.long)
# ----- 2. Define the Encoder-Decoder LSTM model ----- #
class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len, embed_dim)
        outputs, (h, c) = self.lstm(embedded)
        # outputs: (batch, seq_len, hidden_size)
        # h, c: (1, batch, hidden_size) for this single-layer LSTM
        return h, c
class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden, cell):
        # x: (batch, 1), the input token at the current time step
        embedded = self.embedding(x)  # (batch, 1, embed_dim)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        # output: (batch, 1, hidden_size)
        logits = self.fc(output.squeeze(1))  # (batch, vocab_size)
        return logits, hidden, cell
class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32, seq_len=5):
        super(Seq2Seq, self).__init__()
        self.encoder = Encoder(vocab_size, embed_dim, hidden_size)
        self.decoder = Decoder(vocab_size, embed_dim, hidden_size)
        self.seq_len = seq_len

    def forward(self, src, trg=None, teacher_forcing_ratio=0.5):
        # src, trg shapes: (batch, seq_len)
        batch_size = src.size(0)
        # Encode
        h, c = self.encoder(src)
        # Decode, one time step at a time
        outputs = []
        # Initialize the decoder input with a <start> token, simplified here to 0
        dec_input = torch.zeros((batch_size, 1), dtype=torch.long)
        if src.is_cuda:
            dec_input = dec_input.cuda()
        for t in range(self.seq_len):
            logits, h, c = self.decoder(dec_input, h, c)
            pred_token = torch.argmax(logits, dim=1).unsqueeze(1)  # (batch, 1)
            outputs.append(logits.unsqueeze(1))                    # (batch, 1, vocab_size)
            # Choose the next input: teacher forcing or the model's own prediction
            if trg is not None and random.random() < teacher_forcing_ratio:
                dec_input = trg[:, t].unsqueeze(1)
            else:
                dec_input = pred_token
        outputs = torch.cat(outputs, dim=1)  # (batch, seq_len, vocab_size)
        return outputs
# ----- 3. Train the model ----- #
vocab_size = 10  # must match the vocabulary size used when generating the data
model = Seq2Seq(vocab_size=vocab_size, embed_dim=16, hidden_size=32, seq_len=seq_len)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epochs = 5
batch_size = 32
def get_batch(x, y, batch_size):
for i in range(0, len(x), batch_size):
yield x[i:i+batch_size], y[i:i+batch_size]
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in get_batch(train_input_tensor, train_target_tensor, batch_size):
        # Forward pass and loss computation
        logits = model(batch_x, batch_y, teacher_forcing_ratio=0.5)  # (batch, seq_len, vocab_size)
        # Cross-entropy against batch_y; reshape the logits to (batch*seq_len, vocab_size)
        loss = criterion(logits.view(-1, vocab_size), batch_y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / (len(train_input_tensor) // batch_size)
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")
# ----- 4. Evaluation ----- #
model.eval()
with torch.no_grad():
    sample_input = test_input_tensor[0:2]   # take the first two samples for a demo
    sample_target = test_target_tensor[0:2]
    logits = model(sample_input, teacher_forcing_ratio=0.0)  # no teacher forcing
    preds = torch.argmax(logits, dim=2)     # (batch, seq_len)
print("Sample input sequences:", sample_input)
print("True reversed sequences:", sample_target)
print("Model predicted sequences:", preds)
Key points
- Data simulation: digit sequences and their reversals are generated at random.
- Encoder-Decoder: the Encoder compresses the input sequence into its final hidden state; the Decoder emits one token (a digit here) per time step from that state.
- Teacher forcing: during training, the true target (trg[:, t]) is used as the next decoder input with a certain probability, which speeds up convergence.
- Evaluation: only the Decoder's own predictions are fed back as the next input, to inspect the quality of the generated sequence.
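The teacher_forcing_ratio stays at 0.5 for the whole run. A common variant, shown as a rough sketch below (the start/end ratios are illustrative assumptions, not part of the original example), anneals the ratio over epochs so the decoder gradually learns to rely on its own predictions:

start_ratio, end_ratio = 0.9, 0.1  # illustrative values

for epoch in range(epochs):
    # Linearly anneal the teacher forcing ratio from start_ratio to end_ratio
    ratio = start_ratio + (end_ratio - start_ratio) * epoch / max(epochs - 1, 1)
    for batch_x, batch_y in get_batch(train_input_tensor, train_target_tensor, batch_size):
        logits = model(batch_x, batch_y, teacher_forcing_ratio=ratio)
        loss = criterion(logits.view(-1, vocab_size), batch_y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()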
Part 2: TensorFlow Examples
The following three cases mirror the PyTorch ones, implemented instead with TensorFlow (via tf.keras). For brevity, simulated data is again used throughout.
Case 4 (TensorFlow): Time Series Prediction
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# ----- 1. Simulated data: sine wave prediction ----- #
def generate_data(seq_length=1000, noise_amplitude=0.1):
x = np.linspace(0, 20*np.pi, seq_length)
y = np.sin(x) + noise_amplitude * np.random.randn(seq_length)
return y
time_series = generate_data(seq_length=1200)
input_size = 20
data_x = []
data_y = []
for i in range(len(time_series) - input_size):
data_x.append(time_series[i:i+input_size])
data_y.append(time_series[i+input_size])
data_x = np.array(data_x)
data_y = np.array(data_y)
train_size = 1000
train_x = data_x[:train_size]
train_y = data_y[:train_size]
test_x = data_x[train_size:]
test_y = data_y[train_size:]
# Convert to TensorFlow tensors
train_x_tf = tf.convert_to_tensor(train_x[..., np.newaxis], dtype=tf.float32) # (batch, seq_len, 1)
train_y_tf = tf.convert_to_tensor(train_y, dtype=tf.float32) # (batch,)
test_x_tf = tf.convert_to_tensor(test_x[..., np.newaxis], dtype=tf.float32)
test_y_tf = tf.convert_to_tensor(test_y, dtype=tf.float32)
# ----- 2. Build the LSTM model ----- #
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(input_size, 1)),
tf.keras.layers.LSTM(32),
tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
# ----- 3. Train the model ----- #
model.fit(train_x_tf, train_y_tf, epochs=10, batch_size=32)
# ----- 4. Evaluation & visualization ----- #
pred_test = model.predict(test_x_tf).squeeze()
plt.figure(figsize=(10,4))
plt.plot(range(len(time_series)), time_series, label='True Time Series')
plt.axvline(x=train_size+input_size, color='r', linestyle='--', label='Train/Test Split')
plt.plot(range(train_size+input_size, len(time_series)), pred_test, label='Predicted', color='orange')
plt.title("TensorFlow LSTM - Time Series Prediction")
plt.legend()
plt.show()
Key points
- Data processing: same as the PyTorch version, a noisy sine wave is generated and sliced into 20-step windows to predict the 21st step.
- Model construction: tf.keras.Sequential + LSTM + Dense.
- Training: model.fit(...) handles the forward and backward passes automatically (an early-stopping variant is sketched below).
- Prediction: model.predict(...) produces outputs that are plotted against the true curve.
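model.fit above runs a fixed number of epochs with no monitoring. An optional refinement (a sketch, not part of the original example) is to hold out part of the training data and stop early once the validation loss stops improving; the patience value is illustrative:

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                              restore_best_weights=True)
history = model.fit(train_x_tf, train_y_tf,
                    epochs=50, batch_size=32,
                    validation_split=0.1,      # hold out 10% of the training data
                    callbacks=[early_stop])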
Case 5 (TensorFlow): Text Classification
import tensorflow as tf
import numpy as np
import random
# ----- 1. Simulated text data ----- #
def generate_fake_text_data(num_samples=2000, vocab_size=50, max_len=10):
data_x = []
data_y = []
for _ in range(num_samples):
seq_len = random.randint(3, max_len)
sentence = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
label = random.randint(0,1)
data_x.append(sentence)
data_y.append(label)
return data_x, data_y
train_x, train_y = generate_fake_text_data(num_samples=1800, vocab_size=50, max_len=10)
test_x, test_y = generate_fake_text_data(num_samples=200, vocab_size=50, max_len=10)
def pad_sequences(sequences, max_len):
padded_seqs = []
for seq in sequences:
if len(seq) < max_len:
seq = seq + [0]*(max_len - len(seq))
else:
seq = seq[:max_len]
padded_seqs.append(seq)
return np.array(padded_seqs)
max_seq_len = 10
train_x_padded = pad_sequences(train_x, max_seq_len)
test_x_padded = pad_sequences(test_x, max_seq_len)
train_x_tf = tf.convert_to_tensor(train_x_padded, dtype=tf.int32)
train_y_tf = tf.convert_to_tensor(train_y, dtype=tf.int32)
test_x_tf = tf.convert_to_tensor(test_x_padded, dtype=tf.int32)
test_y_tf = tf.convert_to_tensor(test_y, dtype=tf.int32)
# ----- 2. Define the LSTM classification model ----- #
vocab_size = 50
embed_dim = 16
hidden_size = 32
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    tf.keras.layers.LSTM(hidden_size),
    tf.keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# ----- 3. Train the model ----- #
model.fit(train_x_tf, train_y_tf, epochs=5, batch_size=32)
# ----- 4. Evaluate the model ----- #
loss, acc = model.evaluate(test_x_tf, test_y_tf, verbose=0)
print(f"Test Accuracy: {acc*100:.2f}%")
Key points
- Embedding layer: tf.keras.layers.Embedding maps inputs of shape [batch, seq_len] to embeddings of shape [batch, seq_len, embed_dim].
- LSTM layer: outputs the hidden state of the last time step, which feeds directly into a fully connected layer for classification.
- Loss function: for binary or multi-class classification, use sparse_categorical_crossentropy or categorical_crossentropy (depending on whether labels are integer IDs or one-hot vectors).
- Evaluation: model.evaluate returns the loss and accuracy.
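The padding here is done with a hand-written pad_sequences helper. Keras ships an equivalent utility that could be used instead; the sketch below assumes a reasonably recent TF 2.x (in newer releases the same function is also exposed as tf.keras.utils.pad_sequences), with padding='post' / truncating='post' matching the behaviour of the custom function above:

# Pad (or truncate) every sentence to max_seq_len with trailing zeros
train_x_padded = tf.keras.preprocessing.sequence.pad_sequences(
    train_x, maxlen=max_seq_len, padding='post', truncating='post', value=0)
test_x_padded = tf.keras.preprocessing.sequence.pad_sequences(
    test_x, maxlen=max_seq_len, padding='post', truncating='post', value=0)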
Case 6 (TensorFlow): Sequence-to-Sequence (Seq2Seq) Prediction
This example mirrors the PyTorch version: a "reverse the sequence" task demonstrating a simple Encoder-Decoder structure. For clarity, it is implemented by subclassing tf.keras.Model.
import tensorflow as tf
import numpy as np
import random
# ----- 1. Generate the data ----- #
def generate_seq2seq_data(num_samples=2000, seq_len=5, vocab_size=10):
data_input = []
data_target = []
for _ in range(num_samples):
seq = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
rev_seq = seq[::-1]
data_input.append(seq)
data_target.append(rev_seq)
return np.array(data_input), np.array(data_target)
seq_len = 5
vocab_size = 10
train_input, train_target = generate_seq2seq_data(1800, seq_len, vocab_size)
test_input, test_target = generate_seq2seq_data(200, seq_len, vocab_size)
# ----- 2. Define the Encoder-Decoder model ----- #
class Encoder(tf.keras.Model):
def __init__(self, vocab_size, embed_dim, enc_units):
super().__init__()
self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
self.lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True, return_state=True)
def call(self, x):
x = self.embedding(x)
outputs, state_h, state_c = self.lstm(x)
return state_h, state_c
class Decoder(tf.keras.Model):
def __init__(self, vocab_size, embed_dim, dec_units):
super().__init__()
self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
self.lstm = tf.keras.layers.LSTM(dec_units, return_sequences=True, return_state=True)
self.fc = tf.keras.layers.Dense(vocab_size)
def call(self, x, hidden, cell):
# x: (batch, 1)
x = self.embedding(x) # (batch, 1, embed_dim)
outputs, state_h, state_c = self.lstm(x, initial_state=[hidden, cell])
logits = self.fc(outputs) # (batch, 1, vocab_size)
return logits, state_h, state_c
class Seq2Seq(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim, units, seq_len):
        super().__init__()
        self.encoder = Encoder(vocab_size, embed_dim, units)
        self.decoder = Decoder(vocab_size, embed_dim, units)
        self.seq_len = seq_len

    def call(self, enc_input, dec_input=None, teacher_forcing=True):
        # enc_input: (batch, seq_len)
        batch_size = tf.shape(enc_input)[0]
        # Encode
        enc_h, enc_c = self.encoder(enc_input)
        outputs = []
        dec_x = tf.zeros((batch_size, 1), dtype=tf.int32)  # initial decoder input (0 stands in for <start>)
        # When enabled, teacher forcing mixes dec_input into the decoding loop below
        # Decode step by step
        for t in range(self.seq_len):
            logits, enc_h, enc_c = self.decoder(dec_x, enc_h, enc_c)
            # logits: (batch, 1, vocab_size)
            logits = tf.squeeze(logits, axis=1)        # (batch, vocab_size)
            outputs.append(tf.expand_dims(logits, 1))  # (batch, 1, vocab_size)
            predicted_id = tf.argmax(logits, axis=1, output_type=tf.int32)  # (batch,)
            if teacher_forcing and dec_input is not None:
                # Use the ground-truth token with 50% probability
                coin = tf.random.uniform(shape=[])
                if coin < 0.5:
                    dec_x = tf.expand_dims(dec_input[:, t], 1)
                else:
                    dec_x = tf.expand_dims(predicted_id, 1)
            else:
                dec_x = tf.expand_dims(predicted_id, 1)
        final = tf.concat(outputs, axis=1)  # (batch, seq_len, vocab_size)
        return final
# ----- 3. Training loop ----- #
units = 32
embed_dim = 16
model = Seq2Seq(vocab_size, embed_dim, units, seq_len)
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
@tf.function
def train_step(enc_inp, dec_inp, dec_tar):
with tf.GradientTape() as tape:
predictions = model(enc_inp, dec_inp, teacher_forcing=True) # (batch, seq_len, vocab_size)
loss = loss_object(dec_tar, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
return loss
epochs = 5
batch_size = 32
def get_batch(x, y, bs):
for i in range(0, len(x), bs):
yield x[i:i+bs], y[i:i+bs]
for epoch in range(epochs):
    total_loss = 0
    for enc_inp, dec_tar in get_batch(train_input, train_target, batch_size):
        enc_inp_tf = tf.convert_to_tensor(enc_inp, dtype=tf.int32)
        dec_inp_tf = tf.convert_to_tensor(dec_tar, dtype=tf.int32)  # the target doubles as the teacher-forcing input
        dec_tar_tf = tf.convert_to_tensor(dec_tar, dtype=tf.int32)
        loss_val = train_step(enc_inp_tf, dec_inp_tf, dec_tar_tf)
        total_loss += loss_val.numpy()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss:.4f}")
# ----- 4. Evaluation ----- #
def predict(model, inp):
    enc_inp_tf = tf.convert_to_tensor(inp, dtype=tf.int32)
    predictions = model(enc_inp_tf, teacher_forcing=False)  # no teacher forcing
    preds_id = tf.argmax(predictions, axis=2)
    return preds_id
sample_input = test_input[:2]
sample_target = test_target[:2]
preds = predict(model, sample_input)
print("Sample input sequences:", sample_input)
print("True reversed sequences:", sample_target)
print("Model predicted sequences:", preds.numpy())
Key points
- Encoder / Decoder: defined as subclassed models; the Encoder returns the final hidden and cell states, and the Decoder generates output step by step from the current input token and the previous states.
- Seq2Seq main model: combines the Encoder and Decoder and implements the full forward logic, including teacher forcing, in the call method.
- Training loop: train_step uses tf.GradientTape to control backpropagation manually, computing the cross-entropy loss and updating the parameters.
- Prediction: with teacher forcing disabled, the Decoder can only feed its own previous prediction back in as the next input.
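The loop above batches the NumPy arrays with the hand-written get_batch generator and converts each slice to tensors by hand. As an optional alternative (a sketch, not part of the original code), tf.data can take over shuffling, batching, and tensor conversion:

# Build a shuffled, batched input pipeline from the NumPy arrays
dataset = tf.data.Dataset.from_tensor_slices(
    (train_input.astype(np.int32), train_target.astype(np.int32)))
dataset = dataset.shuffle(buffer_size=len(train_input)).batch(batch_size)

for epoch in range(epochs):
    total_loss = 0.0
    for enc_inp_tf, dec_tar_tf in dataset:
        # Reuse the target as the teacher-forcing input, as in the original loop
        total_loss += train_step(enc_inp_tf, dec_tar_tf, dec_tar_tf).numpy()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss:.4f}")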
Summary
The six examples above show how LSTMs handle different kinds of sequence tasks in PyTorch and TensorFlow:
- Time series prediction (many-to-one regression)
- Text classification (many-to-one classification)
- Sequence-to-sequence (many-to-many generation/translation)
They cover common usage patterns ranging from simple networks (a single LSTM layer plus a fully connected layer) to Encoder-Decoder architectures, and rely on simulated data so the code runs and can be tested quickly in any environment.
Real applications typically also need:
- more elaborate data preprocessing (word embeddings, normalization, etc.)
- more layers or larger hidden dimensions
- appropriate regularization (such as Dropout); see the sketch after this list
- a more complete training strategy (learning-rate decay, early stopping, etc.)
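As one concrete illustration of the last two points, here is a minimal PyTorch-flavoured sketch; the layer sizes, dropout rate, and schedule are placeholder values, not recommendations:

import torch
import torch.nn as nn

# Dropout between stacked LSTM layers (only applies when num_layers > 1)
lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2,
               batch_first=True, dropout=0.2)

optimizer = torch.optim.Adam(lstm.parameters(), lr=0.01)
# Halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... run the usual forward / backward / optimizer.step() here ...
    scheduler.step()  # update the learning rate once per epoch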
Readers can adapt all of this freely to their own needs. Hopefully these cases help clarify the basic programming patterns and implementation details of LSTMs. Good luck with your studies and research!