[Time-Series Forecasting] Application Examples of Long Short-Term Memory (LSTM) Networks

This article provides six example use cases of LSTMs: three application scenarios each for PyTorch and TensorFlow, all built on simulated data (so they can be run directly in any environment).

  • PyTorch examples
    1. Time-series prediction (predicting the next step of a sine wave)
    2. Text classification (simple binary classification)
    3. Sequence-to-sequence (Seq2Seq) prediction (outputting the input sequence reversed)
  • TensorFlow examples
    4. Time-series prediction (predicting the next step of a sine wave)
    5. Text classification (simple binary classification)
    6. Sequence-to-sequence (Seq2Seq) prediction (outputting the input sequence reversed)

Each example comes with runnable code and fairly detailed comments to help readers understand how LSTMs are used and implemented. The code runs on a CPU; for real problems, swap in a real dataset and adjust the model architecture and hyperparameters as needed.


I. PyTorch Examples

Example 1 (PyTorch): Time-Series Prediction

Approach

  • This example uses a noisy sine-wave sequence as simulated data and trains an LSTM to predict the value at the next time step.
  • We take 20 consecutive points as input and train the model to predict the 21st point.
  • This demonstrates a many-to-one time-series regression task.
```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# ----- 1. Simulated data ----- #
# Generate a sine wave with some random noise added
def generate_data(seq_length=1000, noise_amplitude=0.1):
    x = np.linspace(0, 20*np.pi, seq_length)
    y = np.sin(x) + noise_amplitude * np.random.randn(seq_length)
    return y

time_series = generate_data(seq_length=1200)  # generate 1200 points

# Input sequence length
input_size = 20  # use the previous 20 points to predict the 21st
data_x = []
data_y = []

for i in range(len(time_series) - input_size):
    data_x.append(time_series[i:i+input_size])
    data_y.append(time_series[i+input_size])  # the next point is the label

data_x = np.array(data_x)
data_y = np.array(data_y)

# Split into training and test sets
train_size = 1000
train_x = data_x[:train_size]
train_y = data_y[:train_size]
test_x = data_x[train_size:]
test_y = data_y[train_size:]

# Convert to PyTorch tensors
train_x_tensor = torch.tensor(train_x, dtype=torch.float32).unsqueeze(-1)  # (batch, seq_len, 1)
train_y_tensor = torch.tensor(train_y, dtype=torch.float32).unsqueeze(-1)  # (batch, 1)
test_x_tensor = torch.tensor(test_x, dtype=torch.float32).unsqueeze(-1)
test_y_tensor = torch.tensor(test_y, dtype=torch.float32).unsqueeze(-1)

# ----- 2. Define the LSTM model ----- #
class LSTMTimeSeries(nn.Module):
    def __init__(self, hidden_size=16, num_layers=1):
        super(LSTMTimeSeries, self).__init__()
        # Input dimension is 1 (one feature per time step); hidden size is tunable
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # final output is a single value

    def forward(self, x):
        # x shape: (batch, seq_len, 1)
        out, (h_n, c_n) = self.lstm(x)
        # out shape: (batch, seq_len, hidden_size)
        # Take the output at the last time step
        last_out = out[:, -1, :]  # (batch, hidden_size)
        out = self.fc(last_out)   # (batch, 1)
        return out

model = LSTMTimeSeries(hidden_size=32, num_layers=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# ----- 3. Train the model ----- #
epochs = 10
for epoch in range(epochs):
    # Forward pass
    pred = model(train_x_tensor)
    loss = criterion(pred, train_y_tensor)

    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 2 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# ----- 4. Evaluation and visualization ----- #
model.eval()
with torch.no_grad():
    test_pred = model(test_x_tensor).squeeze(-1).numpy()

# Plot predictions against the true series
plt.figure(figsize=(10,4))
plt.plot(range(len(time_series)), time_series, label='True Time Series')
plt.axvline(x=train_size+input_size, color='r', linestyle='--', label='Train/Test Split')
plt.plot(range(train_size+input_size, len(time_series)), test_pred, label='Predicted', color='orange')
plt.title("PyTorch LSTM - Time Series Prediction")
plt.legend()
plt.show()
```

Key Points

  1. Data generation: the generate_data function produces a sine wave plus noise to form the simulated time series.
  2. Sequence windowing: each window of 20 consecutive points is an input; the 21st point is the prediction target.
  3. LSTM model: the recurrent structure is built with nn.LSTM; batch_first=True makes the input layout (batch, seq_len, feature).
  4. Training: mean-squared-error loss (MSELoss) optimized iteratively with Adam.
  5. Visualization: test-set predictions are plotted against the true curve.

Example 2 (PyTorch): Text Classification

Approach

  • This example constructs a simulated text-classification task: random "sentences" are generated, with each word represented by a simple integer ID.
  • Labels fall into two classes (0 or 1). An LSTM reads each sentence, and the hidden state at the last time step is used for classification (many-to-one).
  • In practice the random IDs would be replaced with real text embeddings (e.g., word vectors or BERT vectors); this example only demonstrates the workflow (a sketch of loading pretrained embeddings follows the key points below).
```python
import torch
import torch.nn as nn
import numpy as np
import random

# ----- 1. Simulated text data ----- #
def generate_fake_text_data(num_samples=2000, vocab_size=50, max_len=10):
    # vocab_size = 50 means word IDs range over [0, 49]
    # max_len = 10 means each sentence has at most 10 words
    # Generate random "sentences" and random labels (0/1)
    data_x = []
    data_y = []
    for _ in range(num_samples):
        seq_len = random.randint(3, max_len)  # random sentence length
        sentence = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        label = random.randint(0, 1)
        data_x.append(sentence)
        data_y.append(label)
    return data_x, data_y

train_x, train_y = generate_fake_text_data(num_samples=1800, vocab_size=50, max_len=10)
test_x, test_y = generate_fake_text_data(num_samples=200, vocab_size=50, max_len=10)

# Pad sentences to the same length so they can be batched
def pad_sequences(sequences, max_len):
    padded_seqs = []
    for seq in sequences:
        if len(seq) < max_len:
            seq = seq + [0]*(max_len - len(seq))  # pad with 0
        else:
            seq = seq[:max_len]
        padded_seqs.append(seq)
    return np.array(padded_seqs)

max_seq_len = 10  # must match max_len used during data generation
train_x_padded = pad_sequences(train_x, max_seq_len)
test_x_padded = pad_sequences(test_x, max_seq_len)

train_x_tensor = torch.tensor(train_x_padded, dtype=torch.long)  # (batch, seq_len)
train_y_tensor = torch.tensor(train_y, dtype=torch.long)
test_x_tensor = torch.tensor(test_x_padded, dtype=torch.long)
test_y_tensor = torch.tensor(test_y, dtype=torch.long)

# ----- 2. Define the LSTM classifier ----- #
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32, num_layers=1, num_classes=2):
        super(LSTMClassifier, self).__init__()
        self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)
        self.lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x shape: (batch, seq_len)
        embedded = self.embedding(x)  # (batch, seq_len, embed_dim)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        # h_n shape: (num_layers, batch, hidden_size)
        # Classify using the hidden state of the last layer
        last_hidden = h_n[-1, :, :]  # (batch, hidden_size)
        logits = self.fc(last_hidden)  # (batch, num_classes)
        return logits

model = LSTMClassifier(vocab_size=50, embed_dim=16, hidden_size=32, num_layers=1, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# ----- 3. Train the model ----- #
epochs = 5
batch_size = 64

def get_batch(x, y, batch_size):
    for i in range(0, len(x), batch_size):
        yield x[i:i+batch_size], y[i:i+batch_size]

for epoch in range(epochs):
    model.train()
    total_loss = 0
    num_batches = 0
    for batch_x, batch_y in get_batch(train_x_tensor, train_y_tensor, batch_size):
        logits = model(batch_x)
        loss = criterion(logits, batch_y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        num_batches += 1
    # Average over the actual number of batches (integer division would
    # undercount when the last batch is partial)
    avg_loss = total_loss / num_batches
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")

# ----- 4. Evaluate the model ----- #
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for batch_x, batch_y in get_batch(test_x_tensor, test_y_tensor, batch_size):
        logits = model(batch_x)
        preds = torch.argmax(logits, dim=1)
        correct += (preds == batch_y).sum().item()
        total += batch_y.size(0)

print(f"Test Accuracy: {100 * correct/total:.2f}%")
```

Key Points

  1. Data generation: random integers simulate sentences (word IDs) and binary labels.
  2. Embedding layer: maps word IDs to learnable vector representations.
  3. LSTM classifier: the last layer's hidden state h_n feeds the classification head.
  4. Training loop: mini-batch training with periodic loss printouts to monitor convergence.
  5. Test accuracy: torch.argmax picks the predicted class, which is compared against the true labels. (Since both sentences and labels are random here, expect accuracy near 50%; the point is the workflow, not the score.)
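
As noted in the approach above, the randomly initialized embedding can be swapped for pretrained word vectors. Below is a minimal sketch using nn.Embedding.from_pretrained; the weight matrix here is a random stand-in, whereas a real task would load vectors (e.g., from GloVe files) aligned to your vocabulary:

```python
import torch
import torch.nn as nn

# Stand-in for a real pretrained matrix; shape must be (vocab_size, embed_dim)
pretrained_weights = torch.randn(50, 16)

# from_pretrained copies the weights in; freeze=False keeps them trainable
embedding = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)

# Swap it into the classifier defined above
model = LSTMClassifier(vocab_size=50, embed_dim=16, hidden_size=32)
model.embedding = embedding
```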

Example 3 (PyTorch): Sequence-to-Sequence (Seq2Seq) Prediction

Approach

  • In sequence-to-sequence tasks, the input and output sequences may be the same length (or different lengths).
  • Here we use a simple "reverse the sequence" task as a demonstration: given an input sequence [1, 2, 3, ...], the target output is its reverse [..., 3, 2, 1].
  • An LSTM-based encoder-decoder architecture is used:
    • The Encoder encodes the input sequence into a hidden state;
    • The Decoder decodes the output token by token from that hidden state.
```python
import torch
import torch.nn as nn
import numpy as np
import random

# ----- 1. Simulated data: sequence reversal ----- #
def generate_seq2seq_data(num_samples=2000, seq_len=5, vocab_size=10):
    # Each input sequence has fixed length seq_len, with elements in [1, vocab_size-1]
    # The target sequence is the input sequence reversed
    data_input = []
    data_target = []
    for _ in range(num_samples):
        seq = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        rev_seq = seq[::-1]
        data_input.append(seq)
        data_target.append(rev_seq)
    return data_input, data_target

seq_len = 5
train_input, train_target = generate_seq2seq_data(num_samples=1800, seq_len=seq_len, vocab_size=10)
test_input, test_target = generate_seq2seq_data(num_samples=200, seq_len=seq_len, vocab_size=10)

train_input_tensor = torch.tensor(train_input, dtype=torch.long)   # (batch, seq_len)
train_target_tensor = torch.tensor(train_target, dtype=torch.long) # (batch, seq_len)
test_input_tensor = torch.tensor(test_input, dtype=torch.long)
test_target_tensor = torch.tensor(test_target, dtype=torch.long)

# ----- 2. Define the encoder-decoder LSTM model ----- #
class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len, embed_dim)
        outputs, (h, c) = self.lstm(embedded)
        # outputs: (batch, seq_len, hidden_size)
        # h, c: (1, batch, hidden_size) for this single-layer LSTM
        return h, c

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden, cell):
        # x: (batch, 1), the input token at the current time step
        embedded = self.embedding(x)  # (batch, 1, embed_dim)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        # output: (batch, 1, hidden_size)
        logits = self.fc(output.squeeze(1))  # (batch, vocab_size)
        return logits, hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_size=32, seq_len=5):
        super(Seq2Seq, self).__init__()
        self.encoder = Encoder(vocab_size, embed_dim, hidden_size)
        self.decoder = Decoder(vocab_size, embed_dim, hidden_size)
        self.seq_len = seq_len

    def forward(self, src, trg=None, teacher_forcing_ratio=0.5):
        # src, trg shapes: (batch, seq_len)
        batch_size = src.size(0)
        # Encode
        h, c = self.encoder(src)

        # Decode, one time step at a time
        outputs = []
        # Initialize the decoder input with a <start_token>, simplified here to 0;
        # creating it on src.device keeps this correct on both CPU and GPU
        dec_input = torch.zeros((batch_size, 1), dtype=torch.long, device=src.device)

        for t in range(self.seq_len):
            logits, h, c = self.decoder(dec_input, h, c)
            pred_token = torch.argmax(logits, dim=1).unsqueeze(1)  # (batch, 1)
            outputs.append(logits.unsqueeze(1))  # (batch, 1, vocab_size)

            # Choose the next input: teacher forcing or the model's own prediction
            if trg is not None and random.random() < teacher_forcing_ratio:
                dec_input = trg[:, t].unsqueeze(1)
            else:
                dec_input = pred_token

        outputs = torch.cat(outputs, dim=1)  # (batch, seq_len, vocab_size)
        return outputs

# ----- 3. Train the model ----- #
vocab_size = 10  # must match the vocabulary size used in data generation
model = Seq2Seq(vocab_size=vocab_size, embed_dim=16, hidden_size=32, seq_len=seq_len)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 5
batch_size = 32

def get_batch(x, y, batch_size):
    for i in range(0, len(x), batch_size):
        yield x[i:i+batch_size], y[i:i+batch_size]

for epoch in range(epochs):
    model.train()
    total_loss = 0
    num_batches = 0
    for batch_x, batch_y in get_batch(train_input_tensor, train_target_tensor, batch_size):
        # Forward pass + loss computation
        logits = model(batch_x, batch_y, teacher_forcing_ratio=0.5)  # (batch, seq_len, vocab_size)
        # Cross-entropy against batch_y requires reshaping to (batch*seq_len, vocab_size)
        loss = criterion(logits.view(-1, vocab_size), batch_y.view(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        num_batches += 1
    avg_loss = total_loss / num_batches
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")

# ----- 4. Evaluation ----- #
model.eval()
with torch.no_grad():
    sample_input = test_input_tensor[0:2]  # take the first two samples as a demo
    sample_target = test_target_tensor[0:2]
    logits = model(sample_input, teacher_forcing_ratio=0.0)  # no teacher forcing
    preds = torch.argmax(logits, dim=2)  # (batch, seq_len)

    print("Sample input sequences:", sample_input)
    print("True reversed sequences:", sample_target)
    print("Model predicted sequences:", preds)
```

Key Points

  1. Data simulation: random digit sequences and their reversals.
  2. Encoder-Decoder: the Encoder compresses the input sequence into its final hidden state; the Decoder emits one token (here, a digit) per time step from that state.
  3. Teacher forcing: during training, the ground-truth target (trg[:, t]) is used as the next decoder input with some probability, which speeds up convergence.
  4. Evaluation: the Decoder's own predictions are fed back as the next inputs, showing what the model generates on its own.

II. TensorFlow Examples

The three examples below follow the same ideas as the PyTorch section, implemented in TensorFlow (via tf.keras). For brevity, simulated data is used throughout.

Example 4 (TensorFlow): Time-Series Prediction

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# ----- 1. Simulated data: sine-wave prediction ----- #
def generate_data(seq_length=1000, noise_amplitude=0.1):
    x = np.linspace(0, 20*np.pi, seq_length)
    y = np.sin(x) + noise_amplitude * np.random.randn(seq_length)
    return y

time_series = generate_data(seq_length=1200)
input_size = 20

data_x = []
data_y = []
for i in range(len(time_series) - input_size):
    data_x.append(time_series[i:i+input_size])
    data_y.append(time_series[i+input_size])

data_x = np.array(data_x)
data_y = np.array(data_y)

train_size = 1000
train_x = data_x[:train_size]
train_y = data_y[:train_size]
test_x = data_x[train_size:]
test_y = data_y[train_size:]

# Convert to TensorFlow tensors
train_x_tf = tf.convert_to_tensor(train_x[..., np.newaxis], dtype=tf.float32)  # (batch, seq_len, 1)
train_y_tf = tf.convert_to_tensor(train_y, dtype=tf.float32)  # (batch,)
test_x_tf = tf.convert_to_tensor(test_x[..., np.newaxis], dtype=tf.float32)
test_y_tf = tf.convert_to_tensor(test_y, dtype=tf.float32)

# ----- 2. Build the LSTM model ----- #
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(input_size, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

# ----- 3. Train the model ----- #
model.fit(train_x_tf, train_y_tf, epochs=10, batch_size=32)

# ----- 4. Evaluation & visualization ----- #
pred_test = model.predict(test_x_tf).squeeze()

plt.figure(figsize=(10,4))
plt.plot(range(len(time_series)), time_series, label='True Time Series')
plt.axvline(x=train_size+input_size, color='r', linestyle='--', label='Train/Test Split')
plt.plot(range(train_size+input_size, len(time_series)), pred_test, label='Predicted', color='orange')
plt.title("TensorFlow LSTM - Time Series Prediction")
plt.legend()
plt.show()
```

Key Points

  1. Data processing: as in the PyTorch version, a noisy sine wave is generated and windowed so that 20 steps predict the 21st.
  2. Model construction: tf.keras.Sequential with an LSTM layer followed by a Dense layer.
  3. Training: model.fit(...) handles the forward and backward passes automatically.
  4. Prediction: model.predict(...) produces the outputs, which are plotted against the true curve (a multi-step forecasting sketch follows below).
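
A common follow-up is multi-step forecasting: feed each prediction back in as the newest input point. Below is a minimal sketch reusing model, test_x, and input_size from the code above (the forecast function and its step count are illustrative choices; note that errors compound over long horizons):

```python
import numpy as np

# Recursive multi-step forecasting with the trained one-step model
def forecast(model, seed_window, steps=50):
    window = list(seed_window)  # last `input_size` observed points
    preds = []
    for _ in range(steps):
        x = np.array(window[-input_size:], dtype=np.float32).reshape(1, input_size, 1)
        next_val = float(model.predict(x, verbose=0)[0, 0])
        preds.append(next_val)
        window.append(next_val)  # the prediction becomes part of the next input
    return preds

future = forecast(model, test_x[-1], steps=50)
```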

Example 5 (TensorFlow): Text Classification

```python
import tensorflow as tf
import numpy as np
import random

# ----- 1. Simulated text data ----- #
def generate_fake_text_data(num_samples=2000, vocab_size=50, max_len=10):
    data_x = []
    data_y = []
    for _ in range(num_samples):
        seq_len = random.randint(3, max_len)
        sentence = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        label = random.randint(0, 1)
        data_x.append(sentence)
        data_y.append(label)
    return data_x, data_y

train_x, train_y = generate_fake_text_data(num_samples=1800, vocab_size=50, max_len=10)
test_x, test_y = generate_fake_text_data(num_samples=200, vocab_size=50, max_len=10)

def pad_sequences(sequences, max_len):
    padded_seqs = []
    for seq in sequences:
        if len(seq) < max_len:
            seq = seq + [0]*(max_len - len(seq))  # pad with 0
        else:
            seq = seq[:max_len]
        padded_seqs.append(seq)
    return np.array(padded_seqs)

max_seq_len = 10
train_x_padded = pad_sequences(train_x, max_seq_len)
test_x_padded = pad_sequences(test_x, max_seq_len)

train_x_tf = tf.convert_to_tensor(train_x_padded, dtype=tf.int32)
train_y_tf = tf.convert_to_tensor(train_y, dtype=tf.int32)
test_x_tf = tf.convert_to_tensor(test_x_padded, dtype=tf.int32)
test_y_tf = tf.convert_to_tensor(test_y, dtype=tf.int32)

# ----- 2. Define the LSTM classifier ----- #
vocab_size = 50
embed_dim = 16
hidden_size = 32

model = tf.keras.Sequential([
    # (no input_length argument: it was optional and has been removed in newer Keras versions)
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    tf.keras.layers.LSTM(hidden_size),
    tf.keras.layers.Dense(2, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# ----- 3. Train the model ----- #
model.fit(train_x_tf, train_y_tf, epochs=5, batch_size=32)

# ----- 4. Evaluate the model ----- #
loss, acc = model.evaluate(test_x_tf, test_y_tf, verbose=0)
print(f"Test Accuracy: {acc*100:.2f}%")
```

Key Points

  1. Embedding layer: tf.keras.layers.Embedding maps [batch, seq_len] inputs to [batch, seq_len, embed_dim] embedding vectors.
  2. LSTM layer: outputs the hidden state of the last time step, fed directly into the dense classification layer.
  3. Loss function: for binary or multi-class classification, use sparse_categorical_crossentropy (integer labels) or categorical_crossentropy (one-hot labels), as illustrated below.
  4. Evaluation: model.evaluate returns the loss and accuracy.
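
To make the distinction in point 3 concrete, here is a small sketch showing that the two losses compute the same per-sample values once the labels are in the matching format:

```python
import tensorflow as tf

probs = tf.constant([[0.2, 0.8],
                     [0.9, 0.1]])        # softmax outputs for 2 samples
y_int = tf.constant([1, 0])               # integer class labels
y_onehot = tf.one_hot(y_int, depth=2)     # one-hot version of the same labels

# sparse_categorical_crossentropy takes integer labels,
# categorical_crossentropy takes one-hot labels; the results match
print(tf.keras.losses.sparse_categorical_crossentropy(y_int, probs).numpy())
print(tf.keras.losses.categorical_crossentropy(y_onehot, probs).numpy())
```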

Example 6 (TensorFlow): Sequence-to-Sequence (Seq2Seq) Prediction

This example mirrors the PyTorch version: a "reverse the sequence" task demonstrating a simple encoder-decoder structure. For clarity, it is implemented by subclassing tf.keras.Model.

```python
import tensorflow as tf
import numpy as np
import random

# ----- 1. Generate data ----- #
def generate_seq2seq_data(num_samples=2000, seq_len=5, vocab_size=10):
    data_input = []
    data_target = []
    for _ in range(num_samples):
        seq = [random.randint(1, vocab_size-1) for _ in range(seq_len)]
        rev_seq = seq[::-1]
        data_input.append(seq)
        data_target.append(rev_seq)
    return np.array(data_input), np.array(data_target)

seq_len = 5
vocab_size = 10
train_input, train_target = generate_seq2seq_data(1800, seq_len, vocab_size)
test_input, test_target = generate_seq2seq_data(200, seq_len, vocab_size)

# ----- 2. Define the encoder-decoder model ----- #
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim, enc_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True, return_state=True)

    def call(self, x):
        x = self.embedding(x)
        outputs, state_h, state_c = self.lstm(x)
        return state_h, state_c

class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim, dec_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.lstm = tf.keras.layers.LSTM(dec_units, return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, x, hidden, cell):
        # x: (batch, 1)
        x = self.embedding(x)  # (batch, 1, embed_dim)
        outputs, state_h, state_c = self.lstm(x, initial_state=[hidden, cell])
        logits = self.fc(outputs)  # (batch, 1, vocab_size)
        return logits, state_h, state_c

class Seq2Seq(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim, units, seq_len):
        super().__init__()
        self.encoder = Encoder(vocab_size, embed_dim, units)
        self.decoder = Decoder(vocab_size, embed_dim, units)
        self.seq_len = seq_len

    def call(self, enc_input, dec_input=None, teacher_forcing=True):
        # enc_input: (batch, seq_len)
        batch_size = tf.shape(enc_input)[0]
        # Encode
        enc_h, enc_c = self.encoder(enc_input)

        outputs = []
        dec_x = tf.zeros((batch_size, 1), dtype=tf.int32)  # initial decoder input (0 here)

        # Decode, one time step at a time
        for t in range(self.seq_len):
            logits, enc_h, enc_c = self.decoder(dec_x, enc_h, enc_c)
            # logits: (batch, 1, vocab_size)
            logits = tf.squeeze(logits, axis=1)  # (batch, vocab_size)
            outputs.append(tf.expand_dims(logits, 1))  # (batch, 1, vocab_size)

            predicted_id = tf.argmax(logits, axis=1, output_type=tf.int32)  # (batch,)

            if teacher_forcing and dec_input is not None:
                # Use the ground-truth token with 50% probability
                coin = tf.random.uniform(shape=[])
                if coin < 0.5:
                    dec_x = tf.expand_dims(dec_input[:, t], 1)
                else:
                    dec_x = tf.expand_dims(predicted_id, 1)
            else:
                dec_x = tf.expand_dims(predicted_id, 1)

        final = tf.concat(outputs, axis=1)  # (batch, seq_len, vocab_size)
        return final

# ----- 3. Training loop ----- #
units = 32
embed_dim = 16
model = Seq2Seq(vocab_size, embed_dim, units, seq_len)

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(enc_inp, dec_inp, dec_tar):
    with tf.GradientTape() as tape:
        predictions = model(enc_inp, dec_inp, teacher_forcing=True)  # (batch, seq_len, vocab_size)
        loss = loss_object(dec_tar, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

epochs = 5
batch_size = 32

def get_batch(x, y, bs):
    for i in range(0, len(x), bs):
        yield x[i:i+bs], y[i:i+bs]

for epoch in range(epochs):
    total_loss = 0
    for enc_inp, dec_tar in get_batch(train_input, train_target, batch_size):
        enc_inp_tf = tf.convert_to_tensor(enc_inp, dtype=tf.int32)
        dec_inp_tf = tf.convert_to_tensor(dec_tar, dtype=tf.int32)  # targets double as teacher-forcing inputs
        dec_tar_tf = tf.convert_to_tensor(dec_tar, dtype=tf.int32)
        loss_val = train_step(enc_inp_tf, dec_inp_tf, dec_tar_tf)
        total_loss += loss_val.numpy()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss:.4f}")

# ----- 4. Evaluation ----- #
def predict(model, inp):
    enc_inp_tf = tf.convert_to_tensor(inp, dtype=tf.int32)
    predictions = model(enc_inp_tf, teacher_forcing=False)  # no teacher forcing
    preds_id = tf.argmax(predictions, axis=2)
    return preds_id

sample_input = test_input[:2]
sample_target = test_target[:2]
preds = predict(model, sample_input)

print("Sample input sequences:", sample_input)
print("True reversed sequences:", sample_target)
print("Model predicted sequences:", preds.numpy())
```

Key Points

  1. Encoder / Decoder: each is defined as a subclassed model. The Encoder returns the final hidden and cell states; the Decoder generates one output per step from the current input token and the previous states.
  2. Seq2Seq model: combines the Encoder and Decoder and implements the full forward logic, including teacher forcing, in its call method.
  3. Training loop: train_step uses tf.GradientTape to control backpropagation manually, computing the cross-entropy loss and updating the parameters.
  4. Prediction: with teacher forcing disabled, the Decoder can only use its own previous prediction as the next input.

Summary

The six examples above demonstrate how to use LSTMs in PyTorch and TensorFlow for different kinds of sequence tasks:

  1. Time-series prediction (many-to-one regression)
  2. Text classification (many-to-one classification)
  3. Sequence-to-sequence (many-to-many generation/translation)

They cover a range of common usage, from a simple network (single-layer LSTM + fully connected layer) to an encoder-decoder architecture, and use simulated data so that the code can be run and tested quickly in any environment.
For real applications, you will typically also need:

  • More elaborate data preprocessing (word embeddings, normalization, etc.)
  • More layers or larger hidden dimensions
  • Appropriate regularization (e.g., Dropout)
  • A more complete training strategy (learning-rate decay, early stopping, etc.; see the sketch below)
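
As a minimal sketch of the last two items in tf.keras terms (the layer sizes and patience values here are arbitrary illustrative choices, not recommendations):

```python
import tensorflow as tf

# Dropout inside the LSTM, plus early stopping and learning-rate decay
# handled by Keras callbacks during model.fit
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 1)),
    tf.keras.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

callbacks = [
    # Stop once validation loss has not improved for 5 epochs
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                     restore_best_weights=True),
    # Halve the learning rate after 3 stagnant epochs
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                         patience=3),
]
# model.fit(train_x, train_y, validation_split=0.2, epochs=100, callbacks=callbacks)
```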

Feel free to adapt all of this to your own business needs. I hope these examples help clarify the basic programming patterns and implementation details of LSTMs. Good luck with your learning and research!

### LSTM Application Scenarios

LSTM (Long Short-Term Memory) is a special RNN architecture that effectively mitigates the vanishing-gradient problem of traditional RNNs, and therefore excels at modeling long-range temporal dependencies. Its main application areas include, but are not limited to, the following:

#### Natural Language Processing (NLP)

LSTMs perform well in text generation, machine translation, and sentiment analysis. By capturing long-lived semantic information in context, an LSTM can understand and generate natural language more accurately[^1].

#### Time-Series Forecasting

For time-series analysis tasks such as stock-price prediction and weather forecasting, LSTMs are one of the tools of choice thanks to their strong pattern-recognition ability: they can learn the complex regularities hidden in historical data and make reasonably accurate extrapolations of future trends[^2].

#### Speech Recognition

Since audio signals are themselves a form of sequential data, LSTMs are equally suited to building efficient automatic speech recognition systems, playing an important role in transcribing human speech into text[^3].

### Implementing an LSTM

The following simple example, based on Python and the Keras library, shows how to build a basic LSTM model for a binary classification problem:

```python
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model = Sequential()

# The embedding layer turns integer-encoded words into dense vector representations
model.add(Embedding(input_dim=5000, output_dim=32))

# Add one LSTM layer with 128 units
model.add(LSTM(units=128, dropout=0.2, recurrent_dropout=0.2))

# The output layer uses a sigmoid activation for probability estimation
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print(model.summary())
```

This snippet builds a simple LSTM network consisting of an embedding layer, a single LSTM layer, and a final dense output layer with a sigmoid activation[^1].
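
As a quick smoke test, the model above can be fitted on random stand-in data (hypothetical shapes, not a real dataset; a real task would use tokenized text):

```python
import numpy as np

# Random stand-in data: 100 "sentences" of 20 word IDs in [0, 5000), binary labels
x_fake = np.random.randint(0, 5000, size=(100, 20))
y_fake = np.random.randint(0, 2, size=(100,))

# Train briefly just to verify the model is wired up correctly
model.fit(x_fake, y_fake, epochs=2, batch_size=16, validation_split=0.1)
```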