Sequence Models: A Summary

你好，李不理

Categories: Dive into Deep Learning NLP (动手深度学习NLP), Pytorch. Tags: natural language processing, deep learning

Article link: https://blog.csdn.net/wLtyh/article/details/124676652

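Note: this post is based entirely on Li Mu's course Dive into Deep Learning (《动手深度学习》), which can be watched on Bilibili. The post itself is not well written, so feel free to skip it and go straight to Li Mu's course. It does include a plotting function, My_plot, that may be useful, so have a look at that.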

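Preface

1. Sequence Models

My own understanding
Sequence model: a mathematical model built to solve problems that involve sequential data. Sequential data is data with a temporal structure, such as music, speech, text, and video.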

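1.1 Statistical Tools

Suppose we know the time series $x_1, x_2, \ldots, x_{t-1}$ and need to predict the value $x_t$ at time step $t$. With statistical tools, the prediction is set up as: $x_t \sim P(x_t \mid x_{t-1}, \ldots, x_1)$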

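1.1.1 Autoregressive Models

When the sequence is long, using all of $x_1, \ldots, x_{t-1}$ to predict $x_t$ becomes computationally expensive, so the rest of this section focuses on how to estimate $P(x_t \mid x_{t-1}, \ldots, x_1)$ efficiently. In short, there are two strategies.
Strategy 1: fix a window length $\tau$ and use only the observations $x_{t-1}, \ldots, x_{t-\tau}$ to predict the value $x_t$ at time step $t$. Such models are called autoregressive models.
Strategy 2: as shown in the figure below, keep a summary $h_t$ of the past observations, and update both the prediction $\hat{x}_t$ and the summary $h_t$. This yields a model that estimates $x_t$ via $\hat{x}_t = P(x_t \mid h_t)$ and updates the summary via $h_t = g(h_{t-1}, x_{t-1})$. Because $h_t$ is never observed, such models are also called latent autoregressive models.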
[Figure: latent autoregressive model]
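The second strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the update function $g$ is taken to be an exponential moving average, the summary is initialized with the first observation, and the prediction is a point estimate $\hat{x}_t = h_t$ rather than a full distribution. None of these choices come from the course; `latent_forecasts` and `alpha` are hypothetical names.

```python
def latent_forecasts(xs, alpha=0.5):
    """One-step forecasts from a running summary h_t.

    Assumed (hypothetical) update: h_t = alpha * h_{t-1} + (1 - alpha) * x_{t-1},
    with the point prediction x_hat_t = h_t.
    """
    h = xs[0]  # assumed initialization: the summary starts at the first observation
    forecasts = []
    for x_prev in xs:
        h = alpha * h + (1 - alpha) * x_prev  # h_t = g(h_{t-1}, x_{t-1})
        forecasts.append(h)                   # forecast for the step after x_prev
    return forecasts

forecasts = latent_forecasts([1.0, 2.0, 3.0], alpha=0.5)
print(forecasts)
```

The point of the sketch is that the model never looks back at the whole history: each step only needs the previous summary and the latest observation.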
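How do we generate training data?
A classic approach is to use the historical observations to predict the next observation, which corresponds to the factorization $P(x_1, \ldots, x_T) = \prod_{t=1}^T P(x_t \mid x_{t-1}, \ldots, x_1)$
Note that the same reasoning still holds when we deal with discrete objects (such as words) rather than continuous numbers. The only difference is that for discrete objects we need a classifier rather than a regression model to estimate $P(x_t \mid x_{t-1}, \ldots, x_1)$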
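As a worked instance of the factorization above (taking $T = 3$ purely for illustration), the product expands to:

$$P(x_1, x_2, x_3) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_2, x_1)$$

Each factor pairs a history with the observation that follows it, so a single sequence of length 3 yields three (history, next observation) training examples.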

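1.1.2 Markov Models

Recall that in the autoregressive approximation we use $x_{t-1}, \ldots, x_{t-\tau}$ rather than $x_{t-1}, \ldots, x_1$ to estimate $x_t$. Whenever this approximation is accurate, we say that the sequence satisfies the Markov condition. In particular, if $\tau = 1$ we obtain a first-order Markov model, and $P(x)$ is given by: $P(x_1, \ldots, x_T) = \prod_{t=1}^T P(x_t \mid x_{t-1})$, where $P(x_1 \mid x_0) = P(x_1)$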
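For discrete data, a first-order Markov model can be estimated by simple counting. The sketch below is a minimal illustration, not the course's code: the function names, the toy sequence, and the uniform initial distribution $P(x_1)$ are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def fit_first_order_markov(seq):
    """Estimate P(x_t | x_{t-1}) as relative frequencies of adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        pair_counts[prev][cur] += 1
    return {prev: {cur: n / sum(counts.values()) for cur, n in counts.items()}
            for prev, counts in pair_counts.items()}

def sequence_probability(seq, initial, transitions):
    """P(x_1, ..., x_T) = P(x_1) * product over t >= 2 of P(x_t | x_{t-1})."""
    p = initial.get(seq[0], 0.0)
    for prev, cur in zip(seq, seq[1:]):
        p *= transitions.get(prev, {}).get(cur, 0.0)
    return p

training = list("abababba")             # toy sequence (assumption)
transitions = fit_first_order_markov(training)
initial = {"a": 0.5, "b": 0.5}          # assumed uniform P(x_1), not estimated from data
prob = sequence_probability(list("aba"), initial, transitions)
print(transitions)
print(prob)
```

Unseen transitions get probability 0 here; a practical model would usually smooth the counts.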

1.1.3 Reference Code

%matplotlib inline
import torch
from torch import nn
import matplotlib.pyplot as plt
from IPython import display

'''
    Blog posts consulted for the plotting function:
        https://blog.csdn.net/qq_46018418/article/details/116140271
        https://www.bilibili.com/read/cv15697715
'''
def My_plot(*data, figsize=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, 
            legend=None, is_grid=False, xscale='linear', yscale='linear'):
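    # SVG output renders more sharply; other formats such as jpg or png also work.
    # Note: newer IPython versions deprecate this call in favor of
    # matplotlib_inline.backend_inline.set_matplotlib_formats.
    display.set_matplotlib_formats('svg')
    # Set the figure number and figure size
    if figsize is not None:
        plt.figure(num=0, figsize=figsize)
    else:
        plt.figure(num=0)
    # Set the title
    if title is not None:
        plt.title(title)
    # Set the x-axis and y-axis labels (each one independently)
    if xlabel is not None:
        plt.xlabel(xlabel, fontsize=12)
    if ylabel is not None:
        plt.ylabel(ylabel, fontsize=12)
    # Set the x-axis range
    if xlim is not None:
        plt.xlim(xlim)
    # Set the y-axis range
    if ylim is not None:
        plt.ylim(ylim)
    # Draw grid lines with the default settings
    if is_grid:
        plt.grid()
    # Set the scale of the x and y axes
    plt.xscale(xscale)
    plt.yscale(yscale)
    # Plot each (X, y) pair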
    for X, y in data:
        assert len(X)==len(y),"shape error"
        plt.plot(X, y)
        # plt.plot(X, y,  color='blue', linewidth=0.5, linestyle='-')
    if legend is not None:
        plt.legend(legend)

T = 1000  # generate 1000 points in total
time = torch.arange(1, T + 1, dtype=torch.float32)
# Generate a sine wave and add Gaussian noise with mean 0 and standard deviation 0.2
x = torch.sin(0.01 * time) + torch.normal(0, 0.2, (T,))
My_plot([time, x], figsize=(6,3), title="fig one", xlabel="time", ylabel='X', xlim=[1, T], legend=['True data'], is_grid=True)

Output:
[Figure: the noisy sine data plotted against time, titled "fig one"]

from torch.utils import data

# This is a 4th-order Markov model: the current value depends only on the previous four values
tau = 4
# features.shape -> (996, 4)
features = torch.zeros((T - tau, tau))
print("features shape:", features.shape)
# x.shape -> (1000,)
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape((-1, 1))

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

# Only the first n_train = 600 samples are used for training
batch_size, n_train = 16, 600
train_iter = load_array((features[:n_train], labels[:n_train]),
                            batch_size, is_train=True)

Output:
[Image: the printed shape of features]
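The column-by-column loop that fills `features` is equivalent to sliding a window of length `tau` over the series: row `j` holds `x[j], ..., x[j + tau - 1]` and its label is `x[j + tau]`. The following pure-Python sketch shows this on a toy list; `make_windows` and `x_toy` are hypothetical names used only for illustration.

```python
def make_windows(x, tau):
    """Return (features, labels) where features[j] = x[j:j+tau] and labels[j] = x[j+tau]."""
    features = [x[j:j + tau] for j in range(len(x) - tau)]
    labels = x[tau:]
    return features, labels

x_toy = [10, 11, 12, 13, 14, 15, 16]   # toy series standing in for the noisy sine data
features_toy, labels_toy = make_windows(x_toy, tau=4)
for window, target in zip(features_toy, labels_toy):
    print(window, "->", target)
```

A series of length 7 with `tau = 4` gives 3 examples, just as the 1000-point series in the post gives 996.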

# Function that initializes the network weights
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

# A simple multilayer perceptron
def get_net():
    net = nn.Sequential(nn.Linear(4, 10),
                        nn.ReLU(),
                        nn.Linear(10, 1))
    net.apply(init_weights)
    return net

# Squared loss. Note: MSELoss computes the squared error without the 1/2 factor
loss = nn.MSELoss(reduction='none')

'''A helper class for accumulating running sums during training'''
class Accumulator:
    def __init__(self, n):
        self.data = [0.0] * n
    
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]
        
    def __getitem__(self, idx):
        assert idx<len(self.data)
        return self.data[idx]

def evaluate_loss(net, data_iter, loss):
    """Evaluate the loss of a model on the given dataset.

    Defined in :numref:`sec_model_selection`"""
    metric = Accumulator(2)  # Sum of losses, no. of examples
    for X, y in data_iter:
        out = net(X)
        #print(y.shape)
        y = torch.reshape(y, out.shape)
        l = loss(out, y)
        #print(l.shape)
        # item() can only be called on a tensor with exactly one element
        metric.add(l.sum().item(), l.shape[0])
    return metric[0] / metric[1]

def train(net, train_iter, loss, epochs, lr):
    trainer = torch.optim.Adam(net.parameters(), lr)
    for epoch in range(epochs):
        for X, y in train_iter:
            trainer.zero_grad()
            l = loss(net(X), y)
            l.mean().backward()
            trainer.step()
        print(f'epoch {epoch + 1}, '
              f'loss: {evaluate_loss(net, train_iter, loss):f}')

net = get_net()
train(net, train_iter, loss, 10, 0.005)

Output:
[Image: the training loss printed after each epoch]
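`evaluate_loss` keeps a running sum of losses and a running count of examples rather than averaging the per-batch means. The difference matters when batches have unequal sizes, as happens here: 600 training samples with a batch size of 16 leave a final batch of 8. A small sketch with made-up loss values (the numbers and the name `average_loss` are hypothetical):

```python
def average_loss(batches):
    """Average per-example loss, accumulated as (sum of losses, number of examples)."""
    total, count = 0.0, 0
    for losses in batches:
        total += sum(losses)
        count += len(losses)
    return total / count

# Toy per-example losses; the last batch is smaller, like the final batch of 8 above.
batches = [[1.0, 1.0, 1.0, 1.0], [4.0, 4.0]]
true_mean = average_loss(batches)
mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)
print(true_mean, mean_of_batch_means)
```

Averaging the batch means would give the smaller batch too much weight, whereas the sum-and-count approach weights every example equally.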

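'''
    For every time step in time[tau:], the inputs used here are the original observations. The model was
    trained only on the first 600 time steps, though, so from time step 600 onward a real forecast would have to
    feed its own predictions back in as inputs.
'''
onestep_preds = net(features)
# detach() returns a tensor that no longer tracks gradients but still shares memory with the original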
My_plot([time, x.detach().numpy()],
         [time[tau:], onestep_preds.detach().numpy()], xlabel='time',
         ylabel='x', legend=['data', '1-step preds'], xlim=[1, 1000],
         figsize=(6, 3), is_grid=True)

Output:
[Figure: the data and the 1-step predictions plotted against time]

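'''Past the training range (from index n_train + tau onward), feed the model's own predictions back in as inputs and see how well it does'''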
multistep_preds = torch.zeros(T)
multistep_preds[: n_train + tau] = x[: n_train + tau]
for i in range(n_train + tau, T):
    multistep_preds[i] = net(
        multistep_preds[i - tau:i].reshape((1, -1)))

My_plot([time, x.detach().numpy()], [time[tau:], onestep_preds.detach().numpy()], [time[n_train + tau:], multistep_preds[n_train + tau:].detach().numpy()],
        xlabel='time',ylabel='x', legend=['data', '1-step preds', 'multistep preds'],xlim=[1, 1000], figsize=(6, 3), is_grid=True)

Output:
[Figure: the data, the 1-step predictions, and the multistep predictions plotted against time]
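One reason multistep predictions can drift away from the data is that each prediction becomes an input to the next, so small errors compound. The toy sketch below illustrates this without a neural network. It is an assumption-laden stand-in, not the model trained above: it uses a noiseless sinusoid, the exact two-term recurrence for a sinusoid, and a small perturbation `eps` that plays the role of an imperfectly learned coefficient.

```python
import math

omega = 0.1                    # toy frequency (assumption; the post uses 0.01 plus noise)
T_toy, n_known = 400, 200
x_true = [math.sin(omega * t) for t in range(T_toy)]

# A noiseless sinusoid satisfies x_t = 2*cos(omega)*x_{t-1} - x_{t-2} exactly.
# eps mimics a small estimation error in the learned coefficient (hypothetical).
eps = 1e-3
coef = 2 * math.cos(omega) + eps

def predict(prev1, prev2):
    return coef * prev1 - prev2

# One-step predictions: every input is a true observation.
one_step = [predict(x_true[t - 1], x_true[t - 2]) for t in range(n_known, T_toy)]

# Multistep predictions: past n_known, inputs are the model's own earlier outputs.
multi = x_true[:n_known]
for t in range(n_known, T_toy):
    multi.append(predict(multi[t - 1], multi[t - 2]))

one_step_err = max(abs(p - x_true[t]) for p, t in zip(one_step, range(n_known, T_toy)))
multi_err = max(abs(multi[t] - x_true[t]) for t in range(n_known, T_toy))
print(f"max one-step error:   {one_step_err:.4f}")
print(f"max multistep error:  {multi_err:.4f}")
```

The one-step error stays bounded by `eps`, while the multistep error grows as the predictions are fed back in.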

# T = 1000
max_steps = 64
features = torch.zeros((T - tau - max_steps + 1, tau + max_steps))
print(features.shape)
for i in range(tau):
    features[:, i] = x[i: i + T - tau - max_steps + 1]
    
# Column i (i >= tau) holds the (i - tau + 1)-step-ahead predictions, for time steps (i + 1) through (i + T - tau - max_steps + 1)
for i in range(tau, tau + max_steps):
    features[:, i] = net(features[:, i - tau:i]).reshape(-1)

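# The 1-, 4-, and 16-step predictions are not plotted all the way to time step 1000. Some points are dropped because the 64-step prediction has to reach time step 1000, and all horizons share one feature matrix to keep the code short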
steps = (1, 4, 16, 64)
x_times = [time[tau + i - 1: T - max_steps + i] for i in steps]
ys = [features[:, (tau + i - 1)].detach().numpy() for i in steps]
datas = list(zip(x_times, ys))  # one (times, predictions) pair per horizon
My_plot(*datas, xlabel='time', ylabel = 'x',
         legend=[f'{i}-step preds' for i in steps], xlim=[5, 1000],
         figsize=(6, 3), is_grid=True)