TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models (paper notes) + related papers on LLMs for time series

Note: this article is based on my personal understanding of the paper and its code, so errors are inevitable; discussion and feedback are welcome.

 

Cross-modality knowledge transfer and few-shot learning capabilities of LLMs

It is noteworthy that the rapidly advancing pretrained Large Language Models (LLMs) of recent years have demonstrated exceptional proficiency in cross-modality knowledge transfer and few-shot learning.

Traffic data is sequential in nature, which motivates LLM-based prediction

Recognizing the sequential nature of traffic data, similar to language, we introduce TPLLM, a novel traffic prediction framework leveraging LLMs.

Network structure: two front-end embedding layers extract features from the raw data; their outputs are fed into the LLM, which is fine-tuned with LoRA

In this framework, we construct a sequence embedding layer based on Convolutional Neural Networks (CNNs) and a graph embedding layer based on Graph Convolutional Networks (GCNs) to extract sequence features and spatial features, respectively. These are subsequently integrated to form inputs that are suitable for LLMs. A Low-Rank Adaptation (LoRA) fine-tuning approach is applied to TPLLM, thereby facilitating efficient learning and minimizing computational demands.
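As a rough illustration of these two front-end embedding layers, here is a minimal sketch; the layer sizes, the single-hop GCN form, and the additive fusion are assumptions for illustration, not the paper's exact implementation.

import torch
import torch.nn as nn

# Hedged sketch of the two front-end embedding layers described above.
# Layer sizes, the single-hop GCN form, and the additive fusion are assumptions.
class SequenceEmbedding(nn.Module):
    """CNN over each sensor's historical sequence (sequence features)."""
    def __init__(self, d_model):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=d_model, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):                      # x: (num_nodes, T) history per sensor
        h = self.conv(x.unsqueeze(1))          # (num_nodes, d_model, T)
        return self.pool(h).squeeze(-1)        # (num_nodes, d_model)

class GraphEmbedding(nn.Module):
    """Single GCN layer A_hat @ X @ W over the road-network graph (spatial features)."""
    def __init__(self, in_len, d_model):
        super().__init__()
        self.proj = nn.Linear(in_len, d_model, bias=False)

    def forward(self, x, a_hat):               # a_hat: normalized adjacency (num_nodes, num_nodes)
        return a_hat @ self.proj(x)            # (num_nodes, d_model)

# Each road-network sensor then becomes one LLM input token of dimension d_model,
# e.g. tokens = seq_emb(x) + graph_emb(x, a_hat)  (additive fusion is an assumption)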

Capabilities of LLMs (they can also be used for data imputation)

Pretrained LLMs are deep learning models trained on large-scale high-quality generalized datasets to capture universal patterns and information. LLMs are widely recognized for generative tasks due to their capabilities of powerful few-shot learning [6] and cross-modality knowledge transfer [7]. Endowed with an extensive array of parameters and a wealth of pre-existing knowledge, LLMs have found applications across a diverse range of domains, notably including transportation. These models exhibit remarkable potential for swift adaptation to a variety of downstream tasks, such as traffic prediction, data imputation, and incident identification. This adaptability is facilitated through the process of fine-tuning, which requires only minimal data [8] to significantly extend the models’ capabilities.

Learning framework diagram

[figure omitted]

Alignment of data structures

There is a significant structural similarity between multivariate time-series traffic data and textual data, with both being representable as collections of vectors of consistent dimensionality. This congruence effectively narrows the divide between these distinct types of data, unveiling a promising path for applying LLMs to the analysis of traffic data.

 

The core idea of the data processing

The central idea of the TPLLM is to shape the multivariate time-series traffic data into a form that is understandable by LLMs in a token embedding-like manner, thus exploiting the prior knowledge in the LLMs. 

LLM4TS

Existing LLMs are pre-trained on a general language corpus, which means they fail to learn contextualized information outside linguistic domains; therefore, the time-series alignment stage is proposed to align LLMs with the characteristics of time-series data.

Considerations behind the network structure

To further enhance the model's understanding of the spatial features of the traffic data, we also append graph-structured spatial information of the road network to the input. The final output from the LLMs is used to generate traffic prediction results. To optimize training efficiency and fine-tuning effectiveness, we employ a Parameter-Efficient Fine-Tuning (PEFT) approach, specifically Low-Rank Adaptation (LoRA) [10], significantly reducing training costs without compromising performance.
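For the LoRA part, a minimal sketch of how such fine-tuning could be wired up with the Hugging Face peft library; the rank, alpha, and target modules below are illustrative choices, not values from the paper.

from transformers import GPT2Model
from peft import LoraConfig, get_peft_model

# Hedged sketch: wrap a GPT-2 backbone with LoRA adapters so that only the
# low-rank matrices are trained. Hyperparameters are illustrative, not the paper's.
backbone = GPT2Model.from_pretrained("gpt2")
lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension (assumed)
    lora_alpha=16,              # scaling factor (assumed)
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection (assumed target)
)
backbone = get_peft_model(backbone, lora_cfg)
backbone.print_trainable_parameters()  # only the LoRA adapters remain trainable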

[figure omitted]

Overall network architecture

[figure omitted]

Based on the similarity between time-series traffic data and natural language, we consider the historical data sequence of a single sensor during a period T as a word, and the data X of all sensors in the road network during this period as a sentence.

[figure omitted]
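To make this word/sentence analogy concrete, a small shape-level sketch; the sensor count, history length, and variable names are illustrative.

import torch

# Hedged sketch of the analogy: one sensor's history over T steps ~ one "word",
# the whole road network's data X over the same period ~ one "sentence".
N, T = 207, 12            # number of sensors and historical steps (example values)
X = torch.randn(N, T)     # "sentence": N "words", each a length-T sequence

word = X[0]               # "word": the history of a single sensor, shape (T,)
# After the embedding layers, each of the N "words" becomes one LLM input token,
# so the LLM sees a "sentence" of N tokens.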

Multi-channel 1D convolution

[figures omitted]

 

Related papers on LLMs for time series:

Zhou et al. [29] proposed a generalized time-series analysis framework based on cross-modality knowledge transfer from pretrained LLMs. This was the first time a pretrained LLM was used for time-series analysis tasks including prediction, classification, interpolation, and anomaly detection. The framework applies input embedding and positional embedding to the input time series and applies a PEFT method to the LLM.
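A rough sketch of that recipe, under my reading of the paper (the pretrained attention/FFN blocks are frozen and only lightweight parts such as layer norms and positional embeddings stay trainable); the exact trainable set is an assumption, not their official code.

from transformers import GPT2Model

# Hedged sketch of "One Fits All"-style partial fine-tuning: freeze the pretrained
# attention/FFN weights, keep layer norms and positional embeddings trainable.
gpt2 = GPT2Model.from_pretrained("gpt2")
for name, param in gpt2.named_parameters():
    param.requires_grad = ("ln" in name) or ("wpe" in name)

trainable = sum(p.numel() for p in gpt2.parameters() if p.requires_grad)
total = sum(p.numel() for p in gpt2.parameters())
print(f"trainable: {trainable} / {total}")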

 

Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, Rong Jin, "One Fits All: Power General Time Series Analysis by Pretrained LM," NeurIPS, 2023.

Code:

https://github.com/DAMO-DI-ML/NeurIPS2023-One-Fits-All


The network structure is similar to that of LLM4CP

[figure omitted]

GitHub - liaoyuhua/LLM4TS: Large Language & Foundation Models for Time Series. https://github.com/liaoyuhua/LLM4TS

A collection of papers on LLMs for time series (the repository above); the LLM4TS paper:

https://arxiv.org/abs/2308.08469

Enhancing the ability of LLMs to handle time-series information

Due to limited large-scale time-series data for building robust foundation models, our approach LLM4TS focuses on leveraging the strengths of pre-trained LLMs. By combining time-series patching with temporal encoding, we have enhanced the capability of LLMs to handle time-series data effectively.

Problem formulation

[figure omitted]

Aligning the LLM with the characteristics of time-series data using an autoregressive objective [the overall approach is similar to the Embed code in LLM4CP].

Time-Series Alignment

Shifted autoregression: Figure 2(a) illustrates the autoregressive objective in the time-series alignment stage: given an input sequence of patched time-series data (e.g., 1st patch, 2nd patch, 3rd patch, etc.), the backbone model generates an output sequence shifted one patch to the right (e.g., 2nd patch, 3rd patch, 4th patch, etc.).
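A minimal sketch of this shift-by-one construction; the shapes are illustrative.

import torch

# Hedged sketch: the autoregressive target is the patched input shifted one patch
# to the right. Shape: (batch, number of patches T_p, patch length P), assumed values.
patches = torch.randn(32, 42, 16)
inputs = patches[:, :-1, :]    # 1st, 2nd, ..., (T_p-1)-th patch
targets = patches[:, 1:, :]    # 2nd, 3rd, ...,  T_p-th patch
# Training minimizes e.g. an MSE between the backbone's output at position i
# and targets[:, i, :].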

 

Normalization: Instance Normalization

Purpose: Data normalization is essential for stable performance when adapting pre-trained models across various modalities.

To avoid the transformed data becoming unsuitable as the ground truth for the output, the series is simply normalized to zero mean and unit standard deviation.
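A minimal sketch of such a plain instance normalization, assuming inputs shaped (batch, time, channels).

import torch

# Hedged sketch: normalize each series to zero mean and unit standard deviation
# along the time axis, per instance and per channel.
def instance_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    mean = x.mean(dim=1, keepdim=True)   # (batch, 1, channels)
    std = x.std(dim=1, keepdim=True)     # (batch, 1, channels)
    return (x - mean) / (std + eps)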

 

Time-Series Tokenization

Purpose: The context window sizes in pre-trained LLMs (e.g., 1024 in GPT-2) are sufficient for NLP tasks but are inadequate for long-term time-series forecasting.


Channel independence reduces the feature dimension at each time step from C to 1: it converts multivariate time-series data into multiple univariate time series, transforming the data's shape from R^(T_in×C) to R^(T_in×1), with the channel dimension C merged into the batch-size dimension.

Patching: the patching step groups adjacent time steps into a single patch-based token, reducing the input sample's time dimension from T_in to T_p, where T_p denotes the number of patches, and concurrently expanding the feature dimension from 1 to P, with P representing the patch length, giving shape R^(T_p×P) (see the sketch below).
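A minimal sketch of channel independence followed by patching; the patch length and stride are illustrative, not the paper's settings.

import torch

B, T_in, C = 32, 336, 7      # batch size, input length, channels (example values)
patch_len, stride = 16, 8    # P and the stride are assumed hyperparameters

x = torch.randn(B, T_in, C)

# Channel independence: fold the channel dimension C into the batch dimension
x_ci = x.permute(0, 2, 1).reshape(B * C, T_in)   # (B*C, T_in), one univariate series per row

# Patching: group adjacent time steps into patch-based tokens
patches = x_ci.unfold(dimension=-1, size=patch_len, step=stride)   # (B*C, T_p, P)
print(patches.shape)         # torch.Size([224, 41, 16]) with these example values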

 

Three Encodings for Patched Time-Series Data

Purpose: the original token encoding layer (designed for text) becomes unsuitable for time-series data due to the mismatched modalities.

In conventional NLP, the token encoding uses a trainable lookup table to map each token into a high-dimensional space. However, this method only suits scalar tokens, whereas the patched time-series data here are vectors.


1. A one-dimensional convolutional layer retains local semantic information: the original token encoding layer in the LLM is dropped and a one-dimensional convolutional layer Conv_token is employed as the new token encoding layer. As opposed to employing a linear layer [Zhou et al., 2023], a convolutional layer is chosen for its superior ability to retain local semantic information within the time-series data. This results in the token embedding e_token ∈ R^(T_p×D), where D denotes the embedding dimension. (A combined sketch of all three encodings follows after the code block below.)

2. Positional encoding layer.

3. Temporal encoding layer (see the code below):

Each patch has its own timestamp, and each timestamp includes a range of multi-scale temporal attributes (e.g., seconds, minutes, hours, holidays, etc.) [t_sec etc. in the figure below].

A trainable lookup table is used for each temporal attribute (e.g., E_sec, E_min, ...), mapping it into a high-dimensional space; the resulting embeddings are then summed to produce a single temporal embedding.

[figures omitted: per-patch timestamps and their multi-scale temporal attributes (t_sec, t_min, ...)]

import math

import torch
import torch.nn as nn


class FixedEmbedding(nn.Module):
    """Non-trainable sinusoidal lookup table: maps an integer index to a d_model-dim vector."""

    def __init__(self, c_in, d_model):
        super(FixedEmbedding, self).__init__()

        # Build a fixed sinusoidal table of shape (c_in, d_model)
        w = torch.zeros(c_in, d_model).float()
        w.requires_grad = False

        position = torch.arange(0, c_in).float().unsqueeze(1)
        div_term = (torch.arange(0, d_model, 2).float()
                    * -(math.log(10000.0) / d_model)).exp()

        w[:, 0::2] = torch.sin(position * div_term)
        w[:, 1::2] = torch.cos(position * div_term)

        # Store the table in an nn.Embedding whose weights stay frozen
        self.emb = nn.Embedding(c_in, d_model)
        self.emb.weight = nn.Parameter(w, requires_grad=False)

    def forward(self, x):
        return self.emb(x).detach()


class TemporalEmbedding(nn.Module):
    """Sums one embedding per temporal attribute (month, day, weekday, hour, optionally minute)."""

    def __init__(self, d_model, embed_type='fixed', freq='h'):
        super(TemporalEmbedding, self).__init__()

        # Cardinality of each temporal attribute (minutes are typically bucketed into 15-min slots)
        minute_size = 4
        hour_size = 24
        weekday_size = 7
        day_size = 32
        month_size = 13

        # 'fixed' uses the frozen sinusoidal table; otherwise a trainable lookup table
        Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
        if freq == 't':
            self.minute_embed = Embed(minute_size, d_model)
        self.hour_embed = Embed(hour_size, d_model)
        self.weekday_embed = Embed(weekday_size, d_model)
        self.day_embed = Embed(day_size, d_model)
        self.month_embed = Embed(month_size, d_model)

    def forward(self, x):
        # x: (batch, seq_len, k) integer time features ordered [month, day, weekday, hour, (minute)]
        x = x.long()
        minute_x = self.minute_embed(x[:, :, 4]) if hasattr(
            self, 'minute_embed') else 0.
        hour_x = self.hour_embed(x[:, :, 3])
        weekday_x = self.weekday_embed(x[:, :, 2])
        day_x = self.day_embed(x[:, :, 1])
        month_x = self.month_embed(x[:, :, 0])

        # Multi-scale temporal attributes are summed into a single temporal embedding
        return hour_x + weekday_x + day_x + month_x + minute_x
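Putting the three encodings together, a minimal sketch of how the patch embeddings might be assembled before being fed to the frozen LLM backbone; the class name PatchEncoder, the learnable positional table, and the timestamp layout are assumptions, while TemporalEmbedding is the class defined above.

import torch
import torch.nn as nn

# Hedged sketch: token (1D conv), positional, and temporal encodings are summed
# into the final patch embeddings. Names and shapes are illustrative assumptions.
class PatchEncoder(nn.Module):
    def __init__(self, patch_len, d_model, num_patches):
        super().__init__()
        # 1) token encoding: 1D convolution across neighboring patches
        self.token_conv = nn.Conv1d(patch_len, d_model, kernel_size=3, padding=1)
        # 2) positional encoding: one learnable vector per patch position
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        # 3) temporal encoding: multi-scale timestamp attributes (class defined above)
        self.temporal_embed = TemporalEmbedding(d_model, embed_type='fixed', freq='t')

    def forward(self, patches, timestamps):
        # patches: (B, T_p, P); timestamps: (B, T_p, 5) integer attributes per patch
        e_token = self.token_conv(patches.transpose(1, 2)).transpose(1, 2)  # (B, T_p, D)
        return e_token + self.pos_embed + self.temporal_embed(timestamps)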

 

 
