Paper & Experiments: Large Language Models Are Zero-Shot Time Series Forecasters

Author: 是攸宁啊

Last modified 2024-02-02 17:56:30

Tags: language models, artificial intelligence, natural language processing

First published 2024-01-31 23:07:05

Article link: https://blog.csdn.net/Msc30839573/article/details/135949786

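The article explores how large language models (LLMs) such as GPT-3 and LLaMA-2 can be used for time series forecasting by recasting it as next-token prediction over a text encoding of the series. It highlights the zero-shot forecasting ability of LLMs, along with open problems: how to handle continuous values and missing data effectively, and why a newer, larger model can perform worse.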

(Summary generated by C知道, powered by DeepSeek-R1.)

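I. Overview

Main points:

Proposes LLMTime, a method that uses LLMs for time series forecasting.

Encoding a time series as a string of digits turns time series forecasting into next-token prediction over text.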

Advantages:

1. Performs time series forecasting with a general-purpose LLM.

2. LLMTime can be used without any fine-tuning on the downstream data used by other models. (It works zero-shot via in-context learning: large language models such as GPT-3 or LLaMA-2 accomplish this form of generalization through in-context learning, which identifies patterns in the language model's prompt and extrapolates them through next-token prediction.)

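Challenges:

1. How to tokenize a time series, and how to convert discrete distributions over tokens into highly flexible densities over continuous values.

2. Common applications of time series forecasting, such as weather or financial data, require extrapolating from observations that contain only a small fraction of the relevant information. This makes precise point forecasts nearly impossible, so uncertainty estimation is especially important.

3. How can LLMs extrapolate time series zero-shot, and is this related to the periodicity of time series?

4. How should missing values in a time series be handled?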
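On challenge 4, the abstract below says LLMs can handle missing data "without imputation through non-numerical text". A minimal sketch of what that could look like, assuming the digit encoding described in the Model section, non-negative values, and the literal word `NaN` as the gap marker. The marker, the function names and the default precision are illustrative assumptions, not details confirmed by this post.

```python
import math

MISSING = "NaN"  # hypothetical gap marker; any non-numeric word would serve


def serialize_with_gaps(values, precision=2):
    """Digit-encode non-negative values, writing MISSING for gaps instead of imputing them."""
    steps = []
    for v in values:
        if v is None or math.isnan(v):
            steps.append(" " + MISSING)  # the gap stays visible to the model
        else:
            # Fixed precision, decimal point dropped, one space before each digit.
            steps.append(" " + " ".join(str(round(v * 10 ** precision))))
    return " ,".join(steps)


def deserialize_with_gaps(text, precision=2):
    """Parse the encoding back, mapping MISSING to None."""
    out = []
    for step in text.split(","):
        token = step.replace(" ", "")
        if token == MISSING:
            out.append(None)
        elif token:
            out.append(int(token) / 10 ** precision)
    return out


if __name__ == "__main__":
    encoded = serialize_with_gaps([1.5, None, 2.25])
    print(repr(encoded))                   # ' 1 5 0 , NaN , 2 2 5'
    print(deserialize_with_gaps(encoded))  # [1.5, None, 2.25]
```

The design choice is simply that a gap becomes a word the model can read in context, rather than a made-up number that it would treat as a real observation.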

By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.

II. Model

Tokenization:

(Answers challenge 1, part 1: tokenization)

Split each number into individual digits.

Every language model has an associated tokenizer, which breaks an input string into a sequence of tokens, each belonging to a vocabulary $V$.

Time series data typically takes the exact same form as language modeling data, namely a collection of sequences $\{U_i = (u_1, \dots, u_j, \dots, u_{n_i})\}$; the difference is that in a time series each $u_j$ is numerical.

The LLaMA tokenizer already maps numbers to individual digits, so for LLaMA no extra preprocessing of the digits is needed.
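To see why one token per digit matters, here is a toy greedy longest-match tokenizer. Its vocabulary `TOY_VOCAB` is made up for illustration and is not GPT-3's or LLaMA's real vocabulary: because it contains a few multi-digit chunks, an unspaced number is cut into irregular chunks, while putting a space between digits yields exactly one token per digit.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary (a toy stand-in for BPE)."""
    max_len = max(len(tok) for tok in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


# Made-up vocabulary: every single digit, a space, and a few multi-digit chunks.
TOY_VOCAB = {str(d) for d in range(10)} | {" ", "422", "35", "630"}

if __name__ == "__main__":
    print(greedy_tokenize("42235630", TOY_VOCAB))         # ['422', '35', '630']
    print(greedy_tokenize("4 2 2 3 5 6 3 0", TOY_VOCAB))  # one token per digit, plus spaces
```

With irregular chunks, the same digit position can land in different tokens from one time step to the next, which makes numeric patterns harder to pick up; spacing the digits removes that ambiguity.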

To remedy the tokenization of GPT models, we separate the digits with spaces to force a separate tokenization of each digit and use a comma (" ,") to separate each time step in a time series. Because decimal points are redundant given a fixed precision, we drop them in the encoding to save on context length. Thus, with e.g. 2 digits of precision, we pre-process a time series as follows before feeding it into the tokenizer:

0.123, 1.23, 12.3, 123.0 → " 1 2 , 1 2 3 , 1 2 3 0 , 1 2 3 0 0".
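A minimal Python sketch of this encoding and its inverse, assuming non-negative inputs and a fixed number of decimal digits. The names `serialize` / `deserialize` and the `precision` parameter are hypothetical, and any rescaling the paper applies before encoding is not reproduced here.

```python
def serialize(values, precision=2):
    """Encode non-negative floats as space-separated digits, time steps joined by " ,"."""
    steps = []
    for v in values:
        if v < 0:
            raise ValueError("this sketch only handles non-negative values")
        # Fix the precision, then drop the decimal point by scaling to an integer.
        scaled = round(v * 10 ** precision)
        # A space before every digit forces one token per digit.
        steps.append(" " + " ".join(str(scaled)))
    return " ,".join(steps)


def deserialize(text, precision=2):
    """Invert serialize(): strip the spaces and restore the implied decimal point."""
    values = []
    for step in text.split(","):
        digits = step.replace(" ", "")
        if digits:  # skip empty fragments, e.g. from a trailing separator
            values.append(int(digits) / 10 ** precision)
    return values


if __name__ == "__main__":
    encoded = serialize([0.123, 1.23, 12.3, 123.0])
    print(repr(encoded))         # ' 1 2 , 1 2 3 , 1 2 3 0 , 1 2 3 0 0'
    print(deserialize(encoded))  # [0.12, 1.23, 12.3, 123.0]
```

Note that 0.123 comes back as 0.12: fixing the precision loses the extra digit, while the dropped decimal point is recovered exactly because every value is scaled by the same power of ten.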

Continuous likelihoods

(Answers challenge 1, part 2: turning discrete token distributions into continuous densities)

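Treating each digit as a categorical variable with B possible classes (B is the base, e.g. B = 10 for decimal digits), the model's output distribution can be viewed as a hierarchical softmax: the distribution of each digit is conditioned on the digits before it, and the probability of a whole number is the product of these conditional probabilities. To turn this discrete distribution into a continuous density, each possible output number (a digit string at fixed precision) is identified with a discrete bin, and a uniform distribution is placed on each bin. The result is a mixture of uniform distributions, i.e. a continuous probability density.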
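Written out as a worked formula, under the simplifying assumption (made here for illustration) that values are rescaled to $[0,1)$ and written with exactly $n$ base-$B$ digits $d_1 \dots d_n$ after the point:

```latex
P(d_1,\dots,d_n) = \prod_{i=1}^{n} P(d_i \mid d_1,\dots,d_{i-1})

p(u) = B^{n}\, P(d_1,\dots,d_n)
\quad \text{for } u \in \big[k B^{-n},\ (k+1) B^{-n}\big),
\quad k = \sum_{i=1}^{n} d_i B^{n-i}

\log p(u) = \sum_{i=1}^{n} \log P(d_i \mid d_{<i}) + n \log B
```

Each bin has width $B^{-n}$, so integrating $p$ over a bin gives back $P(d_1,\dots,d_n)$, and the density over all $B^n$ bins integrates to 1.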

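Because the construction works for any number of digits, and each extra digit shrinks the bin width by a factor of B, it can represent high-resolution continuous distributions very flexibly. In this way, continuous densities can be modeled efficiently, precisely, and flexibly even though the model only ever predicts discrete digits.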
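A runnable sketch of the same idea, under the same simplifying assumptions (values rescaled to $[0,1)$, a fixed number of digits). `toy_digit_probs` is a hypothetical stand-in for the LLM's next-digit distribution; with a real model these numbers would come from its softmax restricted to the digit tokens. `density` evaluates the hierarchical-softmax density, and `sample` draws from the mixture of uniforms by sampling digits one at a time and then jittering uniformly inside the final bin. Summarizing many draws with a median and quantiles gives the kind of uncertainty estimate that challenge 2 calls for.

```python
import random
import statistics

B = 10  # base: each digit is a B-way categorical variable


def toy_digit_probs(prefix):
    """Hypothetical stand-in for the LLM: P(next digit | previous digits).

    The first digit favours 3; later digits are uniform.
    """
    if not prefix:
        probs = [0.05] * B
        probs[3] = 0.55
        return probs
    return [1.0 / B] * B


def density(u, n_digits, digit_probs=toy_digit_probs):
    """Density at u in [0, 1): product of digit probabilities times B**n_digits."""
    if not 0.0 <= u < 1.0:
        return 0.0
    prob, prefix = 1.0, []
    for i in range(1, n_digits + 1):
        d = int(u * B ** i) % B  # i-th digit after the point
        prob *= digit_probs(prefix)[d]
        prefix.append(d)
    return prob * B ** n_digits  # uniform over a bin of width B**-n_digits


def sample(n_digits, rng, digit_probs=toy_digit_probs):
    """Draw one value: sample digits one by one, then jitter uniformly inside the bin."""
    prefix, value = [], 0.0
    for i in range(1, n_digits + 1):
        d = rng.choices(range(B), weights=digit_probs(prefix))[0]
        prefix.append(d)
        value += d * B ** -i
    return value + rng.random() * B ** -n_digits


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample(3, rng) for _ in range(1000)]
    cuts = statistics.quantiles(draws, n=20)  # 5%, 10%, ..., 95% cut points
    print("median", statistics.median(draws), "90% band", (cuts[0], cuts[-1]))
```

In a forecasting setting the same recipe would be applied per time step: draw several continuations, use the median as the point forecast and the quantiles as the uncertainty band.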