TimeGPT Documentation

木悠铎753Q

Last modified 2024-03-16 15:47:30

First published 2024-03-11 13:38:49

Original link: https://nixtlaverse.nixtla.io/nixtla/index.html

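This article introduces TimeGPT, a pre-trained Transformer model developed by Nixtla, trained on large-scale data and dedicated to time series forecasting. It covers installation, basic usage, an introductory example, anomaly detection, and setting up the authentication token.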

1 TimeGPT

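TimeGPT, developed by Nixtla, is a generative pre-trained Transformer model built specifically for forecasting tasks. According to Nixtla, it was trained on the largest collection of time series data assembled to date, more than 100 billion rows of financial, weather, energy, and web data, with the aim of democratizing time series analysis. The tool can identify patterns and predict future data points in seconds.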

1.1 Getting Started

1.1.1 Installation

pip install nixtlats

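1.1.2 How to Use

Just import the library, set your token, and you can start forecasting with two lines of code!

import pandas as pd
from nixtlats import TimeGPT

# Hourly electricity data used throughout the quick example
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')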

timegpt = TimeGPT(
    # defaults to os.environ.get("TIMEGPT_TOKEN")
    token = 'my_token_provided_by_nixtla'
)

fcst_df = timegpt.forecast(df, h=24, level=[80, 90])

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: H
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

timegpt.plot(df, fcst_df, level=[80, 90], max_insample_length=24 * 5)

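1.2 TimeGPT Quickstart

Unlock the power of accurate predictions and navigate uncertainty with confidence. Reduce uncertainty and resource limitations.

1.2.1 Introduction

Nixtla's TimeGPT is a generative pre-trained forecasting model for time series data. TimeGPT can produce accurate forecasts for new time series without any additional training, using only historical values as input. It can be used for a range of tasks, including demand forecasting, anomaly detection, and financial forecasting.

The TimeGPT model "reads" time series data much as a person reads a sentence, from left to right. It looks at windows of past data, which we can think of as "tokens", and predicts what comes next. The prediction is based on patterns the model identifies in the past data and extrapolates into the future.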
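To make the window idea concrete, here is a minimal sketch that slides a fixed-size window over a series from left to right and pairs each window with the value that follows it. It only illustrates the general "past window, then next value" framing. The function name `make_windows` and its parameters are made up for this example, and nothing here describes how TimeGPT actually tokenizes its input.

```python
import numpy as np


def make_windows(series, input_size, horizon):
    """Pair each window of past values with the values that follow it.

    Hypothetical helper for illustration only; not part of any Nixtla API.
    """
    values = np.asarray(series, dtype=float)
    inputs, targets = [], []
    # Move left to right, one step at a time.
    for start in range(len(values) - input_size - horizon + 1):
        end = start + input_size
        inputs.append(values[start:end])            # the "context" window
        targets.append(values[end:end + horizon])   # what comes next
    return np.array(inputs), np.array(targets)


series = [10, 12, 14, 16, 18, 20]
X, y = make_windows(series, input_size=3, horizon=1)
print(X)  # three windows of length 3
print(y)  # the value that follows each window
```

With six observations, a window of three, and a horizon of one, the sketch yields three (window, next value) pairs.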

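The API provides an interface to TimeGPT, allowing users to apply its forecasting capabilities to predict future events. TimeGPT can also be used for other time series tasks, such as what-if scenario analysis and anomaly detection.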

1.2.2 Usage

from nixtlats import TimeGPT

You can instantiate the TimeGPT class and provide your token.

timegpt = TimeGPT(
    # defaults to os.environ.get("TIMEGPT_TOKEN")
    token = 'my_token_provided_by_nixtla'
)

You can call the validate_token method to check that your token is valid.

timegpt.validate_token()

INFO:nixtlats.timegpt:Happy Forecasting! :), If you have questions or need support, please email ops@nixtla.io

True

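Now you can start forecasting! Let's use the classic AirPassengers dataset as an example. It contains monthly totals of international airline passengers from 1949 to 1960. First, load the dataset and plot it: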

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()

timegpt.plot(df, time_col='timestamp', target_col='value')

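Important data requirements:

Make sure the target variable column contains no missing or non-numeric values.
Do not include gaps or jumps in the timestamps between the first and last timestamp (for the given frequency). The forecast function does not fill in missing dates.
The timestamp column must be in a format pandas can read (see the pandas documentation on to_datetime for details).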
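The three requirements above can be checked locally before any data is sent to the API. The following is a minimal sketch using only pandas. The helper name `check_series` and the toy DataFrames are invented for this example and are not part of the Nixtla SDK.

```python
import pandas as pd


def check_series(df, time_col, target_col, freq):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical pre-flight check, not part of the Nixtla SDK.
    """
    problems = []

    # 1. Timestamps must be parseable by pandas.
    timestamps = pd.to_datetime(df[time_col], errors="coerce")
    if timestamps.isna().any():
        problems.append("unparseable timestamps")

    # 2. The target must be numeric with no missing values.
    target = pd.to_numeric(df[target_col], errors="coerce")
    if target.isna().any():
        problems.append("missing or non-numeric target values")

    # 3. No gaps between the first and last timestamp at the given frequency.
    valid = timestamps.dropna()
    if not valid.empty:
        expected = pd.date_range(valid.min(), valid.max(), freq=freq)
        missing = expected.difference(pd.DatetimeIndex(valid))
        if len(missing) > 0:
            problems.append(f"{len(missing)} missing timestamp(s)")

    return problems


good = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=4, freq="MS"),
    "value": [1.0, 2.0, 3.0, 4.0],
})
# March is absent and February has no value.
bad = pd.DataFrame({
    "timestamp": ["2020-01-01", "2020-02-01", "2020-04-01"],
    "value": [1.0, None, 3.0],
})

print(check_series(good, "timestamp", "value", "MS"))
print(check_series(bad, "timestamp", "value", "MS"))
```

The sketch assumes timestamps are already aligned to the stated frequency (for example, month starts for 'MS'); misaligned timestamps would need an extra check.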

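Next, use the SDK's forecast method to predict the next 12 months. Set the following parameters:

df: a pandas DataFrame containing the time series data.
h: the number of steps ahead to forecast.
freq: the frequency of the time series, in pandas format. See the pandas offset aliases for the available frequencies.
time_col: the column that identifies the timestamps.
target_col: the variable we want to forecast.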

timegpt_fcst_df = timegpt.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
timegpt_fcst_df.head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')

You can also produce a longer forecast by increasing the horizon parameter h. For example, let's forecast the next 36 months:

timegpt_fcst_df = timegpt.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='MS')
timegpt_fcst_df.head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')

Or a shorter one:

timegpt_fcst_df = timegpt.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='MS')
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

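Warning: TimeGPT-1 is currently optimized for short-horizon forecasting. The forecast method accepts any positive horizon, however large, but forecast accuracy may degrade as the horizon grows. Nixtla is working on improving accuracy for longer horizons.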

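1.2.3 Inferring the Frequency from a DateTime Index

The freq parameter is especially important: it specifies the unit of time between consecutive data points. Fortunately, you can pass a DataFrame with a DateTime index to the forecast method, so that the time information travels with the data. By assigning a suitable freq to the DataFrame's DateTime index, you tell the model the regular interval between observations, whether that is days ('D'), month starts ('MS'), or another appropriate frequency.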

df_time_index = df.set_index('timestamp')
df_time_index.index = pd.DatetimeIndex(df_time_index.index, freq='MS')
# Pass the DataFrame that carries the DatetimeIndex so freq can be inferred from it
timegpt.forecast(df=df_time_index, h=36, time_col='timestamp', target_col='value').head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
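For readers unfamiliar with how a frequency attaches to a DateTime index, the sketch below shows the pandas side of things on its own. It uses `pd.infer_freq` and the `freq` argument of `pd.DatetimeIndex`, both standard pandas features. Whether the SDK relies on these exact calls internally is not something the source states, so treat this as background rather than a description of the SDK.

```python
import pandas as pd

# Four consecutive month-start timestamps, given as plain strings.
idx = pd.DatetimeIndex(["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01"])

# An index built from raw values carries no frequency by default.
print(idx.freq)

# pandas can guess the frequency from the spacing of the timestamps.
inferred = pd.infer_freq(idx)
print(inferred)

# Attaching the frequency explicitly also validates that the data match it.
idx_with_freq = pd.DatetimeIndex(idx, freq="MS")
print(idx_with_freq.freqstr)
```

If the timestamps did not match the requested frequency, the last constructor call would raise an error, which makes it a cheap consistency check.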

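1.3 Setting Up Your Authentication Token

A token, also known as an API key, is a unique string of characters used to authenticate your requests to TimeGPT. This tutorial explains how to set up your token when using the Nixtla SDK.

After signing up, you will receive an email asking you to confirm your registration. Once confirmed, you will have access to the dashboard, where you will find your token under API Keys. There are two ways to integrate your token into a development workflow that uses the Nixtla SDK.

1.3.1 Copy and Paste Directly

Step 1: Copy the token found under API Keys in the dashboard.
Step 2: Instantiate the TimeGPT class by pasting your token directly into the code, as shown below: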

from nixtlats import TimeGPT 
timegpt = TimeGPT(token = 'your token here')

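This approach is simple and direct, and is suitable for quick tests or scripts that will not be shared.

1.3.2 Using an Environment Variable

Step 1: Store your token in an environment variable named TIMEGPT_TOKEN. Depending on your preference, it can be set for the current session or permanently.
Step 2: When you instantiate the TimeGPT class, the SDK automatically looks for the TIMEGPT_TOKEN environment variable and uses it to authenticate your requests.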

from nixtlats import TimeGPT
timegpt = TimeGPT()

Important: the environment variable must be named exactly TIMEGPT_TOKEN, in all capital letters and spelled correctly, for the SDK to recognize it.
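Because a misspelled variable name is easy to miss, it can help to confirm that the variable is visible to Python before creating the client. The sketch below is one way to do that with the standard library. The helper `read_token` is hypothetical and not part of the SDK, and the example passes a plain dict in place of the real environment so it runs anywhere.

```python
import os


def read_token(env=None, name="TIMEGPT_TOKEN"):
    """Return the token stored under `name`, or raise a clear error.

    Hypothetical helper; `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; check the spelling and capitalization."
        )
    return token


# Demonstration with a stand-in mapping instead of the real environment.
print(read_token({"TIMEGPT_TOKEN": "example-token"}))
```

Calling `read_token()` with no arguments reads from `os.environ`, so a typo such as `timegpt_token` in the variable name surfaces as an explicit error instead of a failed request later on.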

There are several ways to set the environment variable.

a. From the terminal

Use the export command to set TIMEGPT_TOKEN.

export TIMEGPT_TOKEN=your_token

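b. Using a .env file

For a more persistent solution that is easy to reuse across projects, place your token in a `.env` file. Put that file under version control only if the repository is private.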

# Inside a file named .env
TIMEGPT_TOKEN=your_token

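In Python: if you use a .env file, you can load the environment variables inside your Python script. Use the dotenv package to load the .env file, then instantiate the TimeGPT class.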

from dotenv import load_dotenv
load_dotenv()

from nixtlats import TimeGPT
timegpt = TimeGPT()

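This approach is more secure and is suitable for applications that will be deployed or shared, because it keeps the token out of the source code.

Important: remember that your token is like a password. Keep it confidential and secure!

Validating Your Token

You can always find your token in the API Keys section of the dashboard. To check the status of your token, use the validate_token method of the TimeGPT class. The method returns True if the token is valid and False otherwise.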

timegpt.validate_token()

INFO:nixtlats.timegpt:Happy Forecasting! :), If you have questions or need support, please email ops@nixtla.io

True

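You do not need to validate your token every time you use TimeGPT; the method is provided as a convenience for confirming that it works. To access all of TimeGPT's functionality you need, in addition to a valid token, enough credits in your account. You can check your credits in the Usage section of the dashboard.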

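2 Tutorials

2.1 Anomaly Detection

Anomaly detection in time series data plays a key role in many fields, including finance, healthcare, security, and infrastructure. A time series is a sequence of data points indexed (or listed, or plotted) in time order, usually at equal intervals. As systems and processes become more digitized and interconnected, the need to monitor them and confirm they are behaving normally grows accordingly. Anomalies can point to potential problems, failures, or even malicious activity. By identifying these deviations from the expected pattern promptly, organizations can take preventive action, optimize processes, or protect resources. TimeGPT includes the detect_anomalies method for detecting anomalies automatically.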

import pandas as pd
from nixtlats import TimeGPT

timegpt = TimeGPT(
    # defaults to os.environ.get("TIMEGPT_TOKEN")
    token = 'my_token_provided_by_nixtla'
)

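The detect_anomalies method processes a DataFrame containing time series and labels each observation according to whether it is anomalous. It evaluates every observation in the input DataFrame against its context in the series and uses statistical measures to decide how likely it is to be an anomaly. By default, the method identifies anomalies using a 99% prediction interval: observations that fall outside the interval are treated as anomalies. The resulting DataFrame has an additional column, "anomaly", set to 1 for anomalous observations and 0 otherwise.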

pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D')
timegpt_anomalies_df.head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Anomaly Detector Endpoint...

timegpt.plot(pm_df, 
             timegpt_anomalies_df,
             time_col='timestamp', 
             target_col='value')
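The labeling rule described above, 1 for observations outside the prediction interval and 0 otherwise, can be sketched with pandas alone. The interval bounds below are made-up constants; in practice they would come from a forecasting model, and the source does not say how TimeGPT computes them. The column names `lower` and `upper` are likewise assumptions for this illustration.

```python
import pandas as pd

obs = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="D"),
    "value": [10.0, 11.0, 9.0, 30.0, 10.0],
    # Hypothetical prediction-interval bounds, fixed here for simplicity.
    "lower": [5.0] * 5,
    "upper": [15.0] * 5,
})

# An observation is anomalous when it falls outside [lower, upper].
outside = (obs["value"] < obs["lower"]) | (obs["value"] > obs["upper"])
obs["anomaly"] = outside.astype(int)

print(obs[["timestamp", "value", "anomaly"]])
```

Only the fourth observation (30.0) lies outside the interval, so it is the only row labeled 1.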

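Although `detect_anomalies` uses a 99% prediction interval by default, you can adjust this threshold to suit your needs by changing the `level` parameter. Lowering `level` produces a narrower prediction interval, which in turn flags more observations as anomalies. See the next example.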

timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=90)
timegpt.plot(pm_df, 
             timegpt_anomalies_df,
             time_col='timestamp', 
             target_col='value')

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Anomaly Detector Endpoint...

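Conversely, raising the value widens the prediction interval, so fewer anomalies are detected. This lets you calibrate the method's sensitivity to your specific use case, so that the anomalies it reports are the ones that are relevant and actionable.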

timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=99.99)
timegpt.plot(pm_df, 
             timegpt_anomalies_df,
             time_col='timestamp', 
             target_col='value')

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Anomaly Detector Endpoint...
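The effect of `level` on the number of flagged points can be seen in a small numeric sketch. It assumes, purely for illustration, that forecast errors are normally distributed, so the half-width of a two-sided interval at a given level is the corresponding normal quantile. The residual values are invented, and nothing here claims to reproduce how TimeGPT builds its intervals.

```python
from statistics import NormalDist

# Hypothetical forecast errors, expressed in units of the forecast's
# standard deviation.
residuals = [0.5, -1.8, 2.1, -2.8, 4.2, 0.1]

counts = {}
for level in (90, 99, 99.99):
    # Half-width of a two-sided interval under a normal assumption.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    # A point is flagged when its error exceeds the interval half-width.
    counts[level] = sum(abs(r) > z for r in residuals)
    print(f"level={level}: half-width={z:.3f}, flagged={counts[level]}")
```

As the level rises from 90 to 99.99, the half-width grows from about 1.645 to about 3.891 standard deviations, and the number of flagged points drops from four to one.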

You can also include `date_features` to improve anomaly detection:

timegpt_anomalies_df_x = timegpt.detect_anomalies(
    pm_df, time_col='timestamp', 
    target_col='value', 
    freq='D', 
    date_features=True,
    level=99.99,
)
timegpt.plot(
    pm_df, 
    timegpt_anomalies_df_x,
    time_col='timestamp', 
    target_col='value',
)

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Calling Anomaly Detector Endpoint...

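2.2 Exogenous Variables

You can also pass exogenous variables to give TimeGPT more information about the data. Simply add the exogenous regressors as columns after the target column.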

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()
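To show what "after the target column" looks like in practice, here is a small synthetic DataFrame with two exogenous regressors. The column names `unique_id`, `ds`, and `y` are assumed to be the SDK's defaults, which is consistent with the call below passing no column names but is not stated in the source. The regressors `temperature` and `is_weekend` are invented for this example and do not come from the dataset above.

```python
import pandas as pd

# Seven daily timestamps; 2024-01-01 is a Monday.
timestamps = pd.date_range("2024-01-01", periods=7, freq="D")

df_ex = pd.DataFrame({
    "unique_id": "series_1",   # series identifier (assumed default name)
    "ds": timestamps,          # timestamp column (assumed default name)
    "y": [20.0, 21.5, 19.8, 22.1, 23.0, 30.2, 29.7],  # target
    # Everything after the target is treated as an exogenous regressor.
    "temperature": [3.1, 2.8, 4.0, 5.2, 4.7, 6.1, 5.9],
    "is_weekend": (timestamps.dayofweek >= 5).astype(int),
})

# List the columns that follow the target column.
target_pos = df_ex.columns.get_loc("y")
exog_cols = list(df_ex.columns[target_pos + 1:])
print(exog_cols)
print(df_ex)
```

Keeping the regressors to the right of the target makes it easy to check at a glance which columns will be treated as exogenous.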

Now let's take this information into account when computing anomalies.

timegpt_anomalies_df_x = timegpt.detect_anomalies(df=df)
timegpt.plot(
    df, 
    timegpt_anomalies_df_x,
)

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: H
INFO:nixtlats.timegpt:Calling Anomaly Detector Endpoint...
