Microsoft Time Series in SQL Server

The next topic in our Data Mining series is the popular algorithm, Time Series. Since business users want to forecast values for areas like production, sales, profit, etc., with a time parameter, Time Series has become an important data mining tool. It essentially allows analyzing the past behavior of a variable over time in order to predict its future behavior.

Components in Time Series

A time series consists of five components:

  • Trend: The overall direction in which the values move. Typically, a given series will have an upward or downward trend.
  • Cyclical: Repetitive upward or downward movement of the values over a longer period of time.
  • Seasonal: Similar to cyclical, but the movements repeat over shorter periods of time, such as hourly, daily, weekly, or monthly.
  • Random: Some movements in the data values are completely random but still affect the time series trend. A time-series analysis should identify these exceptions and account for them in predictions.
  • Cross: Other factors may affect the trend of a time series. For example, sales of item A may depend on seasonal factors but may also be affected by the sales of item B. Likewise, the production of a crop depends on rainfall or temperature trends.
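
To make the trend and seasonal components concrete, here is a minimal Python sketch that separates them in a synthetic monthly series. It is only an illustration of the idea, not how the Microsoft Time Series algorithm works internally; the function name `moving_average_trend` and the synthetic data are assumptions made for this example.

```python
import math

def moving_average_trend(values, period):
    """Estimate the trend with a centered moving average spanning one full period.

    Positions too close to either end have no full window and are left as None.
    """
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the window has period + 1 points, so the two
            # end points get half weight to keep the average centered.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[i] = total / period
    return trend

# Synthetic series (an assumption for illustration): upward trend plus a 12-month season.
values = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(36)]

trend = moving_average_trend(values, 12)
# Subtracting the trend leaves the seasonal (and, in real data, random) part.
detrended = [v - t if t is not None else None for v, t in zip(values, trend)]
```

Averaging over exactly one period cancels the seasonal swing, which is why the window length should match the seasonality of the data.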

Time Series in SQL Server

To demonstrate time series analysis using SQL Server, we will use the vTimeSeries view in the AdventureWorksDW2017 sample database. Here is the sample data set:

Sample data set for vTimeSeries

We will use only the first four columns, which are ModelRegion, TimeIndex, Quantity and Amount.

As discussed in the first article, create a sample Analysis Services project with Visual Studio or SQL Server Data Tools (SSDT). Then, create a Data Source connection to the AdventureWorksDW2017 database and add the vTimeSeries view to the Data Source View.

Next, create a Mining Structure with the Microsoft Time Series data mining technique. Select it from the available list of data mining techniques.

Then, select the attributes needed to create the Time Series model. The model needs two mandatory columns and allows one optional column. First, it requires a single time column, which becomes the key for the model. The values in this column must be at equal intervals. For example, if you have monthly data, the entire data set should be monthly and should not mix in other intervals. The second mandatory column is the one you want to predict. It should be a continuous numerical variable such as sales, temperature, or quantity.
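
Before building the model, it can help to confirm that the time column really has equal intervals. The following Python sketch assumes the time index is stored as a YYYYMM integer (as the 201212 example later in this article suggests); the helper names `month_number` and `find_gaps` are hypothetical.

```python
def month_number(yyyymm):
    """Convert a YYYYMM integer into a running month count."""
    year, month = divmod(yyyymm, 100)
    return year * 12 + (month - 1)

def find_gaps(time_index):
    """Return (previous, current) pairs that are not exactly one month apart."""
    ordered = sorted(time_index)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if month_number(curr) - month_number(prev) != 1
    ]

# A continuous monthly index has no gaps, even across a year boundary.
print(find_gaps([201210, 201211, 201212, 201301]))  # prints []
# A missing month is reported as the pair of periods around the gap.
print(find_gaps([201210, 201212]))                  # prints [(201210, 201212)]
```

When the model contains several series, such a check would be run separately for each series.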

Optionally, you can define multiple series within a single time series model. For example, in a sales time series, you might want to analyze the trend by region. In that case, the region becomes an additional, optional key.

Here is the configuration for the parameters in the Time Series:

Specifying Parameters for Microsoft Time Series Technique

In the above configuration, both Quantity and Amount are configured as input and prediction columns. As mentioned before, these two columns should be continuous numerical variables. TimeIndex is the key column that identifies the time component of the data set. ModelRegion is the optional series column, which lets users predict quantities and sales amounts for each region and product model.

The following image shows data types, in case the user wants to change them:

Data types for selected attributes in Time Series Model

However, in this example, you can leave the default data types as they are. With that, the time series model is created, and the data mining model needs to be processed.

Let us view the time series trend with the predictions:

Future Predictions in Time Series Analysis.

In the graph, predictions are shown as dotted lines. One important thing to remember is that the time series does not understand calendar cycles in the time index. It simply uses the number that follows the previous one. For example, after 201212, the next index will be 201213, not the calendar value 201301.
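
Because the predicted index simply counts on from the last actual value, you may want to translate it back to a calendar month when presenting results. Here is a small Python sketch of that conversion, assuming a YYYYMM integer index; the function names are hypothetical and not part of any SQL Server tooling.

```python
def to_calendar_month(last_actual, steps_ahead):
    """Return the YYYYMM value that lies steps_ahead months after last_actual."""
    year, month = divmod(last_actual, 100)
    months = year * 12 + (month - 1) + steps_ahead
    new_year, new_month = divmod(months, 12)
    return new_year * 100 + new_month + 1

def relabel_prediction(last_actual, predicted_index):
    """Map a raw predicted index such as 201213 to its calendar month.

    Assumes the predicted index counts up by one from last_actual.
    """
    return to_calendar_month(last_actual, predicted_index - last_actual)

print(relabel_prediction(201212, 201213))  # prints 201301
print(relabel_prediction(201212, 201218))  # prints 201306
```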

From the configuration available at the top of the Mining Model Viewer, the user has the option to see more information.

By turning on the Show Deviation option, you can view the deviations for the predicted values to judge the accuracy of your model. A lower deviation means higher accuracy. Deviations are shown in the following image:

Showing Deviations in Time Series

In the Mining Models tab, you can set each predictable attribute to either Predict or PredictOnly, as shown below:

Setting up Predict or Predict Only.

Predict means the attribute is predicted and its values are also used as input for further predictions, whereas PredictOnly means the attribute is only predicted and its values are not used as input for other predictions.

Model parameters

Model parameters let you tune the algorithm to suit your data. The default values are a reasonable starting point, but you can change them when your data calls for it:

Model parameters for Time Series Technique

AUTO_DETECT_PERIODICITY

This parameter takes a value between 0 and 1 that controls how readily the algorithm detects periodicity in the time series. Values closer to 1 make the algorithm detect periodic patterns more readily, including ones that are only approximately periodic. However, this can slow down model building. Values closer to 0 make the algorithm detect only strongly periodic data.

FORECAST_METHOD

FORECAST_METHOD specifies which forecasting algorithm is used. If the MIXED method is chosen, it creates models for both ARTXP and ARIMA time series algorithms, and their results will be combined during the prediction phase. In the standard edition of SQL Server, the models are combined using an automatic ratio that favors ARTXP for near-term and ARIMA for long-term prediction. In higher editions such as Enterprise edition, the models are combined and weighted according to the value set for PREDICTION_SMOOTHING. When the FORECAST_METHOD is set to either ARTXP or ARIMA, the value for the PREDICTION_SMOOTHING parameter is ignored.
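
The idea of combining two forecasts by weight can be sketched in a few lines of Python. This is a simplified illustration only: it applies one fixed linear weight to every step, whereas the real algorithm varies the weighting over the prediction horizon. The function `blend_forecasts` is hypothetical, and the sketch assumes that a smoothing value of 0 means ARTXP only and 1 means ARIMA only.

```python
def blend_forecasts(artxp, arima, smoothing=0.5):
    """Blend two forecast lists with a single fixed weight.

    smoothing = 0 keeps only the ARTXP values, smoothing = 1 keeps only
    the ARIMA values (an assumption made for this illustration).
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    if len(artxp) != len(arima):
        raise ValueError("both forecasts must cover the same number of steps")
    return [(1 - smoothing) * a + smoothing * b for a, b in zip(artxp, arima)]

# Made-up forecast values for two future steps.
artxp_forecast = [10, 20]
arima_forecast = [20, 40]

print(blend_forecasts(artxp_forecast, arima_forecast, 0.5))  # prints [15.0, 30.0]
```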

PERIODICITY_HINT

This parameter gives the algorithm a hint about the periodicity of the data so that the Time Series model performs better. Although the algorithm can identify the periodicity itself, it is better to provide it to the model. For example, if your data has monthly, weekly, and daily periodicity, you can set PERIODICITY_HINT to a value such as {12,7,1}.
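
If you are unsure which value to supply as a hint, one common way to look for a repeating cycle is to check the autocorrelation of the data at different lags. The Python sketch below shows that general technique; it is not necessarily how SQL Server detects periodicity internally, and the function names are hypothetical.

```python
import math

def autocorrelation(values, lag):
    """Sample autocorrelation of values at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator

def suggest_period(values, min_lag=2):
    """Return the lag with the highest autocorrelation.

    Lag 1 is skipped by default because neighboring points in a smooth
    series are always strongly correlated.
    """
    max_lag = len(values) // 2
    return max(
        range(min_lag, max_lag + 1),
        key=lambda lag: autocorrelation(values, lag),
    )

# Synthetic data (an assumption for illustration): four years of a 12-month cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(48)]
print(suggest_period(series))  # prints 12
```

Real data with a strong trend should be detrended first, otherwise the trend dominates the autocorrelation at every lag.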

MISSING_VALUE_SUBSTITUTION

A time series should not have gaps in the data set. However, for various practical reasons, some data points may not be captured. The MISSING_VALUE_SUBSTITUTION parameter specifies how missing values are filled. The default is None, which suits a data set without missing values. Mean fills each missing value with the mean of the existing values, whereas Previous fills it with the preceding value. You can also specify a constant value, but this is not recommended.
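
The substitution options can be illustrated with a short Python sketch. It follows the descriptions given above; the exact calculation inside Analysis Services may differ, and the function `substitute_missing` is a hypothetical helper, with `None` standing in for a missing value.

```python
def substitute_missing(values, method):
    """Fill None entries using "Previous", "Mean", or a numeric constant."""
    if method == "Mean":
        known = [v for v in values if v is not None]
        fill = sum(known) / len(known)
        return [fill if v is None else v for v in values]
    if method == "Previous":
        result = []
        last = None
        for v in values:
            if v is None:
                v = last  # stays None when the series starts with a gap
            result.append(v)
            last = v
        return result
    if isinstance(method, (int, float)):
        return [method if v is None else v for v in values]
    raise ValueError("method must be 'Previous', 'Mean', or a number")

quantities = [10, None, 30]
print(substitute_missing(quantities, "Previous"))  # prints [10, 10, 30]
print(substitute_missing(quantities, "Mean"))      # prints [10, 20.0, 30]
print(substitute_missing(quantities, 0))           # prints [10, 0, 30]
```

The constant option shows why it is discouraged: a value such as 0 can introduce an artificial dip that the model then treats as real behavior.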

Table of contents

Introduction to SQL Server Data Mining
Naive Bayes Prediction in SQL Server
Microsoft Decision Trees in SQL Server
Microsoft Time Series in SQL Server
Association Rule Mining in SQL Server
Microsoft Clustering in SQL Server
Microsoft Linear Regression in SQL Server
Implement Artificial Neural Networks (ANNs) in SQL Server
Implementing Sequence Clustering in SQL Server
Measuring the Accuracy in Data Mining in SQL Server
Data Mining Query in SSIS
Text Mining in SQL Server

Translated from: https://www.sqlshack.com/microsoft-time-series-in-sql-server/
