Time Series Forecasting: Temporal Causal Modeling
Machine Learning
A time series is a set of measurements of a variable taken at regular time intervals. The key point is that time acts as the independent variable: the target Y is modeled as a function of time t, Y = F(t). Analyzing the data and drawing inferences from it matters wherever decisions depend on an uncertain future, for example sales forecasting, weather forecasting, inventory studies, the GDP of developing countries, the stock market, and daily petrol prices.
Why do people forecast?
- Time series analysis is an effective basis for forecasts and for the decisions that depend on them.
- It helps organizations develop forecasting techniques so that they are better prepared for an uncertain future.
- It can be combined with other data mining techniques to understand the behavior of the data and to predict future trends.
Some time series components:
Trend: The overall long-term direction of the series. It can be an uptrend (increasing) or a downtrend (decreasing).
Seasonality: When factors such as the time of year or the day of the week affect the dependent variable, repetitive patterns appear in the time series. Seasonality always has a fixed and known frequency.
Cycles: Unlike seasonal patterns, cyclic patterns rise and fall without a fixed period. These fluctuations usually last at least two years.
Unexpected variations: Most time series contain some amount of random variation, although in some series it is small.
Irregularities: Irregular patterns can occur because of random or unforeseen events. They are often short-lived and non-repeating.
White Noise: A white noise series has zero mean, constant variance, and no correlation between its values at different times. Because the values are uncorrelated, past values do not help to forecast future values. Example: the day-to-day changes in a company's stock price (rather than the price itself) often behave approximately like an uncorrelated series.
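One way to see what "no correlation between values at different times" means is to estimate the autocorrelation of a simulated white noise series. The sketch below is only an illustration: the helper name `autocorrelation`, the seed, and the sample size are assumptions, not part of the original article. For white noise, the estimates at every non-zero lag should be close to zero.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom


# Simulated Gaussian white noise: zero mean, constant variance, independent draws.
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

for lag in (1, 2, 3):
    print(f"lag {lag}: autocorrelation = {autocorrelation(white_noise, lag):+.3f}")
```

A series with a trend or a seasonal pattern would instead show autocorrelations that are clearly different from zero at some lags.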
Stationarity: Most classical forecasting models assume that the time series is stationary. A stationary series has a constant mean and a constant variance over time, and the covariance between two observations depends only on the lag between them, not on when they occur. Variance measures the spread of the data. A series is non-stationary if it shows any of the following:
- An increasing (or decreasing) trend, that is, a non-constant mean.
- Non-constant variance.
- Covariance that is not constant over time.
To check whether the data is stationary, two kinds of test are generally used:
Rolling test: This is a visual test based on the rolling mean and rolling standard deviation. Plot the moving average or moving variance and check whether it changes over time.
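The rolling statistics behind this visual check can be computed with the standard library alone. The sketch below is a minimal illustration: the function name `rolling_stats`, the window size, and the synthetic data are assumptions made for the example. In practice the two resulting lists would be plotted against time.

```python
from statistics import mean, stdev


def rolling_stats(series, window):
    """Return (rolling means, rolling sample standard deviations)."""
    means, stds = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        means.append(mean(chunk))
        stds.append(stdev(chunk))
    return means, stds


# Hypothetical data: an upward trend plus a repeating 4-step pattern.
series = [0.5 * t + [0, 2, 0, -2][t % 4] for t in range(40)]

means, stds = rolling_stats(series, window=8)
print("first / last rolling mean:", means[0], means[-1])
print("first / last rolling std: ", round(stds[0], 3), round(stds[-1], 3))
```

Here the rolling mean drifts upward while the rolling standard deviation stays about the same, which points to a non-constant mean (a trend) rather than a changing variance.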
Dickey-Fuller test: This is a statistical test. Its null hypothesis is that the time series is non-stationary. If the test statistic is smaller (more negative) than the critical value, we reject the null hypothesis and treat the series as stationary.
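To make the mechanics concrete, here is a simplified sketch of the basic (non-augmented) Dickey-Fuller regression, written from scratch. It regresses the first difference of the series on a constant and the lagged level, and reports the t-statistic of the lag coefficient. The function name, the seed, and the data are assumptions for illustration, and the 5% critical value of about -2.86 is the approximate large-sample value for the constant-only case. For real work, a library implementation such as `adfuller` in `statsmodels.tsa.stattools` is the usual choice, since it also handles lagged differences and reports p-values.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of beta in: diff(y)[t] = alpha + beta * y[t-1] + error."""
    x = series[:-1]                                   # lagged levels y[t-1]
    d = [b - a for a, b in zip(series, series[1:])]   # first differences
    n = len(x)
    x_mean, d_mean = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d))
    beta = sxy / sxx                                  # OLS slope
    alpha = d_mean - beta * x_mean                    # OLS intercept
    ssr = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, d))
    std_err = math.sqrt(ssr / (n - 2) / sxx)          # standard error of beta
    return beta / std_err


# Assumption: approximate large-sample 5% critical value (constant, no trend).
CRITICAL_5PCT = -2.86

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]    # stationary by construction

random_walk = []                                      # non-stationary by construction
level = 0.0
for step in noise:
    level += step
    random_walk.append(level)

for name, data in (("white noise", noise), ("random walk", random_walk)):
    stat = dickey_fuller_stat(data)
    verdict = "reject non-stationarity" if stat < CRITICAL_5PCT else "cannot reject"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

The white noise series gives a strongly negative statistic, so the null hypothesis is rejected. A random walk typically does not, although any single simulated path can occasionally cross the threshold by chance.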
If the data is non-stationary, there are two techniques for removing the non-stationarity. Making a time series perfectly stationary is desirable but rarely practical, so these statistical techniques are used to get as close as possible.
Differencing: It is used to make the time series stationary, to remove the trend, and to control autocorrelation. Differencing should be applied sparingly, because an over-differenced series can produce inaccurate estimates.
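As a minimal sketch of differencing, the hypothetical helper `difference` below subtracts from each value the value `lag` steps earlier. The example series are made up for illustration: one difference removes a linear trend, two differences remove a quadratic trend, and a difference at the seasonal lag removes a repeating pattern.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A linear trend becomes a constant after one difference.
linear = [3 * t + 10 for t in range(8)]
print(difference(linear))            # [3, 3, 3, 3, 3, 3, 3]

# A quadratic trend needs two differences.
quadratic = [t * t for t in range(8)]
once = difference(quadratic)
twice = difference(once)
print(once)                          # [1, 3, 5, 7, 9, 11, 13]
print(twice)                         # [2, 2, 2, 2, 2, 2]

# A pattern that repeats every 4 steps vanishes with a lag-4 difference.
seasonal = [[5, 9, 2, 7][t % 4] for t in range(12)]
print(difference(seasonal, lag=4))   # [0, 0, 0, 0, 0, 0, 0, 0]
```

Each difference shortens the series by `lag` values. Differencing more often than needed is the over-differencing case mentioned above.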
Decomposition: Separate the time series into trend, seasonal effects, and the remaining variability. We can also use transformations that penalize higher values more than lower values. Examples: square root, cube root, and log.
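The idea of decomposition can be sketched with a classical additive scheme: estimate the trend with a centered moving average over one full period, average the detrended values at each position in the period to get the seasonal effects, and treat what is left as the remainder. The function name `decompose_additive`, the odd period of 7, and the synthetic weekly data are assumptions chosen to keep the example short. An even period would need a slightly different moving average.

```python
def decompose_additive(series, period):
    """Classical additive decomposition; assumes `period` is odd."""
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Seasonal effects: mean detrended value at each position in the period,
    # shifted so that the effects sum to zero.
    buckets = [[] for _ in range(period)]
    for t in range(half, n - half):
        buckets[t % period].append(series[t] - trend[t])
    effects = [sum(b) / len(b) for b in buckets]
    offset = sum(effects) / period
    effects = [e - offset for e in effects]
    seasonal = [effects[t % period] for t in range(n)]

    # Remainder: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical daily data: linear trend plus a weekly pattern that sums to zero.
weekly_pattern = [3, 1, 0, -1, -2, -3, 2]
series = [0.5 * t + weekly_pattern[t % 7] for t in range(35)]

trend, seasonal, residual = decompose_additive(series, period=7)
print("trend at t=10:", trend[10])
print("seasonal effects:", [round(s, 3) for s in seasonal[:7]])
```

Because this synthetic series contains no noise, the recovered trend and weekly effects match the ones used to build it, and the remainder is essentially zero. Real data would leave a non-trivial remainder, which is then checked for stationarity.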
Conclusion:
Time series analysis helps every company understand the seasonality, cyclicality, trend, and randomness in its sales and other attributes. It identifies the time-based patterns in the data so that a good model can be chosen to forecast the future behavior of business metrics. Time series analysis is useful for every forecasting use case, though forecasting itself is a larger topic. Forecasts can often be improved by taking the dependencies in the time series into account, and analysis is how you come to understand those dependencies in detail rather than merely knowing that they exist. In upcoming posts, we will learn more about how to perform time series analysis with Python.
You can reach me on LinkedIn or by email at design4led@gmail.com.