One challenge that data scientists come across is forecasting an intermittent time series.
That is, a time series in which many of the observations are 0.
Daily rainfall is one example: on days with no rainfall, a value of 0 is recorded. The result is a highly volatile time series with no clearly defined trend, which a conventional time series model such as ARIMA finds much more difficult to forecast.
The data below is sourced from Met Éireann, the Irish national meteorological service:
As we can see, forecasting the trend and seasonal patterns would be quite tricky, given that the data contains many 0 values occurring at irregular intervals.
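To put a number on "many 0 values at irregular intervals", two simple summaries help: the share of zero observations, and the gaps between consecutive non-zero observations (their mean is often called the average inter-demand interval, ADI). The Python sketch below computes both. The function name `intermittency_summary` and the rainfall values are hypothetical illustrations, not the Met Éireann data.

```python
from typing import List, Tuple


def intermittency_summary(series: List[float]) -> Tuple[float, List[int], float]:
    """Return (share of zeros, gaps between non-zero values, mean gap).

    Each gap is the number of periods since the previous non-zero value;
    the first gap is counted from the start of the series.
    """
    if not series:
        raise ValueError("series must not be empty")
    zero_share = sum(1 for x in series if x == 0) / len(series)
    gaps: List[int] = []
    since_last = 0
    for x in series:
        since_last += 1
        if x != 0:
            gaps.append(since_last)
            since_last = 0
    # An all-zero series has no non-zero values, so the mean gap is undefined (inf).
    mean_gap = sum(gaps) / len(gaps) if gaps else float("inf")
    return zero_share, gaps, mean_gap


# Hypothetical daily rainfall in mm (illustrative only)
rain = [0.0, 2.4, 0.0, 0.0, 5.1, 0.0, 0.3, 0.0, 0.0, 0.0]
share, gaps, adi = intermittency_summary(rain)
print(share, gaps, round(adi, 2))  # 0.7 [2, 3, 2] 2.33
```

A large share of zeros combined with a mean gap well above 1 is what distinguishes an intermittent series from an ordinary one.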
The conventional solution in this case might be to aggregate the time series, e.g. sum the rainfall (in mm) over each 30-day period in order to forecast a monthly time series. However, this would discard much of the daily detail, and with less than three years of data (only a few dozen monthly totals) any forecast may well prove quite superficial.
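The cost of aggregation is easy to see by counting observations. The sketch below, assuming non-overlapping 30-day blocks and a hypothetical 1,000-day series (roughly 2.7 years), sums each block and drops the incomplete trailing block so that every total covers the same number of days. The helper name `aggregate_blocks` is illustrative.

```python
from typing import List


def aggregate_blocks(daily: List[float], block: int = 30) -> List[float]:
    """Sum consecutive, non-overlapping blocks of `block` days.

    A trailing partial block is dropped so every total covers `block` days.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    n_full = len(daily) // block
    return [sum(daily[i * block:(i + 1) * block]) for i in range(n_full)]


# Hypothetical daily rainfall: mostly dry, with two wet days
daily = [0.0] * 1000
daily[5] = 3.2
daily[40] = 1.1
monthly = aggregate_blocks(daily)
print(len(daily), len(monthly), monthly[:2])  # 1000 33 [3.2, 1.1]
```

The zeros disappear, but so does most of the sample: a model would be fitted to 33 totals instead of 1,000 daily values.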
When working with a time series such as this, the tsintermittent package in R can come in quite handy.
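One method tsintermittent provides for such series is Croston's method (the package's `crost()` function). The idea is to model the non-zero values and the gaps between them separately: each is updated by simple exponential smoothing only in periods where a non-zero value occurs, and the forecast of the mean value per period is the smoothed size divided by the smoothed interval. The Python sketch below shows the classic version under stated assumptions: a fixed smoothing constant `alpha` and a naive initialisation from the first non-zero value. It is not the tsintermittent implementation, which offers other initialisations, parameter optimisation and bias-corrected variants such as SBA.

```python
from typing import List


def croston(series: List[float], alpha: float = 0.1) -> float:
    """Classic Croston's method: forecast of the mean value per period.

    Assumptions for this sketch: fixed alpha, and initialisation from the
    first non-zero value and its position in the series.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    size = None      # smoothed non-zero value (demand size)
    interval = None  # smoothed number of periods between non-zero values
    since_last = 0   # periods since the previous non-zero value
    for y in series:
        since_last += 1
        if y != 0:
            if size is None:
                size, interval = y, since_last  # naive initialisation
            else:
                size += alpha * (y - size)
                interval += alpha * (since_last - interval)
            since_last = 0
    if size is None:
        return 0.0  # no non-zero values observed
    return size / interval


# Hypothetical daily rainfall in mm (illustrative only)
rain = [0.0, 2.4, 0.0, 0.0, 5.1, 0.0, 0.3, 0.0, 0.0, 0.0]
print(round(croston(rain, alpha=0.1), 3))  # 1.164
```

Croston's forecast is a flat level, so the same value is used for every future day. It estimates average rainfall per day rather than predicting which specific days will be wet.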
In particular, Croston’s method is used on a training set for the