How to rock your next time series forecasting project

by Kirill Dubovikov

Time series forecasting is a task of great importance. It has a wide variety of applications ranging from sales forecasting to anomaly detection in complex manufacturing processes.

Yet it is quite different from traditional machine learning tasks.


Forecasting projects have many caveats you should be aware of if you want to succeed. In this article we will explore key points that will help you finish the task successfully. Read on to find out which ones you may be missing.

Research your methods

It is crucial to study your problem in detail and to plan before taking actions that will affect the success of your project.


1. Study your theory

Before taking on a modeling project, make sure that you understand the theory if you haven’t already. Forecasting: Principles and Practice is a very solid resource that outlines the basics in a practical and concise way.

2. Study the domain

Before discussing any project details, make sure that you understand the basics. Learn as much as you can about the business domain in which you will operate. Google can help you get started. Understand key definitions and common business models. Without this step, you may end up doing months’ worth of work for nothing. You do not want to fail because you solved a problem that makes no business sense.

3. Study the problem

The first thing you should do in any data science project is to study the problem. And I do not mean talking with your client for 15 minutes and writing out an initial understanding. You should question everything and be skeptical (in a good sense).

Remember that data science is a relatively new field in the world of practical applications. This means that your client’s vision may be incomplete. To help them, you should understand their business and their problem as deeply as you can.

Work through the requirements, digest them word by word, and get a feel for the problem at hand. Then try to understand the business behind the problem.

Is it sound from an economic standpoint? What is your client’s end goal? It may not be direct profit, but there must be a goal. Does solving the problem your client asks for really help to achieve this goal?

If not, try to figure out together with your customer what can be changed. Do not be afraid to ask questions: project success depends on them.

Finding the right data

At this point we should have a good understanding of our problem and domain. Next, we will look into the importance of researching and questioning the data.

4. Think about metrics and evaluation

Next, you should think about evaluation. I am not talking about model validation, but about an evaluation that makes sense to the client. Clients often won’t come up with a ready-to-use business metric, so it is your responsibility to work one out with them. In the end, you should have a solid mathematical definition of how your forecasts affect the end goal of the project.

And please do not use conventional data science validation metrics such as MAPE, MAE or RMSE as the main project metric. They are fine for validation purposes, but they can still fail you badly in a business context.

For example, suppose we have sales data for different items. The client asks us to forecast sales over a two-month horizon. In addition, she gives us historical forecasts made with the current forecasting strategy (e.g., done by analysts by hand).

Now suppose your fancy new deep learning model beats the existing strategy by 30% in MAPE. You may deploy it to production and still fail miserably for the following reasons:

  • Your model undersells (forecasts below actual demand) more often than the current strategy does. The error can be small and barely affect MAPE, but for the business, underselling by 4% compared to the current approach can be a disaster

  • Your model cuts possible oversales by a large margin. In many cases, this would not impress the client. They may even ask you to use the upper bound of the confidence interval to cut the risk of underselling

  • MAPE (the chosen metric) makes no sense to the client and is hard to understand for the people who will read the report
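To make the first and third points concrete, here is a minimal sketch with made-up numbers (not data from the article). It compares two forecasts that both miss every week by exactly 4%, one high and one low. MAPE scores them identically. A simple directional check tells them apart: the share of periods in which the forecast falls below actual sales. The helper names `mape` and `underforecast_share` are hypothetical.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def underforecast_share(actual, forecast):
    """Fraction of periods in which the forecast is below actual sales."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(forecast < actual))


# Illustrative, made-up weekly sales for one item.
actual = np.array([100, 120, 110, 130])
# Both forecasts miss every week by exactly 4%, in opposite directions.
over = actual * 1.04
under = actual * 0.96

print(mape(actual, over), mape(actual, under))  # both 4.0, up to floating-point rounding
print(underforecast_share(actual, over), underforecast_share(actual, under))  # 0.0 1.0
```

A real business metric would weigh these errors by what they cost the client. The directional share only shows that MAPE alone hides the difference.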


Always make sure that you have a sound model evaluation strategy. It is your safety belt, and it is better to have one than not, isn’t it?

5. Look at your data

Do your Exploratory Data Analysis (EDA) homework. It may be very tempting to skip this part. “We’ll do this later, for sure! I’ll just look at how some models perform.” That’s what you may think. Eat your frogs first. Draw plots, look for outliers, check for strange patterns. If you have many time series, look at their sums if it makes sense to do so.
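As a rough illustration of two of these checks, the sketch below assumes a long-format pandas table with hypothetical columns `date`, `item` and `sales`. It sums all series into one aggregate. It then flags points that fall outside 1.5 times the interquartile range (IQR) of their own series. The IQR rule is just one common choice. Flagged points are candidates to discuss with the client, not errors to delete.

```python
import pandas as pd


def aggregate_sales(df: pd.DataFrame) -> pd.Series:
    """Sum sales across all items for each date."""
    return df.groupby("date")["sales"].sum().sort_index()


def flag_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return rows whose sales fall outside their own item's IQR fences."""
    grouped = df.groupby("item")["sales"]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df["sales"] < q1 - k * iqr) | (df["sales"] > q3 + k * iqr)
    return df[mask]


# Tiny made-up example: item "b" has one suspicious spike on 2020-01-03.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
    "item": ["a"] * 4 + ["b"] * 4,
    "sales": [10, 12, 11, 13, 5, 6, 60, 5],
})

print(aggregate_sales(df))    # the total series; plot it alongside the individual ones
print(flag_iqr_outliers(df))  # only the row with sales == 60
```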

Communicate all findings to the client. If anything in the data is unclear, it should be figured out as soon as possible.

You may have bugs in your code, your framework may have bugs too, or the client’s data export pipeline may be buggy. Double-check date parsing. Always state the date format explicitly in your framework. For example, Python’s pandas library can silently fail you when parsing dates written in different locales.
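To illustrate, the string `01/02/2020` means 1 February in many European locales but 2 January in the US convention. In this minimal sketch with `pandas.to_datetime`, the data is made up and assumed to come from a day-first exporter. Without `format`, pandas reads such strings month first by default. Passing `format` explicitly makes the intent unambiguous and raises on strings that do not match. pandas' inference rules have changed between versions, so the guessing behaviour may differ. The explicit format is the stable part.

```python
import pandas as pd

# Day-first strings, as exported by, e.g., a European system (illustrative data).
raw = pd.Series(["01/02/2020", "02/03/2020"])

# Without a format, pandas assumes month first: 2 January and 3 February.
guessed = pd.to_datetime(raw)

# With an explicit format we get what the exporter meant: 1 February and 2 March.
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

print(guessed.dt.month.tolist())   # [1, 2]
print(explicit.dt.month.tolist())  # [2, 3]

# A string that does not match the declared format raises instead of being guessed.
try:
    pd.to_datetime(pd.Series(["2020-02-01"]), format="%d/%m/%Y")
except ValueError as exc:
    print("rejected:", exc)
```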

Even if you don’t find any bugs, your client may be surprised by your findings. Unusual seasonal patterns and anomalies can provide tremendous value too. Your customer may not be aware of them because they did not look at the data the way you do.

6. Look at your data AGAIN

Do more EDA. I can’t emphasize enough how important this is.

7. Write tests for data loading and preprocessing

Write automated tests for your data pipeline. Tests will pay off and save you time later.
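As a sketch, assume a hypothetical loader `load_sales` that reads a CSV with `date` and `sales` columns. Tests like the ones below are written as plain functions that pytest would pick up. They pin down date parsing, ordering and the handling of missing values. Filling gaps with zero is an illustrative assumption that you would confirm with the client.

```python
import io

import pandas as pd


def load_sales(source) -> pd.DataFrame:
    """Hypothetical loader: parse dates explicitly, sort by date, fill gaps."""
    df = pd.read_csv(source)
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    df = df.sort_values("date").reset_index(drop=True)
    # Assumption for illustration: a missing value means "no sales that day".
    df["sales"] = df["sales"].fillna(0)
    return df


# Deliberately unsorted input with one missing value.
SAMPLE = "date,sales\n2020-01-02,5\n2020-01-01,3\n2020-01-03,\n"


def test_dates_are_parsed_and_sorted():
    df = load_sales(io.StringIO(SAMPLE))
    assert pd.api.types.is_datetime64_any_dtype(df["date"])
    assert df["date"].is_monotonic_increasing


def test_missing_sales_are_filled_with_zero():
    df = load_sales(io.StringIO(SAMPLE))
    assert df["sales"].isna().sum() == 0
    assert df["sales"].tolist() == [3, 5, 0]


if __name__ == "__main__":
    test_dates_are_parsed_and_sorted()
    test_missing_sales_are_filled_with_zero()
    print("all tests passed")
```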

8. Ponder on business metrics

Are your business metrics defined? Has your customer agreed on them, and is every piece crystal clear to them? Have you implemented and properly tested functions that calculate them? If not, it is time to stop and do this part.
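One hypothetical way to turn "underselling is worse than overselling" into a testable function is to assign a cost per unit of shortage and a cost per unit of excess, then sum them. The function name `forecast_cost` and both unit costs are made up for illustration. The real cost structure must come from the client. The tests check the metric itself, because a buggy metric silently misleads every report built on it.

```python
import numpy as np


def forecast_cost(actual, forecast, shortage_cost=5.0, excess_cost=1.0):
    """Total business cost of forecast errors (illustrative cost model).

    shortage_cost: cost per unit of demand the forecast missed (lost sales).
    excess_cost: cost per unit forecast above demand (e.g. holding stock).
    Both defaults are placeholders, not real figures.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    shortage = np.clip(actual - forecast, 0, None)  # units we failed to cover
    excess = np.clip(forecast - actual, 0, None)    # units we over-provisioned
    return float(np.sum(shortage_cost * shortage + excess_cost * excess))


def test_perfect_forecast_costs_nothing():
    assert forecast_cost([10, 20], [10, 20]) == 0.0


def test_shortage_is_weighted_more_than_excess():
    # Missing by 2 units costs 2 * 5 = 10; overshooting by 2 costs 2 * 1 = 2.
    assert forecast_cost([10], [8]) == 10.0
    assert forecast_cost([10], [12]) == 2.0
```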

If you keep hoping that MAPE or RMSE will stand in for a business metric, you may end up in trouble. It will make your reports hard to understand and increase your chances of solving the wrong task.

Simplicity is the key

Finally, let us not forget about simplicity. The last five points explore the value behind simple things: simple models, double checks and communication.

9. Start with the mean

Before going full machine learning, check how simple models perform, such as predicting the mean. No, really.

Some examples:


  • Predicting the running mean over the last N weeks

  • Predicting running quantiles

  • Predicting an exponentially weighted average

  • Using heuristic rules for holidays and regular events. A mean with a multiplier can work wonders around New Year

This will give you a solid baseline. It may be hard to believe, but in some cases you may even find that it is the best possible solution. For this exact reason, we have created an entire module devoted to heuristic rules and simple statistical models in our forecasting framework.
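The baselines listed above take only a few lines of pandas. This minimal sketch uses a weekly series and assumes each baseline forecasts the next week from past weeks only, hence the `shift(1)`. The 4-week window, the EWMA `span` and the holiday multiplier of 1.5 are arbitrary illustrative values. `holiday_adjusted` is a hypothetical heuristic, not the framework module mentioned above.

```python
import pandas as pd


def baseline_forecasts(sales: pd.Series, window: int = 4, span: int = 4) -> pd.DataFrame:
    """One-step-ahead baseline forecasts; each value uses only past observations."""
    past = sales.shift(1)  # the forecast for week t may only use weeks before t
    return pd.DataFrame({
        "running_mean": past.rolling(window).mean(),
        "running_median": past.rolling(window).quantile(0.5),  # any quantile works here
        "ewma": past.ewm(span=span, adjust=False).mean(),
    })


def holiday_adjusted(forecast: pd.Series, holidays: pd.DatetimeIndex,
                     multiplier: float = 1.5) -> pd.Series:
    """Hypothetical heuristic: scale the forecast up in holiday weeks."""
    adjusted = forecast.copy()
    adjusted.loc[forecast.index.isin(holidays)] *= multiplier
    return adjusted


# Made-up weekly sales, one value per Monday from 2019-11-04 to 2020-01-06.
weeks = pd.date_range("2019-11-04", periods=10, freq="W-MON")
sales = pd.Series([10, 12, 11, 13, 12, 14, 13, 15, 14, 16], index=weeks, dtype=float)

forecasts = baseline_forecasts(sales)
new_year_week = pd.DatetimeIndex(["2019-12-30"])  # the week containing 1 January 2020
print(forecasts)
print(holiday_adjusted(forecasts["running_mean"], new_year_week))
```

Shifting before computing the window statistics keeps the baselines honest. Otherwise each forecast would include the very value it is trying to predict.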

10. Choose the right model

Finally, the fun part. Try out as many models as you can afford. Do an initial test on a wide range of models, filter them and tune the best one or two.

Be wary of non-functional requirements. Always measure the time required to fit each model. Your task may have limitations that will affect your choice:

  • Running time, especially if you have tens of thousands of time series to make forecasts for

  • Available computational resources
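Here is a minimal timing harness. It assumes each candidate model is wrapped as a hypothetical `fit` function that takes one training array. Two toy models stand in for real ones: a mean, and a linear trend fitted with `numpy.polyfit`. Taking the best of several runs reduces timing noise. Multiplying the per-series time by the number of series gives a rough, single-core estimate of the total running time.

```python
import time

import numpy as np


def fit_mean(y):
    """Toy model: remember the mean."""
    return float(np.mean(y))


def fit_linear_trend(y):
    """Toy model: least-squares line through (t, y); returns [slope, intercept]."""
    t = np.arange(len(y))
    return np.polyfit(t, y, deg=1)


def time_fit(fit, y, repeats=5):
    """Best-of-`repeats` wall-clock time, in seconds, for one fit on one series."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit(y)
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
y = rng.normal(100, 10, size=365)  # one synthetic daily series
n_series = 20_000                  # e.g. tens of thousands of series to forecast

for name, fit in [("mean", fit_mean), ("linear trend", fit_linear_trend)]:
    per_series = time_fit(fit, y)
    print(f"{name}: {per_series * 1e3:.3f} ms per series, "
          f"~{per_series * n_series:.1f} s for {n_series} series")
```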

11. Measure performance

Calculate your business and technical metrics using time series cross-validation. Use as many folds as you can to get accurate estimates. Investigate if you see anything unusual. Extremely good performance? Extremely bad performance? Both are often warning signs.
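Time series cross-validation must never train on data from the future. Here is a minimal sketch of the expanding-window (rolling-origin) scheme in plain NumPy. The names `expanding_window_folds`, `initial`, `horizon` and `step` are illustrative, not taken from any particular library. scikit-learn's `TimeSeriesSplit` offers a similar split if you already use it. A smaller `step` gives more folds.

```python
import numpy as np


def expanding_window_folds(n_obs, initial, horizon, step):
    """Yield (train_idx, test_idx) pairs; training data always precedes test data."""
    origin = initial
    while origin + horizon <= n_obs:
        yield np.arange(origin), np.arange(origin, origin + horizon)
        origin += step


def evaluate(y, forecast_fn, metric_fn, initial, horizon, step):
    """Fit/forecast on every fold and return one metric value per fold."""
    scores = []
    for train_idx, test_idx in expanding_window_folds(len(y), initial, horizon, step):
        prediction = forecast_fn(y[train_idx], horizon)
        scores.append(metric_fn(y[test_idx], prediction))
    return np.array(scores)


def mean_forecast(train, horizon):
    """Baseline: repeat the training mean over the whole horizon."""
    return np.full(horizon, train.mean())


def mae(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))


y = np.arange(20, dtype=float)  # toy series with a steady upward trend
scores = evaluate(y, mean_forecast, mae, initial=10, horizon=2, step=2)
print(scores, scores.mean())
```

On this toy trending series, the per-fold MAE of the mean baseline grows from 6 to 10. A steady drift like that across folds is exactly the kind of pattern worth investigating before trusting an average score.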

12. Check everything again

Check yourself. Check your client. Write some tests if possible.

13. Prepare reports and communicate clearly

Now is the time to communicate and present your findings and results. Research the consumers of your results. Do they know about machine learning or time series forecasting? Are they proficient in computer science? If they are, this part may be easy.

If not, use fancy statistics and machine learning jargon as little as you can. Prepare clear definitions if you can’t do without complex terms. Use plots with titles, legends and axis labels. The end goal is to communicate your results as clearly as possible. No one will be able to use your results if they don’t understand them. And no one will be able to spot a mistake if you hide behind complex descriptions and cryptic formulas.

General guidance for the entire project

  • Communicate more. I can’t emphasize this enough. In the end, communication is likely to be more important than your entire 100-model ensemble powerhouse

  • Get rid of complex jargon

  • Dive into the business


Conclusion

We have explored some key points that will help you succeed in a time series forecasting project. Some of them may seem intuitive, and you may think that you’ll never make these mistakes, but be sure to check yourself. Often, the easiest mistake to make is the most obvious one.

Please share the article if it helped you. Also, consider giving it some claps.

Follow me on Twitter, Medium and LinkedIn.

Source: https://www.freecodecamp.org/news/how-to-rock-your-next-time-series-forecasting-project-3930d589f704/
