Machine learning for predictive maintenance: where to start?

Ref: https://medium.com/bigdatarepublic/machine-learning-for-predictive-maintenance-where-to-start-5f3b7586acfb

Think about all the machines you use during a year, all of them, from a toaster every morning to an airplane every summer holiday. Now imagine that, from now on, one of them would fail every day. What impact would that have? The truth is that we are surrounded by machines that make our life easier, but we also get more and more dependent on them. Therefore, the quality of a machine is not only based on how useful and efficient it is, but also on how reliable it is. And together with reliability comes maintenance.

When the impact of a failure cannot be afforded, such as a malfunctioning airplane engine, the machine is subjected to preventive maintenance, which involves periodic inspection and repair, often scheduled based on time-in-service. The challenge of proper scheduling grows with the complexity of the machine: in a system with many components working together and influencing each other’s lifetime, how can we find the right moment to perform maintenance so that components are not prematurely replaced while the whole system keeps functioning reliably? Providing an answer to this question is the aim of predictive maintenance, where we seek to build models that quantify the risk of failure for a machine at any moment in time and use this information to improve the scheduling of maintenance.

The success of a predictive maintenance model depends on three main components: having the right data available, framing the problem appropriately and evaluating the predictions properly.

In this post, we will elaborate on the first two points and give insights into how to choose the modelling technique that best fits the question you are trying to answer and the data you have at hand.

DATA COLLECTION

To build a failure model, we require enough historical data to capture information about events leading to failure. In addition, general “static” features of the system can also provide valuable information, such as mechanical properties, average usage and operating conditions. However, more data is not always better. When collecting data to support a failure model, it is important to make an inventory of the following:

  • What are the types of failure that can occur? Which ones will we try to predict?
  • What does the “failure process” look like? Is it a slow degradation process or an acute one?
  • Which parts of the machine/system could be related to each type of failure? What can be measured about each of them that reflects its state? How often and with what accuracy do these measurements need to be performed?

The life span of machines is usually in the order of years, which means that data has to be collected for an extended period of time in order to observe the system throughout its degradation process.

In an ideal scenario, both data scientists and domain experts would be involved in the data collection plan to ensure that the data gathered is suitable for the model to be built. In practice, however, the data has usually already been collected before the data scientist arrives, and he/she must try to make the best of what is available.

Depending on the characteristics of the system and on the data available, a proper framing of the model to be built is essential: which question do we want the model to answer and is it possible with the data we have at hand?

PROBLEM FRAMING

When thinking about how to frame a predictive maintenance model, it is important to keep a couple of questions in mind:

  • What kind of output should the model give?
  • Is enough historical data available or just static data?
  • Is every recorded event labelled, i.e. is it known which measurements correspond to good functioning and which correspond to failure? Or, at the very least, is it known when each machine failed (if at all)?
  • When labelled events are available, what is the proportion of events of each type of failure relative to events of normal functioning?
  • How long in advance should the model be able to indicate that a failure will occur?
  • What are the performance targets that the model should be optimized for: high precision, high sensitivity/recall, high accuracy? What is the consequence of not predicting a failure, or of predicting a failure that will not happen? (A short sketch of this trade-off follows this list.)
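
To make the last question more concrete, here is a minimal sketch of how precision, recall and the cost of each error type can be compared for a failure classifier. It uses scikit-learn and purely illustrative labels and costs, which are assumptions and not part of the original post:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 1 = failure, 0 = normal functioning (purely illustrative hold-out labels)
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", precision_score(y_true, y_pred))  # low precision -> many needless inspections
print("recall:   ", recall_score(y_true, y_pred))     # low recall    -> many missed failures

# Making the consequences explicit with assumed, hypothetical costs per error type:
cost_missed_failure = 10_000   # cost of an unplanned breakdown
cost_false_alarm = 500         # cost of an unnecessary inspection
print("expected cost on this sample:", fn * cost_missed_failure + fp * cost_false_alarm)
```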

With all this information at hand, we can now decide which modelling strategy fits best to the available data and the desired output, or at least which one is the best candidate to start with. There are multiple modelling strategies for predictive maintenance and we will describe four of them in relation to the question they aim to answer and which kind of data they require:

  1. Regression models to predict remaining useful lifetime (RUL)
  2. Classification models to predict failure within a given time window
  3. Flagging anomalous behaviour
  4. Survival models for the prediction of failure probability over time

STRATEGY 1: Regression models to predict remaining useful lifetime (RUL)

QUESTION: How many days/cycles are left before the system fails?

DATA CHARACTERISTICS: Static and historical data are available, and every event is labelled. Several events of each type of failure are present in the dataset.

BASIC ASSUMPTIONS/REQUIREMENTS:

  • Based on static characteristics of the system and on how it behaves now, the remaining useful time can be predicted, which implies that both static and historical data are required and that the degradation process is smooth.
  • Just one type of “path to failure” is being modelled: if many types of failure are possible and the system’s behaviour preceding each one of them differs, one dedicated model should be made for each of them.
  • Labelled data is available and measurements were taken at different moments during the system’s lifetime.
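
As a minimal sketch of this strategy, the snippet below frames RUL prediction as a regression problem on synthetic run-to-failure data. The column layout (unit, cycle, sensor readings, an RUL label) and the choice of a random forest are assumptions made only so the example is self-contained, not a prescribed setup:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Generate synthetic run-to-failure histories: one row per machine per cycle,
# with the label "rul" = number of cycles left before that machine failed.
rng = np.random.default_rng(0)
rows = []
for unit in range(50):                        # 50 machines observed until failure
    lifetime = int(rng.integers(150, 300))    # failure cycle of this machine
    for cycle in range(1, lifetime + 1):
        degradation = cycle / lifetime
        rows.append({
            "unit": unit,
            "cycle": cycle,
            "sensor_1": 0.5 * degradation + rng.normal(0, 0.05),  # drifts with wear
            "sensor_2": rng.normal(10, 1),                        # uninformative noise
            "rul": lifetime - cycle,                              # regression target
        })
df = pd.DataFrame(rows)

# Split per machine (not per row) so one degradation path never appears
# in both the training and the test set.
features = ["cycle", "sensor_1", "sensor_2"]
train = df["unit"] < 40
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df.loc[train, features], df.loc[train, "rul"])

pred = model.predict(df.loc[~train, features])
print("MAE (cycles):", np.mean(np.abs(pred - df.loc[~train, "rul"])))
```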

STRATEGY 2: Classification models to predict failure within a given time window

Creating a model that can predict lifetimes very accurately can be challenging. In practice, however, one usually does not need an accurate lifetime prediction far into the future: often the maintenance team only needs to know whether the machine will fail ‘soon’. This leads to the next strategy:

QUESTION: Will a machine fail in the next N days/cycles?

DATA CHARACTERISTICS: Same as for strategy 1

BASIC ASSUMPTIONS/REQUIREMENTS: The assumptions of a classification model are very similar to those of regression models. They mostly differ in the following respects:

  • Since we are defining a failure in a time window instead of an exact time, the requirement of smoothness of the degradation process is relaxed.
  • Classification models can deal with multiple types of failure, as long as they are framed as a multi-class problem, e.g. class = 0 corresponding to no failure within the next N days, class = 1 for failure of type 1 within the next N days, class = 2 for failure of type 2 within the next N days, and so forth.
  • Labelled data is available and there are “enough” cases of each type of failure to train and evaluate the model.
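
The sketch below illustrates this reframing: the same kind of run-to-failure data is given a binary “fails within the next N cycles” label and fed to a classifier. The synthetic data, column names and the choice of a random forest are again assumptions made only to keep the example self-contained:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Synthetic run-to-failure data: one sensor that drifts as the machine wears out.
rng = np.random.default_rng(1)
rows = []
for unit in range(50):
    lifetime = int(rng.integers(150, 300))
    for cycle in range(1, lifetime + 1):
        rows.append({"unit": unit,
                     "sensor_1": 0.5 * cycle / lifetime + rng.normal(0, 0.05),
                     "rul": lifetime - cycle})
df = pd.DataFrame(rows)

# Turn the exact RUL into a window label: will the machine fail within N cycles?
N = 30
df["fails_soon"] = (df["rul"] <= N).astype(int)

# Hold out the last 10 machines; class_weight compensates for the imbalance
# between the few "fails soon" rows and the many healthy rows.
train = df["unit"] < 40
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(df.loc[train, ["sensor_1"]], df.loc[train, "fails_soon"])

pred = clf.predict(df.loc[~train, ["sensor_1"]])
print(classification_report(df.loc[~train, "fails_soon"], pred))
```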

In general, what regression and classification models do is model the relationship between the features and the degradation path of the system. That means that if the model is applied to a system that exhibits a type of failure not present in the training data, the model will fail to predict it.

STRATEGY 3: Flagging anomalous behaviour

Both previous strategies require many examples of normal behaviour (of which we often have plenty) as well as of failures. However, how many planes would you let crash in order to collect data? For mission-critical systems, in which acute repairs are difficult, there are often only limited examples of failures, or none at all. In this case, a different strategy is necessary:

QUESTION: Is the behaviour shown normal?

DATA CHARACTERISTICS: Static and historical data are available, but either labels are unknown, too few failure events were observed, or there are too many types of failure.

BASIC ASSUMPTIONS/REQUIREMENTS: It is possible to define what normal behaviour is and the difference between current and “normal” behaviour is related to degradation leading to failure.

The generality of an anomaly detection model is both its biggest advantage and its biggest pitfall: the model should be able to flag every type of failure, despite not having any previous knowledge of them. Anomalous behaviour, however, does not necessarily lead to failure. And if it does, the model gives no information about the time span within which the failure should occur.

The evaluation of an anomaly detection model is also challenging due to the lack of labelled data. If at least some labelled failure events are available, they can and should be used to evaluate the algorithm. When no labelled data is available, the model is usually put into use and domain experts provide feedback on the quality of the flagged anomalies.
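
A minimal sketch of this strategy, assuming an Isolation Forest as the detector and synthetic two-sensor readings as the “normal” reference data (neither of which is prescribed by the approach itself):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Readings assumed to represent normal operation only (two sensors).
rng = np.random.default_rng(2)
normal = rng.normal(loc=[10.0, 50.0], scale=[1.0, 2.0], size=(1000, 2))

# New, unlabelled measurements: a few healthy ones plus one drifted reading.
new_batch = np.vstack([
    rng.normal(loc=[10.0, 50.0], scale=[1.0, 2.0], size=(5, 2)),
    np.array([[16.0, 65.0]]),
])

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal)                        # learn what "normal" looks like

flags = detector.predict(new_batch)         # +1 = normal, -1 = anomalous
scores = detector.score_samples(new_batch)  # lower = more anomalous
for flag, score in zip(flags, scores):
    label = "ANOMALY" if flag == -1 else "ok"
    print(f"{label:8s} score={score:.3f}")
```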

STRATEGY 4: Survival models for the prediction of failure probability over time

The previous three approaches focus on prediction, giving you enough information to perform maintenance before failure. If, however, you are interested in the degradation process itself and the resulting failure probability, this last strategy suits you best.

QUESTION: Given a set of characteristics, how does the risk of failure change in time?

DATA CHARACTERISTICS: Static data is available, together with the reported failure time of each machine or the recorded date at which a given machine became unobservable for failure (i.e. a censored observation).

A survival model estimates the probability of failure for a given type of machine based on static features, and it is also useful for analysing the impact of certain features on lifetime. It therefore provides estimates for a group of machines with similar characteristics; for a specific machine under investigation, it does not take that machine’s current status into account.
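
A minimal sketch of this strategy, assuming the lifelines library and hypothetical static features (neither of which the post prescribes). A Cox proportional hazards model estimates how each static feature changes the risk of failure over time and can produce survival curves for machines with given characteristics; machines that were still running (or stopped being observed) before failing enter the data as censored observations:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic fleet: static features plus an observed duration and a failure flag.
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "load_factor": rng.uniform(0.2, 1.0, n),   # hypothetical: average usage
    "outdoor_unit": rng.integers(0, 2, n),     # hypothetical: operating condition
})
lifetime = rng.exponential(scale=1000 * (1.2 - df["load_factor"]))  # heavier load -> shorter life
observation_end = rng.uniform(200, 1500, n)                         # when observation stopped
df["duration"] = np.minimum(lifetime, observation_end)
df["failed"] = (lifetime <= observation_end).astype(int)            # 0 = censored (never seen failing)

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="failed")
cph.print_summary()                                                 # effect of each feature on the hazard

# Survival curves for two hypothetical machines with different load factors.
new = pd.DataFrame({"load_factor": [0.3, 0.9], "outdoor_unit": [0, 0]})
print(cph.predict_survival_function(new, times=[100, 500, 1000]))
```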

Bottom line:

What is the most suitable approach for a predictive maintenance model? As for all other data science problems, there is no free lunch! The advice here is to start by understanding which types of failure you are trying to model, which type of output you would like the model to give and which kind of data is available. Having put all this together with the advice given above, I hope you now know where to start!
