预测性维护_工厂数据的预测性维护

预测性维护

In the era of digitization the concept of Smart Factory attracts a lot of attention. Modern industry becomes connected and highly automated. Such factories need their machines to run smoothly and with minimal down times. Predictive maintenance helps to deal with breakages. It aims to identify possible failures and helps to schedule the maintenance of detected devices.

在数字化时代,智能工厂的概念吸引了很多关注。 现代工业变得互联且高度自动化。 这样的工厂需要他们的机器运行平稳,停机时间最少。 预测性维护有助于处理破损。 它旨在识别可能的故障并帮助安排检测到的设备的维护。

In current blog post I illustrate the process of building a model that detects failures of factory machines. I use an open dataset from one of the Schwan’s factories. It contains time series values that include telemetry, device errors and failures. The aim is to predict device failure 12 hours before it happens. An assumption is that half of a day is enough for a technician to react and to handle a possible issue.

在当前的博客文章中,我说明了构建检测工厂机器故障的模型的过程。 我使用Schwan一家工厂的开放数据集。 它包含时间序列值,包括遥测,设备错误和故障。 目的是在发生故障前12小时预测设备故障。 假设半天时间足以使技术人员做出React并解决可能的问题。

探索数据集 (Exploring dataset)

The first thing to do is reading the dataset and loading data into pandas data frames. This logic is quite trivial, so I do not post it here. Instead you can access the original Jupyter Notebook at this link.

首先要做的是读取数据集并将数据加载到熊猫数据框中。 这种逻辑非常琐碎,因此我不在此发布。 相反,您可以通过此链接访问原始的Jupyter Notebook。

In general there are 3 files to read: telemetry.csv, failures.csv and errors.csv. At the end we get 3 pandas data frames: telemetry_df, failures_df and errors_df.

通常,有3个文件需要读取:telemetry.csv,failures.csv和errors.csv。 最后,我们得到3个熊猫数据帧:telemetry_df,failures_df和errors_df。

Image for post
Telemetry data frame
遥测数据帧
Image for post
Failures and Errors data frames
故障和错误数据帧

In total there are 876100 rows of telemetry values for 100 machines during 1 year on an hour basis. This is a lot, but most of the time devices work well. All in all the data set contains only 3919 errors and 761 failures. The latest are the values we try to predict.

在1年中,以小时为单位,总共有100台机器的876100行遥测值。 这很多,但是大多数时候设备运行良好。 所有数据集总共仅包含3919个错误和761个失败。 最新是我们尝试预测的值。

We have data for 100 machines. In real predictive maintenance cases it often makes sense to create a separate model for each machine for having best predictions. In this example we assume that one model might work for all the devices. To check it, let’s build box plots for telemetry values of all machines and compare their distributions.

我们有100台机器的数据。 在实际的预测性维护案例中,通常有必要为每台机器创建一个单独的模型以实现最佳预测。 在此示例中,我们假设一个模型可能适用于所有设备。 要检查它,让我们为所有机器的遥测值构建箱形图并比较它们的分布。

import matplotlib.pyplot as plt
import numpy as np


volt_values = []
rotate_values = []
pressure_values = []
vibration_values = []

for i in range(1,101):
volt_values.append(telemetry_df[telemetry_df['machineID'] == i]["volt"])
rotate_values.append(telemetry_df[telemetry_df['machineID'] == i]["rotate"])
pressure_values.append(telemetry_df[telemetry_df['machineID'] == i]["pressure"])
vibration_values.append(telemetry_df[telemetry_df['machineID'] == i]["vibration"])


fig, axs = plt.subplots(4, 1, constrained_layout=True, figsize=(18, 16))
fig.suptitle('Telemetry values per machine', fontsize=16)


def build_box_plot(plot_index, plot_values, title):
axs[plot_index].boxplot(plot_values)
axs[plot_index].set_title(title)
axs[plot_index].set_xticks([1, 26, 51, 76, 101])
axs[plot_index].set_xticklabels([1, 25, 50, 75, 100])


build_box_plot(0, volt_values, "Volt")
build_box_plot(1, rotate_values, "Rotate")
build_box_plot(2, pressure_values, "Pressure")
build_box_plot(3, vibration_values, "Vibration")
plt.show()
Image for post
Telemetry values of all machines
所有机器的遥测值

The picture above show us that all machines have approximately same distributions of their telemetry values. So, we might assume that they are similar from the technical point of view and hence we can train one model for all of them.

上图显示了所有机器的遥测值分布大致相同。 因此,我们可以假设从技术角度来看它们是相似的,因此我们可以为所有模型训练一个模型。

Of course it’s still hard to decide if our data is sufficient for predicting failures. So, we need to explore it further. Now we combine all 3 data frames into one (see the original notebook) and build scatter plots for pairs “Rotate-Volt” and “Vibration-Pressure”. We also mark the failure cases in red. By doing this we want to check if failures are grouped together or occur for extreme values known as outliers.

当然,仍然很难确定我们的数据是否足以预测故障。 因此,我们需要进一步探索。 现在,我们将所有3个数据帧组合为一个(请参阅原始笔记本),并为“旋转-电压”和“振动-压力”对建立散点图。 我们还将失败案例标记为红色。 通过执行此操作,我们要检查是否将故障分组在一起或是否发生了称为异常值的极端值。

Image for post
Telemetry values and failures
遥测值和故障

Looking at the plots we might conclude that there are simple patterns. It does not mean that telemetry is useless. But it tells us that it won’t be a straight forward and good predictor as it is. From the other side there still can be complex patterns especially in the way how values behave over time. These patterns cannot be seen on simple plots but still can be detected by non-linear models.

查看这些图,我们可能会得出结论:存在简单的模式。 这并不意味着遥测是无用的。 但是它告诉我们,它不会是一个简单明了的预测者。 另一方面,仍然存在复杂的模式,尤其是在值随时间变化的方式上。 这些模式无法在简单的图上看到,但仍可以通过非线性模型检测到。

Now let’s explore errors. They might happen some time before the failures and be a good indicator. To check them we first randomly sample 6 failure cases. Then we pick a time frame of 48 hours before the failure. Plots below show both telemetry values and errors that are marked by red lines.

现在让我们探索错误。 它们可能在故障发生前的某个时间发生,并且是一个很好的指示。 为了检查它们,我们首先随机抽样6个失败案例。 然后我们选择故障发生前48小时的时间范围。 下面的图显示了遥测值和用红线标记的错误。

Image for post
48 hours before the failure happend. Red lines denote errors.
故障发生前48小时。 红线表示错误。

In all 6 examples there are errors that happen several hours before the actual failure. So, they might be a good predictor.

在所有6个示例中,都存在在实际故障发生前几个小时发生的错误。 因此,它们可能是一个很好的预测指标。

准备数据集 (Prepare dataset)

After exploring dataset we have several assumptions:

探索数据集后,我们有几个假设:

  • We can combine data for all machines and train one model.

    我们可以合并所有机器的数据并训练一个模型。
  • Telemetry is not a good predictor alone, but still might improve the model in some cases. So, we calculate the rolling values and use them during the model training.

    遥测本身并不是一个好的预测指标,但在某些情况下仍可以改善模型。 因此,我们计算滚动值并在模型训练期间使用它们。
  • Errors seem to be a good predictor.

    错误似乎是一个很好的预测指标。
  • The number of failures is low comparing to the number of normal functioning samples.

    与正常运行的样本数量相比,失败的数量很少。

First of all we need to prepare a dataset for training. Preparation includes:

首先,我们需要准备训练数据集。 准备工作包括:

  • Pick all failure cases

    选择所有失败案例
  • Randomly sample 30000 normal cases

    随机抽样30000例正常病例
  • Subtract 12 hours (predict failure 12 hours before) and pick a time window of 36 hours back.

    减去12个小时(预测12个小时之前会发生故障),然后选择36个小时的时间范围。
  • Calculate the number of errors of each type during the defined time frame

    计算在定义的时间范围内每种类型的错误数
  • Calculate telemetry statistics (min, max, std, mean) during the defined time frame

    在定义的时间范围内计算遥测统计信息(最小,最大,标准,均值)
  • Split the dataset into train, test and validation

    将数据集分为训练,测试和验证

Let’s prepare the column names and a method for calculating statistics.

让我们准备列名和一种计算统计量的方法。

hours_ahead = 12
hours_lag = 36
tel_columns = ["volt", "rotate", "pressure", "vibration"]
error_columns = ["error1", "error2", "error3", "error4", "error5"]

col_names = []
for tel_c in tel_columns:
col_names.extend([tel_c + "_min", tel_c + "_max", tel_c + "_std", tel_c + "_mean"])

for err_c in error_columns:
col_names.append(err_c + "_sum")


def get_time_span_statistics(source_df, lag_start, lag_end):
lag_values_df = source_df.iloc[lag_start:lag_end]
failure_record = []

for col_name in tel_columns:
failure_record.extend([lag_values_df[col_name].min(),
lag_values_df[col_name].max(),
lag_values_df[col_name].std(),
lag_values_df[col_name].mean()])

for col_name in error_columns:
failure_record.append(lag_values_df[col_name].sum())

return failure_record

Next step is to calculate values for all failure cases.

下一步是计算所有失败案例的值。

failure_records = []
failure_ranges = []

for f_index in failure_indexes:
start_i = f_index - hours_ahead - hours_lag
end_i = f_index - hours_ahead

failure_ranges.extend(np.arange(f_index - hours_ahead - hours_lag, f_index + hours_ahead + hours_lag))
failure_records.append(get_time_span_statistics(telemetry_df, start_i, end_i))

failure_records_df = pd.DataFrame(failure_records)
failure_records_df.columns = col_names
failure_records_df['is_error'] = True

Sample normal cases and combine them with failures.

对正常情况进行抽样,并将其与故障结合起来。

normal_functioning_records = []
normal_functioning_indexes = telemetry_df.drop(failure_ranges).sample(30000).index

for n_index in normal_functioning_indexes:
start_i = n_index - hours_ahead - hours_lag
end_i = n_index - hours_ahead
normal_functioning_records.append(get_time_span_statistics(telemetry_df, start_i, end_i))

normal_functioning_records_df = pd.DataFrame(normal_functioning_records)
normal_functioning_records_df.columns = col_names
normal_functioning_records_df['is_error'] = Falsecombined_df = pd.concat([failure_records_df,
normal_functioning_records_df], ignore_index=True)# shuffle the data set
combined_df = combined_df.sample(len(combined_df))

Now we need to split the combined dataset into train, test and validation subsets.

现在我们需要将合​​并的数据集分为训练,测试和验证子集。

split_mask = np.random.rand(len(combined_df)) < 0.7

x_df = combined_df.drop(['is_error'], axis=1)
y_df = combined_df['is_error']

x_train = x_df[split_mask]
y_train = y_df[split_mask]

x_test_validation = x_df[~split_mask]
y_test_validation = y_df[~split_mask]

split_mask = np.random.rand(len(x_test_validation)) < 0.5
x_validation = x_test_validation[split_mask]
y_validation = y_test_validation[split_mask]

x_test = x_test_validation[~split_mask]
y_test = y_test_validation[~split_mask]

As a result we have the following subsets:

结果,我们具有以下子集:

  • Training set: total items = 21379, failure items = 474

    训练集:总项= 21379,失败项= 474
  • Validation set: total items = 4599, failure items = 120

    验证集:总项数= 4599,失败项数= 120
  • Test set: total items = 4741, failure items = 125

    测试集:总项数= 4741,失败项数= 125

The subsets are imbalanced. We need to keep it in mind while picking the validation metrics. For example, accuracy is not sufficient. It’s better to look at precision, recall and F1 score.

子集不平衡。 在选择验证指标时,我们需要牢记这一点。 例如,准确性不足。 最好看一下精度,召回率和F1得分。

培训与评估 (Training and evaluation)

After preparing the dataset we are ready to start training. In current example we are going to train a gradient boosting model. This is a quite powerful ensemble algorithm that provides a non-linear classifier. For evaluation we use AUCPR, which is better for imbalanced datasets.

准备好数据集后,我们就可以开始训练了。 在当前示例中,我们将训练梯度提升模型。 这是一种功能强大的集成算法,可提供非线性分类器。 为了进行评估,我们使用AUCPR,这对于不平衡的数据集更好。

from xgboost import XGBClassifier


model = XGBClassifier(max_depth=10, n_estimators=100, seed=0)
model.fit(
x_train,
y_train,
eval_set=[(x_validation, y_validation)],
early_stopping_rounds=10,
eval_metric="aucpr"
)

The training reaches an AUC value of 0.98345, which is a very high result. To get final metrics we need to evaluate the model on the test set that has not been used during the training.

训练达到的AUC值为0.98345,这是非常高的结果。 为了获得最终指标,我们需要在训练过程中未使用的测试集上评估模型。

The script below calculates precision, recall, F-score, builds ROC curve and confusion matrix.

下面的脚本计算精度,召回率,F得分,建立ROC曲线和混淆矩阵。

from sklearn import metrics


test_predictions = model.predict(x_test)
precision, recall, fscore, _ = metrics.precision_recall_fscore_support(y_test, test_predictions, average='weighted')

print("Precision {}, Recall {}, F-Score {}".format(precision, recall, fscore))
metrics.plot_roc_curve(model, x_test, y_test)
metrics.plot_confusion_matrix(model, x_test, y_test)

Final scores:

最终成绩:

  • Precision = 0.99719

    精度= 0.99719
  • Recall = 0.99722

    召回= 0.99722
  • F-Score = 0.99719

    F分数= 0.99719
Image for post
ROC curve and confusion matrix
ROC曲线和混淆矩阵

The evaluation on a test set shows that the model performs very well. It means that such model can be used in production. It’s still necessary to keep in mind that some devices might differ from the others. So it’s recommended to verify and fine tune the model on each machine before deploying it. However, model deployment is another important process that is not covered by current article.

对测试集的评估表明该模型的性能非常好。 这意味着可以在生产中使用这种模型。 仍然需要记住,某些设备可能与其他设备有所不同。 因此,建议在部署之前在每台计算机上验证和微调模型。 但是,模型部署是本文中未涉及的另一个重要过程。

To conclude, this blog post provides an example of training a model for predictive maintenance. It uses a real dataset that contains time-series telemetry values as well as information about errors and failures. The article illustrates the process of data exploration and preparation. Finally, it shows the training of Gradient Boosting model and its evaluation.

最后,这篇博客文章提供了一个培训预测性维护模型的示例。 它使用包含时间序列遥测值以及有关错误和故障信息的真实数据集。 本文说明了数据探索和准备的过程。 最后,展示了梯度提升模型的训练及其评估。

Here you can access the notebook with original scripts.

您可以在此处使用原始脚本访问笔记本。

翻译自: https://medium.com/@andrey.i.karpov/predictive-maintenance-on-factory-data-4f8cc17696e4

预测性维护

  • 3
    点赞
  • 19
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
以下是一个简单的预测性维护案例,使用Python实现。 假设我们有一个工厂,使用一台机器来生产产品。为了确保机器的稳定运行,我们需要对其进行预测性维护。我们可以通过监测机器的温度和振动等参数来预测机器可能出现故障的时间,并提前采取维护措施,以避免生产中断。 首先,我们需要收集机器的历史数据,包括温度、振动等参数以及故障时间。我们可以使用Pandas库来读取和处理数据。 ```python import pandas as pd # 读取历史数据 data = pd.read_csv('machine_data.csv') # 显示前5行数据 print(data.head()) ``` 接下来,我们可以对数据进行简单的可视化,以了解温度和振动等参数的变化趋势。 ```python import matplotlib.pyplot as plt # 绘制温度图表 plt.plot(data['Temperature']) plt.title('Temperature') plt.show() # 绘制振动图表 plt.plot(data['Vibration']) plt.title('Vibration') plt.show() ``` 通过观察数据,我们可以发现机器的温度和振动都有一定的波动,但并没有明显的异常。接下来,我们可以使用机器学习算法来预测机器可能出现故障的时间。 ```python from sklearn.linear_model import LinearRegression from sklearn.model_selection import train_test_split # 将特征和目标变量分开 X = data.drop('Failure', axis=1) y = data['Failure'] # 划分训练集和测试集 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # 训练线性回归模型 model = LinearRegression() model.fit(X_train, y_train) # 预测测试集结果 y_pred = model.predict(X_test) ``` 最后,我们可以使用预测结果来制定维护计划。如果预测结果表明机器可能出现故障,我们就可以在故障之前采取维护措施,以保证机器的稳定运行。 ```python # 计算预测准确率 accuracy = model.score(X_test, y_test) # 显示预测结果和准确率 print('Predicted Failure Time:', y_pred) print('Accuracy:', accuracy) ``` 通过这个简单的案例,我们可以看到如何使用Python来实现预测性维护。当然,实际情况可能更加复杂,需要更多的数据和更复杂的算法来进行预测。但是,这个案例可以作为一个起点,帮助我们了解预测性维护的基本原理和实现过程。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值