Predictive Maintenance of Turbofan Engines

Exploring NASA’s turbofan dataset

Although released over a decade ago, NASA’s turbofan engine degradation simulation dataset (CMAPSS) remains popular and relevant today. Over 90 new research papers have been published in 2020 so far [1]. These papers present and benchmark novel algorithms to predict Remaining Useful Life (RUL) on the turbofan datasets.

When I first started learning about predictive maintenance, I stumbled upon a few blog posts using the turbofan degradation dataset. Each covered exploratory data analysis and a simple model to predict the RUL, but I felt two things were lacking:

  1. I never got a complete overview of how to apply different suitable techniques to the same problem.
  2. The blog posts would only focus on the first dataset, leaving me guessing how the more complex challenges could be solved.

A few years later this seemed like a fun project for me to pick up. In a series of posts, I plan to showcase and explain multiple analysis techniques, while also offering a solution for the more complex datasets.

I’ve created an index below which I’ll update with links to new posts along the way:

  1. FD001: Exploratory data analysis and baseline model (this article)
  2. FD001: Updated assumption of RUL & Support Vector Regression
  3. FD001: Time series analysis: distributed lag models
  4. FD001: Survival analysis for predictive maintenance
  5. FD003: Random forest (I’ve changed the order, read the article to find out why)
  6. FD002: Lagged MLP & reproducible results primer
  7. FD004: LSTM & wrap-up

The turbofan dataset features four datasets of increasing complexity (see table I) [2, 3]. The engines operate normally in the beginning but develop a fault over time. For the training sets, the engines are run to failure, while in the test sets the time series end ‘sometime’ before failure. The goal is to predict the Remaining Useful Life (RUL) of each turbofan engine.
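
Because the training engines run to failure, a target value can be derived for every training row. Here is a minimal sketch of one common way to do that, assuming the last recorded cycle of each engine is the moment of failure, so that RUL equals the engine's maximum cycle minus the current cycle. The toy data and the helper name `add_remaining_useful_life` are made up for illustration; the column names follow the `index_names` used further down.

```python
import pandas as pd

# Toy run-to-failure data: two hypothetical engines, one row per cycle.
train = pd.DataFrame({
    "unit_nr":     [1, 1, 1, 2, 2],
    "time_cycles": [1, 2, 3, 1, 2],
})


def add_remaining_useful_life(df):
    """Return a copy with an RUL column.

    Assumption: the last row of each engine is its failure point,
    so RUL = (last cycle of that engine) - (current cycle).
    """
    max_cycle = df.groupby("unit_nr")["time_cycles"].transform("max")
    out = df.copy()
    out["RUL"] = max_cycle - out["time_cycles"]
    return out


train = add_remaining_useful_life(train)
print(train)
```

Note that this only works for the training sets. In the test sets the series stop before failure, so the last observed cycle says nothing about how many cycles remain.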

Table I: Overview of turbofan datasets

Each dataset contains simulated time series for multiple turbofan engines. Each row contains the following information:

  1. Engine unit number
  2. Time, in cycles
  3. Three operational settings
  4. 21 sensor readings
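
The row layout above adds up to 26 columns. As a rough sketch, this is how such rows could be parsed into a labelled DataFrame. I'm assuming whitespace-separated values without a header row, and the names `setting_1`..`setting_3` and `s_1`..`s_21` are placeholders I picked for illustration. The two input rows are made up so the snippet runs without the actual files.

```python
import io

import pandas as pd

index_names = ["unit_nr", "time_cycles"]
setting_names = ["setting_1", "setting_2", "setting_3"]  # assumed names
sensor_names = [f"s_{i}" for i in range(1, 22)]          # assumed names, 21 sensors
col_names = index_names + setting_names + sensor_names


def make_row(unit, cycle):
    """Build one made-up row: 2 index values, 3 settings, 21 sensor values."""
    values = [unit, cycle] + [0.0] * 3 + [float(i) for i in range(1, 22)]
    return " ".join(str(v) for v in values)


# Stand-in for a data file: two whitespace-separated rows, no header.
raw = io.StringIO("\n".join([make_row(1, 1), make_row(1, 2)]))

df = pd.read_csv(raw, sep=r"\s+", header=None, names=col_names)
print(df.shape)
```

Swapping the `io.StringIO` object for a file path should be all that is needed to read a real file, provided the assumed layout holds.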

What I find really cool about this dataset is that you can’t use any domain knowledge, as you don’t know what a sensor has been measuring. So, results are purely based on applying the correct techniques.

In today’s post we’ll focus on exploring the first dataset (FD001), in which all engines develop the same fault and run under a single operating condition. In addition, we’ll create a baseline linear regression model to serve as a benchmark for the modeling efforts in future posts.
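
To make the idea of a baseline concrete, here is a small sketch of fitting a linear regression and scoring it with RMSE and R². It runs on synthetic data that I generate on the spot (three made-up "sensors", two of which drift with RUL), so the numbers it prints say nothing about FD001. The simple slice-based train/test split is also just an assumption for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor data: 200 rows, 3 features.
rul = rng.integers(0, 150, size=200).astype(float)
X = np.column_stack([
    0.02 * rul + rng.normal(0, 0.5, 200),   # drifts up with RUL
    -0.01 * rul + rng.normal(0, 0.5, 200),  # drifts down with RUL
    rng.normal(0, 1.0, 200),                # pure noise
])

# Simple split: first 150 rows for training, last 50 for evaluation.
X_train, X_test = X[:150], X[150:]
y_train, y_test = rul[:150], rul[150:]

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}, R2: {r2:.2f}")
```

Any later model can then be judged by whether it beats these two scores on the same evaluation data.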

Exploratory Data Analysis

Let’s get started by importing the required libraries, reading the data and inspecting the first few rows. Note that a few columns seem to show little to no variation in their values. We’ll explore these further below.

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# define filepath to read data
dir_path = './CMAPSSData/'


# define column names for easy indexing
index_names = ['unit_nr', 'time_cycles']
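
Once the data is loaded, the columns with little to no variation mentioned earlier could be flagged with a quick standard deviation check. This is only a sketch on made-up values: the column names `s_1`..`s_3`, the helper `near_constant_columns` and the tolerance are all assumptions for illustration.

```python
import pandas as pd

# Made-up sensor readings: s_1 and s_3 never change, s_2 does.
df = pd.DataFrame({
    "s_1": [518.67, 518.67, 518.67, 518.67],
    "s_2": [641.82, 642.15, 642.35, 642.37],
    "s_3": [21.61, 21.61, 21.61, 21.61],
})


def near_constant_columns(frame, tol=1e-8):
    """Return the names of columns whose standard deviation is at most tol."""
    stds = frame.std()
    return stds[stds <= tol].index.tolist()


flat = near_constant_columns(df)
print(flat)
```

Columns flagged this way carry no information about degradation, which makes them natural candidates to drop before modeling.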