Predictive Maintenance of Turbofan Engines

weixin_26730921

Original article: https://towardsdatascience.com/predictive-maintenance-of-turbofan-engines-ec54a083127

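This article explores NASA’s turbofan engine degradation simulation dataset (CMAPSS), which is still widely studied for predicting Remaining Useful Life (RUL). It aims to give an overview of how different prediction techniques can be applied and to address the challenges of the more complex datasets.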

Exploring NASA’s turbofan dataset

Although released over a decade ago, NASA’s turbofan engine degradation simulation dataset (CMAPSS) remains popular and relevant today. Over 90 new research papers have been published in 2020 so far [1]. These papers present and benchmark novel algorithms to predict Remaining Useful Life (RUL) on the turbofan datasets.

When I first started learning about predictive maintenance, I stumbled upon a few blog posts using the turbofan degradation dataset. Each covered exploratory data analysis and a simple model to predict the RUL, but I felt two things were lacking:

1. I never got a complete overview of how to apply different suitable techniques to the same problem.
2. The blog posts would only focus on the first dataset, leaving me guessing how the more complex challenges could be solved.

A few years later this seemed like a fun project for me to pick up. In a series of posts, I plan to showcase and explain multiple analysis techniques, while also offering a solution for the more complex datasets.

I’ve created an index below which I’ll update with links to new posts along the way:

1. FD001: Exploratory data analysis and baseline model (this article)
2. FD001: Updated assumption of RUL & Support Vector Regression
3. FD001: Time series analysis: distributed lag models
4. FD001: Survival analysis for predictive maintenance
5. FD003: Random forest (I’ve changed the order, read the article to find out why)
6. FD002: Lagged MLP & reproducible results primer
7. FD004: LSTM & wrap-up

The turbofan dataset features four datasets of increasing complexity (see table I) [2, 3]. The engines operate normally in the beginning but develop a fault over time. For the training sets, the engines are run to failure, while in the test sets the time series end ‘sometime’ before failure. The goal is to predict the Remaining Useful Life (RUL) of each turbofan engine.
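
Because the training engines are run to failure, a target can be derived directly from the cycle counter. The sketch below is one possible way to do that, assuming a linearly declining RUL in which the last recorded cycle of each unit is its failure point. The column name `RUL`, the helper `add_remaining_useful_life` and the toy values are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd


def add_remaining_useful_life(df: pd.DataFrame) -> pd.DataFrame:
    """Append a linear RUL column.

    Assumption: the last row of each unit is the cycle at which it failed,
    so RUL = (max cycle of the unit) - (current cycle).
    """
    max_cycle = df.groupby("unit_nr")["time_cycles"].transform("max")
    out = df.copy()
    out["RUL"] = max_cycle - out["time_cycles"]
    return out


# Toy frame with made-up values: unit 1 fails at cycle 3, unit 2 at cycle 2.
toy = pd.DataFrame(
    {"unit_nr": [1, 1, 1, 2, 2], "time_cycles": [1, 2, 3, 1, 2]}
)
toy_with_rul = add_remaining_useful_life(toy)
print(toy_with_rul)
```

This only works for the training sets. In the test sets the series stop before failure, so the last cycle of a unit does not correspond to an RUL of zero.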

[Table I: Overview of turbofan datasets]

The datasets include simulations of multiple turbofan engines over time. Each row contains the following information:
1. Engine unit number
2. Time, in cycles
3. Three operational settings
4. 21 sensor readings
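
The row layout above (2 index columns, 3 settings, 21 sensors) can be turned into column names for pandas. The following is a minimal sketch, assuming the raw files are whitespace-separated text without a header row. The names `setting_1` to `setting_3` and `s_1` to `s_21` are hypothetical, and the two sample rows are made up so the snippet runs without the actual data files.

```python
import io

import pandas as pd

index_names = ["unit_nr", "time_cycles"]
setting_names = [f"setting_{i}" for i in range(1, 4)]  # hypothetical names
sensor_names = [f"s_{i}" for i in range(1, 22)]        # hypothetical names
col_names = index_names + setting_names + sensor_names  # 26 columns in total

# Two made-up rows: 2 index values + 3 settings + 21 sensor readings each.
sample_rows = [
    [1, 1, 0.1, 0.2, 100.0] + [float(i) for i in range(21)],
    [1, 2, 0.1, 0.2, 100.0] + [float(i) + 0.5 for i in range(21)],
]
sample_text = "\n".join(" ".join(str(v) for v in row) for row in sample_rows)

# Assumption: whitespace-delimited, no header line.
df = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", header=None, names=col_names)
print(df.shape)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the file.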

What I find really cool about this dataset is that you can’t use any domain knowledge, as you don’t know what a sensor has been measuring. So, results are purely based on applying the correct techniques.

In today’s post we’ll focus on exploring the first dataset (FD001), in which all engines develop the same fault and have only one operating condition. In addition, we’ll create a baseline linear regression model so we have a reference against which to compare the models in future posts.
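
To make the idea of a baseline concrete, here is a rough sketch of fitting and scoring a linear regression with scikit-learn. It runs on synthetic data (random features with a made-up linear target), not on FD001, and the choice of RMSE and R2 as metrics is an assumption based on the imports used later in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor data: 200 rows, 3 features,
# with a made-up linear target plus a little noise.
X = rng.normal(size=(200, 3))
true_coef = np.array([5.0, -3.0, 1.5])
y = 100.0 + X @ true_coef + rng.normal(scale=0.5, size=200)

# Simple holdout split: first 150 rows for training, the rest for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"test RMSE: {rmse:.3f}, R2: {r2:.3f}")
```

Any later model can then be scored with the same two numbers, which is what makes the baseline useful as a yardstick.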

Exploratory Data Analysis

Let’s get started by importing the required libraries, reading the data and inspecting the first few rows. Note that a few columns seem to have little to no variation in their values. We’ll explore these further down below.

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# define filepath to read data
dir_path = './CMAPSSData/'


# define column names for easy indexing
index_names = ['unit_nr', 'time_cycles']
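
The earlier remark about columns with little to no variation could be checked along the following lines. This is a sketch only: the helper `low_variation_columns`, the threshold of `1e-3` and the toy sensor values are all assumptions for illustration.

```python
import pandas as pd


def low_variation_columns(df: pd.DataFrame, threshold: float = 1e-3) -> list:
    """Return names of numeric columns whose standard deviation is <= threshold."""
    stds = df.std(numeric_only=True)
    return stds[stds <= threshold].index.tolist()


# Toy frame with made-up readings: two flat columns and one that drifts.
toy = pd.DataFrame(
    {
        "s_1": [518.67, 518.67, 518.67, 518.67],  # constant
        "s_2": [641.8, 642.1, 642.4, 642.9],      # varies
        "s_3": [0.03, 0.03, 0.03, 0.03],          # constant
    }
)
flat = low_variation_columns(toy)
print(flat)
```

Columns flagged this way carry little information for a regression model and are natural candidates to inspect more closely or drop.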
