Machine Failure Prediction Based on Sensor Time-Series Data (Part 1, Python)

Importing the Necessary Libraries and Dataset

## Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import warnings
warnings.filterwarnings('ignore')
sns.set(style = "whitegrid",font_scale = 1.5)
%matplotlib inline
plt.rcParams['figure.figsize']=[12,8]
#importing Dataset
sensor_df = pd.read_csv('sensor.csv')

Data Wrangling

print("The dataset has " , sensor_df.shape[0],"rows and", sensor_df.shape[1], "columns")
The dataset has  220320 rows and 55 columns
#First 10 rows
sensor_df.head(10)

# Last 10 rows
sensor_df.tail(10)

The dataset contains sensor readings sampled once per minute, every day, from April to August 2018.
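The row count is consistent with that sampling rate. April to August spans 153 days, and 153 days × 1,440 one-minute readings per day = 220,320 rows, which matches the shape printed above. Below is a minimal sketch of the arithmetic. It assumes the first and last timestamps are 2018-04-01 00:00 and 2018-08-31 23:59, as shown in the index printed later in the notebook.

```python
import pandas as pd

# Assumed span (taken from the index printed later in the notebook), one reading per minute
expected_index = pd.date_range("2018-04-01 00:00", "2018-08-31 23:59", freq="min")

n_days = len(expected_index) // (24 * 60)   # 1,440 readings per day
print(len(expected_index), n_days)          # 220320 153

# With the real data, the spacing between consecutive readings could be checked with:
# pd.to_datetime(sensor_df["timestamp"]).diff().value_counts()
```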

sensor_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 220320 entries, 0 to 220319
Data columns (total 55 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Unnamed: 0      220320 non-null  int64  
 1   timestamp       220320 non-null  object 
 2   sensor_00       210112 non-null  float64
 3   sensor_01       219951 non-null  float64
 4   sensor_02       220301 non-null  float64
 5   sensor_03       220301 non-null  float64
 6   sensor_04       220301 non-null  float64
 7   sensor_05       220301 non-null  float64
 8   sensor_06       215522 non-null  float64
 9   sensor_07       214869 non-null  float64
 10  sensor_08       215213 non-null  float64
 11  sensor_09       215725 non-null  float64
 12  sensor_10       220301 non-null  float64
 13  sensor_11       220301 non-null  float64
 14  sensor_12       220301 non-null  float64
 15  sensor_13       220301 non-null  float64
 16  sensor_14       220299 non-null  float64
 17  sensor_15       0 non-null       float64
 18  sensor_16       220289 non-null  float64
 19  sensor_17       220274 non-null  float64
 20  sensor_18       220274 non-null  float64
 21  sensor_19       220304 non-null  float64
 22  sensor_20       220304 non-null  float64
 23  sensor_21       220304 non-null  float64
 24  sensor_22       220279 non-null  float64
 25  sensor_23       220304 non-null  float64
 26  sensor_24       220304 non-null  float64
 27  sensor_25       220284 non-null  float64
 28  sensor_26       220300 non-null  float64
 29  sensor_27       220304 non-null  float64
 30  sensor_28       220304 non-null  float64
 31  sensor_29       220248 non-null  float64
 32  sensor_30       220059 non-null  float64
 33  sensor_31       220304 non-null  float64
 34  sensor_32       220252 non-null  float64
 35  sensor_33       220304 non-null  float64
 36  sensor_34       220304 non-null  float64
 37  sensor_35       220304 non-null  float64
 38  sensor_36       220304 non-null  float64
 39  sensor_37       220304 non-null  float64
 40  sensor_38       220293 non-null  float64
 41  sensor_39       220293 non-null  float64
 42  sensor_40       220293 non-null  float64
 43  sensor_41       220293 non-null  float64
 44  sensor_42       220293 non-null  float64
 45  sensor_43       220293 non-null  float64
 46  sensor_44       220293 non-null  float64
 47  sensor_45       220293 non-null  float64
 48  sensor_46       220293 non-null  float64
 49  sensor_47       220293 non-null  float64
 50  sensor_48       220293 non-null  float64
 51  sensor_49       220293 non-null  float64
 52  sensor_50       143303 non-null  float64
 53  sensor_51       204937 non-null  float64
 54  machine_status  220320 non-null  object 
dtypes: float64(52), int64(1), object(2)
memory usage: 92.5+ MB
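The dataset consists of 52 numerical sensor features (sensor_00 to sensor_51), a timestamp stored as text (dtype object), a categorical label (machine_status) and an integer column, Unnamed: 0.

The label contains string values that represent the machine's normal, broken and recovering operating conditions.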

sensor_df.describe().T
               count          mean          std        min           25%           50%           75%           max
Unnamed: 0  220320.0 110159.500000 63601.049991   0.000000  55079.750000 110159.500000 165239.250000 220319.000000
sensor_00   210112.0      2.372221     0.412227   0.000000      2.438831      2.456539      2.499826      2.549016
sensor_01   219951.0     47.591611     3.296666   0.000000     46.310760     48.133678     49.479160     56.727430
sensor_02   220301.0     50.867392     3.666820  33.159720     50.390620     51.649300     52.777770     56.032990
sensor_03   220301.0     43.752481     2.418887  31.640620     42.838539     44.227428     45.312500     48.220490
sensor_04   220301.0    590.673936   144.023912   2.798032    626.620400    632.638916    637.615723    800.000000
sensor_05   220301.0     73.396414    17.298247   0.000000     69.976260     75.576790     80.912150     99.999880
sensor_06   215522.0     13.501537     2.163736   0.014468     13.346350     13.642940     14.539930     22.251160
sensor_07   214869.0     15.843152     2.201155   0.000000     15.907120     16.167530     16.427950     23.596640
sensor_08   215213.0     15.200721     2.037390   0.028935     15.183740     15.494790     15.697340     24.348960
sensor_09   215725.0     14.799210     2.091963   0.000000     15.053530     15.082470     15.118630     25.000000
sensor_10   220301.0     41.470339    12.093519   0.000000     40.705260     44.291340     47.463760     76.106860
sensor_11   220301.0     41.918319    13.056425   0.000000     38.856420     45.363140     49.656540     60.000000
sensor_12   220301.0     29.136975    10.113935   0.000000     28.686810     32.515830     34.939730     45.000000
sensor_13   220301.0      7.078858     6.901755   0.000000      1.538516      2.929809     12.859520     31.187550
sensor_14   220299.0    376.860041   113.206382  32.409550    418.103250    420.106200    420.997100    500.000000
sensor_15        0.0           NaN          NaN        NaN           NaN           NaN           NaN           NaN
sensor_16   220289.0    416.472892   126.072642   0.000000    459.453400    462.856100    464.302700    739.741500
sensor_17   220274.0    421.127517   129.156175   0.000000    454.138825    462.020250    466.857075    599.999939
sensor_18   220274.0      2.303785     0.765883   0.000000      2.447542      2.533704      2.587682      4.873250
sensor_19   220304.0    590.829775   199.345820   0.000000    662.768975    665.672400    667.146700    878.917900
sensor_20   220304.0    360.805165   101.974118   0.000000    398.021500    399.367000    400.088400    448.907900
sensor_21   220304.0    796.225942   226.679317  95.527660    875.464400    879.697600    882.129900   1107.526000
sensor_22   220279.0    459.792815   154.528337   0.000000    478.962600    531.855900    534.254850    594.061100
sensor_23   220304.0    922.609264   291.835280   0.000000    950.922400    981.925000   1090.808000   1227.564000
sensor_24   220304.0    556.235397   182.297979   0.000000    601.151050    625.873500    628.607725   1000.000000
sensor_25   220284.0    649.144799   220.865166   0.000000    693.957800    740.203500    750.357125    839.575000
sensor_26   220300.0    786.411781   246.663608  43.154790    790.489575    861.869600    919.104775   1214.420000
sensor_27   220304.0    501.506589   169.823173   0.000000    448.297950    494.468450    536.274550   2000.000000
sensor_28   220304.0    851.690339   313.074032   4.319347    782.682625    967.279850   1043.976500   1841.146000
sensor_29   220248.0    576.195305   225.764091   0.636574    518.947225    564.872500    744.021475   1466.281000
sensor_30   220059.0    614.596442   195.726872   0.000000    627.777800    668.981400    697.222200   1600.000000
sensor_31   220304.0    863.323100   283.544760  23.958330    839.062400    917.708300    981.249900   1800.000000
sensor_32   220252.0    804.283915   260.602361   0.240716    760.607475    878.850750    943.877625   1839.211000
sensor_33   220304.0    486.405980   150.751836   6.460602    489.761075    512.271750    555.163225   1578.600000
sensor_34   220304.0    234.971776    88.376065  54.882370    172.486300    226.356050    316.844950    425.549800
sensor_35   220304.0    427.129817   141.772519   0.000000    353.176625    473.349350    528.891025    694.479126
sensor_36   220304.0    593.033876   289.385511   2.260970    288.547575    709.668050    837.333025    984.060700
sensor_37   220304.0     60.787360    37.604883   0.000000     28.799220     64.295485     90.821928    174.901200
sensor_38   220293.0     49.655946    10.540397  24.479166     45.572910     49.479160     53.645830    417.708300
sensor_39   220293.0     36.610444    15.613723  19.270830     32.552080     35.416660     39.062500    547.916600
sensor_40   220293.0     68.844530    21.371139  23.437500     57.812500     66.406250     77.864580    512.760400
sensor_41   220293.0     35.365126     7.898665  20.833330     32.552080     34.895832     37.760410    420.312500
sensor_42   220293.0     35.453455    10.259521  22.135416     32.812500     35.156250     36.979164    374.218800
sensor_43   220293.0     43.879591    11.044404  24.479166     39.583330     42.968750     46.614580    408.593700
sensor_44   220293.0     42.656877    11.576355  25.752316     36.747684     40.509260     45.138890   1000.000000
sensor_45   220293.0     43.094984    12.837520  26.331018     36.747684     40.219910     44.849540    320.312500
sensor_46   220293.0     48.018585    15.641284  26.331018     40.509258     44.849540     51.215280    370.370400
sensor_47   220293.0     44.340903    10.442437  27.199070     39.062500     42.534720     46.585650    303.530100
sensor_48   220293.0    150.889044    82.244957  26.331018     83.912030    138.020800    208.333300    561.632000
sensor_49   220293.0     57.119968    19.143598  26.620370     47.743060     52.662040     60.763890    464.409700
sensor_50   143303.0    183.049260    65.258650  27.488426    167.534700    193.865700    219.907400   1000.000000
sensor_51   204937.0    202.699667   109.588607  27.777779    179.108800    197.338000    216.724500   1000.000000

print('all class labels:',sensor_df['machine_status'].unique())
all class labels: ['NORMAL' 'BROKEN' 'RECOVERING']

The label has 3 machine status values:

BROKEN: the machine has failed.

RECOVERING: the machine is recovering from a failure.

NORMAL: the machine is operating normally.

#Machine status distribution


sensor_df['machine_status'].value_counts()
NORMAL        205836
RECOVERING     14477
BROKEN             7
Name: machine_status, dtype: int64
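The counts show a severe class imbalance. Only 7 of the 220,320 rows (about 0.003%) are BROKEN. The sketch below expresses the distribution as percentages with `value_counts(normalize=True)`. To run without sensor.csv, it rebuilds a hypothetical stand-in for `sensor_df['machine_status']` from the counts printed above.

```python
import pandas as pd

# Hypothetical stand-in for sensor_df['machine_status'], rebuilt from the printed counts
status = pd.Series(["NORMAL"] * 205836 + ["RECOVERING"] * 14477 + ["BROKEN"] * 7,
                   name="machine_status")

# normalize=True returns fractions of the total instead of raw counts
share_pct = status.value_counts(normalize=True) * 100
print(share_pct.round(4))

# How many NORMAL rows there are for every BROKEN row
print(round(share_pct["NORMAL"] / share_pct["BROKEN"]))  # 29405
```

On the real frame, `sensor_df['machine_status'].value_counts(normalize=True)` gives the same fractions directly.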

Data Preprocessing

Remove redundant columns

Remove duplicates

Handle missing values

Convert the timestamp column from object (string) to datetime

# sensor_15 has no values at all and Unnamed: 0 only repeats the row index, so drop both columns
sensor_df.drop(['sensor_15','Unnamed: 0'],inplace = True,axis=1)
sensor_df.head()

#check percentage of missing values for each column
(sensor_df.isnull().sum().sort_values(ascending=False)/len(sensor_df))*100
sensor_50         34.956881
sensor_51          6.982117
sensor_00          4.633261
sensor_07          2.474129
sensor_08          2.317992
sensor_06          2.177741
sensor_09          2.085603
sensor_01          0.167484
sensor_30          0.118464
sensor_29          0.032680
sensor_32          0.030864
sensor_17          0.020879
sensor_18          0.020879
sensor_22          0.018609
sensor_25          0.016340
sensor_16          0.014070
sensor_49          0.012255
sensor_48          0.012255
sensor_47          0.012255
sensor_46          0.012255
sensor_45          0.012255
sensor_44          0.012255
sensor_43          0.012255
sensor_42          0.012255
sensor_41          0.012255
sensor_40          0.012255
sensor_39          0.012255
sensor_38          0.012255
sensor_14          0.009532
sensor_26          0.009078
sensor_03          0.008624
sensor_10          0.008624
sensor_13          0.008624
sensor_12          0.008624
sensor_11          0.008624
sensor_05          0.008624
sensor_04          0.008624
sensor_02          0.008624
sensor_36          0.007262
sensor_37          0.007262
sensor_28          0.007262
sensor_27          0.007262
sensor_31          0.007262
sensor_35          0.007262
sensor_24          0.007262
sensor_23          0.007262
sensor_34          0.007262
sensor_21          0.007262
sensor_20          0.007262
sensor_19          0.007262
sensor_33          0.007262
timestamp          0.000000
machine_status     0.000000
dtype: float64
# sensor_50 is missing about 35% of its values, so drop it
sensor_df.drop('sensor_50',inplace = True,axis=1)
sensor_df.head()
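Here sensor_50 is dropped by hand. The same decision can be written as a rule: drop any column whose share of missing values exceeds a cut-off. The helper `drop_sparse_columns` and its 30% cut-off are hypothetical choices for illustration. Applied to the percentages printed above, it would remove only sensor_50 (34.96%), because the next-highest column is sensor_51 at 6.98%.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing_pct: float = 30.0) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose percentage of missing values exceeds max_missing_pct."""
    missing_pct = df.isnull().mean() * 100          # fraction of NaNs per column, as a percentage
    to_drop = missing_pct[missing_pct > max_missing_pct].index
    return df.drop(columns=to_drop)

# Toy frame: column 'a' is 25% missing, column 'b' is 50% missing
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, np.nan, 1.0, 2.0]})
print(drop_sparse_columns(toy).columns.tolist())  # ['a']
```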

The remaining missing values are imputed with each column's mean.

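# Impute the remaining missing values with each column's mean
# (numeric_only=True skips the string columns; pandas >= 2.0 raises a TypeError without it)
sensor_df.fillna(sensor_df.mean(numeric_only=True), inplace=True)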
sensor_df.isnull().sum()
timestamp         0
sensor_00         0
sensor_01         0
sensor_02         0
sensor_03         0
sensor_04         0
sensor_05         0
sensor_06         0
sensor_07         0
sensor_08         0
sensor_09         0
sensor_10         0
sensor_11         0
sensor_12         0
sensor_13         0
sensor_14         0
sensor_16         0
sensor_17         0
sensor_18         0
sensor_19         0
sensor_20         0
sensor_21         0
sensor_22         0
sensor_23         0
sensor_24         0
sensor_25         0
sensor_26         0
sensor_27         0
sensor_28         0
sensor_29         0
sensor_30         0
sensor_31         0
sensor_32         0
sensor_33         0
sensor_34         0
sensor_35         0
sensor_36         0
sensor_37         0
sensor_38         0
sensor_39         0
sensor_40         0
sensor_41         0
sensor_42         0
sensor_43         0
sensor_44         0
sensor_45         0
sensor_46         0
sensor_47         0
sensor_48         0
sensor_49         0
sensor_51         0
machine_status    0
dtype: int64
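Mean imputation fills every gap with the column's overall average and ignores the time order. A single spike can therefore pull the filled value far from its neighbours. For a one-minute sensor stream, filling from neighbouring readings is a common alternative. The toy comparison below uses invented values and pandas `Series.interpolate(method="time")`. Time interpolation is a swapped-in technique for comparison, not what this notebook does.

```python
import numpy as np
import pandas as pd

# Made-up one-minute readings with one gap and one spike (not from sensor.csv)
idx = pd.date_range("2018-04-01 00:00", periods=5, freq="min")
s = pd.Series([10.0, 11.0, np.nan, 13.0, 100.0], index=idx)

mean_filled = s.fillna(s.mean())            # global mean, as in the cell above
time_filled = s.interpolate(method="time")  # linear in time between the neighbouring readings

print(mean_filled.iloc[2], time_filled.iloc[2])  # 33.5 12.0
```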
# Check for duplicate rows
sensor_df.duplicated().any()
False

There are no duplicate rows, so none need to be removed.

# Convert timestamp to datetime and set it as the index, which turns the frame into a time series
sensor_df['timestamp'] = pd.to_datetime(sensor_df['timestamp'])
sensor_df = sensor_df.set_index('timestamp')

The first five rows of the dataset now look as follows.

sensor_df.head()
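After `set_index`, a few quick checks confirm that the new DatetimeIndex is usable for `resample()` and time-based windows. It should be sorted, contain no repeated timestamps, and have a regular one-minute spacing. The sketch runs these checks on a hypothetical three-row frame with made-up values. On the real data, the same calls would be made on `sensor_df.index`.

```python
import pandas as pd

# Hypothetical three-row stand-in for sensor.csv (made-up values)
raw = pd.DataFrame({
    "timestamp": ["2018-04-01 00:00:00", "2018-04-01 00:01:00", "2018-04-01 00:02:00"],
    "sensor_00": [1.0, 2.0, 3.0],
})
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
ts = raw.set_index("timestamp")

# Sorted and unique timestamps are what resample()/rolling() rely on
print(ts.index.is_monotonic_increasing, ts.index.is_unique)
# One-minute spacing is reported as "T" by older pandas and "min" by newer releases
print(pd.infer_freq(ts.index))
```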

Exploratory Data Analysis

## Machine status distribution
sensor_df.machine_status.value_counts()
NORMAL        205836
RECOVERING     14477
BROKEN             7
Name: machine_status, dtype: int64
#Plotting Machine Status Distribution
plt.figure(figsize=(5,3))
plt.title('machine_status')
sns.countplot(x='machine_status',data=sensor_df)

# Machine status - pie chart
plt.figure(figsize=(5,3))
sizes = sensor_df.machine_status.value_counts()
# Use the value_counts index as labels so each label always matches its slice
# (the 7 BROKEN rows form a slice too thin to see)
plt.pie(x=sizes,labels=sizes.index)
plt.show()

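# numeric_only=True: machine_status is still a string column here, and pandas >= 2.0 raises without it
corr = sensor_df.corr(numeric_only=True)
plt.figure(figsize=(60,40))
sns.heatmap(corr,annot=True,cmap='coolwarm');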

# Keep only correlations with |r| > 0.8; all other cells become NaN and are left blank in the heatmap
corr80 = corr[abs(corr)> 0.8]
sns.heatmap(corr80,cmap='coolwarm')
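With 50 sensors, even the masked heatmap is hard to read. Listing the strongly correlated pairs directly can help. The helper `high_corr_pairs` below is a hypothetical sketch. It walks the upper triangle of a correlation matrix so that each pair appears once. It is checked here on synthetic data, not sensor.csv.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Hypothetical helper: list each pair of distinct features with |correlation| > threshold once."""
    cols = corr.columns
    rows = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # upper triangle only: skips self-pairs and mirrored pairs
            r = corr.iloc[i, j]
            if abs(r) > threshold:          # NaN compares as False, so all-NaN columns are skipped
                rows.append((cols[i], cols[j], r))
    out = pd.DataFrame(rows, columns=["feature_a", "feature_b", "corr"])
    return out.sort_values("corr", key=lambda c: c.abs(), ascending=False, ignore_index=True)

# Synthetic check: s2 is almost a scaled copy of s1, s3 is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=500)
toy = pd.DataFrame({
    "s1": x,
    "s2": 2 * x + rng.normal(scale=0.1, size=500),
    "s3": rng.normal(size=500),
})
print(high_corr_pairs(toy.corr()))
```

On the notebook data, `high_corr_pairs(corr)` would list the pairs behind the masked heatmap, using the `corr` matrix computed above.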

#extract the readings from the Broken state of the pump
broken= sensor_df[sensor_df['machine_status']=='BROKEN']


#Extract the name of the numerical columns
sensor_df_2 = sensor_df.drop(['machine_status'],axis=1)
names= sensor_df_2.columns


# Plot each sensor's time series and mark the BROKEN readings with a red X


for name in names:
    _=plt.figure(figsize=(18,3))
    _=plt.plot(broken[name],linestyle='none',marker='X',color='red',markersize=12)
    _=plt.plot(sensor_df[name],color='blue')
    _=plt.title(name)
    plt.show()

#Converting the machine status column to numeric
MS_class_dict = {"BROKEN": 0, "NORMAL": 1, "RECOVERING": 2}
sensor_df['machine_status'] = sensor_df['machine_status'].map(MS_class_dict)
sensor_df.head()

Stationarity and Autocorrelation

# Resample the entire dataset to daily means and daily standard deviations
rollmean = sensor_df.resample(rule='D').mean()
rollstd = sensor_df.resample(rule='D').std()
# Plot each sensor's one-minute readings together with its daily mean and daily standard deviation
for name in names:
    _ = plt.figure(figsize=(18,3))
    _ = plt.plot(sensor_df[name], color='blue', label='Original')
    _ = plt.plot(rollmean[name], color='red', label='Daily Mean')
    _ = plt.plot(rollstd[name], color='black', label='Daily Std')
    _ = plt.legend(loc='best')
    _ = plt.title(name)
    plt.show()
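The daily statistics in the cell above come from `resample(rule='D')`, which produces one value per calendar day. pandas also offers time-based rolling windows, which produce one value per original timestamp, each computed over the trailing 24 hours. The sketch contrasts the two on a synthetic three-day, one-minute series. The rolling variant is shown only for comparison.

```python
import numpy as np
import pandas as pd

# Synthetic series: three days of one-minute readings with values 0, 1, 2, ...
idx = pd.date_range("2018-04-01", periods=3 * 1440, freq="min")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

daily = s.resample("D").mean()    # one value per calendar day
rolling = s.rolling("1D").mean()  # one value per minute, averaged over the trailing 24 hours

print(len(daily), len(rolling))           # 3 4320
print(daily.iloc[0], rolling.iloc[-1])    # 719.5 3599.5
```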

Pre-processing and Feature Engineering

1. Scale the data.

2. Perform PCA and inspect the most important principal components by explained variance.

# Standardize/scale the dataset and apply PCA
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline


# Extract the names of the numerical columns
df2 = sensor_df.drop(['machine_status'], axis=1)
names=df2.columns
x = sensor_df[names]
scaler = StandardScaler()
pca = PCA()
pipeline = make_pipeline(scaler, pca)
pipeline.fit(x)
Pipeline(steps=[('standardscaler', StandardScaler()), ('pca', PCA())])
features = range(pca.n_components_)
_ = plt.figure(figsize=(22, 5))
_ = plt.bar(features, pca.explained_variance_)
_ = plt.xlabel('PCA feature')
_ = plt.ylabel('Variance')
_ = plt.xticks(features)
_ = plt.title("Importance of the Principal Components based on explained variance")
plt.show()
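The bar chart ranks components by `explained_variance_`. A common way to decide how many to keep is to look at the cumulative `explained_variance_ratio_` and stop once it passes a target. The sketch below uses a synthetic matrix of 10 correlated "sensors" driven by two hidden factors. The 90% target is a hypothetical choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor matrix: 10 correlated columns driven by 2 hidden factors
rng = np.random.default_rng(42)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(1000, 10))

pipe = make_pipeline(StandardScaler(), PCA())
pipe.fit(X)
pca = pipe.named_steps["pca"]        # make_pipeline names this step "pca"

cumulative = np.cumsum(pca.explained_variance_ratio_)
target = 0.90                        # hypothetical threshold
n_keep = int(np.searchsorted(cumulative, target) + 1)   # first component count reaching the target
print(n_keep, cumulative[:n_keep].round(3))
```

scikit-learn can also make this choice directly: `PCA(n_components=0.90, svd_solver="full")` keeps enough components to explain more than 90% of the variance.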

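# Calculate PCA with 2 components
# Note: this PCA is fit on the raw (unscaled) readings x, not on the standardized data used in the pipeline above
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)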
principalDf = pd.DataFrame(data = principalComponents, columns = ['pc1', 'pc2'])
principalDf.head()

         pc1         pc2
0  69.523134  265.704571
1  69.523134  265.704571
2  27.841569  283.462197
3  24.548981  290.213914
4  29.627090  294.619639

sensor_df['pc1']=pd.Series(principalDf['pc1'].values, index=sensor_df.index)
sensor_df['pc2']=pd.Series(principalDf['pc2'].values, index=sensor_df.index)
sensor_df['pc1'],sensor_df['pc2']
(timestamp
 2018-04-01 00:00:00     69.523134
 2018-04-01 00:01:00     69.523134
 2018-04-01 00:02:00     27.841569
 2018-04-01 00:03:00     24.548981
 2018-04-01 00:04:00     29.627090
                           ...    
 2018-08-31 23:55:00   -308.531521
 2018-08-31 23:56:00   -294.603672
 2018-08-31 23:57:00   -300.207654
 2018-08-31 23:58:00   -285.141390
 2018-08-31 23:59:00   -298.182526
 Name: pc1, Length: 220320, dtype: float64,
 timestamp
 2018-04-01 00:00:00    265.704571
 2018-04-01 00:01:00    265.704571
 2018-04-01 00:02:00    283.462197
 2018-04-01 00:03:00    290.213914
 2018-04-01 00:04:00    294.619639
                           ...    
 2018-08-31 23:55:00   -274.597310
 2018-08-31 23:56:00   -256.328332
 2018-08-31 23:57:00   -256.932727
 2018-08-31 23:58:00   -263.099471
 2018-08-31 23:59:00   -264.545953
 Name: pc2, Length: 220320, dtype: float64)
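The two-component PCA in this section is fit on `x`, the raw sensor readings, whereas the earlier pipeline standardizes first. Without scaling, the sensors with the largest numeric ranges dominate the leading components. The sketch shows the effect on a hypothetical two-sensor frame with very different scales. It also shows how the same pipeline produces 2-component scores from standardized data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for x: two independent sensors on very different scales
rng = np.random.default_rng(0)
x = pd.DataFrame({"small": rng.normal(0, 1, 500), "large": rng.normal(0, 1000, 500)})

# PCA on raw readings: the large-scale sensor dominates the first component
raw_pca = PCA(n_components=2).fit(x)

# PCA on standardized readings: every sensor starts with unit variance
scaled_pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = scaled_pipe.fit_transform(x)       # shape (500, 2), analogous to principalComponents
scaled_pca = scaled_pipe.named_steps["pca"]

print(raw_pca.explained_variance_ratio_.round(3))
print(scaled_pca.explained_variance_ratio_.round(3))
```

On the real data, scores from a scaled pipeline would differ from the pc1/pc2 values printed above.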

Check stationarity with Dickey-Fuller Test

# Percentage change between consecutive one-minute values of pc1
pca1 = principalDf['pc1'].pct_change()
# Lag-1 autocorrelation of the percentage changes
autocorrelation = pca1.dropna().autocorr()
print('Autocorrelation is: ', autocorrelation)
Autocorrelation is:  -6.773604518564134e-06
# Plot ACF
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(pca1.dropna(), lags=20, alpha=0.05)
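`pct_change()` divides each change by the previous value. PCA scores are centred on zero by construction, and pc1 runs from about +70 to about -300. Whenever a score passes close to zero, the ratio can blow up, and it becomes infinite if the previous value is exactly zero. The first difference, `diff()`, is a common alternative that stays on the scale of the data. The sketch contrasts the two on made-up numbers.

```python
import numpy as np
import pandas as pd

# Made-up PC scores that pass close to zero
pc = pd.Series([5.0, 0.01, -4.0, -8.0])

ratio_change = pc.pct_change()   # (x_t - x_{t-1}) / x_{t-1}: explodes when x_{t-1} is near 0
abs_change = pc.diff()           # x_t - x_{t-1}: stays on the scale of the data

print(ratio_change.round(3).tolist())
print(abs_change.round(3).tolist())
```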

# Percentage change between consecutive one-minute values of pc2
pca2 = principalDf['pc2'].pct_change()
# Lag-1 autocorrelation of the percentage changes
autocorrelation = pca2.dropna().autocorr()
print('Autocorrelation is: ', autocorrelation)
Autocorrelation is:  -1.4768185278992946e-06
# Plot ACF
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(pca2.dropna(), lags=20, alpha=0.05)

Zhihu academic consulting: https://www.zhihu.com/consult/people/792359672131756032?isMe=1

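Reviewer for Mechanical Systems and Signal Processing and other journals. Areas of expertise: signal filtering/denoising, machine learning/deep learning, time-series pre-analysis/forecasting, equipment fault diagnosis/defect detection/anomaly detection.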

--------

Autoencoder-based time-series anomaly detection (Python, ipynb file)

import pandas as pd
import tensorflow as tf
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.metrics import precision_recall_fscore_support
import matplotlib.pyplot as plt

Full code:

mbd.pub/o/bread/mbd-Zpmcl5xq


Author: 哥廷根数学学派
