Rolling Bearing Datasets for Fault Diagnosis, Prediction, and Classification: A Comprehensive Collection for Comparative Experiments
1. CWRU (Case Western Reserve University) bearing dataset
2. XJTU (Xi'an Jiaotong University) dataset
3. Jiangnan University dataset
4. Southeast University dataset
This post walks through how to use these four rolling bearing datasets for fault diagnosis, prediction, and classification, and how to run comparative experiments across them. We will use Python together with common scientific and machine learning libraries (numpy, pandas, scipy, scikit-learn, and matplotlib) to process the data.
Dataset Overview
- CWRU (Case Western Reserve University) bearing dataset
  - Covers multiple fault types.
  - Provides bearing vibration signals recorded at different shaft speeds.
- XJTU (Xi'an Jiaotong University) dataset
  - Covers faults of different types and severity levels.
  - Provides multi-sensor data.
- Jiangnan University dataset
  - Covers bearing data under different operating conditions.
  - Provides a rich set of fault samples.
- Southeast University dataset
  - Covers multiple fault types under different load conditions.
  - Provides detailed descriptions of the experimental conditions.
Workflow Overview
1. Dataset preparation
2. Feature extraction
3. Data preprocessing
4. Model training
5. Model evaluation
6. Result comparison
Detailed Steps
1. Dataset Preparation
Make sure your datasets are organized in the following directory structure:
bearing_datasets/
├── CWRU/
│   ├── normal.mat
│   ├── inner_race_fault.mat
│   └── ...
├── XJTU/
│   ├── normal.mat
│   ├── ball_fault.mat
│   └── ...
├── Jiangnan/
│   ├── normal.csv
│   ├── outer_race_fault.csv
│   └── ...
└── Southeast/
    ├── normal.csv
    ├── inner_race_fault.csv
    └── ...
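Before running anything, it can help to verify that this layout is actually in place. A minimal sketch, assuming the `bearing_datasets` root and the four subfolder names used above:

```python
import os

# Expected top-level dataset folders (as assumed in this guide)
EXPECTED = ["CWRU", "XJTU", "Jiangnan", "Southeast"]

def check_layout(root="bearing_datasets"):
    """Return the list of expected dataset folders missing under root."""
    return [name for name in EXPECTED
            if not os.path.isdir(os.path.join(root, name))]

missing = check_layout()
if missing:
    print("Missing dataset folders:", missing)
else:
    print("All dataset folders found.")
```

Running this once up front gives a clearer error than a `FileNotFoundError` halfway through feature extraction.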
2. Feature Extraction
Extract informative features from the raw vibration signals. Commonly used features include time-domain and frequency-domain statistics.
import os
import numpy as np
import pandas as pd
import scipy.io as sio
from scipy.signal import welch
from sklearn.preprocessing import StandardScaler

# Extract simple time- and frequency-domain features from one signal
def extract_features(signal, fs):
    # Time-domain features
    mean_abs = np.mean(np.abs(signal))
    std_dev = np.std(signal)
    # Frequency-domain feature via Welch's PSD estimate:
    # median of the frequencies whose power lies in the top 10%
    freqs, Pxx = welch(signal, fs=fs, nperseg=256)
    median_freq = np.median(freqs[Pxx > np.percentile(Pxx, 90)])
    return [mean_abs, std_dev, median_freq]

# Walk a dataset directory and build a feature matrix and label vector
def load_and_extract_features(dataset_path, fs=12000):
    features = []
    labels = []
    for filename in os.listdir(dataset_path):
        if filename.endswith('.mat'):
            data = sio.loadmat(os.path.join(dataset_path, filename))
            # Assumes the last key holds the signal; adjust to your .mat layout
            signal = data[list(data.keys())[-1]].flatten()
            label = filename.split('_')[0]  # Assuming label is part of the filename
            features.append(extract_features(signal, fs))
            labels.append(label)
        elif filename.endswith('.csv'):
            data = pd.read_csv(os.path.join(dataset_path, filename))
            signal = data.iloc[:, 0].values
            label = filename.split('_')[0]  # Assuming label is part of the filename
            features.append(extract_features(signal, fs))
            labels.append(label)
    return np.array(features), np.array(labels)

# Load and extract features from each dataset
# (each dataset has its own sampling rate; fs=12000 matches CWRU's 12 kHz data,
# so pass the correct fs for the other datasets)
cwru_features, cwru_labels = load_and_extract_features('bearing_datasets/CWRU')
xjtu_features, xjtu_labels = load_and_extract_features('bearing_datasets/XJTU')
jiangnan_features, jiangnan_labels = load_and_extract_features('bearing_datasets/Jiangnan')
southeast_features, southeast_labels = load_and_extract_features('bearing_datasets/Southeast')
3. Data Preprocessing
Standardize the feature vectors and split them into training and test sets.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# Combine all datasets
all_features = np.vstack((cwru_features, xjtu_features, jiangnan_features, southeast_features))
all_labels = np.concatenate((cwru_labels, xjtu_labels, jiangnan_labels, southeast_labels))
# Encode labels
label_encoder = LabelEncoder()
all_labels_encoded = label_encoder.fit_transform(all_labels)
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(all_features, all_labels_encoded, test_size=0.2, random_state=42)
# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
4. Model Training
Train a random forest classifier on the combined features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
# Initialize and train the model
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_scaled, y_train)
# Predict on test set
y_pred = rf_classifier.predict(X_test_scaled)
# Evaluate the model
print("Random Forest Classifier:")
print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
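A single train/test split can give an optimistic or pessimistic estimate; stratified cross-validation yields a steadier accuracy figure. A sketch using synthetic stand-in features (swap in `all_features` and `all_labels_encoded` from the steps above for the real experiment):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 200 samples, 3 features, 4 classes with shifted means
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) + np.repeat(np.arange(4), 50)[:, None]
y = np.repeat(np.arange(4), 50)

# Putting the scaler inside the pipeline avoids leaking test-fold statistics
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=100, random_state=42))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratification keeps the class proportions of each fold close to those of the full dataset, which matters when some fault classes have few samples.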
5. Model Evaluation
Evaluate the trained model on each dataset separately.
def evaluate_dataset(model, X_test, y_test, label_encoder):
    y_pred = model.predict(X_test)
    print("Classification Report:")
    # Pass labels explicitly so the report still works when a single
    # dataset lacks some of the globally encoded classes
    print(classification_report(y_test, y_pred,
                                labels=list(range(len(label_encoder.classes_))),
                                target_names=label_encoder.classes_,
                                zero_division=0))
    print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

# Scale each dataset with the scaler fitted on the training set
# and encode its labels with the fitted label encoder
cwru_features_scaled = scaler.transform(cwru_features)
cwru_labels_encoded = label_encoder.transform(cwru_labels)
xjtu_features_scaled = scaler.transform(xjtu_features)
xjtu_labels_encoded = label_encoder.transform(xjtu_labels)
jiangnan_features_scaled = scaler.transform(jiangnan_features)
jiangnan_labels_encoded = label_encoder.transform(jiangnan_labels)
southeast_features_scaled = scaler.transform(southeast_features)
southeast_labels_encoded = label_encoder.transform(southeast_labels)

# Evaluate on each dataset separately
print("\nEvaluating CWRU Dataset:")
evaluate_dataset(rf_classifier, cwru_features_scaled, cwru_labels_encoded, label_encoder)
print("\nEvaluating XJTU Dataset:")
evaluate_dataset(rf_classifier, xjtu_features_scaled, xjtu_labels_encoded, label_encoder)
print("\nEvaluating Jiangnan Dataset:")
evaluate_dataset(rf_classifier, jiangnan_features_scaled, jiangnan_labels_encoded, label_encoder)
print("\nEvaluating Southeast Dataset:")
evaluate_dataset(rf_classifier, southeast_features_scaled, southeast_labels_encoded, label_encoder)
6. Result Comparison
Compare the accuracy (and other metrics) achieved on each dataset.
import matplotlib.pyplot as plt
# Collect accuracies for comparison
accuracies = {
    "CWRU": accuracy_score(cwru_labels_encoded, rf_classifier.predict(cwru_features_scaled)),
    "XJTU": accuracy_score(xjtu_labels_encoded, rf_classifier.predict(xjtu_features_scaled)),
    "Jiangnan": accuracy_score(jiangnan_labels_encoded, rf_classifier.predict(jiangnan_features_scaled)),
    "Southeast": accuracy_score(southeast_labels_encoded, rf_classifier.predict(southeast_features_scaled))
}
# Plot accuracies
plt.bar(accuracies.keys(), accuracies.values())
plt.title('Model Accuracies on Different Datasets')
plt.xlabel('Dataset')
plt.ylabel('Accuracy')
plt.ylim([0, 1])
plt.show()
Complete Code
All of the steps above (data loading, feature extraction, preprocessing, model training, and result comparison) can be combined into a single script, e.g. main.py.
Running the Script
Run the following command in a terminal to execute the whole pipeline:
python main.py
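Since the real datasets are not bundled here, the pipeline can be smoke-tested end to end with synthetic vibration signals. The sketch below mirrors the flow of main.py; the class names, tone frequencies, and amplitudes are made up purely for illustration:

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

FS = 12000  # sampling rate in Hz (CWRU-style)

def extract_features(signal, fs):
    # Same three features as in the guide: mean |x|, std, spectral summary
    freqs, Pxx = welch(signal, fs=fs, nperseg=256)
    median_freq = np.median(freqs[Pxx > np.percentile(Pxx, 90)])
    return [np.mean(np.abs(signal)), np.std(signal), median_freq]

def synthetic_signal(freq_hz, amp, rng, n=4096):
    # A noisy sine stands in for one vibration recording
    t = np.arange(n) / FS
    return amp * np.sin(2 * np.pi * freq_hz * t) + 0.3 * rng.normal(size=n)

rng = np.random.default_rng(0)
# Made-up (frequency, amplitude) pairs for three pretend fault classes
classes = {"normal": (500, 0.5), "inner": (1500, 1.0), "outer": (3000, 2.0)}
X, y = [], []
for label, (f, a) in classes.items():
    for _ in range(30):
        X.append(extract_features(synthetic_signal(f, a, rng), FS))
        y.append(label)

le = LabelEncoder()
y_enc = le.fit_transform(y)
X_tr, X_te, y_tr, y_te = train_test_split(
    np.array(X), y_enc, test_size=0.2, random_state=42, stratify=y_enc)
scaler = StandardScaler()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(scaler.fit_transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(scaler.transform(X_te)))
print(f"Smoke-test accuracy: {acc:.2f}")
```

If this runs cleanly, swapping the synthetic generator for `load_and_extract_features` on the real directories is the only change needed.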
Summary
This guide covered the full workflow: dataset preparation, feature extraction, data preprocessing, model training and evaluation, visualization, and result comparison. Hopefully the details and code above help you build and refine your own rolling bearing fault diagnosis system.