Deep Learning
I. Dataset
II. Approach
AutoEncoder: extracts features; the output of the middle (bottleneck) layer is used as the feature representation.
KNN: classifies samples based on the extracted features and their labels.
III. Code
1. Import the libraries.
import numpy as np
import pandas as pd
from keras import Input, Model
from keras.layers import Dense
from matplotlib import pyplot as plt
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
2. Read the .arff file and convert it to a DataFrame.
with open("Date_Fruit_Datasets.arff", encoding="utf-8") as f:
    header = []
    for line in f:
        if line.startswith("@ATTRIBUTE"):
            header.append(line.split()[1])
        elif line.startswith("@DATA"):
            break
    # the file handle now points at the first data row
    df = pd.read_csv(f, header=None)
df.columns = header
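The hand-rolled parser above only scans `@ATTRIBUTE` lines for column names and then hands the rest of the file to `pd.read_csv`. A minimal sketch of the same idea as a reusable function; the inline ARFF text here is a hypothetical two-column sample, not the real dataset:

```python
import io
import pandas as pd

def read_arff(handle):
    """Collect ARFF attribute names, then load the @DATA section as CSV."""
    header = []
    for line in handle:
        token = line.strip()
        if token.upper().startswith("@ATTRIBUTE"):
            header.append(token.split()[1])
        elif token.upper().startswith("@DATA"):
            break
    # the handle is now positioned at the first data row
    df = pd.read_csv(handle, header=None)
    df.columns = header
    return df

sample = io.StringIO(
    "@RELATION fruit\n"
    "@ATTRIBUTE AREA NUMERIC\n"
    "@ATTRIBUTE Class {BERHI,DEGLET}\n"
    "@DATA\n"
    "1.0,BERHI\n"
    "2.0,DEGLET\n"
)
df = read_arff(sample)
```

`scipy.io.arff.loadarff` is another option, but it returns byte strings for nominal attributes, so the manual parser is simpler for this use case.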
3. Convert the text labels to numeric labels and shuffle the rows.
df.loc[df['Class']=='BERHI', 'Class'] = 0
df.loc[df['Class']=='DEGLET', 'Class'] = 1
df.loc[df['Class']=='DOKOL', 'Class'] = 2
df.loc[df['Class']=='IRAQI', 'Class'] = 3
df.loc[df['Class']=='ROTANA', 'Class'] = 4
df.loc[df['Class']=='SAFAVI', 'Class'] = 5
df.loc[df['Class']=='SOGAY', 'Class'] = 6
df = df.sample(frac=1).reset_index(drop=True)
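The seven `.loc` assignments can equivalently be replaced by sklearn's `LabelEncoder`, which assigns integer codes in alphabetical order; since the seven class names above happen to be listed alphabetically, it produces the same 0–6 codes. A sketch on a hypothetical sample of the `Class` column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# hypothetical sample of the 'Class' column
classes = pd.Series(["DOKOL", "BERHI", "SAFAVI", "BERHI"])
encoder = LabelEncoder()
codes = encoder.fit_transform(classes)  # integer codes in alphabetical order
```

`encoder.inverse_transform(codes)` recovers the original text labels, which is handy when reporting per-class results later.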
4. Split the data into features and labels.
df_label = df['Class']
df = df.drop(columns='Class')
5. Split into training and test sets.
dataset_train = df[0:600]
dataset_test = df[600:]
dataset_train_label = df_label[0:600]
dataset_test_label = df_label[600:]
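The fixed 600-row cut relies on the earlier shuffle for class balance. sklearn's `train_test_split` does the shuffle and the split in one step and can stratify by label so each class keeps the same proportion in both sets; a sketch on dummy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# dummy data: 10 samples, 2 features, two balanced classes
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```

With seven unevenly sized fruit classes, `stratify=df_label` would make the 83%–92% run-to-run spread reported in the summary somewhat smaller.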
6. Normalize the features.
scaler = preprocessing.MinMaxScaler()
X_train = pd.DataFrame(scaler.fit_transform(dataset_train),
columns=dataset_train.columns,
index=dataset_train.index)
X_test = pd.DataFrame(scaler.transform(dataset_test),
columns=dataset_test.columns,
index=dataset_test.index)
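Note that `fit_transform` is called on the training set only and plain `transform` on the test set: the test data is scaled with the training set's min/max, which avoids leaking test statistics into training. A tiny sketch with made-up numbers showing that scaled test values can even fall outside [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train = np.array([[0.0], [10.0]])    # min=0, max=10 learned from training data
test = np.array([[5.0], [20.0]])
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)  # reuses the training min/max
```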
7. Build the AutoEncoder network.
act_func = 'relu'
Net_In = Input(shape=(X_train.shape[1],))
net = Dense(68, activation=act_func,
            kernel_initializer='glorot_uniform')(Net_In)
net = Dense(34, activation=act_func,
            kernel_initializer='glorot_uniform')(net)
# bottleneck layer: its 34-dimensional output is the extracted feature vector
Net_Mid = Dense(34)(net)
net = Dense(34, activation=act_func,
            kernel_initializer='glorot_uniform')(Net_Mid)
net = Dense(68, activation=act_func,
            kernel_initializer='glorot_uniform')(net)
Net_Out = Dense(X_train.shape[1],
                kernel_initializer='glorot_uniform')(net)
# define autoencoder model
model = Model(inputs=Net_In, outputs=Net_Out)
# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
print(model.summary())
8. Fit the autoencoder model to reconstruct its input.
history = model.fit(np.array(X_train), np.array(X_train),
                    epochs=200, batch_size=30, verbose=1,
                    validation_data=(np.array(X_test), np.array(X_test)))
plt.plot(history.history['loss'],
'b',
label='Training loss')
plt.plot(history.history['val_loss'],
'r',
label='Validation loss')
plt.legend(loc='upper right')
plt.xlabel('Epochs')
plt.ylabel('Loss (MSE)')
plt.ylim([0, 0.1])
plt.show()
9. Define an encoder model (without the decoder).
encoder = Model(inputs=Net_In, outputs=Net_Mid)
10. Encode the training and test data.
X_train_encode = encoder.predict(X_train)
X_test_encode = encoder.predict(X_test)
11. Prepare the data for KNN.
# encoder.predict already returns NumPy arrays, so no list round-trip is needed
X_train_feature = X_train_encode
X_train_target = dataset_train_label.to_numpy(dtype=int)
X_test_feature = X_test_encode
12. The ground-truth target values for X_test_feature.
X_test_truth_target = dataset_test_label.to_numpy(dtype=int)
13. Use KNN to predict the target values of X_test_feature.
knn = KNeighborsClassifier()
knn.fit(X_train_feature, X_train_target)
X_test_predict_target = knn.predict(X_test_feature)
14. Compute the accuracy.
total_test_num = X_test_predict_target.shape[0]
correct_count = sum(X_test_predict_target == X_test_truth_target)
success_rate = '{:.2%}'.format(correct_count / total_test_num)
print("Success rate is", success_rate)
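The manual count is equivalent to sklearn's `accuracy_score` helper; a sketch on hypothetical truth/prediction arrays:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical truth/prediction arrays (4 of 5 labels match)
truth = np.array([0, 1, 2, 2, 1])
predict = np.array([0, 1, 1, 2, 1])
acc = accuracy_score(truth, predict)  # fraction of matching labels
success_rate = '{:.2%}'.format(acc)
```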
Summary
Across 50 test runs, the accuracy fluctuated roughly between 83% and 92%:
Minimum: 83.89%
Maximum: 91.28%
Average: 87.40%