AutoEncoder + K-Nearest Neighbors (AutoEncoder + KNN)

Deep Learning

I. Dataset

Date_Fruit_Datasets

II. Approach

AutoEncoder: extracts features, using the middle (bottleneck) layer as the feature representation.
KNN: classifies samples based on the extracted features and their labels.

III. Code

1. Import the libraries.

import numpy as np
import pandas as pd
from keras import Input, Model
from keras.layers import Dense
from matplotlib import pyplot as plt
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier

2. Read the .arff file and convert it to a DataFrame.

with open("Date_Fruit_Datasets.arff", encoding="utf-8") as f:
    header = []
    for line in f:
        if line.startswith("@ATTRIBUTE"):
            header.append(line.split()[1])
        elif line.startswith("@DATA"):
            break
    df = pd.read_csv(f, header=None)
    df.columns = header
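To sanity-check the parsing logic, here is the same technique run on a small in-memory ARFF sample (hypothetical attribute names and rows, not the real Date_Fruit file):

```python
import io

import pandas as pd

# A tiny hypothetical ARFF sample using the same @ATTRIBUTE/@DATA layout.
sample = """@RELATION fruit
@ATTRIBUTE AREA NUMERIC
@ATTRIBUTE PERIMETER NUMERIC
@ATTRIBUTE Class {BERHI,DEGLET}
@DATA
1.0,2.0,BERHI
3.0,4.0,DEGLET
"""

f = io.StringIO(sample)
header = []
for line in f:
    if line.startswith("@ATTRIBUTE"):
        header.append(line.split()[1])  # attribute name is the second token
    elif line.startswith("@DATA"):
        break                           # everything after @DATA is plain CSV
df = pd.read_csv(f, header=None)        # read only the remaining lines
df.columns = header

print(df.columns.tolist())  # ['AREA', 'PERIMETER', 'Class']
print(df.shape)             # (2, 3)
```

Because the `for` loop consumes the handle up to the `@DATA` line, `pd.read_csv(f, ...)` sees only the data rows that follow it.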

3. Convert the text labels to numeric labels and shuffle the rows.

df.loc[df['Class']=='BERHI', 'Class'] = 0
df.loc[df['Class']=='DEGLET', 'Class'] = 1
df.loc[df['Class']=='DOKOL', 'Class'] = 2
df.loc[df['Class']=='IRAQI', 'Class'] = 3
df.loc[df['Class']=='ROTANA', 'Class'] = 4
df.loc[df['Class']=='SAFAVI', 'Class'] = 5
df.loc[df['Class']=='SOGAY', 'Class'] = 6

df = df.sample(frac=1).reset_index(drop=True)
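The seven `df.loc` assignments above can also be written as a single `map` over a name-to-index dict; a minimal sketch on a hypothetical mini-frame:

```python
import pandas as pd

# Hypothetical mini-frame standing in for the real dataset.
df = pd.DataFrame({"Class": ["BERHI", "SOGAY", "DOKOL"]})

classes = ["BERHI", "DEGLET", "DOKOL", "IRAQI", "ROTANA", "SAFAVI", "SOGAY"]
mapping = {name: i for i, name in enumerate(classes)}  # BERHI->0 ... SOGAY->6
df["Class"] = df["Class"].map(mapping)

print(df["Class"].tolist())  # [0, 6, 2]
```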

4. Split the data into features and labels.

df_label = df['Class']
df = df.drop(columns='Class')

5. Split into a training set and a test set.

dataset_train = df[0:600]
dataset_test = df[600:]
dataset_train_label = df_label[0:600]
dataset_test_label = df_label[600:]
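The fixed 600-row cut works because the rows were shuffled in step 3; `sklearn.model_selection.train_test_split` does the shuffle and split in one call, optionally keeping class ratios equal via `stratify`. A sketch on toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features/labels standing in for df and df_label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# stratify=y keeps the class ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)
```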

6. Normalize the features.

scaler = preprocessing.MinMaxScaler()
X_train = pd.DataFrame(scaler.fit_transform(dataset_train),
                       columns=dataset_train.columns,
                       index=dataset_train.index)

X_test = pd.DataFrame(scaler.transform(dataset_test),
                      columns=dataset_test.columns,
                      index=dataset_test.index)
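Note that the scaler is fitted on the training set only and then reused on the test set, so test values outside the training range can fall outside [0, 1]. A small sketch:

```python
import numpy as np
from sklearn import preprocessing

train = np.array([[0.0], [10.0]])
test = np.array([[5.0], [12.0]])

scaler = preprocessing.MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # fit learns min=0, max=10
test_scaled = scaler.transform(test)        # reuses the training min/max

print(train_scaled.ravel())  # [0. 1.]
print(test_scaled.ravel())   # [0.5 1.2]
```

Fitting on the training set only avoids leaking test-set statistics into the model.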

7. Build the AutoEncoder network.

act_func = 'relu'

Net_In = Input(shape=(X_train.shape[1],))
net = Dense(68, activation=act_func,
            kernel_initializer='glorot_uniform')(Net_In)
net = Dense(34, activation=act_func,
            kernel_initializer='glorot_uniform')(net)
Net_Mid = Dense(34)(net)  # linear bottleneck: its output is the extracted feature vector
net = Dense(34, activation=act_func,
            kernel_initializer='glorot_uniform')(Net_Mid)
net = Dense(68, activation=act_func,
            kernel_initializer='glorot_uniform')(net)
Net_Out = Dense(X_train.shape[1],
                kernel_initializer='glorot_uniform')(net)
# define autoencoder model
model = Model(inputs=Net_In, outputs=Net_Out)

# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
print(model.summary())


8. Fit the autoencoder model to reconstruct its input.

history = model.fit(np.array(X_train), np.array(X_train),
                    epochs=200, batch_size=30,
                    verbose=1, validation_data=(X_test, X_test))

plt.plot(history.history['loss'],
         'b',
         label='Training loss')
plt.plot(history.history['val_loss'],
         'r',
         label='Validation loss')
plt.legend(loc='upper right')
plt.xlabel('Epochs')
plt.ylabel('Loss, [mse]')
plt.ylim([0, 0.1])
plt.show()


9. Define an encoder model (without the decoder).

encoder = Model(inputs=Net_In, outputs=Net_Mid)

10. Encode the training and test data.

X_train_encode = encoder.predict(X_train)
X_test_encode = encoder.predict(X_test)

11. Prepare the data for KNN.

X_train_feature = X_train_encode  # encoder.predict already returns a NumPy array
X_train_target = dataset_train_label.astype(int).to_numpy()
X_test_feature = X_test_encode

12. Ground-truth targets for X_test_feature.

X_test_truth_target = dataset_test_label.astype(int).to_numpy()

13. Predict targets for X_test_feature with KNN.

knn = KNeighborsClassifier()
knn.fit(X_train_feature, X_train_target)

X_test_predict_target = knn.predict(X_test_feature)
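`KNeighborsClassifier()` uses the default `n_neighbors=5`; the parameter is worth tuning. A minimal sketch on toy 2-D points standing in for the encoded features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated toy clusters standing in for the encoded features.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # vote among the 3 nearest points
knn.fit(X, y)

pred = knn.predict([[0.05, 0.05], [0.95, 0.95]])
print(pred)  # [0 1]
```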

14. Compute the accuracy.

total_test_num = X_test_predict_target.shape[0]
correct_count = sum(X_test_predict_target == X_test_truth_target)
success_rate = '{:.2%}'.format(correct_count / total_test_num)
print("Success rate is", success_rate)
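The manual count above is equivalent to `sklearn.metrics.accuracy_score`; a sketch on hypothetical label arrays:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth and predicted labels.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

acc = accuracy_score(y_true, y_pred)  # 4 of 5 correct
print('{:.2%}'.format(acc))  # 80.00%
```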


Summary

Across 50 test runs, the accuracy mostly fluctuates between 83% and 92%:
minimum 83.89%, maximum 91.28%, average 87.40%.
