Keras, a Python machine learning library: AutoEncoder self-encoding and feature compression

腾讯AI架构师 (Tencent AI Architect)

Last modified 2022-03-27 16:53:25

First published 2018-05-01 10:02:18

Article link: https://blog.csdn.net/luanpeng825485697/article/details/80154286

Full-Stack Engineer Development Handbook (author: 栾鹏)
Complete Python Tutorial

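Keras can implement an autoencoder with a deep network: each sample's n-dimensional features are represented by k-dimensional features (k < n), which compresses the encoding. This also works as a form of feature extraction, since the compressed features are learned combinations of the original ones rather than a subset of them. For example, a handwritten digit image contains 784 pixels (28 × 28) and therefore 784 features. What if we want to represent it with only two features, so that the handwritten digits can be told apart in a two-dimensional plane?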

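An autoencoder learns without supervision. It imitates the way the human brain abstracts and extracts features layer by layer. Two points characterize the learning process. First, it is unsupervised: the training data need no labels, the model learns how the content of the data is organized, and the features it extracts are the ones that occur frequently. Second, it abstracts layer by layer: features have to be abstracted progressively.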

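An autoencoder (AutoEncoder) encodes its input using higher-level features derived from that same input. It is itself a neural network whose target output is identical to its input. Borrowing the idea of sparse coding, its goal is to reconstruct the input by recombining sparse higher-level features.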
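The reconstruction goal described above can be written compactly. The notation here is introduced for illustration and is not taken from the original article: f_θ is the encoder, g_φ is the decoder, x_i is the i-th of N samples with n features, and z_i is its k-dimensional code with k < n.

```latex
z_i = f_\theta(x_i), \qquad \hat{x}_i = g_\phi(z_i), \qquad
\min_{\theta,\,\phi}\ \frac{1}{N}\sum_{i=1}^{N}\frac{1}{n}\,\bigl\lVert x_i - \hat{x}_i \bigr\rVert_2^2
```

In the example in this article n = 784 and k = 2, and the mean squared error loss passed to compile plays the role of this objective.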

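The example uses a Python file for reading the handwritten digit data (imported as MNIST in the code) together with the handwritten digit data files.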

Reference on GitHub: https://github.com/626626cdllp/kears/tree/master/AutoEncoder
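If the author's MNIST reader is not at hand, the same preprocessing (flatten to 784 features, binarize, one-hot encode the labels) can be sketched with NumPy alone. The function name prepare_mnist and the synthetic arrays are hypothetical and only illustrate the shapes; this is not the author's loader, and its pixel scale may differ.

```python
import numpy as np

def prepare_mnist(images, labels, num_classes=10):
    """Flatten, binarize and one-hot encode MNIST-style arrays.

    images: integer array of shape (n, 28, 28), pixel values 0..255
    labels: integer array of shape (n,), digit classes 0..9
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Flatten each 28x28 image into a 784-dimensional feature vector.
    flat = images.reshape(len(images), -1)
    # Binarize: any non-zero pixel becomes 1.0 (black-and-white image).
    binary = (flat > 0).astype("float32")
    # One-hot encode the labels by indexing rows of an identity matrix.
    one_hot = np.eye(num_classes, dtype="float32")[labels]
    return binary, one_hot

# Tiny synthetic stand-in for real data, just to show the shapes.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 1, 4, 1, 5])
X, Y = prepare_mnist(fake_images, fake_labels)
print(X.shape, Y.shape)  # (5, 784) (5, 10)
```

With Keras installed, keras.datasets.mnist.load_data() returns (x_train, y_train), (x_test, y_test) as uint8 arrays in this layout, so prepare_mnist(x_train, y_train) should give arrays shaped like the ones the script expects.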

import numpy as np
import MNIST
np.random.seed(1337)  # for reproducibility

from keras.models import Model  # functional-API model
from keras.layers import Dense, Input
import matplotlib.pyplot as plt

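X_train, Y_train = MNIST.get_training_data_set(60000, True, False)  # load the training features and the one-hot encoded training labels (at most 60000 samples)
X_test, Y_test = MNIST.get_test_data_set(10000, True, False)  # load the test features and the one-hot encoded test labels (at most 10000 samples)
X_train = np.array(X_train).astype(bool).astype('float32')  # binarize to black-and-white: non-zero pixels become 1.0
Y_train = np.array(Y_train)
X_test = np.array(X_test).astype(bool).astype('float32')  # binarize to black-and-white: non-zero pixels become 1.0
Y_test = np.array(Y_test)
print('Training set shapes:', X_train.shape, Y_train.shape)  # (60000, 784) (60000, 10)
print('Test set shapes:', X_test.shape, Y_test.shape)  # (10000, 784) (10000, 10)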


# compress the features down to 2 dimensions
encoding_dim = 2

# this is our input placeholder
input_img = Input(shape=(784,))

# encoder layers
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(10, activation='relu')(encoded)
encoder_output = Dense(encoding_dim)(encoded)

# decoder layers
decoded = Dense(10, activation='relu')(encoder_output)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='tanh')(decoded)  # note: the targets are 0/1, so 'sigmoid' would also be a natural output activation

# build the autoencoder model (input -> reconstruction)
autoencoder = Model(inputs=input_img, outputs=decoded)

# build the encoder model (input -> 2-D code)
encoder = Model(inputs=input_img, outputs=encoder_output)

# compile autoencoder
autoencoder.compile(optimizer='adam', loss='mse')

# use the training features as both input and target, which trains the encoder and the decoder together
autoencoder.fit(X_train, X_train, epochs=200, batch_size=256, shuffle=True)

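# plotting
encoded_imgs = encoder.predict(X_test)
print(encoded_imgs)
labels = np.argmax(Y_test, axis=1)  # Y_test is one-hot encoded; recover the digit class (0-9) for coloring
plt.scatter(encoded_imgs[:, 0], encoded_imgs[:, 1], c=labels, s=6)
plt.colorbar()
plt.show()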

[Figure: scatter plot of the two-dimensional codes of the test images, colored by digit]
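The code uses a small trick: it converts the grayscale images to black-and-white images, which simplifies the model. The resulting plot shows that handwritten digits of different classes remain well separated after being compressed to two dimensions. Because the digits can still be told apart in two dimensions, these two features are an effective compression of the original 784 dimensions.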
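The separation seen in the plot can also be given a rough number. One simple option, which is not part of the original article, is a nearest-centroid check: assign each 2-D code to the class whose mean code is closest and measure how often that matches the true digit. The function name and the demo data below are hypothetical. The centroids are computed from the same points that are scored, so the result is an optimistic estimate.

```python
import numpy as np

def nearest_centroid_accuracy(codes, labels):
    """Classify each code by its nearest class centroid and return the accuracy."""
    codes = np.asarray(codes, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Mean code of each class, shape (num_classes, code_dim).
    centroids = np.stack([codes[labels == c].mean(axis=0) for c in classes])
    # Squared distance from every code to every centroid, shape (n, num_classes).
    dists = ((codes[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == labels).mean())

# Synthetic stand-in for encoder output: two well-separated clusters.
demo_codes = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.2, 4.9]])
demo_labels = np.array([0, 0, 1, 1])
print(nearest_centroid_accuracy(demo_codes, demo_labels))  # 1.0
```

Applied to the script's output, the call would be nearest_centroid_accuracy(encoded_imgs, np.argmax(Y_test, axis=1)), assuming Y_test is one-hot encoded as the script's comments state.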
