[Reading Notes] [Hands-On Machine Learning] Chapter 15: Autoencoders

These notes are based on the Chinese translation by Wang Jingyuan et al., 《机器学习实战:基于 Scikit-Learn 和 TensorFlow》, of Hands-On Machine Learning with Scikit-Learn & TensorFlow. All images in this post are screenshots from the relevant parts of the book.

What is an autoencoder?

  1. An autoencoder is an artificial neural network that learns, without supervision, an efficient representation of its input data (a coding);
  2. Fundamentally, what an autoencoder learns is how to represent data efficiently;
  3. Autoencoders can be used for denoising, feature extraction, unsupervised pretraining, generative modeling, and more;
  4. How it works: on the surface, the autoencoder merely copies its input to its output; in practice, constraints inside the network (e.g. limiting the size of the internal representation, or adding noise to the inputs and training the network to remove it) make this "copying" hard, forcing the autoencoder to learn an efficient way to represent the data;

What makes a representation of data efficient?

(figure: two example number sequences)
Looking at the two sequences above, human intuition says the first, shorter one is easier to memorize than the second;
On closer inspection, however, the second sequence is longer but follows a rule: if the current number n is even, the next is n/2; if n is odd, the next is 3n + 1 (this is the so-called hailstone sequence). Once we know this pattern, memorizing the whole sequence only requires remembering the first number and the sequence length;
The takeaway: an autoencoder's job is essentially to learn the patterns (features/regularities) present in the input data, and to use them to represent that data efficiently.
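The rule can be checked with a short sketch: knowing only the first number and the length is enough to regenerate the whole hailstone sequence (the starting value 6 here is just an illustrative choice, not the sequence from the book):

```python
def hailstone(start, length):
    """Regenerate a hailstone sequence from just its first number and its length."""
    seq = [start]
    while len(seq) < length:
        n = seq[-1]
        seq.append(n // 2 if n % 2 == 0 else 3 * n + 1)
    return seq

# two numbers stand in for nine
print(hailstone(6, 9))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```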

Several common types of autoencoder

A simple autoencoder:
(figure: a basic autoencoder with a single hidden layer)
Depending on the constraints imposed inside the network, autoencoders can be divided into the following kinds:

1. Stacked (deep) autoencoders:

An autoencoder with multiple hidden layers; the more hidden layers, the more complex the codings it can learn.
(figure: a stacked autoencoder)

Stacked autoencoder in TensorFlow
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
# number of neurons in each layer
n_inputs = 784  # for MNIST 28 * 28
n_hidden1 = 300
n_hidden2 = 150 # codings
n_hidden3 = 300
n_outputs = 784
learning_rate = 0.01
l2_reg = 0.001
# placeholder for the input
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
# build the network (input -> h1 -> h2 -> h3 -> output)
with tf.contrib.framework.arg_scope(
        [fully_connected], activation_fn=tf.nn.elu,
        weights_initializer=tf.contrib.layers.variance_scaling_initializer(),
        weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg)):
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2) # codings
    hidden3 = fully_connected(hidden2, n_hidden3)
    outputs = fully_connected(hidden3, n_outputs, activation_fn=None)
# loss function
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()

# mini-batch training, 150 examples per batch
n_epochs = 5
batch_size = 150
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch})
There are two ways to speed up training

1. Tying weights (which also reduces the risk of overfitting), at the cost of building the network by hand:

activation = tf.nn.elu
regularizer = tf.contrib.layers.l2_regularizer(l2_reg)
initializer = tf.contrib.layers.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
weights1_init = initializer([n_inputs, n_hidden1])
weights2_init = initializer([n_hidden1, n_hidden2])
weights1 = tf.Variable(weights1_init, dtype=tf.float32, name="weights1")
weights2 = tf.Variable(weights2_init, dtype=tf.float32, name="weights2")
weights3 = tf.transpose(weights2, name="weights3") # tied weights
weights4 = tf.transpose(weights1, name="weights4") # tied weights
biases1 = tf.Variable(tf.zeros(n_hidden1), name="biases1")
biases2 = tf.Variable(tf.zeros(n_hidden2), name="biases2")
biases3 = tf.Variable(tf.zeros(n_hidden3), name="biases3")
biases4 = tf.Variable(tf.zeros(n_outputs), name="biases4")
hidden1 = activation(tf.matmul(X, weights1) + biases1)
hidden2 = activation(tf.matmul(hidden1, weights2) + biases2)
hidden3 = activation(tf.matmul(hidden2, weights3) + biases3)
outputs = tf.matmul(hidden3, weights4) + biases4
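As a quick sanity check (a sketch, not from the book): tying the decoder weights to the transposes of the encoder weights halves the number of weight parameters, since `weights3` and `weights4` are no longer free variables:

```python
n_inputs, n_hidden1, n_hidden2 = 784, 300, 150

# untied: four independent weight matrices (two encoder, two decoder)
untied = (n_inputs * n_hidden1 + n_hidden1 * n_hidden2
          + n_hidden2 * n_hidden1 + n_hidden1 * n_inputs)
# tied: only the two encoder matrices are trained; the decoder reuses them
tied = n_inputs * n_hidden1 + n_hidden1 * n_hidden2

print(untied, tied)  # note the biases stay separate for every layer
```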

2. Training one autoencoder at a time:

optimizer = tf.train.AdamOptimizer(learning_rate)
with tf.name_scope("phase1"):
    phase1_outputs = tf.matmul(hidden1, weights4) + biases4
    phase1_reconstruction_loss = tf.reduce_mean(tf.square(phase1_outputs - X))
    phase1_reg_loss = regularizer(weights1) + regularizer(weights4)
    phase1_loss = phase1_reconstruction_loss + phase1_reg_loss
    phase1_training_op = optimizer.minimize(phase1_loss)
with tf.name_scope("phase2"):
    phase2_reconstruction_loss = tf.reduce_mean(tf.square(hidden3 - hidden1))
    phase2_reg_loss = regularizer(weights2) + regularizer(weights3)
    phase2_loss = phase2_reconstruction_loss + phase2_reg_loss
    train_vars = [weights2, biases2, weights3, biases3]
    phase2_training_op = optimizer.minimize(phase2_loss, var_list=train_vars)  # leave the outer layers frozen
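A minimal NumPy sketch of the same idea (toy linear layers and made-up data, not from the book): phase 1 trains the outer weight pair to reconstruct X, then phase 2 trains only the inner pair to reproduce hidden1 while the outer weights stay frozen; in TensorFlow, that freezing is what restricting `optimizer.minimize` to a list of variables accomplishes.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))            # toy data standing in for MNIST
W1 = 0.1 * rng.normal(size=(8, 4))       # outer encoder (trained in phase 1)
W4 = 0.1 * rng.normal(size=(4, 8))       # outer decoder (trained in phase 1)
W2 = 0.1 * rng.normal(size=(4, 2))       # inner encoder (trained in phase 2)
W3 = 0.1 * rng.normal(size=(2, 4))       # inner decoder (trained in phase 2)
lr = 0.01

def mse(a, b):
    return np.mean((a - b) ** 2)

# phase 1: train only W1 and W4 so that X -> h1 -> output reconstructs X
for _ in range(500):
    h1 = X @ W1
    g = 2 * (h1 @ W4 - X) / X.size       # gradient of the MSE w.r.t. the output
    gW4 = h1.T @ g
    gW1 = X.T @ (g @ W4.T)
    W4 -= lr * gW4
    W1 -= lr * gW1

# phase 2: train only W2 and W3 so that h1 -> h2 -> h3 reproduces h1;
# W1 and W4 are never touched in this phase
h1 = X @ W1
loss_before = mse(h1 @ W2 @ W3, h1)
for _ in range(500):
    h2 = h1 @ W2
    g = 2 * (h2 @ W3 - h1) / h1.size
    gW3 = h2.T @ g
    gW2 = h1.T @ (g @ W3.T)
    W3 -= lr * gW3
    W2 -= lr * gW2
loss_after = mse(h1 @ W2 @ W3, h1)
```

Each phase only ever updates its own pair of matrices, which is exactly why the phases can be trained in sequence.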

2. Denoising autoencoders:

Add noise to the input data and train the network to recover the original, noise-free input. This prevents the autoencoder from trivially copying its input to its output, so it is forced to find the patterns in the data.
(figure: denoising autoencoders)
The noise can be pure Gaussian noise added to the inputs, or it can randomly switch inputs off, as in dropout;
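Both corruption schemes can be sketched in a few lines of NumPy (a sketch with toy data; the rescaling by `1/keep_prob` mirrors what `tf.contrib.layers.dropout` does during training):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 8))                        # a toy batch of inputs in [0, 1)

# 1) additive Gaussian noise
X_noisy = X + rng.normal(0.0, 1.0, size=X.shape)

# 2) dropout-style corruption: randomly zero inputs, rescale the survivors
keep_prob = 0.7
mask = rng.random(X.shape) < keep_prob
X_drop = np.where(mask, X / keep_prob, 0.0)

# in both cases the reconstruction loss still compares the output against
# the clean X, never against the corrupted version
```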

TensorFlow implementation

Adding Gaussian noise:

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
X_noisy = X + tf.random_normal(tf.shape(X))
[...]
hidden1 = activation(tf.matmul(X_noisy, weights1) + biases1)
[...]
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE
[...]

Randomly switching inputs off (dropout):

from tensorflow.contrib.layers import dropout
keep_prob = 0.7
is_training = tf.placeholder_with_default(False, shape=(), name='is_training')
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
X_drop = dropout(X, keep_prob, is_training=is_training)
[...]
hidden1 = activation(tf.matmul(X_drop, weights1) + biases1)
[...]
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE
[...]

With a learning rate of 0.004 and a dropout keep probability of 0.7, the input images and their reconstructions look like this:
(figure: input images)
(figure: reconstructed images)

3. Sparse autoencoders:

Add an appropriate term to the cost function that pushes the autoencoder to reduce the number of active neurons in the coding layer. This forces the autoencoder to represent each input as a combination of a small number of activated neurons;
Procedure:
1. At each training iteration, first measure the actual sparsity of the coding layer (by computing the mean activation of each coding-layer neuron over the whole training batch);
2. Once the sparsity is known, add a sparsity loss to the cost function to penalize neurons that are active but not useful (the size of the penalty is computed with the Kullback-Leibler (KL) divergence);
The KL divergence between two discrete probability distributions P and Q is:
D_KL(P ‖ Q) = Σᵢ P(i) log( P(i) / Q(i) )
Adapted to our needs, we measure the gap between the target probability p that a coding-layer neuron is active and its actual probability q, so the formula simplifies to:
D_KL(p ‖ q) = p log(p / q) + (1 − p) log((1 − p) / (1 − q))
Once the sparsity loss of every coding-layer neuron has been computed, their sum is added to the cost function. The result is a sparse autoencoder: neurons that contribute little are suppressed, and only a small number of neurons (small relative to the layer's size, hence "sparse") take part in the computation.
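A worked numeric example of the penalty (a NumPy sketch of the same Bernoulli KL formula, with a made-up target of 0.1 and three made-up mean activations):

```python
import numpy as np

def kl_divergence(p, q):
    # KL divergence between two Bernoulli distributions with means p and q
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p = 0.1                           # target: a neuron should fire ~10% of the time
q = np.array([0.1, 0.3, 0.9])     # measured mean activations of three neurons
penalty = kl_divergence(p, q)
# the neuron matching the target costs nothing; the over-active one costs the most
```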

TensorFlow implementation
import numpy as np
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt


mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
n_inputs = 28 * 28 # for MNIST
n_hidden1 = 500
n_hidden2 = 200
n_hidden3 = 50 # codings
n_hidden4 = n_hidden2
n_hidden5 = n_hidden1
n_outputs = n_inputs


def kl_divergence(p, q):
    # KL divergence between two Bernoulli distributions with means p and q
    return p * tf.log(p / q) + (1 - p) * tf.log((1 - p) / (1 - q))


learning_rate = 0.01
sparsity_target = 0.1
sparsity_weight = 0.2
with tf.contrib.framework.arg_scope([fully_connected], activation_fn=tf.nn.elu,
                                    weights_initializer=tf.contrib.layers.variance_scaling_initializer()):
    X = tf.placeholder(tf.float32, [None, n_inputs])
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    # sigmoid keeps the codings in (0, 1), so their batch mean can be read
    # as an activation probability
    hidden3 = fully_connected(hidden2, n_hidden3, activation_fn=tf.nn.sigmoid) # codings
    hidden4 = fully_connected(hidden3, n_hidden4)
    hidden5 = fully_connected(hidden4, n_hidden5)
    logits = fully_connected(hidden5, n_outputs, activation_fn=None)
    outputs = tf.sigmoid(logits)
hidden3_mean = tf.reduce_mean(hidden3, axis=0) # mean activation over the batch
sparsity_loss = tf.reduce_sum(kl_divergence(sparsity_target, hidden3_mean))
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE
cost = reconstruction_loss + sparsity_weight * sparsity_loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(cost)
init = tf.global_variables_initializer()

n_epochs = 50
batch_size = 150
n_digits = 2
n_test_digits = 3
X_test = mnist.test.images[:n_test_digits]

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch})

    # reconstructions of a few test digits
    outputs_val1 = outputs.eval(feed_dict={X: X_test})
    # images produced from random codings (the codings lie in (0, 1))
    coding_rnd = np.random.rand(n_digits, n_hidden3)
    outputs_val = outputs.eval(feed_dict={hidden3: coding_rnd})


def plot_image(image, shape=[28, 28]):
    plt.imshow(image.reshape(shape), cmap="Greys", interpolation="nearest")
    plt.axis("off")


for digit_index in range(n_test_digits):
    plt.subplot(n_test_digits, 2, digit_index * 2 + 1)
    plot_image(X_test[digit_index])
    plt.subplot(n_test_digits, 2, digit_index * 2 + 2)
    plot_image(outputs_val1[digit_index])
plt.show()

for iteration in range(n_digits):
    plot_image(outputs_val[iteration])
    plt.show()

The reconstructions look like this (surprisingly good):
(figure: reconstructed test digits)
The images produced from random codings look like this (they resemble a 2 and an 8):
(figure: two generated digits)
Try varying the number of neurons, the learning rate, or the number of layers to probe the network's performance.

4. Variational autoencoders:

The variational autoencoder is one of the most popular kinds of autoencoder today: it is a probabilistic autoencoder (its outputs are partly determined by chance, even after training), and it is also a generative autoencoder (it can generate new instances that look like the training instances).
(figure: a variational autoencoder)
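The heart of the implementation below is the coding layer: the encoder outputs a mean and a gamma = log(σ²) for each coding, a Gaussian coding is sampled via the reparameterization trick, and the latent loss is the KL divergence pulling the codings toward a standard normal. A NumPy sketch of those lines, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([0.0, 1.0])          # encoder output: the coding means
gamma = np.array([0.0, -2.0])        # encoder output: gamma = log(sigma^2)
sigma = np.exp(0.5 * gamma)

# reparameterization trick: sample plain Gaussian noise, then shift and scale it
coding = mean + sigma * rng.standard_normal(mean.shape)

# latent loss: KL divergence from N(mean, sigma^2) to the standard normal N(0, 1)
latent_loss = 0.5 * np.sum(np.exp(gamma) + mean**2 - 1 - gamma)
```

A coding with mean 0 and gamma 0 (i.e. already standard normal) contributes nothing to the latent loss; the second coding accounts for all of it here.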

TensorFlow implementation
import numpy as np
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
n_inputs = 28 * 28 # for MNIST
n_hidden1 = 500
n_hidden2 = 500
n_hidden3 = 20 # codings
n_hidden4 = n_hidden2
n_hidden5 = n_hidden1
n_outputs = n_inputs
learning_rate = 0.004
with tf.contrib.framework.arg_scope([fully_connected], activation_fn=tf.nn.elu,
                                    weights_initializer=tf.contrib.layers.variance_scaling_initializer()):
    X = tf.placeholder(tf.float32, [None, n_inputs])
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    hidden3_mean = fully_connected(hidden2, n_hidden3, activation_fn=None)
    hidden3_gamma = fully_connected(hidden2, n_hidden3, activation_fn=None)
    hidden3_sigma = tf.exp(0.5 * hidden3_gamma)
    noise = tf.random_normal(tf.shape(hidden3_sigma), dtype=tf.float32)
    hidden3 = hidden3_mean + hidden3_sigma * noise
    hidden4 = fully_connected(hidden3, n_hidden4)
    hidden5 = fully_connected(hidden4, n_hidden5)
    logits = fully_connected(hidden5, n_outputs, activation_fn=None)
    outputs = tf.sigmoid(logits)
reconstruction_loss = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
latent_loss = 0.5 * tf.reduce_sum(
    tf.exp(hidden3_gamma) + tf.square(hidden3_mean) - 1 - hidden3_gamma)
cost = reconstruction_loss + latent_loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(cost)
init = tf.global_variables_initializer()

n_digits = 10
n_epochs = 50
batch_size = 150
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch})
    coding_rnd = np.random.normal(size=[n_digits, n_hidden3])
    outputs_val = outputs.eval(feed_dict={hidden3: coding_rnd})

def plot_image(image, shape=[28, 28]):
    plt.imshow(image.reshape(shape), cmap="Greys", interpolation="nearest")
    plt.axis("off")

for iteration in range(n_digits):
    plot_image(outputs_val[iteration])
    plt.show()

The generated images look like this (some are convincing, some are not):
(figure: digits generated from random codings)

5. Other autoencoders

  • Contractive autoencoders (CAE)
    A constraint is added during training so that the derivatives of the codings with respect to the inputs are small (similar inputs yield similar codings)
  • Stacked convolutional autoencoders
    Learn to extract visual features by reconstructing images through convolutional layers
  • Generative stochastic networks (GSN)
    A generalization of denoising autoencoders, with the added ability to generate data
  • Winner-take-all (WTA) autoencoders
    During training, after the activations of all the neurons in the coding layer have been computed, only the top k% of activations over the training batch are kept; the rest are set to 0. Naturally, this leads to sparse codings. Moreover, a similar WTA approach can be used to produce sparse convolutional autoencoders.
  • Adversarial autoencoders
    One network is trained to reproduce its inputs while another is trained to find inputs that the first network cannot reconstruct properly. This pushes the first autoencoder to learn robust codings.
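The winner-take-all rule above can be sketched in a few lines (an illustrative sketch with toy activations; during real training this is applied batch by batch):

```python
import numpy as np

def winner_take_all(activations, k_percent):
    # per coding neuron (column), keep only the top k% of activations
    # across the batch and zero out the rest
    thresh = np.percentile(activations, 100 - k_percent, axis=0)
    return np.where(activations >= thresh, activations, 0.0)

rng = np.random.default_rng(0)
acts = rng.random((100, 5))            # batch of 100, coding layer of 5 neurons
sparse = winner_take_all(acts, 10)     # keep the top 10% per neuron
```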