Denoising Autoencoder


Article link: https://blog.csdn.net/CoderPai/article/details/78941447


Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/f7b9f10b7ac4

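The denoising autoencoder (DAE) is another variant of the autoencoder. Pascal Vincent's paper, which describes the model in detail, is highly recommended. The idea behind the DAE is that an autoencoder that merely reproduces its original input does not necessarily learn the best features. Good features are those that let the model encode and decode a "corrupted" version of the input and still recover the true original data.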

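Mathematically, suppose the original data x is deliberately corrupted, for example by adding Gaussian noise or by erasing the values of some dimensions, which gives x'. We then encode and decode x' to obtain the reconstructed signal x̂ = g(f(x')), which should be as close as possible to the uncorrupted data x. The training loss therefore changes from L(x, g(f(x))) to L(x, g(f(x'))).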
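As a rough numerical illustration of the two losses above, the following NumPy sketch corrupts x with Gaussian noise, passes it through a tiny, untrained sigmoid encoder f and decoder g, and evaluates a squared-error loss against the clean x. The layer sizes, the noise scale, and the helper names (`sigmoid`, `squared_error`) are assumptions chosen for illustration and do not come from the original article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumption): 8 input dimensions, 3 hidden units.
n_input, n_hidden = 8, 3
W1 = rng.normal(scale=0.1, size=(n_input, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_input))
b2 = np.zeros(n_input)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    """Encoder: map the input to a hidden code."""
    return sigmoid(x @ W1 + b1)

def g(h):
    """Decoder: map the hidden code back to input space."""
    return sigmoid(h @ W2 + b2)

def squared_error(target, reconstruction):
    """L(target, reconstruction) as a sum of squared differences."""
    return float(np.sum((target - reconstruction) ** 2))

x = rng.random((4, n_input))                          # clean batch x
x_corrupt = x + rng.normal(scale=0.3, size=x.shape)   # corrupted batch x'

plain_loss = squared_error(x, g(f(x)))                # L(x, g(f(x)))
denoising_loss = squared_error(x, g(f(x_corrupt)))    # L(x, g(f(x')))
```

In both cases the target is the clean x; only the input to the encoder differs.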

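Intuitively, the DAE tries to learn features that are as robust as possible, so that they can withstand a certain amount of corruption or missing values in the input. Vincent's paper also gives an interpretation of the DAE based on manifold learning, and reports experiments on image data showing that the DAE learns feature transforms that resemble Gabor edge detectors.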

The architecture of the DAE is shown in the figure below:

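The most commonly used noise today is mask noise, in which part of the original data is missing. This has clear practical relevance: for example, some pixels of an image may be occluded, or some words of a text may be dropped during recording.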
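A minimal sketch of mask noise, assuming each entry is dropped independently with a fixed probability. The function name `mask_noise` and the toy data are hypothetical.

```python
import numpy as np

def mask_noise(x, corruption_level, rng):
    """Zero out each entry of x independently with probability corruption_level.

    Returns the corrupted array and the 0/1 keep-mask that produced it.
    """
    keep = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
    return x * keep, keep

rng = np.random.default_rng(42)
# Strictly positive toy data, so zeros in the result come only from masking.
x = rng.random((1000, 20)) + 0.1
x_corrupt, keep = mask_noise(x, 0.3, rng)

# Fraction of entries that were erased; close to the corruption level.
dropped_fraction = 1.0 - keep.mean()
```

Entries where the mask is 1 pass through unchanged, while the rest are set to zero, which mimics occluded pixels or missing words.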

The implementation is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Note: this script targets the TensorFlow 1.x graph-mode API.

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

N_INPUT = 784           # 28 x 28 MNIST pixels
N_HIDDEN = 100
N_OUTPUT = N_INPUT      # the network reconstructs its input
corruption_level = 0.3  # probability of zeroing each input pixel
epochs = 1000
batch_size = 100

def main(_):

    # Uniform initialization range based on layer fan-in and fan-out
    w_init = np.sqrt(6. / (N_INPUT + N_HIDDEN))
    weights = {
        "hidden": tf.Variable(tf.random_uniform([N_INPUT, N_HIDDEN], minval=-w_init, maxval=w_init)),
        "out": tf.Variable(tf.random_uniform([N_HIDDEN, N_OUTPUT], minval=-w_init, maxval=w_init))
    }

    bias = {
        "hidden": tf.Variable(tf.random_uniform([N_HIDDEN], minval=-w_init, maxval=w_init)),
        "out": tf.Variable(tf.random_uniform([N_OUTPUT], minval=-w_init, maxval=w_init))
    }

    with tf.name_scope("input"):
        # clean input data and the 0/1 mask used to corrupt it
        x = tf.placeholder(tf.float32, [None, N_INPUT])
        mask = tf.placeholder(tf.float32, [None, N_INPUT])

    with tf.name_scope("input_layer"):
        # apply mask noise: entries where mask == 0 are erased
        input_layer = tf.multiply(x, mask)

    with tf.name_scope("hidden_layer"):
        # encoder: corrupted input -> hidden representation
        hidden_layer = tf.sigmoid(tf.add(tf.matmul(input_layer, weights["hidden"]), bias["hidden"]))

    with tf.name_scope("output_layer"):
        # decoder: hidden representation -> reconstruction
        output_layer = tf.sigmoid(tf.add(tf.matmul(hidden_layer, weights["out"]), bias["out"]))

    with tf.name_scope("cost"):
        # squared error between the reconstruction and the clean input
        cost = tf.reduce_sum(tf.square(tf.subtract(output_layer, x)))

    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # load MNIST data (labels are not needed for the autoencoder)
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    trX, teX = mnist.train.images, mnist.test.images

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        for i in range(epochs):
            # end range runs to len(trX) + 1 so the last full batch is not skipped
            for start, end in zip(range(0, len(trX), batch_size),
                                  range(batch_size, len(trX) + 1, batch_size)):
                input_ = trX[start:end]
                mask_np = np.random.binomial(1, 1 - corruption_level, input_.shape)
                sess.run(optimizer, feed_dict={x: input_, mask: mask_np})

            # report the reconstruction cost on the corrupted test set
            mask_np = np.random.binomial(1, 1 - corruption_level, teX.shape)
            print("%d %f" % (i, sess.run(cost, feed_dict={x: teX, mask: mask_np})))

if __name__ == "__main__":
    tf.app.run()
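The value `w_init = sqrt(6 / (N_INPUT + N_HIDDEN))` in the listing above has the same form as the bound commonly called Glorot (Xavier) uniform initialization. The article itself does not name it, so treat that identification as an assumption. The NumPy sketch below reproduces the bound for the 784-by-100 hidden layer and draws a weight matrix from it. The helper names are hypothetical.

```python
import numpy as np

def uniform_init_bound(fan_in, fan_out):
    """Half-width of the uniform range: sqrt(6 / (fan_in + fan_out))."""
    return np.sqrt(6.0 / (fan_in + fan_out))

def init_weights(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) matrix uniformly from [-bound, bound]."""
    bound = uniform_init_bound(fan_in, fan_out)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
bound = uniform_init_bound(784, 100)   # same sizes as N_INPUT and N_HIDDEN
W_hidden = init_weights(784, 100, rng)
```

Because the bound depends only on the sum of the two layer sizes, the hidden and output weight matrices in the listing share the same range.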

References:

"Extracting and Composing Robust Features with Denoising Autoencoders"

"Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion"
