Theano-Deep Learning Tutorials Notes: Denoising Autoencoders (dA)


slim1017

Article link: https://blog.csdn.net/u012816943/article/details/50503976

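This article introduces denoising autoencoders, which inject random noise into the input so that the network is forced to learn the underlying structure of the data instead of simply copying its input. Autoencoders are used for data compression and feature learning. As for their relationship with PCA, a linear autoencoder learns the same subspace as PCA, while a nonlinear one can capture more patterns in the data. The article also discusses sparse autoencoders and the role of stochasticity in preventing an autoencoder from learning the identity function, and links to a simple Theano implementation tutorial.

(Summary generated automatically by CSDN.)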

Tutorial: http://www.deeplearning.net/tutorial/dA.html#autoencoders

Denoising autoencoders were introduced in [Vincent08]. We start by introducing plain autoencoders.

Autoencoders

See section 4.6 of [Bengio09] for an overview of auto-encoders.

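For an introduction to autoencoders, see the UFLDL tutorial page on autoencoders and sparsity (Chinese version): http://deeplearning.stanford.edu/wiki/index.php/%E8%87%AA%E7%BC%96%E7%A0%81%E7%AE%97%E6%B3%95%E4%B8%8E%E7%A8%80%E7%96%8F%E6%80%A7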

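The autoencoder takes an input $\mathbf{x} \in [0,1]^d$ and maps it to a hidden representation $\mathbf{y} \in [0,1]^{d'}$ through $\mathbf{y} = s(\mathbf{W}\mathbf{x} + \mathbf{b})$, where $s$ is a nonlinear activation function such as the sigmoid. This step is the encoder.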

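The code $\mathbf{y}$ is then mapped back to a reconstruction $\mathbf{z} = s(\mathbf{W'}\mathbf{y} + \mathbf{b'})$ of the same shape as $\mathbf{x}$, and training tries to make $\mathbf{z}$ as close as possible to the input $\mathbf{x}$. This step is the decoder.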

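Suppose the input is 100-dimensional, the hidden layer has 50 units, and the output (reconstruction) layer is again 100-dimensional. Since 50 < 100, the hidden layer represents higher-dimensional data with fewer dimensions, so the encoder can be viewed as a form of data compression.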

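Note: $\mathbf{W'}$ here does not denote the transpose of $\mathbf{W}$; it is simply the weight matrix from the hidden layer to the third (output) layer.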
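The encoder and decoder equations above can be sketched as a forward pass in NumPy. This is a minimal illustration assuming the 100 -> 50 -> 100 sizes from the example, randomly initialised (untrained) weights with an arbitrarily chosen scale of 0.1, and untied weights, so `W_prime` is an independent matrix rather than `W.T`. The helper names `encode` and `decode` are hypothetical, not the Theano tutorial's API.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid s(a) = 1 / (1 + exp(-a)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    """Encoder: y = s(W x + b), with W of shape (d', d) and b of shape (d',)."""
    return sigmoid(W @ x + b)

def decode(y, W_prime, b_prime):
    """Decoder: z = s(W' y + b'), with W' of shape (d, d') and b' of shape (d,)."""
    return sigmoid(W_prime @ y + b_prime)

rng = np.random.default_rng(0)
d, d_hidden = 100, 50                          # the 100 -> 50 -> 100 example from the text
W = rng.normal(0.0, 0.1, (d_hidden, d))        # assumed initialisation scale
b = np.zeros(d_hidden)
W_prime = rng.normal(0.0, 0.1, (d, d_hidden))  # untied: an independent matrix, not W.T
b_prime = np.zeros(d)

x = rng.random(d)                  # an input in [0, 1]^d
y = encode(x, W, b)                # 50-dimensional code
z = decode(y, W_prime, b_prime)    # 100-dimensional reconstruction
print(y.shape, z.shape)            # (50,) (100,)
```

Because the output also passes through a sigmoid, every component of `z` lies in (0, 1), matching the assumption that the inputs lie in $[0,1]^d$.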

Tied weights: when we set $\mathbf{W'} = \mathbf{W}^T$, the weights are said to be tied, and the parameters of the three-layer autoencoder are only $\mathbf{W}$, $\mathbf{b}$, $\mathbf{b'}$. Without tied weights, $\mathbf{W'}$ is independent of $\mathbf{W}$, and the parameters are $\mathbf{W}$, $\mathbf{W'}$, $\mathbf{b}$, $\mathbf{b'}$.

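In my experience, tying the weights or not does not affect the results much; readers are welcome to experiment with this themselves. The main benefit of tied weights, as I see it, is lower memory use, since only one weight matrix has to be stored.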
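To make the parameter lists above concrete, the sketch below counts the free parameters with and without tied weights for the 100 -> 50 -> 100 example, and shows a tied reconstruction that reuses `W.T` as the decoder matrix. The helpers `n_params` and `reconstruct_tied` are hypothetical names used only for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))

def n_params(d, d_hidden, tied):
    """Count the free parameters of a d -> d_hidden -> d autoencoder."""
    count = d_hidden * d + d_hidden + d    # W, b and b'
    if not tied:
        count += d * d_hidden              # the separate matrix W'
    return count

def reconstruct_tied(x, W, b, b_prime):
    """Tied weights: the decoder uses W.T in place of a separate W'."""
    y = sigmoid(W @ x + b)
    return sigmoid(W.T @ y + b_prime)

print(n_params(100, 50, tied=True))    # 5150
print(n_params(100, 50, tied=False))   # 10150

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, (50, 100))
z = reconstruct_tied(rng.random(100), W, np.zeros(50), np.zeros(100))
print(z.shape)                         # (100,)
```

Tying halves the number of stored weights here (5000 instead of 10000), while the two bias vectors are unaffected, which is where the memory saving mentioned above comes from.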

The choice of reconstruction error depends on the assumed distribution of the input. Here we use the squared error $L(\mathbf{x}, \mathbf{z}) = || \mathbf{x} - \mathbf{z} ||^2$.
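A minimal NumPy version of the squared error, for one example and averaged over a minibatch whose rows are examples. Averaging over the minibatch is an assumed convention for training, not something stated above; the function names are illustrative.

```python
import numpy as np

def squared_error(x, z):
    """L(x, z) = ||x - z||^2 for a single example (sum over the d components)."""
    return float(np.sum((x - z) ** 2))

def mean_squared_error(X, Z):
    """Average of L over a minibatch; rows of X and Z are examples."""
    return float(np.mean(np.sum((X - Z) ** 2, axis=1)))

x = np.array([1.0, 0.0, 0.5])
z = np.array([0.8, 0.1, 0.5])
print(squared_error(x, z))   # about 0.05 = 0.2**2 + 0.1**2
```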

If the inputs can be interpreted as bit vectors or vectors of bit probabilities, the cross-entropy can be used instead:

$L_{H} (\mathbf{x}, \mathbf{z}) = - \sum^d_{k=1}[\mathbf{x}_k \log \mathbf{z}_k + (1 - \mathbf{x}_k)\log(1 - \mathbf{z}_k)]$
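The cross-entropy formula translates directly into NumPy. The only addition is clipping `z` away from exactly 0 and 1 so that `log(0)` cannot occur; this safeguard (with an arbitrarily chosen `eps`) is an assumption of the sketch, not part of the formula.

```python
import numpy as np

def cross_entropy(x, z, eps=1e-12):
    """L_H(x, z) = -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)].

    x holds bits or bit probabilities in [0, 1]; z is the sigmoid output.
    z is clipped to [eps, 1 - eps] only to avoid log(0)."""
    z = np.clip(z, eps, 1.0 - eps)
    return float(-np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z)))

x = np.array([1.0, 0.0, 1.0])
print(cross_entropy(x, np.array([0.9, 0.1, 0.8])))  # ≈ 0.434
print(cross_entropy(x, np.array([0.5, 0.5, 0.5])))  # ≈ 2.079 (= 3 log 2)
```

A reconstruction that puts high probability on the observed bits gets a low loss, while an uninformative output of 0.5 everywhere costs $\log 2$ per component.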

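The following passage describes the connection between autoencoders and PCA. (When the activation is linear and the error is measured by the mean squared error, the hidden layer of the autoencoder spans the same subspace as the leading principal components of the data; with a nonlinear activation, the autoencoder is more expressive than PCA.)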

The hope is that the code $\mathbf{y}$ is a distributed representation that captures the coordinates along the main factors of variation in the data. This is similar to the way the projection on principal components would capture the main factors of variation in the data. Indeed, if there is one linear hidden layer (the code) and the mean squared error criterion is used to train the network, then the $k$ hidden units learn to
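The linear case of the PCA connection can be checked numerically. The sketch below rests on several assumptions of its own: synthetic, centred 5-dimensional data (so the biases can be dropped), k = 2 linear hidden units, untied weights, and plain full-batch gradient descent with hand-derived gradients (not Theano's automatic differentiation), with a learning rate and iteration count chosen for this toy data. If training converges, the map $\mathbf{x} \mapsto \mathbf{z}$ learned by the autoencoder, $\mathbf{W'}\mathbf{W}$, should match the projection onto the top-k principal components, even though $\mathbf{W}$ itself need not equal the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 5, 2

# Synthetic data: independent axes with different variances, then a random
# rotation so that the principal directions are not the coordinate axes.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * np.array([2.0, 1.5, 0.6, 0.5, 0.4])) @ Q.T
X -= X.mean(axis=0)                 # centre the data, so biases can be dropped

# Linear autoencoder y = W x, z = W' y, trained on (1/n) * sum_i ||x_i - z_i||^2.
W = rng.normal(0.0, 0.1, (k, d))
W_prime = rng.normal(0.0, 0.1, (d, k))
lr = 0.05
for _ in range(5000):
    Y = X @ W.T                     # codes, shape (n, k)
    E = Y @ W_prime.T - X           # reconstruction error, shape (n, d)
    grad_W_prime = (2.0 / n) * E.T @ Y
    grad_W = (2.0 / n) * (E @ W_prime).T @ X
    W_prime -= lr * grad_W_prime
    W -= lr * grad_W

# PCA: projection onto the top-k eigenvectors of the sample covariance.
C = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
U_k = eigvecs[:, -k:]
P_pca = U_k @ U_k.T

P_ae = W_prime @ W                      # the linear map x -> z learned above

def mse(P):
    """Mean squared reconstruction error when each x is mapped to P x."""
    return float(np.mean(np.sum((X - X @ P.T) ** 2, axis=1)))

print(np.abs(P_ae - P_pca).max())       # small if training converged
print(mse(P_ae), mse(P_pca))
```

With a nonlinear $s$ the learned code is no longer restricted to this linear projection, which is the sense in which the nonlinear autoencoder can capture more than PCA.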
