Sparse coding vs. autoencoder

This post looks at the principles and uses of two unsupervised learning methods, sparse coding and autoencoders. By contrasting their objective functions and how they are solved, it highlights both their similarities and their differences: sparse coding finds the best representation by optimisation, while an autoencoder models the encoding efficiently with a neural network.

The differences are easiest to see by looking at the models themselves. Let's look at sparse coding first.

Sparse coding

Sparse coding minimizes the objective

$$\mathcal{L}_{sc} = \underbrace{\|WH - X\|_2^2}_{\text{reconstruction term}} + \underbrace{\lambda \|H\|_1}_{\text{sparsity term}}$$
where $W$ is a matrix of bases, $H$ is a matrix of codes and $X$ is a matrix of the data we wish to represent. $\lambda$ implements a trade-off between sparsity and reconstruction. Note that if we are given $H$, estimation of $W$ is easy via least squares.

In the beginning, however, we do not have $H$. Yet many algorithms exist that can solve the objective above with respect to $H$. In fact, this is how we do inference: we need to solve an optimisation problem if we want to know the $h$ belonging to an unseen $x$.
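To make that inference step concrete, here is a minimal NumPy sketch of ISTA (iterative shrinkage-thresholding), one standard way to solve the objective above with $W$ and $X$ held fixed. The shapes, step size and iteration count are assumptions made for this illustration, not values from the text.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink every entry towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_ista(W, X, lam, n_iter=200):
    """Solve min_H ||W H - X||_2^2 + lam ||H||_1 for H with W and X fixed (plain ISTA)."""
    H = np.zeros((W.shape[1], X.shape[1]))
    # Step size from the Lipschitz constant of the gradient of the quadratic term.
    L = 2.0 * np.linalg.norm(W, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * W.T @ (W @ H - X)                 # gradient of the reconstruction term
        H = soft_threshold(H - grad / L, lam / L)      # gradient step plus L1 proximal step
    return H

# Toy usage: 64 random bases for 16-dimensional data, 100 samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))
X = rng.standard_normal((16, 100))
H = sparse_code_ista(W, X, lam=0.5)
print(H.shape, (H == 0).mean())   # the codes and their fraction of exact zeros
```

Note that a loop like this has to be run again for every new data point, which is exactly the inference cost an auto encoder avoids.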

Auto encoders

Auto encoders are a family of unsupervised neural networks. There are quite a lot of them, e.g. deep auto encoders or those with different regularisation tricks attached, such as denoising, contractive, or sparse variants. There even exist probabilistic ones, such as generative stochastic networks or the variational auto encoder. Their most abstract form is

$$\mathcal{D}(d(e(x; \theta_r); \theta_d), x)$$
but we will go along with a much simpler one for now:
$$\mathcal{L}_{ae} = \|W\sigma(W^TX) - X\|^2$$
where $\sigma$ is a nonlinear function such as the logistic sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$.
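For concreteness, the following is a minimal NumPy sketch of this tied-weight auto encoder: the codes come out of a single forward pass through the encoder. The shapes and the random (untrained) $W$ are assumptions for illustration; training $W$ by gradient descent on this loss is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_loss(W, X):
    """L_ae = ||W sigmoid(W^T X) - X||^2 for a single-layer, tied-weight auto encoder."""
    H = sigmoid(W.T @ X)        # encoder: codes in one forward pass, no optimisation
    X_hat = W @ H               # decoder: linear reconstruction from the codes
    return np.sum((X_hat - X) ** 2), H

# Toy usage with the same shapes as the sparse coding example above.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((16, 64))
X = rng.standard_normal((16, 100))
loss, H = autoencoder_loss(W, X)
print(loss, H.shape)
```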

Similarities

Note that $\mathcal{L}_{sc}$ looks almost like $\mathcal{L}_{ae}$ once we set $H = \sigma(W^TX)$. The differences are that i) auto encoders do not encourage sparsity in their general form, and ii) an auto encoder uses a model to find the codes, while sparse coding does so by means of optimisation.
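To see difference ii) in code, the self-contained sketch below contrasts how the two approaches obtain the code for one unseen $x$: a single matrix product and nonlinearity for the auto encoder versus an iterative solver (plain ISTA again) for sparse coding. The untrained random $W$ and the solver settings are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))     # shared bases / weights (untrained, for illustration)
x = rng.standard_normal((16, 1))      # one unseen data point

# Auto encoder inference: one forward pass through the encoder.
h_ae = sigmoid(W.T @ x)

# Sparse coding inference: run an optimiser (ISTA) for this single data point.
h_sc, lam = np.zeros((64, 1)), 0.5
step = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
for _ in range(200):
    z = h_sc - step * 2.0 * W.T @ (W @ h_sc - x)                 # gradient step
    h_sc = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
```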

For natural image data, regularised auto encoders and sparse coding tend to yield very similar $W$. However, auto encoders are much more efficient and generalise easily to much more complicated models: the decoder can be highly nonlinear, e.g. a deep neural network. Furthermore, one is not tied to the squared loss (on which the estimation of $W$ for $\mathcal{L}_{sc}$ depends).
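As an illustration of that flexibility, here is a minimal NumPy sketch of an auto encoder with a nonlinear two-layer decoder and a Bernoulli cross-entropy loss instead of the squared loss; all layer sizes, parameter names and the [0, 1] scaling of the data are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_ae_loss(params, X):
    """Cross-entropy auto encoder loss with a nonlinear, two-layer decoder (X in [0, 1])."""
    W_e, W_d1, W_d2 = params
    H = sigmoid(W_e.T @ X)          # encoder
    Z = np.tanh(W_d1 @ H)           # first decoder layer, nonlinear
    X_hat = sigmoid(W_d2 @ Z)       # second decoder layer, outputs in (0, 1)
    eps = 1e-9                      # avoid log(0)
    return -np.sum(X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps))

rng = np.random.default_rng(0)
params = (0.1 * rng.standard_normal((16, 64)),   # W_e: 16-dim data -> 64 codes
          0.1 * rng.standard_normal((32, 64)),   # W_d1: 64 codes -> 32 hidden units
          0.1 * rng.standard_normal((16, 32)))   # W_d2: 32 hidden units -> 16-dim reconstruction
X = rng.uniform(size=(16, 100))                  # toy data scaled to [0, 1]
print(deep_ae_loss(params, X))
```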

Also, the different methods of regularisation yield representations with different characteristics. Denoising auto encoders have also been shown to be equivalent to a certain form of RBMs, etc.

But why?

If you want to solve a prediction problem, you will not need auto encoders unless you have only little labeled data and a lot of unlabeled data. Then you will generally be better off training a deep auto encoder and putting a linear SVM on top instead of training a deep neural net.
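A minimal sketch of that pipeline, assuming the encoder of an already trained deep auto encoder has produced codes for the small labelled subset; the random stand-in codes and labels below are hypothetical placeholders, not real features.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in for codes produced by a trained deep auto encoder's encoder (hypothetical data).
rng = np.random.default_rng(0)
H_labelled = rng.standard_normal((200, 64))   # 200 labelled samples, 64-dimensional codes
y_labelled = rng.integers(0, 2, size=200)     # binary labels for the small labelled subset

clf = LinearSVC(C=1.0)                        # linear SVM trained on top of the learned codes
clf.fit(H_labelled, y_labelled)
print(clf.score(H_labelled, y_labelled))
```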

However, they are very powerful models for capturing characteristics of distributions. This is vague, but research turning it into hard statistical facts is currently being conducted. Deep latent Gaussian models, a.k.a. variational auto encoders, and generative stochastic networks are pretty interesting ways of obtaining auto encoders which provably estimate the underlying data distribution.
