RBM Formula Deduction

Latest recommended article published 2019-03-27 11:13:06

dengtiaolu0407


Original link: http://www.cnblogs.com/ZJUT-jiangnan/p/5466814.html

Energy-Based Model

The probability distribution is the softmax of the negative energy (the Boltzmann distribution):

\[p(x)=\frac{\exp(-E(x))}{\sum\limits_x{\exp(-E(x))}}\]
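As a minimal sketch, assuming a discrete variable $x$ with only a few states and made-up energies (the names `boltzmann` and `energies` are hypothetical), the distribution can be computed by normalizing $\exp(-E(x))$ directly:

```python
import math


def boltzmann(energies):
    """Return p(x) = exp(-E(x)) / sum_x exp(-E(x)) for a dict mapping x -> E(x)."""
    # Shift by the minimum energy before exponentiating to avoid overflow;
    # the constant factor exp(e_min) cancels in the ratio.
    e_min = min(energies.values())
    weights = {x: math.exp(-(e - e_min)) for x, e in energies.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical energies for a variable with three states.
energies = {"a": 0.0, "b": 1.0, "c": 2.0}
probs = boltzmann(energies)
print(probs)
```

Lower energy means higher probability: here each unit of energy difference changes the probability ratio by a factor of $e$.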

When there are hidden units $h$, the distribution of $x$ is obtained by summing them out:

\[P(x)=\sum\limits_h{P(x,h)}=\frac{1}{\sum_{\tilde x}\sum_h\exp(-E(\tilde x,h))}\sum\limits_h{\exp(-E(x,h))}\]
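The marginalization can be checked by brute force on a tiny model. This sketch assumes binary $x\in\{0,1\}^2$ and $h\in\{0,1\}$ and a made-up energy function `energy` (hypothetical, for illustration only); note that the normalizer sums over every joint configuration $(x,h)$, not over $x$ alone.

```python
import itertools
import math

XS = list(itertools.product([0, 1], repeat=2))  # all visible configurations x
HS = list(itertools.product([0, 1], repeat=1))  # all hidden configurations h


def energy(x, h):
    # Hypothetical joint energy, chosen only for illustration.
    return -(0.5 * x[0] - 0.3 * x[1] + 0.8 * h[0] + 1.2 * h[0] * x[0])


# Normalizer: sum of exp(-E) over ALL joint configurations (x, h).
Z = sum(math.exp(-energy(x, h)) for x in XS for h in HS)

# Joint distribution P(x, h) and marginal P(x) = (1/Z) * sum_h exp(-E(x, h)).
p_joint = {(x, h): math.exp(-energy(x, h)) / Z for x in XS for h in HS}
p_x = {x: sum(math.exp(-energy(x, h)) for h in HS) / Z for x in XS}
print(p_x)
```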

Now we define the free energy function:

\[F(x)=-\log \sum\limits_h \exp(-E(x,h))\]

so that

\[\sum\limits_h \exp(-E(x,h))=\exp(-F(x))\]
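A quick numerical check of this identity, using a hypothetical toy energy over binary $x$ and $h$ with made-up coefficients. The free energy is computed with a log-sum-exp, a choice made here for numerical stability rather than something the derivation requires:

```python
import itertools
import math

XS = list(itertools.product([0, 1], repeat=2))  # visible configurations x
HS = list(itertools.product([0, 1], repeat=2))  # hidden configurations h


def energy(x, h):
    # Hypothetical toy energy coupling x and h (illustration only).
    return -(0.5 * x[0] - 0.3 * x[1] + 0.8 * h[0] - 0.2 * h[1]
             + 1.2 * h[0] * x[0] - 0.7 * h[1] * x[1])


def logsumexp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def free_energy(x):
    # F(x) = -log sum_h exp(-E(x, h))
    return -logsumexp([-energy(x, h) for h in HS])


for x in XS:
    direct = sum(math.exp(-energy(x, h)) for h in HS)
    print(x, direct, math.exp(-free_energy(x)))  # the last two columns match
```

Both sides are sums of exponentials, so they are always positive.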

Substituting this identity into both the numerator and the normalizing sum (which runs over $x$ as well as $h$), the probability distribution simplifies to:

\[P(x)=\frac{\exp(-F(x))}{\sum_x{\exp(-F(x))}}\]
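On a toy model the two routes to $P(x)$, marginalizing the joint distribution or normalizing $\exp(-F(x))$, can be compared directly. This sketch assumes binary units and a hypothetical `energy` function with made-up coefficients:

```python
import itertools
import math

XS = list(itertools.product([0, 1], repeat=2))  # visible configurations x
HS = list(itertools.product([0, 1], repeat=1))  # hidden configurations h


def energy(x, h):
    # Hypothetical toy energy (illustration only).
    return -(0.5 * x[0] - 0.3 * x[1] + 0.8 * h[0] + 1.2 * h[0] * x[0])


def free_energy(x):
    # F(x) = -log sum_h exp(-E(x, h))
    return -math.log(sum(math.exp(-energy(x, h)) for h in HS))


# Route 1: marginalize the joint distribution; the normalizer runs over (x, h).
z_joint = sum(math.exp(-energy(x, h)) for x in XS for h in HS)
p_marginal = {x: sum(math.exp(-energy(x, h)) for h in HS) / z_joint for x in XS}

# Route 2: P(x) = exp(-F(x)) / sum_x' exp(-F(x')).
z_free = sum(math.exp(-free_energy(x)) for x in XS)
p_free = {x: math.exp(-free_energy(x)) / z_free for x in XS}

for x in XS:
    print(x, p_marginal[x], p_free[x])
```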

We then define the overall cost function as the average negative log-likelihood of a training set $D$ containing $N$ examples:

\[\mathcal{L}(\theta,D)=-\frac{1}{N}\sum\limits_{x^{(i)} \in D}{\log p(x^{(i)})}\]
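A sketch of this cost on a tiny model, assuming binary units, a hypothetical toy `energy`, and a made-up four-example dataset `data`; `log_p` uses $\log P(x) = -F(x) - \log\sum_{\tilde x}\exp(-F(\tilde x))$, which follows from the free-energy form of $P(x)$:

```python
import itertools
import math

XS = list(itertools.product([0, 1], repeat=2))  # visible configurations x
HS = list(itertools.product([0, 1], repeat=1))  # hidden configurations h


def energy(x, h):
    # Hypothetical toy energy (illustration only).
    return -(0.5 * x[0] - 0.3 * x[1] + 0.8 * h[0] + 1.2 * h[0] * x[0])


def free_energy(x):
    # F(x) = -log sum_h exp(-E(x, h))
    return -math.log(sum(math.exp(-energy(x, h)) for h in HS))


def log_p(x):
    # log P(x) = -F(x) - log sum_x' exp(-F(x'))
    log_norm = math.log(sum(math.exp(-free_energy(xp)) for xp in XS))
    return -free_energy(x) - log_norm


def cost(dataset):
    # L(theta, D) = -(1/N) * sum_i log p(x^(i))
    return -sum(log_p(x) for x in dataset) / len(dataset)


data = [(1, 0), (1, 1), (1, 0), (0, 0)]  # hypothetical training set
loss = cost(data)
print(loss)
```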

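We first compute the gradient of $-\log P(x)$ with respect to the model parameters $\theta$. Taking the logarithm of $P(x)$ gives

\[-\log P(x)=F(x) + \log\left(\sum\limits_{\tilde x}{\exp(-F(\tilde x))}\right)\]

Differentiating the second term gives $\frac{\partial}{\partial\theta}\log\sum_{\tilde x}\exp(-F(\tilde x))=-\sum_{\tilde x}P(\tilde x)\frac{\partial F(\tilde x)}{\partial\theta}$, so

\[-\frac{\partial \log P(x)}{\partial \theta}=\frac{\partial F(x)}{\partial \theta}-\sum\limits_{\tilde x}{P(\tilde x)\frac{\partial F(\tilde x)}{\partial \theta}}\]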
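This gradient identity can be verified numerically. The sketch below assumes binary units and a hypothetical energy that is linear in four parameters `theta`, and approximates every $\partial/\partial\theta$ by central finite differences (`finite_diff` is a helper introduced here); the two printed gradients should agree to many decimal places.

```python
import itertools
import math

XS = list(itertools.product([0, 1], repeat=2))  # visible configurations x
HS = list(itertools.product([0, 1], repeat=1))  # hidden configurations h


def energy(x, h, theta):
    # Hypothetical energy, linear in the parameters theta (illustration only).
    return -(theta[0] * x[0] + theta[1] * x[1] + theta[2] * h[0] * x[0] + theta[3] * h[0])


def free_energy(x, theta):
    return -math.log(sum(math.exp(-energy(x, h, theta)) for h in HS))


def neg_log_p(x, theta):
    # -log P(x) = F(x) + log sum_x' exp(-F(x'))
    log_norm = math.log(sum(math.exp(-free_energy(xp, theta)) for xp in XS))
    return free_energy(x, theta) + log_norm


def finite_diff(f, theta, i, eps=1e-5):
    # Central difference approximation of df/dtheta[i].
    up, down = list(theta), list(theta)
    up[i] += eps
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


def gradient_by_formula(x, theta):
    # Positive phase dF(x)/dtheta minus negative phase sum_x' P(x') dF(x')/dtheta.
    weights = {xp: math.exp(-free_energy(xp, theta)) for xp in XS}
    total = sum(weights.values())
    grad = []
    for i in range(len(theta)):
        positive = finite_diff(lambda t: free_energy(x, t), theta, i)
        negative = sum(
            (weights[xp] / total) * finite_diff(lambda t, xp=xp: free_energy(xp, t), theta, i)
            for xp in XS
        )
        grad.append(positive - negative)
    return grad


theta = [0.3, -0.5, 1.1, 0.2]
x = (1, 0)
formula = gradient_by_formula(x, theta)
direct = [finite_diff(lambda t: neg_log_p(x, t), theta, i) for i in range(len(theta))]
print(formula)
print(direct)
```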

Note that the gradient contains two terms, called the positive phase and the negative phase. When we descend this gradient, the first term lowers the free energy of the training data, increasing its probability, while the second term raises the free energy of samples generated by the model, decreasing their probability.

It is difficult to compute this gradient analytically, because the negative phase is the expectation $E_P[\frac{\partial F(x)}{\partial \theta}]$ over every possible input $x$, which is intractable for all but tiny models. We therefore estimate this expectation by sampling.

We would like the elements $\tilde x$ of a set $\mathcal{N}$, called the negative particles, to be sampled according to $P(\tilde x)$.

Averaging over these particles, the gradient can be approximated as:

\[ - \frac{\partial \log p(x)}{\partial \theta}\approx \frac{\partial F(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum\limits_{\tilde x \in \mathcal{N}}\frac{\partial F(\tilde x)}{\partial \theta}\]
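A sketch of this estimator on a model small enough that the exact expectation is also available for comparison. Assumptions: binary units, a hypothetical energy linear in `theta`, finite-difference $\partial F/\partial\theta$, and negative particles drawn exactly from $P$ with `random.choices`. Exact sampling is possible here only because the state space can be enumerated; for an RBM one would typically use a Markov chain sampler such as Gibbs sampling instead.

```python
import itertools
import math
import random

XS = list(itertools.product([0, 1], repeat=2))  # visible configurations x
HS = list(itertools.product([0, 1], repeat=1))  # hidden configurations h


def energy(x, h, theta):
    # Hypothetical energy, linear in theta (illustration only).
    return -(theta[0] * x[0] + theta[1] * x[1] + theta[2] * h[0] * x[0])


def free_energy(x, theta):
    return -math.log(sum(math.exp(-energy(x, h, theta)) for h in HS))


def grad_free_energy(x, theta, eps=1e-5):
    # Central finite-difference gradient of F(x) with respect to theta.
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((free_energy(x, up) - free_energy(x, down)) / (2 * eps))
    return grad


def model_probs(theta):
    # Exact P(x) over all visible configurations (feasible only for tiny models).
    weights = [math.exp(-free_energy(x, theta)) for x in XS]
    total = sum(weights)
    return [w / total for w in weights]


def sampled_gradient(x, theta, particles):
    # dF(x)/dtheta - (1/|N|) * sum over negative particles of dF(x~)/dtheta
    positive = grad_free_energy(x, theta)
    cache = {}  # each distinct particle's gradient is computed once
    negative = [0.0] * len(theta)
    for xt in particles:
        if xt not in cache:
            cache[xt] = grad_free_energy(xt, theta)
        for i, g in enumerate(cache[xt]):
            negative[i] += g / len(particles)
    return [p - n for p, n in zip(positive, negative)]


def exact_gradient(x, theta):
    # Same formula, with the negative phase as an exact expectation under P.
    positive = grad_free_energy(x, theta)
    probs = model_probs(theta)
    grads = [grad_free_energy(xt, theta) for xt in XS]
    negative = [sum(p * g[i] for p, g in zip(probs, grads)) for i in range(len(theta))]
    return [a - b for a, b in zip(positive, negative)]


random.seed(0)
theta = [0.3, -0.5, 1.1]
x = (1, 0)
particles = random.choices(XS, weights=model_probs(theta), k=20000)  # negative particles
estimate = sampled_gradient(x, theta, particles)
exact = exact_gradient(x, theta)
print(estimate)
print(exact)
```

With 20000 particles the estimate typically agrees with the exact gradient to about two decimal places; the error shrinks roughly as $1/\sqrt{|\mathcal{N}|}$.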

RBM

[Figure: RBM (_images/rbm.png)]

The energy function $E(v,h)$ of an RBM is defined as:

\[E(v,h)=-b'v-c'h-h'Wv\]

where

$W$ represents the weights connecting the hidden and visible units;
$b$ and $c$ are the bias vectors of the visible and hidden layers, respectively;
a prime ($'$) denotes the transpose.
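A direct transcription of this energy in plain Python, as a sketch. It assumes $W$ is stored as a list of rows with shape (number of hidden units) × (number of visible units), so that $h'Wv$ is well defined, and that units are binary, so $P(v)$ can be obtained by brute-force marginalization for a tiny model. `rbm_energy`, `rbm_marginal`, and the parameter values are hypothetical.

```python
import itertools
import math


def rbm_energy(v, h, W, b, c):
    # E(v, h) = -b'v - c'h - h'Wv, with W[j][i] connecting hidden j to visible i.
    bv = sum(bi * vi for bi, vi in zip(b, v))
    ch = sum(cj * hj for cj, hj in zip(c, h))
    hwv = sum(h[j] * W[j][i] * v[i] for j in range(len(h)) for i in range(len(v)))
    return -bv - ch - hwv


def rbm_marginal(W, b, c):
    # P(v) = sum_h exp(-E(v, h)) / sum_{v, h} exp(-E(v, h)), by enumeration.
    vs = list(itertools.product([0, 1], repeat=len(b)))
    hs = list(itertools.product([0, 1], repeat=len(c)))
    unnormalized = {v: sum(math.exp(-rbm_energy(v, h, W, b, c)) for h in hs) for v in vs}
    total = sum(unnormalized.values())
    return {v: u / total for v, u in unnormalized.items()}


# Hypothetical parameters: 3 visible units, 2 hidden units.
W = [[0.5, -1.0, 0.25],
     [2.0, 0.0, -0.5]]
b = [0.1, -0.2, 0.3]
c = [0.4, -0.6]

e = rbm_energy((1, 0, 1), (1, 1), W, b, c)
p_v = rbm_marginal(W, b, c)
print(e)  # approximately -2.45
print(p_v)
```

Enumeration costs $2^{n_v + n_h}$ energy evaluations, which is exactly why the sampling-based gradient estimate is needed for realistic RBMs.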

Reposted from: https://www.cnblogs.com/ZJUT-jiangnan/p/5466814.html
