1. Boltzmann distribution:
$$p(E) \propto e^{-E/kT}$$
where $k$ is the Boltzmann constant and $T$ is the temperature.
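The following is a minimal Python sketch of the formula. It normalizes $e^{-E/kT}$ over a finite set of levels. The energy levels and temperatures are hypothetical and chosen only for illustration. Lower-energy states dominate at small $kT$, and the distribution flattens as $kT$ grows.

```python
import math


def boltzmann(energies, kT):
    """Normalized Boltzmann probabilities p_i = e^{-E_i/kT} / sum_j e^{-E_j/kT}."""
    e_min = min(energies)  # shift energies for stability; the common factor cancels on normalizing
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


levels = [0.0, 1.0, 2.0]  # hypothetical energy levels, in the same units as kT
cold = boltzmann(levels, kT=0.5)
hot = boltzmann(levels, kT=50.0)
print("kT = 0.5:", [round(p, 3) for p in cold])
print("kT = 50: ", [round(p, 3) for p in hot])
```

Only energy differences matter. Shifting every level by the same constant leaves the probabilities unchanged, which is why subtracting `e_min` is safe.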
2. RBM (restricted Boltzmann machine)
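Two layers: a hidden layer $\mathbf{h}$ and a visible layer $\mathbf{v}$. Units are connected only across the two layers, never within a layer. All units are binary: $v_i\in\{0,1\}$, $h_j\in\{0,1\}$.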
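Energy function:
$$E(\mathbf{v},\mathbf{h};\theta) = -\mathbf{b}\cdot\mathbf{v} - \mathbf{c}\cdot\mathbf{h} - \mathbf{v}^{T}W\mathbf{h}, \qquad \theta = \{\mathbf{b}, \mathbf{c}, W\}$$
Probability distribution:
$$p(\mathbf{v},\mathbf{h};\theta) = \frac{1}{Z(\theta)}e^{-E(\mathbf{v},\mathbf{h};\theta)}, \qquad Z(\theta) = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h};\theta)}$$
Conditional probabilities:
$$p(\mathbf{v}\mid\mathbf{h};\theta) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{v}} e^{-E(\mathbf{v},\mathbf{h})}}, \qquad p(\mathbf{h}\mid\mathbf{v};\theta) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}}$$
There are no connections within a layer, so both conditionals factorize over units, for example $p(\mathbf{h}\mid\mathbf{v}) = \prod_j p(h_j\mid\mathbf{v})$. Each factor is:
$$p(v_i=1\mid\mathbf{h};\theta) = \sigma\Big(b_i + \sum_j W_{ij}h_j\Big), \qquad p(h_j=1\mid\mathbf{v};\theta) = \sigma\Big(c_j + \sum_i W_{ij}v_i\Big)$$
where $\sigma(x) = 1/(1+e^{-x})$ is the logistic sigmoid.
Marginal probability:
$$p(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{v},\mathbf{h}) = \frac{\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}}$$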
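For a tiny RBM every configuration can be enumerated, so the definitions above can be checked directly. This pure-Python sketch uses hypothetical parameter values and helper names. It computes $Z$ by brute force, checks that $p(\mathbf{v})$ sums to 1, and compares $p(\mathbf{h}\mid\mathbf{v})$ against the product of per-unit sigmoids. Enumeration costs $2^{n_v+n_h}$ terms, so this approach only works for toy sizes.

```python
import itertools
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, b, c, W):
    """E(v, h) = -b.v - c.h - v^T W h for binary tuples v, h."""
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    e -= sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


# Hypothetical toy parameters: 3 visible units, 2 hidden units.
b = [0.1, -0.2, 0.3]
c = [0.5, -0.4]
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
n_v, n_h = len(b), len(c)

states_v = list(itertools.product([0, 1], repeat=n_v))
states_h = list(itertools.product([0, 1], repeat=n_h))

# Partition function: brute-force sum of e^{-E} over all 2^(n_v + n_h) configurations.
Z = sum(math.exp(-energy(v, h, b, c, W)) for v in states_v for h in states_h)


def p_joint(v, h):
    return math.exp(-energy(v, h, b, c, W)) / Z


def p_visible(v):
    """Marginal p(v) = sum_h p(v, h)."""
    return sum(p_joint(v, h) for h in states_h)


def p_h_given_v_bruteforce(h, v):
    """p(h | v) = e^{-E(v,h)} / sum_h' e^{-E(v,h')}."""
    num = math.exp(-energy(v, h, b, c, W))
    den = sum(math.exp(-energy(v, hp, b, c, W)) for hp in states_h)
    return num / den


def p_h_given_v_factorized(h, v):
    """Product over j of p(h_j | v), with p(h_j = 1 | v) = sigma(c_j + sum_i W_ij v_i)."""
    p = 1.0
    for j in range(n_h):
        on = sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(n_v)))
        p *= on if h[j] == 1 else 1.0 - on
    return p


print("sum_v p(v) =", sum(p_visible(v) for v in states_v))
v0, h0 = (1, 0, 1), (0, 1)
print(p_h_given_v_bruteforce(h0, v0), p_h_given_v_factorized(h0, v0))
```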
3. Optimization
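Maximize the log-likelihood:
$$\mathcal{L}(\theta\mid\mathbf{v}) = \ln p(\mathbf{v};\theta) = \ln\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})} - \ln\sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}$$
Gradient:
$$\frac{\partial\mathcal{L}}{\partial\theta} = \mathbb{E}_{p(\mathbf{h}\mid\mathbf{v})}\Big[-\frac{\partial E(\mathbf{v},\mathbf{h})}{\partial\theta}\Big] - \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\Big[-\frac{\partial E(\mathbf{v},\mathbf{h})}{\partial\theta}\Big]$$
$$\frac{\partial E(\mathbf{v},\mathbf{h})}{\partial W_{ij}} = -v_ih_j, \qquad \frac{\partial E(\mathbf{v},\mathbf{h})}{\partial b_i} = -v_i, \qquad \frac{\partial E(\mathbf{v},\mathbf{h})}{\partial c_j} = -h_j$$
For example, $\partial\mathcal{L}/\partial W_{ij} = \mathbb{E}_{p(\mathbf{h}\mid\mathbf{v})}[v_ih_j] - \mathbb{E}_{p(\mathbf{v},\mathbf{h})}[v_ih_j]$. The first term keeps $\mathbf{v}$ clamped to the observed data. The second term is an expectation under the model and requires a sum over all configurations of $(\mathbf{v},\mathbf{h})$.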
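The gradient can be checked the same way on a toy model. This sketch uses hypothetical parameters and helper names. It evaluates both expectations in the log-likelihood gradient with respect to $W_{ij}$ exactly by enumeration, using $-\partial E/\partial W_{ij} = v_ih_j$. It then compares the result with a central finite difference of $\ln p(\mathbf{v})$. The model term is the expensive one, since it runs over every $(\mathbf{v},\mathbf{h})$ pair.

```python
import itertools
import math

# Hypothetical toy RBM: 2 visible and 2 hidden units, small enough to enumerate.
V = list(itertools.product([0, 1], repeat=2))
H = list(itertools.product([0, 1], repeat=2))


def energy(v, h, b, c, W):
    """E(v, h) = -b.v - c.h - v^T W h."""
    return (-sum(b[i] * v[i] for i in range(len(v)))
            - sum(c[j] * h[j] for j in range(len(h)))
            - sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h))))


def log_likelihood(v, b, c, W):
    """ln p(v) = ln sum_h e^{-E(v,h)} - ln sum_{v,h} e^{-E(v,h)}."""
    log_pos = math.log(sum(math.exp(-energy(v, h, b, c, W)) for h in H))
    log_z = math.log(sum(math.exp(-energy(vv, h, b, c, W)) for vv in V for h in H))
    return log_pos - log_z


def grad_W(v, b, c, W):
    """Exact dL/dW_ij = E_{p(h|v)}[v_i h_j] - E_{p(v,h)}[v_i h_j], by enumeration."""
    # Data term: unnormalized weights of p(h | v) with v clamped to the observation.
    w_data = [math.exp(-energy(v, h, b, c, W)) for h in H]
    z_data = sum(w_data)
    # Model term: unnormalized weights of the joint p(v, h) over all configurations.
    pairs = [(vv, h) for vv in V for h in H]
    w_model = [math.exp(-energy(vv, h, b, c, W)) for vv, h in pairs]
    z_model = sum(w_model)
    grad = [[0.0] * len(c) for _ in range(len(b))]
    for i in range(len(b)):
        for j in range(len(c)):
            data = sum(w * v[i] * h[j] for w, h in zip(w_data, H)) / z_data
            model = sum(w * vv[i] * h[j] for w, (vv, h) in zip(w_model, pairs)) / z_model
            grad[i][j] = data - model
    return grad


def finite_diff_W(v, b, c, W, i, j, eps=1e-6):
    """Central finite difference of ln p(v) with respect to W_ij."""
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[i][j] += eps
    W_minus[i][j] -= eps
    return (log_likelihood(v, b, c, W_plus) - log_likelihood(v, b, c, W_minus)) / (2 * eps)


b, c = [0.2, -0.3], [0.1, 0.4]
W = [[0.5, -0.2], [0.3, 0.7]]
v_data = (1, 0)

g = grad_W(v_data, b, c, W)
for i in range(2):
    for j in range(2):
        print(i, j, g[i][j], finite_diff_W(v_data, b, c, W, i, j))
```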
4. Other energy-based models
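1) Gaussian-Bernoulli RBM (real-valued visible units, binary hidden units):
Energy function:
$$E(\mathbf{v},\mathbf{h};\theta) = \sum_i \frac{(v_i-b_i)^2}{2\sigma_i^2} - \sum_j c_jh_j - \sum_{i,j} W_{ij}\frac{v_i}{\sigma_i}h_j, \qquad \theta = \{\mathbf{b}, \boldsymbol{\sigma}, \mathbf{c}, W\}$$
Conditional probabilities. Here $\sigma_i$ is the standard deviation of visible unit $i$, while $\sigma(\cdot)$ still denotes the logistic sigmoid. The visible conditional follows by completing the square in $v_i$:
$$p(v_i=x\mid\mathbf{h};\theta) = \mathcal{N}\Big(x;\ b_i + \sigma_i\sum_j W_{ij}h_j,\ \sigma_i^2\Big), \qquad p(h_j=1\mid\mathbf{v};\theta) = \sigma\Big(c_j + \sum_i W_{ij}\frac{v_i}{\sigma_i}\Big)$$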
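The sketch below shows block Gibbs sampling for the Gaussian-Bernoulli RBM, assuming the conditionals above. Each $v_i$ is drawn from a Gaussian with mean $b_i + \sigma_i\sum_j W_{ij}h_j$ and standard deviation $\sigma_i$, and each $h_j$ from a Bernoulli unit. The parameters, seed, and function names are hypothetical. Note that Python's `random.gauss(mu, sigma)` takes a standard deviation, not a variance.

```python
import math
import random


def gb_energy(v, h, b, sigma, c, W):
    """E(v,h) = sum_i (v_i-b_i)^2/(2 sigma_i^2) - sum_j c_j h_j - sum_ij W_ij (v_i/sigma_i) h_j."""
    quad = sum((v[i] - b[i]) ** 2 / (2 * sigma[i] ** 2) for i in range(len(v)))
    bias = sum(c[j] * h[j] for j in range(len(h)))
    inter = sum(W[i][j] * v[i] / sigma[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return quad - bias - inter


def visible_mean(h, b, sigma, W):
    """Mean of the Gaussian p(v_i | h): b_i + sigma_i * sum_j W_ij h_j."""
    return [b[i] + sigma[i] * sum(W[i][j] * h[j] for j in range(len(h))) for i in range(len(b))]


def hidden_prob(v, sigma, c, W):
    """p(h_j = 1 | v) = logistic(c_j + sum_i W_ij v_i / sigma_i)."""
    probs = []
    for j in range(len(c)):
        a = c[j] + sum(W[i][j] * v[i] / sigma[i] for i in range(len(v)))
        probs.append(1.0 / (1.0 + math.exp(-a)))
    return probs


def sample_v(h, b, sigma, W, rng):
    # random.gauss takes the standard deviation sigma_i (variance sigma_i^2).
    return [rng.gauss(mu, s) for mu, s in zip(visible_mean(h, b, sigma, W), sigma)]


def sample_h(v, sigma, c, W, rng):
    return [1 if rng.random() < p else 0 for p in hidden_prob(v, sigma, c, W)]


# Hypothetical toy parameters: 2 real-valued visible units, 2 binary hidden units.
b, sigma = [0.5, -1.0], [1.0, 2.0]
c = [0.1, -0.3]
W = [[0.4, -0.6], [0.2, 0.8]]

rng = random.Random(0)
v = [0.0, 0.0]
for _ in range(5):  # a few alternating (block Gibbs) steps
    h = sample_h(v, sigma, c, W, rng)
    v = sample_v(h, b, sigma, W, rng)
print("h =", h, "v =", v)
```

For fixed $\mathbf{h}$, the energy differs from $\sum_i (v_i-\mu_i)^2/(2\sigma_i^2)$ only by a term that does not depend on $\mathbf{v}$. That is exactly the completing-the-square argument behind the Gaussian conditional.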
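2) Extended energy (adding a label layer $\mathbf{y}$)
- Energy function:
$$E(\mathbf{v},\mathbf{y},\mathbf{h}) = -\sum_i b_iv_i - \sum_j c_jh_j - \sum_{i,j} W_{ij}v_ih_j - \sum_k d_ky_k - \sum_{j,k} U_{jk}h_jy_k, \qquad \theta = \{\mathbf{b}, \mathbf{c}, W, \mathbf{d}, U\}$$
- Conditional probabilities. Here $\mathbf{y}$ is a one-hot label vector, so its conditional is a softmax over the classes:
$$p(v_i=1\mid\mathbf{h}) = \sigma\Big(b_i + \sum_j W_{ij}h_j\Big)$$
$$p(h_j=1\mid\mathbf{v},\mathbf{y}) = \sigma\Big(c_j + \sum_i W_{ij}v_i + \sum_k U_{jk}y_k\Big)$$
$$p(y_k=1\mid\mathbf{h}) = \frac{\exp\big(d_k + \sum_j U_{jk}h_j\big)}{\sum_{k'}\exp\big(d_{k'} + \sum_j U_{jk'}h_j\big)}$$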
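In the extended model, $\mathbf{v}$ and $\mathbf{y}$ interact only through $\mathbf{h}$, so $p(\mathbf{y}\mid\mathbf{v},\mathbf{h}) = p(\mathbf{y}\mid\mathbf{h})$. The sketch below uses hypothetical sizes and parameters, with $\mathbf{y}$ restricted to one-hot vectors. It computes the class softmax and $p(h_j=1\mid\mathbf{v},\mathbf{y})$, and checks both against brute-force normalization of $e^{-E}$.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, y, h, b, c, W, d, U):
    """E(v,y,h) = -b.v - c.h - sum_ij W_ij v_i h_j - d.y - sum_jk U_jk h_j y_k."""
    e = -sum(b[i] * v[i] for i in range(len(v)))
    e -= sum(c[j] * h[j] for j in range(len(h)))
    e -= sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    e -= sum(d[k] * y[k] for k in range(len(y)))
    e -= sum(U[j][k] * h[j] * y[k] for j in range(len(h)) for k in range(len(y)))
    return e


def p_y_given_h(h, d, U):
    """Softmax over classes: p(y_k = 1 | h) proportional to exp(d_k + sum_j U_jk h_j)."""
    logits = [d[k] + sum(U[j][k] * h[j] for j in range(len(h))) for k in range(len(d))]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def p_h_given_v_y(v, y, c, W, U):
    """p(h_j = 1 | v, y) = sigma(c_j + sum_i W_ij v_i + sum_k U_jk y_k)."""
    return [sigmoid(c[j]
                    + sum(W[i][j] * v[i] for i in range(len(v)))
                    + sum(U[j][k] * y[k] for k in range(len(y))))
            for j in range(len(c))]


# Hypothetical toy sizes: 2 visible units, 2 hidden units, 3 classes (y one-hot).
b, c, d = [0.1, -0.2], [0.3, -0.1], [0.0, 0.5, -0.5]
W = [[0.4, -0.3], [0.2, 0.6]]
U = [[0.7, -0.2, 0.1], [-0.4, 0.3, 0.5]]
one_hot = [tuple(1 if k == m else 0 for k in range(3)) for m in range(3)]
H = list(itertools.product([0, 1], repeat=2))  # all hidden configurations

v, h, y = (1, 0), (0, 1), one_hot[1]

# Brute force: normalize e^{-E} over the one-hot labels with v and h held fixed.
w_y = [math.exp(-energy(v, yy, h, b, c, W, d, U)) for yy in one_hot]
brute_y = [w / sum(w_y) for w in w_y]

# Brute force: p(h_0 = 1 | v, y) by summing e^{-E} over hidden configurations.
w_h = {hh: math.exp(-energy(v, y, hh, b, c, W, d, U)) for hh in H}
brute_h0 = sum(w for hh, w in w_h.items() if hh[0] == 1) / sum(w_h.values())

print(p_y_given_h(h, d, U), brute_y)
print(p_h_given_v_y(v, y, c, W, U)[0], brute_h0)
```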
5. Appendix
1. Maximum-entropy derivation of the Boltzmann distribution
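In a closed system, energy is conserved, with total energy $E_{\text{tot}}$. Let the system consist of $N$ particles, let state $i$ have energy $E_i$, and let $p_i$ be the fraction of particles in state $i$. The constraints are then:
$$\sum_i p_i = 1, \qquad \sum_i p_iE_i = E_{\text{tot}}/N \equiv \bar{E}$$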
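Maximize the information entropy
$$H[p] = -\sum_i p_i\ln p_i$$
subject to these constraints. This is equivalent to maximizing the Lagrangian
$$\mathcal{L}[p] = H[p] + \alpha\Big(1 - \sum_i p_i\Big) + \beta\Big(\bar{E} - \sum_i p_iE_i\Big)$$
Setting $\partial\mathcal{L}/\partial p_i = -\ln p_i - 1 - \alpha - \beta E_i = 0$ gives the probability distribution over energies:
$$p(E_i) = e^{-1-\alpha}\,e^{-\beta E_i} \propto e^{-\beta E_i}$$
Here $\alpha$ is fixed by normalization and $\beta$ by the mean energy. Identifying $\beta = 1/kT$ recovers the Boltzmann distribution of Section 1.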
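The following is a numerical sanity check of this result, using hypothetical energy levels and a hypothetical target mean energy. Bisection finds the $\beta$ that satisfies the mean-energy constraint; this works because the mean energy decreases monotonically in $\beta$. A perturbation that preserves both constraints then lowers the entropy, as expected if the Boltzmann form is the constrained maximizer.

```python
import math


def boltzmann(energies, beta):
    """p_i = e^{-beta E_i} / sum_j e^{-beta E_j}."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [x / z for x in w]


def mean_energy(p, energies):
    return sum(pi * ei for pi, ei in zip(p, energies))


def entropy(p):
    """H[p] = -sum_i p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def solve_beta(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for beta with sum_i p_i E_i = target; the mean energy decreases as beta grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(boltzmann(energies, mid), energies) > target:
            lo = mid  # mean energy still too high: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


E = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
E_bar = 1.0               # hypothetical target mean energy
beta = solve_beta(E, E_bar)
p = boltzmann(E, beta)

# delta keeps both constraints: sum(delta) = 0 and sum(delta_i * E_i) = 0 - 2 + 2 + 0 = 0.
delta = [1.0, -2.0, 1.0, 0.0]
q = [pi + 0.01 * di for pi, di in zip(p, delta)]

print("beta =", beta)
print("H[p] =", entropy(p), " H[q] =", entropy(q))
```

The target $\bar{E} = 1$ lies below the uniform-distribution mean of 1.5, so the solved $\beta$ comes out positive. This corresponds to a positive temperature.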
2. Derivation of the RBM conditional probability
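Let $\mathbf{v}_{k\neq i}$ denote all visible units other than $v_i$. The partition function $Z$ cancels between numerator and denominator, and so does the factor that does not involve $v_i$:
$$\begin{aligned}
p(v_i=1\mid\mathbf{h}) &= \frac{\sum_{\mathbf{v}_{k\neq i}} p(v_i=1,\mathbf{v}_{k\neq i},\mathbf{h})}{\sum_{\mathbf{v}} p(\mathbf{v},\mathbf{h})} \\
&= \frac{\sum_{\mathbf{v}_{k\neq i}} \exp\Big[\big(b_iv_i + \sum_j W_{ij}v_ih_j\big)\big|_{v_i=1} + \sum_{k\neq i} b_kv_k + \sum_j c_jh_j + \sum_{k\neq i,\,j} W_{kj}v_kh_j\Big]}{\sum_{v_i,\,\mathbf{v}_{k\neq i}} \exp\Big[\big(b_iv_i + \sum_j W_{ij}v_ih_j\big) + \sum_{k\neq i} b_kv_k + \sum_j c_jh_j + \sum_{k\neq i,\,j} W_{kj}v_kh_j\Big]} \\
&= \frac{\exp\Big[\big(b_iv_i + \sum_j W_{ij}v_ih_j\big)\big|_{v_i=1}\Big]\cdot\sum_{\mathbf{v}_{k\neq i}} \exp\Big[\sum_{k\neq i} b_kv_k + \sum_j c_jh_j + \sum_{k\neq i,\,j} W_{kj}v_kh_j\Big]}{\sum_{v_i}\exp\Big[b_iv_i + \sum_j W_{ij}v_ih_j\Big]\cdot\sum_{\mathbf{v}_{k\neq i}} \exp\Big[\sum_{k\neq i} b_kv_k + \sum_j c_jh_j + \sum_{k\neq i,\,j} W_{kj}v_kh_j\Big]} \\
&= \frac{\exp\Big[\big(b_iv_i + \sum_j W_{ij}v_ih_j\big)\big|_{v_i=1}\Big]}{\sum_{v_i}\exp\Big[b_iv_i + \sum_j W_{ij}v_ih_j\Big]} \\
&= \frac{1}{1 + \exp\big[-b_i - \sum_j W_{ij}h_j\big]} \qquad (v_i\in\{0,1\})
\end{aligned}$$
This is exactly $\sigma\big(b_i + \sum_j W_{ij}h_j\big)$.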
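The derivation can be verified numerically by carrying out its first line literally. Sum $e^{-E}$ over all $\mathbf{v}$ with $v_i=1$, divide by the sum over all $\mathbf{v}$ (with $\mathbf{h}$ fixed), and compare with the closed form. The parameters and helper names below are hypothetical.

```python
import itertools
import math


def neg_energy(v, h, b, c, W):
    """-E(v,h) = b.v + c.h + v^T W h, the exponent used in the derivation."""
    return (sum(b[i] * v[i] for i in range(len(v)))
            + sum(c[j] * h[j] for j in range(len(h)))
            + sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h))))


def p_vi_on_bruteforce(i, h, b, c, W):
    """p(v_i=1|h): numerator sums over v with v_i=1 (all v_k free), denominator over all v."""
    num = den = 0.0
    for v in itertools.product([0, 1], repeat=len(b)):
        w = math.exp(neg_energy(v, h, b, c, W))
        den += w
        if v[i] == 1:
            num += w
    return num / den


def p_vi_on_closed_form(i, h, b, W):
    """1 / (1 + exp(-b_i - sum_j W_ij h_j))."""
    return 1.0 / (1.0 + math.exp(-b[i] - sum(W[i][j] * h[j] for j in range(len(h)))))


# Hypothetical toy parameters: 4 visible units, 3 hidden units.
b = [0.2, -0.5, 0.1, 0.7]
c = [0.3, -0.2, 0.4]
W = [[0.5, -0.1, 0.2], [-0.3, 0.8, 0.0], [0.6, 0.2, -0.7], [0.1, -0.4, 0.3]]
h = (1, 0, 1)
for i in range(len(b)):
    print(i, p_vi_on_bruteforce(i, h, b, c, W), p_vi_on_closed_form(i, h, b, W))
```

The brute-force version costs $2^{n_v}$ terms per query, while the closed form costs $O(n_h)$. This gap is the practical payoff of the factorization.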