Preface

Restricted Boltzmann Machines (RBM)

Contrastive divergence multi-layer RBMs

Theory

Energy-Based Models (EBM)

$$p(x)=\frac{e^{-E(x)}}{Z}$$

$$Z=\sum_x e^{-E(x)}$$

$$\mathcal{L}(\theta,D)=\frac{1}{N}\sum_{x^{(i)}\in D}\log p(x^{(i)}),\qquad loss=-\mathcal{L}(\theta,D)$$
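These definitions are easy to sanity-check numerically. A minimal numpy sketch (the three state energies are made up for illustration):

```python
import numpy as np

# Toy energy-based model over 3 discrete states.
E = np.array([1.0, 2.0, 0.5])          # E(x) for each state x
Z = np.sum(np.exp(-E))                 # partition function Z = sum_x e^{-E(x)}
p = np.exp(-E) / Z                     # p(x) = e^{-E(x)} / Z

# Lower energy => higher probability, and p sums to 1.
data = np.array([0, 2, 2, 1])          # indices of observed samples
loglik = np.mean(np.log(p[data]))      # L(theta, D)
loss = -loglik                         # the loss is the negated log-likelihood
```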

Energy models with hidden layers

$$p(x)=\sum_h p(x,h)=\sum_h\frac{e^{-E(x,h)}}{Z}$$

$$F(x)=-\log\sum_h e^{-E(x,h)}$$

$$p(x)=\frac{e^{-F(x)}}{Z}$$

$$-\frac{\partial \log p(x)}{\partial\theta}=-\frac{\partial\left(\log e^{-F(x)}-\log Z\right)}{\partial\theta}=\frac{\partial\left(F(x)+\log Z\right)}{\partial\theta}=\frac{\partial F(x)}{\partial\theta}-\sum_{\hat x}\frac{e^{-F(\hat x)}}{Z}\frac{\partial F(\hat x)}{\partial\theta}=\frac{\partial F(x)}{\partial\theta}-\sum_{\hat x}p(\hat x)\frac{\partial F(\hat x)}{\partial\theta}$$

Approximating the second expectation by a set $\mathcal{N}$ of samples drawn from the model gives

$$-\frac{\partial \log p(x)}{\partial\theta}\approx\frac{\partial F(x)}{\partial\theta}-\frac{1}{|\mathcal{N}|}\sum_{\hat x\in\mathcal{N}}\frac{\partial F(\hat x)}{\partial\theta}$$

Boltzmann machines

$$P(X=x,H=h)=\frac{e^{-E(x,h)}}{Z}$$

$$Z=\sum_{x,h}e^{-E(x,h)}$$

$$E(z)=E(x,h)=-\sum_i b_i z_i-\sum_{i<j} z_i w_{ij} z_j\qquad \text{s.t. } z_i\in\{0,1\}$$

$$P(X=x)=\sum_h P(X=x,H=h)=\sum_h\frac{e^{-E(x,h)}}{Z}$$

$$-\frac{\partial \log P(x)}{\partial\theta}=-\frac{\partial}{\partial\theta}\log\frac{\sum_h e^{-E(x,h)}}{Z}=-\frac{\partial\left(\log\sum_h e^{-E(x,h)}-\log Z\right)}{\partial\theta}=-\frac{\partial \log\sum_h e^{-E(x,h)}}{\partial\theta}+\frac{\partial \log Z}{\partial\theta}$$

$$\begin{aligned}-\frac{\partial \log\sum_h e^{-E(x,h)}}{\partial\theta}&=-\frac{1}{\sum_h e^{-E(x,h)}}\sum_h\frac{\partial e^{-E(x,h)}}{\partial\theta}\\&=\frac{1}{\sum_h e^{-E(x,h)}}\sum_h e^{-E(x,h)}\frac{\partial E(x,h)}{\partial\theta}\\&=\sum_h\frac{P(x,h)}{P(x)}\frac{\partial E(x,h)}{\partial\theta}=\sum_h P(h|x)\frac{\partial E(x,h)}{\partial\theta}\end{aligned}$$

$$\frac{\partial \log Z}{\partial\theta}=\frac{1}{Z}\frac{\partial Z}{\partial\theta}=\frac{1}{Z}\sum_{x,h}\frac{\partial e^{-E(x,h)}}{\partial\theta}=-\sum_{x,h}\frac{e^{-E(x,h)}}{Z}\frac{\partial E(x,h)}{\partial\theta}=-\sum_{x,h}p(x,h)\frac{\partial E(x,h)}{\partial\theta}$$

$$-\frac{\partial \log P(x)}{\partial\theta}=\sum_h P(h|x)\frac{\partial E(x,h)}{\partial\theta}-\sum_{x,h}p(x,h)\frac{\partial E(x,h)}{\partial\theta}$$
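This split into a positive phase (expectation under $P(h|x)$) and a negative phase (expectation under the model) can be verified numerically on a tiny Boltzmann machine with one visible and one hidden unit; the parameter values below are arbitrary:

```python
import numpy as np
from itertools import product

# E(x, h) = -b*x - c*h - w*x*h, with x, h in {0, 1}
b, c, w = 0.3, -0.2, 0.5

def E(x, h, w):
    return -b * x - c * h - w * x * h

def logP(x, w):
    num = sum(np.exp(-E(x, h, w)) for h in (0, 1))
    Z = sum(np.exp(-E(xx, hh, w)) for xx, hh in product((0, 1), (0, 1)))
    return np.log(num / Z)

x = 1
Z = sum(np.exp(-E(xx, hh, w)) for xx, hh in product((0, 1), (0, 1)))
# dE/dw = -x*h, so the two expectation terms are:
p_h = [np.exp(-E(x, h, w)) / sum(np.exp(-E(x, hh, w)) for hh in (0, 1))
       for h in (0, 1)]
pos = sum(p_h[h] * (-x * h) for h in (0, 1))                 # E[dE/dw | x]
neg = sum(np.exp(-E(xx, hh, w)) / Z * (-xx * hh)
          for xx, hh in product((0, 1), (0, 1)))             # E[dE/dw]
analytic = pos - neg

# Central finite difference of -d log P(x)/dw agrees with the identity
eps = 1e-6
numeric = -(logP(x, w + eps) - logP(x, w - eps)) / (2 * eps)
```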

Restricted Boltzmann Machines

Energy-related formulas

Binary hidden units:

• Energy function: $-b_j\cdot h_j-\sum_{i} v_i W_{ij}h_j$
• $P(h_j=1|v)=\frac{e^{b_j+\sum_i W_{ij} v_i}}{1+e^{b_j+\sum_i W_{ij} v_i}}=\mathrm{sigmoid}\left(b_j+\sum_i W_{ij}v_i\right)$

Gaussian hidden units:

• Energy function: $a_j^2h_j^2-b_jh_j-\sum_i v_i W_{ij}h_j$
• $P(h_j|v)=\frac{1}{Z}e^{-a_j^2h_j^2+b_jh_j+\sum_i v_iW_{ij}h_j}=\frac{1}{Z}e^{-\frac{(h_j-\mu)^2}{2\sigma^2}}$, i.e. $P(h_j|v)=N(h_j;\mu,\sigma^2)$ with $\sigma^2=\frac{1}{2a_j^2},\ \mu=\frac{b_j+\sum_iv_iW_{ij}}{2a_j^2}$

When $a$ is very small, training easily blows up, so $a+\epsilon$ is often used in place of $a$.

Softmax hidden units:

• Energy function: $-b_jh_j-\sum_i v_iW_{ij}h_j$
• $P(h_j=1|v)=\frac{e^{b_j+\sum_i W_{ij}v_i}}{\sum_{j'} e^{b_{j'}+\sum_i W_{ij'}v_i}}=\mathrm{softmax}\left(b_j+\sum_i W_{ij}v_i\right)$
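The three conditionals can be sketched in numpy; all shapes and parameter values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=5).astype(float)  # visible vector v_i
W = rng.normal(size=(5, 3))                   # weights W_ij
b = rng.normal(size=3)                        # hidden biases b_j
a = rng.uniform(0.5, 1.0, size=3)             # Gaussian unit parameters a_j

pre = b + v @ W                               # b_j + sum_i W_ij v_i

# Binary units: P(h_j=1|v) = sigmoid(b_j + sum_i W_ij v_i)
p_binary = sigmoid(pre)

# Gaussian units: P(h_j|v) = N(mu_j, sigma_j^2)
sigma2 = 1.0 / (2.0 * a**2)                   # sigma^2 = 1/(2 a_j^2)
mu = pre / (2.0 * a**2)                       # mu = (b_j + sum_i v_i W_ij)/(2 a_j^2)

# Softmax units: normalize over the hidden indices j
p_softmax = np.exp(pre) / np.sum(np.exp(pre))
```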

$$P(x,y)=\frac{e^{-E(x,y)}}{Z},\qquad P(y|x)=\frac{e^{-E(x,y)}}{\sum_{y'}e^{-E(x,y')}}$$

$$\begin{aligned}-\frac{\partial \log P(x)}{\partial\theta}&=-\frac{\partial \log\sum_y P(x,y)}{\partial\theta}=-\frac{\partial}{\partial\theta}\log\frac{\sum_y e^{-E(x,y)}}{Z}\\&=\sum_y\frac{e^{-E(x,y)}}{\sum_{y'}e^{-E(x,y')}}\frac{\partial E(x,y)}{\partial\theta}-\sum_{x,y}\frac{e^{-E(x,y)}}{Z}\frac{\partial E(x,y)}{\partial\theta}\\&=\sum_y P(y|x)\frac{\partial E(x,y)}{\partial\theta}-\sum_{x,y}P(x,y)\frac{\partial E(x,y)}{\partial\theta}\\&=E\left[\frac{\partial E(x,y)}{\partial\theta}\,\Big|\,x\right]-E\left[\frac{\partial E(x,y)}{\partial\theta}\right]\end{aligned}$$

• Weight update
  • Positive-phase contribution: $P(y_j^0=1|x^0)\times 1\times x^0_i+(1-P(y_j^0=1|x^0))\times 0\times x_i^0=P(y_j^0=1|x^0)x_i^0$
  • Negative-phase contribution: $P(y_j^1=1|x^1)\times 1\times x_i^1+(1-P(y_j^1=1|x^1))\times 0\times x_i^1=P(y_j^1=1|x^1)x_i^1$
• Bias update
  • Positive-phase contribution: $P(y_i^0=1|x^0)$
  • Negative-phase contribution: $P(y_i^1=1|x^1)$

• Weight and bias updates are the same as above
• Parameter $a_i$:
  • Positive-phase contribution: $2a_i(x_i^0)^2$
  • Negative-phase contribution: $2a_i(x_i^1)^2$

• Same as the binary case, except that $P(y_i=1|x)$ uses softmax instead of sigmoid
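Putting the positive- and negative-phase contributions together gives the familiar CD-1 update for binary units. A minimal numpy sketch (sizes, learning rate, and the random training vector are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_v, n_h = 6, 4
W = 0.01 * rng.normal(size=(n_v, n_h))
hb = np.zeros(n_h)       # hidden biases
vb = np.zeros(n_v)       # visible biases
lr = 0.1

x0 = rng.integers(0, 2, size=n_v).astype(float)  # one training sample

# Positive phase
p_h0 = sigmoid(hb + x0 @ W)
h0 = (rng.random(n_h) < p_h0).astype(float)      # binary sample

# Negative phase: one Gibbs step (CD-1)
p_v1 = sigmoid(vb + h0 @ W.T)
x1 = (rng.random(n_v) < p_v1).astype(float)
p_h1 = sigmoid(hb + x1 @ W)

# Updates use the expected contributions derived above:
# positive P(h_j=1|x0)*x0_i, negative P(h_j=1|x1)*x1_i
W += lr * (np.outer(x0, p_h0) - np.outer(x1, p_h1))
hb += lr * (p_h0 - p_h1)
vb += lr * (x0 - x1)
```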

Theory recap

• Visible layer $V=\{v_0,v_1,\cdots,v_j,\cdots\}$ with biases $b=\{b_1,b_2,\cdots,b_j,\cdots\}$
• Hidden layer $H=\{h_1,h_2,\cdots,h_i,\cdots\}$ with biases $c=\{c_1,c_2,\cdots,c_i,\cdots\}$
• Connection weights $W=\{w_{11},w_{12},\cdots,w_{ji},\cdots\}$

$$E(v,h)=-b'v-c'h-h'Wv$$

$$F(v)=-\log\sum_h e^{-E(v,h)}=-b'v-\sum_i\log\sum_{h_i}e^{h_i(c_i+W_iv)}$$

$$P(h_i=1|v)=\mathrm{sigmoid}(c_i+W_iv)\qquad P(v_j=1|h)=\mathrm{sigmoid}(b_j+W'_jh)$$

$$F(v)=-b'v-\sum_i\log\sum_{h_i}e^{h_i(c_i+W_iv)}=-b'v-\sum_i\log\left(e^{0\cdot(c_i+W_iv)}+e^{1\cdot(c_i+W_iv)}\right)=-b'v-\sum_i\log\left(1+e^{(c_i+W_iv)}\right)$$

$$-\frac{\partial \log P(v)}{\partial W_{ij}}=E_v[p(h_i|v)\cdot v_j]-v^{(i)}_j\cdot \mathrm{sigmoid}(W_i\cdot v^{(i)}+c_i)$$

$$-\frac{\partial \log P(v)}{\partial b_j}=E_v[p(v_j|h)]-v^{(i)}_j$$

$$-\frac{\partial \log P(v)}{\partial c_i}=E_v[p(h_i|v)]-\mathrm{sigmoid}(W_i\cdot v^{(i)}+c_i)$$
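The closed form of $F(v)$ can be checked against the brute-force sum over all hidden configurations; the small dimensions below are chosen so the enumeration is feasible:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n_v, n_h = 4, 3
W = rng.normal(size=(n_h, n_v))   # W_i is the i-th hidden unit's row
b = rng.normal(size=n_v)          # visible biases
c = rng.normal(size=n_h)          # hidden biases
v = rng.integers(0, 2, size=n_v).astype(float)

def energy(v, h):
    # E(v, h) = -b'v - c'h - h'Wv
    return -b @ v - c @ h - h @ W @ v

# Brute force: F(v) = -log sum_h e^{-E(v,h)} over all 2^n_h configurations
F_brute = -np.log(sum(np.exp(-energy(v, np.array(h, dtype=float)))
                      for h in product((0, 1), repeat=n_h)))

# Closed form: F(v) = -b'v - sum_i log(1 + e^{c_i + W_i v})
F_closed = -b @ v - np.sum(np.log1p(np.exp(c + W @ v)))
```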

Discriminative Boltzmann machines

• Input units $P$, indexed by $j$
• Hidden units $L$, indexed by $i$, with biases $C$
• Label units $Y$, indexed by $k$, with biases $B$
• Input-to-hidden connection weights $V$
• Label-to-hidden connection weights $W$

$$\mathrm{active}(Y_k)=B_k+\sum_i \mathrm{softplus}\left(W_{ki}+C_i+\sum_j V_{ij}\,p(P_j)\right)$$

$$P(Y\mid inputs)=\mathrm{softmax}(\mathrm{active}(Y))$$

• Compute $cost=-\log P(Y=onehot(k)\mid inputs)$

• Treat $X=[Y,P]$ as one big layer
• Feed $x=[onehot(k),p(P)]$ to the hidden layer as input
• Train $(X,L)$ as an RBM
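The scoring formula above can be sketched in numpy (all sizes and parameter values are hypothetical):

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

rng = np.random.default_rng(3)
n_in, n_hid, n_lab = 5, 4, 3
V = rng.normal(size=(n_hid, n_in))    # input-to-hidden weights
W = rng.normal(size=(n_lab, n_hid))   # label-to-hidden weights
C = rng.normal(size=n_hid)            # hidden biases
B = rng.normal(size=n_lab)            # label biases
p_in = rng.random(n_in)               # input activations p(P_j)

# active(Y_k) = B_k + sum_i softplus(W_ki + C_i + sum_j V_ij p(P_j))
active = B + softplus(W + C + V @ p_in).sum(axis=1)

# P(Y | inputs) = softmax(active(Y)), with a max-shift for stability
P_Y = np.exp(active - active.max())
P_Y /= P_Y.sum()

k = 1
cost = -np.log(P_Y[k])   # -log P(Y = onehot(k) | inputs)
```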

Code

Training

[Note] This code is the official tutorial code, modified after studying it to make it easier to understand; after reading it, you are encouraged to go straight to the standard official tutorial.

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams
import cPickle,gzip
from PIL import Image
import pylab
import os

#function that reads the data and puts it into shared variables
def load_data(dataset):
    data_dir,data_file=os.path.split(dataset)
    if os.path.isfile(dataset):
        with gzip.open(dataset,'rb') as f:
            train_set,valid_set,test_set=cPickle.load(f)
    #put one dataset into shared variables
    def shared_dataset(data_xy,borrow=True):
        data_x,data_y=data_xy
        shared_x=theano.shared(np.asarray(data_x,dtype=theano.config.floatX),borrow=borrow)
        shared_y=theano.shared(np.asarray(data_y,dtype=theano.config.floatX),borrow=borrow)
        return shared_x,T.cast(shared_y,'int32')
    #three tuples holding the training, validation and test sets
    train_set_x,train_set_y=shared_dataset(train_set)
    valid_set_x,valid_set_y=shared_dataset(valid_set)
    test_set_x,test_set_y=shared_dataset(test_set)
    rval=[(train_set_x,train_set_y),(valid_set_x,valid_set_y),(test_set_x,test_set_y)]
    return rval

• First, initialize the required weights and biases: one weight matrix and two bias vectors. numpy's random generator initializes the weights and biases, while theano's random streams are used to activate units as binary values according to their probabilities.


#define the RBM
class RBM(object):
    def __init__(self,
                 rng=None,
                 trng=None,
                 input=None,
                 n_visible=784,
                 n_hidden=500,
                 W=None,
                 hbias=None,
                 vbias=None):
        self.n_visible=n_visible
        self.n_hidden=n_hidden

        if rng is None:
            rng=np.random.RandomState(1234)
        if trng is None:
            trng=RandomStreams(rng.randint(2**30))
        #initialize weights and biases
        if W is None:
            initW=np.asarray(rng.uniform(low=-4*np.sqrt(6./(n_hidden+n_visible)),
                                         high=4*np.sqrt(6./(n_hidden+n_visible)),
                                         size=(n_visible,n_hidden)),
                             dtype=theano.config.floatX)
            W=theano.shared(value=initW,name='W',borrow=True)
        if hbias is None:
            inithbias=np.zeros(n_hidden,dtype=theano.config.floatX)
            hbias=theano.shared(value=inithbias,name='hbias',borrow=True)
        if vbias is None:
            initvbias=np.zeros(n_visible,dtype=theano.config.floatX)
            vbias=theano.shared(value=initvbias,name='vbias',borrow=True)
        self.input=input
        if not input:
            self.input=T.matrix('input')
        self.W=W
        self.hbias=hbias
        self.vbias=vbias
        self.trng=trng
        self.params=[self.W,self.hbias,self.vbias]
• To perform Gibbs sampling, we need to define the positive phase and the negative phase.
Forward pass: visible layer -> hidden layer

    ##########forward pass: visible layer to hidden layer################
    #activation probabilities
    def propup(self,vis):
        pre_sigmoid_activation=T.dot(vis,self.W)+self.hbias
        return [pre_sigmoid_activation,T.nnet.sigmoid(pre_sigmoid_activation)]
    #binary activation
    def sample_h_given_v(self,v0_samples):
        pre_sigmoid_h1,h1_mean=self.propup(v0_samples)
        h1_sample=self.trng.binomial(size=h1_mean.shape,
                                     n=1,
                                     p=h1_mean,
                                     dtype=theano.config.floatX)
        return [pre_sigmoid_h1,h1_mean,h1_sample]
• Backward pass: hidden layer -> visible layer

    ##########backward pass: hidden layer to visible layer################
    #activation probabilities
    def propdown(self,hid):
        pre_sigmoid_activation=T.dot(hid,self.W.T)+self.vbias
        return [pre_sigmoid_activation,T.nnet.sigmoid(pre_sigmoid_activation)]
    #binary activation
    def sample_v_given_h(self,h0_samples):
        pre_sigmoid_v1,v1_mean=self.propdown(h0_samples)
        v1_sample=self.trng.binomial(size=v1_mean.shape,
                                     n=1,
                                     p=v1_mean,
                                     dtype=theano.config.floatX)
        return [pre_sigmoid_v1,v1_mean,v1_sample]
• One step of Gibbs sampling: visible layer -> hidden layer -> visible layer

    ##########Gibbs sampling################
    #visible layer -> hidden layer -> visible layer
    def gibbs_vhv(self,v0_samples):
        pre_sigmoid_h1,h1_mean,h1_sample=self.sample_h_given_v(v0_samples)
        pre_sigmoid_v1,v1_mean,v1_sample=self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_v1,v1_mean,v1_sample,
                pre_sigmoid_h1,h1_mean,h1_sample]
• From the earlier section on energy models with hidden layers, the gradient update amounts to taking the gradient of the free energy function. If we wrote the gradients by hand, we could follow the theory recap above; that is in fact how it is done in MATLAB, where the update rules are coded manually. theano, however, provides automatic differentiation, so we simply define the free energy function and differentiate it.
The free energy function:


    ############free energy function###############
    def free_energy(self,v_samples):
        wx_b=T.dot(v_samples,self.W)+self.hbias
        vbias_term=T.dot(v_samples,self.vbias)#first term
        hidden_term=T.sum(T.log(1+T.exp(wx_b)),axis=1)#second term
        return -hidden_term-vbias_term

Then the gradient updates:

    ############gradient updates#################
    def get_cost_updates(self,lr=0.1,k=1):
        #run k steps of Gibbs sampling starting from the input
        ([pre_sigmoid_nvs,nv_means,nv_samples,
          pre_sigmoid_nhs,nh_means,nh_samples],updates)=\
            theano.scan(self.gibbs_vhv,
                        outputs_info=[None,None,self.input,None,None,None],
                        n_steps=k,
                        name='gibbs_vhv')
        chain_end=nv_samples[-1]
        cost=T.mean(self.free_energy(self.input))-T.mean(self.free_energy(chain_end))
        #treat the chain end as a constant when differentiating
        gparams=T.grad(cost,self.params,consider_constant=[chain_end])
        for gparam,param in zip(gparams,self.params):
            updates[param]=param-gparam*T.cast(lr,dtype=theano.config.floatX)

        ##################monitor the cross-entropy loss##############
        monitor_cost=self.get_reconstruction_cost(pre_sigmoid_nvs[-1])
        return monitor_cost,updates

Note the function called get_reconstruction_cost. Think about what an RBM is for: making the sample distribution estimated by the model closer to the true sample distribution, and the natural way to measure this is the cross entropy. This is the loss-like quantity we want to watch during the updates:

    ########non-persistent contrastive divergence: reconstruction error#########
    def get_reconstruction_cost(self,pre_sigmoid_nv):
        cross_entropy=T.mean(T.sum(self.input*T.log(T.nnet.sigmoid(pre_sigmoid_nv))+\
                                   (1-self.input)*T.log(1-T.nnet.sigmoid(pre_sigmoid_nv)),
                                   axis=1))
        return cross_entropy

Now we can train. The code below uses CD with 15 Gibbs steps (although a single step would already be enough):


#initialize and train the RBM
def test_rbm(learning_rate=0.1,training_epochs=15,dataset='mnist.pkl.gz',
             batch_size=20,n_chains=20,n_samples=10,n_hidden=500):
    #load the dataset
    datasets=load_data(dataset)
    train_set_x,train_set_y=datasets[0]
    test_set_x,test_set_y=datasets[2]
    n_train_batches=train_set_x.get_value(borrow=True).shape[0]//batch_size
    index=T.lscalar()
    x=T.matrix('x')

    rng=np.random.RandomState(1234)
    trng=RandomStreams(rng.randint(2**30))
    #instantiate an RBM
    rbm=RBM(input=x,n_visible=28*28,n_hidden=n_hidden,rng=rng,trng=trng)
    #parameter updates
    cost,updates=rbm.get_cost_updates(lr=learning_rate,k=15)
    #compile the training function
    train_rbm=theano.function([index],cost,updates=updates,
                              givens={
                                  x:train_set_x[index*batch_size:(index+1)*batch_size]
                              },name='train_rbm')
    #train the RBM
    for epoch in range(training_epochs):
        mean_cost=[]
        for batch_index in range(n_train_batches):
            mean_cost+=[train_rbm(batch_index)]
        print ('Training epoch %d,cost is ' % epoch,np.mean(mean_cost))
    #save the learned parameters
    save_file=open('best_model_rbm.pkl','wb')
    model=rbm.params
    l=[model[0].get_value(),model[1].get_value(),model[2].get_value()]
    cPickle.dump(l,save_file)
    save_file.close()

Now let's look at the training results:

test_rbm()
'''
('Training epoch 0,cost is ', -221.54436)
('Training epoch 1,cost is ', -193.33549)
('Training epoch 2,cost is ', -188.36243)
('Training epoch 3,cost is ', -185.52127)
('Training epoch 4,cost is ', -183.83319)
('Training epoch 5,cost is ', -182.85869)
('Training epoch 6,cost is ', -181.76605)
('Training epoch 7,cost is ', -180.4059)
('Training epoch 8,cost is ', -179.306)
('Training epoch 9,cost is ', -178.59416)
('Training epoch 10,cost is ', -178.47371)
('Training epoch 11,cost is ', -177.25623)
('Training epoch 12,cost is ', -177.09161)
('Training epoch 13,cost is ', -176.70366)
('Training epoch 14,cost is ', -175.92307)
'''

Testing

#pick one image for reconstruction
from PIL import Image
import pylab

dataset='mnist.pkl.gz'
datasets=load_data(dataset)
test_set_x,test_set_y=datasets[2]
test_set_x=test_set_x.get_value()
test_data=test_set_x[12:13]

img=test_data.reshape(28,28)
pylab.imshow(img)
pylab.show()

a=cPickle.load(open('best_model_rbm.pkl','rb'))
x=T.matrix('x')
rbm_test=RBM(input=x,n_visible=28*28,n_hidden=500)
rbm_test.params[0].set_value(a[0])
rbm_test.params[1].set_value(a[1])
rbm_test.params[2].set_value(a[2])

([pre_sigmoid_nvs,nv_means,nv_samples,pre_sigmoid_nhs,nh_means,nh_samples],updates)=\
theano.scan(rbm_test.gibbs_vhv,
outputs_info=[None,None,x,None,None,None],
n_steps=1000,
name='gibbs_vhv')
recon=nv_samples[-1]
recon_fn=theano.function([x],recon,updates=updates)

b=recon_fn(test_data)
b=b.reshape(28,28)
pylab.imshow(b)
pylab.show()

[Note] If you think there is a problem with the code, please leave feedback in the comments; after all, I am new to Python.
Code link: https://pan.baidu.com/s/1hsiYqNi password: wu2d