# 1. The Network Structure of an RBM

The network structure of an RBM (Restricted Boltzmann Machine) is shown in the figure below:

An RBM consists of two layers:

• the visible layer, denoted $\mathbf{v}$ in the figure
• the hidden layer, denoted $\mathbf{h}$ in the figure

(Figure from Reference 1)

# 2. Computation in the RBM Model

## 2.1 The Energy Function

The energy of a joint configuration $(\mathbf{v},\mathbf{h})$ is defined as:

$$E_\theta\left(\mathbf{v},\mathbf{h}\right)=-\sum_{i=1}^{n_v}a_iv_i-\sum_{j=1}^{n_h}b_jh_j-\sum_{i=1}^{n_v}\sum_{j=1}^{n_h}h_jw_{j,i}v_i$$

This energy induces the joint distribution:

$$P_\theta\left(\mathbf{v},\mathbf{h}\right)=\frac{1}{Z_\theta}e^{-E_\theta\left(\mathbf{v},\mathbf{h}\right)}$$

where $Z_\theta$ is the partition function:

$$Z_\theta=\sum_{\mathbf{v},\mathbf{h}}e^{-E_\theta\left(\mathbf{v},\mathbf{h}\right)}$$

Marginalizing out one layer gives:

$$P_\theta\left(\mathbf{v}\right)=\sum_{\mathbf{h}}P_\theta\left(\mathbf{v},\mathbf{h}\right)=\frac{1}{Z_\theta}\sum_{\mathbf{h}}e^{-E_\theta\left(\mathbf{v},\mathbf{h}\right)}$$

$$P_\theta\left(\mathbf{h}\right)=\sum_{\mathbf{v}}P_\theta\left(\mathbf{v},\mathbf{h}\right)=\frac{1}{Z_\theta}\sum_{\mathbf{v}}e^{-E_\theta\left(\mathbf{v},\mathbf{h}\right)}$$
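As a concrete check of these definitions, the sketch below enumerates every configuration of a tiny RBM (the sizes, seed, and random parameters are illustrative assumptions, not part of the text above) and verifies that $P_\theta(\mathbf{v},\mathbf{h})=e^{-E_\theta(\mathbf{v},\mathbf{h})}/Z_\theta$ sums to 1:

```python
import itertools
import numpy as np

# Tiny RBM with hypothetical sizes n_v = 3 visible and n_h = 2 hidden units
n_v, n_h = 3, 2
rng = np.random.default_rng(0)
a = rng.normal(size=n_v)          # visible biases a_i
b = rng.normal(size=n_h)          # hidden biases b_j
W = rng.normal(size=(n_h, n_v))   # weights w_{j,i}

def energy(v, h):
    # E(v, h) = -a.v - b.h - h^T W v
    return -a @ v - b @ h - h @ W @ v

# Partition function Z: sum over all 2^{n_v} * 2^{n_h} configurations
states_v = list(itertools.product([0, 1], repeat=n_v))
states_h = list(itertools.product([0, 1], repeat=n_h))
Z = sum(np.exp(-energy(np.array(v), np.array(h)))
        for v in states_v for h in states_h)

def P(v, h):
    return np.exp(-energy(np.array(v), np.array(h))) / Z

total = sum(P(v, h) for v in states_v for h in states_h)
print(total)  # ~1.0: the joint distribution is normalized
```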

## 2.2 Activation Probabilities

Denote by $\mathbf{h}_{-k}$ the hidden vector with its $k$-th component removed:

$$\mathbf{h}_{-k}\overset{\Delta}{=}\left(h_1,h_2,\cdots,h_{k-1},h_{k+1},\cdots,h_{n_h}\right)^T$$

and define:

$$\alpha_k\left(\mathbf{v}\right)\overset{\Delta}{=}b_k+\sum_{i=1}^{n_v}w_{k,i}v_i$$

$$\beta\left(\mathbf{v},\mathbf{h}_{-k}\right)\overset{\Delta}{=}\sum_{i=1}^{n_v}a_iv_i+\sum_{j=1,j\neq k}^{n_h}b_jh_j+\sum_{i=1}^{n_v}\sum_{j=1,j\neq k}^{n_h}h_jw_{j,i}v_i$$

The energy then decomposes as:

$$E\left(\mathbf{v},\mathbf{h}\right)=-\beta\left(\mathbf{v},\mathbf{h}_{-k}\right)-h_k\alpha_k\left(\mathbf{v}\right)$$

Because the hidden units are conditionally independent given $\mathbf{v}$, the activation probability of $h_k$ is:

$$\begin{aligned}P\left(h_k=1\mid\mathbf{v}\right)&=P\left(h_k=1\mid\mathbf{h}_{-k},\mathbf{v}\right)=\frac{P\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}{P\left(\mathbf{h}_{-k},\mathbf{v}\right)}\\&=\frac{P\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}{P\left(h_k=0,\mathbf{h}_{-k},\mathbf{v}\right)+P\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}\\&=\frac{e^{-E\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}}{e^{-E\left(h_k=0,\mathbf{h}_{-k},\mathbf{v}\right)}+e^{-E\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}}\\&=\frac{1}{1+e^{-E\left(h_k=0,\mathbf{h}_{-k},\mathbf{v}\right)+E\left(h_k=1,\mathbf{h}_{-k},\mathbf{v}\right)}}\\&=\frac{1}{1+e^{\left[\beta\left(\mathbf{v},\mathbf{h}_{-k}\right)+0\cdot\alpha_k\left(\mathbf{v}\right)\right]+\left[-\beta\left(\mathbf{v},\mathbf{h}_{-k}\right)-1\cdot\alpha_k\left(\mathbf{v}\right)\right]}}\\&=\frac{1}{1+e^{-\alpha_k\left(\mathbf{v}\right)}}\end{aligned}$$

Recalling the sigmoid function:

$$\mathrm{Sigmoid}\left(x\right)=\frac{1}{1+e^{-x}}$$

we obtain:

$$P\left(h_k=1\mid\mathbf{v}\right)=\mathrm{Sigmoid}\left(\alpha_k\left(\mathbf{v}\right)\right)=\mathrm{Sigmoid}\left(b_k+\sum_{i=1}^{n_v}w_{k,i}v_i\right)$$

By symmetry, for a visible unit:

$$P\left(v_k=1\mid\mathbf{h}\right)=\mathrm{Sigmoid}\left(a_k+\sum_{j=1}^{n_h}w_{j,k}h_j\right)$$
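The sigmoid form of the activation probability can be verified numerically. The sketch below (sizes, seed, and parameters are illustrative assumptions) computes $P(h_k=1\mid\mathbf{v})$ both by direct enumeration over $\mathbf{h}$ and via $\mathrm{Sigmoid}(\alpha_k(\mathbf{v}))$, and the two agree:

```python
import itertools
import numpy as np

# Tiny RBM with hypothetical sizes; random illustrative parameters
n_v, n_h = 3, 2
rng = np.random.default_rng(1)
a = rng.normal(size=n_v)
b = rng.normal(size=n_h)
W = rng.normal(size=(n_h, n_v))

def energy(v, h):
    return -a @ v - b @ h - h @ W @ v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = np.array([1, 0, 1])  # an arbitrary visible configuration

# Unnormalized weights e^{-E(v, h)} for every hidden configuration h
weights = {h: np.exp(-energy(v, np.array(h)))
           for h in itertools.product([0, 1], repeat=n_h)}
norm = sum(weights.values())

# Exact P(h_k = 1 | v) by enumeration vs. the sigmoid formula
exact = [sum(w for h, w in weights.items() if h[k] == 1) / norm
         for k in range(n_h)]
formula = [sigmoid(b[k] + W[k] @ v) for k in range(n_h)]
print(exact)
print(formula)  # matches `exact`
```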

## 2.3 Training the Model

### 2.3.1 The Optimization Objective

Given a training set of $n_s$ samples:

$$X=\left\{\mathbf{v}^{1},\mathbf{v}^{2},\cdots,\mathbf{v}^{n_s}\right\}$$

the likelihood to maximize is:

$$L_\theta=\prod_{i=1}^{n_s}P\left(\mathbf{v}^{i}\right)$$

or, equivalently, the log-likelihood:

$$\ln L_\theta=\ln\prod_{i=1}^{n_s}P\left(\mathbf{v}^{i}\right)=\sum_{i=1}^{n_s}\ln P\left(\mathbf{v}^{i}\right)$$

### 2.3.2 Solving by Maximum Likelihood

The parameters are updated by gradient ascent:

$$\theta=\theta+\eta\frac{\partial\ln L_\theta}{\partial\theta}$$

For a single sample $\mathbf{v}$:

$$\ln L_\theta=\ln P\left(\mathbf{v}\right)=\ln\left(\frac{1}{Z}\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}\right)=\ln\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}-\ln Z=\ln\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}-\ln\sum_{\mathbf{v},\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}$$

The gradient $\frac{\partial\ln L_\theta}{\partial\theta}$ is:

$$\begin{aligned}\frac{\partial\ln L_\theta}{\partial\theta}&=\frac{\partial\ln P\left(\mathbf{v}\right)}{\partial\theta}=\frac{\partial}{\partial\theta}\ln\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}-\frac{\partial}{\partial\theta}\ln\sum_{\mathbf{v},\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}\\&=-\frac{1}{\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}}\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}+\frac{1}{\sum_{\mathbf{v},\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}}\sum_{\mathbf{v},\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}\end{aligned}$$

Noting that:

$$\frac{e^{-E\left(\mathbf{v},\mathbf{h}\right)}}{\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}}=\frac{e^{-E\left(\mathbf{v},\mathbf{h}\right)}/Z}{\sum_{\mathbf{h}}e^{-E\left(\mathbf{v},\mathbf{h}\right)}/Z}=\frac{P\left(\mathbf{v},\mathbf{h}\right)}{P\left(\mathbf{v}\right)}=P\left(\mathbf{h}\mid\mathbf{v}\right)$$

the gradient becomes:

$$\frac{\partial\ln L_\theta}{\partial\theta}=-\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}+\sum_{\mathbf{v},\mathbf{h}}P\left(\mathbf{v},\mathbf{h}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}$$

where the second term can be rewritten as:

$$\sum_{\mathbf{v},\mathbf{h}}P\left(\mathbf{v},\mathbf{h}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}=\sum_{\mathbf{v}}\sum_{\mathbf{h}}P\left(\mathbf{v}\right)P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}=\sum_{\mathbf{v}}P\left(\mathbf{v}\right)\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial\theta}$$

Thus it suffices to compute three terms:

• $\sum _{\mathbf{h}}P\left ( \mathbf{h}\mid \mathbf{v} \right )\frac{\partial E\left ( \mathbf{v},\mathbf{h} \right )}{\partial w_{j,i} }$
• $\sum _{\mathbf{h}}P\left ( \mathbf{h}\mid \mathbf{v} \right )\frac{\partial E\left ( \mathbf{v},\mathbf{h} \right )}{\partial a_i}$
• $\sum _{\mathbf{h}}P\left ( \mathbf{h}\mid \mathbf{v} \right )\frac{\partial E\left ( \mathbf{v},\mathbf{h} \right )}{\partial b_j}$

$$E_\theta\left(\mathbf{v},\mathbf{h}\right)=-\sum_{i=1}^{n_v}a_iv_i-\sum_{j=1}^{n_h}b_jh_j-\sum_{i=1}^{n_v}\sum_{j=1}^{n_h}h_jw_{j,i}v_i$$

• Derivative with respect to $w_{j,i}$:

$$\begin{aligned}\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial w_{j,i}}&=-\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)h_jv_i=-\sum_{\mathbf{h}}\prod_{k=1}^{n_h}P\left(h_k\mid\mathbf{v}\right)h_jv_i\\&=-\sum_{\mathbf{h}}P\left(h_j\mid\mathbf{v}\right)P\left(\mathbf{h}_{-j}\mid\mathbf{v}\right)h_jv_i\\&=-\sum_{h_j}P\left(h_j\mid\mathbf{v}\right)h_jv_i\sum_{\mathbf{h}_{-j}}P\left(\mathbf{h}_{-j}\mid\mathbf{v}\right)\\&=-\sum_{h_j}P\left(h_j\mid\mathbf{v}\right)h_jv_i\\&=-\left(P\left(h_j=0\mid\mathbf{v}\right)\cdot 0\cdot v_i+P\left(h_j=1\mid\mathbf{v}\right)\cdot 1\cdot v_i\right)\\&=-P\left(h_j=1\mid\mathbf{v}\right)v_i\end{aligned}$$

• Derivative with respect to $a_i$:

$$\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial a_i}=-\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)v_i=-v_i\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)=-v_i$$

• Derivative with respect to $b_j$:

$$\begin{aligned}\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)\frac{\partial E\left(\mathbf{v},\mathbf{h}\right)}{\partial b_j}&=-\sum_{\mathbf{h}}P\left(\mathbf{h}\mid\mathbf{v}\right)h_j=-\sum_{\mathbf{h}}\prod_{k=1}^{n_h}P\left(h_k\mid\mathbf{v}\right)h_j\\&=-\sum_{\mathbf{h}}P\left(h_j\mid\mathbf{v}\right)P\left(\mathbf{h}_{-j}\mid\mathbf{v}\right)h_j\\&=-\sum_{h_j}P\left(h_j\mid\mathbf{v}\right)h_j\sum_{\mathbf{h}_{-j}}P\left(\mathbf{h}_{-j}\mid\mathbf{v}\right)\\&=-\sum_{h_j}P\left(h_j\mid\mathbf{v}\right)h_j\\&=-\left(P\left(h_j=0\mid\mathbf{v}\right)\cdot 0+P\left(h_j=1\mid\mathbf{v}\right)\cdot 1\right)\\&=-P\left(h_j=1\mid\mathbf{v}\right)\end{aligned}$$

Substituting these derivatives back into the gradient yields:

$$\frac{\partial\ln L_\theta}{\partial w_{j,i}}=P\left(h_j=1\mid\mathbf{v}\right)v_i-\sum_{\mathbf{v}}P\left(\mathbf{v}\right)P\left(h_j=1\mid\mathbf{v}\right)v_i$$

$$\frac{\partial\ln L_\theta}{\partial a_i}=v_i-\sum_{\mathbf{v}}P\left(\mathbf{v}\right)v_i$$

$$\frac{\partial\ln L_\theta}{\partial b_j}=P\left(h_j=1\mid\mathbf{v}\right)-\sum_{\mathbf{v}}P\left(\mathbf{v}\right)P\left(h_j=1\mid\mathbf{v}\right)$$
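For an RBM small enough that $Z$ can be enumerated exactly, the gradient formula for $w_{j,i}$ can be checked against a finite difference of $\ln P(\mathbf{v})$. The sketch below does this under illustrative, randomly chosen parameters (sizes, seed, and the chosen indices $j,i$ are assumptions):

```python
import itertools
import numpy as np

n_v, n_h = 3, 2
rng = np.random.default_rng(2)
a = rng.normal(size=n_v)
b = rng.normal(size=n_h)
W = rng.normal(size=(n_h, n_v))

states_v = [np.array(s) for s in itertools.product([0, 1], repeat=n_v)]
states_h = [np.array(s) for s in itertools.product([0, 1], repeat=n_h)]

def log_p_v(W, v):
    # ln P(v) = ln sum_h e^{-E(v,h)} - ln Z, with e^{-E} = e^{a.v + b.h + h W v}
    def unnorm(u):
        return sum(np.exp(a @ u + b @ h + h @ W @ u) for h in states_h)
    Z = sum(unnorm(u) for u in states_v)
    return np.log(unnorm(v)) - np.log(Z)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v0 = np.array([1, 0, 1])
j, i = 0, 2  # which weight to check

# Analytic gradient from the derivation above
p_v = np.array([np.exp(log_p_v(W, u)) for u in states_v])
pos = sigmoid(b[j] + W[j] @ v0) * v0[i]
neg = sum(p * sigmoid(b[j] + W[j] @ u) * u[i] for p, u in zip(p_v, states_v))
grad = pos - neg

# Central finite-difference check
eps = 1e-6
Wp = W.copy(); Wp[j, i] += eps
Wm = W.copy(); Wm[j, i] -= eps
fd = (log_p_v(Wp, v0) - log_p_v(Wm, v0)) / (2 * eps)
print(grad, fd)  # the two values agree
```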

### 2.3.3 Solving the Optimization

The term $\sum_{\mathbf{v}}P\left(\mathbf{v}\right)\left(\cdot\right)$ in each gradient requires summing over all $2^{n_v}$ visible configurations, which is intractable in general. Hinton proposed an efficient algorithm for training RBMs: Contrastive Divergence (CD).

The $k$-step CD algorithm (CD-$k$) proceeds as follows:

For each training sample $\mathbf{v}$, set the initial value $\mathbf{v}^{\left(0\right)}:=\mathbf{v}$ and then run $k$ steps of Gibbs sampling, where step $t$ performs, in order:

• sample $\mathbf{h}^{\left(t-1\right)}$ from $P\left(\mathbf{h}\mid\mathbf{v}^{\left(t-1\right)}\right)$
• sample $\mathbf{v}^{\left(t\right)}$ from $P\left(\mathbf{v}\mid\mathbf{h}^{\left(t-1\right)}\right)$
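The resulting sample $\mathbf{v}^{(k)}$ replaces the intractable model expectation in each gradient, giving the standard CD-$k$ update rules (with learning rate $\eta$ as in Section 2.3.2):

$$\Delta w_{j,i}=\eta\left(P\left(h_j=1\mid\mathbf{v}^{(0)}\right)v_i^{(0)}-P\left(h_j=1\mid\mathbf{v}^{(k)}\right)v_i^{(k)}\right)$$

$$\Delta a_i=\eta\left(v_i^{(0)}-v_i^{(k)}\right)$$

$$\Delta b_j=\eta\left(P\left(h_j=1\mid\mathbf{v}^{(0)}\right)-P\left(h_j=1\mid\mathbf{v}^{(k)}\right)\right)$$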

Sampling the hidden layer given $\mathbf{v}$:

• for $j=1,2,\cdots,n_h$ do
• {
• generate a random number $r_j$ on $\left[0,1\right]$
• $h_j=\begin{cases}1 & \text{if } r_j<P\left(h_j=1\mid\mathbf{v}\right)\\ 0 & \text{otherwise}\end{cases}$
• }

Sampling the visible layer given $\mathbf{h}$:

• for $i=1,2,\cdots,n_v$ do
• {
• generate a random number $r_i$ on $\left[0,1\right]$
• $v_i=\begin{cases}1 & \text{if } r_i<P\left(v_i=1\mid\mathbf{h}\right)\\ 0 & \text{otherwise}\end{cases}$
• }
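The per-unit sampling loops above can be vectorized with NumPy. The following sketch runs one CD-$k$ Gibbs chain under assumed, illustrative sizes and parameters:

```python
import numpy as np

# Illustrative sizes, seed, and parameters (assumptions for this sketch)
rng = np.random.default_rng(3)
n_v, n_h, k = 6, 4, 1
a = np.zeros(n_v)                       # visible biases
b = np.zeros(n_h)                       # hidden biases
W = rng.normal(scale=0.1, size=(n_h, n_v))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v):
    p = sigmoid(b + W @ v)              # P(h_j = 1 | v) for all j at once
    return (rng.random(n_h) < p).astype(float)

def sample_v_given_h(h):
    p = sigmoid(a + W.T @ h)            # P(v_i = 1 | h) for all i at once
    return (rng.random(n_v) < p).astype(float)

v = rng.integers(0, 2, size=n_v).astype(float)  # v^(0) := a training sample
for t in range(k):
    h = sample_h_given_v(v)   # sample h^(t-1) from P(h | v^(t-1))
    v = sample_v_given_h(h)   # sample v^(t)   from P(v | h^(t-1))
print(v)  # binary reconstruction after k Gibbs steps
```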

# 3. Experiments

```python
# coding:UTF-8
import numpy as np
import random as rd

def load_data(file_name):
    """Load tab-separated pixel values and scale them into [0, 1]."""
    data = []
    f = open(file_name)
    for line in f.readlines():
        lines = line.strip().split("\t")
        tmp = []
        for x in lines:
            tmp.append(float(x) / 255.0)
        data.append(tmp)
    f.close()
    return data

def sigmrnd(P):
    """Sample a binary matrix: X[i, j] = 1 with probability sigmoid(P[i, j])."""
    m, n = np.shape(P)
    X = np.mat(np.zeros((m, n)))
    P_1 = sigm(P)
    for i in range(m):
        for j in range(n):
            r = rd.random()
            if P_1[i, j] >= r:
                X[i, j] = 1
    return X

def sigm(P):
    """Element-wise sigmoid."""
    return 1.0 / (1 + np.exp(-P))

# step_1: load the data
datafile = "b.txt"
data = np.mat(load_data(datafile))
m, n = np.shape(data)

# step_2: initialize
num_epochs = 10
batch_size = 100
input_dim = n
hidden_sz = 100

alpha = 1        # learning rate
momentum = 0.1
W = np.mat(np.zeros((hidden_sz, input_dim)))   # weights w_{j,i}
vW = np.mat(np.zeros((hidden_sz, input_dim)))  # weight velocity
b = np.mat(np.zeros((input_dim, 1)))           # visible biases
vb = np.mat(np.zeros((input_dim, 1)))
c = np.mat(np.zeros((hidden_sz, 1)))           # hidden biases
vc = np.mat(np.zeros((hidden_sz, 1)))
```

```python
# step_3: training
print("Start to train RBM: ")

num_batches = int(m / batch_size)
for i in range(num_epochs):
    kk = np.random.permutation(m)
    err = 0.0

    for j in range(num_batches):
        batch = data[kk[j * batch_size:(j + 1) * batch_size], :]

        # CD-1: one step of Gibbs sampling from the data
        v1 = batch
        h1 = sigmrnd(np.ones((batch_size, 1)) * c.T + v1 * W.T)
        v2 = sigmrnd(np.ones((batch_size, 1)) * b.T + h1 * W)
        h2 = sigm(np.ones((batch_size, 1)) * c.T + v2 * W.T)

        # positive- and negative-phase statistics
        c1 = h1.T * v1
        c2 = h2.T * v2

        # momentum updates
        vW = momentum * vW + alpha * (c1 - c2) / batch_size
        vb = momentum * vb + alpha * sum(v1 - v2).T / batch_size
        vc = momentum * vc + alpha * sum(h1 - h2).T / batch_size

        W = W + vW
        b = b + vb
        c = c + vc

        # reconstruction error for this batch
        err_result = v1 - v2
        err_1 = 0.0
        m_1, n_1 = np.shape(err_result)
        for x in range(m_1):
            for y in range(n_1):
                err_1 = err_1 + err_result[x, y] ** 2
        err = err + err_1 / batch_size

    print(i, err / num_batches)

# step_4: print the learned weights
m_2, n_2 = np.shape(W)
for i in range(m_2):
    for j in range(n_2):
        print(str(W[i, j]) + " ", end="")
    print()
```