Deep Learning: Restricted Boltzmann Machines (RBM)


Energy-Based Models (EBM)

Energy-based models associate a scalar energy to each configuration of the variables of interest. Learning corresponds to modifying the energy function so that its shape has desirable properties: for example, we would like plausible or desirable configurations to have low energy. Energy-based probabilistic models define a probability distribution through an energy function, as follows:

p(x) = \frac {e^{-E(x)}} {Z}.

The normalizing factor Z is called the partition function, by analogy with physical systems:

Z = \sum_x e^{-E(x)}
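
To make the role of the partition function concrete, here is a minimal NumPy sketch (not part of the tutorial code) that computes Z by brute-force enumeration for a toy model over three binary variables; the energy function E(x) = -w'x and the weights w are assumptions chosen purely for illustration.

import itertools
import numpy as np

w = np.array([0.5, -1.0, 2.0])                    # toy parameters (illustrative only)

def E(x):
    return -np.dot(w, x)                          # toy energy function

configs = [np.array(c) for c in itertools.product([0, 1], repeat=3)]
Z = sum(np.exp(-E(x)) for x in configs)           # partition function by enumeration

def p(x):
    return np.exp(-E(x)) / Z                      # normalized probability

print(sum(p(x) for x in configs))                 # sanity check: probabilities sum to 1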

An energy-based model can be learnt by performing (stochastic) gradient descent on the empirical negative log-likelihood of the training data. As for logistic regression, we first define the log-likelihood and then the loss function as the negative log-likelihood:

\mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in \mathcal{D}} \log p(x^{(i)}) \qquad \ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})

using the stochastic gradient -\frac{\partial \log p(x^{(i)})}{\partial \theta}, where \theta are the parameters of the model.

EBMs with Hidden Units

In many cases of interest, we do not observe the example x fully, or we want to introduce some non-observed variables to increase the expressive power of the model. So we consider an observed part (still denoted x here) and a hidden part h:

P(x) = \sum_h P(x,h) = \sum_h \frac{e^{-E(x,h)}}{Z}.

In such cases, to write an expression similar to the fully observed case, we introduce the notion (inspired from physics) of free energy:

\mathcal{F}(x) = - \log \sum_h e^{-E(x,h)}

which allows us to write

P(x) = \frac{e^{-\mathcal{F}(x)}}{Z} \text{ with } Z = \sum_x e^{-\mathcal{F}(x)}.

The data negative log-likelihood gradient then has a particularly interesting form:

- \frac{\partial \log p(x)}{\partial \theta} = \frac{\partial \mathcal{F}(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta}.

Notice that the above gradient contains two terms, which are referred to as the positive phase and the negative phase. The terms positive and negative do not refer to the sign of each term in the equation, but rather reflect their effect on the probability density defined by the model. The first term increases the probability of training data (by reducing the corresponding free energy), while the second term decreases the probability of samples generated by the model.

It is usually difficult to determine this gradient analytically, because it involves the computation of E_P[\frac{\partial \mathcal{F}(x)}{\partial \theta}].

This is nothing less than an expectation over all possible configurations of the input x (under the distribution P defined by the model)!

The first step towards making this computation tractable is to estimate the expectation using a fixed number of model samples. The samples used to estimate the negative phase gradient are referred to as negative particles, denoted \mathcal{N}. The gradient can then be written as

- \frac{\partial \log p(x)}{\partial \theta} \approx \frac{\partial \mathcal{F}(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum_{\tilde{x} \in \mathcal{N}} \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta}.

Ideally, the elements \tilde{x} of \mathcal{N} would be sampled according to P (i.e. by Monte Carlo sampling). With the above formula, we almost have a practical stochastic algorithm for learning an EBM. The only missing ingredient is how to extract these negative particles. While the statistical literature abounds with sampling methods, Markov Chain Monte Carlo methods are especially well suited for models such as the Restricted Boltzmann Machine (RBM), a specific type of EBM. A toy sketch of this two-term gradient estimate is given below.
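
As a minimal sketch of this estimator (an assumed toy example, not the RBM code later in this tutorial), consider a fully visible EBM with E(x) = -w'x over binary x, for which \mathcal{F}(x) = E(x) and \partial \mathcal{F}/\partial w = -x. Because this toy model factorizes, the negative particles can be drawn exactly; for an RBM we would use MCMC (Gibbs sampling) instead.

import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(5) * 0.1                              # toy parameters

def sample_negative_particles(w, n_particles, rng):
    # The toy model factorizes, so p(x_j = 1) = sigmoid(w_j) and exact
    # sampling is possible; in general one would need MCMC here.
    p1 = 1.0 / (1.0 + np.exp(-w))
    return (rng.uniform(size=(n_particles, len(w))) < p1) * 1.0

x = np.array([1., 0., 1., 1., 0.])                  # one "training" example (assumed)
neg = sample_negative_particles(w, 100, rng)        # the set of negative particles N

# dF/dw = -x, so the estimated gradient is -x plus the mean over the particles
grad = -x + neg.mean(axis=0)
w -= 0.1 * grad                                     # one stochastic gradient step on -log p(x)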

Restricted Boltzmann Machines (RBM)

Boltzmann Machines (BMs) are a particular form of Markov Random Field (MRF), i.e. one whose energy function is linear in its free parameters. To make them powerful enough to represent complicated distributions (i.e. to go from a limited parametric setting to a non-parametric one), we consider that some of the variables are never observed (they are called hidden). By having more hidden variables (also called hidden units), we can increase the modeling capacity of the Boltzmann Machine. Restricted Boltzmann Machines go one step further and remove the visible-visible and hidden-hidden connections, as illustrated below.

_images/rbm.png

The energy function E(v,h) of an RBM is defined as

E(v,h) = - b'v - c'h - h'Wv

where W represents the weights connecting the hidden and visible layers, and b, c are the biases of the visible and hidden layers, respectively.

This translates directly into the following free energy formula:

\mathcal{F}(v)= - b'v - \sum_i \log \sum_{h_i} e^{h_i (c_i + W_i v)}.

Because of the specific structure of RBMs, visible and hidden units are conditionally independent given one another, so we can write:

p(h|v) = \prod_i p(h_i|v) \qquad p(v|h) = \prod_j p(v_j|h).

In the commonly studied case of binary units (where v_j and h_i \in \{0,1\}), we obtain a probabilistic version of the usual neuron activation function:

P(h_i=1|v) = sigm(c_i + W_i v)

P(v_j=1|h) = sigm(b_j + W'_j h)

The free energy of an RBM with binary units further simplifies to

\mathcal{F}(v) = - b'v - \sum_i \log(1 + e^{(c_i + W_i v)}).

Combining these equations, we obtain the following log-likelihood gradients for an RBM with binary units:

- \frac{\partial \log p(v)}{\partial W_{ij}} = E_v[p(h_i|v) \cdot v_j] - v^{(i)}_j \cdot sigm(W_i \cdot v^{(i)} + c_i)

- \frac{\partial \log p(v)}{\partial c_i} = E_v[p(h_i|v)] - sigm(W_i \cdot v^{(i)} + c_i)

- \frac{\partial \log p(v)}{\partial b_j} = E_v[p(v_j|h)] - v^{(i)}_j
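In practice, the expectations E_v[\cdot] in these equations are replaced by samples drawn from the model. A minimal NumPy sketch (not part of the tutorial code) of the resulting sampled gradients, assuming a data vector v and a negative sample v_tilde already obtained from the model, might look like this; W is stored with shape (n_visible, n_hidden) to match the Theano implementation further below.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nll_gradients(v, v_tilde, W, b, c):
    """Sampled negative log-likelihood gradients for a binary RBM.

    v       -- a training example (positive phase), shape (n_visible,)
    v_tilde -- a negative particle sampled from the model, shape (n_visible,)
    W, b, c -- weights (n_visible, n_hidden), visible bias, hidden bias
    """
    ph_data = sigmoid(np.dot(v, W) + c)         # p(h_i = 1 | v)
    ph_model = sigmoid(np.dot(v_tilde, W) + c)  # p(h_i = 1 | v_tilde)
    # negative phase statistics minus positive phase statistics,
    # matching the sign conventions of the equations above
    dW = np.outer(v_tilde, ph_model) - np.outer(v, ph_data)
    dc = ph_model - ph_data
    db = v_tilde - v
    return dW, db, dc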

Sampling in an RBM

Samples of p(x) can be obtained by running a Markov chain to convergence, using Gibbs sampling as the transition operator.

For an RBM, the set of variables to be sampled consists of the visible and hidden units. However, since they are conditionally independent, one can perform block Gibbs sampling: the visible units are sampled simultaneously given fixed values of the hidden units, and similarly the hidden units are sampled simultaneously given the visible units. A step in the Markov chain is thus taken as follows:

h^{(n+1)} \sim sigm(W'v^{(n)} + c) \qquad v^{(n+1)} \sim sigm(W h^{(n+1)} + b),

where, for example, h_i^{(n+1)} is randomly chosen to be 1 (versus 0) with probability sigm(W'_i v^{(n)} + c_i), and similarly for v_j^{(n+1)}.

_images/markov_chain.png
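
For concreteness, here is a minimal NumPy sketch (an illustration, not the tutorial's Theano code) of one such block Gibbs step for a binary RBM; the weight matrix has shape (n_visible, n_hidden), as in the implementation below.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One step of block Gibbs sampling: v -> h -> v'."""
    h_mean = sigmoid(np.dot(v, W) + c)                        # p(h | v)
    h = (rng.uniform(size=h_mean.shape) < h_mean) * 1.0       # sample h given v
    v_mean = sigmoid(np.dot(h, W.T) + b)                      # p(v | h)
    v_new = (rng.uniform(size=v_mean.shape) < v_mean) * 1.0   # sample v' given h
    return v_new

# Running gibbs_step repeatedly from any starting point yields, in the limit,
# (approximate) samples of v under the model distribution p(v).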

In theory, each parameter update in the learning process would require running such a chain to convergence. Needless to say, this would be prohibitively expensive. As a result, several algorithms have been devised for RBMs in order to sample from p(v,h) more efficiently during learning.

Contrastive Divergence (CD-k)

Contrastive Divergence uses two tricks to speed up the sampling process:

Since we eventually want p(v) \approx p_{train}(v) (the true, underlying distribution of the data), we initialize the Markov chain with a training example (i.e. from a distribution that is expected to be close to p, so that the chain will already be close to having converged to its final distribution p).

CD does not wait for the chain to converge. Samples are obtained after only k steps of Gibbs sampling. In practice, k=1 has been shown to work surprisingly well. A sketch of a CD-1 update is given below.
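
Putting the two tricks together, a minimal NumPy sketch of one CD-1 parameter update might look as follows (an assumed illustration; the tutorial's actual Theano implementation is given further below).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 update, starting the chain at the training example v0.
    The parameters W, b, c are modified in place."""
    # positive phase: hidden activations driven by the data
    ph0 = sigmoid(np.dot(v0, W) + c)
    h0 = (rng.uniform(size=ph0.shape) < ph0) * 1.0
    # one Gibbs step to obtain the negative sample
    pv1 = sigmoid(np.dot(h0, W.T) + b)
    v1 = (rng.uniform(size=pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(np.dot(v1, W) + c)
    # gradient estimate: positive phase statistics minus negative phase statistics
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)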

Persistent CD

Persistent CD uses a different approximation for sampling. It relies on a single Markov chain with a persistent state (i.e. the chain is not restarted for each observed example). For each parameter update, we extract new samples by simply running the chain for k steps. The state of the chain is then preserved for subsequent updates.

The general intuition is that, if the parameter updates are small enough compared to the mixing rate of the chain, the Markov chain should be able to "catch up" to changes in the model.
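
The only difference from CD is where the negative-phase chain starts. A minimal sketch (again an assumed illustration, not the tutorial code) of a PCD-k update, in which the chain state v_chain is kept between calls:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_update(v0, v_chain, W, b, c, lr, k, rng):
    """One PCD-k update; W, b, c are modified in place and the new
    persistent chain state is returned."""
    ph0 = sigmoid(np.dot(v0, W) + c)                 # positive phase from the data
    v = v_chain                                      # chain continues from its previous state
    for _ in range(k):                               # k Gibbs steps on the persistent chain
        h = (rng.uniform(size=c.shape) < sigmoid(np.dot(v, W) + c)) * 1.0
        v = (rng.uniform(size=b.shape) < sigmoid(np.dot(h, W.T) + b)) * 1.0
    phk = sigmoid(np.dot(v, W) + c)
    W += lr * (np.outer(v0, ph0) - np.outer(v, phk))
    b += lr * (v0 - v)
    c += lr * (ph0 - phk)
    return v                                         # preserved for the next parameter update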

Implementation

We construct an RBM class. The parameters of the network can either be initialized by the constructor or passed in as arguments. This option is useful when an RBM is used as a building block of a deep network, in which case the weight matrix and the hidden layer bias are shared with the corresponding sigmoid layer of an MLP network.

class RBM(object):
    """Restricted Boltzmann Machine (RBM)  """
    def __init__(
        self,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        hbias=None,
        vbias=None,
        numpy_rng=None,
        theano_rng=None
    ):
        """
        RBM constructor. Defines the parameters of the model along with
        basic operations for inferring hidden from visible (and vice-versa),
        as well as for performing CD updates.

        :param input: None for standalone RBMs or symbolic variable if RBM is
        part of a larger graph.

        :param n_visible: number of visible units

        :param n_hidden: number of hidden units

        :param W: None for standalone RBMs or symbolic variable pointing to a
        shared weight matrix in case RBM is part of a DBN network; in a DBN,
        the weights are shared between RBMs and layers of a MLP

        :param hbias: None for standalone RBMs or symbolic variable pointing
        to a shared hidden units bias vector in case RBM is part of a
        different network

        :param vbias: None for standalone RBMs or a symbolic variable
        pointing to a shared visible units bias
        """

        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W` which is uniformly
            # sampled from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]

Next, we define functions which construct the symbolic graph.

  
    def propup(self, vis):
        '''This function propagates the visible units activation upwards to
        the hidden units

        Note that we return also the pre-sigmoid activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        ''' This function infers state of hidden units given visible units '''
        # compute the activation of the hidden units given a sample of
        # the visibles
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # get a sample of the hiddens given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def propdown(self, hid):
        '''This function propagates the hidden units activation downwards to
        the visible units

        Note that we return also the pre_sigmoid_activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)] 

    def sample_v_given_h(self, h0_sample):
        ''' This function infers state of visible units given hidden units '''
        # compute the activation of the visible given the hidden sample
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        # get a sample of the visible given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]

We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:

gibbs_vhv performs one step of Gibbs sampling starting from the visible units. As we shall see, this is useful for sampling from the RBM.

gibbs_hvh performs one step of Gibbs sampling starting from the hidden units. This function is useful for performing CD and PCD updates.

  
    def gibbs_hvh(self, h0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the hidden state'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the visible state'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]

Note that we also return the pre-sigmoid activations. To understand why, you need to know a bit about how Theano works. Whenever you compile a Theano function, the computational graph you pass as input gets optimized for speed and stability, by replacing certain subgraphs with others. One such optimization expresses terms of the form log(sigmoid(x)) in terms of softplus. We need this optimization for the cross-entropy, because the sigmoid of numbers larger than 30 (or even less, in float32) becomes exactly 1 and the sigmoid of numbers smaller than -30 becomes exactly 0, which forces Theano to compute log(0) and gives us -inf or NaN as the cost. With the softplus form we avoid this problem. The optimization usually works, but here we have a special case: the sigmoid is applied inside the scan op, while the log is outside of it. Theano therefore only sees log(scan(...)) instead of log(sigmoid(...)) and cannot apply the optimization. We also cannot just replace the sigmoid inside scan with something else, because we only need this on the last step. The easiest and most efficient way is therefore to return the pre-sigmoid activation as an output of scan and apply both the log and the sigmoid outside of scan, so that Theano can catch and optimize the expression. A minimal numeric illustration of the issue follows.
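
As a small numeric demonstration (assuming NumPy; these lines are not part of the tutorial code), the identity log(1 - sigmoid(x)) = -softplus(x) keeps the cross-entropy finite where the naive expression blows up:

import numpy as np

x = 40.0                          # a large pre-sigmoid activation
s = 1.0 / (1.0 + np.exp(-x))      # rounds to exactly 1.0 in floating point
print(np.log(1.0 - s))            # naive cross-entropy term: log(0) -> -inf (with a warning)
print(-np.logaddexp(0.0, x))      # softplus rewrite of log(1 - sigmoid(x)): finite, about -40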

We also define a function that computes the free energy of the model, which is needed for computing the gradient of the parameters:

    def free_energy(self, v_sample):
        ''' Function to compute the free energy '''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term

We then add a get_cost_updates method, whose purpose is to generate the symbolic gradients for the CD-k and PCD-k updates.

 
    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        """This functions implements one step of CD-k or PCD-k

        :param lr: learning rate used to train the RBM

        :param persistent: None for CD. For PCD, shared variable
            containing old state of Gibbs chain. This must be a shared
            variable of size (batch size, number of hidden units).

        :param k: number of Gibbs steps to do in CD-k/PCD-k

        Returns a proxy for the cost and the updates dictionary. The
        dictionary contains the update rules for weights and biases but
        also an update of the shared variable used to store the persistent
        chain, if one is used.

        """

        # compute positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

        # decide how to initialize persistent chain:
        # for CD, we use the newly generate hidden sample
        # for PCD, we initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

Note that get_cost_updates takes an argument named persistent. This allows us to use the same code to implement both CD and PCD. To use PCD, persistent should refer to a shared variable containing the state of the Gibbs chain from the previous iteration.

If persistent is None, we initialize the Gibbs chain with the hidden sample generated during the positive phase, i.e. we implement CD. Once we have established the starting point of the chain, we can compute the sample at the end of the Gibbs chain, which we need for getting the gradient.

   
        # perform actual negative phase
        # in order to implement CD-k/PCD-k we need to scan over the
        # function that implements one gibbs step k times.
        # Read Theano tutorial on scan for more information :
        # http://deeplearning.net/software/theano/library/scan.html
        # the scan will return the entire Gibbs chain
        (
            [
                pre_sigmoid_nvs,
                nv_means,
                nv_samples,
                pre_sigmoid_nhs,
                nh_means,
                nh_samples
            ],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None are place holders, saying that
            # chain_start is the initial state corresponding to the
            # 6th output
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )

Once we have generated the chain, we take the sample at its end to compute the free energy of the negative phase. Note that chain_end is a symbolic Theano variable expressed in terms of the model parameters, so if we simply applied T.grad, the function would try to go through the Gibbs chain to get the gradients. This is not what we want, so we need to tell T.grad that chain_end is a constant. We do this with the consider_constant argument of T.grad.

   
        # determine gradients on RBM parameters
        # note that we only need the sample at the end of the chain
        chain_end = nv_samples[-1]

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # We must not compute the gradient through the gibbs sampling
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

Finally, we add the parameter updates to the updates dictionary returned by scan (which contains the update rules for the random states of theano_rng). In the case of PCD, this dictionary should also update the shared variable containing the state of the Gibbs chain.

   
  # constructs the update dictionary
        for gparam, param in zip(gparams, self.params):
            # make sure that the learning rate is of the right dtype
            updates[param] = param - gparam * T.cast(
                lr,
                dtype=theano.config.floatX
            )
        if persistent:
            # Note that this works only if persistent is a shared variable
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates

Tracking Progress

RBMs are tricky to train because of the partition function Z: we cannot estimate the log-likelihood log(P(x)) during training, so there is no direct measure for selecting the optimal hyperparameters.

A few ideas are outlined below.

Inspection of Negative Samples

The negative samples obtained during training can be visualized. As training progresses, we know that the distribution defined by the RBM gets closer and closer to the true underlying distribution p_{train}(x). The negative samples should therefore look like samples from the training set. Obviously bad hyperparameters can be discarded in this fashion.

Visual Inspection of Filters

The filters learnt by the model can be visualized. This amounts to plotting the weights of each unit as a gray-scale image (after reshaping them into a square matrix). Filters should pick out salient features in the data. While it is not entirely clear what such features should look like for an arbitrary dataset, filters trained on MNIST usually act as stroke detectors, while filters trained on natural images tend to resemble Gabor filters when combined with a sparsity criterion.

Proxies to Likelihood

Other, more tractable functions can be used as proxies to the likelihood. When training an RBM with PCD, one can use the pseudo-likelihood as such a proxy. The pseudo-likelihood (PL) is much cheaper to compute, as it assumes that the bits can be treated independently of one another:

PL(x) = \prod_i P(x_i | x_{-i}) \qquad \log PL(x) = \sum_i \log P(x_i | x_{-i})

Here x_{-i} denotes the set of all bits of x except bit i. The log-PL is therefore the sum of the log-probabilities of each bit x_i, conditioned on the state of all other bits. For MNIST this would involve summing over the 784 input dimensions, which remains rather expensive. For this reason, we use the following stochastic approximation to log-PL:

g = N \cdot \log P(x_i | x_{-i}), \text{ where } i \sim U(0, N), \text{ and } E[g] = \log PL(x)

where the expectation is taken over the uniform random choice of the index i, and N is the number of visible units. To work with binary units, we further introduce the notation \tilde{x}_i to refer to x with bit i flipped (1->0, 0->1). The log-PL of an RBM with binary units can then be written as

\log PL(x) \approx N \cdot \log \frac{e^{-FE(x)}}{e^{-FE(x)} + e^{-FE(\tilde{x}_i)}} \approx N \cdot \log[\, sigm(FE(\tilde{x}_i) - FE(x)) \,]

We therefore return this cost, together with the RBM updates, in the get_cost_updates function of the RBM class. Notice that we modify the updates dictionary to increment the index of bit i, so that i cycles over all bit positions {0, 1, ..., N-1} from one update to the next.

Note that for CD training the reconstruction cross-entropy between the input and the reconstruction (the same cost used for the denoising autoencoder) is a more reliable proxy than the pseudo-log-likelihood. Here is the code we use to compute the pseudo-likelihood:

 
    def get_pseudo_likelihood_cost(self, updates):
        """Stochastic approximation to the pseudo-likelihood"""

        # index of bit i in expression p(x_i | x_{\i})
        bit_i_idx = theano.shared(value=0, name='bit_i_idx')

        # binarize the input image by rounding to nearest integer
        xi = T.round(self.input)

        # calculate free energy for the given bit configuration
        fe_xi = self.free_energy(xi)

        # flip bit x_i of matrix xi and preserve all other bits x_{\i}
        # Equivalent to xi[:,bit_i_idx] = 1-xi[:, bit_i_idx], but assigns
        # the result to xi_flip, instead of working in place on xi.
        xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx])

        # calculate free energy with bit flipped
        fe_xi_flip = self.free_energy(xi_flip)

        # equivalent to e^(-FE(x_i)) / (e^(-FE(x_i)) + e^(-FE(x_{\i})))
        cost = T.mean(self.n_visible * T.log(T.nnet.sigmoid(fe_xi_flip -
                                                            fe_xi)))

        # increment bit_i_idx % number as part of updates
        updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible

        return cost

Main Loop

We now have everything we need to start training the network.

Before going over the training loop, the reader should familiarize themselves with the function tile_raster_images (see Miscellaneous - DeepLearning 0.1 documentation).

Since RBMs are generative models, we are interested in sampling from them and visualizing the resulting samples. We also want to visualize the learnt filters (weights) to gain insight into what the RBM is actually doing. Bear in mind, however, that this does not give the full picture, since we neglect the biases and rescale the weights by a multiplicative constant so that they fall between 0 and 1.

We start training the RBM and save/plot the filters after each training epoch. We train the RBM using PCD, as it has been shown to lead to a better generative model.

    # it is ok for a theano function to have no output
    # the purpose of train_rbm is solely to update the RBM parameters
    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        },
        name='train_rbm'
    )

    plotting_time = 0.
    start_time = timeit.default_timer()

    # go through training epochs
    for epoch in range(training_epochs):

        # go through the training set
        mean_cost = []
        for batch_index in range(n_train_batches):
            mean_cost += [train_rbm(batch_index)]

        print('Training epoch %d, cost is ' % epoch, numpy.mean(mean_cost))

        # Plot filters after each training epoch
        plotting_start = timeit.default_timer()
        # Construct image from the weight matrix
        image = Image.fromarray(
            tile_raster_images(
                X=rbm.W.get_value(borrow=True).T,
                img_shape=(28, 28),
                tile_shape=(10, 10),
                tile_spacing=(1, 1)
            )
        )
        image.save('filters_at_epoch_%i.png' % epoch)
        plotting_stop = timeit.default_timer()
        plotting_time += (plotting_stop - plotting_start)

    end_time = timeit.default_timer()

    pretraining_time = (end_time - start_time) - plotting_time

    print ('Training took %f minutes' % (pretraining_time / 60.))

Once the RBM is trained, we can use the gibbs_vhv function to implement the Gibbs chain required for sampling. We initialize the Gibbs chain from test examples (although we could just as well pick them from the training set) in order to speed up convergence and avoid problems with random initialization. We again use Theano's scan op to perform 1000 steps before each plot.

 
 #################################
    #     Sampling from the RBM     #
    #################################
    # find out the number of test samples
    number_of_test_samples = test_set_x.get_value(borrow=True).shape[0]

    # pick random test examples, with which to initialize the persistent chain
    test_idx = rng.randint(number_of_test_samples - n_chains)
    persistent_vis_chain = theano.shared(
        numpy.asarray(
            test_set_x.get_value(borrow=True)[test_idx:test_idx + n_chains],
            dtype=theano.config.floatX
        )
    )

Next we run 20 persistent chains in parallel to obtain our samples. To do so, we compile a Theano function which performs one Gibbs step and updates the state of the persistent chain with the new visible sample. We apply this function iteratively for a large number of steps, plotting the samples every 1000 steps.

  
    plot_every = 1000
    # define one step of Gibbs sampling (mf = mean-field) define a
    # function that does `plot_every` steps before returning the
    # sample for plotting
    (
        [
            presig_hids,
            hid_mfs,
            hid_samples,
            presig_vis,
            vis_mfs,
            vis_samples
        ],
        updates
    ) = theano.scan(
        rbm.gibbs_vhv,
        outputs_info=[None, None, None, None, None, persistent_vis_chain],
        n_steps=plot_every,
        name="gibbs_vhv"
    )

    # add to updates the shared variable that takes care of our persistent
    # chain :.
    updates.update({persistent_vis_chain: vis_samples[-1]})
    # construct the function that implements our persistent chain.
    # we generate the "mean field" activations for plotting and the actual
    # samples for reinitializing the state of our persistent chain
    sample_fn = theano.function(
        [],
        [
            vis_mfs[-1],
            vis_samples[-1]
        ],
        updates=updates,
        name='sample_fn'
    )

    # create a space to store the image for plotting ( we need to leave
    # room for the tile_spacing as well)
    image_data = numpy.zeros(
        (29 * n_samples + 1, 29 * n_chains - 1),
        dtype='uint8'
    )
    for idx in range(n_samples):
        # generate `plot_every` intermediate samples that we discard,
        # because successive samples in the chain are too correlated
        vis_mf, vis_sample = sample_fn()
        print(' ... plotting sample %d' % idx)
        image_data[29 * idx:29 * idx + 28, :] = tile_raster_images(
            X=vis_mf,
            img_shape=(28, 28),
            tile_shape=(1, n_chains),
            tile_spacing=(1, 1)
        )

    # construct image
    image = Image.fromarray(image_data)
    image.save('samples.png')

Results

We ran the code with PCD-15, a learning rate of 0.1 and a batch size of 20, for 15 epochs. Training took 122.466 minutes on an Intel Xeon E5430 @ 2.66GHz CPU, with single-threaded GotoBLAS.

The output is the following:

... loading data
Training epoch 0, cost is  -90.6507246003
Training epoch 1, cost is  -81.235857373
Training epoch 2, cost is  -74.9120966945
Training epoch 3, cost is  -73.0213216101
Training epoch 4, cost is  -68.4098570497
Training epoch 5, cost is  -63.2693021647
Training epoch 6, cost is  -65.99578971
Training epoch 7, cost is  -68.1236650015
Training epoch 8, cost is  -68.3207365087
Training epoch 9, cost is  -64.2949797113
Training epoch 10, cost is  -61.5194867893
Training epoch 11, cost is  -61.6539369402
Training epoch 12, cost is  -63.5465278086
Training epoch 13, cost is  -63.3787093527
Training epoch 14, cost is  -62.755739271
Training took 122.466000 minutes
 ... plotting sample  0
 ... plotting sample  1
 ... plotting sample  2
 ... plotting sample  3
 ... plotting sample  4
 ... plotting sample  5
 ... plotting sample  6
 ... plotting sample  7
 ... plotting sample  8
 ... plotting sample  9

The filters obtained after 15 epochs look like this:

_images/filters_at_epoch_14.png

Here are the samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains), with 1000 steps of Gibbs sampling performed between consecutive rows.

_images/samples.png



