Introduction to Restricted Boltzmann Machines (RBMs)

The Restricted Boltzmann Machine (RBM) is an unsupervised deep learning model invented by Geoffrey Hinton and used for tasks such as dimensionality reduction, classification, and regression. An RBM has a visible layer and a hidden layer, with no connections between nodes within a layer, which simplifies training. Unlike autoencoders, RBMs use stochastic units with a particular distribution. The goal of training an RBM is to find out how the input and hidden-layer variables are connected.

Introduction

Invented by Geoffrey Hinton (sometimes referred to as the Godfather of Deep Learning), a Restricted Boltzmann Machine is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling.

Before moving forward, let us first understand what Boltzmann Machines are.

What are Boltzmann Machines?

A Boltzmann machine is a stochastic (non-deterministic) or generative deep learning model which has only visible (input) and hidden nodes.

The image below presents ten nodes, all of which are inter-connected; they are also often referred to as states. Brown ones represent hidden nodes (h) and blue ones represent visible nodes (v). If you already understand Artificial, Convolutional, and Recurrent Neural Networks, you'll notice that they never have their input nodes connected to each other, whereas Boltzmann Machines have their inputs connected, and that is what makes them fundamentally unconventional. All these nodes exchange information among themselves and self-generate subsequent data, hence the term generative deep model.

[Figure: Boltzmann machine hidden and visible nodes]

There is no output node in this model, so unlike our usual classifiers we cannot make it learn 1 or 0 from the target variable of the training dataset by applying gradient descent or stochastic gradient descent (SGD). The same is true of our regressor models: there is no target variable from which a pattern could be learned. These attributes make the model non-deterministic. So how does this model learn and predict? Is that intriguing enough?

Here, visible nodes are what we measure and hidden nodes are what we do not measure. When we input data, these nodes learn all the parameters, their patterns, and the correlations among them on their own and form an efficient system; hence the Boltzmann Machine is termed an unsupervised deep learning model. The model is then ready to monitor and study abnormal behavior depending on what it has learned.

Hinton once referred to the illustration of a nuclear power plant as an example for understanding Boltzmann Machines. This is a complex topic, so we shall proceed slowly to build the intuition behind each concept, with a minimal amount of mathematics and physics involved.

[Figure: Nuclear power plant diagram]

So in the simplest introductory terms, Boltzmann Machines are primarily divided into two categories: Energy-based Models (EBMs) and Restricted Boltzmann Machines (RBMs). When these RBMs are stacked on top of each other, they are known as Deep Belief Networks (DBNs).

What are Restricted Boltzmann Machines?

A Restricted Boltzmann Machine (RBM) is a generative, stochastic, and 2-layer artificial neural network that can learn a probability distribution over its set of inputs.

Stochastic means “randomly determined”, and in RBMs, the coefficients that modify inputs are randomly initialized.

The first layer of the RBM is called the visible, or input layer, and the second is the hidden layer. Each circle represents a neuron-like unit called a node. Each node in the input layer is connected to every node of the hidden layer.

[Figure: Layers in a Restricted Boltzmann Machine]

The restriction in a Restricted Boltzmann Machine is that there is no intra-layer communication (nodes of the same layer are not connected). This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm. Each node is a locus of computation that processes input and begins by making stochastic decisions about whether to transmit that input or not.

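To make the structure concrete, here is a minimal sketch in NumPy of the parameters an RBM holds; the layer sizes are hypothetical and chosen only for illustration. The key point is that there is exactly one weight matrix, connecting the visible layer to the hidden layer, and no weight matrix within a layer.

```python
import numpy as np

n_visible, n_hidden = 6, 3          # hypothetical layer sizes
rng = np.random.default_rng(0)

# One weight matrix connecting every visible node to every hidden node.
# The "restriction": no visible-visible or hidden-hidden weights exist.
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))

a = np.zeros(n_hidden)   # hidden bias (one per hidden node)
b = np.zeros(n_visible)  # visible bias (one per visible node)
```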

RBMs received a lot of attention after being proposed as the building blocks of multi-layer learning architectures called Deep Belief Networks (DBNs). When these RBMs are stacked on top of each other, they are known as DBNs.

[Figure: (a) Restricted Boltzmann Machine. (b) A stack of RBMs. (c) The corresponding DBN.]

Difference between Autoencoders & RBMs

An autoencoder is a simple 3-layer neural network whose output units are directly connected back to the input units. Typically, the number of hidden units is much smaller than the number of visible ones. The task of training is to minimize the reconstruction error, i.e. to find the most efficient compact representation of the input data.

[Figure: Layers in an autoencoder]

An RBM shares a similar idea, but it uses stochastic units with a particular distribution instead of deterministic units. The task of training is to find out how these two sets of variables are connected.

Working of a Restricted Boltzmann Machine

One aspect that distinguishes an RBM from other neural networks is that it has two biases:

  • The hidden bias helps the RBM produce the activations on the forward pass, while

  • The visible layer’s biases help the RBM learn the reconstructions on the backward pass.

The reconstructed input is always different from the actual input as there are no connections among visible nodes and therefore, no way of transferring information among themselves.

[Figure: Forward pass]

The above image shows the first step in training an RBM with multiple inputs. The inputs are multiplied by the weights and then added to the bias. The result is then passed through a sigmoid activation function, and the output determines whether the hidden state gets activated or not. The weights form a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns. The first hidden node receives the vector multiplication of the inputs by the first column of weights, before the corresponding bias term is added to it.

The sigmoid function is given by:

$$S(x) = \frac{1}{1 + e^{-x}}$$

So the equation that we get in this step would be,

$$h^{(1)} = S\left(W^{T} v^{(0)} + a\right)$$

where h(1) and v(0) are the corresponding vectors (column matrices) for the hidden and the visible layers with the superscript as the iteration (v(0) means the input that we provide to the network) and a is the hidden layer bias vector.

其中h(1)v(0)是隐藏层和可见层的相应向量(列矩阵),其中上标为迭代( v(0)表示我们提供给网络的输入),而a是隐藏层偏差向量。

[Figure: Backward pass]

This image shows the reverse, or reconstruction, phase. It is similar to the first pass but in the opposite direction. The equation comes out to be:

$$v^{(1)} = S\left(W h^{(1)} + b\right)$$

where v(1) and h(1) are the corresponding vectors (column matrices) for the visible and the hidden layers with the superscript as the iteration and b is the visible layer bias vector.

其中v(1)h(1)是可见层和隐藏层的对应向量(列矩阵),其中上标为迭代, b是可见层偏差向量。

Training a Restricted Boltzmann Machine

Training a Restricted Boltzmann Machine differs from training regular neural networks, which are trained via stochastic gradient descent (SGD).

The difference v(0) - v(1) can be considered the reconstruction error that we need to reduce in subsequent steps of the training process. The weights are adjusted in each iteration to minimize this error, and this is essentially what the learning process is.

In the forward pass, we calculate the probability of the output h(1) given the input v(0) and the weights W, denoted by:

$$p\left(h^{(1)} \mid v^{(0)}; W\right)$$

and in the backward pass, while reconstructing the input, we calculate the probability of the output v(1) given the input h(1) and the weights W, denoted by:

$$p\left(v^{(1)} \mid h^{(1)}; W\right)$$

The weights used in both the forward and the backward pass are the same. Together, these two conditional probabilities lead us to the joint distribution of inputs and the activations:

$$p\left(v^{(0)}, h^{(1)}\right)$$

Reconstruction is different from regression or classification in that it estimates the probability distribution of the original input instead of associating a continuous/discrete value to an input example. This means it is trying to guess multiple values at the same time. This is known as generative learning as opposed to discriminative learning that happens in a classification problem (mapping input to labels).

Contrastive Divergence (CD-k)

Boltzmann Machines (and RBMs) are energy-based models, and a joint configuration (v, h) of the visible and hidden units has an energy given by:

$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$$

where $v_i$, $h_j$ are the binary states of visible unit $i$ and hidden unit $j$, $a_i$, $b_j$ are their biases, and $w_{ij}$ is the weight between them.

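As a quick illustration, the energy of one particular configuration can be computed directly from this formula; the vectors and weights below are arbitrary example values, and the names a, b, W follow the notation of the equation (visible biases, hidden biases, weights).

```python
import numpy as np

def energy(v, h, a, b, W):
    # E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij
    return -(a @ v) - (b @ h) - (v @ W @ h)

v = np.array([1, 0, 1], dtype=float)   # example visible states
h = np.array([1, 0], dtype=float)      # example hidden states
a = np.zeros(3)                        # visible biases
b = np.zeros(2)                        # hidden biases
W = np.full((3, 2), 0.1)               # example weights
print(energy(v, h, a, b, W))           # -> -0.2
```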

The probability that the network assigns to a visible vector $v$ is given by summing over all possible hidden vectors:

$$p(v) = \frac{1}{Z} \sum_{h} e^{-E(v,h)}$$

Z here is the partition function and is given by summing over all possible pairs of visible and hidden vectors:

$$Z = \sum_{v,h} e^{-E(v,h)}$$

This gives us:

$$p(v) = \frac{\sum_{h} e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}$$

The log-likelihood gradient, i.e. the derivative of the log probability of a training vector with respect to a weight, is surprisingly simple:

$$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$$

where the angle brackets are used to denote expectations under the distribution specified by the subscript that follows. This leads to a very simple learning rule for performing stochastic steepest ascent in the log probability of the training data:

$$\Delta w_{ij} = \alpha \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right)$$

where $\alpha$ is the learning rate.

For more information on what the above equations mean or how they are derived, refer to the guide on training RBMs by Geoffrey Hinton. The important thing to note here is that because there are no direct connections between hidden units in an RBM, it is very easy to get an unbiased sample of $\langle v_i h_j \rangle_{\text{data}}$. Getting an unbiased sample of $\langle v_i h_j \rangle_{\text{model}}$, however, is much more difficult. This is because it would require us to run a Markov chain until the stationary distribution is reached (which means the energy of the distribution is minimized: equilibrium!) in order to approximate the second term. So instead of doing that, we perform Gibbs sampling from the distribution. This is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations approximated from a specified multivariate probability distribution when direct sampling is difficult (as in our case). The Gibbs chain is initialized with a training example v(0) of the training set and yields the sample v(k) after k steps. Each step t consists of sampling h(t) from p(h | v(t)) and then sampling v(t+1) from p(v | h(t)) (the value k = 1 surprisingly works quite well). The learning rule now becomes:

$$\Delta w_{ij} = \alpha \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right)$$

The learning works well even though it only crudely approximates the gradient of the log probability of the training data. The learning rule much more closely approximates the gradient of another objective function called the Contrastive Divergence, which is the difference between two Kullback-Leibler divergences.

When we apply this, we get:

$$\Delta w_{ij} = \alpha \left( \langle v_i h_j \rangle^{(0)} - \langle v_i h_j \rangle^{(k)} \right)$$

where the second term is obtained after each k steps of Gibbs Sampling.

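Below is a minimal, illustrative CD-1 update step in NumPy. It follows the recipe above (hidden probabilities from the data, one Gibbs step to get a reconstruction, and the difference of the two $\langle v_i h_j \rangle$ terms as the gradient estimate); the example data, the layer sizes, and the learning rate are assumptions made for this sketch, not values from the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, alpha = 6, 3, 0.1            # hypothetical sizes and learning rate
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # weights
a = np.zeros(n_hidden)                            # hidden bias
b = np.zeros(n_visible)                           # visible bias

v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)    # one training example

# Positive phase: <v_i h_j> under the data
h0_prob = sigmoid(W.T @ v0 + a)
h0 = (rng.random(n_hidden) < h0_prob).astype(float)

# One Gibbs step (k = 1): reconstruct v, then recompute the hidden probabilities
v1_prob = sigmoid(W @ h0 + b)
h1_prob = sigmoid(W.T @ v1_prob + a)

# Negative phase: <v_i h_j> under the k-step reconstruction
positive = np.outer(v0, h0_prob)
negative = np.outer(v1_prob, h1_prob)

# Contrastive-divergence parameter updates
W += alpha * (positive - negative)
b += alpha * (v0 - v1_prob)
a += alpha * (h0_prob - h1_prob)
```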

Now let us understand RBM with the help of an example.

A practical example of RBM: Collaborative Filtering

Recognizing Latent Factors in the Data

Let us assume that some people were asked to rate a set of movies on a scale of 1 to 5, and that each movie can be explained in terms of a set of latent factors (in this case, genre) such as action, fantasy, horror, drama, etc. RBMs are used to analyze and find out these underlying latent factors.

The analysis of hidden factors is performed in a binary way, i.e., the user only tells whether they liked a specific movie (rating 1) or not (rating 0), and this forms the input for the input/visible layer. Given the inputs, the RBM then tries to discover latent factors in the data that can explain the movie choices, and each hidden neuron represents one of those latent factors.

Let us consider the following example, where a user likes Lord of the Rings and Harry Potter but does not like The Matrix, Fight Club, and Titanic. The Hobbit has not been seen yet, so it gets a -1 rating. Given these inputs, the RBM may identify three hidden factors, Drama, Fantasy, and Science Fiction, which correspond to the movie genres.

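In code, this preference pattern could be encoded as a single visible vector; the movie order and the use of -1 for "not seen yet" are simply conventions chosen for this illustration.

```python
# 1 = liked, 0 = disliked, -1 = not seen yet (to be predicted, not trained on)
movies = ["Lord of the Rings", "Harry Potter", "The Matrix",
          "Fight Club", "Titanic", "The Hobbit"]
user_ratings = [1, 1, 0, 0, 0, -1]
```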

Latent Factors for Prediction

After training the RBM, our goal is to predict binary ratings for the movies that have not been seen yet. Given the training data of a specific user, the network can identify the latent factors based on the user's preferences, and a sample from a Bernoulli distribution can be used to find out which of the visible neurons now become active.

The image shows the new ratings after using the hidden neuron values for the inference. The network identified Fantasy as the preferred movie genre and rated The Hobbit as a movie the user would like.

The process from training to the prediction phase goes as follows (a minimal code sketch of these steps follows the list):

  • Train the network on the data of all users.
  • During inference time, take the training data of a specific user.
  • Use this data to obtain the activations of the hidden neurons.
  • Use the hidden neuron values to get the activations of the input neurons.
  • The new values of the input neurons show the ratings the user would give to yet unseen movies.
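Here is a minimal sketch of those inference steps; W, a, and b are assumed to be the parameters of an already trained RBM (six movies, three latent factors), and the -1 entries mark unseen movies whose reconstructed probabilities become the predictions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_unseen(user_ratings, W, a, b, rng=np.random.default_rng(0)):
    """Reconstruct a user's ratings with a trained RBM (weights W, hidden bias a, visible bias b)."""
    v = np.array(user_ratings, dtype=float)
    v[v == -1] = 0                                          # unseen movies enter as 0
    h_prob = sigmoid(W.T @ v + a)                           # hidden activations
    h = (rng.random(h_prob.shape) < h_prob).astype(float)   # Bernoulli sample
    return sigmoid(W @ h + b)                               # reconstructed visible layer

# Hypothetical trained parameters for a 6-movie, 3-factor RBM
W = np.random.default_rng(1).normal(0.0, 0.1, (6, 3))
a, b = np.zeros(3), np.zeros(6)
print(predict_unseen([1, 1, 0, 0, 0, -1], W, a, b))
```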

Conclusion

You can interpret RBMs’ output numbers as percentages. Every time the number in the reconstruction is not zero, that’s a good indication the RBM learned the input.

It should be noted that RBMs do not produce the most stable or consistent results of all shallow, feedforward networks. In many situations, a dense-layer autoencoder works better. Indeed, the industry is moving toward tools such as variational autoencoders and Generative Adversarial Networks (GANs).

Well, that’s all for this article hope you guys have enjoyed reading it and I’ll be glad if the article is of any help. Feel free to share your thoughts/feedback in the comment section.

好吧,这就是本文的全部内容,希望你们喜欢阅读,如果本文对您有所帮助,我将感到高兴。 随时在评论部分分享您的想法/反馈。

Thanks for reading!!!

Translated from: https://medium.com/@nageshsinghchauhan/introduction-to-restricted-boltzmann-machines-rbms-53089b9b4b15
