[Repost] Boltzmann machine

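This article introduces the basic concepts of the Boltzmann machine and its learning algorithm. A Boltzmann machine is a network model that can discover features capturing complex regularities in training data, and it is particularly suited to search and learning problems. The article explains its stochastic dynamics and learning rule in detail and discusses different types of Boltzmann machines.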

Boltzmann machine

Geoffrey E. Hinton (2007), Scholarpedia, 2(5):1668. doi:10.4249/scholarpedia.1668, revision #91075

Curator: Geoffrey E. Hinton

A Boltzmann machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm (Hinton & Sejnowski, 1983) that allows them to discover interesting features that represent complex regularities in the training data. The learning algorithm is very slow in networks with many layers of feature detectors, but it is fast in "restricted Boltzmann machines" that have a single layer of feature detectors. Many hidden layers can be learned efficiently by composing restricted Boltzmann machines, using the feature activations of one as the training data for the next.

Boltzmann machines are used to solve two quite different computational problems. For a search problem, the weights on the connections are fixed and are used to represent a cost function. The stochastic dynamics of a Boltzmann machine then allow it to sample binary state vectors that have low values of the cost function.

For a learning problem, the Boltzmann machine is shown a set of binary data vectors and it must learn to generate these vectors with high probability. To do this, it must find weights on the connections so that, relative to other possible binary vectors, the data vectors have low values of the cost function. To solve a learning problem, Boltzmann machines make many small updates to their weights, and each update requires them to solve many different search problems.


The stochastic dynamics of a Boltzmann machine

When unit $i$ is given the opportunity to update its binary state, it first computes its total input, $z_i$, which is the sum of its own bias, $b_i$, and the weights on connections coming from other active units:

$$z_i = b_i + \sum_j s_j w_{ij} \qquad (1)$$

where $w_{ij}$ is the weight on the connection between $i$ and $j$, and $s_j$ is 1 if unit $j$ is on and 0 otherwise. Unit $i$ then turns on with a probability given by the logistic function:

$$\mathrm{prob}(s_i = 1) = \frac{1}{1 + e^{-z_i}} \qquad (2)$$
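A minimal Python sketch of Eqs. (1) and (2) for a single unit, assuming the weights are stored as a symmetric list-of-lists matrix with a zero diagonal. The function names `total_input`, `prob_on`, and `update_unit` and the numbers in the example network are hypothetical, chosen only for illustration.

```python
import math
import random


def total_input(i, states, weights, biases):
    """Eq. (1): z_i = b_i + sum_j s_j w_ij over the other units j."""
    return biases[i] + sum(
        states[j] * weights[i][j] for j in range(len(states)) if j != i
    )


def prob_on(z):
    """Eq. (2): logistic probability that a unit with total input z turns on.

    Branching on the sign of z keeps math.exp from overflowing.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_unit(i, states, weights, biases, rng):
    """Set unit i to 1 with probability prob_on(z_i), otherwise to 0 (in place)."""
    z = total_input(i, states, weights, biases)
    states[i] = 1 if rng.random() < prob_on(z) else 0
    return states[i]


# Hypothetical 3-unit network.
W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [-0.5, 0.0, 0.25]
s = [0, 1, 1]
z0 = total_input(0, s, W, b)  # -0.5 + 1 * 2.0 + 1 * (-1.0) = 0.5
print(z0, prob_on(z0))
update_unit(0, s, W, b, random.Random(0))
```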

 


If the units are updated sequentially in any order that does not depend on their total inputs, the network will eventually reach a Boltzmann distribution (also called its equilibrium or stationary distribution) in which the probability of a state vector, $\mathbf{v}$, is determined solely by the "energy" of that state vector relative to the energies of all possible binary state vectors:

$$P(\mathbf{v}) = \frac{e^{-E(\mathbf{v})}}{\sum_{\mathbf{u}} e^{-E(\mathbf{u})}} \qquad (3)$$

As in Hopfield nets, the energy of state vector $\mathbf{v}$ is defined as

$$E(\mathbf{v}) = -\sum_i s_i^{\mathbf{v}} b_i - \sum_{i<j} s_i^{\mathbf{v}} s_j^{\mathbf{v}} w_{ij} \qquad (4)$$

where $s_i^{\mathbf{v}}$ is the binary state assigned to unit $i$ by state vector $\mathbf{v}$.
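For a network small enough to enumerate all $2^n$ state vectors, Eqs. (3) and (4) can be evaluated exactly. This is a minimal sketch only: the function names and the 3-unit weights are hypothetical, and brute-force enumeration is used purely for illustration, since it is infeasible for realistically sized networks.

```python
import itertools
import math


def energy(v, weights, biases):
    """Eq. (4): E(v) = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij."""
    n = len(v)
    e = -sum(v[i] * biases[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= v[i] * v[j] * weights[i][j]
    return e


def boltzmann_distribution(weights, biases):
    """Eq. (3): P(v) = exp(-E(v)) / sum_u exp(-E(u)), enumerating all 2**n states."""
    states = list(itertools.product((0, 1), repeat=len(biases)))
    unnormalized = [math.exp(-energy(v, weights, biases)) for v in states]
    partition = sum(unnormalized)  # the denominator of Eq. (3)
    return {v: u / partition for v, u in zip(states, unnormalized)}


W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [-0.5, 0.0, 0.25]
P = boltzmann_distribution(W, b)
for v, p in sorted(P.items(), key=lambda item: -item[1]):
    print(v, round(energy(v, W, b), 2), round(p, 3))
```

Here the state (1, 1, 0) has the lowest energy, $-1.5$, and therefore the highest probability.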

If the weights on the connections are chosen so that the energies of state vectors represent the cost of those state vectors, then the stochastic dynamics of a Boltzmann machine can be viewed as a way of escaping from poor local optima while searching for low-cost solutions. The total input to unit $i$, $z_i$, represents the difference in energy depending on whether that unit is off or on, and the fact that unit $i$ occasionally turns on even if $z_i$ is negative means that the energy can occasionally increase during the search, thus allowing the search to jump over energy barriers.
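The claim that $z_i$ equals the energy difference between unit $i$ being off and on can be checked numerically. This small sketch, with hypothetical weights, brute-forces every state of a 3-unit network and compares $E(s_i{=}0) - E(s_i{=}1)$ with Eq. (1):

```python
import itertools


def energy(v, weights, biases):
    """Eq. (4)."""
    n = len(v)
    e = -sum(v[i] * biases[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= v[i] * v[j] * weights[i][j]
    return e


def total_input(i, v, weights, biases):
    """Eq. (1)."""
    return biases[i] + sum(v[j] * weights[i][j] for j in range(len(v)) if j != i)


def energy_gap(i, v, weights, biases):
    """E(v with s_i = 0) - E(v with s_i = 1)."""
    off, on = list(v), list(v)
    off[i], on[i] = 0, 1
    return energy(off, weights, biases) - energy(on, weights, biases)


W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [-0.5, 0.0, 0.25]
worst = max(
    abs(energy_gap(i, v, W, b) - total_input(i, v, W, b))
    for v in itertools.product((0, 1), repeat=3)
    for i in range(3)
)
print("largest mismatch:", worst)
```

Combined with Eq. (2), this gives $\mathrm{prob}(s_i=1)/\mathrm{prob}(s_i=0) = e^{z_i}$, which is exactly the ratio of the two states' probabilities under Eq. (3).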

The search can be improved by using simulated annealing. This scales down all of the weights and energies by a factor, $T$, which is analogous to the temperature of a physical system. By reducing $T$ from a large initial value to a small final value, it is possible to benefit from the fast equilibration at high temperatures and still have a final equilibrium distribution that makes low-cost solutions much more probable than high-cost ones. At a temperature of 0 the update rule becomes deterministic, and a Boltzmann machine turns into a Hopfield network.
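A minimal simulated-annealing sketch built on the stochastic update rule: at temperature $T$ each unit turns on with probability $1/(1+e^{-z_i/T})$, which is Eq. (2) with all weights and biases divided by $T$. The geometric cooling schedule, the number of sweeps per temperature, the random 8-unit cost function, and the bookkeeping of the best state visited are all assumptions made for this illustration, not part of the original description.

```python
import itertools
import math
import random


def energy(v, weights, biases):
    """Eq. (4), used here as the cost of a binary state vector."""
    n = len(v)
    e = -sum(v[i] * biases[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= v[i] * v[j] * weights[i][j]
    return e


def prob_on(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def anneal(weights, biases, schedule, sweeps_per_t=10, seed=0):
    """Sequential stochastic updates while T follows `schedule`; returns the best state seen."""
    rng = random.Random(seed)
    n = len(biases)
    v = [rng.randint(0, 1) for _ in range(n)]
    best, best_e = list(v), energy(v, weights, biases)
    for t in schedule:
        for _ in range(sweeps_per_t):
            for i in range(n):  # a fixed order that does not depend on the inputs
                z = biases[i] + sum(v[j] * weights[i][j] for j in range(n) if j != i)
                v[i] = 1 if rng.random() < prob_on(z / t) else 0
                e = energy(v, weights, biases)
                if e < best_e:
                    best, best_e = list(v), e
    return best, best_e


# Hypothetical random cost function on 8 units.
rng = random.Random(1)
n = 8
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)
b = [rng.gauss(0.0, 1.0) for _ in range(n)]

schedule = [10.0 * 0.8 ** k for k in range(30)]  # T from 10 down to about 0.015
state, cost = anneal(W, b, schedule)
exact_min = min(energy(v, W, b) for v in itertools.product((0, 1), repeat=n))
print(state, cost, exact_min)
```

Comparing `cost` with the brute-force `exact_min` shows how close the annealed search came; brute force is only possible here because the network is tiny.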

Learning in Boltzmann machines

Learning without hidden units

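Given a training set of state vectors (the data), learning consists of finding weights and biases (the parameters) that make those state vectors good. More specifically, the aim is to find weights and biases that define a Boltzmann distribution in which the training vectors have high probability. By differentiating Eq. (3) and using the fact that $\partial E(\mathbf{v})/\partial w_{ij} = -s_i^{\mathbf{v}} s_j^{\mathbf{v}}$, it can be shown that

$$\left\langle \frac{\partial \log P(\mathbf{v})}{\partial w_{ij}} \right\rangle_{\mathrm{data}} = \langle s_i s_j \rangle_{\mathrm{data}} - \langle s_i s_j \rangle_{\mathrm{model}} \qquad (5)$$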

where $\langle\cdot\rangle_{\mathrm{data}}$ is an expected value in the data distribution and $\langle\cdot\rangle_{\mathrm{model}}$ is an expected value when the Boltzmann machine is sampling state vectors from its equilibrium distribution at a temperature of 1. To perform gradient ascent in the log probability that the Boltzmann machine would generate the observed data when sampling from its equilibrium distribution, $w_{ij}$ is incremented by a small learning rate times the right-hand side of Eq. (5). The learning rule for the bias, $b_i$, is the same as Eq. (5), but with $s_j$ omitted.

If the observed data specifies a binary state for every unit in the Boltzmann machine, the learning problem is convex: there are no non-global optima in the parameter space. However, sampling from $\langle s_i s_j \rangle_{\mathrm{model}}$ may involve overcoming energy barriers in the binary state space.
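For a tiny fully visible Boltzmann machine, both expectations in Eq. (5) can be computed exactly by enumeration instead of sampling, which makes the gradient-ascent rule easy to check. In this sketch the parameter layout (biases first, then one weight per pair $i<j$), the training vectors, the learning rate 0.1, and the number of steps are hypothetical. Because $-E(\mathbf{v})$ is linear in the parameters, the bias gradient comes out as Eq. (5) with $s_j$ dropped.

```python
import itertools
import math

N = 3
PAIRS = [(i, j) for i in range(N) for j in range(i + 1, N)]
STATES = list(itertools.product((0, 1), repeat=N))


def feats(s):
    """s_i for each bias, followed by s_i * s_j for each weight w_ij (i < j)."""
    return list(s) + [s[i] * s[j] for i, j in PAIRS]


def neg_energy(theta, s):
    """-E(s) from Eq. (4) as a dot product of parameters and features."""
    return sum(t * f for t, f in zip(theta, feats(s)))


def log_likelihood(theta, data):
    """Average log P(v) over the data, with P from Eq. (3)."""
    log_z = math.log(sum(math.exp(neg_energy(theta, u)) for u in STATES))
    return sum(neg_energy(theta, v) - log_z for v in data) / len(data)


def gradient(theta, data):
    """Eq. (5): <feats>_data - <feats>_model, both computed exactly."""
    ws = [math.exp(neg_energy(theta, u)) for u in STATES]
    z = sum(ws)
    k = len(theta)
    model = [sum(w * feats(u)[m] for w, u in zip(ws, STATES)) / z for m in range(k)]
    data_mean = [sum(feats(v)[m] for v in data) / len(data) for m in range(k)]
    return [d - m for d, m in zip(data_mean, model)]


data = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 1)]
theta = [0.0] * (N + len(PAIRS))
ll_start = log_likelihood(theta, data)
for _ in range(500):
    theta = [t + 0.1 * g for t, g in zip(theta, gradient(theta, data))]
ll_end = log_likelihood(theta, data)
print(ll_start, ll_end)
```

Since the data specify every unit, the objective is concave in the parameters, so with a small enough learning rate each step can only increase the average log probability.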

Learning with hidden units

Learning becomes much more interesting if the Boltzmann machine consists of some "visible" units, whose states can be observed, and some "hidden" units whose states are not specified by the observed data. The hidden units act as latent variables (features) that allow the Boltzmann machine to model distributions over visible state vectors that cannot be modelled by direct pairwise interactions between the visible units. A surprising property of Boltzmann machines is that, even with hidden units, the learning rule remains unchanged. This makes it possible to learn binary features that capture higher-order structure in the data. With hidden units, the expectation $\langle s_i s_j \rangle_{\mathrm{data}}$ is the average, over all data vectors, of the expected value of $s_i s_j$ when a data vector is clamped on the visible units and the hidden units are repeatedly updated until they reach equilibrium with the clamped data vector.

It is surprising that the learning rule is so simple because $\partial \log P(\mathbf{v})/\partial w_{ij}$ depends on all the other weights in the network. Fortunately, the locally available difference in the two correlations in Eq. (5) tells $w_{ij}$ everything it needs to know about the other weights. This makes it unnecessary to explicitly propagate error derivatives, as in the backpropagation algorithm.
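With hidden units, the positive phase averages $s_i s_j$ over the posterior of the hidden units given each clamped data vector, and the negative phase is unchanged. The sketch below computes both phases exactly by enumeration for a hypothetical network with 3 visible and 2 hidden units, trained on the four even-parity vectors. Their pairwise statistics match those of the uniform distribution, so a fully visible model can do no better than $-\log 8$ on this data; the hidden units are what give the model a chance to capture the third-order parity structure. Network size, initialization, learning rate, and step count are assumptions.

```python
import itertools
import math
import random

N_VIS, N_HID = 3, 2
N = N_VIS + N_HID
PAIRS = [(i, j) for i in range(N) for j in range(i + 1, N)]
HIDDEN = list(itertools.product((0, 1), repeat=N_HID))
ALL = list(itertools.product((0, 1), repeat=N))


def feats(s):
    """s_i for each bias, followed by s_i * s_j for each pair i < j."""
    return list(s) + [s[i] * s[j] for i, j in PAIRS]


def neg_energy(theta, s):
    return sum(t * f for t, f in zip(theta, feats(s)))


def expected_feats(theta, configs):
    """Average feats over `configs`, each weighted by exp(-E)."""
    ws = [math.exp(neg_energy(theta, s)) for s in configs]
    z = sum(ws)
    fs = [feats(s) for s in configs]
    return [sum(w * f[k] for w, f in zip(ws, fs)) / z for k in range(len(theta))]


def gradient(theta, data):
    # Positive phase: clamp v, average over P(h | v) by enumerating the hidden states.
    pos = [0.0] * len(theta)
    for v in data:
        e = expected_feats(theta, [tuple(v) + h for h in HIDDEN])
        pos = [p + x / len(data) for p, x in zip(pos, e)]
    # Negative phase: nothing is clamped.
    neg = expected_feats(theta, ALL)
    return [p - q for p, q in zip(pos, neg)]


def log_likelihood(theta, data):
    """Average log P(v), where P(v) sums exp(-E(v, h)) over all hidden states h."""
    log_z = math.log(sum(math.exp(neg_energy(theta, s)) for s in ALL))
    return sum(
        math.log(sum(math.exp(neg_energy(theta, tuple(v) + h)) for h in HIDDEN)) - log_z
        for v in data
    ) / len(data)


data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # even-parity vectors
rng = random.Random(0)
theta = [rng.gauss(0.0, 0.5) for _ in range(N + len(PAIRS))]
ll_start = log_likelihood(theta, data)
for _ in range(300):
    theta = [t + 0.1 * g for t, g in zip(theta, gradient(theta, data))]
ll_end = log_likelihood(theta, data)
print(ll_start, ll_end, -math.log(8))
```

Printing $-\log 8$ next to the final value shows whether the hidden units have pushed the model past the best fully visible fit; whether they do depends on the initialization and the number of steps.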

Different types of Boltzmann machine

Higher-order Boltzmann machines

The stochastic dynamics and the learning rule can accommodate more complicated energy functions (Sejnowski, 1986). For example, the quadratic energy function in Eq. (4) can be replaced by an energy function whose typical term is $s_i s_j s_k w_{ijk}$. The total input to unit $i$ that is used in the update rule must then be replaced by $z_i = b_i + \sum_{j<k} s_j s_k w_{ijk}$. The only change in the learning rule is that $s_i s_j$ is replaced by $s_i s_j s_k$.
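A sketch of the third-order case, checking that the replaced total input is still the energy difference between unit $i$ being off and on. The energy is assumed to carry the same minus sign as Eq. (4), i.e. $E = -\sum_i s_i b_i - \sum_{i<j<k} s_i s_j s_k w_{ijk}$, and storing the weights in a dictionary keyed by sorted index triples is a hypothetical choice:

```python
import itertools


def energy3(s, biases, w3):
    """-sum_i s_i b_i - sum over triples (i, j, k) of s_i s_j s_k w_ijk."""
    e = -sum(si * bi for si, bi in zip(s, biases))
    for (i, j, k), w in w3.items():
        e -= s[i] * s[j] * s[k] * w
    return e


def total_input3(i, s, biases, w3):
    """z_i = b_i + sum_{j<k} s_j s_k w_ijk over the triples that contain unit i."""
    z = biases[i]
    for triple, w in w3.items():
        if i in triple:
            j, k = [u for u in triple if u != i]
            z += s[j] * s[k] * w
    return z


biases = [0.1, -0.3, 0.2, 0.0]
w3 = {(0, 1, 2): 1.5, (0, 1, 3): -2.0, (0, 2, 3): 0.7, (1, 2, 3): -0.4}
worst = 0.0
for s in itertools.product((0, 1), repeat=4):
    for i in range(4):
        off, on = list(s), list(s)
        off[i], on[i] = 0, 1
        gap = energy3(off, biases, w3) - energy3(on, biases, w3)
        worst = max(worst, abs(gap - total_input3(i, s, biases, w3)))
print("largest mismatch:", worst)
```

The corresponding learning signal for $w_{ijk}$ would be $\langle s_i s_j s_k \rangle_{\mathrm{data}} - \langle s_i s_j s_k \rangle_{\mathrm{model}}$.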

Conditional Boltzmann machines

Boltzmann machines model the distribution of the data vectors, but there is a simple extension for modeling conditional distributions (Ackley et al., 1985). The only difference between the visible and the hidden units is that, when sampling $\langle s_i s_j \rangle_{\mathrm{data}}$, the visible units are clamped and the hidden units are not. If a subset of the visible units is also clamped when sampling $\langle s_i s_j \rangle_{\mathrm{model}}$, this subset acts as "input" units and the remaining visible units act as "output" units. The same learning rule applies, but now it maximizes the log probabilities of the observed output vectors conditional on the input vectors.
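A minimal sketch of the conditional version for a fully visible machine, with both phases computed exactly. Unit 0 is treated as the input and units 1 and 2 as outputs; this split, the toy input-output pairs, and the learning rate are hypothetical. The only change from unconditional learning is that the input unit stays clamped during the "model" phase, so that expectation is taken over output configurations only. As a side effect, the input unit's own bias never changes, because its state is the same in both phases.

```python
import itertools
import math

N = 3
INPUTS, OUTPUTS = (0,), (1, 2)  # hypothetical input/output split
PAIRS = [(i, j) for i in range(N) for j in range(i + 1, N)]
Y_ALL = list(itertools.product((0, 1), repeat=len(OUTPUTS)))


def feats(s):
    return list(s) + [s[i] * s[j] for i, j in PAIRS]


def neg_energy(theta, s):
    return sum(t * f for t, f in zip(theta, feats(s)))


def join(x, y):
    """Full state vector with input values x and output values y."""
    s = [0] * N
    for idx, val in zip(INPUTS + OUTPUTS, tuple(x) + tuple(y)):
        s[idx] = val
    return tuple(s)


def cond_gradient(theta, pairs):
    """<feats>_data - <feats>_model, with the inputs clamped in both phases."""
    grad = [0.0] * len(theta)
    for x, y in pairs:
        alts = [join(x, y2) for y2 in Y_ALL]
        ws = [math.exp(neg_energy(theta, s)) for s in alts]
        z = sum(ws)
        pos = feats(join(x, y))
        for k in range(len(theta)):
            neg = sum(w * feats(s)[k] for w, s in zip(ws, alts)) / z
            grad[k] += (pos[k] - neg) / len(pairs)
    return grad


def cond_log_likelihood(theta, pairs):
    """Average log P(y | x)."""
    total = 0.0
    for x, y in pairs:
        log_z = math.log(sum(math.exp(neg_energy(theta, join(x, y2))) for y2 in Y_ALL))
        total += neg_energy(theta, join(x, y)) - log_z
    return total / len(pairs)


pairs = [((0,), (0, 1)), ((1,), (1, 0))]  # toy mapping: y = (x, 1 - x)
theta = [0.0] * (N + len(PAIRS))
before = cond_log_likelihood(theta, pairs)
for _ in range(200):
    theta = [t + 0.2 * g for t, g in zip(theta, cond_gradient(theta, pairs))]
after = cond_log_likelihood(theta, pairs)
print(before, after, theta[0])
```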

Mean field Boltzmann machines

Instead of using units that have stochastic binary states, it is possible to use "mean field" units that have deterministic, real-valued states between 0 and 1, as in an analog Hopfield net. Eq. (2) is used to compute an "ideal" value for a unit's state given the current states of the other units, and the actual value is moved towards the ideal value by some fraction of the difference. If this fraction is small, all the units can be updated in parallel. The same learning rules can be used by simply replacing the stochastic, binary values by the deterministic real values (Peterson and Anderson, 1987), but the learning algorithm is hard to justify and mean field nets have problems modeling multi-modal distributions.
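A sketch of the damped parallel update described above: each unit's "ideal" value is Eq. (2) applied to a total input computed from the other units' real-valued states, and the actual state moves a fraction of the way toward it. The fraction 0.2, the starting value 0.5, the iteration count, and the small hypothetical weights (chosen so that the iteration settles to a fixed point) are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function written via tanh, which cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def ideal_states(m, weights, biases):
    """Eq. (2) applied to z_i = b_i + sum_j m_j w_ij, using real-valued states m_j."""
    n = len(m)
    return [
        sigmoid(biases[i] + sum(m[j] * weights[i][j] for j in range(n) if j != i))
        for i in range(n)
    ]


def mean_field(weights, biases, fraction=0.2, iters=500):
    """Move every unit's state a fraction of the way toward its ideal value, in parallel."""
    m = [0.5] * len(biases)
    for _ in range(iters):
        ideal = ideal_states(m, weights, biases)
        m = [mi + fraction * (ti - mi) for mi, ti in zip(m, ideal)]
    return m


W = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [0.2, -0.4, 0.1]
m = mean_field(W, b)
residual = max(abs(mi - ti) for mi, ti in zip(m, ideal_states(m, W, b)))
print(m, residual)
```

A near-zero `residual` means each unit's state agrees with its own ideal value, i.e. the network has settled.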

Non-binary units

The binary stochastic units used in Boltzmann machines can be generalized to "softmax" units that have more than 2 discrete values, Gaussian units whose output is simply their total input plus Gaussian noise, binomial units, Poisson units, and any other type of unit that falls in the exponential family (Welling et al., 2005). This family is characterized by the fact that the adjustable parameters have linear effects on the log probabilities. The general form of the gradient required for learning is simply the change in the sufficient statistics caused by clamping data on the visible units.
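As one concrete instance, a softmax unit with $K$ discrete values can take value $k$ with probability proportional to $e^{z_k}$, and a Gaussian unit can output its total input plus noise. This sketch assumes one total input $z_k$ per value and unit-variance noise (both illustrative choices). The final check shows that a two-valued softmax unit whose "off" value has total input 0 behaves exactly like the logistic rule in Eq. (2).

```python
import math
import random


def softmax_probs(z):
    """P(value k) = exp(z_k) / sum_l exp(z_l); subtracting max(z) avoids overflow."""
    top = max(z)
    ex = [math.exp(zk - top) for zk in z]
    total = sum(ex)
    return [e / total for e in ex]


def sample_softmax(z, rng):
    """Draw one of the len(z) discrete values."""
    return rng.choices(range(len(z)), weights=softmax_probs(z))[0]


def sample_gaussian(z, rng, sigma=1.0):
    """Gaussian unit: total input plus zero-mean Gaussian noise (sigma is assumed)."""
    return z + rng.gauss(0.0, sigma)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
print(softmax_probs([0.5, 2.0, -1.0]), sample_softmax([0.5, 2.0, -1.0], rng))
for z in (-3.0, 0.0, 1.7):
    assert abs(softmax_probs([0.0, z])[1] - logistic(z)) < 1e-12
```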

The speed of learning

Learning is typically very slow in Boltzmann machines with many hidden layers because large networks can take a long time to approach their equilibrium distribution, especially when the weights are large and the equilibrium distribution is highly multimodal, as it usually is when the visible units are unclamped. Even if samples from the equilibrium distribution can be obtained, the learning signal is very noisy because it is the difference of two sampled expectations. These difficulties can be overcome by restricting the connectivity, simplifying the learning algorithm, and learning one hidden layer at a time.

Restricted Boltzmann machines

A restricted Boltzmann machine (Smolensky, 1986) consists of a layer of visible units and a layer of hidden units with no visible-visible or hidden-hidden connections. With these restrictions, the hidden units are conditionally independent given a visible vector, so unbiased samples from $\langle s_i s_j \rangle_{\mathrm{data}}$ can be obtained in one parallel step. To sample from $\langle s_i s_j \rangle_{\mathrm{model}}$ still requires multiple iterations that alternate between updating all the hidden units in parallel and updating all of the visible units in parallel. However, learning still works well if $\langle s_i s_j \rangle_{\mathrm{model}}$ is replaced by $\langle s_i s_j \rangle_{\mathrm{reconstruction}}$, which is obtained as follows:

  1. Starting with a data vector on the visible units, update all of the hidden units in parallel.
  2. Update all of the visible units in parallel to get a "reconstruction".
  3. Update all of the hidden units again.

This efficient learning procedure does approximate gradient descent in a quantity called "contrastive divergence" and works well in practice (Hinton, 2002).
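A pure-Python sketch of one CD-1 update following the three steps above, for a single training vector. Two choices here go beyond the text and are assumptions: the hidden probabilities rather than sampled hidden states are used in the pairwise statistics (a common way to reduce noise), and the visible biases `b` and hidden biases `c` are updated with the same difference rule. The data, layer sizes, learning rate, and epoch count are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit; they are independent given v."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit; they are independent given h."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(b))]


def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on data vector v0; W, b, c are modified in place."""
    ph0 = hidden_probs(v0, W, c)                             # step 1: hidden units driven by data
    v1 = sample(visible_probs(sample(ph0, rng), W, b), rng)  # step 2: reconstruction
    ph1 = hidden_probs(v1, W, c)                             # step 3: hidden units again
    for i in range(len(b)):
        for j in range(len(c)):
            # <v_i h_j>_data - <v_i h_j>_reconstruction
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(0)
n_vis, n_hid = 6, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
b, c = [0.0] * n_vis, [0.0] * n_hid
data = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 0, 1]]
for epoch in range(200):
    for v in data:
        cd1_update(v, W, b, c, lr=0.1, rng=rng)
print([[round(p, 2) for p in hidden_probs(v, W, c)] for v in data])
```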

Learning deep networks by composing restricted Boltzmann machines

After learning one hidden layer, the activity vectors of the hidden units, when they are being driven by the real data, can be treated as "data" for training another restricted Boltzmann machine. This can be repeated to learn as many hidden layers as desired. After learning multiple hidden layers in this way, the whole network can be viewed as a single, multilayer generative model, and each additional hidden layer improves a lower bound on the probability that the multilayer model would generate the training data (Hinton et al., 2006).
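A compact sketch of greedy layer-wise stacking. To keep it short, each RBM is trained with a deterministic variant of CD-1 that uses probabilities instead of sampled states everywhere, and the hidden probabilities (rather than sampled binary states) serve as the "data" for the next layer; both are simplifying assumptions, as are the layer sizes and the toy data.

```python
import math
import random


def sigmoid(x):
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def up(v, W, c):
    """P(h_j = 1 | v) for one RBM."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def down(h, W, b):
    """P(v_i = 1 | h) for one RBM."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(b))]


def train_rbm(data, n_hid, epochs=50, lr=0.1, seed=0):
    """Hypothetical minimal trainer: CD-1 with probabilities in place of sampled states."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
    b, c = [0.0] * n_vis, [0.0] * n_hid
    for _ in range(epochs):
        for v0 in data:
            h0 = up(v0, W, c)
            v1 = down(h0, W, b)
            h1 = up(v1, W, c)
            for i in range(n_vis):
                for j in range(n_hid):
                    W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                b[i] += lr * (v0[i] - v1[i])
            for j in range(n_hid):
                c[j] += lr * (h0[j] - h1[j])
    return W, b, c


def train_stack(data, layer_sizes):
    """Greedy layer-wise training: each layer's hidden activities become the next layer's data."""
    layers = []
    for n_hid in layer_sizes:
        W, b, c = train_rbm(data, n_hid)
        layers.append((W, b, c))
        data = [up(v, W, c) for v in data]  # activities driven by the (transformed) real data
    return layers


data = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 0, 1]]
layers = train_stack(data, [4, 3, 2])
top = data
for W, _, c in layers:
    top = [up(v, W, c) for v in top]
print([[round(x, 2) for x in v] for v in top])
```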

Learning one hidden layer at a time is a very effective way to learn deep neural networks with many hidden layers and millions of weights. Even though the learning is unsupervised, the highest level features are typically much more useful for classification than the raw data vectors. These deep networks can be fine-tuned to be better at classification or dimensionality reduction using the backpropagation algorithm (Hinton & Salakhutdinov, 2006). Alternatively, they can be fine-tuned to be better generative models using a version of the "wake-sleep" algorithm (Hinton et al., 2006).

Relationships to other models

Markov random fields and Ising models

Boltzmann machines are a type of Markov random field, but most Markov random fields have simple, local interaction weights which are designed by hand rather than being learned. Boltzmann machines are Ising models, but Ising models typically use random or hand-designed interaction weights.

Graphical models

The learning algorithm for Boltzmann machines was the first learning algorithm for undirected graphical models with hidden variables (Jordan, 1998). When restricted Boltzmann machines are composed to learn a deep network, the top two layers of the resulting graphical model form an unrestricted Boltzmann machine, but the lower layers form a directed acyclic graph with directed connections from higher layers to lower layers (Hinton et al., 2006).

Gibbs sampling

The search procedure for Boltzmann machines is an early example of Gibbs sampling, a Markov chain Monte Carlo method which was invented independently (Geman & Geman, 1984) and was also inspired by simulated annealing.
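The sequential stochastic update is a Gibbs sampler for Eq. (3). This small sketch runs it on a hypothetical 3-unit network and compares the visit frequencies with the exact distribution by total variation distance; the burn-in length, number of sweeps, and weights are assumptions.

```python
import itertools
import math
import random


def energy(v, W, b):
    """Eq. (4)."""
    n = len(v)
    return -sum(v[i] * b[i] for i in range(n)) - sum(
        v[i] * v[j] * W[i][j] for i in range(n) for j in range(i + 1, n)
    )


def gibbs_frequencies(W, b, sweeps, burn_in=1000, seed=0):
    """Visit frequencies of a sequential Gibbs sampler (Eqs. (1) and (2), fixed unit order)."""
    rng = random.Random(seed)
    n = len(b)
    s = [0] * n
    counts = {}
    for t in range(burn_in + sweeps):
        for i in range(n):
            z = b[i] + sum(s[j] * W[i][j] for j in range(n) if j != i)
            s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
        if t >= burn_in:
            counts[tuple(s)] = counts.get(tuple(s), 0) + 1
    return {state: c / sweeps for state, c in counts.items()}


def exact_distribution(W, b):
    """Eq. (3) by enumeration."""
    states = list(itertools.product((0, 1), repeat=len(b)))
    u = [math.exp(-energy(v, W, b)) for v in states]
    z = sum(u)
    return {v: x / z for v, x in zip(states, u)}


W = [[0.0, 1.0, -0.5],
     [1.0, 0.0, 0.5],
     [-0.5, 0.5, 0.0]]
b = [-0.5, 0.0, 0.25]
empirical = gibbs_frequencies(W, b, sweeps=50000)
exact = exact_distribution(W, b)
tv = 0.5 * sum(abs(empirical.get(v, 0.0) - p) for v, p in exact.items())
print(f"total variation distance: {tv:.4f}")
```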

Conditional random fields

Conditional random fields (Lafferty et al., 2001) can be viewed as simplified versions of higher-order, conditional Boltzmann machines in which the hidden units have been eliminated. This makes the learning problem convex, but removes the ability to learn new features.

References

  • Ackley, D., Hinton, G., and Sejnowski, T. (1985). A Learning Algorithm for Boltzmann Machines. Cognitive Science, 9(1):147-169.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.
  • Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.
  • Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
  • Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.
  • Hinton, G. E. and Sejnowski, T. J. (1983). Optimal Perceptual Inference. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Washington DC, pp. 448-453.
  • Jordan, M. I. (1998). Learning in Graphical Models. MIT Press, Cambridge, MA.
  • Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.
  • Peterson, C. and Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1(5):995-1019.
  • Sejnowski, T. J. (1986). Higher-order Boltzmann machines. AIP Conference Proceedings, 151(1):398-403.
  • Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Volume 1: Foundations, pages 194-281. MIT Press, Cambridge, MA.
  • Welling, M., Rosen-Zvi, M., and Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, pages 1481-1488. MIT Press, Cambridge, MA.

See also

Associative Memory, Boltzmann Distribution, Hopfield Network, Neural Networks, Simulated Annealing, Unsupervised Learning

Reposted from: https://www.cnblogs.com/daleloogn/p/4442366.html
