Bayesian Networks: Bayesian Regression

Reposted from: https://weirping.github.io/blog/Bayesian-Networks-regression.html

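Overview

Probability plays an important role in modern machine learning models. It is often advantageous, however, to analyze a model through a diagrammatic representation of its probability distribution. Such representations are called probabilistic graphical models. They have the following properties:

  • They provide a simple way to visualize the structure of a probabilistic model, and can be used to design new models.
  • By inspecting the graph, we gain deeper insight into properties of the model, such as conditional independence.
  • In complex models, complicated computations can be expressed as operations on the graph. (These graph operations implicitly carry out the derivation of complicated mathematical expressions.)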

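A graph consists of two parts: nodes and links. Nodes represent the variables in the model, and links represent the relationships between nodes. Depending on whether the links are directed, probabilistic graphical models fall into two classes:

  1. Bayesian networks: links are directed and drawn as arrows. The direction of a link also expresses the conditional relationship between variables; for example, A → B corresponds to the conditional probability $p(B|A)$. Bayesian networks are also called directed graphical models. Directed graphs are useful for expressing causal relationships between random variables.
  2. Markov random fields: links are undirected. They are also called undirected graphical models. Undirected graphs are better suited to expressing soft constraints between random variables.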

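To solve inference problems, it is often convenient to convert both directed and undirected graphs into a different representation called a factor graph.

This article discusses Bayesian networks.

Bayesian networks extend the Bayesian approach. They describe Bayesian models such as Bayesian linear regression and Bayesian logistic regression.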

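Correspondence between mathematical expressions and graphs

As noted above, a graphical model maps a mathematical expression to a graph, which provides a simple way to visualize the structure of a probabilistic model.

How does a directed graphical model map a complicated probability expression to a graph?

An example makes this concrete: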

[Figure: directed acyclic graph over $x_1, \dots, x_7$]

From the figure above, the joint distribution of all the random variables can be read off directly as the product of factors on the right-hand side:
$$p(x_1,x_2,x_3,x_4,x_5,x_6,x_7) = p(x_1)\,p(x_2)\,p(x_3)\,p(x_4|x_1,x_2,x_3)\,p(x_5|x_1,x_3)\,p(x_6|x_4)\,p(x_7|x_4,x_5)$$

For the underlying theory, see PRML Section 8.1.

For a graph with $K$ nodes, the joint distribution of the $K$ nodes can be written as:
$$p(\mathbf X) = \prod_{k=1}^{K} p(x_k|pa_k)$$
where $pa_k$ is the set of all parents of node $x_k$, and $\mathbf X = \{x_1, \dots, x_K\}$.
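
The factorization can be checked numerically. The following is a minimal sketch, assuming binary variables and a made-up conditional probability rule (`p_one` is hypothetical and not from the original article). The parent sets are those read off the 7-node example. Because every factor is a normalized conditional distribution, the product summed over all assignments should come out as 1.

```python
from itertools import product

# Parent sets pa_k for the 7-node example graph.
PARENTS = {
    "x1": [], "x2": [], "x3": [],
    "x4": ["x1", "x2", "x3"],
    "x5": ["x1", "x3"],
    "x6": ["x4"],
    "x7": ["x4", "x5"],
}

def p_one(node, parent_values):
    """Hypothetical rule for P(node = 1 | parents); chosen only for illustration."""
    return (1 + sum(parent_values)) / (2 + len(parent_values))

def joint(assignment):
    """p(x_1, ..., x_K) = prod_k p(x_k | pa_k) for a full 0/1 assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = p_one(node, [assignment[p] for p in parents])
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

nodes = list(PARENTS)
# Sum the joint over all 2^7 assignments.
total = sum(joint(dict(zip(nodes, values)))
            for values in product([0, 1], repeat=len(nodes)))
print(round(total, 10))
```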

The graph of a Bayesian network must be a directed acyclic graph (DAG).
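
Whether a given set of parent relations satisfies this requirement can be tested with a topological sort. The sketch below is one possible check, assuming Python 3.9+ (for the standard-library `graphlib` module) and a graph given as a node-to-parents mapping; the helper name `is_dag` is an illustrative choice.

```python
from graphlib import CycleError, TopologicalSorter

def is_dag(parents):
    """Return True if the node -> parents mapping contains no directed cycle."""
    try:
        # static_order() raises CycleError when the graph has a cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Parent sets of the 7-node example (root nodes may be omitted).
example = {
    "x4": {"x1", "x2", "x3"},
    "x5": {"x1", "x3"},
    "x6": {"x4"},
    "x7": {"x4", "x5"},
}
# A made-up graph with the cycle a -> c -> b -> a.
cyclic = {"a": {"b"}, "b": {"c"}, "c": {"a"}}

print(is_dag(example), is_dag(cyclic))
```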

Graphical model of Bayesian regression

First, a brief review of Bayesian regression.

Suppose the training set has $N$ samples. Let $\mathbf X$ denote the features of the sample set, with $x_i$ the features of the $i$-th sample, and let $\mathrm T$ denote the labels of the sample set, with $t_i$ the label of the $i$-th sample. That is, $\mathbf{X}\equiv (x_{1}, \dots, x_{N})^{\mathrm{T}}$ and $\mathrm{T}=(t_{1}, \dots, t_{N})^{\mathrm{T}}$, and the dataset is $\mathcal D = \{\mathbf X, \mathrm{T}\}$. We train a regression model $y(x, \mathbf w)$ on this dataset and use it to predict the label of new data from its features.

Linear regression: $y(x, \mathbf w) = \mathbf w^{\mathrm T} x$

In a regression problem, the label $t$ is assumed to follow a Gaussian distribution with mean $y(x, \mathbf w)$ and variance $\beta^{-1}$:
$$p(t|x,\mathbf w,\beta)=\mathcal N(t|y(x,\mathbf w),\beta^{-1})$$
Here $\beta$ is the precision (inverse variance) of the Gaussian noise; it reflects the sampling error, i.e. the noise, in the sample set.
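
To make the role of $\beta$ concrete, the sketch below evaluates the log of this Gaussian likelihood. It is only an illustration: it assumes a scalar weight so that $y(x, w) = w x$, and the data values are made up. Since the variance is $\beta^{-1}$, $\beta$ enters as a precision.

```python
import math

def gaussian_log_pdf(t, mean, beta):
    """log N(t | mean, beta^{-1}), where beta is the precision."""
    return 0.5 * math.log(beta / (2.0 * math.pi)) - 0.5 * beta * (t - mean) ** 2

def log_likelihood(xs, ts, w, beta):
    """Sum over samples of log p(t_n | x_n, w, beta) for scalar y = w * x."""
    return sum(gaussian_log_pdf(t, w * x, beta) for x, t in zip(xs, ts))

xs = [0.0, 1.0, 2.0]   # made-up inputs
ts = [0.1, 1.9, 4.2]   # made-up labels, roughly t = 2x

# A weight close to the trend in the data scores higher than a poor one.
better = log_likelihood(xs, ts, w=2.0, beta=25.0)
worse = log_likelihood(xs, ts, w=0.5, beta=25.0)
print(better > worse)
```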

The Bayesian view treats the model parameter $\mathbf w$ as an uncertain quantity and models it with a probability distribution. Here we assume that $\mathbf w$ follows a Gaussian distribution with mean $\mathbf 0$ and covariance $\alpha^{-1}\mathbf{I}$ (other assumptions are possible; for other cases see Bayesian linear regression and Bayesian logistic regression):
$$p(\mathbf w|\alpha)= \mathcal{N}(\mathbf w|\mathbf 0,\alpha^{-1}\mathbf{I})=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left\{-\frac{\alpha}{2}\mathbf w^{\mathrm{T}}\mathbf w\right\}$$
where $M+1$ is the dimension of $\mathbf w$.
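The symbols used above are summarized below:

| Symbol | Meaning |
| --- | --- |
| $x$ or $x_i$ | features of one sample |
| $\mathbf X$ | features of the sample set |
| $t$ or $t_i$ | label of one sample |
| $\mathrm T$ | labels of the sample set |
| $\mathbf w$ | parameters of the model |
| $\beta$ | precision (inverse variance) of the sample noise |
| $\alpha$ | parameter of the distribution that $\mathbf w$ follows |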

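A Bayesian network is mainly concerned with random variables or, equivalently, with the joint distribution of all the random variables. So which quantities in a Bayesian model are random variables?

During training, only $\mathbf w$ and $\mathrm T =(t_1,\dots,t_N)$ are random variables. $\mathbf{X}= (x_{1}, \dots, x_{N})^{\mathrm{T}}$, $\beta$ and $\alpha$ are called deterministic parameters: they are the inputs and (hyper)parameters of the model, not random variables.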

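Bayesian network of the random variables

The joint distribution of all the random variables can be written as:
$$p(\mathrm T, \mathbf w)=p(\mathbf w)\prod_{n=1}^{N}p(t_n|\mathbf{w})$$
Note that the label $t_n$ of every sample is an element of the joint distribution and therefore a node of the graphical model. Drawing each random variable as a circle, the graphical model is as shown in the figure below.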

[Figure: node $\mathbf w$ with an arrow to each of $t_1, \dots, t_N$]

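The figure above has to draw all $N$ nodes $t_1, \dots, t_N$ explicitly, which is cumbersome. Repeated nodes can instead be drawn as in the figure below: a box (a plate) encloses one representative node, and the $N$ in its lower-right corner gives the number of repetitions.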

[Figure: plate notation, with node $t_n$ inside a box labeled $N$ and an arrow from $\mathbf w$]

Adding the model parameters

Sometimes it helps the analysis to show the model parameters explicitly. The joint distribution of the random variables, with the model parameters included, is:
$$p(\mathrm T, \mathbf w | \mathbf{X},\alpha,\beta)=p(\mathbf w|\alpha)\prod_{n=1}^{N}p(t_n|\mathbf w,x_n,\beta)$$
In the graphical model, model parameters are drawn as small solid dots.
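
Read as a generative recipe, this factorization says: draw $\mathbf w$ from its prior first, then draw each $t_n$ given $\mathbf w$ and $x_n$ (parents before children, often called ancestral sampling). The sketch below illustrates this under simplifying assumptions that are not in the original article: a scalar weight with $y(x, w) = w x$, and made-up inputs and hyperparameter values.

```python
import random

def sample_joint(xs, alpha, beta, rng):
    """Draw (w, T) from p(w | alpha) * prod_n p(t_n | w, x_n, beta), scalar w."""
    # Parent node first: w ~ N(0, alpha^{-1}); gauss() takes a standard deviation.
    w = rng.gauss(0.0, alpha ** -0.5)
    # Then its N children: t_n ~ N(w * x_n, beta^{-1}).
    ts = [rng.gauss(w * x, beta ** -0.5) for x in xs]
    return w, ts

rng = random.Random(0)          # fixed seed so the draw is reproducible
xs = [-1.0, 0.0, 1.0, 2.0]      # made-up deterministic inputs
w, ts = sample_joint(xs, alpha=1.0, beta=100.0, rng=rng)
print(w, ts)
```

The inputs `xs` and the hyperparameters are passed in as plain arguments rather than sampled, mirroring their role as deterministic parameters in the graph.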

[Figure: the plate diagram with $x_n$, $\alpha$ and $\beta$ added as small solid dots]

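Observed variables

During training, the random variables $\mathrm T =(t_1,\dots,t_N)$ are all known to the model; they are observed variables. Correspondingly, $\mathbf w$ is not observed and is called a latent variable.

In a Bayesian network, observed variables are drawn as shaded circles and latent variables as open circles, as shown in the figure below: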

[Figure: the same plate diagram with the observed node $t_n$ shaded]

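Adding the prediction variable

Our ultimate goal is to make predictions for new inputs. Given an input value $\hat x$, we want the probability distribution of the corresponding $\hat t$ conditioned on the observed data. The graphical model describing this problem is shown in the figure below:

[Figure: the diagram extended with the new input $\hat x$ and the prediction node $\hat t$]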

The joint distribution of all the random variables in this model is:
$$p(\hat t, \mathrm T, \mathbf w | \hat x, \mathbf{X},\alpha,\beta)=\left\{\prod_{n=1}^{N}p(t_n|\mathbf w,x_n,\beta)\right\}p(\mathbf w|\alpha)\,p(\hat t|\hat x, \mathbf w,\beta)$$

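Summary

  • Random variables are drawn as circles.
  • Observed variables are drawn as shaded circles and latent variables as open circles.
  • A box (a plate) denotes repeated nodes; the $N$ in its lower-right corner gives the number of repetitions.
  • Model parameters are drawn as small solid dots; in the joint distribution of the random variables they appear in the conditioning part, as in $p(\mathrm T, \mathbf w | \mathbf{X},\alpha,\beta)$.

Posterior distribution of the parameters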

For the training data, the joint distribution of all the random variables is:
$$p(\mathrm T, \mathbf w | \mathbf{X},\alpha,\beta)=p(\mathbf w|\alpha)\prod_{n=1}^{N}p(t_n|\mathbf w,x_n,\beta)$$
By Bayes' theorem, the posterior distribution of the parameter $\mathbf w$ is:
$$p(\mathbf w |\mathrm T, \mathbf{X},\alpha,\beta) = \frac{p(\mathrm T, \mathbf w | \mathbf{X},\alpha,\beta)}{p(\mathrm T|\mathbf{X},\alpha,\beta)}$$
Since $\mathrm T$ is observed, $p(\mathrm T|\mathbf{X},\alpha,\beta)$ is a constant with respect to $\mathbf w$, so:
$$p(\mathbf w |\mathrm T, \mathbf{X},\alpha,\beta) \propto p(\mathrm T, \mathbf w | \mathbf{X},\alpha,\beta) = p(\mathbf w|\alpha)\prod_{n=1}^{N}p(t_n|\mathbf w,x_n,\beta)$$
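
Because both the prior and the likelihood are Gaussian, this product is itself proportional to a Gaussian in $\mathbf w$, so the posterior has a closed form. The sketch below works this out for the simplest case and is only an illustration: it assumes a scalar weight with $y(x, w) = w x$, for which the posterior precision is $\alpha + \beta\sum_n x_n^2$ and the posterior mean is $\beta \sum_n x_n t_n$ divided by that precision. The data values are made up.

```python
def posterior(xs, ts, alpha, beta):
    """Mean and variance of p(w | T, X, alpha, beta) for scalar y = w * x."""
    # Precisions add: prior precision alpha plus data precision beta * sum(x^2).
    precision = alpha + beta * sum(x * x for x in xs)
    variance = 1.0 / precision
    mean = beta * variance * sum(x * t for x, t in zip(xs, ts))
    return mean, variance

xs = [0.0, 1.0, 2.0]   # made-up inputs
ts = [0.1, 1.9, 4.2]   # made-up labels, roughly t = 2x
m_N, s2_N = posterior(xs, ts, alpha=1.0, beta=25.0)
print(m_N, s2_N)
```

With no data the function returns the prior (mean 0, variance $\alpha^{-1}$), and each additional sample can only increase the precision, which matches the intuition that observing more labels narrows the distribution over $w$.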

Predictive distribution

From the formula
$$p(\hat t, \mathrm T, \mathbf w | \hat x, \mathbf{X},\alpha,\beta)=\left\{\prod_{n=1}^{N}p(t_n|\mathbf w,x_n,\beta)\right\}p(\mathbf w|\alpha)\,p(\hat t|\hat x, \mathbf w,\beta)$$
the predictive distribution of the label $\hat t$ of a new input $\hat x$, given the training set $\{\mathbf{X},\mathrm T\}$, is
$$p(\hat t | \hat x, \mathbf{X},\mathrm T, \alpha,\beta) = \frac{p(\hat t, \mathrm T | \hat x, \mathbf{X},\alpha,\beta)}{p(\mathrm T|\mathbf{X},\alpha,\beta)}$$
Since $\mathrm T$ is observed, $p(\mathrm T|\mathbf{X},\alpha,\beta)$ is a constant with respect to $\hat t$, so:
$$p(\hat t | \hat x, \mathbf{X},\mathrm T, \alpha,\beta) \propto p(\hat t, \mathrm T | \hat x, \mathbf{X},\alpha,\beta) = \int p(\hat t, \mathrm T, \mathbf w | \hat x, \mathbf{X},\alpha,\beta)\, \mathrm d\mathbf w$$
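
For the Gaussian prior and Gaussian noise used here, the integral over $\mathbf w$ can be done analytically and the predictive distribution is again Gaussian. The sketch below illustrates the result under the assumption of a scalar weight with $y(x, w) = w x$ (a simplification, with made-up data): the predictive mean is the posterior mean of $w$ times $\hat x$, and the predictive variance is the noise variance $\beta^{-1}$ plus $\hat x^2$ times the posterior variance of $w$.

```python
def predictive(x_new, xs, ts, alpha, beta):
    """Mean and variance of the predictive distribution for scalar y = w * x."""
    # Posterior over w: precision alpha + beta * sum(x^2).
    s2_N = 1.0 / (alpha + beta * sum(x * x for x in xs))
    m_N = beta * s2_N * sum(x * t for x, t in zip(xs, ts))
    mean = m_N * x_new
    # Noise variance plus the uncertainty about w propagated through x_new.
    variance = 1.0 / beta + x_new ** 2 * s2_N
    return mean, variance

xs = [0.0, 1.0, 2.0]   # made-up inputs
ts = [0.1, 1.9, 4.2]   # made-up labels, roughly t = 2x
for x_new in (0.0, 1.0, 3.0):
    print(x_new, predictive(x_new, xs, ts, alpha=1.0, beta=25.0))
```

In this scalar model the predictive variance grows with the magnitude of `x_new`, since the uncertainty about $w$ is multiplied by the new input.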

References

Christopher M. Bishop, Pattern Recognition and Machine Learning (PRML), Springer, 2006.

