Probabilistic Graphical Models 11: Minimal I-Maps

Author: 孙相国 (Sun Xiangguo)

E-mail:sunxiangguodut@qq.com

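1. Introduction

As discussed earlier, in real problems the number of atomic outcomes of a joint distribution over the variables is usually enormous, and neither we nor our data can hope to cover all of them. This means the true distribution can rarely be recovered in full. What we can do is use the available data to uncover as large a subset as possible of the independencies that hold in the true distribution, and then build an I-map that satisfies that subset. This section asks: given a distribution $P$, to what extent can we construct a graph that is an I-map for $P$? In general, a given set of independencies admits several such graphs, so we want to single out a particular one.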

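2. Review

Theorem 1: Let $G$ be a Bayesian network structure over a set of variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $G$ is an I-map for $P$, then $P$ factorizes according to $G$.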

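Proof:

Assume, without loss of generality, that $X_1, X_2, \ldots, X_n$ is a topological ordering of $G$.

By the chain rule of probability:

$$P(X_1, \ldots, X_n) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \ldots, X_{n-1})$$

Since $G$ is an I-map for $P$, the local independencies of $G$, $\mathcal{I}_\ell(G) = \{(X_i \perp NonDescendants_{X_i} \mid Pa_{X_i}) : X_i \in X_{1:n}\}$, all hold in $P$; that is, $\mathcal{I}_\ell(G) \subseteq \mathcal{I}(P)$.

Because $X_1, X_2, \ldots, X_n$ is a topological ordering of $G$, for every factor $P(X_i \mid X_1, \ldots, X_{i-1})$ in the chain-rule expansion above, all parents of $X_i$ lie in $\{X_1, \ldots, X_{i-1}\}$, and this set contains no descendant of $X_i$. In other words, $\{X_1, \ldots, X_{i-1}\} = Pa_{X_i} \cup Z$ with $Z \subseteq NonDescendants_{X_i}$. By the local independencies $\mathcal{I}_\ell(G)$ and the decomposition property of conditional independence, $P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid Pa_{X_i} \cup Z) = P(X_i \mid Pa_{X_i})$. Substituting this into each factor of the chain-rule expansion gives the factorization $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_{X_i})$.

Q.E.D.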
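As a small worked instance (the three-variable chain $A \to B \to C$ is an assumption chosen for illustration, not an example from the original post), take the topological ordering $A, B, C$. The only nontrivial local independence is $(C \perp A \mid B)$, so the chain-rule expansion collapses to the network factorization:

$$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A, B) = P(A)\,P(B \mid A)\,P(C \mid B)$$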

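Theorem 2: Let $G$ be a Bayesian network structure over a set of variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $P$ factorizes according to $G$, then $G$ is an I-map for $P$.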

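Proof:

Let $P$ be a distribution that factorizes according to $G$. We need to show that $\mathcal{I}_\ell(G)$ holds in $P$. Consider the local independence assumption for an arbitrary variable $X_k$, namely $(X_k \perp NonDescendants_{X_k} \mid Pa_{X_k})$. To show that it holds in $P$, we must show:

$$P(X_k \mid NonDescendants_{X_k}, Pa_{X_k}) = P(X_k \mid Pa_{X_k}) \tag{1}$$

By the definition of conditional probability,

$$P(X_k \mid NonDescendants_{X_k}, Pa_{X_k}) = \frac{P(X_k, NonDescendants_{X_k}, Pa_{X_k})}{P(NonDescendants_{X_k}, Pa_{X_k})} \tag{2}$$

By the chain rule for Bayesian networks, after summing out the descendants of $X_k$ (each of their factors sums to 1), the numerator is:

$$P(X_k, NonDescendants_{X_k}, Pa_{X_k}) = \prod_{X_i \notin Descendants_{X_k}} P(X_i \mid Pa_{X_i}) \tag{3}$$

Marginalizing $X_k$ out of this joint gives the denominator. The sum can be pushed onto the single factor $P(X_k \mid Pa_{X_k})$ because $X_k$ appears as a parent only in the factors of its own children, which are descendants and have already been summed out:

$$\begin{aligned}
P(NonDescendants_{X_k}, Pa_{X_k}) &= \sum_{X_k} P(X_k, NonDescendants_{X_k}, Pa_{X_k}) \\
&= \sum_{X_k} \prod_{X_i \notin Descendants_{X_k}} P(X_i \mid Pa_{X_i}) \\
&= \sum_{X_k} P(X_k \mid Pa_{X_k}) \prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i}) \\
&= \prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i}) \sum_{X_k} P(X_k \mid Pa_{X_k}) \\
&= \prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i})
\end{aligned} \tag{4}$$

Thus (2) can be written as:

$$\begin{aligned}
P(X_k \mid NonDescendants_{X_k}, Pa_{X_k}) &= \frac{P(X_k, NonDescendants_{X_k}, Pa_{X_k})}{P(NonDescendants_{X_k}, Pa_{X_k})} \\
&= \frac{\prod_{X_i \notin Descendants_{X_k}} P(X_i \mid Pa_{X_i})}{\prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i})} \\
&= \frac{P(X_k \mid Pa_{X_k}) \prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i})}{\prod_{X_i \notin Descendants_{X_k},\, X_i \neq X_k} P(X_i \mid Pa_{X_i})} \\
&= P(X_k \mid Pa_{X_k})
\end{aligned}$$

Q.E.D.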
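Theorem 2 can be sanity-checked numerically on a toy case. The sketch below assumes a chain A -> B -> C with made-up CPD values (none of these numbers come from the post), builds the joint by the network factorization, and then confirms from the joint alone that P(C | A, B) equals P(C | B), i.e. that the local independence (C ⊥ A | B) holds.

```python
from itertools import product

# Hypothetical CPDs for a three-node chain A -> B -> C (all variables binary).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_c_given_b[b][c]

# Joint built by the network factorization: P(a, b, c) = P(a) P(b|a) P(c|b).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def cond_c_given_ab(a, b, c):
    """P(C=c | A=a, B=b), computed from the joint table only."""
    return joint[(a, b, c)] / sum(joint[(a, b, k)] for k in (0, 1))


# Largest deviation between P(C | A, B) and the CPD P(C | B) over all values.
max_gap = max(
    abs(cond_c_given_ab(a, b, c) - p_c_given_b[b][c])
    for a, b, c in product((0, 1), repeat=3)
)
print(max_gap)  # zero up to floating-point rounding
```

This only illustrates the theorem for one small network; the proof above is what covers the general case.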

3. minimal I-map

A graph $G$ is a minimal I-map for a set of independencies $\mathcal{I}$ if it is an I-map for $\mathcal{I}$, and if the removal of even a single edge from $G$ renders it not an I-map.

Theorems 1 and 2 of Section 2 give us the basis for constructing a minimal I-map. We assume we are given a predetermined variable ordering, say, $\{X_1, \ldots, X_n\}$. We now examine each variable $X_i$, $i = 1, \ldots, n$ in turn. For each $X_i$, we pick some minimal subset $U$ of $\{X_1, \ldots, X_{i-1}\}$ to be $X_i$'s parents in $G$. More precisely, we require that $U$ satisfy $(X_i \perp \{X_1, \ldots, X_{i-1}\} - U \mid U)$, and that no node can be removed from $U$ without violating this property. We then set $U$ to be the parents of $X_i$.

The proof of Theorem 1 tells us that, if each node $X_i$ is independent of $X_1, \ldots, X_{i-1}$ given its parents in $G$, then $P$ factorizes over $G$. We can then conclude from Theorem 2 that $G$ is an I-map for $P$. By construction, $G$ is minimal, so that $G$ is a minimal I-map for $P$.
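The construction above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the textbook's pseudocode: the joint is a dict that lists every full assignment (zero-probability ones included), independence is tested exactly on that table with a small tolerance, and the names `marginal`, `is_independent`, `build_minimal_imap` and the v-structure example X -> Z <- Y with its CPD values are all hypothetical.

```python
from itertools import combinations, product


def marginal(joint, names, keep):
    """Sum the joint (dict: assignment tuple -> prob) down to the variables in keep."""
    idx = [names.index(v) for v in keep]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def is_independent(joint, names, x, ys, zs, tol=1e-9):
    """Test (x ⊥ ys | zs) via P(x,ys,zs) P(zs) == P(x,zs) P(ys,zs) for all values."""
    ys, zs = list(ys), list(zs)
    if not ys:
        return True  # nothing left to be independent of
    p_xyz = marginal(joint, names, [x] + ys + zs)
    p_xz = marginal(joint, names, [x] + zs)
    p_yz = marginal(joint, names, ys + zs)
    p_z = marginal(joint, names, zs)
    n = len(ys)
    for key, p in p_xyz.items():
        xv, yv, zv = key[:1], key[1:1 + n], key[1 + n:]
        if abs(p * p_z[zv] - p_xz[xv + zv] * p_yz[yv + zv]) > tol:
            return False
    return True


def build_minimal_imap(joint, names, ordering):
    """Return {node: tuple of parents} for the given variable ordering."""
    parents = {}
    for i, x in enumerate(ordering):
        preds = ordering[:i]
        # Try candidate sets U from smallest to largest. The first U with
        # (x ⊥ preds - U | U) has minimum size, so no node can be dropped from it.
        for size in range(len(preds) + 1):
            hit = next(
                (u for u in combinations(preds, size)
                 if is_independent(joint, names, x,
                                   [v for v in preds if v not in u], u)),
                None,
            )
            if hit is not None:
                parents[x] = hit
                break
    return parents


# Hypothetical v-structure X -> Z <- Y with made-up parameters.
names = ["X", "Y", "Z"]
p_z1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Z=1 | x, y)
joint = {}
for x, y, z in product((0, 1), repeat=3):
    pz = p_z1[(x, y)] if z == 1 else 1 - p_z1[(x, y)]
    joint[(x, y, z)] = (0.3 if x else 0.7) * (0.6 if y else 0.4) * pz

print(build_minimal_imap(joint, names, ["X", "Y", "Z"]))
print(build_minimal_imap(joint, names, ["Z", "X", "Y"]))
```

With the ordering X, Y, Z the sketch recovers the v-structure (Z gets parents X and Y, and X and Y get none). With the ordering Z, X, Y it returns Z -> X, Z -> Y and X -> Y, a complete graph that is still a minimal I-map. Searching subsets by increasing size is exponential in the number of predecessors, so this is only practical for tiny examples.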


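In fact, given a topological ordering, the minimal parent set $U$ of a node $X_i$ need not be unique. For example, take three nodes $X_1, X_2, X_3$ in which $X_1$ and $X_2$ are logically equivalent (see the figure below). We can then choose either $X_1$ or $X_2$ as the parent of $X_3$, but once one of them is chosen the other cannot be chosen as well, since it would be redundant. Hence, the minimal parent set $U$ in our construction is not necessarily unique.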
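To spell this out under one concrete reading of "logically equivalent" (an assumption: $X_2 = X_1$ with probability 1, and $X_3$ genuinely depends on $X_1$), every value $x$ with $P(X_1 = x) > 0$ satisfies

$$P(X_3 \mid X_1 = x, X_2 = x) = P(X_3 \mid X_1 = x) = P(X_3 \mid X_2 = x)$$

so both $(X_3 \perp X_2 \mid X_1)$ and $(X_3 \perp X_1 \mid X_2)$ hold, while $(X_3 \perp \{X_1, X_2\})$ does not. Both $U = \{X_1\}$ and $U = \{X_2\}$ are therefore valid minimal parent sets for $X_3$. Note that $P(X_1 = 0, X_2 = 1) = 0$ in this example, so the distribution is not positive.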

However, one can show that, if the distribution is positive (see definition 2.5), that is, if for any instantiation ξ to all the network variables X we have that P(ξ) > 0, then the choice of parent set, given an ordering, is unique. Under this assumption, algorithm 3.2 (Build-Minimal-I-Map, the construction described above) can produce all minimal I-maps for P: Let G be any minimal I-map for P. If we call Build-Minimal-I-Map with an ordering ≺ that is topological for G, then, due to the uniqueness argument, the algorithm must return G.

[Figure: three nodes $X_1$, $X_2$, $X_3$, where $X_1$ and $X_2$ are logically equivalent]

At first glance, the minimal I-map seems to be a reasonable candidate for capturing the structure in the distribution: It seems that if G is a minimal I-map for a distribution P, then we should be able to “read off” all of the independencies in P directly from G. Unfortunately, this intuition is false.

A distribution P is said to be positive if for all events α ∈ S such that α ≠ ∅, we have that P(α) > 0.


4. Problems with Minimal I-Maps

Note that the graphs in figure 3.8b,c really are minimal I-maps for this distribution. However, they fail to capture some or all of the independencies that hold in the distribution. Thus, they show that the fact that G is a minimal I-map for P is far from a guarantee that G captures the independence structure in P.

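[Figure 3.8: minimal I-maps for the student distribution under three orderings: (a) D, I, S, G, L; (b) L, S, G, I, D; (c) L, D, S, I, G]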

Consider the distribution $P_{B^{student}}$, as defined in figure 3.4, and let us go through the process of constructing a minimal I-map for $P_{B^{student}}$. We note that the graph $G_{student}$ precisely reflects the independencies in this distribution $P_{B^{student}}$ (that is, $\mathcal{I}(P_{B^{student}}) = \mathcal{I}(G_{student})$), so that we can use $G_{student}$ to determine which independencies hold in $P_{B^{student}}$.

Our construction process starts with an arbitrary ordering on the nodes; we will go through this process for three different orderings. Throughout this process, it is important to remember that we are testing independencies relative to the distribution $P_{B^{student}}$. We can use $G_{student}$ (figure 3.4) to guide our intuition about which independencies hold in $P_{B^{student}}$, but we can always resort to testing these independencies in the joint distribution $P_{B^{student}}$.

The first ordering is a very natural one: D, I, S, G, L. We add one node at a time and see which of the possible edges from the preceding nodes are redundant. We start by adding D, then I. We can now remove the edge from D to I because this particular distribution satisfies (I ⊥ D), so I is independent of D given its other parents (the empty set). Continuing on, we add S, but we can remove the edge from D to S because our distribution satisfies (S ⊥ D | I). We then add G, but we can remove the edge from S to G, because the distribution satisfies (G ⊥ S | I, D).

Finally, we add L, but we can remove all edges from D, I, S. Thus, our final output is the graph in figure 3.8a, which is precisely our original network for this distribution.

Now, consider a somewhat less natural ordering: L, S, G, I, D. In this case, the resulting I-map is not quite as natural or as sparse. To see this, let us consider the sequence of steps. We start by adding L to the graph. Since it is the first variable in the ordering, it must be a root. Next, we consider S. The decision is whether to have L as a parent of S. Clearly, we need an edge from L to S, because the quality of the student’s letter is correlated with his SAT score in this distribution, and S has no other parents that help render it independent of L. Formally, we have that (S ⊥ L) does not hold in the distribution. In the next iteration of the algorithm, we introduce G. Now, all possible subsets of {L, S} are potential parent sets for G. Clearly, G is dependent on L. Moreover, although G is independent of S given I, it is not independent of S given L. Hence, we must add the edge between S and G. Carrying out the procedure, we end up with the graph shown in figure 3.8b.

Finally, consider the ordering: L, D, S, I, G. In this case, a similar analysis results in the graph shown in figure 3.8c, which is almost a complete graph, missing only the edge from S to G, which we can remove because G is independent of S given I.
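The three orderings can be replayed numerically. The sketch below is self-contained and makes two assumptions that should be labeled clearly: the network structure (D -> G <- I, I -> S, G -> L) is read off the first ordering's result in the text, and the CPD numbers are illustrative values that may or may not match figure 3.4. The helper names are hypothetical, and the independence test is an exact brute-force check on the joint table.

```python
from itertools import combinations, product

NAMES = ["D", "I", "G", "S", "L"]
VALS = {"D": (0, 1), "I": (0, 1), "G": (1, 2, 3), "S": (0, 1), "L": (0, 1)}

# Assumed CPD values for the student network (illustrative, not verified).
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
P_G = {(0, 0): (0.3, 0.4, 0.3), (0, 1): (0.05, 0.25, 0.7),   # keyed by (i, d)
       (1, 0): (0.9, 0.08, 0.02), (1, 1): (0.5, 0.3, 0.2)}
P_S = {0: (0.95, 0.05), 1: (0.2, 0.8)}                        # keyed by i
P_L = {1: (0.1, 0.9), 2: (0.4, 0.6), 3: (0.99, 0.01)}         # keyed by g

# Joint over (d, i, g, s, l) from the factorization P(D)P(I)P(G|I,D)P(S|I)P(L|G).
JOINT = {
    (d, i, g, s, l): P_D[d] * P_I[i] * P_G[(i, d)][g - 1] * P_S[i][s] * P_L[g][l]
    for d, i, g, s, l in product(*(VALS[n] for n in NAMES))
}


def marginal(keep):
    """Marginal table over the variables in keep, in that order."""
    idx = [NAMES.index(v) for v in keep]
    out = {}
    for assignment, p in JOINT.items():
        key = tuple(assignment[j] for j in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(x, ys, zs, tol=1e-9):
    """Test (x ⊥ ys | zs) via P(x,ys,zs) P(zs) == P(x,zs) P(ys,zs)."""
    ys, zs = list(ys), list(zs)
    if not ys:
        return True
    p_xyz, p_xz = marginal([x] + ys + zs), marginal([x] + zs)
    p_yz, p_z = marginal(ys + zs), marginal(zs)
    n = len(ys)
    return all(
        abs(p * p_z[k[1 + n:]] - p_xz[k[:1] + k[1 + n:]] * p_yz[k[1:]]) <= tol
        for k, p in p_xyz.items()
    )


def minimal_imap(ordering):
    """Smallest parent set per node among its predecessors in the ordering."""
    parents = {}
    for i, x in enumerate(ordering):
        preds = ordering[:i]
        for size in range(len(preds) + 1):
            hit = next(
                (u for u in combinations(preds, size)
                 if independent(x, [v for v in preds if v not in u], u)),
                None,
            )
            if hit is not None:
                parents[x] = hit
                break
    return parents


# Each key spells an ordering, e.g. "DISGL" means D, I, S, G, L.
results = {name: minimal_imap(list(name)) for name in ("DISGL", "LSGID", "LDSIG")}
for name, parents in results.items():
    print(name, sum(len(u) for u in parents.values()), parents)
```

Under these assumed parameters the three orderings yield 4, 7 and 9 edges respectively. The first result is the original network, and the third is a complete graph minus the edge from S to G, in line with the discussion above.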

To address this problem, the next concept we will introduce is the P-map; see the next post in this series.
