Author: 孙相国
E-mail: sunxiangguodut@qq.com
1. Introduction
As we discussed previously, in practical problems the number of atomic events in the joint distribution of the variables is usually enormous, and neither we nor our data can possibly cover every case. This means we can rarely recover the true distribution in full; the best we can do is to use the available data to uncover as large a subset of the independencies of the true distribution as possible, and then construct an I-map that satisfies this set of independencies. The question for this post is: given a distribution P, to what extent can we construct a graph G that captures its structure?
2. Review
Theorem 1: Let G be a Bayesian network structure over the variables X1, …, Xn, and let P be a joint distribution over the same space. If G is an I-map for P, then P factorizes according to G:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa_Xi)    (1)

Proof:
Assume, without loss of generality, that X1, X2, …, Xn is a topological ordering of G.

By the chain rule of probability:

P(X1, …, Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ⋯ P(Xn | X1, …, Xn−1)

Since G is an I-map for P, the local independencies of G, Iℓ(G) = {(Xi ⊥ NonDescendants_Xi | Pa_Xi) : i = 1, …, n}, satisfy Iℓ(G) ⊆ I(P). Because X1, …, Xn is a topological ordering of G, in each factor P(Xi | X1, …, Xi−1) of the chain rule, all parents of Xi lie in {X1, …, Xi−1}, and this set contains no descendant of Xi. That is, {X1, …, Xi−1} = Pa_Xi ∪ Z with Z ⊆ NonDescendants_Xi. By the local independencies and the decomposition property of conditional independence,

P(Xi | X1, …, Xi−1) = P(Xi | Pa_Xi ∪ Z) = P(Xi | Pa_Xi),

and substituting each factor into the chain rule yields the factorization (1). ∎
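As a quick numeric sanity check of the key proof step, the following sketch builds a three-node chain X → Y → Z (all variable names and CPD values here are our own illustrative choices, not from the text) and verifies that each chain-rule factor collapses onto the parent set, i.e. P(Z | X, Y) = P(Z | Pa_Z) = P(Z | Y):

```python
from itertools import product

# Chain X -> Y -> Z with illustrative binary CPDs (assumed values).
pX = {0: 0.6, 1: 0.4}
pY_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # pY_X[x][y] = P(Y=y | X=x)
pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # pZ_Y[y][z] = P(Z=z | Y=y)

# Joint built from the CPDs, so the local independencies of the chain hold.
P = {(x, y, z): pX[x] * pY_X[x][y] * pZ_Y[y][z]
     for x, y, z in product((0, 1), repeat=3)}

def marg(P, keep):
    """Marginal over the variable positions listed in `keep`."""
    out = {}
    for assg, p in P.items():
        key = tuple(assg[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Key proof step: P(Z | X, Y) = P(Z | Y), because X is a nondescendant
# of Z outside its parent set.
Pxyz, Pxy, Pyz, Py = P, marg(P, (0, 1)), marg(P, (1, 2)), marg(P, (1,))
for x, y, z in product((0, 1), repeat=3):
    lhs = Pxyz[(x, y, z)] / Pxy[(x, y)]   # P(z | x, y)
    rhs = Pyz[(y, z)] / Py[(y,)]          # P(z | y)
    assert abs(lhs - rhs) < 1e-12
print("each chain-rule factor collapses onto the parent set")
```

Here the joint is built from the factorization, so the check simply confirms numerically that the chain-rule factor P(Z | X, Y) equals the CPD P(Z | Y), which is the collapse the proof performs symbolically.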
Theorem 2: Let G be a Bayesian network structure over the variables X1, …, Xn, and let P be a joint distribution over the same space. If P factorizes according to G, then G is an I-map for P.

Proof:
We must show that every local independence (Xi ⊥ NonDescendants_Xi | Pa_Xi) holds in P. Fix Xi, and let U = Pa_Xi and Z = NonDescendants_Xi − Pa_Xi. By definition,

P(Xi | U, Z) = P(Xi, U, Z) / P(U, Z).    (2)

By the chain rule of the Bayesian network, the numerator of this fraction is obtained from the factorized joint by summing out the descendants of Xi. Summing them out in reverse topological order eliminates each descendant's CPD in turn (each CPD sums to 1), and since the parents of every nondescendant are themselves nondescendants, what remains is

P(Xi, U, Z) = P(Xi | Pa_Xi) ∏_{Xj ∈ U ∪ Z} P(Xj | Pa_Xj).

By further marginalizing Xi out of the joint distribution, the denominator is

P(U, Z) = ∏_{Xj ∈ U ∪ Z} P(Xj | Pa_Xj),

since Xi appears in no CPD of a nondescendant and Σ_{Xi} P(Xi | Pa_Xi) = 1. Thus, (2) can be written as

P(Xi | U, Z) = P(Xi | Pa_Xi),

which is exactly the required local independence. ∎
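The same cancellation can be checked numerically. The sketch below (our own toy example with assumed CPD values) builds a v-structure X → Z ← Y from its factorization and confirms the local independence (X ⊥ Y) that theorem 2 guarantees, since Y is a nondescendant of X and Pa_X = ∅:

```python
from itertools import product

# V-structure X -> Z <- Y with illustrative binary CPDs (assumed values).
pX = {0: 0.3, 1: 0.7}
pY = {0: 0.55, 1: 0.45}
pZ_XY = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
         (1, 0): {0: 0.35, 1: 0.65}, (1, 1): {0: 0.05, 1: 0.95}}

# P factorizes over the graph by construction.
P = {(x, y, z): pX[x] * pY[y] * pZ_XY[(x, y)][z]
     for x, y, z in product((0, 1), repeat=3)}

# Marginalizing Z sums its CPD to 1, exactly as in the proof, so the
# local independence (X ⊥ Y) must hold in P.
Pxy = {}
for (x, y, z), p in P.items():
    Pxy[(x, y)] = Pxy.get((x, y), 0.0) + p
for x, y in product((0, 1), repeat=2):
    assert abs(Pxy[(x, y)] - pX[x] * pY[y]) < 1e-12
print("(X ⊥ Y) holds in the factorized P")
```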
3. Minimal I-Map
A graph K is a minimal I-map for a set of independencies I if it is an I-map for I, and if the removal of even a single edge from K renders it not an I-map.
Theorems 1 and 2 of Section 2 give us a basis for finding a minimal I-map. We assume we are given a predetermined variable ordering, say, X1, …, Xn. We now examine each variable Xi, i = 1, …, n in turn. For each Xi, we pick some minimal subset U of {X1, …, Xi−1} to be Xi's parents in G. More precisely, we require that U satisfy (Xi ⊥ {X1, …, Xi−1} − U | U), and that no node can be removed from U without violating this property. We then set U to be the parents of Xi.
The proof of theorem 1 tells us that, if each node Xi is independent of X1, …, Xi−1 given its parents in G, then P factorizes over G. We can then conclude from theorem 2 that G is an I-map for P. By construction, G is minimal, so G is a minimal I-map for P.
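The procedure above can be sketched in code. This is our own illustrative implementation, not the book's pseudocode: the names are ours, and the independence oracle is a brute-force test over an explicit joint table, which is feasible only for tiny discrete distributions.

```python
from itertools import product, combinations

def indep(P, A, B, C, tol=1e-9):
    """Brute-force test of (A ⊥ B | C) in a joint P given as a dict mapping
    full assignment tuples to probabilities; A, B, C are disjoint tuples of
    variable positions."""
    def marg(keep):
        out = {}
        for assg, p in P.items():
            key = tuple(assg[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out
    Pabc, Pac, Pbc, Pc = marg(A + B + C), marg(A + C), marg(B + C), marg(C)
    for assg, p in Pabc.items():
        a, b, c = assg[:len(A)], assg[len(A):len(A) + len(B)], assg[len(A) + len(B):]
        # (A ⊥ B | C)  iff  P(a,b,c) P(c) = P(a,c) P(b,c) for all assignments
        if abs(p * Pc[c] - Pac[a + c] * Pbc[b + c]) > tol:
            return False
    return True

def build_minimal_imap(P, order):
    """For each X_i in the ordering, pick a minimal subset U of its
    predecessors with (X_i ⊥ predecessors − U | U); U becomes Pa_{X_i}."""
    parents = {}
    for i, v in enumerate(order):
        pred, found = order[:i], None
        for k in range(i + 1):                   # try smallest subsets first
            for U in combinations(pred, k):
                rest = tuple(w for w in pred if w not in U)
                if not rest or indep(P, (v,), rest, U):
                    found = set(U)
                    break
            if found is not None:
                break
        parents[v] = found
    return parents

# Demo on the chain X -> Y -> Z (positions 0, 1, 2), illustrative CPDs.
pX = {0: 0.6, 1: 0.4}
pY_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}
P = {(x, y, z): pX[x] * pY_X[x][y] * pZ_Y[y][z]
     for x, y, z in product((0, 1), repeat=3)}

print(build_minimal_imap(P, (0, 1, 2)))   # X, Y, Z: recovers the chain
print(build_minimal_imap(P, (2, 1, 0)))   # Z, Y, X: the reversed chain
```

Note that both orderings return valid minimal I-maps (the chain and its reversal), which already hints at the dependence on the chosen ordering discussed below.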
In fact, even given a fixed ordering, the minimal parent set U for a node Xi is not necessarily unique: the same distribution may admit several different minimal parent sets for the same variable.
However, one can show that, if the distribution is positive (see the definition below), that is, if for any instantiation ξ of all the network variables X we have P(ξ) > 0, then the choice of parent set, given an ordering, is unique. Under this assumption, the procedure above (Build-Minimal-I-Map) can produce every minimal I-map for P: let G be any minimal I-map for P. If we call Build-Minimal-I-Map with an ordering ≺ that is topological for G, then, by the uniqueness argument, the algorithm must return G.
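To see why positivity matters, here is a minimal sketch (our own toy example, with assumed CPD values) of a non-positive distribution in which the minimal parent set is not unique: B is a deterministic copy of A, so a variable C that depends on A can equally take {A} or {B} as a minimal parent set.

```python
from itertools import product

# Toy non-positive distribution: B is a deterministic copy of A, and C
# depends on A.  Assignments with a != b have probability zero.
pC_A = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
P = {(a, b, c): (0.5 if a == b else 0.0) * pC_A[a][c]
     for a, b, c in product((0, 1), repeat=3)}

def indep(P, A, B, C, tol=1e-9):
    """Brute-force test of (A ⊥ B | C); A, B, C are tuples of positions."""
    def marg(keep):
        out = {}
        for assg, p in P.items():
            key = tuple(assg[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out
    Pabc, Pac, Pbc, Pc = marg(A + B + C), marg(A + C), marg(B + C), marg(C)
    for assg, p in Pabc.items():
        a, b, c = assg[:len(A)], assg[len(A):len(A) + len(B)], assg[len(A) + len(B):]
        if abs(p * Pc[c] - Pac[a + c] * Pbc[b + c]) > tol:
            return False
    return True

# Both (C ⊥ B | A) and (C ⊥ A | B) hold, so under the ordering A, B, C
# both {A} and {B} are minimal parent sets for C.
print(indep(P, (2,), (1,), (0,)), indep(P, (2,), (0,), (1,)))   # True True
# But C is not independent of {A, B} outright, so the empty set fails:
print(indep(P, (2,), (0, 1), ()))                               # False
```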
At first glance, the minimal I-map seems to be a reasonable candidate for capturing the structure in the distribution: It seems that if G is a minimal I-map for a distribution P, then we should be able to “read off” all of the independencies in P directly from G. Unfortunately, this intuition is false.
Definition (positive distribution): A distribution P is said to be positive if for all events α ∈ S such that α ≠ ∅, we have P(α) > 0.
4. The Problem with Minimal I-Maps
As we will see, the graphs of figure 3.8b,c constructed below really are minimal I-maps for the student distribution, yet they fail to capture some or all of the independencies that hold in it. Thus, the fact that G is a minimal I-map for P is far from a guarantee that G captures the independence structure in P.
Consider the distribution P_B^student, as defined in figure 3.4, and let us go through the process of constructing a minimal I-map for it. We note that the graph G_student precisely reflects the independencies in this distribution (that is, I(P_B^student) = I(G_student)), so we can use G_student to determine which independencies hold in P_B^student.
Our construction process starts with an arbitrary ordering on the nodes; we will go through this process for three different orderings. Throughout, it is important to remember that we are testing independencies relative to the distribution P_B^student. We can use G_student (figure 3.4) to guide our intuition about which independencies hold, but we can always resort to testing them in the joint distribution itself.
The first ordering is a very natural one: D, I, S, G, L. We add one node at a time and see which of the possible edges from the preceding nodes are redundant. We start by adding D, then I. We can now remove the edge from D to I because this particular distribution satisfies (I ⊥ D), so I is independent of D given its other parents (the empty set). Continuing on, we add S, but we can remove the edge from D to S because our distribution satisfies (S ⊥ D | I). We then add G, but we can remove the edge from S to G, because the distribution satisfies (G ⊥ S | I, D).
Finally, we add L, but we can remove all edges from D, I, S. Thus, our final output is the graph in figure 3.8a, which is precisely our original network for this distribution.
Now, consider a somewhat less natural ordering: L, S, G, I, D. In this case, the resulting I-map is not quite as natural or as sparse. To see this, let us consider the sequence of steps. We start by adding L to the graph. Since it is the first variable in the ordering, it must be a root. Next, we consider S. The decision is whether to have L as a parent of S. Clearly, we need an edge from L to S, because the quality of the student's letter is correlated with his SAT score in this distribution, and S has no other parents that help render it independent of L. Formally, (S ⊥ L) does not hold in the distribution. In the next iteration of the algorithm, we introduce G. Now, all possible subsets of {L, S} are potential parent sets for G. Clearly, G is dependent on L. Moreover, although G is independent of S given I, it is not independent of S given L. Hence, we must add the edge between S and G. Carrying out the procedure, we end up with the graph shown in figure 3.8b.
Finally, consider the ordering: L, D, S, I, G. In this case, a similar analysis results in the graph shown in figure 3.8c, which is almost a complete graph, missing only the edge from S to G, which we can remove because G is independent of S given I.
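The three orderings can be replayed in code. The sketch below is our own binary simplification of the student network with made-up CPD values (in figure 3.4, G even has three values, so the numbers here are purely illustrative); it reuses the brute-force independence test and the minimal-parent search from earlier and reports the edge count for each ordering.

```python
from itertools import product, combinations

# Binary simplification of the student network; positions 0..4 stand for
# D, I, S, G, L.  CPD values are illustrative, not figure 3.4's.
pD = {0: 0.6, 1: 0.4}
pI = {0: 0.7, 1: 0.3}
pS1 = {0: 0.05, 1: 0.8}                                      # P(S=1 | I=i)
pG1 = {(0, 0): 0.3, (0, 1): 0.05, (1, 0): 0.9, (1, 1): 0.5}  # P(G=1 | I=i, D=d)
pL1 = {0: 0.1, 1: 0.9}                                       # P(L=1 | G=g)

def bern(p1, x):
    return p1 if x == 1 else 1.0 - p1

P = {(d, i, s, g, l):
         pD[d] * pI[i] * bern(pS1[i], s) * bern(pG1[(i, d)], g) * bern(pL1[g], l)
     for d, i, s, g, l in product((0, 1), repeat=5)}

def indep(P, A, B, C, tol=1e-9):
    """Brute-force test of (A ⊥ B | C); A, B, C are tuples of positions."""
    def marg(keep):
        out = {}
        for assg, p in P.items():
            key = tuple(assg[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out
    Pabc, Pac, Pbc, Pc = marg(A + B + C), marg(A + C), marg(B + C), marg(C)
    for assg, p in Pabc.items():
        a, b, c = assg[:len(A)], assg[len(A):len(A) + len(B)], assg[len(A) + len(B):]
        if abs(p * Pc[c] - Pac[a + c] * Pbc[b + c]) > tol:
            return False
    return True

def minimal_imap(P, order):
    """Minimal parent set for each variable, smallest subsets tried first."""
    parents = {}
    for i, v in enumerate(order):
        pred, found = order[:i], None
        for k in range(i + 1):
            for U in combinations(pred, k):
                rest = tuple(w for w in pred if w not in U)
                if not rest or indep(P, (v,), rest, U):
                    found = set(U)
                    break
            if found is not None:
                break
        parents[v] = found
    return parents

names = "DISGL"
results = {}
for order in [(0, 1, 2, 3, 4),    # D, I, S, G, L: the original graph
              (4, 2, 3, 1, 0),    # L, S, G, I, D: figure 3.8b
              (4, 0, 2, 1, 3)]:   # L, D, S, I, G: figure 3.8c
    pa = minimal_imap(P, order)
    results[order] = pa
    print("".join(names[v] for v in order), "->",
          sum(len(u) for u in pa.values()), "edges")
```

With these CPDs the three runs report 4, 7 and 9 edges, matching the qualitative story above: the natural ordering recovers the original network, while the other two produce denser, less informative minimal I-maps.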
To address this problem, the next concept we will introduce is the P-map; see the next post in this series.