Sum-Product Networks: A New Deep Architecture
H. Poon, P. Domingos, Sum-Product Networks: A New Deep Architecture, UAI (2011), Best Paper
Abstract
A key limiting factor in inference and learning for graphical models is the complexity of the partition function.
This paper proposes sum-product networks (SPNs): directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges.
If an SPN is complete and consistent, it represents the partition function and all marginals of some graphical model, and its nodes have well-defined semantics.
The paper derives learning algorithms for SPNs based on backpropagation and EM.
SPNs are faster and more accurate than conventional deep networks in both learning and inference.
1 Introduction
Graphical models compactly represent distributions as normalized products of factors: $P(X = x) = \frac{1}{Z} \prod_{k} \phi_{k}(x_{\{k\}})$, where

- $x \in \mathcal{X}$ is a $d$-dimensional vector;
- each potential $\phi_{k}$ is a function of a subset $x_{\{k\}}$ of the variables (its scope);
- $Z = \sum_{x \in \mathcal{X}} \prod_{k} \phi_{k}(x_{\{k\}})$ is the partition function.
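The definitions above can be made concrete with a brute-force sketch; the two potential tables and their scopes below are made up for illustration and are not from the paper:

```python
import itertools

# Illustrative potentials over three Boolean variables x0, x1, x2:
# phi0 has scope {x0, x1}, phi1 has scope {x1, x2}.
phi0 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}
phi1 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
factors = [(phi0, (0, 1)), (phi1, (1, 2))]  # (table, scope indices)

def unnormalized(x):
    """Product of all potentials, each evaluated on its scope."""
    p = 1.0
    for table, scope in factors:
        p *= table[tuple(x[i] for i in scope)]
    return p

# The partition function sums over all 2^d joint states -- exponential in d.
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=3))

def prob(x):
    """Normalized probability P(X = x)."""
    return unnormalized(x) / Z
```

Dividing by $Z$ makes the probabilities over all $2^{d}$ states sum to 1, which is exactly why computing $Z$ is the bottleneck.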
Drawbacks of graphical models:

- some distributions cannot be represented compactly in this form;
- in the worst case, inference takes exponential time;
- in the worst case, the sample size required for accurate learning grows exponentially with scope size;
- because learning requires inference as a subroutine, it can take exponential time even with fixed scopes.
Introducing hidden variables $y$ can greatly improve the compactness of a graphical model: $P(X = x) = \frac{1}{Z} \sum_{y} \prod_{k} \phi_{k}((x, y)_{\{k\}})$.
Models with multiple layers of hidden variables allow efficient inference in a much larger class of distributions.
The partition function $Z$ can be computed efficiently if $\sum_{x \in \mathcal{X}} \prod_{k} \phi_{k}(x_{\{k\}})$ can be reorganized, using the distributive law, into a computation involving only a polynomial number of sums and products.
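The distributive-law reorganization can be sketched on a tiny two-factor chain $\phi_{0}(x_0, x_1)\,\phi_{1}(x_1, x_2)$; the tables here are made up for illustration:

```python
# Illustrative potential tables (not from the paper).
phi0 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}
phi1 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Brute force: one term per joint state, 2^3 products in total.
Z_brute = sum(phi0[x0, x1] * phi1[x1, x2]
              for x0 in (0, 1) for x1 in (0, 1) for x2 in (0, 1))

# Distributive law: push the sums over x0 and x2 inside the product,
#   Z = sum_{x1} ( sum_{x0} phi0(x0,x1) ) * ( sum_{x2} phi1(x1,x2) ),
# which uses only a polynomial number of sums and products.
Z_fast = sum(sum(phi0[x0, x1] for x0 in (0, 1)) *
             sum(phi1[x1, x2] for x2 in (0, 1))
             for x1 in (0, 1))
```

On a chain of $d$ variables the reorganized form needs $O(d)$ local sums instead of $2^{d}$ terms; an SPN is precisely a reusable encoding of such a reorganized computation.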
This paper proposes sum-product networks (SPNs). SPNs can be viewed as generalized directed acyclic graphs of mixture models, with sum nodes corresponding to mixtures over subsets of variables and product nodes corresponding to features or mixture components. SPNs admit efficient learning by backpropagation or EM.
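As a minimal sketch of this mixture view, the network below is a root sum node over two product nodes, each multiplying per-variable sum nodes over indicator leaves; the weights and leaf probabilities are made up:

```python
def leaf(p, x):
    """Sum node over the indicators [X] and [X-bar]: p*[X=1] + (1-p)*[X=0]."""
    return p if x == 1 else (1.0 - p)

def spn(x1, x2):
    """Evaluate a tiny SPN over two Boolean variables (illustrative weights)."""
    # Two product nodes: each is a component with independent variables.
    comp1 = leaf(0.9, x1) * leaf(0.2, x2)
    comp2 = leaf(0.1, x1) * leaf(0.7, x2)
    # Root sum node: a mixture of the two components, weights summing to 1.
    return 0.6 * comp1 + 0.4 * comp2

# With normalized weights, the values over all joint states sum to 1,
# so the network directly defines a distribution.
total = sum(spn(a, b) for a in (0, 1) for b in (0, 1))
```

Evaluating the network bottom-up takes time linear in the number of edges, which is the source of the efficient inference claimed above.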
2 Sum-Product Networks
Consider Boolean variables $X_{i}$, whose negations are written $\bar{X}_{i}$.
The indicator function $[\cdot]$ has value 1 when its argument is true and 0 otherwise. In this paper, the variable indicators $[X_{i}]$, $[\bar{X}_{i}]$