18 Gaussian Network
18.1 Introduction to Gaussian Networks
Probabilistic graphical models fall into several main families:
- Bayesian Network
- Markov Network
- Gaussian Network: the continuous-variable probabilistic graphical model, which further splits into:
- Gaussian Bayesian Network
- Gaussian Markov Network
The defining features of a Gaussian network are:
- Denote each node in the graph by $x_i$; every node follows a Gaussian distribution: $x_i \sim N(\mu_i, \Sigma_i)$.
- The whole graph can be written as $X = (x_1, x_2, \dots, x_p)^T$, and the graph corresponds to a multivariate Gaussian distribution:
$$p(x) = \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \cdot \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$$
- It inherits the independence property of the Gaussian distribution: with $\Sigma = (\sigma_{ij})_{p \times p}$,
$$x_i \perp x_j \iff \sigma_{ij} = 0$$
- It also has the graph property of conditional independence, $X_A \perp X_B \mid X_C$. Define the precision matrix (also called the information matrix) $\Lambda = \Sigma^{-1} = (\lambda_{ij})_{p \times p}$; then conditional independence reads:
$$x_i \perp x_j \mid X \setminus \{x_i, x_j\} \iff \lambda_{ij} = 0$$
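The contrast between the two independence statements can be seen on a toy example. The sketch below (NumPy, with an assumed tridiagonal precision matrix encoding a 3-node chain $x_1 - x_2 - x_3$) shows that $\lambda_{13} = 0$ (conditional independence given $x_2$) while $\sigma_{13} \neq 0$ ($x_1$ and $x_3$ remain marginally dependent):

```python
import numpy as np

# Tridiagonal precision matrix for the chain x1 - x2 - x3:
# lambda_13 = 0 encodes x1 ⟂ x3 | x2 (no edge between nodes 1 and 3).
Lam = np.array([[ 2.0, -1.0,  0.0],
                [-1.0,  2.0, -1.0],
                [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Lam)  # the corresponding covariance matrix

print(Lam[0, 2])    # 0.0  -> conditional independence holds
print(Sigma[0, 2])  # 0.25 -> x1 and x3 are still marginally dependent
```

So zeros in $\Lambda$ encode missing edges (conditional independence), while zeros in $\Sigma$ encode marginal independence; the two patterns generally differ.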
18.2 Gaussian Bayesian Network (the Gaussian directed graph)
A classic instance of a Gaussian Bayesian network is the Kalman Filter:
- It is a chain-structured, special case of the GBN.
- As in the HMM, its parameters are $\lambda = (\pi, A, B)$, but it additionally satisfies the Gaussian assumption:
$$\begin{cases} x_t \mid x_{t-1} \sim N(x_t \mid A x_{t-1} + B, Q) \\ y_t \mid x_t \sim N(y_t \mid C x_t + D, R) \end{cases}$$
and the linearity assumption:
$$\begin{cases} x_t = A x_{t-1} + B + \varepsilon, \quad \varepsilon \sim N(0, Q) \\ y_t = C x_t + D + \sigma, \quad \sigma \sim N(0, R) \end{cases}$$
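The linear form above translates directly into a generative simulation. A minimal sketch (scalar case; all parameter values here are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, Q = 0.9, 0.0, 0.1   # transition parameters (assumed values)
C, D, R = 1.0, 0.0, 0.5   # emission parameters (assumed values)

x, xs, ys = 0.0, [], []
for t in range(100):
    x = A * x + B + rng.normal(0.0, np.sqrt(Q))  # latent transition x_t
    y = C * x + D + rng.normal(0.0, np.sqrt(R))  # noisy observation y_t
    xs.append(x)
    ys.append(y)
```

Each step samples the next latent state from $N(Ax_{t-1}+B, Q)$ and an observation from $N(Cx_t+D, R)$, which is exactly the chain-structured local linear Gaussian model.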
Factorization of the GBN:
- A Bayesian network satisfies the factorization property:
$$P(x) = \prod_{i=1}^p P(x_i \mid x_{pa(i)})$$
- The GBN (the global model) is built from linear Gaussian models (the local models):
$$\begin{cases} p(x) = N(x \mid \mu_x, \Sigma_x) \\ p(y \mid x) = N(y \mid Ax + B, \Sigma_y) \end{cases}$$
- In the case where each $x_i$ is one-dimensional, we obtain:
$$\begin{cases} P(x) = \prod_{i=1}^p P(x_i \mid x_{pa(i)}) \\ x_{pa(i)} = (x_1, x_2, \dots, x_K)^T \end{cases} \implies \begin{cases} x_i \mid x_{pa(i)} \sim N(x_i \mid \mu_i + w_i^T x_{pa(i)}, \sigma_i^2) \\ x_i - \mu_i = \sum_{j \in pa(i)} w_{ij} (x_j - \mu_j) + \sigma_i \varepsilon_i, \quad \varepsilon_i \sim N(0, 1) \end{cases}$$
One can see at a glance that $x_i$ and its parents $x_j$ are linearly related; this is in fact a linear Gaussian model. Writing the relation in terms of $x_i - \mu_i$ simplifies the computation, since it amounts to translating the Gaussian to the origin.
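The factorization gives an ancestral sampling procedure: visit nodes in topological order and apply the local model $x_i = \mu_i + \sum_j w_{ij}(x_j - \mu_j) + \sigma_i \varepsilon_i$ at each. A sketch on an assumed toy DAG $x_1 \to x_2 \to x_3$ (all weights and means are made-up values):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 0.0])       # node means (assumed)
sigma = np.array([1.0, 0.5, 0.5])    # node noise scales (assumed)
parents = {0: [], 1: [0], 2: [1]}    # DAG x1 -> x2 -> x3, topological order
w = {(1, 0): 0.8, (2, 1): -0.5}      # edge weights w_ij (assumed)

def sample():
    x = np.zeros(3)
    for i in range(3):  # ancestral order: parents are sampled before children
        lin = sum(w[(i, j)] * (x[j] - mu[j]) for j in parents[i])
        x[i] = mu[i] + lin + sigma[i] * rng.normal()
    return x

samples = np.array([sample() for _ in range(5000)])
print(samples.mean(axis=0))  # should be close to mu
```

Because the local model is written in centered form, the sample mean of every node recovers $\mu_i$ regardless of the edge weights.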
18.3 Gaussian Markov Network (the Gaussian undirected graph)
Factorization of the Gaussian Markov network:
- What do we have so far? Two expressions for the same density:
$$\begin{cases} p(x) = \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \cdot \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\} \\ p(x) = \frac{1}{Z} \prod_{i=1}^p \underbrace{\varphi_i(x_i)}_{\text{node potential}} \cdot \prod_{i,j \in x} \underbrace{\varphi_{i,j}(x_i, x_j)}_{\text{edge potential}} \end{cases}$$
- Our goal is to connect the two. Let us expand the Gaussian density (with $x = (x_1, x_2, \dots, x_p)^T$, $\Lambda = \Sigma^{-1} = (\lambda_{ij})_{p \times p}$, and $\Sigma$ symmetric):
$$\begin{aligned} p(x) &\propto \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\} \\ &= \exp\Big\{ -\frac{1}{2} \big( x^T \Lambda x - \underbrace{x^T \Lambda \mu}_{1 \times 1} - \underbrace{\mu^T \Lambda x}_{1 \times 1} + \mu^T \Lambda \mu \big) \Big\} \\ &= \exp\Big\{ -\frac{1}{2} \big( x^T \Lambda x - 2 \mu^T \Lambda x + \underbrace{\mu^T \Lambda \mu}_{\text{independent of } x} \big) \Big\} \\ &\propto \exp\Big\{ \underbrace{-\frac{1}{2} x^T \Lambda x}_{\text{quadratic term}} + \underbrace{(\Lambda \mu)^T x}_{\text{linear term}} \Big\} \end{aligned}$$
Here $\Lambda$ is called the precision matrix and $\Lambda \mu$ the potential vector; write $\Lambda \mu = h = (h_1, h_2, \dots, h_p)^T$.
- In a Markov network the node potential depends on $x_i$ and the edge potential depends on the pair $x_i, x_j$, so from the expansion above we can read off what the potentials contain in the Gaussian case.
  Terms involving $x_i$ alone (any term in a single variable qualifies):
$$x_i: \; -\frac{1}{2} \lambda_{ii} x_i^2 + h_i x_i$$
  Terms involving the pair $x_i, x_j$ (these must come from the quadratic form, and both orderings $x_i x_j$ and $x_j x_i$ appear):
$$x_i, x_j: \; -\frac{1}{2} (\lambda_{ij} x_i x_j + \lambda_{ji} x_j x_i) = -\lambda_{ij} x_i x_j$$
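The decomposition of $-\frac{1}{2} x^T \Lambda x + h^T x$ into node terms and edge terms can be checked numerically. A sketch with assumed values for $\Lambda$, $h$, and a test point:

```python
import numpy as np

Lam = np.array([[ 2.0, -0.6,  0.0],
                [-0.6,  1.5,  0.4],
                [ 0.0,  0.4,  1.0]])  # symmetric precision matrix (assumed)
h = np.array([0.3, -0.2, 0.5])        # potential vector h = Lam @ mu (assumed)
x = np.array([1.0, -2.0, 0.7])        # an arbitrary test point

# Full log-potential (up to the normalizer):
full = -0.5 * x @ Lam @ x + h @ x
# Sum of node potentials: -1/2 lambda_ii x_i^2 + h_i x_i
node = sum(-0.5 * Lam[i, i] * x[i] ** 2 + h[i] * x[i] for i in range(3))
# Sum of edge potentials: -lambda_ij x_i x_j, one term per unordered pair
edge = sum(-Lam[i, j] * x[i] * x[j]
           for i in range(3) for j in range(i + 1, 3))

print(np.isclose(full, node + edge))  # True
```

The quadratic form contributes each off-diagonal pair twice ($\lambda_{ij} x_i x_j + \lambda_{ji} x_j x_i$), which is why the edge term for each unordered pair is $-\lambda_{ij} x_i x_j$ with no factor $\frac{1}{2}$.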
To summarize the properties of the GMN:
- In a Gaussian, marginal independence:
$$x_i \perp x_j \iff \sigma_{ij} = 0, \quad \Sigma = (\sigma_{ij})$$
- Conditional independence, as derived above:
$$x_i \perp x_j \mid X \setminus \{x_i, x_j\} \iff \lambda_{ij} = 0, \quad \Lambda = \Sigma^{-1} = (\lambda_{ij})$$
- In any Gaussian undirected graph, the conditional distribution of a single node given all the others is (taking $\mu = 0$ for simplicity):
$$\forall x_i, \quad \underbrace{x_i \mid X \setminus \{x_i\}}_{\text{conditional distribution}} \sim N\Big( -\sum_{j \neq i} \frac{\lambda_{ij}}{\lambda_{ii}} x_j, \; \lambda_{ii}^{-1} \Big)$$
Note that the conditional mean of $x_i$ is a linear combination of the nodes $x_j$ connected to it: for any $j$ not connected to $i$ we have $\lambda_{ij} = 0$, so those terms vanish.
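The single-node conditional formula can be verified against the standard Gaussian conditioning formula on the covariance, $\mu_{a|b} = \Sigma_{ab}\Sigma_{bb}^{-1} x_b$ and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$ (zero-mean case). A sketch with an assumed positive-definite precision matrix:

```python
import numpy as np

Lam = np.array([[ 2.0, -0.8,  0.3],
                [-0.8,  1.5, -0.4],
                [ 0.3, -0.4,  1.2]])  # positive-definite precision (assumed)
Sigma = np.linalg.inv(Lam)
x_rest = np.array([0.5, -1.0])        # observed values of x2, x3 (assumed)

# Formula from the precision matrix, conditioning node i = 0 on the rest:
mean_prec = -(Lam[0, 1] * x_rest[0] + Lam[0, 2] * x_rest[1]) / Lam[0, 0]
var_prec = 1.0 / Lam[0, 0]

# Standard conditioning on the covariance (Schur complement):
S_ab, S_bb = Sigma[0, 1:], Sigma[1:, 1:]
mean_cov = S_ab @ np.linalg.solve(S_bb, x_rest)
var_cov = Sigma[0, 0] - S_ab @ np.linalg.solve(S_bb, S_ab)

print(np.isclose(mean_prec, mean_cov), np.isclose(var_prec, var_cov))
```

Both routes give the same conditional mean and variance; the precision-matrix route makes the linear-combination structure, and the sparsity over non-neighbors, explicit.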