VIII. Probabilistic Graphical Models
4. Bayesian Network - Concrete Models
$$
\text{Bayesian Network}
\begin{cases}
\ \ \text{Single: Naive Bayes} \longrightarrow P(x \mid y)=\prod_{i=1}^{p}p(x_i\mid y)\\
\left.\begin{array}{l}
\text{Mixture: GMM}\\
\text{Temporal:}
\begin{cases}
\text{Markov Chain}\\
\text{Gaussian Process (infinite-dimensional Gaussian distribution)}
\end{cases}
\end{array}\right\}\text{Dynamic models}\\
\ \ \text{Continuous: Gaussian Bayesian Network}
\end{cases}
$$
$$
\text{Dynamic models}
\begin{cases}
\text{HMM (discrete states)}\\
\text{LDS (Kalman Filter) (continuous, linear)}\\
\text{Particle Filter}
\end{cases}
$$
Naive Bayes
tail to tail
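The Naive Bayes factorization $P(x \mid y)=\prod_i p(x_i \mid y)$ can be sketched numerically. This is a minimal illustration with made-up conditional probabilities, not a full classifier:

```python
import numpy as np

# Naive Bayes class-conditional for three binary features.
# The probabilities p(x_i = 1 | y) below are hypothetical values.
p_xi_given_y = np.array([0.9, 0.2, 0.7])

def likelihood(x, p):
    """P(x | y) = prod_i p(x_i | y), using conditional independence given y."""
    x = np.asarray(x)
    return float(np.prod(np.where(x == 1, p, 1 - p)))

print(likelihood([1, 0, 1], p_xi_given_y))  # 0.9 * 0.8 * 0.7 = 0.504
```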
GMM
$z$ is discrete
5. Markov Random Field - Representation - Conditional Independence
In a directed graph, the head-to-head local structure is independent by default; observing the common child makes the parents dependent.

A Markov random field is an undirected graph, so it has no head-to-head complication.
Conditional independence shows up in three forms:
- Global Markov: $X_A \perp X_C \mid X_B$
- Local Markov: $a \perp \{e,f\} \mid \{b,c,d\}$
- Pairwise Markov: $x_i \perp x_j \mid x_{-i,-j}$ (for $i \ne j$, with $x_i$ and $x_j$ not adjacent)
These three are equivalent.

Introduce the concept of a clique.

Clique: a set of nodes in which every pair of nodes is connected.
$$
\begin{aligned}
P(x) &=\frac{1}{Z} \prod_{i=1}^{K} \varphi (x_{C_i}) \\
Z&=\sum_x \prod_{i=1}^{K} \varphi (x_{C_i})\\
&=\sum_{x_1}\sum_{x_2}\cdots \sum_{x_p} \prod_{i=1}^{K} \varphi (x_{C_i})
\end{aligned}
$$
$C_i$: maximal clique

$x_{C_i}$: the set of random variables in the maximal clique

$\varphi(x_{C_i})$: potential function, which must be positive
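The factorization and partition function above can be checked by brute-force enumeration on a toy example. A minimal sketch, with made-up positive potential tables over two maximal cliques:

```python
import itertools

import numpy as np

# Toy MRF over three binary variables with maximal cliques
# C1 = {x0, x1} and C2 = {x1, x2}; the potentials are arbitrary
# positive values chosen for illustration.
phi_01 = np.array([[2.0, 1.0],
                   [1.0, 3.0]])  # phi(x0, x1)
phi_12 = np.array([[1.0, 2.0],
                   [2.0, 1.0]])  # phi(x1, x2)

def unnormalized(x):
    """Product of clique potentials for assignment x = (x0, x1, x2)."""
    return phi_01[x[0], x[1]] * phi_12[x[1], x[2]]

# Partition function Z: sum over all 2^3 assignments
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=3))

def P(x):
    return unnormalized(x) / Z

# The normalized probabilities sum to 1
total = sum(P(x) for x in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```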
6. Markov Random Field - Representation - Factorization
The Hammersley-Clifford theorem shows that the factorization over maximal cliques is equivalent to the conditional independence properties.
$$
\varphi (x_{C_i})=\exp \left \{ -E(x_{C_i}) \right \} >0
$$

where $E(x_{C_i})$ is the energy function.
$P(x)$ is called a Gibbs distribution.
$$
\begin{aligned}
P(x) &=\frac{1}{Z} \prod_{i=1}^{K} \varphi (x_{C_i}) \\
&=\frac{1}{Z} \prod_{i=1}^{K} \exp \left \{ -E(x_{C_i}) \right \}\\
&=\frac{1}{Z} \exp \left \{ -\sum_{i=1}^K E(x_{C_i}) \right \}
\end{aligned}
$$

which is an exponential-family distribution.
Maximum entropy principle $\Rightarrow$ exponential-family distribution (Gibbs distribution)

Markov Random Field $\Leftrightarrow$ Gibbs Distribution
7. Inference - Overview
$$
P(x)=P(x_1,x_2,\cdots,x_p)
$$
Inference: computing probabilities
- Marginal probability: $P(x_i)=\sum_{x_1}\cdots \sum_{x_{i-1}} \sum_{x_{i+1}}\cdots \sum_{x_p}P(x)$
- Conditional probability: $P(x_A \mid x_B)$, where $x=x_A \cup x_B$
- MAP inference: $\hat z = \arg \max_z P(z \mid x) \propto \arg \max_z P(x,z)$
$$
\begin{cases}
\text{Exact inference}
\begin{cases}
\text{Variable Elimination (VE)}\\
\text{Belief Propagation (BP)} \longrightarrow \text{Sum-Product Algorithm}\\
\text{Junction Tree Algorithm}
\end{cases}\\
\text{Approximate inference}
\begin{cases}
\text{Loopy Belief Propagation}\\
\text{Monte Carlo Inference: Importance Sampling, MCMC}\\
\text{Variational Inference}
\end{cases}
\end{cases}
$$
8. Inference - Variable Elimination

Computing a marginal probability.
Assume $a,b,c,d$ are discrete binary random variables: $a,b,c,d \in \{0,1\}$. For the chain $a \to b \to c \to d$:
$$
P(d)=\sum_{a,b,c} P(a,b,c,d)=\sum_{a,b,c} P(a)P(b \mid a)P(c \mid b)P(d \mid c)
$$
Substituting assignments one by one, computing $P(d)$ costs on the order of $2^3$ terms; with $p$ variables each taking $K$ values, the cost grows exponentially as $K^p$, which is clearly infeasible.
$$
\begin{aligned}
P(d)&=\sum_{a,b,c} P(a)P(b \mid a)P(c \mid b)P(d \mid c)\\
&=\sum_{b,c} P(c \mid b)P(d \mid c) \sum_{a} P(a)P(b \mid a)\\
&=\sum_{c} P(d \mid c) \sum_{b} P(c \mid b)\,\phi_a(b)\\
&=\sum_{c} P(d \mid c)\, \phi_b(c)\\
&=\phi_c(d)
\end{aligned}
$$

This is just the distributive law of multiplication over addition; the same trick applies to a general factorized $P(x)=\prod_{C} \varphi_C(x_C)$.
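The derivation above can be tested numerically on the chain $a \to b \to c \to d$. A minimal sketch with made-up CPT values, comparing naive enumeration against variable elimination:

```python
import numpy as np

# Chain a -> b -> c -> d over binary variables; the CPT entries
# are hypothetical numbers. Rows index the conditioning variable.
P_a = np.array([0.6, 0.4])                  # P(a)
P_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])  # P(b|a), row = a
P_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])  # P(c|b), row = b
P_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])  # P(d|c), row = c

# Naive enumeration: K^p = 2^4 terms
P_d_naive = np.zeros(2)
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            for d in (0, 1):
                P_d_naive[d] += P_a[a] * P_b_a[a, b] * P_c_b[b, c] * P_d_c[c, d]

# Variable elimination: push each sum inward (distributive law)
phi_b = P_a @ P_b_a      # phi_a(b) = sum_a P(a) P(b|a)
phi_c = phi_b @ P_c_b    # phi_b(c) = sum_b phi_a(b) P(c|b)
P_d_ve = phi_c @ P_d_c   # phi_c(d) = sum_c phi_b(c) P(d|c)

print(np.allclose(P_d_naive, P_d_ve))  # True
```

Each elimination step is a vector-matrix product, so the cost per step is polynomial in $K$ rather than exponential in $p$.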
Drawbacks: repeated computation (the topic of the next section), and sensitivity to the elimination order.
Maximal clique: a clique to which no further node can be added while keeping every pair of nodes connected; the coupling between cliques is weak enough that they can be treated as roughly independent.
9. Inference - From Variable Elimination to Belief Propagation
$$
P(e)=\sum_{d} P(e \mid d) \sum_{c}P(d \mid c) \sum_{b} P(c \mid b)\sum_{a} P(b \mid a) P(a)
$$
Similarly,
$$
P(c)= \left ( \sum_{b} P(c \mid b)\sum_{a} P(b \mid a) P(a) \right ) \left ( \sum_{d} P(d \mid c) \sum_{e}P(e \mid d) \right )
$$
Having computed $P(e)$, we also need $P(c)$; the two expressions share common factors, so each additional query repeats work already done.
Chain to Tree: on a tree structure (e.g., computing $P(c)$), use the forward-backward algorithm.
Directed to undirected. Joint probability:
$$
P(a,b,c,d)=\frac{1}{Z}\, \varphi_a(a)\, \varphi_b(b)\, \varphi_c(c)\, \varphi_d(d)\, \varphi_{ab}(a,b)\,\varphi_{bc}(b,c)\, \varphi_{bd}(b,d)
$$
Marginal probability:
$$
P(a)=\sum_{b,c,d}P(a,b,c,d)
$$
$$
P(a)
\begin{cases}
\varphi_a\\
m_{b \to a}(a)
\begin{cases}
\sum_b \\
m_{c \to b}(b)
\begin{cases}
\sum_c \varphi_c \varphi_{bc}
\end{cases}\\
\varphi_b\\
m_{d \to b}(b)
\begin{cases}
\sum_d \varphi_d \varphi_{bd}
\end{cases}\\
\varphi_{ab}
\end{cases}
\end{cases}
$$
$$
\begin{cases}
m_{b \to a}(x_a)=\sum_{x_b}\varphi_{ab}\,\varphi_b\, m_{c \to b}(x_b)\,m_{d \to b}(x_b)\\\\
P(x_a)=\varphi_a\, m_{b \to a}(x_a)
\end{cases}
$$
$$
\begin{cases}
m_{j \to i}(x_i)=\sum_{x_j}\varphi_{ij}\,\varphi_j \prod_{k \in NB(j)-i} m_{k \to j}(x_j)\\\\
P(x_i)=\varphi_i \prod_{k \in NB(i)} m_{k \to i}(x_i)
\end{cases}
$$
$NB(j)-i$: the neighbors of $j$ excluding node $i$.

There is no need to compute each marginal from scratch; it suffices to compute the messages $m_{i \to j}$.
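The two message-passing formulas above can be sketched on the tree with edges $a\text{-}b$, $b\text{-}c$, $b\text{-}d$. The node and edge potential tables below are made-up positive values; the recursion is exactly $m_{j \to i}$ from the general form:

```python
import numpy as np

# Tree-structured MRF over binary variables; edges a-b, b-c, b-d.
# Potentials are arbitrary positive tables chosen for illustration.
nodes = ['a', 'b', 'c', 'd']
psi = {n: np.array([1.0, 2.0]) for n in nodes}           # phi_i(x_i)
edge = {('a', 'b'): np.array([[3.0, 1.0], [1.0, 3.0]]),  # phi_ij(x_i, x_j)
        ('b', 'c'): np.array([[2.0, 1.0], [1.0, 2.0]]),
        ('b', 'd'): np.array([[1.0, 4.0], [4.0, 1.0]])}
nbrs = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}

def phi_edge(i, j):
    """Edge potential indexed as (x_i, x_j), transposing if stored as (j, i)."""
    return edge[(i, j)] if (i, j) in edge else edge[(j, i)].T

def message(j, i):
    """m_{j->i}(x_i) = sum_{x_j} phi_ij phi_j prod_{k in NB(j)-i} m_{k->j}(x_j)."""
    prod = psi[j].copy()
    for k in nbrs[j]:
        if k != i:
            prod = prod * message(k, j)
    return phi_edge(i, j) @ prod  # sum over x_j

def marginal(i):
    """P(x_i) proportional to phi_i prod_{k in NB(i)} m_{k->i}(x_i)."""
    p = psi[i].copy()
    for k in nbrs[i]:
        p = p * message(k, i)
    return p / p.sum()

print(marginal('a'))
```

On a tree the recursion terminates at the leaves, and the result matches brute-force marginalization of the joint.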
10. Inference - Belief Propagation
$$
\begin{cases}
belief(b)=\varphi_b \prod_{k \in child(b)} m_{k \to b}\\\\
m_{b \to a}=\sum_b \varphi_{ab}\, belief(b)
\end{cases}
$$
BP = VE + Caching
Compute the messages $m_{ij}$ directly.

Graph traversal:
Get Root: assume a is the root
Collect Message:
    for x_i in NB(root):
        collect(x_i)
Distribute Message:
    for x_j in NB(root):
        distribute(x_j)
These message computations can be parallelized.
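The collect/distribute schedule can be sketched as a traversal that produces each directed message exactly once; this is the "VE + caching" view of BP. The tree below is the running $a,b,c,d$ example, and `order` records the message sequence:

```python
# Collect/distribute schedule on the tree a-b, b-c, b-d, rooted at a.
nbrs = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}
order = []  # sequence in which messages m_{i->j} are produced

def collect(i, parent):
    """Leaves-to-root pass: children send before the parent sends."""
    for k in nbrs[i]:
        if k != parent:
            collect(k, i)
    if parent is not None:
        order.append((i, parent))   # m_{i -> parent}

def distribute(i, parent):
    """Root-to-leaves pass: the parent sends before the children recurse."""
    if parent is not None:
        order.append((parent, i))   # m_{parent -> i}
    for k in nbrs[i]:
        if k != parent:
            distribute(k, i)

collect('a', None)
distribute('a', None)
print(order)
```

After both passes every directed edge appears exactly once (6 messages for 3 edges), so all node marginals can be read off from the cached messages without recomputation.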
11. Max Product
$$
\begin{cases}
m_{j \to i}(x_i)=\sum_{x_j}\varphi_{ij}\,\varphi_j \prod_{k \in NB(j)-i} m_{k \to j}(x_j)\\\\
P(x_i)=\varphi_i \prod_{k \in NB(i)} m_{k \to i}(x_i)
\end{cases}
$$
Replacing the $\sum$ in Sum Product with $\max$ gives Max Product.
- An improvement of BP
- A generalization of Viterbi
Path optimality: every node at each layer stores the accumulated best path so far.
$$
\begin{aligned}
m_{j \to i}&=\max_{x_j} \varphi_j \cdot \varphi_{ij} \prod_{k \in NB(j)-i} m_{k \to j}\\
m_{c \to b}&=\max_{x_c} \varphi_c \cdot \varphi_{bc}\\
m_{d \to b}&=\max_{x_d} \varphi_d \cdot \varphi_{bd}\\
m_{b \to a}&=\max_{x_b} \varphi_b \cdot \varphi_{ab}\, m_{c \to b}\, m_{d \to b}
\end{aligned}
$$
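The max-product messages above can be checked against brute force on the same tree shape. A minimal sketch with made-up potential tables; the max distributes over the product because all factors are nonnegative:

```python
import itertools

import numpy as np

# Max-product on the tree a-b, b-c, b-d; potentials are hypothetical values.
psi = {'a': np.array([1.0, 2.0]), 'b': np.array([2.0, 1.0]),
       'c': np.array([1.0, 3.0]), 'd': np.array([2.0, 2.0])}
phi_ab = np.array([[3.0, 1.0], [1.0, 3.0]])  # phi_ab(x_a, x_b)
phi_bc = np.array([[2.0, 1.0], [1.0, 2.0]])  # phi_bc(x_b, x_c)
phi_bd = np.array([[1.0, 4.0], [4.0, 1.0]])  # phi_bd(x_b, x_d)

# Messages toward root a, with sum replaced by max
m_cb = np.max(psi['c'][None, :] * phi_bc, axis=1)  # m_{c->b}(x_b)
m_db = np.max(psi['d'][None, :] * phi_bd, axis=1)  # m_{d->b}(x_b)
m_ba = np.max(psi['b'][None, :] * phi_ab * (m_cb * m_db)[None, :], axis=1)

best_score_bp = np.max(psi['a'] * m_ba)  # max unnormalized score

# Brute-force maximum over all 2^4 assignments
best_score_bf = max(
    psi['a'][xa] * psi['b'][xb] * psi['c'][xc] * psi['d'][xd]
    * phi_ab[xa, xb] * phi_bc[xb, xc] * phi_bd[xb, xd]
    for xa, xb, xc, xd in itertools.product([0, 1], repeat=4))

print(np.isclose(best_score_bp, best_score_bf))  # True
```

To recover the maximizing assignment itself (the Viterbi path), each `max` would additionally record its `argmax` for backtracking.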