Machine Learning Whiteboard Derivations (2)

(Continued from Machine Learning Whiteboard Derivations (1).)

VIII. Probabilistic Graphical Models

4. Bayesian Network - Concrete Models

$$
\text{Bayesian Network}
\begin{cases}
\ \ \text{Single: Naive Bayes} \longrightarrow P(x \mid y)=\prod\limits_{i=1}^{p}p(x_i\mid y)\\
\left.\begin{array}{l}
\text{Mixture: GMM}\\
\text{Time: }\begin{cases} \text{Markov Chain}\\ \text{Gaussian Process (an infinite-dimensional Gaussian distribution)} \end{cases}
\end{array}\right\}\ \text{dynamic models}\\
\ \ \text{Continuous: Gaussian Bayesian Network}
\end{cases}
$$

$$
\text{Dynamic models}
\begin{cases}
\text{HMM (discrete states)}\\
\text{LDS (Kalman Filter) (continuous, linear)}\\
\text{Particle Filter}
\end{cases}
$$
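As a minimal illustration of the simplest "time" model above, the sketch below samples a 2-state Markov chain; the transition matrix is a made-up example.

```python
import numpy as np

# Sample a 2-state Markov chain z_0, z_1, ..., z_10.
# A[i, j] = P(z_{t+1} = j | z_t = i); the values are illustrative.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])

z = [0]                                  # arbitrary initial state
for _ in range(10):
    z.append(int(rng.choice(2, p=A[z[-1]])))
print(z)
```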

Naive Bayes

The structure is tail-to-tail: the class label $y$ is the common parent of the $p$ features, which are therefore conditionally independent given $y$.

[Figure: Naive Bayes graphical model, y → x_1, x_2, …, x_p]
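A minimal numeric sketch of this factorization, assuming made-up toy parameters for the class prior and the per-feature conditionals:

```python
import numpy as np

# Naive Bayes with p = 3 binary features and 2 classes.
# cond[y, i] = p(x_i = 1 | y); all values are illustrative.
cond = np.array([[0.2, 0.7, 0.5],     # class y = 0
                 [0.9, 0.3, 0.6]])    # class y = 1
prior = np.array([0.4, 0.6])          # P(y)

def class_posterior(x):
    """P(y | x) using P(x | y) = prod_i p(x_i | y) (tail-to-tail independence)."""
    likelihood = np.prod(np.where(x == 1, cond, 1 - cond), axis=1)  # P(x | y)
    joint = prior * likelihood                                      # P(x, y)
    return joint / joint.sum()                                      # P(y | x)

print(class_posterior(np.array([1, 0, 1])))
```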

GMM

The latent variable $z$ is discrete.

[Figure: GMM graphical model, z → x]
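A minimal sketch of the generative process this graph encodes: draw the discrete latent $z$ first, then draw $x$ given $z$. The mixing weights, means, and scales are illustrative assumptions.

```python
import numpy as np

# Ancestral sampling from a 2-component GMM: z ~ Categorical(pi),
# x | z ~ N(mu_z, sigma_z^2). All parameters are made up.
rng = np.random.default_rng(1)
pi = np.array([0.3, 0.7])                        # P(z)
mu, sigma = np.array([-2.0, 3.0]), np.array([1.0, 0.5])

z = rng.choice(2, size=1000, p=pi)               # discrete latent
x = rng.normal(mu[z], sigma[z])                  # observation given z
print(z[:5], x[:5])
```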

5. Markov Random Field - Representation - Conditional Independence

The tricky local structure in a directed graph is head-to-head: by default (child unobserved) the two parents are independent, and observing the child makes them dependent instead.

A Markov random field is an undirected graph, so there is no head-to-head complication.

Conditional independence shows up in three forms:

  • Global Markov:
    $X_A \bot X_C \mid X_B$ (the node set $X_B$ separates $X_A$ from $X_C$)

  • Local Markov:
    $a \bot \left \{ e,f \right \} \mid \left \{ b,c,d \right \}$ (in the example graph below, the neighbors of $a$ are $b$, $c$, $d$)

  • Pairwise Markov:
    $x_i \bot x_j \mid x_{-i,-j}$ ($i \ne j$, and nodes $i$, $j$ are not adjacent)

These three properties are equivalent.

[Figure: example undirected graph on nodes a, b, c, d, e, f]
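These properties can be checked numerically. The sketch below uses the smallest interesting case, a binary chain a - b - c with arbitrary positive potentials (not the 6-node figure), and verifies the global Markov property $a \bot c \mid b$ directly from the joint.

```python
import numpy as np

# Chain MRF a - b - c with random positive potentials on its two cliques.
rng = np.random.default_rng(2)
phi_ab = rng.random((2, 2)) + 0.1
phi_bc = rng.random((2, 2)) + 0.1

P = np.einsum('ab,bc->abc', phi_ab, phi_bc)   # unnormalized joint
P /= P.sum()                                  # the divisor is exactly Z

# a ⊥ c | b  <=>  P(a, c | b) = P(a | b) P(c | b) for every b
for b in (0, 1):
    P_ac = P[:, b, :] / P[:, b, :].sum()              # P(a, c | b)
    outer = np.outer(P_ac.sum(axis=1), P_ac.sum(axis=0))
    assert np.allclose(P_ac, outer)
print("a ⊥ c | b verified")
```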

Introduce the concept of a clique.

Clique: a set of nodes in which every pair of nodes is connected by an edge.

$$
\begin{aligned}
P(x) &=\frac{1}{Z} \prod_{i=1}^{K} \varphi (x_{C_i}) \\
Z&=\sum_x \prod_{i=1}^{K} \varphi (x_{C_i})
=\sum_{x_1}\sum_{x_2}\cdots \sum_{x_p} \prod_{i=1}^{K} \varphi (x_{C_i})
\end{aligned}
$$

$C_i$: the $i$-th maximal clique
$x_{C_i}$: the set of random variables in maximal clique $C_i$
$\varphi(x_{C_i})$: the potential function, which must be positive
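A minimal sketch of this factorization, assuming a toy graph whose maximal cliques are $\{a,b\}$ and $\{b,c\}$ with made-up potential tables. $Z$ is computed by brute force over all configurations, which is exactly the exponential cost that the inference algorithms below are designed to avoid.

```python
import numpy as np
from itertools import product

# Potentials on the two maximal cliques (arbitrary positive numbers).
phi_ab = np.array([[1.0, 2.0],
                   [3.0, 0.5]])
phi_bc = np.array([[2.0, 1.0],
                   [0.5, 4.0]])

def unnorm(a, b, c):
    """Product of potentials over the maximal cliques."""
    return phi_ab[a, b] * phi_bc[b, c]

# Z = sum over every configuration: K^p terms, fine only for a toy p.
Z = sum(unnorm(a, b, c) for a, b, c in product((0, 1), repeat=3))
print("Z =", Z, " P(a=0, b=1, c=1) =", unnorm(0, 1, 1) / Z)
```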

6. Markov Random Field - Representation - Factorization

The Hammersley-Clifford theorem shows that this factorization (over maximal cliques) is equivalent to the conditional independence properties above.

$$\varphi (x_{C_i})=\exp \left \{ -E(x_{C_i}) \right \} >0,$$ where $E(x_{C_i})$ is an energy function. $P(x)$ is then called a Gibbs distribution:

$$
P(x) =\frac{1}{Z} \prod_{i=1}^{K} \varphi (x_{C_i})
=\frac{1}{Z} \prod_{i=1}^{K} \exp \left \{ -E(x_{C_i}) \right \}
=\frac{1}{Z} \exp \left \{ -\sum_{i=1}^K E(x_{C_i}) \right \},
$$

which is an exponential-family distribution.
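A tiny illustration of why the exponential form is convenient (the energy values below are made up): any real-valued energy table automatically yields a strictly positive, hence valid, potential.

```python
import numpy as np

# phi = exp(-E) is positive for every real E, so positivity comes for free;
# lower energy corresponds to a more compatible configuration.
E_ab = np.array([[0.0, 1.2],
                 [2.0, -0.3]])   # illustrative energies
phi_ab = np.exp(-E_ab)
assert (phi_ab > 0).all()
```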

The maximum entropy principle $\Rightarrow$ an exponential-family distribution (the Gibbs distribution)

Markov Random Field $\Leftrightarrow$ Gibbs Distribution

7. Inference - Overview

$$P(x)=P(x_1,x_2,\cdots,x_p)$$

Inference means computing probabilities:

  • Marginal probability: $P(x_i)=\sum_{x_1}\cdots \sum_{x_{i-1}} \sum_{x_{i+1}}\cdots \sum_{x_p}P(x)$
  • Conditional probability: $P(x_A \mid x_B)$, where $x=x_A \cup x_B$
  • MAP inference: $\hat z = \arg \max_z P(z \mid x) \propto \arg \max_z P(x,z)$

$$
\begin{cases}
\text{Exact inference}
\begin{cases}
\text{Variable Elimination (VE)}\\
\text{Belief Propagation (BP)} \longrightarrow \text{Sum-Product Algorithm}\\
\text{Junction Tree Algorithm}
\end{cases}\\
\text{Approximate inference}
\begin{cases}
\text{Loopy Belief Propagation}\\
\text{Monte Carlo Inference: Importance Sampling, MCMC}\\
\text{Variational Inference}
\end{cases}
\end{cases}
$$

8. Inference - Variable Elimination

Marginal probability: assume $a, b, c, d$ are discrete binary random variables, $a,b,c,d \in \left \{ 0,1 \right \}$.

[Figure: chain a → b → c → d]

$$
P(d)=\sum_{a,b,c}P(a,b,c,d)=\sum_{a,b,c} P(a)P(b \mid a)P(c \mid b)P(d \mid c)
$$

Substituting values one by one, computing $P(d)$ this way costs on the order of $2^3$ operations; in general, with $p$ random variables each taking $K$ values, the cost is the exponential $K^p$, which is clearly infeasible.

$$
\begin{aligned}
P(d)&=\sum_{a,b,c} P(a)P(b \mid a)P(c \mid b)P(d \mid c)\\
&=\sum_{b,c} P(c \mid b)P(d \mid c) \sum_{a} P(a)P(b \mid a)\\
&=\sum_{c} P(d \mid c) \sum_{b} P(c \mid b)\,\phi_a(b)\\
&=\sum_{c} P(d \mid c)\, \phi_b(c)\\
&=\phi_c(d)
\end{aligned}
$$

This is simply the distributive law (multiplication distributing over addition). The same trick applies to a general factorization $P(x)=\prod_{C}\varphi_C(x_C)$ over cliques.
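The elimination above is just a sequence of matrix-vector products. The sketch below uses random CPTs for illustration and checks the result against the brute-force $K^p$ computation.

```python
import numpy as np

# Variable elimination on the chain a -> b -> c -> d, eliminating a, b, c.
rng = np.random.default_rng(3)
def random_cpt():                     # rows: parent value, cols: child value
    t = rng.random((2, 2))
    return t / t.sum(axis=1, keepdims=True)

P_a = np.array([0.6, 0.4])                               # P(a)
P_b_a, P_c_b, P_d_c = (random_cpt() for _ in range(3))   # P(b|a), P(c|b), P(d|c)

phi_b = P_a @ P_b_a      # sum_a P(a) P(b|a)      -> phi_a(b)
phi_c = phi_b @ P_c_b    # sum_b phi_a(b) P(c|b)  -> phi_b(c)
P_d   = phi_c @ P_d_c    # sum_c phi_b(c) P(d|c)  -> phi_c(d) = P(d)

# Brute-force check against the full joint (the K^p computation).
joint = np.einsum('a,ab,bc,cd->abcd', P_a, P_b_a, P_c_b, P_d_c)
assert np.allclose(P_d, joint.sum(axis=(0, 1, 2)))
print(P_d)
```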

Drawbacks: repeated computation (the topic of the next section), and the need to choose an elimination ordering.

Maximal clique: a clique to which no further node can be added while keeping all nodes pairwise connected. The couplings between such cliques are weak, so the cliques can be treated as approximately independent.

9. Inference - Variable Elimination to Belief Propagation

[Figure: chain a → b → c → d → e]

$$
P(e)=\sum_{d} P(e \mid d) \sum_{c}P(d \mid c) \sum_{b} P(c \mid b)\sum_{a} P(b \mid a) P(a)
$$

Similarly,

$$
P(c)= \left ( \sum_{b} P(c \mid b)\sum_{a} P(b \mid a) P(a) \right ) \left ( \sum_{d} P(d \mid c) \sum_{e}P(e \mid d) \right )
$$

After computing $P(e)$ we still have to compute $P(c)$, and the two expressions share common factors; computing each new marginal from scratch therefore repeats work.


From chain to tree structure (e.g., computing $P(c)$ amounts to a forward-backward style two-pass computation)
From directed to undirected graphs

[Figure: tree-structured MRF with edges a - b, b - c, b - d]

The joint probability is

$$
P(a,b,c,d)=\frac{1}{Z}\, \varphi_a(a)\, \varphi_b(b)\, \varphi_c(c)\, \varphi_d(d)\, \varphi_{ab}(a,b)\,\varphi_{bc}(b,c)\, \varphi_{bd}(b,d)
$$

To find the marginal probability

$$
P(a)=\sum_{b,c,d}P(a,b,c,d),
$$

unfold the sums along the tree:

$$
P(a)=\varphi_a\, m_{b \to a}(a), \qquad
m_{b \to a}(a)=\sum_b \varphi_{ab}\,\varphi_b\, m_{c \to b}(b)\, m_{d \to b}(b),
$$

$$
m_{c \to b}(b)=\sum_c \varphi_c\,\varphi_{bc}, \qquad
m_{d \to b}(b)=\sum_d \varphi_d\,\varphi_{bd}.
$$

For this tree:

$$
\begin{cases}
m_{b \to a}(x_a)=\sum_{x_b}\varphi_{ab}\,\varphi_b\, m_{c \to b}(x_b)\, m_{d \to b}(x_b)\\[6pt]
P(x_a)=\varphi_a\, m_{b \to a}(x_a)
\end{cases}
$$

In general:

$$
\begin{cases}
m_{j \to i}(x_i)=\sum_{x_j}\varphi_{ij}\,\varphi_j \prod_{k \in NB(j)\setminus i} m_{k \to j}(x_j)\\[6pt]
P(x_i)=\varphi_i \prod_{k \in NB(i)} m_{k \to i}(x_i)
\end{cases}
$$

$NB(j)\setminus i$: the neighbors of node $j$ excluding node $i$.

Instead of computing each marginal from scratch, just compute the messages $m_{i \to j}$; every marginal then follows as a local product, as in the sketch below.
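A minimal sketch of sum-product message passing on this tree (edges a - b, b - c, b - d), with made-up potentials. Caching each message in a dictionary so that it is computed exactly once anticipates the "BP = VE + Caching" point of the next section.

```python
import numpy as np

rng = np.random.default_rng(4)
nodes = ['a', 'b', 'c', 'd']
nb = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}
phi_node = {v: rng.random(2) + 0.1 for v in nodes}
phi_edge = {e: rng.random((2, 2)) + 0.1 for e in [('a','b'), ('b','c'), ('b','d')]}

def psi(i, j):
    """Edge potential as a table indexed [x_i, x_j], whichever way it's stored."""
    return phi_edge[(i, j)] if (i, j) in phi_edge else phi_edge[(j, i)].T

msg = {}                                 # cache: each message computed once
def m(j, i):
    """m_{j->i}(x_i) = sum_{x_j} psi_ij phi_j prod_{k in NB(j)\\{i}} m_{k->j}(x_j)"""
    if (j, i) not in msg:
        inc = np.prod([m(k, j) for k in nb[j] if k != i] or [np.ones(2)], axis=0)
        msg[(j, i)] = psi(j, i).T @ (phi_node[j] * inc)
    return msg[(j, i)]

for i in nodes:                          # P(x_i) ∝ phi_i * incoming messages
    belief = phi_node[i] * np.prod([m(k, i) for k in nb[i]], axis=0)
    print(i, belief / belief.sum())
```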

10. Inference - Belief Propagation

The belief at a node combines its own potential with the messages from its children; the outgoing message then sums the belief against the edge potential:

$$
\begin{cases}
\mathrm{belief}(b)=\varphi_b \cdot \prod\limits_{k \in \mathrm{child}(b)} m_{k \to b}\\[6pt]
m_{b \to a}=\sum_b \varphi_{ab}\ \mathrm{belief}(b)
\end{cases}
$$

BP = VE + Caching
Compute the messages $m_{i \to j}$ directly, via a traversal of the graph:

  • Get a root: assume node `root` is the root (any node works)
  • Collect Message
for x_i in NB(root):
	collect(x_i)
  • Distribute Message
for x_j in NB(root):
	distribute(x_j)

The message computations in different subtrees are independent, so they can run in parallel; a fleshed-out version of collect/distribute follows.
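A runnable fleshing-out of the two phases, reusing `nb`, `m`, and `msg` from the sum-product sketch in Section 9; taking `'b'` as the root is an arbitrary choice.

```python
def collect(j, i):           # ensure m_{j -> i}: pull from the leaves inward
    for k in nb[j]:
        if k != i:
            collect(k, j)    # children first, so their messages are cached
    m(j, i)

def distribute(i, j):        # send m_{i -> j}, then push outward
    m(i, j)
    for k in nb[j]:
        if k != i:
            distribute(j, k)

root = 'b'
for x_i in nb[root]:         # Collect Message phase
    collect(x_i, root)
for x_j in nb[root]:         # Distribute Message phase
    distribute(root, x_j)
# msg now holds all 2|E| messages; distinct subtrees share no state,
# which is why these recursions can run in parallel.
```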

11. Max Product

$$
\begin{cases}
m_{j \to i}(x_i)=\sum_{x_j}\varphi_{ij}\,\varphi_j \prod_{k \in NB(j)\setminus i} m_{k \to j}(x_j)\\[6pt]
P(x_i)=\varphi_i \prod_{k \in NB(i)} m_{k \to i}(x_i)
\end{cases}
$$

Max-Product is Sum-Product with the $\sum$ operator replaced by $\max$. It is:

  • an improvement of BP
  • a generalization of the Viterbi algorithm

Path optimality: at every layer, each node stores the best accumulated partial path ending at it (the shortest-path picture from Viterbi).

$$
\begin{aligned}
m_{j \to i}&=\max_{x_j} \varphi_j \cdot \varphi_{ij} \prod_{k \in NB(j)\setminus i} m_{k \to j}\\
m_{c \to b}&=\max_{x_c} \varphi_c \cdot \varphi_{bc}\\
m_{d \to b}&=\max_{x_d} \varphi_d \cdot \varphi_{bd}\\
m_{b \to a}&=\max_{x_b} \varphi_b \cdot \varphi_{ab}\, m_{c \to b}\, m_{d \to b}
\end{aligned}
$$
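A minimal sketch of max-product on the same tree, again with made-up potentials: the only change from the sum-product sketch is taking a max over $x_j$ instead of a sum. On a tree, the argmax of each node's max-marginal recovers that node's MAP value (assuming a unique maximizer); a full decoder would also keep argmax back-pointers, omitted here.

```python
import numpy as np

rng = np.random.default_rng(4)
nodes = ['a', 'b', 'c', 'd']
nb = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}
phi_node = {v: rng.random(2) + 0.1 for v in nodes}
phi_edge = {e: rng.random((2, 2)) + 0.1 for e in [('a','b'), ('b','c'), ('b','d')]}

def psi(i, j):
    return phi_edge[(i, j)] if (i, j) in phi_edge else phi_edge[(j, i)].T

msg = {}
def m(j, i):
    """m_{j->i}(x_i) = max_{x_j} psi_ij phi_j prod_{k in NB(j)\\{i}} m_{k->j}"""
    if (j, i) not in msg:
        inc = np.prod([m(k, j) for k in nb[j] if k != i] or [np.ones(2)], axis=0)
        msg[(j, i)] = (psi(j, i).T * (phi_node[j] * inc)).max(axis=1)
    return msg[(j, i)]

for i in nodes:    # max-marginal of node i; its argmax is i's MAP value
    mm = phi_node[i] * np.prod([m(k, i) for k in nb[i]], axis=0)
    print(i, '-> MAP value', int(mm.argmax()))
```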
