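What is a sequential model?
Data such as images, or the attributes of a person, is non-sequential.
Stock prices, speech, text, temperature readings and the like are sequential.
Traditionally, such data was handled with models like HMMs and CRFs.
Thanks to improvements in hardware performance, deep learning models such as RNNs/LSTMs are now mainstream.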
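What is an HMM?
An HMM is both a directed graphical model and a generative model.
HMM parameters (the hidden states $Z_i$ are discrete)
We use part-of-speech tagging as the running example: the hidden states $Z_k$ are the tags and the observations $x_k$ are the words.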
$$\theta=(A,B,\pi)$$
$A_{i,j}=P(Z_{k+1}=j\mid Z_k=i)$ is the probability of moving from state $i$ to state $j$ (here, the tag of the next word given the tag of the current word). $A$ is an $m\times m$ matrix, where $m$ is the number of hidden states (tags).
$B_{i,j}=P(x_k=j\mid Z_k=i)$ is the probability that state $i$ emits word $j$. $B$ is an $m\times V$ matrix, where $V$ is the vocabulary size.
$\pi=[\pi_1,\pi_2,\ldots,\pi_m]$, with $\pi_i=P(Z_1=i)$, gives the probability of each state at the first position, and $\pi_1+\pi_2+\ldots+\pi_m=1$.
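A minimal pure-Python sketch of these three parameters for a toy tagger makes the shapes concrete. The tag names, the words and every probability below are made up for illustration. `check_hmm` is a hypothetical helper, not from any library. It only checks three things: $A$ is $m\times m$, $B$ is $m\times V$, and $\pi$ and each row of $A$ and $B$ sum to 1.

```python
# Hypothetical toy POS-tagging HMM (all numbers made up): m = 2 tags, V = 3 words.
STATES = ["NOUN", "VERB"]         # hidden states Z_k, encoded as 0 and 1
VOCAB = ["dogs", "bark", "fish"]  # observed words x_k, encoded as 0, 1, 2

pi = [0.6, 0.4]                   # pi[i]   = P(Z_1 = i)
A = [[0.7, 0.3],                  # A[i][j] = P(Z_{k+1} = j | Z_k = i), m x m
     [0.4, 0.6]]
B = [[0.6, 0.1, 0.3],             # B[i][w] = P(x_k = w | Z_k = i), m x V
     [0.1, 0.6, 0.3]]


def check_hmm(pi, A, B, tol=1e-9):
    """Validate shapes and normalization of (A, B, pi); return (m, V)."""
    m = len(pi)
    if len(A) != m or any(len(row) != m for row in A):
        raise ValueError("A must be m x m")
    if len(B) != m or len({len(row) for row in B}) != 1:
        raise ValueError("B must be m x V")
    for dist in [pi, *A, *B]:
        if any(p < 0 for p in dist) or abs(sum(dist) - 1.0) > tol:
            raise ValueError("pi and each row of A and B must sum to 1")
    return m, len(B[0])


m, V = check_hmm(pi, A, B)
print(m, V)  # 2 3
```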
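Estimating $\theta$ from the observations $x$ ($x\rightarrow\theta$) is the parameter estimation (learning) problem.
Finding $Z$ given $\theta$ and $x$ ($\theta,x\rightarrow Z$) is the inference problem.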
Inference in HMMs
Given $\theta=(A,B,\pi)$ and the observations $x$, find the most likely hidden sequence $Z$.
The naive approach is to list every possible case 😄, however many there are.
If $Z_i\in\{a,b,c\}$, the candidate sequences are:
aaa.....a
aab.....b
baa.....a
...
For each candidate we compute
$$P(Z,x)=P(Z_1)P(Z_2\mid Z_1)P(Z_3\mid Z_2)\cdots P(Z_n\mid Z_{n-1})\cdot P(x_1\mid Z_1)P(x_2\mid Z_2)\cdots P(x_n\mid Z_n)$$
Every factor is a lookup in $\pi$, $A$ or $B$. However, there are $3^n$ candidate sequences ($m^n$ in general for $m$ states), so the running time is exponential. Ha, that is a bit of a problem: for realistic sentence lengths the machine will never finish.
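The naive enumeration can be sketched as follows. It uses the same kind of made-up toy parameters, with states and words encoded as 0-based integers. `joint_prob` evaluates the product above for one sequence. `brute_force_decode` (a hypothetical name) tries all $m^n$ sequences, so it is only usable for tiny inputs.

```python
from itertools import product

# Made-up toy parameters: 2 hidden states, 3 word types, 0-based indices.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]


def joint_prob(z, x, pi, A, B):
    """P(Z=z, x) = P(Z_1) P(x_1|Z_1) * prod_k P(Z_k|Z_{k-1}) P(x_k|Z_k)."""
    p = pi[z[0]] * B[z[0]][x[0]]
    for k in range(1, len(x)):
        p *= A[z[k - 1]][z[k]] * B[z[k]][x[k]]
    return p


def brute_force_decode(x, pi, A, B):
    """Score all m**n state sequences and keep the best: O(n * m**n) time."""
    m = len(pi)
    best_z, best_p = None, -1.0
    for z in product(range(m), repeat=len(x)):
        p = joint_prob(z, x, pi, A, B)
        if p > best_p:
            best_z, best_p = list(z), p
    return best_z, best_p


x = [0, 1, 2]  # "dogs bark fish" as word indices
z, p = brute_force_decode(x, pi, A, B)
print(z)  # [0, 1, 1]
```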
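That doesn't mean there is no solution: the Viterbi algorithm solves the problem with dynamic programming.
Think of the state sequences as paths through a lattice of states over time. For example, consider a path that starts in state 2, moves to state 1, ..., and ends in state 2 at position $K$. Its probability is
$$P(Z_1=2)P(x_1\mid Z_1=2)P(Z_2=1\mid Z_1=2)P(x_2\mid Z_2=1)\cdots P(Z_K=2\mid Z_{K-1})P(x_K\mid Z_K=2)$$
Every path is scored the same way.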
Let $\delta_k(i)$ be the (log) score of the best path ending in state $i$ at time $k$. Then
$$\delta_{k+1}(j)=\max\left\{\begin{array}{l}\delta_k(1)+\log P(Z_{k+1}=j\mid Z_k=1)+\log P(x_{k+1}\mid Z_{k+1}=j)\\ \delta_k(2)+\log P(Z_{k+1}=j\mid Z_k=2)+\log P(x_{k+1}\mid Z_{k+1}=j)\\ \quad\vdots\\ \delta_k(m)+\log P(Z_{k+1}=j\mid Z_k=m)+\log P(x_{k+1}\mid Z_{k+1}=j)\end{array}\right.$$
Written compactly:
$$\delta_{k+1}(j)=\max_i\left[\delta_k(i)+\log P(Z_{k+1}=j\mid Z_k=i)+\log P(x_{k+1}\mid Z_{k+1}=j)\right]$$
The recursion starts from $\delta_1(i)=\log\pi_i+\log P(x_1\mid Z_1=i)$. At every step, record the maximizing $i$. Backtracking from $\arg\max_i\delta_n(i)$ then recovers the best path in $O(n\cdot m^2)$ time instead of $O(m^n)$.
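The recursion and the backtracking step can be sketched like this, again with made-up toy parameters and 0-based indices. `safe_log` and `viterbi` are illustrative names. Working in log space avoids underflow of long products, and on the same input the result should match the brute-force search.

```python
import math

# Made-up toy parameters: 2 hidden states, 3 word types, 0-based indices.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]


def safe_log(p):
    """log p, with log 0 = -inf so impossible transitions never win the max."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(x, pi, A, B):
    """Return (best state path, its log score) in O(n * m^2) time."""
    m = len(pi)
    # delta_1(i) = log pi_i + log P(x_1 | Z_1 = i)
    delta = [safe_log(pi[i]) + safe_log(B[i][x[0]]) for i in range(m)]
    backptrs = []
    for k in range(1, len(x)):
        new_delta, ptr = [], []
        for j in range(m):
            # best predecessor of state j: argmax_i [delta_k(i) + log A_ij]
            i_best = max(range(m), key=lambda i: delta[i] + safe_log(A[i][j]))
            new_delta.append(delta[i_best] + safe_log(A[i_best][j])
                             + safe_log(B[j][x[k]]))
            ptr.append(i_best)
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(m), key=lambda i: delta[i])
    best_score = delta[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score


path, score = viterbi([0, 1, 2], pi, A, B)
print(path)  # [0, 1, 1]
```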
The forward-backward (F/B) algorithm in HMMs
F/B algorithm: compute $P(Z_k\mid x)$
Forward: compute $P(Z_k,x_{1:k})$
Backward: compute $P(x_{k+1:n}\mid Z_k)$
By Bayes' theorem,
$$P(Z_k\mid x)=\frac{P(Z_k,x)}{P(x)}\propto P(Z_k,x)$$
Split $x$ into $x_{1:k}$ and $x_{k+1:n}$ and apply the chain rule:
$$P(Z_k,x)=P(x_{k+1:n}\mid Z_k,x_{1:k})\cdot P(Z_k,x_{1:k})$$
Note the first factor, $P(x_{k+1:n}\mid Z_k,x_{1:k})$. Given $Z_k$, the future observations $x_{k+1:n}$ are conditionally independent of $x_{1:k}$ (by D-separation), so $x_{1:k}$ can be dropped:
$$P(Z_k,x)=\underbrace{P(x_{k+1:n}\mid Z_k)}_{backward}\cdot\underbrace{P(Z_k,x_{1:k})}_{forward}$$
For example,
$$P(Z_k=1\mid x)=\frac{P(Z_k=1,x)}{\sum_jP(Z_k=j,x)}$$
Here $x_{1:k}=(x_1,x_2,\ldots,x_k)$.
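The decomposition $P(Z_k,x)=P(x_{k+1:n}\mid Z_k)\cdot P(Z_k,x_{1:k})$ can be checked numerically on a tiny model. This sketch uses made-up parameters, 0-based $k$ and hypothetical helper names. It computes both factors straight from their definitions by enumeration, not by the recursions. It then asserts that their product equals $P(Z_k=i,x)$ summed over all full paths, and normalizes over $i$ as in the example above.

```python
from itertools import product

# Made-up toy parameters: 2 hidden states, 3 word types, 0-based indices.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]
M = len(pi)


def joint(z, x):
    """P(Z_{1:t} = z, x_{1:t}) for a path z with len(z) == len(x) == t."""
    p = pi[z[0]] * B[z[0]][x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1]][z[t]] * B[z[t]][x[t]]
    return p


def forward_term(k, i, x):
    """P(Z_k = i, x_{1:k}): sum the prefix joint over Z_{1:k-1}."""
    return sum(joint(z + (i,), x[:k + 1]) for z in product(range(M), repeat=k))


def backward_term(k, i, x):
    """P(x_{k+1:n} | Z_k = i): sum over all suffix paths Z_{k+1:n}."""
    total = 0.0
    for z in product(range(M), repeat=len(x) - k - 1):
        p, prev = 1.0, i
        for t, s in enumerate(z, start=k + 1):
            p *= A[prev][s] * B[s][x[t]]
            prev = s
        total += p
    return total


def posterior(k, x):
    """P(Z_k = i | x): backward * forward, normalized over i."""
    scores = [backward_term(k, i, x) * forward_term(k, i, x) for i in range(M)]
    total = sum(scores)
    return [s / total for s in scores]


x = [0, 1, 2]
for k in range(len(x)):
    for i in range(M):
        # P(Z_k = i, x) by summing the joint over every full path with z[k] == i
        full = sum(joint(z, x) for z in product(range(M), repeat=len(x)) if z[k] == i)
        assert abs(full - backward_term(k, i, x) * forward_term(k, i, x)) < 1e-12
print(posterior(1, x))
```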
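- The F/B algorithm can also be used to estimate the model parameters (it is the E-step of the Baum-Welch algorithm).
- Change detection
Scenario: organized (gang) fraud. In which time periods does the network change abruptly?
A: compute the similarity between consecutive snapshots $graph_t$ and $graph_{t+1}$.
B: treat the graph at each time step as generated by the current hidden state of an HMM, and flag time $k$ as a change point when $P(Z_k\neq Z_{k-1}\mid x)$ exceeds a threshold.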
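Approach B can be sketched with the pairwise posterior $P(Z_{k-1}=i,Z_k=j\mid x)=\alpha_{k-1}(i)A_{ij}B_j(x_k)\beta_k(j)/P(x)$, built from the forward and backward quantities derived next. The change probability is one minus the sum of its diagonal. This is a toy sketch under loud assumptions. Each graph snapshot is assumed to be already reduced to one discrete symbol; a real system would need an emission model over graphs. The two-regime parameters are made up, and the 0.5 threshold and all function names are hypothetical.

```python
# Made-up toy parameters: 2 hidden "network regimes", 3 observation symbols.
# Assumption: each graph snapshot is already reduced to one discrete symbol.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]
M = len(pi)


def forward(x):
    """alpha[t][j] = P(Z_t = j, x_{1:t}) (0-based t)."""
    alpha = [[pi[j] * B[j][x[0]] for j in range(M)]]
    for t in range(1, len(x)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(M)) * B[j][x[t]]
                      for j in range(M)])
    return alpha


def backward(x):
    """beta[t][i] = P(x_{t+1:n} | Z_t = i), equal to 1 at the last step."""
    beta = [[1.0] * M for _ in x]
    for t in range(len(x) - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][x[t + 1]] * beta[t + 1][j] for j in range(M))
                   for i in range(M)]
    return beta


def change_scores(x):
    """scores[t-1] = P(Z_t != Z_{t-1} | x) for t = 1..n-1 (0-based t)."""
    alpha, beta = forward(x), backward(x)
    px = sum(alpha[-1])
    scores = []
    for t in range(1, len(x)):
        # sum_i P(Z_{t-1} = i, Z_t = i | x): probability of staying in the same regime
        stay = sum(alpha[t - 1][i] * A[i][i] * B[i][x[t]] * beta[t][i]
                   for i in range(M)) / px
        scores.append(1.0 - stay)
    return scores


def detect_changes(x, threshold=0.5):
    """Time steps t whose change probability exceeds the threshold."""
    return [t for t, s in enumerate(change_scores(x), start=1) if s > threshold]


print(detect_changes([0, 1, 2]))  # [1]
```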
Forward pass. For the target $P(Z_k,x_{1:k})$, we want a recursion of the form
$$P(Z_k,x_{1:k})=[\ \cdot\ ]\cdot P(Z_{k-1},x_{1:k-1})$$
so we introduce $Z_{k-1}$ and marginalize it out:
$$\begin{aligned} P(Z_k,x_{1:k})&=\sum_{Z_{k-1}}P(Z_{k-1},Z_k,x_{1:k}) \\ &=\sum_{Z_{k-1}}P(Z_{k-1},x_{1:k-1})\cdot P(Z_k\mid Z_{k-1},x_{1:k-1})\cdot P(x_k\mid Z_k,Z_{k-1},x_{1:k-1}) \\ &=\sum_{Z_{k-1}}P(Z_{k-1},x_{1:k-1})\cdot\underbrace{P(Z_k\mid Z_{k-1})}_A\cdot\underbrace{P(x_k\mid Z_k)}_B \end{aligned}$$
The second line is the chain rule. The third uses the Markov property ($Z_k$ depends only on $Z_{k-1}$) and the emission assumption ($x_k$ depends only on $Z_k$).
Rearranging, and writing $\alpha_k(Z_k)=P(Z_k,x_{1:k})$:
$$\alpha_k(Z_k)=\sum_{Z_{k-1}}\alpha_{k-1}(Z_{k-1})\cdot P(Z_k\mid Z_{k-1})\cdot P(x_k\mid Z_k)$$
The recursion starts from the base case:
$$\alpha_1(Z_1)=P(Z_1,x_1)=\underbrace{P(Z_1)}_\pi\cdot\underbrace{P(x_1\mid Z_1)}_B$$
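The forward recursion translates almost line by line into code. Below is a minimal sketch with made-up toy parameters and 0-based indices; summing the last row of $\alpha$ gives the likelihood $P(x)$.

```python
# Made-up toy parameters: 2 hidden states, 3 word types, 0-based indices.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]


def forward(x, pi, A, B):
    """alpha[k][j] = P(Z_k = j, x_{1:k}) (0-based k); O(n * m^2) time."""
    m = len(pi)
    # Base case: alpha_1(j) = pi_j * P(x_1 | Z_1 = j)
    alpha = [[pi[j] * B[j][x[0]] for j in range(m)]]
    for k in range(1, len(x)):
        prev = alpha[-1]
        # alpha_k(j) = sum_i alpha_{k-1}(i) * P(Z_k=j | Z_{k-1}=i) * P(x_k | Z_k=j)
        alpha.append([sum(prev[i] * A[i][j] for i in range(m)) * B[j][x[k]]
                      for j in range(m)])
    return alpha


x = [0, 1, 2]
alpha = forward(x, pi, A, B)
likelihood = sum(alpha[-1])  # P(x) = sum_j alpha_n(j)
```

For long sequences these products underflow. A common remedy is to normalize each $\alpha_k$ to sum to 1 and keep the scaling factors, or to work in log space with log-sum-exp.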
Backward pass. For the target $P(x_{k+1:n}\mid Z_k)$, we want a recursion of the form
$$P(x_{k+1:n}\mid Z_k)=[\ \cdot\ ]\cdot P(x_{k+2:n}\mid Z_{k+1})$$
so we introduce $Z_{k+1}$ and marginalize it out. Note that every term stays conditioned on $Z_k$:
$$\begin{aligned} P(x_{k+1:n}\mid Z_k)&=\sum_{Z_{k+1}}P(x_{k+1:n},Z_{k+1}\mid Z_k) \\ &=\sum_{Z_{k+1}}P(x_{k+2:n}\mid Z_{k+1},Z_k,x_{k+1})\cdot P(x_{k+1}\mid Z_{k+1},Z_k)\cdot P(Z_{k+1}\mid Z_k) \\ &=\sum_{Z_{k+1}}\underbrace{P(x_{k+2:n}\mid Z_{k+1})}_{backward}\cdot\underbrace{P(x_{k+1}\mid Z_{k+1})}_B\cdot\underbrace{P(Z_{k+1}\mid Z_k)}_A \end{aligned}$$
The last step uses two facts. Given $Z_{k+1}$, the future observations $x_{k+2:n}$ are independent of $Z_k$ and $x_{k+1}$. And $x_{k+1}$ depends only on $Z_{k+1}$.
Rearranging, and writing $\beta_k(Z_k)=P(x_{k+1:n}\mid Z_k)$:
$$\beta_k(Z_k)=\sum_{Z_{k+1}}\beta_{k+1}(Z_{k+1})\cdot P(Z_{k+1}\mid Z_k)\cdot P(x_{k+1}\mid Z_{k+1})$$
with $\beta_n(Z_n)=1$. For each of the $m$ values of $Z_k$, each step sums over the $m$ values of $Z_{k+1}$. The whole pass, like the forward pass, therefore takes $O(n\cdot m^2)$ time.
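Below is a sketch of the backward pass, combined with the forward pass to get the posteriors $P(Z_k\mid x)$ that the F/B algorithm is after. The assumptions are the same as before: made-up toy parameters, 0-based indices and illustrative function names. As a sanity check, $P(x)$ can also be recovered from the backward pass alone as $\sum_i\pi_iP(x_1\mid Z_1=i)\beta_1(i)$.

```python
# Made-up toy parameters: 2 hidden states, 3 word types, 0-based indices.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.1, 0.3], [0.1, 0.6, 0.3]]


def forward(x, pi, A, B):
    """alpha[k][j] = P(Z_k = j, x_{1:k})."""
    m = len(pi)
    alpha = [[pi[j] * B[j][x[0]] for j in range(m)]]
    for k in range(1, len(x)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(m)) * B[j][x[k]]
                      for j in range(m)])
    return alpha


def backward(x, pi, A, B):
    """beta[k][i] = P(x_{k+1:n} | Z_k = i), with beta_n(i) = 1; O(n * m^2) time."""
    m, n = len(pi), len(x)
    beta = [[1.0] * m for _ in range(n)]
    for k in range(n - 2, -1, -1):
        # beta_k(i) = sum_j beta_{k+1}(j) * P(Z_{k+1}=j | Z_k=i) * P(x_{k+1} | Z_{k+1}=j)
        beta[k] = [sum(beta[k + 1][j] * A[i][j] * B[j][x[k + 1]] for j in range(m))
                   for i in range(m)]
    return beta


def posteriors(x, pi, A, B):
    """gamma[k][i] = P(Z_k = i | x) = alpha_k(i) * beta_k(i) / P(x)."""
    alpha, beta = forward(x, pi, A, B), backward(x, pi, A, B)
    px = sum(alpha[-1])
    return [[a * b / px for a, b in zip(alpha[k], beta[k])] for k in range(len(x))]


x = [0, 1, 2]
beta = backward(x, pi, A, B)
# Sanity check: P(x) from the backward pass alone
px_backward = sum(pi[i] * B[i][x[0]] * beta[0][i] for i in range(len(pi)))
gamma = posteriors(x, pi, A, B)
```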
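Ha! Drawing, deriving, simplifying and cutting the time complexity, all in one go.
Please point out any mistakes; the derivations are not guaranteed to be correct. This is purely a hobby: recording the good life, one little step at a time!
See also my Zhihu and blog posts, and let's learn from each other. Original content takes effort; reposting is welcome, but please credit the source!!!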