Markov Models
Can be viewed as a chain-like, infinite-length Bayes net.
Model components:
state: value of x at a given time
initial distribution
transition model
stationary assumption: the transition probabilities do not change over time
Markov property: past and future are independent given present
Note: in a second-order model, the next state is conditionally independent of the rest of the history given $x_{t-1}, x_{t-2}$.
The Mini-Forward Algorithm
We can sum out and marginalize. Storing the full joint over $j$ steps would take $O(d^j)$ space, so instead we apply the update iteratively:

$$p(w_{i+1})=\sum_{w_i}p(w_i)\,p(w_{i+1}\mid w_i)$$
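The iterative update above can be sketched in Python. The 2-state weather chain and its transition numbers below are hypothetical, chosen only for illustration:

```python
# Mini-forward algorithm: propagate a distribution through a Markov chain.
# Hypothetical 2-state weather chain; the probabilities are illustrative.
states = ["sun", "rain"]
T = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
     ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}  # T[(s, s')] = P(s' | s)

def mini_forward(p, steps):
    """Apply p(w_{i+1}) = sum_{w_i} p(w_i) p(w_{i+1} | w_i) repeatedly."""
    for _ in range(steps):
        p = {s2: sum(p[s1] * T[(s1, s2)] for s1 in states) for s2 in states}
    return p

p0 = {"sun": 1.0, "rain": 0.0}   # start certain it is sunny
p10 = mini_forward(p0, 10)       # after 10 steps, close to stationary
```

Each step costs $O(d^2)$ instead of materializing the $O(d^j)$ joint.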
Stationary distribution
The distribution we end up in is independent of the initial distribution; the influence of the initial distribution gets smaller and smaller over time. The stationary distribution can be found by solving the equation $P_\infty(X)=\sum_{x}P(X\mid x)\,P_\infty(x)$.
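For a small chain the equation can be solved in closed form. A sketch for a hypothetical 2-state chain, using the balance equation $\pi(a)\,P(b\mid a)=\pi(b)\,P(a\mid b)$ plus normalization:

```python
# Stationary distribution of a 2-state chain by solving pi = pi T directly.
# For states {a, b} with P(b|a) = p and P(a|b) = q, balance plus
# normalization gives pi(a) = q/(p+q), pi(b) = p/(p+q).
def stationary_two_state(p, q):
    """p = P(switch a->b), q = P(switch b->a); assumes p + q > 0."""
    return {"a": q / (p + q), "b": p / (p + q)}

pi = stationary_two_state(p=0.1, q=0.3)  # pi(a) = 0.75, pi(b) = 0.25
```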
Hidden Markov Models
Observing evidence at each timestep influences the belief distribution over states.
Model components:
$W_i$: state variable
$F_i$: evidence variable
initial distribution
transition model : stationary
emission probabilities: $P(E_t\mid W_t)$, stationary as well
Markov property:
$$w_{i+1}\perp \{w_0,w_1,\cdots,w_{i-1}\}\mid w_i$$
$$f_{i}\perp \{w_0,f_0,w_1,f_1,\cdots,w_{i-1},f_{i-1}\}\mid w_i$$
Property (joint factorization):
$$P(X_1,E_1,\cdots,X_t,E_t)=P(X_1)\,P(E_1\mid X_1)\prod_{i=2}^{t}P(X_i\mid X_{i-1})\,P(E_i\mid X_i)$$
belief distribution:
$$B(W_i)=p(w_i\mid f_1,\cdots,f_i)$$
$$B'(W_i)=p(w_i\mid f_1,\cdots,f_{i-1})$$
The Forward Algorithm
belief distribution: $B(X_t)$
time elapse update
$$B'(W_{i+1})=p(W_{i+1}\mid f_{1:i})=\sum_{w_i}p(W_{i+1}\mid w_i)\,B(w_i)$$
We can carry unnormalized values all the way to the end and normalize just once.
observation update
$$B(W_{i+1})\propto p(f_{i+1}\mid w_{i+1})\,B'(W_{i+1})$$
Filtering
$$B(X_t)=P(X_t\mid e_{1:t})$$
$$B'(X_t)=P(X_t\mid e_{1:t-1})$$
$$B'(X_{t+1})=\sum_{x_t}B(x_t)\,P(X_{t+1}\mid x_t)$$
$$B(X_{t+1})=P(X_{t+1}\mid e_{1:t+1})\propto P(X_{t+1},e_{t+1}\mid e_{1:t})=P(X_{t+1}\mid e_{1:t})\,P(e_{t+1}\mid X_{t+1})=B'(X_{t+1})\,P(e_{t+1}\mid X_{t+1})$$
$$B(X_{t+1})\propto P(e_{t+1}\mid X_{t+1})\sum_{x_t}B(x_t)\,P(X_{t+1}\mid x_t)$$
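The combined update can be written as a short filtering loop. This is a minimal sketch assuming a hypothetical umbrella-world HMM (hidden weather, observed umbrella); the transition and emission numbers are illustrative:

```python
# Filtering: alternate time-elapse and observation updates,
# normalizing after each observation.
states = ["sun", "rain"]
T = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
     ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}   # P(x' | x)
E = {("sun", True): 0.2, ("sun", False): 0.8,
     ("rain", True): 0.9, ("rain", False): 0.1}     # P(e | x), e = umbrella?

def filter_step(B, evidence):
    # time elapse: B'(X_{t+1}) = sum_{x_t} B(x_t) P(X_{t+1} | x_t)
    Bp = {s2: sum(B[s1] * T[(s1, s2)] for s1 in states) for s2 in states}
    # observation: B(X_{t+1}) proportional to P(e_{t+1} | X_{t+1}) B'(X_{t+1})
    unnorm = {s: E[(s, evidence)] * Bp[s] for s in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

B = {"sun": 0.5, "rain": 0.5}
for e in [True, True, False]:   # umbrella, umbrella, no umbrella
    B = filter_step(B, e)
```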
Other tasks for HMMs
- filtering: $P(X_t\mid e_{1:t})$
- prediction: $P(X_{t+k}\mid e_{1:t})$
- smoothing: $P(X_{k}\mid e_{1:t})$ with $k<t$: compute the posterior distribution over a past state, given all evidence up to the present
- most likely explanation: $\arg\max_{x_{1:N}}p(x_{1:N}\mid e_{1:N})=\arg\max_{x_{1:N}}p(x_{1:N},e_{1:N})$: given a sequence of observations, find the sequence of states that is most likely to have generated those observations
Viterbi Algorithm
goal: compute
$$\arg\max_{x_{1:N}}p(x_{1:N}\mid e_{1:N})=\arg\max_{x_{1:N}}p(x_{1:N},e_{1:N})$$
Computing the joint distribution directly takes too much storage, so we use dynamic programming.
$$m_t[x_t]=\max_{x_{1:t-1}}p(x_{1:t},e_{1:t})$$
$$m_t[x_t]=\max_{x_{1:t-1}}p(e_t\mid x_t)\,p(x_t\mid x_{t-1})\,p(x_{1:t-1},e_{1:t-1})=p(e_t\mid x_t)\max_{x_{t-1}}p(x_t\mid x_{t-1})\max_{x_{1:t-2}}p(x_{1:t-1},e_{1:t-1})=p(e_t\mid x_t)\max_{x_{t-1}}p(x_t\mid x_{t-1})\,m_{t-1}[x_{t-1}]$$
polynomial space and time
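The recurrence can be sketched with back-pointers to recover the argmax sequence. The umbrella-style chain below is hypothetical; its numbers are illustrative only:

```python
# Viterbi sketch: m_t[x] = P(e_t | x) * max_{x'} P(x | x') * m_{t-1}[x'],
# keeping a back-pointer to the maximizing predecessor at each step.
states = ["sun", "rain"]
T = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
     ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}   # P(x' | x)
E = {("sun", True): 0.2, ("sun", False): 0.8,
     ("rain", True): 0.9, ("rain", False): 0.1}     # P(e | x)
prior = {"sun": 0.5, "rain": 0.5}

def viterbi(evidence):
    m = {s: prior[s] * E[(s, evidence[0])] for s in states}
    back = []                       # back[t][s] = best predecessor of s
    for e in evidence[1:]:
        ptr, new = {}, {}
        for s in states:
            best = max(states, key=lambda sp: m[sp] * T[(sp, s)])
            ptr[s] = best
            new[s] = E[(s, e)] * m[best] * T[(best, s)]
        back.append(ptr)
        m = new
    # backtrack from the best final state
    path = [max(states, key=lambda s: m[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Time is $O(t\,d^2)$ and space $O(t\,d)$ for the back-pointers, rather than exponential in $t$.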
Particle filtering
Simulate the motion of a set of particles through a state graph to approximate the probability distribution. Store a list of $n$ particles, with $n \ll d$ (where $d$ is the size of the state domain), but still enough to estimate the distribution.
Particle filtering simulation
Used when $|X|$ is too large to store $B(X)$ explicitly, e.g. continuous state spaces.
initialization:
There is no fixed requirement; we can sample randomly, uniformly, or from the initial distribution, with $|N| \ll |X|$.
time elapse update:
update according to the transition model
observation update:
Weight each particle by $p(e_i\mid w_i)$ before resampling.
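The three steps can be sketched as a small loop. This reuses the hypothetical umbrella-world chain from earlier; all model numbers are illustrative:

```python
import random

# Particle filtering sketch: represent B(X) with n samples, n << |X|.
random.seed(0)  # fixed seed so the run is reproducible
states = ["sun", "rain"]
T = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
     ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}   # P(x' | x)
E = {("sun", True): 0.2, ("sun", False): 0.8,
     ("rain", True): 0.9, ("rain", False): 0.1}     # P(e | x)

def particle_filter_step(particles, evidence):
    # time elapse: move each particle by sampling from the transition model
    moved = [random.choices(states, weights=[T[(p, s)] for s in states])[0]
             for p in particles]
    # observation: weight each particle by p(e | x), then resample
    weights = [E[(x, evidence)] for x in moved]
    return random.choices(moved, weights=weights, k=len(particles))

particles = [random.choice(states) for _ in range(200)]  # initialization
for e in [True, True]:                                   # two umbrella sightings
    particles = particle_filter_step(particles, e)
est_rain = particles.count("rain") / len(particles)      # approximate B(rain)
```

The particle counts approximate the exact filtered belief; accuracy improves as the number of particles grows.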