Notation
Let $\omega$ denote the set of model parameters to be estimated, and let $D = \{ D_1, D_2, \cdots, D_n \}$ be the set of training samples.
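Maximum Likelihood Estimation (MLE)
Assuming the samples $D_i$ are independent given $\omega$, maximum likelihood estimation can be viewed as solving the following problem: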
$$
\begin{aligned}
\mathrm{MLE} :\ &\arg\max p(D \mid \omega) \\
= &\arg\max p(\{ D_1, D_2, \cdots, D_n \} \mid \omega) \\
= &\arg\max \prod_{i=1}^n p(D_i \mid \omega) \\
= &\arg\max \sum_{i=1}^n \log p(D_i \mid \omega)
\end{aligned}
$$
Denoting $\sum_{i=1}^n \log p(D_i \mid \omega)$ by $t_{mle}$, we obtain $\mathrm{MLE} : \arg\max t_{mle}$.
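The step from the product to the sum of logarithms can be checked numerically. The sketch below assumes, purely for illustration, that each sample is drawn from $N(\omega, 1)$. The data values, the grid, and the helper names (`likelihood`, `t_mle`, `argmax_on_grid`) are made up and not part of the derivation above. Under that assumed model the maximizer should also coincide with the sample mean.

```python
import math

def likelihood(omega, data):
    """Product of p(D_i | omega), assuming each D_i ~ N(omega, 1)."""
    return math.prod(
        math.exp(-0.5 * (x - omega) ** 2) / math.sqrt(2 * math.pi) for x in data
    )

def t_mle(omega, data):
    """Sum of log p(D_i | omega) under the same assumed model."""
    return sum(
        -0.5 * math.log(2 * math.pi) - 0.5 * (x - omega) ** 2 for x in data
    )

def argmax_on_grid(objective, grid):
    """Return the grid point at which the objective is largest."""
    return max(grid, key=objective)

data = [1.2, 0.7, 1.9, 1.4, 0.8]               # made-up samples
grid = [i / 1000 for i in range(-3000, 3001)]  # candidate values of omega

omega_from_product = argmax_on_grid(lambda w: likelihood(w, data), grid)
omega_from_log_sum = argmax_on_grid(lambda w: t_mle(w, data), grid)
sample_mean = sum(data) / len(data)

print(omega_from_product, omega_from_log_sum, sample_mean)
```

Both searches land on the same grid point because the logarithm is monotonically increasing, so it changes the value of the objective but not where its maximum sits.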
Maximum A Posteriori Estimation (MAP)
Maximum a posteriori estimation can be viewed as solving the following problem:
$$
\begin{aligned}
\mathrm{MAP} :\ & \arg\max p(\omega \mid D) \\
= & \arg\max \frac{p(D \mid \omega) p(\omega)}{p(D)}
\end{aligned}
$$
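Training solves for the model parameters $\omega$, and the evidence $p(D)$ does not depend on $\omega$, so the denominator can be dropped. Taking the logarithm, which is monotonic and therefore preserves the maximizer, gives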
$$
\begin{aligned}
\mathrm{MAP} :\ &\arg\max p(D \mid \omega) p(\omega) \\
= &\arg\max \left( \log p(D \mid \omega) + \log p(\omega) \right) \\
= &\arg\max \left( \sum_{i=1}^n \log p(D_i \mid \omega) + \log p(\omega) \right) \\
= &\arg\max \left( t_{mle} + \log p(\omega) \right)
\end{aligned}
$$
Suppose the prior over the model parameters is a Gaussian (normal) distribution, i.e. $\omega \sim N(0, 1)$, so that $p(\omega) = \frac{1}{\sqrt{2\pi}}e^{-\frac{\omega^2}{2}}$. Then
$$
\begin{aligned}
\mathrm{MAP} :\ & \arg\max \left( t_{mle} + \log \left( \frac{1}{\sqrt{2\pi}} e^{-\frac{\omega^2}{2}} \right) \right) \\
= & \arg\max \left( t_{mle} - \log\sqrt{2\pi} - \frac{1}{2} \omega^2 \right) \\
= & \arg\max \left( t_{mle} - \frac{1}{2} \omega^2 \right) \\
= & \arg\max \left( t_{mle} - \lambda \omega^2 \right), \quad \lambda = \frac{1}{2}
\end{aligned}
$$
The constant $-\log\sqrt{2\pi}$ does not depend on $\omega$, so it drops out of the $\arg\max$. Since $\lambda > 0$, the prior enters as a penalty on $\omega^2$.
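To see the Gaussian-prior case in action, here is a small numerical sketch. Working the logarithm out directly gives $\log p(\omega) = -\frac{1}{2}\log(2\pi) - \frac{\omega^2}{2}$, so the penalty weight used below is $\frac{1}{2}$. The likelihood model ($D_i \sim N(\omega, 1)$), the data, and the grid are assumptions chosen for illustration. Under that assumed model, setting the derivative of the objective to zero gives the closed form $\sum_i D_i / (n + 1)$, which the grid search should reproduce.

```python
import math

def t_mle(omega, data):
    """Sum of log p(D_i | omega), assuming each D_i ~ N(omega, 1)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - 0.5 * (x - omega) ** 2 for x in data
    )

def log_gaussian_prior(omega):
    """log p(omega) for the prior omega ~ N(0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * omega ** 2

data = [1.2, 0.7, 1.9, 1.4, 0.8]               # made-up samples
grid = [i / 1000 for i in range(-3000, 3001)]  # candidate values of omega
lam = 0.5                                      # penalty weight implied by N(0, 1)

# Full MAP objective: log-likelihood plus log-prior.
omega_map = max(grid, key=lambda w: t_mle(w, data) + log_gaussian_prior(w))
# Same objective with the constant dropped: an L2-penalized log-likelihood.
omega_l2 = max(grid, key=lambda w: t_mle(w, data) - lam * w ** 2)
# Plain MLE for comparison.
omega_mle = max(grid, key=lambda w: t_mle(w, data))

closed_form = sum(data) / (len(data) + 1)
print(omega_map, omega_l2, omega_mle, closed_form)
```

The MAP estimate is pulled toward zero relative to the MLE, which is the shrinkage usually associated with an L2 penalty.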
Suppose instead that the prior over the model parameters is a Laplace distribution, i.e. $\omega \sim La(0, 1)$, so that $p(\omega) = \frac{1}{2}e^{-|\omega|}$. Then
$$
\begin{aligned}
\mathrm{MAP} :\ & \arg\max \left( t_{mle} + \log \left( \frac{1}{2} e^{-|\omega|} \right) \right) \\
= & \arg\max \left( t_{mle} - \log 2 - |\omega| \right) \\
= & \arg\max \left( t_{mle} - |\omega| \right) \\
= & \arg\max \left( t_{mle} - \lambda |\omega| \right), \quad \lambda = 1
\end{aligned}
$$
As before, the constant $-\log 2$ does not depend on $\omega$ and drops out of the $\arg\max$.
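The Laplace-prior case can be checked the same way. Working the logarithm out directly gives $\log p(\omega) = -\log 2 - |\omega|$, so the penalty weight used below is $1$. As an illustrative assumption, each sample is again modeled as $D_i \sim N(\omega, 1)$. The data sets and the helper names (`soft_threshold`, `compare`) are hypothetical. For this assumed model the maximizer has a closed form: the sample mean soft-thresholded by $\lambda / n$.

```python
import math

def t_mle(omega, data):
    """Sum of log p(D_i | omega), assuming each D_i ~ N(omega, 1)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - 0.5 * (x - omega) ** 2 for x in data
    )

def log_laplace_prior(omega):
    """log p(omega) for the prior omega ~ La(0, 1)."""
    return math.log(0.5) - abs(omega)

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

grid = [i / 1000 for i in range(-3000, 3001)]  # candidate values of omega
lam = 1.0                                      # penalty weight implied by La(0, 1)

def compare(data):
    """Return (MAP estimate, L1-penalized estimate, closed form) for data."""
    omega_map = max(grid, key=lambda w: t_mle(w, data) + log_laplace_prior(w))
    omega_l1 = max(grid, key=lambda w: t_mle(w, data) - lam * abs(w))
    closed_form = soft_threshold(sum(data) / len(data), lam / len(data))
    return omega_map, omega_l1, closed_form

large_mean = compare([1.2, 0.7, 1.9, 1.4, 0.8])    # made-up samples, mean 1.2
small_mean = compare([0.3, -0.2, 0.1, -0.1, 0.2])  # made-up samples, mean 0.06

print(large_mean)
print(small_mean)
```

In the second data set the sample mean is smaller than the threshold $\lambda / n$, so the estimate is exactly zero rather than merely small, which is the behavior usually associated with an L1 penalty.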
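Relationship
In summary, maximum a posteriori estimation (MAP) amounts to maximum likelihood estimation (MLE) plus a regularization term contributed by the log-prior. If the parameter prior is a normal distribution, the term is an L2 regularizer; if it is a Laplace distribution, the term is an L1 regularizer.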