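Continuing from the previous post, we now place a prior on $\boldsymbol{w}$. Take the simplest assumption: $\boldsymbol{w}$ follows a Gaussian distribution with mean $\boldsymbol{0}$ and covariance matrix $\alpha^{-1}\boldsymbol{I}$.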
$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|\boldsymbol{0},\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$

Let us work out, step by step, the posterior probability of the parameters $\boldsymbol{w}$ given $(\boldsymbol{x},\boldsymbol{t},\alpha,\beta)$:
$$
\begin{aligned}
p(\boldsymbol{w}|\boldsymbol{t})&=\frac{p(\boldsymbol{t}|\boldsymbol{w})p(\boldsymbol{w})}{p(\boldsymbol{t})}\\
p(\boldsymbol{w}|\boldsymbol{t},\boldsymbol{x},\alpha,\beta)&=\frac{p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)\,p(\boldsymbol{w}|\boldsymbol{x},\alpha,\beta)}{p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)}
\end{aligned}
$$
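Once $\boldsymbol{w}$ is given, $\boldsymbol{t}$ no longer depends on $\alpha$, so the likelihood in the expression above simplifies to $p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)=p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\beta)$. The prior on $\boldsymbol{w}$ is the one we have already assumed, which depends only on $\alpha$, so $p(\boldsymbol{w}|\boldsymbol{x},\alpha,\beta)=p(\boldsymbol{w}|\alpha)$. The denominator $p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)$ does not involve $\boldsymbol{w}$. This gives the result in the book (this is my own understanding):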
$$
p(\boldsymbol{w}|\boldsymbol{x},\boldsymbol{t},\alpha,\beta)\propto p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)
$$
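The proportionality above can be checked numerically. The following is a minimal sketch, not the book's code: it assumes the polynomial model $y(x,\boldsymbol{w})=\sum_{j=0}^{M}w_jx^j$ (so $\boldsymbol{w}$ has $M+1$ entries) and Gaussian noise with precision $\beta$. The function names, the toy data, and the values of `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

def design_matrix(x, M):
    """Polynomial features 1, x, ..., x^M (assumed model with M+1 weights)."""
    return np.vander(x, M + 1, increasing=True)

def log_likelihood(w, x, t, beta):
    """ln p(t | x, w, beta) for i.i.d. Gaussian noise with precision beta."""
    resid = t - design_matrix(x, len(w) - 1) @ w
    n = len(t)
    return -0.5 * beta * (resid @ resid) + 0.5 * n * np.log(beta / (2 * np.pi))

def log_prior(w, alpha):
    """ln N(w | 0, alpha^{-1} I) for a weight vector with M+1 entries."""
    return 0.5 * len(w) * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * (w @ w)

def log_posterior_unnormalized(w, x, t, alpha, beta):
    """ln likelihood + ln prior; equals the log posterior up to a constant in w."""
    return log_likelihood(w, x, t, beta) + log_prior(w, alpha)

# Toy data for illustration only (not from the original post).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
alpha, beta = 5e-3, 11.1

w_a = np.zeros(4)
w_b = np.array([0.0, 8.0, -24.0, 16.0])
# Only differences are meaningful here, because the evidence
# p(t | x, alpha, beta) has been left out.
diff = (log_posterior_unnormalized(w_b, x, t, alpha, beta)
        - log_posterior_unnormalized(w_a, x, t, alpha, beta))
print("log posterior ratio (w_b vs w_a):", diff)
```

Because the evidence term is the same for every $\boldsymbol{w}$, comparing two weight vectors only requires the sum of the log likelihood and the log prior.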
So maximizing the posterior with respect to $\boldsymbol{w}$ now amounts to maximizing the product of the likelihood $p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)$ and the prior $p(\boldsymbol{w}|\alpha)$. Since
$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)=\prod_{n=1}^N\mathcal{N}(t_n|y(x_n,\boldsymbol{w}),\beta^{-1})=\prod_{n=1}^N\left(\frac{\beta}{2\pi}\right)^{1/2}\exp\left\{-\frac{\beta}{2}\{t_n-y(x_n,\boldsymbol{w})\}^2\right\}
$$

and

$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|\boldsymbol{0},\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$
we have

$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)=\left[\prod_{n=1}^N\left(\frac{\beta}{2\pi}\right)^{1/2}\exp\left\{-\frac{\beta}{2}\{t_n-y(x_n,\boldsymbol{w})\}^2\right\}\right]\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
$$

Taking the logarithm of both sides gives
$$
\begin{aligned}
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)\right]
&=-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{N}{2}\ln\beta-\frac{N}{2}\ln(2\pi)\\
&\quad+\frac{M+1}{2}\ln\alpha-\frac{M+1}{2}\ln(2\pi)-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
\end{aligned}
$$

We are looking for the most probable value of $\boldsymbol{w}$, so only the terms that depend on $\boldsymbol{w}$ matter. Setting aside the constants gives:
$$
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)\right]=-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}+\text{const}
$$

where $\text{const}$ collects the terms that do not depend on $\boldsymbol{w}$. Maximizing this is therefore equivalent to minimizing
$$
\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
$$
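To make the last step concrete, here is a minimal sketch that minimizes this objective in closed form. It assumes the polynomial model $y(x,\boldsymbol{w})=\sum_{j=0}^{M}w_jx^j$, for which the objective is quadratic in $\boldsymbol{w}$. Setting its gradient to zero gives $(\boldsymbol{\Phi}^T\boldsymbol{\Phi}+\frac{\alpha}{\beta}\boldsymbol{I})\boldsymbol{w}=\boldsymbol{\Phi}^T\boldsymbol{t}$, which is regularized least squares with $\lambda=\alpha/\beta$. The function names, the toy data, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def objective(w, x, t, alpha, beta):
    """(beta/2) * sum of squared errors + (alpha/2) * w^T w."""
    Phi = np.vander(x, len(w), increasing=True)  # columns 1, x, ..., x^M
    resid = Phi @ w - t
    return 0.5 * beta * (resid @ resid) + 0.5 * alpha * (w @ w)

def objective_grad(w, x, t, alpha, beta):
    """Gradient of the objective with respect to w."""
    Phi = np.vander(x, len(w), increasing=True)
    return beta * Phi.T @ (Phi @ w - t) + alpha * w

def map_weights(x, t, M, alpha, beta):
    """Closed-form minimizer: solve (Phi^T Phi + (alpha/beta) I) w = Phi^T t."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = Phi.T @ Phi + (alpha / beta) * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

# Toy data for illustration only (not from the original post).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
alpha, beta = 5e-3, 11.1

w_map = map_weights(x, t, M=3, alpha=alpha, beta=beta)
print("MAP weights:", w_map)
print("gradient norm at MAP:", np.linalg.norm(objective_grad(w_map, x, t, alpha, beta)))
```

The gradient norm printed at the end should be close to zero, which confirms that the closed-form solution is a stationary point of the objective. Since the objective is strictly convex for $\alpha>0$, that stationary point is its unique minimum.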
PRML Notes 2: Understanding the prior on the regression parameters w
First published 2023-02-20 19:35:28