Maximum Likelihood, MAP, and Regularized Least-Squares Estimation of Linear Model Parameters


This article describes the connections among maximum likelihood estimation, maximum a posteriori (MAP) estimation, and regularized least-squares estimation when solving for the parameter $\boldsymbol w$ of the linear regression model $y(\boldsymbol x,\boldsymbol w)=\boldsymbol w^T\boldsymbol\phi(\boldsymbol x)$.

1. Probabilistic Model of Linear Regression

The generalized linear regression function can be defined as:

$$y(\boldsymbol x,\boldsymbol w)=\sum_{j=1}^M w_j\phi_j(\boldsymbol x)=\boldsymbol w^T\boldsymbol\phi(\boldsymbol x)$$

where the weight vector is $\boldsymbol w=[w_1,\cdots,w_M]^T$, the observed data is $\boldsymbol x=[x_1,\cdots,x_D]^T$, and the basis functions are $\boldsymbol\phi(\boldsymbol x)=[\phi_1(\boldsymbol x),\cdots,\phi_M(\boldsymbol x)]^T$.

Assume that the target variable $t$ described by the linear regression function $y(\boldsymbol x,\boldsymbol w)$ satisfies:

$$t=y(\boldsymbol x,\boldsymbol w)+\varepsilon$$

where the observation error $\varepsilon\sim\mathcal N(0,\sigma^2)$, i.e. $p(\varepsilon)=\dfrac{1}{\sqrt{2\pi}\sigma}\exp\left(-\dfrac{\varepsilon^2}{2\sigma^2}\right)$.
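As a concrete illustration (not part of the original derivation), here is a minimal sketch of this observation model, assuming a one-dimensional input, a polynomial basis $\boldsymbol\phi(x)=[1,x,x^2]^T$, and made-up values for the weights `w_true` and the noise level `sigma`:

import numpy as np

# Minimal sketch of the observation model t = w^T phi(x) + eps, assuming a
# 1-D input and a polynomial basis phi(x) = [1, x, x^2]^T (illustrative choices).
rng = np.random.default_rng(0)

w_true = np.array([0.5, -1.0, 2.0])    # illustrative "true" weights
sigma = 0.3                            # noise standard deviation

x = rng.uniform(-1.0, 1.0, size=50)
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)      # N x M design matrix
t = Phi @ w_true + rng.normal(0.0, sigma, size=len(x))  # noisy targets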


  • Consider an observed data set $\bold X=[\boldsymbol x_1,\cdots,\boldsymbol x_N]$ with corresponding target vector $\bold t=[t_1,\cdots,t_N]^T$. The $i$-th target variable $t_i$ satisfies:

$$t_i=\sum_{j=1}^M w_j\phi_j(\boldsymbol x_i)+\varepsilon_i \quad\text{or}\quad t_i=\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)+\varepsilon_i$$

where $\boldsymbol w=[w_1,\cdots,w_M]^T$ and the $i$-th observation is $\boldsymbol x_i=[x_{i1},\cdots,x_{iD}]^T$.

For the basic linear model $t=\sum_{j=1}^M w_jx_j+\varepsilon$, we have $M=D$ and the basis functions satisfy $\phi_j(\boldsymbol x)=x_j$, so that $\boldsymbol\phi(\boldsymbol x_i)=[x_{i1},\cdots,x_{ij},\cdots,x_{iM}]^T=\boldsymbol x_i$, and hence:

$$t_i=\sum_{j=1}^M w_jx_{ij}+\varepsilon_i \quad\text{or}\quad t_i=\boldsymbol w^T\boldsymbol x_i+\varepsilon_i$$


  • Since the observation error $\varepsilon_i\sim\mathcal N(0,\sigma^2)$, the likelihood of the $i$-th target variable $t_i$ is:

$$p(t_i|\boldsymbol w,\boldsymbol x_i)=\dfrac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\dfrac{\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2}{2\sigma^2}\right\}$$


  • In general, the observation errors $\varepsilon_i$ are assumed independent and identically distributed, which gives the likelihood of the target vector $\bold t=[t_1,\cdots,t_N]^T$ over the whole observed data set $\bold X$ (the parameter is the weight vector $\boldsymbol w$; $\bold X$ denotes the data set):

$$\begin{aligned}p(\bold t|\boldsymbol w,\bold X)&=p(t_1,\cdots,t_N|\boldsymbol w,\bold X)=\prod_{i=1}^N p(t_i|\boldsymbol w,\boldsymbol x_i)\\&=\dfrac{1}{(\sqrt{2\pi}\sigma)^N}\exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\right\}\end{aligned}$$

Note that "likelihood" here means $p(\bold t|\boldsymbol w,\bold X)$ viewed as a function of the parameter $\boldsymbol w$; the data set $\bold X$ only acts as a condition.
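Continuing the sketch above, the log-likelihood under the i.i.d. Gaussian assumption can be evaluated directly (`log_likelihood` is an illustrative helper, not code from this article):

# Log-likelihood ln p(t | w, X) under the i.i.d. Gaussian noise assumption,
# using Phi, t, sigma from the earlier sketch.
def log_likelihood(w, Phi, t, sigma):
    N = len(t)
    resid = t - Phi @ w
    return (-N/2*np.log(2*np.pi) - N*np.log(sigma)
            - np.sum(resid**2)/(2*sigma**2))

print(log_likelihood(w_true, Phi, t, sigma))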


2. Maximum Likelihood Estimation

Having obtained the joint probability density $p(\bold t|\boldsymbol w,\bold X)$ of all observed variables, the most direct approach is to solve for the weight vector $\boldsymbol w$ by maximum likelihood estimation.

Taking the logarithm of the joint density yields the log-likelihood function:

$$\begin{aligned}\ln p(\bold t|\boldsymbol w,\bold X)&=\ln\dfrac{1}{(\sqrt{2\pi}\sigma)^N}-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\\&=-\dfrac{N}{2}\ln(2\pi)-N\ln\sigma-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\end{aligned}$$

The observed data set $\bold X$ appears only in the conditioning position and can be ignored when solving the maximum likelihood problem.

The maximum likelihood estimate of the weight vector $\boldsymbol w$ is found by setting the gradient of the log-likelihood to zero:

$$\nabla_{\boldsymbol w}\ln p(\bold t|\boldsymbol w,\bold X)=\dfrac{1}{\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]\boldsymbol\phi(\boldsymbol x_i)=0$$

It can be seen that, under the Gaussian noise assumption on the observation error $\varepsilon$, maximizing the log-likelihood $\ln p(\bold t|\boldsymbol w,\bold X)$ is exactly minimizing the sum-of-squares error $\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2$.

$$\Longrightarrow\quad\sum_{i=1}^N t_i\boldsymbol\phi(\boldsymbol x_i)=\sum_{i=1}^N\left[\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]\boldsymbol\phi(\boldsymbol x_i)$$

$$\Longrightarrow\quad\underbrace{\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)&\cdots&\boldsymbol\phi(\boldsymbol x_N)\end{bmatrix}\begin{bmatrix}t_1\\\vdots\\t_N\end{bmatrix}}_{\Phi^T\bold t}=\underbrace{\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)&\cdots&\boldsymbol\phi(\boldsymbol x_N)\end{bmatrix}\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)^T\boldsymbol w\\\vdots\\\boldsymbol\phi(\boldsymbol x_N)^T\boldsymbol w\end{bmatrix}}_{\Phi^T\Phi\boldsymbol w}$$

$$\Longrightarrow\quad\Phi^T\bold t=\Phi^T\Phi\boldsymbol w$$

This yields the maximum likelihood solution:

$$\boldsymbol w_{ML}=(\Phi^T\Phi)^{-1}\Phi^T\bold t$$

Alternatively, writing the error function in matrix form as $(\bold t-\Phi\boldsymbol w)^T(\bold t-\Phi\boldsymbol w)$ and using matrix differentiation gives the same result.

where $\boldsymbol\phi(\boldsymbol x)=[\phi_{1}(\boldsymbol x),\cdots,\phi_{M}(\boldsymbol x)]^T$, $\bold t=[t_1,\cdots,t_N]^T$, and

$$\Phi=\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)^T\\\boldsymbol\phi(\boldsymbol x_2)^T\\\vdots\\\boldsymbol\phi(\boldsymbol x_N)^T\end{bmatrix}=\begin{bmatrix}\phi_{1}(\boldsymbol x_1)&\phi_{2}(\boldsymbol x_1)&\cdots&\phi_{M}(\boldsymbol x_1)\\\phi_{1}(\boldsymbol x_2)&\phi_{2}(\boldsymbol x_2)&\cdots&\phi_{M}(\boldsymbol x_2)\\\vdots&\vdots&&\vdots\\\phi_{1}(\boldsymbol x_N)&\phi_{2}(\boldsymbol x_N)&\cdots&\phi_{M}(\boldsymbol x_N)\end{bmatrix}$$

$$\Phi^T\Phi=\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)&\boldsymbol\phi(\boldsymbol x_2)&\cdots&\boldsymbol\phi(\boldsymbol x_N)\end{bmatrix}\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)^T\\\boldsymbol\phi(\boldsymbol x_2)^T\\\vdots\\\boldsymbol\phi(\boldsymbol x_N)^T\end{bmatrix}=\sum_{i=1}^N\boldsymbol\phi(\boldsymbol x_i)\boldsymbol\phi(\boldsymbol x_i)^T$$
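Continuing the sketch, the maximum likelihood solution can be computed without forming $(\Phi^T\Phi)^{-1}$ explicitly; `np.linalg.lstsq` minimizes the same sum-of-squares error and is numerically preferable:

# Maximum likelihood solution w_ML = argmin ||t - Phi w||^2.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print("w_ML:", w_ml)   # close to w_true for moderate noise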


3. Regularized Least-Squares Estimation

In the previous section, the log-likelihood used to derive the maximum likelihood solution was:

$$\ln p(\bold t|\boldsymbol w,\bold X)=-\dfrac{N}{2}\ln(2\pi)-N\ln\sigma-\dfrac{1}{\sigma^2}\left\{\dfrac{1}{2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\right\}$$

The last term of the log-likelihood is exactly the sum-of-squares error function:

$$\begin{aligned}E_D(\boldsymbol w)&=\dfrac{1}{2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\\&=\dfrac{1}{2}(\bold t-\Phi\boldsymbol w)^T(\bold t-\Phi\boldsymbol w)=\dfrac{1}{2}\Vert\bold t-\Phi\boldsymbol w\Vert^2\end{aligned}$$

Thus, maximizing the log-likelihood $\ln p(\bold t|\boldsymbol w,\bold X)$ amounts to minimizing the sum-of-squares error $E_D(\boldsymbol w)$. To prevent overfitting when the sum-of-squares error is used as the cost function, a regularization term can be added, modifying the cost function to:

$$E_D(\boldsymbol w)+\lambda E_W(\boldsymbol w)$$

where $\lambda$ is the regularization coefficient, which balances the error function against the regularization term.

A commonly used regularization term is:

$$E_W(\boldsymbol w)=\dfrac{1}{2}\Vert\boldsymbol w\Vert^2=\dfrac{1}{2}\boldsymbol w^T\boldsymbol w$$

This gives the regularized cost function $F(\boldsymbol w)$:

$$F(\boldsymbol w)=\dfrac{1}{2}\Vert\bold t-\Phi\boldsymbol w\Vert^2+\dfrac{\lambda}{2}\Vert\boldsymbol w\Vert^2$$

(1) Taking $\lambda=0$ expresses complete confidence in the observation model described by the training set $\bold X$;
(2) taking $\lambda=\infty$ expresses no confidence in that model.

To minimize the cost function $F(\boldsymbol w)$, set $\nabla_{\boldsymbol w}F(\boldsymbol w)=-\Phi^T\bold t+\left(\Phi^T\Phi+\lambda\bold I\right)\boldsymbol w=0$,
 
which gives the regularized least-squares solution:

$$\boldsymbol w=\left[\Phi^T\Phi+\lambda\bold I\right]^{-1}\Phi^T\bold t$$

For the basic linear model $t=\sum_{j=1}^M w_jx_j+\varepsilon$, which satisfies $\boldsymbol\phi(\boldsymbol x_i)=\boldsymbol x_i$:

$$t_i=\sum_{j=1}^M w_jx_{ij}+\varepsilon_i \quad\text{or}\quad t_i=\boldsymbol w^T\boldsymbol x_i+\varepsilon_i$$

Sum-of-squares error: $E_D(\boldsymbol w)=\sum_{i=1}^N\varepsilon_i^2=\sum_{i=1}^N(t_i-\boldsymbol w^T\boldsymbol x_i)^2=(\bold t-\Phi\boldsymbol w)^T(\bold t-\Phi\boldsymbol w)$

In this case,

$$\Phi=\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_1)^T\\\boldsymbol\phi(\boldsymbol x_2)^T\\\vdots\\\boldsymbol\phi(\boldsymbol x_N)^T\end{bmatrix}=\begin{bmatrix}\boldsymbol x_1^T\\\boldsymbol x_2^T\\\vdots\\\boldsymbol x_N^T\end{bmatrix}=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1M}\\x_{21}&x_{22}&\cdots&x_{2M}\\\vdots&\vdots&&\vdots\\x_{N1}&x_{N2}&\cdots&x_{NM}\end{bmatrix}$$

With regularization against overfitting, the objective becomes $F(\boldsymbol w)=(\bold t-\Phi\boldsymbol w)^T(\bold t-\Phi\boldsymbol w)+\lambda\boldsymbol w^T\boldsymbol w$;
setting $\nabla_{\boldsymbol w}F(\boldsymbol w)=0$ gives $\boldsymbol w=[\Phi^T\Phi+\lambda\bold I]^{-1}\Phi^T\bold t$.


4. Maximum A Posteriori (MAP) Estimation

From the viewpoint of Bayesian analysis, we need to consider the prior probability of the weight vector $\boldsymbol w=[w_1,\cdots,w_M]^T$ as the parameter.

  • Again assume the elements $w_i$ of the weight vector are independent and identically distributed with $w_i\sim\mathcal N(0,\sigma_w^2)$. The prior probability is then:

$$\begin{aligned}p(\boldsymbol w)&=p(w_1,\cdots,w_M)=\prod_{i=1}^M p(w_i)\\&=\dfrac{1}{(\sqrt{2\pi}\sigma_w)^M}\prod_{i=1}^M\exp\left(-\dfrac{w_i^2}{2\sigma_w^2}\right)\\&=\dfrac{1}{(\sqrt{2\pi}\sigma_w)^M}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{i=1}^M w_i^2\right)=\dfrac{1}{(\sqrt{2\pi}\sigma_w)^M}\exp\left(-\dfrac{\Vert\boldsymbol w\Vert^2}{2\sigma_w^2}\right)\end{aligned}$$

  • By Bayes' theorem, the posterior probability satisfies:

$$p(\boldsymbol w|\bold t,\bold X)=\dfrac{p(\bold t|\boldsymbol w,\bold X)p(\boldsymbol w)}{p(\bold t)}\propto p(\bold t|\boldsymbol w,\bold X)p(\boldsymbol w)$$

Using the probabilistic model of linear regression from Section 1, the likelihood of the target vector $\bold t$ is:

$$p(\bold t|\boldsymbol w,\bold X)=\dfrac{1}{(\sqrt{2\pi}\sigma)^N}\exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\right\}$$

Then:

$$\begin{aligned}p(\boldsymbol w|\bold t,\bold X)&\propto\dfrac{1}{(\sqrt{2\pi}\sigma)^N}\exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2\right\}\dfrac{1}{(\sqrt{2\pi}\sigma_w)^M}\exp\left(-\dfrac{\Vert\boldsymbol w\Vert^2}{2\sigma_w^2}\right)\\&\propto\exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2-\dfrac{\Vert\boldsymbol w\Vert^2}{2\sigma_w^2}\right\}\end{aligned}$$

so that, taking logarithms and dropping constants,

$$\ln p(\boldsymbol w|\bold t,\bold X)=-\dfrac{1}{\sigma^2}\left\{\dfrac{1}{2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2+\dfrac{\lambda}{2}\Vert\boldsymbol w\Vert^2\right\}+\text{const}$$

where we define $\lambda=\dfrac{\sigma^2}{\sigma_w^2}$.

  • The maximum a posteriori estimate of the weight vector $\boldsymbol w$ is:

$$\begin{aligned}\boldsymbol w_{MAP}&=\arg\max_{\boldsymbol w} p(\boldsymbol w|\bold t,\bold X)\\&=\arg\max_{\boldsymbol w}\left\{-\dfrac{1}{2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2-\dfrac{\lambda}{2}\Vert\boldsymbol w\Vert^2\right\}\end{aligned}$$

This is equivalent to defining the cost function $F(\boldsymbol w)$:

$$\begin{aligned}F(\boldsymbol w)&=\dfrac{1}{2}\sum_{i=1}^N\left[t_i-\boldsymbol w^T\boldsymbol\phi(\boldsymbol x_i)\right]^2+\dfrac{\lambda}{2}\Vert\boldsymbol w\Vert^2\\&=\dfrac{1}{2}\Vert\bold t-\Phi\boldsymbol w\Vert^2+\dfrac{\lambda}{2}\Vert\boldsymbol w\Vert^2\end{aligned}$$

The MAP estimate is therefore $\boldsymbol w_{MAP}=\arg\min_{\boldsymbol w} F(\boldsymbol w)$.

Clearly, under the assumption that the elements of the weight vector are i.i.d. with $w_i\sim\mathcal N(0,\sigma_w^2)$, the MAP solution for $\boldsymbol w$ is exactly the regularized least-squares solution described in Section 3:

$$\boldsymbol w=\left[\Phi^T\Phi+\lambda\bold I\right]^{-1}\Phi^T\bold t$$

In particular, when $\lambda=0\ (\sigma_w^2=\infty)$, each element $w_i$ is approximately uniformly distributed, i.e. the prior information about $\boldsymbol w$ is completely ignored, and the MAP solution reduces to the maximum likelihood solution: $\boldsymbol w_{MAP}=\boldsymbol w_{ML}=\left(\Phi^T\Phi\right)^{-1}\Phi^T\bold t$.
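As a quick numerical check of this equivalence (continuing the earlier sketch; the prior standard deviation `sigma_w` is an assumed value for illustration):

# MAP estimate with lambda = sigma^2 / sigma_w^2 equals the regularized
# least-squares solution of Section 3; lambda = 0 recovers w_ML.
sigma_w = 1.0
lam = sigma**2 / sigma_w**2
print("w_MAP:", ridge_solution(Phi, t, lam))
print("w_ML :", ridge_solution(Phi, t, 0.0))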

Example: Double-Moon Data Classification

import numpy as np
import matplotlib.pyplot as plt

def gen_lineardata(weight, interval):
    # Decision boundary w0*x + w1*y + w2 = 0, solved for y
    y = -(weight[0]*interval + weight[2])/weight[1]
    return y

def halfmoon(rad, width, dist, n_samp):
    # Generate the double-moon data set: two interleaved half rings of
    # radius `rad` and width `width`, separated vertically by `dist`.
    if n_samp % 2 != 0:
        n_samp += 1
    data = np.zeros((3, n_samp))
    # Upper moon (Class 1)
    rd = np.random.random((2, n_samp//2))
    radius = (rad - width//2) + width*rd[0, :]
    theta = np.pi*rd[1, :]
    x1 = radius*np.cos(theta)
    y1 = radius*np.sin(theta) + dist/2
    label1 = np.ones(len(x1))               # label = +1 for Class 1
    # Lower moon (Class 2), shifted right by rad
    rd = np.random.random((2, n_samp//2))
    radius = (rad - width//2) + width*rd[0, :]
    theta = np.pi*rd[1, :]
    x2 = radius*np.cos(-theta) + rad
    y2 = radius*np.sin(-theta) - dist/2
    label2 = -np.ones(len(x2))              # label = -1 for Class 2
    data[0, :] = np.concatenate([x1, x2])
    data[1, :] = np.concatenate([y1, y2])
    data[2, :] = np.concatenate([label1, label2])
    shuffle_seq = np.random.permutation(np.arange(n_samp))
    data_shuffle = data[:, shuffle_seq]
    return data, data_shuffle

def RLS(xhat, target, lambda0):
    # Regularized least-squares solution w = (Phi^T Phi + lambda I)^{-1} Phi^T t,
    # computed by solving the linear system instead of inverting explicitly.
    Phi = np.asarray(xhat)
    t = np.asarray(target)
    A = Phi.T @ Phi + lambda0*np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

if __name__ == "__main__":
    dNum = 800
    data, data_shuffle = halfmoon(10, 6, 1, dNum)
    #data, data_shuffle = halfmoon(10, 6, -4, dNum)
    pos_data = data[:, 0:dNum//2]
    neg_data = data[:, dNum//2:dNum]
    training_data = data_shuffle.T
    tmp1 = training_data[0:dNum, 0:2]
    tmp2 = np.ones((dNum, 1))                    # bias column
    xhat = np.concatenate((tmp1, tmp2), axis=1)  # design matrix Phi
    target = training_data[0:dNum, 2:]
    interval = np.linspace(-12, 20, 100)
    weight = RLS(xhat, target, 0)
    print('RLS:', weight.flatten())
    y = gen_lineardata(weight, interval)
    plt.figure()
    plt.plot(interval, y, 'k')
    plt.plot(pos_data[0, :], pos_data[1, :], 'b+')
    plt.plot(neg_data[0, :], neg_data[1, :], 'r+')
    plt.title('Regularized least squares')
    plt.show()

[Figure: double-moon data with the fitted decision boundary; dist = 1, rad = 10, width = 6]

[Figure: double-moon data with the fitted decision boundary; dist = -4, rad = 10, width = 6]
