Machine Learning: Linear Regression

Linear Regression

Suppose the dataset is:
$$D=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$$
In what follows we write:
$$X=(x_1,x_2,\dots,x_N)^T,\quad Y=(y_1,y_2,\dots,y_N)^T$$
Linear regression assumes the model:
$$f(w)=w^Tx$$

Least Squares

For this problem, we define the loss function as the squared error given by the 2-norm:
$$L(w)=\sum_{i=1}^N\|w^Tx_i-y_i\|^2_2$$
Expanding:
$$\begin{aligned}L(w)&=(w^Tx_1-y_1,\dots,w^Tx_N-y_N)\cdot(w^Tx_1-y_1,\dots,w^Tx_N-y_N)^T\\&=(w^TX^T-Y^T)\cdot(Xw-Y)\\&=w^TX^TXw-Y^TXw-w^TX^TY+Y^TY\\&=w^TX^TXw-2w^TX^TY+Y^TY\end{aligned}$$
where the last step uses the fact that $Y^TXw=w^TX^TY$ is a scalar.
The $\hat{w}$ that minimizes this value is:
$$\begin{aligned}\hat{w}=\mathop{argmin}_wL(w)&\longrightarrow\frac{\partial}{\partial w}L(w)=0\\&\longrightarrow 2X^TX\hat{w}-2X^TY=0\\&\longrightarrow\hat{w}=(X^TX)^{-1}X^TY=X^+Y\end{aligned}$$
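As a quick numerical sanity check, here is a minimal NumPy sketch of this closed-form solution (the data, dimensions, and variable names are synthetic choices of mine, not from the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples with p features and known true weights.
N, p = 100, 3
X = rng.normal(size=(N, p))
w_true = np.array([1.5, -2.0, 0.5])
Y = X @ w_true + 0.1 * rng.normal(size=N)

# Normal equations: solve (X^T X) w = X^T Y rather than forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq solves the same least-squares problem more stably.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(w_hat)                        # ~ [ 1.5, -2.0, 0.5 ]
print(np.allclose(w_hat, w_lstsq))  # True
```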
In this expression, $(X^TX)^{-1}X^T$ is also called the pseudo-inverse $X^+$. For an $X$ with full row rank or full column rank, $\hat{w}$ can be solved for directly, but for a rank-deficient sample matrix we need the singular value decomposition (SVD). Decomposing $X$ gives
$$X=U\Sigma V^T$$
from which the pseudo-inverse is $X^+=V\Sigma^+U^T$, where $\Sigma^+$ inverts the nonzero singular values.
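Continuing the hypothetical setup above, this sketch builds the pseudo-inverse from the SVD by hand on a deliberately rank-deficient $X$, where $(X^TX)^{-1}$ does not exist, and checks it against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient X: the third column duplicates the first,
# so X^T X is singular and cannot be inverted directly.
A = rng.normal(size=(100, 2))
X = np.column_stack([A, A[:, 0]])
Y = X @ np.array([1.0, -1.0, 0.0]) + 0.1 * rng.normal(size=100)

# Thin SVD: X = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# X^+ = V Sigma^+ U^T: invert only the nonzero singular values.
s_inv = np.where(s > 1e-10, 1.0 / s, 0.0)
X_pinv = Vt.T @ np.diag(s_inv) @ U.T

w_hat = X_pinv @ Y  # minimum-norm least-squares solution
print(np.allclose(X_pinv, np.linalg.pinv(X)))  # True
```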
Geometrically, least squares minimizes the sum of squared distances between the model and the observed values. Suppose the samples span a p-dimensional space, $Span(x_1,\dots,x_N)$, and the model can be written as $f(w)=X\beta$, i.e. some combination of $x_1,\dots,x_N$. Least squares asks that $Y$ be as close as possible to this model, so their difference should be perpendicular to the spanned space:
$$X^T\cdot(Y-X\beta)=0\longrightarrow\beta=(X^TX)^{-1}X^TY$$
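This orthogonality condition is easy to verify numerically (same synthetic-data assumptions as before):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

beta = np.linalg.lstsq(X, Y, rcond=None)[0]
residual = Y - X @ beta

# The residual is orthogonal to every column of X, up to round-off.
print(X.T @ residual)  # all entries ~ 0
```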

MLE with Gaussian Noise

One-dimensional Gaussian distribution

$$N(\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

p-dimensional Gaussian distribution

$$N(\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
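A sketch checking both density formulas against `scipy.stats` (assuming SciPy is available; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# One-dimensional density.
x, mu, sigma = 1.0, 0.0, 2.0
pdf_1d = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
print(np.isclose(pdf_1d, norm.pdf(x, loc=mu, scale=sigma)))  # True

# p-dimensional density; note the determinant |Sigma| in the normalizer.
p = 2
mu_vec = np.zeros(p)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
xv = np.array([0.5, -0.5])
d = xv - mu_vec
pdf_pd = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
)
print(np.isclose(pdf_pd, multivariate_normal.pdf(xv, mean=mu_vec, cov=Sigma)))  # True
```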

Maximum Likelihood Estimation (MLE)

$$\theta_{MLE}=\mathop{argmax}_{\theta}P(x|\theta)$$
For the one-dimensional case, write $y=w^Tx+\epsilon$ with $\epsilon\sim N(0,\sigma^2)$, so that $y\sim N(w^Tx,\sigma^2)$. Substituting into the maximum likelihood estimate:
$$\begin{aligned}L(w)=\log p(Y|X,w)&=\log\prod_{i=1}^Np(y_i|x_i,w)\\&=\sum_{i=1}^N\log\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(y_i-w^Tx_i)^2}{2\sigma^2}}\right)\\&=\sum_{i=1}^N\left[\log\frac{1}{\sqrt{2\pi}}+\log\frac{1}{\sigma}-\frac{(y_i-w^Tx_i)^2}{2\sigma^2}\right]\end{aligned}$$
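Since $w$ enters this log-likelihood only through the squared-error term, maximizing it recovers exactly the least-squares solution. A minimal sketch of that equivalence (my own synthetic data; `scipy.optimize` assumed available):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, p, sigma = 200, 3, 0.5
X = rng.normal(size=(N, p))
Y = X @ np.array([1.5, -2.0, 0.5]) + sigma * rng.normal(size=N)

def neg_log_likelihood(w):
    # Up to additive constants, -log p(Y|X, w) is the scaled squared error.
    return np.sum((Y - X @ w) ** 2) / (2 * sigma**2)

w_mle = minimize(neg_log_likelihood, x0=np.zeros(p)).x
w_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(w_mle, w_ls, atol=1e-4))  # True: MLE matches least squares
```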
Therefore (estimating, for the moment, the parameters of a Gaussian from samples $x_1,\dots,x_N$), we find $\mu_{MLE}$:
$$\begin{aligned}\mu_{MLE}&=\mathop{argmax}_\mu\log p(x|\theta)\\&=\mathop{argmax}_\mu\sum_{i=1}^N-\frac{(x_i-\mu)^2}{2\sigma^2}\\&=\mathop{argmin}_\mu\sum_{i=1}^N(x_i-\mu)^2\end{aligned}$$
Taking the derivative and setting it to zero:
$$\begin{aligned}\frac{\partial}{\partial\mu}\sum_{i=1}^N(x_i-\mu)^2&=\sum_{i=1}^N2(x_i-\mu)(-1)=0\\&\longrightarrow\sum_{i=1}^N(x_i-\mu)=0\\&\longrightarrow\sum_{i=1}^Nx_i=N\mu\\&\longrightarrow\mu_{MLE}=\frac{1}{N}\sum_{i=1}^Nx_i\end{aligned}$$
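So $\mu_{MLE}$ is simply the sample mean. A one-line check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# mu_MLE = (1/N) * sum_i x_i, i.e. the sample mean.
mu_mle = x.sum() / x.size
print(np.isclose(mu_mle, np.mean(x)))  # True
print(mu_mle)                          # ~ 3.0
```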

For $\sigma^2_{MLE}$, we can likewise obtain from the expression above:
$$\begin{aligned}\sigma^2_{MLE}&=\mathop{argmax}_\sigma\log P(x|\theta)\\&=\mathop{argmax}_\sigma\sum_{i=1}^N\left(-\log\sigma-\frac{1}{2\sigma^2}(x_i-\mu)^2\right)\end{aligned}$$
Taking the derivative:
$$\begin{aligned}\frac{\partial}{\partial\sigma}\sum_{i=1}^N\left(-\log\sigma-\frac{1}{2\sigma^2}(x_i-\mu)^2\right)&=\sum_{i=1}^N\left(-\frac{1}{\sigma}-\frac{1}{2}(x_i-\mu)^2(-2)\sigma^{-3}\right)=0\\&\longrightarrow\sum_{i=1}^N\left(-\frac{1}{\sigma}+(x_i-\mu)^2\sigma^{-3}\right)=0\\&\longrightarrow\sum_{i=1}^N\left(-\sigma^2+(x_i-\mu)^2\right)=0\\&\longrightarrow\sigma^2_{MLE}=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2\end{aligned}$$
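A quick numerical check (note that NumPy's `np.var` defaults to `ddof=0`, which is exactly this MLE):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# sigma^2_MLE = (1/N) * sum_i (x_i - mu_MLE)^2.
mu_mle = np.mean(x)
var_mle = np.mean((x - mu_mle) ** 2)

print(np.isclose(var_mle, np.var(x)))  # True: np.var uses ddof=0 by default
print(var_mle)                         # ~ 4.0 (sigma = 2)
```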
However, this $\sigma^2_{MLE}$ (with $\mu$ replaced by $\mu_{MLE}$) is a biased estimator.
Taking the expectation of $\sigma^2_{MLE}$:
$$\begin{aligned}E[\sigma^2_{MLE}]&=\frac{1}{N}\sum_{i=1}^NE[(x_i-\mu_{MLE})^2]\\&=\frac{1}{N}\sum_{i=1}^NE[x_i^2-2x_i\mu_{MLE}+\mu^2_{MLE}]\\&=\frac{1}{N}\left(\sum_{i=1}^NE[x_i^2]-2\sum_{i=1}^NE[x_i\mu_{MLE}]+\sum_{i=1}^NE[\mu^2_{MLE}]\right)\\&=\frac{1}{N}\left(\sum_{i=1}^NE[x_i^2]-2NE[\mu^2_{MLE}]+NE[\mu^2_{MLE}]\right)\qquad\left(\text{since }\sum_{i=1}^Nx_i\mu_{MLE}=N\mu^2_{MLE}\right)\\&=\frac{1}{N}\sum_{i=1}^NE[x_i^2]-E[\mu^2_{MLE}]\\&=E\left[\frac{1}{N}\sum_{i=1}^Nx_i^2-\mu^2-(\mu^2_{MLE}-\mu^2)\right]\\&=E\left[\frac{1}{N}\sum_{i=1}^Nx_i^2-\mu^2\right]-E[\mu^2_{MLE}-\mu^2]\end{aligned}$$

$$E\left[\frac{1}{N}\sum_{i=1}^Nx_i^2-\mu^2\right]=\frac{1}{N}\sum_{i=1}^N\left(E[x_i^2]-\mu^2\right)=\frac{1}{N}\sum_{i=1}^NVar(x_i)=\frac{1}{N}\sum_{i=1}^N\sigma^2=\sigma^2$$
$$\begin{aligned}E[\mu^2_{MLE}-\mu^2]&=E[\mu^2_{MLE}]-E[\mu^2]\\&=E[\mu^2_{MLE}]-\mu^2=Var(\mu_{MLE})\qquad(\text{using }E[\mu_{MLE}]=\mu)\\&=Var\left[\frac{1}{N}\sum_{i=1}^Nx_i\right]=\frac{1}{N^2}\sum_{i=1}^NVar(x_i)\\&=\frac{1}{N^2}\sum_{i=1}^N\sigma^2=\frac{1}{N}\sigma^2\end{aligned}$$

So finally:
$$E[\sigma^2_{MLE}]=\frac{N-1}{N}\sigma^2$$
The unbiased estimator of $\sigma^2$ is therefore:
$$\hat{\sigma}^2=\frac{1}{N-1}\sum_{i=1}^N(x_i-\mu_{MLE})^2$$
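A Monte Carlo sketch of this bias (the sample size, variance, and trial count are my own choices): averaging $\sigma^2_{MLE}$ over many small samples lands near $\frac{N-1}{N}\sigma^2$, while the `ddof=1` estimator averages to $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, N, trials = 4.0, 10, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, N))

var_mle = np.var(samples, axis=1, ddof=0)       # biased MLE, divides by N
var_unbiased = np.var(samples, axis=1, ddof=1)  # divides by N-1

print(var_mle.mean())       # ~ (N-1)/N * sigma2 = 3.6
print(var_unbiased.mean())  # ~ sigma2 = 4.0
```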
