Linear Regression - Normal Equation and Regularization

In linear regression problems, we can use a method called the Normal Equation to fit the parameters.
Suppose we have a training set like this:
$$X = \left[\begin{matrix}(x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T\end{matrix}\right]$$
where:
$$x^{(i)} = \left[\begin{matrix}x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)}\end{matrix}\right]$$
and the label set:
$$y = \left[\begin{matrix}y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)}\end{matrix}\right]$$
We want to fit the parameters
$$\theta = \left[\begin{matrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_n\end{matrix}\right]$$
so that the cost function
$$J = ||X\cdot\theta - y||^2$$
attains its global minimum, that is:
$$\theta = \mathop{argmin}_{\theta} ||X\cdot\theta - y||^2 = (X^TX)^{-1}\cdot X^Ty$$
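To illustrate how this closed-form solution is used in practice, here is a minimal NumPy sketch. The toy data, the convention $x_0^{(i)} = 1$ for the intercept, and the use of `np.linalg.solve` instead of an explicit matrix inverse are assumptions made for this example only:

```python
import numpy as np

# Toy data: m = 5 examples with a single raw feature (values chosen arbitrarily).
x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: a column of ones plays the role of x_0^(i) = 1,
# so theta_0 acts as the intercept term.
X = np.column_stack([np.ones_like(x_raw), x_raw])

# Normal equation: theta = (X^T X)^{-1} X^T y.
# np.linalg.solve solves the linear system X^T X theta = X^T y directly,
# which is algebraically equivalent to forming the inverse but numerically safer.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [theta_0, theta_1]
```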
Now let's prove this formula.
We take the partial derivative of $J$ with respect to each parameter and set it to zero. For $\theta_j$, we find that:
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i = 1}^{m} \left((x^{(i)})^T\theta - y^{(i)}\right)\cdot x_j^{(i)} = 0$$
Dividing by 2 and rearranging this equation, we find that:
$$\left[\begin{matrix}x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)}\end{matrix}\right]X\cdot\theta = \left[\begin{matrix}x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)}\end{matrix}\right]\cdot y$$
Combining all $n+1$ such equations (one for each $j = 0, 1, \dots, n$), we find that:
$$X^TX\theta = X^Ty$$
$$\theta = (X^TX)^{-1}X^Ty$$
provided that $X^TX$ is invertible.
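For reference, the same result can be reached in one step with matrix calculus, using the identity $\nabla_\theta\,||X\theta - y||^2 = 2X^T(X\theta - y)$; this is just a compact, vectorized restatement of the component-wise derivation above:
$$\nabla_\theta J = 2X^T(X\theta - y) = 0 \;\Longrightarrow\; X^TX\theta = X^Ty \;\Longrightarrow\; \theta = (X^TX)^{-1}X^Ty$$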

Now we introduce regularization, which means we change the cost function $J$ to:
$$J = ||X\cdot\theta - y||^2 + \lambda \sum_{j=1}^n \theta_j^2$$
where $\lambda$ is a constant called the regularization parameter.
Again, we calculate the partial derivative with respect to each $\theta_j$. Note that the partial derivative with respect to $\theta_0$ does not change, since $\theta_0$ is not regularized. For $j > 0$:
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i = 1}^{m} \left((x^{(i)})^T\theta - y^{(i)}\right)\cdot x_j^{(i)} + 2\lambda\theta_j = 0$$
Dividing by 2 and rearranging as before:
$$\left[\begin{matrix}x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)}\end{matrix}\right]X\cdot\theta + \lambda\theta_j = \left[\begin{matrix}x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)}\end{matrix}\right]\cdot y$$
Note that
$$\lambda\theta_j = \lambda e_j^T\theta$$
where $e_j$ is the unit vector whose $j$-th element is 1 and all other elements are 0.
Combining all $n+1$ equations, we find:
$$(X^TX+\lambda L)\theta = X^Ty$$
$$\theta = (X^TX + \lambda L)^{-1}X^Ty$$
where
$$L = \mathrm{diag}(0,1,1,\dots,1)$$
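Here is a minimal NumPy sketch of the regularized solution, mirroring the earlier example; the toy data and the value of $\lambda$ are arbitrary choices for illustration only:

```python
import numpy as np

# Same toy data as in the earlier sketch (chosen arbitrarily).
x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x_raw), x_raw])

lam = 0.5  # regularization parameter lambda (arbitrary value)

# L = diag(0, 1, ..., 1): the intercept theta_0 is left unregularized.
L = np.eye(X.shape[1])
L[0, 0] = 0.0

# Regularized normal equation: theta = (X^T X + lambda * L)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta)
```

One practical side effect is that for $\lambda > 0$ the matrix $X^TX + \lambda L$ is usually better conditioned than $X^TX$ alone, which helps when $X^TX$ is close to singular.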
