Normal Equation

Deriving the Normal Equation

An alternative to Gradient Descent in Multiple Linear Regression

  1. Problem Setup:

    1. Given a dataset with $m$ samples $(x^{(1)}, x^{(2)}, ..., x^{(m)})$, where each sample has $n$ features (so $x^{(1)}_1$ denotes the value of the first feature of the first sample), and response variables $(y^{(1)}, ..., y^{(m)})$, with $y^{(1)}$ being the "output" for the first sample, we want to calculate parameters $\theta_0, ..., \theta_n$ such that the hypothesis $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + ... + \theta_n x_n$ gives the least-squares linear fit for our dataset (i.e., it minimizes the squared error over all pairs $(\vec x^{(i)}, y^{(i)})$, $i = 1, ..., m$).
    2. For convenience of calculation, let $x_0^{(i)} = 1$ for all $i$, so that $\vec x^{(i)} = (x^{(i)}_0, x^{(i)}_1, x^{(i)}_2, ..., x^{(i)}_n)$ has the same dimension as $\vec\theta = (\theta_0, ..., \theta_n)$, and we can rewrite the hypothesis as $h_{\theta}(x) = \theta_0 x_0 + \theta_1 x_1 + ... + \theta_n x_n = \vec\theta \cdot \vec x$ (see the NumPy sketch after the derivation).
  2. Derivation

    1. Cost Function: Since "least squares" means minimizing the sum of squared errors, the cost function is $J(\theta_0, ..., \theta_n) = \frac{1}{2m}\sum_{i=1}^m \left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + ... + \theta_n x_n^{(i)} - y^{(i)}\right)^2$, where the constant factor $\frac{1}{2m}$ has no effect on the minimizer. Our task is to minimize this function.

    2. Viewing $J$ as an inner product: In minimizing $J$, we are looking for $\theta$ values that minimize a sum of squared residuals, which is exactly a squared Euclidean distance.

      Notice that $\sum_{i=1}^m \left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + ... + \theta_n x_n^{(i)} - y^{(i)}\right)^2 = (X\vec\theta - \vec y)^T (X\vec\theta - \vec y) = \lVert X\vec\theta - \vec y \rVert^2$, where $X$ is the $m \times (n+1)$ design matrix whose $i$-th row is $(\vec x^{(i)})^T$ and $\vec y = (y^{(1)}, ..., y^{(m)})^T$. Setting the gradient $\nabla_\theta J = \frac{1}{m} X^T (X\vec\theta - \vec y)$ to zero then yields the normal equation $X^T X \vec\theta = X^T \vec y$, i.e. $\vec\theta = (X^T X)^{-1} X^T \vec y$ whenever $X^T X$ is invertible.
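Below is a minimal NumPy sketch of this setup and the closed-form solution, under the assumptions above: the raw feature matrix has shape $(m, n)$, a bias column $x_0 = 1$ is prepended, and $X^T X$ is invertible. The function names (`add_bias_column`, `cost`, `normal_equation`) and the tiny dataset are only for illustration; `np.linalg.solve` is used rather than forming $(X^T X)^{-1}$ explicitly.

```python
import numpy as np

def add_bias_column(X_raw):
    """Prepend a column of ones (x_0 = 1 for every sample), so each row
    (x_0, x_1, ..., x_n) matches the dimension of theta."""
    m = X_raw.shape[0]
    return np.hstack([np.ones((m, 1)), X_raw])

def cost(X, y, theta):
    """J(theta) = 1/(2m) * ||X theta - y||^2, the inner-product form of the cost."""
    m = X.shape[0]
    r = X @ theta - y                  # residual vector, one entry per sample
    return (r @ r) / (2 * m)

def normal_equation(X, y):
    """Closed-form least-squares fit: solve X^T X theta = X^T y for theta."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny illustrative example: m = 4 samples, n = 1 feature, roughly y = 1 + 2x.
X_raw = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])
X = add_bias_column(X_raw)
theta = normal_equation(X, y)
print(theta, cost(X, y, theta))
```

Unlike gradient descent, this computes the minimizer in a single step, with no learning rate or iteration count to tune; the trade-off is solving an $(n+1)\times(n+1)$ linear system, which becomes expensive when the number of features $n$ is large.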
