Least Squares Method in Linear Regression

Linear regression

These are my study notes on machine learning; I'm writing them in English because I want to improve my writing skills. Thanks for reading, and if I've made any mistakes, please let me know.

The objective of linear algebra is to calculate the relationships between points in a vector space.

Simple linear regression is used to find the best-fit line through a dataset. If the data isn't continuous, there really isn't going to be a meaningful best-fit line.

In our model, the X coordinates are the features and the Y coordinates are the associated labels.

Here is a simple straight line: $y = mx + b$, where $m$ is the slope and $b$ is the intercept. How do we compute $m$ and $b$? The method of least squares can help.
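
To make the setup concrete before the derivation, here is a minimal Python sketch (the numbers are made up purely for illustration) of a small dataset and a candidate line $y = mx + b$ with guessed parameters:

```python
import numpy as np

# Toy dataset (made-up numbers): x holds the features, y the labels.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A candidate straight line with guessed parameters m (slope) and b (intercept).
m, b = 2.0, 0.0
y_pred = m * x + b  # fitted values for each feature
print(y_pred)       # [ 2.  4.  6.  8. 10.]
```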
Reference links:
https://en.wikipedia.org/wiki/Least_squares
https://www.cnblogs.com/softlin/p/5815531.html

Least squares

The method of least squares is a standard and useful approach in regression analysis for approximating the solution of an overdetermined system.
Its most important application is data fitting: given many points scattered in a diagram, we look for a best-fit line $y = mx + b$ that comes as close as possible to the observed values.

What does least squares mean?

The overall solution minimizes the sum of the squares of the residuals made in the result of every single equation.

So, during data fitting, we should minimize the sum of squared residuals, which means minimizing the differences between the observed values and the fitted values provided by the model. Linear regression is one kind of data fitting.
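
As a small illustration (reusing the toy numbers from the sketch above), the following computes the sum of squared residuals for a candidate $(m, b)$; least squares picks the pair that makes this value smallest:

```python
import numpy as np

def sum_of_squared_residuals(x, y, m, b):
    """Sum of squared differences between observed y and fitted m*x + b."""
    residuals = y - (m * x + b)
    return np.sum(residuals ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A line that follows the data closely gives a smaller value than one that doesn't.
print(sum_of_squared_residuals(x, y, m=2.0, b=0.0))  # small
print(sum_of_squared_residuals(x, y, m=0.0, b=6.0))  # much larger
```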

However, least squares has problems in cases with substantial uncertainties in the independent variable; errors-in-variables models should be used instead.

There are two kinds of least squares problems, depending on whether or not the residuals are linear in the unknowns:

  • linear or ordinary least squares
  • nonlinear least squares

Besides, solving a nonlinear problem usually involves approximating it by a linear one at each iteration, so the core calculation is quite similar.

How to calculate m and b?
From $y = mx + b$, the general form is $y = mx + b + e$, where $e$ represents the error. In this case, our objective is to find the $(m, b)$ that minimizes the difference between the predicted values (fitted values) and the real values (observed values). We choose the squared loss function:

$$L_n = \left(y_n - (mx_n + b)\right)^2$$

So the mean loss over the whole dataset is:

$$L = \frac{1}{N}\sum_{n=1}^{N} L_n$$

Expanding the square:

$$L = \frac{1}{N}\sum_{n=1}^{N}\left(y_n^2 - 2y_n b + 2mx_n(b - y_n) + b^2 + m^2 x_n^2\right)$$
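
As a quick sanity check of this expansion (toy values for $x$, $y$, $m$, $b$), the sketch below confirms numerically that the expanded form equals the original squared-error form:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m, b = 2.0, 0.5

# Direct form: mean of (y_n - (m*x_n + b))^2
direct = np.mean((y - (m * x + b)) ** 2)

# Expanded form from the formula above
expanded = np.mean(y**2 - 2*y*b + 2*m*x*(b - y) + b**2 + m**2 * x**2)

print(np.isclose(direct, expanded))  # True
```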

Now, to minimize the squared loss function $L$, we set the partial derivatives of $L$ with respect to $m$ and $b$ equal to $0$. The pair $(m, b)$ at which both partial derivatives are $0$ is the best one.

Keeping only the terms of $L$ that contain $b$ (the others vanish when differentiating with respect to $b$):

$$\frac{1}{N}\sum_{n=1}^{N}\left(b^2 - 2y_n b + 2mbx_n\right)$$

$$\frac{\partial L}{\partial b} = 2b - \frac{2}{N}\sum_{n=1}^{N} y_n + \frac{2m}{N}\sum_{n=1}^{N} x_n$$
Keeping only the terms of $L$ that contain $m$:

$$\frac{1}{N}\sum_{n=1}^{N}\left(2mx_n(b - y_n) + m^2 x_n^2\right)$$

$$\frac{\partial L}{\partial m} = \frac{2}{N}\sum_{n=1}^{N} x_n(b - y_n) + \frac{2m}{N}\sum_{n=1}^{N} x_n^2$$
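
Here is a minimal sketch (same toy data as before) that implements these two partial derivatives and checks them against a finite-difference approximation of the loss:

```python
import numpy as np

def loss(x, y, m, b):
    """Mean squared loss L over the dataset."""
    return np.mean((y - (m * x + b)) ** 2)

def gradients(x, y, m, b):
    """Partial derivatives of L with respect to m and b, using the formulas above."""
    dL_db = 2 * b - 2 * np.mean(y) + 2 * m * np.mean(x)
    dL_dm = 2 * np.mean(x * (b - y)) + 2 * m * np.mean(x ** 2)
    return dL_dm, dL_db

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m, b = 2.0, 0.5

# Compare the analytical derivatives with a finite-difference approximation.
eps = 1e-6
dL_dm, dL_db = gradients(x, y, m, b)
num_dm = (loss(x, y, m + eps, b) - loss(x, y, m - eps, b)) / (2 * eps)
num_db = (loss(x, y, m, b + eps) - loss(x, y, m, b - eps)) / (2 * eps)
print(dL_dm, num_dm)  # the two values should match closely
print(dL_db, num_db)
```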
Finally, setting $\frac{\partial L}{\partial m} = 0$ and $\frac{\partial L}{\partial b} = 0$, we can solve for $m$ and $b$.
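
For reference, here is the closed form that follows from these two equations (writing $\bar{x}$, $\bar{y}$, $\overline{xy}$ and $\overline{x^2}$ for the sample means): from $\frac{\partial L}{\partial b} = 0$ we get $b = \bar{y} - m\bar{x}$, and substituting this into $\frac{\partial L}{\partial m} = 0$ gives

$$m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad b = \bar{y} - m\,\bar{x}$$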
