Regression is All you Need

Author: Bobby (Zhuoran) Peng

People have created and used a bewildering variety of algorithms to predict future data across industries. However, the most prevalent, and perhaps the easiest to interpret and best proven, remains linear regression. The following figure, which shows linear regression to be the most popular algorithm, also suggests that it is one of the most basic building blocks of supervised machine learning and beyond.
[Figure: survey of practitioners' favorite algorithms, with linear regression the most popular]

What is Linear Regression?

A linear regression involves two kinds of variables: the dependent variable and the independent variable. The dependent variable is what we want to predict, and its value depends on changes in the independent variable.
$$y = \alpha_0 + \alpha_1 x$$
In the above model, we use $x$ to predict the value of $y$.

[Figure: scatter plot of data points with a fitted regression line]
The above scatter plot is an example of a linear regression model. Here we get a fitted line of $y = -15.69 + 9.72x$; given an $x$ value, we can then use this equation to predict the corresponding $y$ value.
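To make this concrete, here is a minimal Python sketch (my addition, using synthetic data rather than the data behind the figure) that fits such a line with NumPy and uses it to predict a new point:

```python
import numpy as np

# Synthetic data, illustrative only -- not the data behind the figure above.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = -15.69 + 9.72 * x + rng.normal(scale=5.0, size=50)  # true line plus noise

# Fit a degree-1 polynomial; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f}x")

# Use the fitted line to predict y for a new x value.
x_new = 4.0
print(f"prediction at x = {x_new}: {intercept + slope * x_new:.2f}")
```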

We can also present the linear relationship with matrices. Let $X$ be a $d \times n$ matrix ($n$ items and $d$ features), $w$ a $d \times 1$ matrix (containing the coefficients), and $\bar{Y}$ the output matrix of size $n \times 1$. Then we have: $$X^T w = \bar{Y}$$
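As a quick sketch of this convention (the row of ones carrying the intercept $\alpha_0$ is my assumption about how the $d$ features are arranged; the post does not spell it out):

```python
import numpy as np

# Following the text's d x n convention; the row of ones carrying the
# intercept alpha_0 is an assumption about how the features are arranged.
x = np.array([1.0, 2.0, 3.0, 4.0])      # n = 4 items with one raw feature
X = np.vstack([np.ones_like(x), x])     # d = 2 rows: ones and x; shape (2, 4)

w = np.array([-15.69, 9.72])            # [alpha_0, alpha_1]
Y_bar = X.T @ w                         # predictions X^T w, shape (4,)
print(Y_bar)
```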

Data in the real world look just like this scatter plot: our linear regression model is only a prediction. We use $\bar{y}$ to denote the values the model predicts on the blue fitted line, while the real data are the black dots scattered across the figure. The difference between the predicted $\bar{y}$ and the real $y$ value is called the error, often denoted $e$, giving the following equation: $$y = \alpha_0 + \alpha_1 x + e$$ or $$Y = X^T w + e$$

The Error Function

As mentioned above, the error is the distance between $y$ and $\bar{y}$, so we can estimate the total error over the whole data set:
$$E = \sum_{i=1}^n (y_i - \bar{y_i})^2 = \sum_{i=1}^n (y_i - \alpha_0 - \alpha_1 x_i)^2$$
or
$$E = \|Y - X^T w\|^2$$
This is a quadratic function, so to minimize the error we adjust $\alpha_0$ and $\alpha_1$, or the matrix $w$, taking derivatives and setting them to zero. Thus we have:
$$\frac{\partial E}{\partial \alpha_0} = -2\sum_{i=1}^n (y_i - \alpha_0 - \alpha_1 x_i) = 0$$
$$\frac{\partial E}{\partial \alpha_1} = -2\sum_{i=1}^n (y_i - \alpha_0 - \alpha_1 x_i)\, x_i = 0$$
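Solving these two equations gives the familiar closed forms (a standard textbook result, added here for completeness; $\mu_x$ and $\mu_y$ denote the sample means, written this way to avoid clashing with the $\bar{y}$ used above for predictions):

$$\alpha_1 = \frac{\sum_{i=1}^n (x_i - \mu_x)(y_i - \mu_y)}{\sum_{i=1}^n (x_i - \mu_x)^2}, \qquad \alpha_0 = \mu_y - \alpha_1 \mu_x$$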
Equivalently, for the matrix expression: $$E = \|Y - X^T w\|^2 = (Y - X^T w)^T (Y - X^T w) = w^T X X^T w - w^T X Y - Y^T X^T w + Y^T Y$$
Then $$\frac{\partial E}{\partial w} = 2 X X^T w - 2 X Y = 0 \quad\Longrightarrow\quad w = (X X^T)^{-1} X Y,$$ assuming $X X^T$ is invertible.
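To make the closed-form solution concrete, here is a minimal NumPy sketch (my addition, under this post's $d \times n$ convention for $X$) that solves the normal equation and checks it against NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build X in the d x n convention used above: a row of ones plus one feature.
n = 100
x = rng.uniform(0, 10, size=n)
X = np.vstack([np.ones(n), x])                    # shape (2, n)
w_true = np.array([-15.69, 9.72])
Y = X.T @ w_true + rng.normal(scale=3.0, size=n)  # Y = X^T w + e

# Normal equation: solve (X X^T) w = X Y rather than forming the inverse.
w_hat = np.linalg.solve(X @ X.T, X @ Y)
print("normal equation:", w_hat)

# Cross-check with NumPy's least-squares solver (expects samples as rows).
w_lstsq, *_ = np.linalg.lstsq(X.T, Y, rcond=None)
print("lstsq:          ", w_lstsq)
```

Using np.linalg.solve instead of explicitly inverting $X X^T$ is the usual, numerically safer choice.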

Correlation is not Causality

We now know how to use a linear regression to predict $y$ from $x$. In real-life practice, two variables may have a strong correlation in a linear model, but this does not necessarily mean that $x$ is the cause of $y$. For example, in a search engine, items at higher ranking positions receive higher click rates, as the following figure shows.
[Figure: click-through rate versus ranking position, illustrating position bias]

You may find a linear relation between ranking and click rate, but this does not lead to the conclusion that users like higher-ranked items more than lower-ranked ones. This is the typical position bias in recommender systems, and causality-inspired machine learning problems are receiving increasing attention these days, aiming to find the real causes behind linear or nonlinear correlations.
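As a toy illustration (my own sketch, not from the original post): the simulation below gives every item identical appeal and simply lets higher positions be examined more often. A regression of click-through rate on rank then finds a strong relationship, even though by construction rank influences clicks only through exposure, not preference.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: every item is equally appealing (same click probability once
# examined), but higher-ranked positions are examined more often.
n_positions = 10
examine_prob = 1.0 / np.arange(1, n_positions + 1)  # position 1 examined most
appeal = 0.3                                        # identical across items

impressions = 10_000
clicks = np.array([
    rng.binomial(impressions, p * appeal) for p in examine_prob
])
ctr = clicks / impressions

# A linear fit of CTR on rank reports a strong negative relationship...
rank = np.arange(1, n_positions + 1)
slope, intercept = np.polyfit(rank, ctr, deg=1)
print(f"fitted: ctr = {intercept:.3f} + {slope:.3f} * rank")
print("correlation:", np.corrcoef(rank, ctr)[0, 1])
# ...yet users like every item equally: the rank-CTR link is pure
# position bias (exposure), not preference.
```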
