Linear Regression Implementation with Python

Linear regression for beginners, with simple Python code

Linear regression is one of the most well-known and well-understood algorithms in machine learning and statistics. Before jumping into linear regression, let's recap the formula of a linear equation. We all know the general formula of a linear equation: y = mx + c


[Figure: plot of the line y = mx + c]
Here x and y are two variables: x is independent and y depends on x; m is the slope and c is the intercept.
This is the basic formula, and we will use it in linear regression in different forms. If we break down the name "Linear Regression", we find two words: "Linear" and "Regression". We already know about lines and linear equations, so let's talk about regression.
What is Regression?
Regression analysis is a form of predictive modeling that investigates the relationship between a dependent variable and one or more independent variables.
Uses of regression:
There are three major uses of regression analysis. They are:
• Determining the strength of predictors
• Forecasting an effect
• Trend forecasting
So far we have covered the linear equation and regression.
Now let's jump into linear regression. y = mx + c [simple linear equation]
For linear regression, this simple equation is transformed into the following equation:
y = β0 + β1x + ε

In this equation we have changed nothing but the variable names; we have simply switched to Greek letters and added one new element, ε, at the end. Breaking the formula down: β0 is the same as c, the intercept, and β1 is the slope, the same as m. The new element ε is the error term, which I'll discuss later. For now, let's assume ε = 0, i.e. there is no error, and see what our model looks like using the linear regression equation. Let,
β0 = 0, x1 = 2, x2 = 3, y1 = 6, y2 = 9
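To make the example concrete, here is a minimal Python sketch (my addition, not part of the original figure) that recovers the slope and intercept from these two points:

```python
# With beta0 = 0 assumed and the two points (x1, y1) = (2, 6) and
# (x2, y2) = (3, 9), the fitted line is y = 3x and the error is zero.
x1, y1 = 2, 6
x2, y2 = 3, 9

beta1 = (y2 - y1) / (x2 - x1)   # slope between the two points
beta0 = y1 - beta1 * x1         # intercept

print(beta0, beta1)  # 0.0 3.0
```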

[Figure: the line y = 3x passing through the points (2, 6) and (3, 9)]
But in the real world the scenario is not that simple: with more (x, y) points, the line will not pass through all of them.

[Figure: scatter of data points with a regression line that does not pass through all of them]

There will be some distance between our regression line and each data point, and this distance is called the error ε in linear regression.

Now I hope you get what ε is.
Our main goal is to reduce this error as much as possible.

Let's go back to the equation of linear regression:
y = β0 + β1x + ε
Suppose we have the data set of X, Y given below:

X    Y
1    3
2    4
3    2
4    4
5    5

If we plot the data in a scatter plot, it looks like this:
[Figure: scatter plot of the X, Y data]
Now our task is to draw a regression line on the scatter plot such that the error is minimal.
Calculated mathematically, the slope of the line is:

β1 = ∑(x - x̄)(y - ȳ) / ∑(x - x̄)^2

So to find the slope, we need columns for (x - x̄), (y - ȳ), (x - x̄)^2, and (x - x̄)(y - ȳ) in our data table.

x    y    x-x̄    y-ȳ    (x-x̄)^2    (x-x̄)(y-ȳ)
1    3    -2     -0.6    4           1.2
2    4    -1      0.4    1          -0.4
3    2     0     -1.6    0           0.0
4    4     1      0.4    1           0.4
5    5     2      1.4    4           2.8

Here the mean of x is x̄ = 3 and ∑(x - x̄)^2 = 10; the mean of y is ȳ = 3.6 and ∑(x - x̄)(y - ȳ) = 4.
So β1 = 4/10 = 0.4. Now, β0 = ȳ - β1·x̄ = 3.6 - 1.2 = 2.4.
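The hand calculation above can be cross-checked with a few lines of NumPy (a sketch I've added, using the same toy table):

```python
import numpy as np

# toy data set from the table above
X = np.array([1, 2, 3, 4, 5])
Y = np.array([3, 4, 2, 4, 5])

# slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
# intercept: y_bar - b1 * x_bar
b0 = Y.mean() - b1 * X.mean()

print(b1, b0)  # b1 ≈ 0.4, b0 ≈ 2.4
```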

Now let's draw the regression line with these values:
[Figure: scatter plot with the regression line y = 2.4 + 0.4x]
Finally, we have drawn the regression line on our scatter plot.

Now the question is: how accurate is our model? How do we measure that?

Let's try to find out.

There is a measure called R-squared; we will use it to determine how close the data are to our regression line.
R-squared method:
The R-squared value is a statistical measure of how close the data are to the fitted regression line.
It is also known as the coefficient of determination (or, for multiple regression, the coefficient of multiple determination).
The equation is,
R^2 = ∑(Yp - ȳ)^2 / ∑(y - ȳ)^2

Here,
Yp = predicted value
y = actual value
ȳ = mean value

y    Yp     (Yp-ȳ)^2    (y-ȳ)^2
3    2.8    0.64        0.36
4    3.2    0.16        0.16
2    3.6    0.00        2.56
4    4.0    0.16        0.16
5    4.4    0.64        1.96

Here, ∑(Yp - ȳ)^2 = 1.6 and ∑(y - ȳ)^2 = 5.2

So,

R^2 = 1.6 / 5.2 ≈ 0.31
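The same R-squared arithmetic, sketched in Python for the toy data (again my addition, using the slope and intercept found above):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([3, 4, 2, 4, 5])

b1, b0 = 0.4, 2.4          # slope and intercept found above
Yp = b0 + b1 * X           # predicted values

ss_r = np.sum((Yp - Y.mean()) ** 2)  # sum((Yp - y_bar)^2) = 1.6
ss_t = np.sum((Y - Y.mean()) ** 2)   # sum((y - y_bar)^2) = 5.2
r2 = ss_r / ss_t

print(r2)  # ≈ 0.3077
```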

The closer the value of R^2 is to 1, the more accurate the model.
We are almost done; now it's time to implement the algorithm using Python.

For this, we will need a data set. I have used the "head brain" data set here to implement linear regression.

You can download the data set from this link: headbrain.csv download
One more thing: I used a Jupyter notebook for the coding. You can see my code at this GitHub link: original code

# importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('headbrain.csv')
print(data.shape)
data.head()

output:

[Output: first five rows of the headbrain.csv data set]


X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

mean_x = np.mean(X)
mean_y = np.mean(Y)

m = len(X)  # number of observations

numer = 0
denom = 0
for i in range(m):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2

b1 = numer / denom           # slope
b0 = mean_y - (b1 * mean_x)  # intercept, c in the y = mx + c equation

print(b1, b0)

output:

0.26342933948939945
325.57342104944223


# plotting values
max_x = np.max(X) + 100
min_x = np.min(X) - 100

x = np.linspace(min_x, max_x, 1000)
y = b0 + b1 * x

plt.plot(x, y, color='red', label='Regression Line')
plt.scatter(X, Y, color='blue', label='Scatter Plot')

plt.xlabel('Head Size in cm^3')
plt.ylabel('Brain Weight in grams')
plt.legend()
plt.show()

output:

[Output: scatter plot of head size vs. brain weight with the red regression line]

# checking goodness of fit with R-squared
ss_t = 0  # total sum of squares, sum((y - y_bar)^2)
ss_r = 0  # regression sum of squares, sum((y_pred - y_bar)^2)
for i in range(m):
    y_pred = b0 + b1 * X[i]
    ss_t += (Y[i] - mean_y) ** 2
    ss_r += (y_pred - mean_y) ** 2

r2 = ss_r / ss_t
print(r2)

output:

0.6393117199570001
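As a sanity check (my addition, not part of the original notebook), the same slope and intercept can be recovered with NumPy's built-in polyfit. Since headbrain.csv must be downloaded separately, the sketch below uses the small toy data set from earlier in the post:

```python
import numpy as np

# toy data set from the worked example above
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([3, 4, 2, 4, 5], dtype=float)

# a degree-1 polynomial fit returns [slope, intercept]
b1, b0 = np.polyfit(X, Y, deg=1)

print(b1, b0)  # b1 ≈ 0.4, b0 ≈ 2.4, matching the hand calculation
```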

As we have seen, the value is reasonably close to 1, so our regression model fits well enough.
That's all for today. This is actually my first blog post, so I am a little worried about the quality of the writing. If you have any questions, feel free to ask; I would also appreciate your advice and suggestions.
Thanks!
