Python linear regression: displaying R-squared for a multiple linear regression with sklearn

I calculated my multiple linear regression equation and I want to see the adjusted R-squared. I know that the score function allows me to see r-squared, but it is not adjusted.

import pandas as pd #import the pandas module

import numpy as np

df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')

df

    AverageNumberofTickets  NumberofEmployees  ValueofContract       Industry
0                        1                 51            25750         Retail
1                        9                 68            25000       Services
2                       20                 67            40000       Services
3                        1                124            35000         Retail
4                        8                124            25000  Manufacturing
5                       30                134            50000       Services
6                       20                157            48000         Retail
7                        8                190            32000         Retail
8                       20                205            70000         Retail
9                       50                230            75000  Manufacturing
10                      35                265            50000  Manufacturing
11                      65                296            75000       Services
12                      35                336            50000  Manufacturing
13                      60                359            75000  Manufacturing
14                      85                403            81000       Services
15                      40                418            60000         Retail
16                      75                437            53000       Services
17                      85                451            90000       Services
18                      65                465            70000         Retail
19                      95                491           100000       Services

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets

model.fit(X, y)

model.score(X, y)

>>0.87764337132340009

I checked it manually: 0.87764 is the R-squared, whereas 0.863248 is the adjusted R-squared.

Solution

There are many different ways to compute R^2 and the adjusted R^2; the following are a few of them (computed with the data you provided):

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets

model.fit(X, y)

Recall the decomposition SST = SSR + SSE (total sum of squares = regression sum of squares + residual sum of squares).

# compute with formulas from the theory

yhat = model.predict(X)

SS_Residual = sum((y-yhat)**2)

SS_Total = sum((y-np.mean(y))**2)

r_squared = 1 - (float(SS_Residual))/SS_Total

adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1)

print(r_squared, adjusted_r_squared)

# 0.877643371323 0.863248473832
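As a quick sanity check on the arithmetic (my own addition, not part of the original answer): with n = 20 observations and p = 2 predictors, the adjustment reproduces the value you computed by hand.

n, p = len(y), X.shape[1]   # n = 20, p = 2 for this data set

print(1 - (1 - 0.877643371323) * (n - 1) / (n - p - 1))

# 0.863248473832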

# compute with sklearn's linear_model; the documentation does not seem to expose a function that returns the adjusted R-squared directly

print(model.score(X, y), 1 - (1-model.score(X, y))*(len(y)-1)/(len(y)-X.shape[1]-1))

# 0.877643371323 0.863248473832
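If you need this in more than one place, one option (my addition, not from the original answer) is a small helper built on sklearn.metrics.r2_score; the adjusted_r2 function below is a hypothetical sketch, not a library function.

from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - n_features - 1)
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(adjusted_r2(y, model.predict(X), X.shape[1]))

# 0.863248473832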

Another way:

# compute with statsmodels, adding the intercept manually

import statsmodels.api as sm

X1 = sm.add_constant(X)

result = sm.OLS(y, X1).fit()

# print(dir(result))

print(result.rsquared, result.rsquared_adj)

# 0.877643371323 0.863248473832
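For completeness (my addition): result.summary() prints the full regression table, whose header reports both R-squared and Adj. R-squared alongside the coefficients and p-values.

print(result.summary())   # header lists "R-squared:" and "Adj. R-squared:"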

Yet another way:

# compute with statsmodels another way, using the formula interface

import statsmodels.formula.api as sm

result = sm.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()

# print(result.summary())

print(result.rsquared, result.rsquared_adj)

# 0.877643371323 0.863248473832
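Finally, a quick cross-check (my addition, not from the original answer) that the sklearn-based adjustment and statsmodels agree to floating-point precision:

n, p = X.shape

adj_from_sklearn = 1 - (1 - model.score(X, y)) * (n - 1) / (n - p - 1)

print(np.isclose(adj_from_sklearn, result.rsquared_adj))

# True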
