Python linear regression: displaying R-squared for a multiple linear regression with sklearn

I calculated my multiple linear regression equation and I want to see the adjusted R-squared. I know that the score function allows me to see r-squared, but it is not adjusted.

import pandas as pd #import the pandas module

import numpy as np

df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')

df

    AverageNumberofTickets  NumberofEmployees  ValueofContract       Industry
0                        1                 51            25750         Retail
1                        9                 68            25000       Services
2                       20                 67            40000       Services
3                        1                124            35000         Retail
4                        8                124            25000  Manufacturing
5                       30                134            50000       Services
6                       20                157            48000         Retail
7                        8                190            32000         Retail
8                       20                205            70000         Retail
9                       50                230            75000  Manufacturing
10                      35                265            50000  Manufacturing
11                      65                296            75000       Services
12                      35                336            50000  Manufacturing
13                      60                359            75000  Manufacturing
14                      85                403            81000       Services
15                      40                418            60000         Retail
16                      75                437            53000       Services
17                      85                451            90000       Services
18                      65                465            70000         Retail
19                      95                491           100000       Services

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets

model.fit(X, y)

model.score(X, y)

>>0.87764337132340009

I checked it manually: 0.87764 is the R-squared, whereas 0.863248 is the adjusted R-squared.

Solution

There are many different ways to compute R^2 and the adjusted R^2; the following are a few of them (computed with the data you provided):

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets

model.fit(X, y)

Recall the decomposition SST = SSR + SSE (total sum of squares = regression sum of squares + residual sum of squares).

# compute with formulas from the theory

yhat = model.predict(X)

SS_Residual = sum((y-yhat)**2)

SS_Total = sum((y-np.mean(y))**2)

r_squared = 1 - (float(SS_Residual))/SS_Total

adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1)

print(r_squared, adjusted_r_squared)

# 0.877643371323 0.863248473832
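As a quick sanity check on the arithmetic (my own addition, not part of the original answer): with n = 20 observations and p = 2 predictors, the adjustment reproduces the value you computed by hand.

n, p = len(y), X.shape[1]   # n = 20, p = 2 for this data set

print(1 - (1 - 0.877643371323) * (n - 1) / (n - p - 1))

# 0.863248473832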

# compute with sklearn's linear_model; the documentation does not seem to expose a function that returns the adjusted R-squared directly

print(model.score(X, y), 1 - (1-model.score(X, y))*(len(y)-1)/(len(y)-X.shape[1]-1))

# 0.877643371323 0.863248473832
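If you need this in more than one place, one option (my addition, not from the original answer) is a small helper built on sklearn.metrics.r2_score; the adjusted_r2 function below is a hypothetical sketch, not a library function.

from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - n_features - 1)
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(adjusted_r2(y, model.predict(X), X.shape[1]))

# 0.863248473832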

Another way:

# compute with statsmodels, adding the intercept manually

import statsmodels.api as sm

X1 = sm.add_constant(X)

result = sm.OLS(y, X1).fit()

# print(dir(result))

print(result.rsquared, result.rsquared_adj)

# 0.877643371323 0.863248473832
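For completeness (my addition): result.summary() prints the full regression table, whose header reports both R-squared and Adj. R-squared alongside the coefficients and p-values.

print(result.summary())   # header lists "R-squared:" and "Adj. R-squared:"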

Yet another way:

# compute with statsmodels another way, using the formula interface

import statsmodels.formula.api as sm

result = sm.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()

# print(result.summary())

print(result.rsquared, result.rsquared_adj)

# 0.877643371323 0.863248473832
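Finally, a quick cross-check (my addition, not from the original answer) that the sklearn-based adjustment and statsmodels agree to floating-point precision:

n, p = X.shape

adj_from_sklearn = 1 - (1 - model.score(X, y)) * (n - 1) / (n - p - 1)

print(np.isclose(adj_from_sklearn, result.rsquared_adj))

# True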
