Data source: the UN11 dataset from the alr4 package that ships with R, exported to CSV:
library(alr4)
data<-UN11
write.table(data,"C:/Users/admin/Desktop/数据分析/a.csv",row.names=FALSE,col.names=TRUE,sep=",")
Next, analyze it in Python. A few notes on the functions used:
np.array(x).reshape(-1,1): reshapes the array into a single column (one sample per row), which is the input format that regr.fit and regr.predict expect.
math.log(x): returns the natural logarithm; math.log(x, base) sets a different base, e.g. math.log(100, 10) = 2.
'Slope: %.3f': a % format string that prints the regression coefficient with three decimal places.
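The three notes above can be checked interactively; a minimal sketch:

```python
import math
import numpy as np

x = [1.0, 2.0, 3.0]
col = np.array(x).reshape(-1, 1)    # column form: shape (3, 1), one sample per row
print(col.shape)                    # (3, 1)
print(math.log(math.e))             # 1.0 -- natural log by default
print(math.log(100, 10))            # 2.0 -- optional second argument sets the base
print('Slope: %.3f' % 0.20714979)   # Slope: 0.207
```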
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
data=pd.read_csv("a.csv")
x=data["ppgdp"]
y=data["fertility"]
x=[math.log(v) for v in x]  # avoid reusing x as the loop variable
y=[math.log(v) for v in y]
# build the linear regression model
regr = linear_model.LinearRegression()
regr.fit(np.array(x).reshape(-1,1), y)
# fitted values
y_pred=regr.predict(np.array(x).reshape(-1,1))
#print(y_pred)
# regression coefficients
# a, b = regr.coef_, regr.intercept_
# print(a,b)
print('Slope: %.3f' % regr.coef_[0])  # regr.coef_ is a 1-d array; index it to get a scalar
print('Intercept: %.3f' % regr.intercept_)
#print(type(regr.predict(np.array(x).reshape(-1,1))))
plt.scatter(x, y, color ="blue")
plt.plot(x, regr.predict(np.array(x).reshape(-1,1)), color = 'orange', linewidth = 4)
plt.show()
Output: Slope: -0.207 Intercept: 2.666
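As a side note, regr.score(X, y) returns the coefficient of determination R² directly, without refitting (for the real a.csv data it should match the R-squared that the statsmodels summary reports). A minimal sketch with synthetic data, since a.csv is not assumed available here:

```python
import numpy as np
from sklearn import linear_model

# synthetic stand-in for (log(ppgdp), log(fertility)); slope/intercept taken from the text
rng = np.random.default_rng(0)
x = rng.uniform(4, 11, size=100)
y = 2.666 - 0.207 * x + rng.normal(0, 0.3, size=100)

X = x.reshape(-1, 1)
regr = linear_model.LinearRegression().fit(X, y)
print('Slope: %.3f' % regr.coef_[0])
print('R^2: %.3f' % regr.score(X, y))  # coefficient of determination on the given data
```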
The first method used the sklearn library; below is another approach, using statsmodels.api:
import statsmodels.api as sm # least squares
import matplotlib.pyplot as plt
import pandas as pd
import math
import numpy as np
data=pd.read_csv("a.csv")
x=data["ppgdp"]
y=data["fertility"]
x=[math.log(v) for v in x]  # avoid reusing x as the loop variable
y=[math.log(v) for v in y]
plt.scatter(x, y, color ="blue")
x=sm.add_constant(x) # add a constant column so the fit includes an intercept: y = kx + b
# after add_constant, x is a numpy array: first column all ones, second column the data
regr = sm.OLS(y, x) # ordinary least squares model
res = regr.fit()
# regression coefficients
print(res.params)
# regression summary
print(res.summary())
# get the fitted values
y_fitted = res.fittedvalues
# x[:,1] extracts the second column (the log(ppgdp) values) as a numpy array
plt.scatter(x[:,1],y,color='g',label="data")
plt.plot(x[:,1],y_fitted,color='r',label="OLS")
plt.legend(loc='best') # generate the legend automatically; loc='best' lets matplotlib pick the position
plt.xlabel("ppgdp")
plt.ylabel("fertility")
plt.grid(True)
plt.show()
Output: [ 2.66550734 -0.20714979]
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.526
Model: OLS Adj.R-squared: 0.524
Method: Least Squares F-statistic: 218.6
Date: Mon, 18 Nov 2019 Prob (F-statistic): 9.06e-34
Time: 16:24:31 Log-Likelihood: -46.435
No. Observations: 199 AIC: 96.87
Df Residuals: 197 BIC: 103.5
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.6655 0.121 22.108 0.000 2.428 2.903
x1 -0.2071 0.014 -14.785 0.000 -0.235 -0.180
==============================================================================
Omnibus: 1.037 Durbin-Watson: 2.130
Prob(Omnibus): 0.595 Jarque-Bera (JB): 1.148
Skew: -0.151 Prob(JB): 0.563
Kurtosis: 2.782 Cond. No. 48.3
==============================================================================